Problems and New Solutions in the Boolean Domain — ISBN 9781443892421, 1443892424, 1443889474


English · 482 pages · 2016


Table of contents:

I Methods, Algorithms, and Programs
  General methods
  Efficient calculations
II Applications
  Several aspects of security
  Exploration of properties
III Towards Future Technologies
  Reversible circuits
  Quantum circuits

A vector space method for Boolean switching networks / Mitchell A. Thornton
Solving combinatorial problems using Boolean equations / Christian Posthoff, Bernd Steinbach
Simplification of extremely large expressions / Ben Ruijl, Jos Vermaseren, Aske Plaat, Jaap van den Herik
A novel approach of polynomial expansions of symmetric functions / Danila A. Gorodecky
XBOOLE-CUDA: fast calculations of large Boolean problems on the GPU / Bernd Steinbach, Matthias Werner
Efficient computing of the Gibbs dyadic derivatives / Radomir S. Stanković, Dusan Gajić, Suzana Stojković, Milos Radmanović
Understanding the performance of randomized algorithms / Jan Schmidt, Rudolf B. Blažek, Petr Fišer
Fast network intrusion detection systems with high maintainability / Shinobu Nagayama, Shinʼichi Wakabayashi
Utilization of Boolean functions in cryptography / Chunhui Wu, Bernd Steinbach
Minimization of ESOP forms for secure computation / Stelvio Cimato, Valentina Ciriani, Matteo Moroni
On the relationship of Boolean function spectra and circuit output probabilities / Micah A. Thornton, Mitchell A. Thornton
ROBDD-based computation of special sets with RelView applications / Rudolf Berghammer, Stefan Bolus
Multiple-valued functions with bent Reed-Muller spectra / Claudio Moraga, Milena Stanković, Radomir S. Stanković
A framework for reversible circuit complexity / Mathias Soeken, Nabila Abdessaied, Rolf Drechsler
Gate count minimal reversible circuits / Jerzy Jegier, Paweł Kerntopf
The synthesis of a quantum circuit / Alexis de Vos, Stijn de Baerdemacker
Universal two-qubit quantum gates / Md. Mazder Rahman, Gerhard W. Dueck


Problems and New Solutions in the Boolean Domain

Edited by Bernd Steinbach

This book first published 2016
Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Copyright © 2016 by Bernd Steinbach and contributors

All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN (10): 1-4438-8947-4
ISBN (13): 978-1-4438-8947-6

Contents

List of Figures
List of Tables
Foreword
Preface
Acknowledgments
List of Abbreviations

I Methods, Algorithms, and Programs

1 General Methods
1.1 A Vector Space Method for Boolean Networks
1.1.1 A Vector Space Method for Boolean Networks
1.1.2 History of Switching Theory
1.1.3 Useful Topics in Linear Algebra
1.1.4 Vector Space Information Models
1.1.5 Switching Network Transfer Matrices
1.1.6 Transfer Matrices of Switching Circuits
1.1.7 Switching Network Simulation
1.1.8 Switching Network Justification
1.1.9 Algorithms and Implementation
1.2 Solving Combinatorial Problems
1.2.1 Boolean Equations
1.2.2 Solution With Regard to Variables
1.2.3 Set-Related Problems
1.2.4 Graph-Related Problems
1.2.5 Rule-Based Problems
1.2.6 Combinatorial Design
1.2.7 Extremely Complex Boolean Problems
1.2.8 Summary and Comments
1.3 Simplification of Extremely Large Expressions
1.3.1 An Application in High Energy Physics
1.3.2 The History of Simplification
1.3.3 Methods of Simplification
1.3.4 Monte Carlo Tree Search
1.3.5 Nested Monte Carlo Search
1.3.6 Simulated-Annealing-UCT
1.3.7 Consequences for High Energy Physics
1.3.8 An Outlook on Expression Simplification
1.4 Novel Polynomial Expansions of Symmetric Functions
1.4.1 Preliminaries and Background
1.4.2 Main Definitions
1.4.3 Generation of the Carrier Vector
1.4.4 Generation of the Reduced Spectrum
1.4.5 The Complexity of the Combinatorial Method
1.4.6 Discussion of the Reached Improvements

2 Efficient Calculations
2.1 XBOOLE-CUDA - Fast Calculations on the GPU
2.1.1 Challenges for Boolean Calculations
2.1.2 The Concepts Realized in XBOOLE
2.1.3 Parallel and Serial Architectures
2.1.4 Efficient GPU Programming
2.1.5 Implementation XBOOLE-CUDA
2.1.6 Evaluation of XBOOLE-CUDA
2.1.7 Recommendations and Future Work
2.2 Efficient Computing of the Gibbs Dyadic Derivatives
2.2.1 Walsh and Dyadic Analysis
2.2.2 Gibbs Dyadic Derivatives
2.2.3 Computing the Gibbs Dyadic Derivatives
2.2.4 Comparison of Methods and Algorithms
2.3 Understanding Randomized Algorithms Performance
2.3.1 Observations from Experiments
2.3.2 The Role of Interpretations
2.3.3 Stochastic Models
2.3.4 The EM Algorithm
2.3.5 The Performance of the ABC Tool
2.3.6 Randomly Valued MAX-3SAT Formulas
2.3.7 Simulated Annealing
2.3.8 Summary of Interpretations

II Applications

3 Several Aspects of Security
3.1 Fast Network Intrusion Detection Systems
3.1.1 Background on Intrusion Detection Systems
3.1.2 Preliminaries
3.1.3 Regular Expression Matching Hardware
3.1.4 Regular Expression Matching Software
3.1.5 Discussion and Future Prospects
3.2 Utilization of Boolean Functions in Cryptography
3.2.1 Types of Cryptosystems
3.2.2 Cryptographic Properties of Boolean Functions
3.2.3 Boolean Functions in Stream Ciphers
3.2.4 Boolean Functions in Block Ciphers
3.2.5 Exploration of the Nonlinearity
3.2.6 Further Research Topics
3.3 Minimization of ESOP Forms for Secure Computation
3.3.1 Two Party Secure Computation
3.3.2 Exor Sum Of Products
3.3.3 Secure Computation with ESOPs
3.3.4 Minimization Algorithm
3.3.5 Experimental Results
3.4 Determination of Almost Optimal Check Bits
3.4.1 Error Detection Codes
3.4.2 Error Model
3.4.3 Determination of Additional Check Bits
3.4.4 Detection of Double-Bit Errors
3.4.5 Additional Check Bit for a Byte-Parity Code

4 Exploration of Properties
4.1 Boolean Function Spectra and Circuit Probabilities
4.1.1 Spectra of Boolean Functions
4.1.2 Boolean Function Output Probability
4.1.3 Conditional Output Probabilities
4.1.4 Boolean Difference, Consensus, and Smoothing
4.1.5 Switching Function Spectra
4.1.6 Calculating the Walsh Spectrum
4.1.7 Calculating the Reed-Muller Spectrum
4.1.8 Calculating the Haar Spectrum
4.1.9 Switching Function Output Probability
4.2 ROBDD-based Computation of Special Sets
4.2.1 Hard Problems
4.2.2 Fundamentals of ROBDDs
4.2.3 ROBDDs for Subsets of Powersets
4.2.4 Sets of Minimal and Maximal Sets
4.2.5 Applications Concerning RelView
4.2.6 Concluding Remarks
4.3 Functions with Bent Reed-Muller Spectra
4.3.1 Background
4.3.2 Formalisms
4.3.3 Maiorana Class of Bent Reed-Muller Spectra
4.3.4 Special Cases when p = 3
4.3.5 Properties which Support Future Research

III Towards Future Technologies

5 Reversible Circuits
5.1 A Framework for Reversible Circuit Complexity
5.1.1 Investigations Relating to Reversible Functions
5.1.2 Reversible Boolean Functions
5.1.3 Function Classes and Symmetric Groups
5.1.4 Upper Bounds for Single-Target Gate Circuits
5.1.5 Upper Bounds for Toffoli Gate Circuits
5.1.6 Lower Bounds for Toffoli Gate Circuits
5.1.7 Framework for Complexity Analysis
5.1.8 Application: Better than Optimal Embedding
5.1.9 Open Problems and Future Work
5.2 Gate Count Minimal Reversible Circuits
5.2.1 A Need for New Reversible Benchmarks
5.2.2 Preliminaries
5.2.3 Sequences of Reversible Functions
5.2.4 Minimal Circuits for Selected Functions

6 Quantum Circuits
6.1 The Synthesis of a Quantum Circuit
6.1.1 Quantum Computation
6.1.2 Building Blocks
6.1.3 First Decomposition of a Unitary Matrix
6.1.4 Further Decomposition of a Unitary Matrix
6.1.5 Synthesizing a Fourier Circuit
6.1.6 Synthesizing a ZU Circuit
6.1.7 Summary
6.2 Universal Two-Qubit Quantum Gates
6.2.1 The NCV Gate Library
6.2.2 The Semi-Classical Two-Qubit Gate Library
6.2.3 A Restricted Two-Qubit Gate Library
6.2.4 Impact on Toffoli Gates

Bibliography
List of Authors
Index of Authors
Index

List of Figures

1.1 Conceptual models of switching networks
1.2 Circuit structures for fan-out, fan-in, and crossover
1.3 Basic logic network elements and transfer matrices
1.4 Example of a logic network
1.5 Truth table isomorphism example
1.6 Circuit and .pla listing of a network
1.7 2-level implementation and PLA listing of a network
1.8 PLA listing and corresponding transfer matrix
1.9 SBDD and MTBDD models of a network
1.10 Network and graph of the factored transfer matrix
1.11 Partition cuts and transfer matrices of circuit C17
1.12 Portion of the partition φ3 of C17 with crossovers
1.13 Transfer matrix of the benchmark C17
1.14 Eight queens on a chess board
1.15 Sudoku with a clue of 17 values
1.16 Graphs with and without Hamiltonian Circuits
1.17 Combinational circuit
1.18 Cyclically reusable, rectangle-free grid G18,18
1.19 A common subexpression in a tree representation
1.20 An overview of the four phases of MCTS
1.21 A sensitivity analysis for HEP(σ) regarding Cp
1.22 A sensitivity analysis regarding Cp and R × N
1.23 Forward and backward Horner schemes
1.24 NMCS level-2 for HEP(σ), taking 8500 evaluations
1.25 res(7,5) polynomial with 14 variables
1.26 HEP(σ) with 15 variables
1.27 Identification of an odd or even binomial coefficient
1.28 Identification of even or odd binomial coefficients
1.29 Calculation of π6, ..., π10 for π(E10^(5,7,8))
1.30 Transeunt triangle of the polynomial f(x) = E6^2(x)
1.31 Comparison of the complexities of C1 and CT

2.1 Orthogonality of ternary vectors
2.2 Nvidia GPU Fermi architecture
2.3 CUDA memory access layout
2.4 CPU and GPU: low latency or high throughput
2.5 Scalable programming model in CUDA
2.6 Memory alignment of ternary vector words
2.7 Attacks of a Bishop on a 4 × 4 chessboard
2.8 Selected 11 × 10 solutions of the Bishop-Problem
2.9 Timelines for C2070 regarding the limits: 10^3 and 10^7
2.10 FFT-like algorithm for Gibbs dyadic derivatives
2.11 BDD for the function f(x) in Example 2.18
2.12 MTBDD for the first row of the Gibbs matrix: n = 4
2.13 Solution quality of randomized ABC: one iteration
2.14 Solution quality of randomized ABC: 40 iterations
2.15 SAT Compress, randomized executed on b04
2.16 Randomized term expansion on a synthetic circuit
2.17 An iterative design tool sensitive to hidden details
2.18 LUT mapping on br2, with a GM model
2.19 Random valuation on the uf50-0828 instance
2.20 Simulated annealing on the uf20-09 instance

3.1 FAs for the regular expression (ab|cd)(ac|abd)*
3.2 NIDS using pattern-specific and -independent engines
3.3 Matching hardware based on a systolic array
3.4 Matching hardware based on a DPA
3.5 STDPA for the regular expression (ab|cd)(ac|abd)*
3.6 Hybrid hardware based on systolic array and STDPA
3.7 Decision diagrams for the logic function f
3.8 NFA for the regular expression (0|1)*1
3.9 Characteristic function f1
3.10 Stream cipher
3.11 Block cipher
3.12 Key stream generators
3.13 Structure to realize the DES algorithm
3.14 Weight and nonlinearity of functions of B4
3.15 Key generation and association for an AND gate
3.16 Oblivious transfer protocols
3.17 Secure computation of the AND gate
3.18 Examples to reduce the cost function cP1,P2
3.19 Error graph for block codes
3.20 Error graph for two stuck-at-0 faults
3.21 Non-resistive bridging fault example
3.22 Non-resistive bridging fault error graph
3.23 Flow chart of the heuristic
3.24 Not detected errors with a linear check bit
3.25 Not detected errors with a non-linear check bit

4.1 Modified Haar transformation matrix for n = 3
4.2 Two OBDDs: (a) QOBDD, (b) ROBDD
4.3 A small ROBDD r

5.1 Reversible gates
5.2 Reversible circuit
5.3 Synthesis based on Young subgroups
5.4 Reversible circuit complexity
5.5 Graphical symbols for reversible MCT gates
5.6 A gate count minimal circuit for n = 4
5.7 A gate count minimal circuit for n = 3
5.8 A circuit for any number of variables n
5.9 Gate count minimal circuits implementing the gmf_n function for n = 3, 4, 5, 6
5.10 Circuits needed for the proof of Theorem 5.20
5.11 Gate count minimal circuit for the gmf_n function

6.1 Quantum schematic for U = e^(iα) Z1 X Z2
6.2 Quantum schematic for U = P0 Z0 P0^(-1) Z1 X Z2
6.3 Quantum schematic for a LR-decomposition
6.4 Quantum schematics for T2, T3, and T4
6.5 Quantum schematic for a ZU circuit with w = 3
6.6 Symbol and NCV realization of the Toffoli-3 gate
6.7 Schematics for the benchmark 3_17
6.8 Non-entangled and entangled quantum circuits
6.9 Quantum realizations of the Fredkin gate
6.10 Semi-classical and entangled circuit realizations
6.11 Circuit with different two-qubit costs
6.12 Entangled and best LNN circuits of the Toffoli gate
6.13 LNN circuits for Toffoli-3 with non-adjacent controls
6.14 Minimized LNN circuits for Toffoli-3

List of Tables

1.1 Traditional and bra-ket notation correspondence
1.2 Size and computation time of transfer matrices
1.3 Function table of Boolean inequalities
1.4 Ternary vectors for a queen on a chess board
1.5 Rules of Hamiltonian Circuits
1.6 Mapping of the 4-valued x to two Boolean variables
1.7 MCTS with 1,000 iterations and 10,000 iterations
1.8 Comparison of the Complexities of C2 and CT

2.1 Encoding of a ternary element by two Boolean values
2.2 GPU Specifications
2.3 Types of GPU Memory
2.4 XBOOLE-CUDA 32-bit - Speedup with GTX 470
2.5 Comparison for the 11 × 10 Bishop Problem - 32-bit
2.6 Number of slices and kernel calls w.r.t. Table 2.5
2.7 Profiler results on C2070 GPUs w.r.t. Table 2.5
2.8 Comparison for the 11 × 10 Bishop Problem - 64-bit
2.9 Comparison 32-bit and 64-bit (Table 2.5 and 2.8)
2.10 Gibbs dyadic derivatives of a Boolean function
2.11 CPU and GPU time to compute the Gibbs dyadic derivative by means of partial derivatives
2.12 Computing the Gibbs dyadic derivative in terms of partial derivatives using an SBDD
2.13 Computing the Gibbs dyadic derivative from r0(x) over the SMTBDD and on the GPU
2.14 Time to calculate the Gibbs dyadic derivative based on the first row of the Gibbs matrix using a CUDA GPU
2.15 Gaussian Mixtures fit of experimental data
2.16 Number of components and local maxima
2.17 Components in randomly valuated 3SAT formulas
2.18 GM models of the uf50-0828 instance

3.1 Comparison of pattern-independent hardware
3.2 Characteristic function obtained by binary encoding
3.3 Comparison of representations of NFAs
3.4 Comparison of operations for matching
3.5 Number of nodes in BDDs and ZDDs for Δ in NFAs
3.6 Time to compute the existing and the new methods
3.7 Calculation of the Walsh spectrum
3.8 Inverse transformation of a Walsh spectrum
3.9 Attacks on symmetric cryptosystems
3.10 S-box S1 of the Data Encryption Standard (DES)
3.11 Properties of DES S-box and AES S-box
3.12 Comparison between the ESOP forms of B4
3.13 Comparison of ESOPs for some benchmark functions
3.14 Comparison of heuristic calculated and linear codes
3.15 Double-bit error detection rates

4.1 pm0 relationship to Haar spectrum and probabilities
4.2 pm1 relationship to Haar spectrum and probabilities
4.3 Holler-Packel power indices computed by RelView
4.4 Bent Reed-Muller spectra for ternary bent functions
4.5 Bent Reed-Muller spectra with the signature 100
4.6 Bent Reed-Muller spectra with the signature 101
4.7 Bent Reed-Muller spectra with the signature 010
4.8 Bent Reed-Muller spectra with the signature 110
4.9 Bent Reed-Muller spectra with the signature 210
4.10 Bent Reed-Muller spectra with the signature 212
4.11 Bent Reed-Muller spectra with the signature 021

6.1 Quantum gates and their unitary matrices
6.2 Results of finding semi-classical NCV circuits
6.3 Semi-classical two-qubit gates
6.4 Minimal three-qubit circuits using two-qubit gates
6.5 Minimal three-qubit circuits with NCV gates
6.6 Restricted two-qubit gates
6.7 Minimal three-qubit circuits with two-qubit gates
6.8 Toffoli-3 gates with positive and negative controls

Foreword

Boolean Algebra was introduced by George Boole in 1847. Originally, it was used to study the laws of thought. After the invention of digital computers, Boolean Algebra has been used to design digital circuits. Logic functions are represented by Boolean expressions or decision diagrams, and circuits are directly represented by such expressions or decision diagrams. Thus, the simplification of expressions or decision diagrams reduces the size of the circuits. In the early days, the cost of logic elements was rather high, so research on the minimization of logical representations was very important.

A Boolean expression is satisfiable (SAT) if it is possible to find an assignment that makes the expression true. Recently, efficient SAT engines have been developed, and they are used for the verification of digital systems. Since various digital systems are used in the infrastructure of our daily life, such as power, communication, transportation, and banking systems, the verification of digital systems is essential. SAT engines are also used to solve complex combinatorial optimization problems.

Many engineers now work on semiconductor devices. The pace of evolution of semiconductor technology is very fast, and the cost of basic logic elements has become rather low. Thus, the reduction of circuit size is not as important as before. More important problems are to reduce power dissipation and to increase the reliability or dependability of digital systems. Furthermore, programmability and regularity are desired. Since fewer researchers are working in the area of Boolean logic, the pace of research is slower than that in the device area.

I would like to mention that the editor of this book, Prof. Dr. Bernd Steinbach, has been regularly organizing the International Workshops on Boolean Problems since 1994. This is a unique and important workshop for exchanging the ideas of Eastern and Western researchers on Boolean problems. His long-term service to the community should be appreciated. Similar meetings are the International Symposium on Multiple-Valued Logic, the International Workshop on Logic and Synthesis, and the Reed-Muller Workshop. I hope the readers can participate in these meetings and present their new research to advance the field of Boolean logic.

Tsutomu Sasao
Meiji University, Japan
April 2016

Preface

Progress in the Boolean domain requires both an improved theory and powerful tools which apply the new theory to practical problems. The first part of this book deals with methods, algorithms, and programs serving these aims.

Technological progress extends conventional information processing with switching circuits by alternative information processing systems that use multiple-valued, quantum, reversible, and fuzzy methods. A suggested vector space model can be applied uniformly to all these different areas. A system with n primary inputs and m primary outputs is described by a 2^n × 2^m transfer matrix. A benefit of the transfer matrix is that it allows both simulation from known inputs to unknown outputs and justification from known outputs to unknown inputs. At first glance, the size of the transfer matrix seems to restrict the vector space model to very small systems. However, when the transfer matrix is mapped into a fitting decision diagram, experimental results confirm that the vector space model provides a practical alternative to switching-algebraic solutions.

Boolean equations play a major role in the Boolean domain. They can be used to solve problems of Discrete Mathematics, not only for digital systems but for many other problems as well. A Boolean equation is an instrument to map a Boolean function onto a set of Boolean vectors and vice versa. A Boolean inequality or a system of Boolean equations can be mapped to a single Boolean equation having the same solution. The field of application is further extended by the possibility of solving a Boolean equation with regard to variables. It is sufficient to create correct models, since the appropriate software is available: both the domain-specific software XBOOLE and the more specialized SAT solvers can be used. The complexity of the problems solved by Boolean equations is already extremely high. The simple hint that a problem has exponential complexity is no longer a sufficient reason

to leave the problem behind without any attempt at solving it.

The evaluation of expressions with millions of terms can take several months. Such expressions occur both in High Energy Physics and in the specification of Boolean problems. The simplification of such expressions can drastically reduce the time for their evaluation. Simple methods for this simplification are the utilization of Horner's rule and common subexpression elimination. Drawbacks of these methods are eliminated by variants of Monte Carlo Tree Search (MCTS) using Upper Confidence bounds applied to Trees (UCT). The need to adjust the control parameters of these methods can be softened by a Simulated Annealing approach. Both the benefits and restrictions of these methods are explored on very large benchmarks from High Energy Physics. The significant improvements reached should be a reason to adapt these methods to the simplification of large Boolean expressions.

The 2^(n+1) symmetric Boolean functions of n variables are only a small part of all 2^(2^n) Boolean functions. However, due to their invariance under the exchange of arbitrary pairs of variables, symmetric Boolean functions are preferably used in many applications, e.g., in testing or cryptography. Symmetric Boolean functions can be uniquely specified either by a set of symmetry levels or by the set of numbers that indicate how many variables occur in the conjunctions of the complete polynomials. The transeunt triangle method is, so far, the best method for calculating one of these sets from the given other set and has a complexity of O(n^2). A new combinatorial method will be presented that utilizes the Lucas Theorem to solve the same task. For elementary symmetric Boolean functions, the complexity of the new method could be reduced to the linear complexity O(n).

The exponential complexity of Boolean functions requires efficient methods for their calculation.
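In its binary case, the Lucas Theorem gives a particularly compact test: a binomial coefficient C(n, k) is odd exactly when every bit set in k is also set in n. The following is a generic illustration of this mod-2 shortcut only, not the combinatorial method presented in the chapter:

```python
from math import comb

def binom_is_odd(n, k):
    # Lucas' theorem mod 2: C(n, k) is odd iff every bit of k
    # is also set in n, i.e. k AND n == k.
    return (n & k) == k

# Exhaustive check against direct computation of C(n, k) mod 2.
for n in range(16):
    for k in range(n + 1):
        assert binom_is_odd(n, k) == (comb(n, k) % 2 == 1)
print("Lucas mod-2 check passed")
```

The bitwise test runs in constant time per coefficient, which is what makes such combinatorial shortcuts attractive compared with computing the coefficients themselves.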
An important approach to reach this aim is the utilization of parallel computation. Besides other concepts, XBOOLE realizes the parallel processing of 2^d binary vectors, represented by a single ternary vector containing d dashes, as well as the parallel processing of all Boolean variables assigned to the bits of a machine word of the computer used. Modern computers provide, in addition to the CPU, a much larger number of computing cores in the available GPUs. The new library XBOOLE-CUDA utilizes these additional resources to increase the computational power by approximately two further orders of magnitude. These welcome properties are explored on a practical application that can be scaled over a very wide range. Programmers who want to solve Boolean problems benefit from all efforts of optimizing for the special requirements of the GPU and from the strongly increased computational power by simply exchanging the XBOOLE library for the XBOOLE-CUDA library.

Due to a renewed interest in Walsh and dyadic analysis, alternative approaches for the efficient computation of Gibbs dyadic derivatives were explored. The FFT-like approach leads to a strong speedup on a GPU but is restricted to Boolean functions of not more than 25 variables. This limit can be exceeded by an adapted version of this algorithm utilizing SMTBDDs. An alternative approach based on the first row of the Gibbs matrix outperforms the other methods of calculating the Gibbs dyadic derivatives for Boolean functions of fewer than 15 variables.

Basically, it can be expected that Electronic Design Automation tools generate the same results for semantically identical inputs. Experimental explorations have shown, however, that irrelevant changes in the input descriptions cause large differences in the size of the designed circuits. Similar behavior is also noticed for a tool that compresses test patterns and for a tool for the two-level minimization of a disjunctive form. Due to the complexity of the algorithms used, it is difficult to predict the behavior of these tools. Using the stochastic model of Gaussian Mixtures and the Expectation-Maximization algorithm, it was possible to approximate the parameter probability density function of the evaluated tools.

Both the progress in the theory of Boolean functions and their very cost-efficient realizations as microelectronic circuits contribute to a growing number of applications.
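The FFT-like route for the Gibbs dyadic derivative can be sketched on a CPU in a few lines. This is a small illustrative model only, not the GPU implementation discussed in the book; it assumes the common convention that the Walsh functions w_k are eigenfunctions of the Gibbs dyadic derivative with eigenvalue k, and cross-checks against the equivalent definition via dyadic partial differences:

```python
import numpy as np

def fwht(a):
    # Unnormalized fast Walsh-Hadamard transform, O(N log N).
    a = a.astype(float)  # astype copies, so the input stays untouched
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def gibbs_derivative(f):
    # FFT-like route: D f = W^{-1} diag(0..N-1) W f, since the
    # Walsh function w_k is an eigenfunction of D with eigenvalue k.
    N = len(f)
    spectrum = fwht(np.asarray(f, dtype=float))
    return fwht(np.arange(N) * spectrum) / N

def gibbs_derivative_direct(f):
    # Equivalent definition via dyadic partial differences:
    # (D f)(x) = sum_i 2^(i-1) * (f(x) - f(x XOR 2^i)).
    f = np.asarray(f, dtype=float)
    n = len(f).bit_length() - 1
    x = np.arange(len(f))
    return sum(2.0 ** (i - 1) * (f - f[x ^ (1 << i)]) for i in range(n))

f = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=float)  # parity, n = 3
print(gibbs_derivative(f))  # Gibbs derivative of the parity function
```

Both routes agree; the transform-based one is the variant that parallelizes well, which is why the GPU version in the book follows this structure.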
The second part of this book explores some of these practical applications of Boolean functions.

The rapid development of the internet influences both our daily life and particularly the successful cooperation in the area of scientific projects. Unfortunately, the risk of network attacks such as computer viruses or worms also increases. Network Intrusion Detection Systems contribute to network security by checking the transmitted data on network nodes. Due to the increasing number of signatures to test and the high speed of data transmission reached, the check to prevent attacks becomes a bottleneck on the internet. New, very fast hardware for regular expression matching based on systolic arrays and a dual position automaton avoids this bottleneck. More than one gigabit per second can be checked using compact hardware in which the comparison patterns can be quickly updated. For low-end Network Intrusion Detection Systems, an efficient software approach based on Zero-Suppressed Binary Decision Diagrams and one-hot encoding of the needed relations will be presented.

A huge amount of data is transferred each second through the internet, and this amount of data grows continuously. Cipher systems enable the encryption of a given plaintext so that the transmitted ciphertext can only be decrypted by a cipher system with the right key. Due to successful attacks against practically used cipher systems, it is a challenge to increase their security. Boolean functions with special properties become very important in the development of new cipher systems with much higher security. Different types of cipher systems as well as the necessary properties of Boolean functions for such systems are explored within a section about cryptography. Their structure, behavior, and possible attacks are explained. A list of topics for further research guides scientists towards the development of cipher systems of possibly even higher security.

A special problem in cryptography is the computation of a function based on information of two parties such that both parties share the result but maintain the privacy of the secret input data they contribute to the function. Secure two-party computation protocols are the theoretical basis for such computations. The cost of such a protocol is determined by the exchange of keys needed.
A new type of Exclusive-OR Sums Of Products (ESOP) is suggested, called Secure Computation (SC) ESOP. An adapted minimization algorithm may increase the total number of gates, but decreases the cost caused by the number of EXOR gates that are controlled by the information of both parties. The best gain for the explored benchmark circuits was about 22 percent.

Several external influences can cause errors in one or more bits of a digital system. It is well known that additional check bits in codewords allow the detection or even the correction of an error. This method increases the security of safety-critical systems, but can also be used to increase the yield of the produced circuits. A new heuristic is suggested that finds almost optimal check bits for an arbitrary error model. This heuristic was successfully applied to an error graph of 2^19 vertices and almost 100 million edges to improve the double-bit error protection.

The properties of Boolean functions have a strong influence on the effort of their implementation. Some of these properties are well indicated by the coefficients of several spectra of Boolean functions. However, due to the exponential complexity of Boolean functions, the direct computation of all needed spectral coefficients from the truth table of the Boolean function can be very time-consuming. The exploration of the relationship between Boolean function spectra and associated circuit output probabilities opens a new way for such computations. Besides the application to the well-known Shannon decomposition, applications to operations of the Boolean Differential Calculus and to several spectral transformations are explained.

There are hard problems which require the computation of the minimal and maximal sets of very large given sets of sets. Problems in Social Choice Theory, such as finding the set of maximal transitive sets with respect to a given tournament relation, or game-theoretic problems, such as finding the minimal winning coalitions, belong to this class of problems. Using Reduced Ordered Binary Decision Diagrams and Quasi-Reduced Ordered Binary Decision Diagrams, new algorithms were implemented in the Computer Algebra System RelView which solve such tasks significantly faster.

The properties of Boolean functions have a strong influence on their application. For instance, the maximally nonlinear functions, called bent functions, are used in cryptographic applications.
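What "maximally nonlinear" means can be made concrete with a short Python sketch (a hedged illustration, not code from the book): a Boolean function of n variables is bent exactly when every coefficient of its Walsh spectrum has absolute value 2^(n/2). The classic example below is f(x1, x2, x3, x4) = x1 x2 XOR x3 x4.

```python
def walsh_spectrum(f, n):
    """Walsh coefficients W(w) = sum over all x of (-1)^(f(x) XOR w.x)."""
    spectrum = []
    for w in range(2 ** n):
        total = 0
        for x in range(2 ** n):
            dot = bin(w & x).count('1') % 2   # inner product w.x over GF(2)
            total += (-1) ** (f(x) ^ dot)
        spectrum.append(total)
    return spectrum

# f(x1, x2, x3, x4) = x1*x2 XOR x3*x4, with the variables packed
# into the bits of the integer x (x1 is the least significant bit).
f = lambda x: ((x & 1) & (x >> 1 & 1)) ^ ((x >> 2 & 1) & (x >> 3 & 1))

spec = walsh_spectrum(f, 4)
print(all(abs(c) == 4 for c in spec))  # True: f is bent, |W| = 2^(4/2) = 4
```

The flat spectrum is exactly what makes bent functions attractive in cryptography: the function correlates equally badly with every affine function.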
It is a very difficult problem to compute all bent functions for a large number of variables. Additional knowledge about these functions is valuable for future research and applications. New results for multiple-valued functions having a bent Reed-Muller spectrum are presented and provide such valuable new insights.

In 1965, Gordon E. Moore published a paper about the trend that the number of components in integrated circuits doubles approximately every 12 to 24 months. This observation is known as Moore's Law. The predicted exponential increase of performance permanently requires new ideas and contributions from scientists and engineers. The third part of this book enumerates several problems concerning future technologies and also presents important new results in this research area.

Reversibility is a necessary condition for quantum circuits. Therefore, this class of circuits has been intensively explored over the last two decades. Despite the substantial knowledge about reversible functions and their implementation as reversible circuits, important open problems remain. Using basic knowledge about reversible circuits, both upper and lower bounds for several classes of gates in the circuit are given. A suggested framework guides scientists to open questions in the complexity analysis of reversible circuits.

The evaluation of algorithms using available benchmark functions is a general approach in circuit design. Using the same cost function, different algorithms can be compared. However, this method does not give an answer about the distance between the found solution and an optimal one. One step to close this gap is the presentation of a reversible circuit with a minimal number of multiple-control Toffoli gates and the proof of its minimality for the generalized Miller function of an arbitrary number of variables.

The Boolean values true and false of logic circuits, bits for short, are replaced in quantum circuits by qubits. The values of these qubits are described by complex numbers, and the gates that change these values are quantum gates, described by unitary matrices. Using controlled NEGATORs and controlled PHASORs as quantum gates, the relationships between reversible computing and quantum computing are explained. Utilizing the properties of the unitary matrices, synthesis algorithms for quantum circuits are proposed.
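The reversibility requirement discussed above can be checked mechanically; the following Python sketch (an illustration under the standard definition, not code from the book) shows that the Toffoli gate, which maps (a, b, c) to (a, b, c XOR ab), permutes its input space and therefore loses no information.

```python
from itertools import product

def toffoli(a, b, c):
    """Toffoli gate with two controls: the target c flips iff a = b = 1."""
    return (a, b, c ^ (a & b))

inputs = list(product((0, 1), repeat=3))
outputs = [toffoli(*bits) for bits in inputs]

# A gate is reversible iff it is a bijection on its input space,
# i.e. all 2**3 outputs are pairwise distinct.
print(len(set(outputs)) == len(inputs))  # True: the Toffoli gate is reversible
```

Applying the gate twice restores the original triple, so the Toffoli gate is even its own inverse, which is one reason it is a standard building block of reversible circuits.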
These algorithms decompose the given unitary matrix such that, in addition to classical gates and FOURIER circuits, either only PHASOR gates or only NEGATOR gates are necessary.

Different approaches to synthesize quantum circuits utilize several types of elementary quantum gates and transformations from reversible circuits to quantum circuits. Already for the gates NOT, controlled-NOT, controlled-V, and controlled-V† of the NCV library, it is necessary to distinguish between separable and entangled circuits. A semiclassical two-qubit gate library allows the merging of neighboring two-qubit gates in order to reduce the two-qubit cost. A restricted gate library extends the potential for improvements. The results found are applied to Linear Nearest Neighbor circuits to realize Toffoli gates.
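The role of the V gate in the NCV library can be illustrated with plain Python complex arithmetic (a hedged sketch, not from the book): V is a square root of NOT, so applying V twice yields the NOT (Pauli-X) gate, and multiplying V by its conjugate transpose yields the identity, confirming that V is unitary.

```python
def matmul(A, B):
    """Product of two 2x2 complex matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Conjugate transpose (the dagger in controlled-V†)."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(A, B, eps=1e-12):
    return all(abs(A[i][j] - B[i][j]) < eps for i in range(2) for j in range(2))

# V = sqrt(NOT): the standard square root of the Pauli-X matrix.
V = [[(1 + 1j) / 2, (1 - 1j) / 2],
     [(1 - 1j) / 2, (1 + 1j) / 2]]
NOT = [[0, 1], [1, 0]]
I = [[1, 0], [0, 1]]

print(close(matmul(V, V), NOT))        # True: V applied twice equals NOT
print(close(matmul(V, dagger(V)), I))  # True: V is unitary
```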

Bernd Steinbach
Department of Computer Science
Technische Universität Bergakademie Freiberg
Freiberg, Saxony, Germany

Acknowledgments

This is the second book that summarizes the best scientific results of contributions to the International Workshop on Boolean Problems (IWSBP). In this sense it establishes a new series of books. The idea for the first book goes back to Carol Koulikourdi, Commissioning Editor of Cambridge Scholars Publishing. She asked me one month before the 10th IWSBP whether I would agree to publish a book based on the proceedings of the workshop. This initial book [313] was entitled Recent Progress in the Boolean Domain. The interest of many people working in the Boolean domain or faced with such problems encouraged me to also prepare a book about the best results from the 11th IWSBP. Cambridge Scholars Publishing accepted my proposal for this second book within a series about problems and solutions in the Boolean domain.

Many people contributed to the origination of this book. I would like to thank all of them, starting with the scientists and engineers who have been working hard on Boolean problems and submitted papers about their results to the 11th IWSBP. My next thanks go to the 28 members of the program committee from eleven countries. Based on their reviews, the best submitted papers could be selected for presentation at the 11th IWSBP. Furthermore, their hints helped the authors to improve the final versions of their papers. My special thanks go to the invited speakers: Prof. Mitchell A. Thornton from the Southern Methodist University, Dallas, Texas, USA; Prof. Jaap van den Herik from Leiden University, The Netherlands; and Prof. Shinobu Nagayama, Hiroshima City University, Hiroshima, Japan; as well as all presenters of the papers and all attendees for their fruitful discussions and very interesting presentations on all three days of the workshop.

Besides the technical program, such an international workshop requires a lot of work to organize all the necessary parts. Without the support of Dr. Galina Rudolf, Karin Schüttauf, and Birgit Steffen, I would not have been able to organize this series of workshops. Hence, I would very much like to thank them for their valuable hard work.

Not only the authors of the sections but often larger groups contributed to the presented results. In many cases these people are financially supported by grants from many different organizations. Both the authors of the sections of this book and I thank them for this significant support. The list of these organizations, grant numbers, and titles of the supported projects is so long that I must refer the interested reader to the proceedings of the 11th IWSBP [311] for this information. As examples, we acknowledge some of these supporters here.

The research of Section 2.3 about the behavior of Electronic Design Automation tools in the case of similar input descriptions was supported by computational resources provided by the MetaCentrum under the program LM2010005 and by the CERIT-SC under the program Centre CERIT Scientific Cloud, part of the Operational Program Research and Development for Innovations, registration number CZ.1.05/3.2.00/08.0144. We thank them for this valuable support. The research of Section 3.1 on fast network intrusion detection systems with high maintainability was partly supported by the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan as a Grant-in-Aid for Scientific Research (C) (No. 25330071), and by the Satake Foundation, 2014. We would like to thank Prof. Shinichi Minato, Ryutaro Kurai, Dr. Masato Inagi, Yosuke Kawanaka, Dr. Yoichi Wakaba, and Takumi Makizaki for their support of this work. The work about functions with bent Reed-Muller spectra was supported by the Ministry of Education and Science of Serbia through the project No. ON174026, and by the Foundation for the Advance of Soft Computing, Mieres, Spain.
The development of the presented results has also been supported by further scientists. We thank, e.g., Eugenia Rosu for her help in completing the proof of Theorem 5.17.


I would like to emphasize that this book is a common work of many authors. Their names are directly associated with each section and additionally summarized in lexicographical order in the section List of Authors starting on page 423 and in the Index of Authors on page 431. Many thanks to all of them for their excellent collaboration and high-quality contributions. My special thanks go to Prof. Tsutomu Sasao for his Foreword, to Alison Rigg for corrections to the English text, and to Dr. Galina Rudolf as well as M.Sc. Matthias Werner for their support in improving the quality of the book using many LaTeX tools. Finally, I would like to thank Samuel Baker, Commissioning Editor, Victoria Carruthers, Author Liaison, and Amanda Millar, Typesetting Manager, for the fruitful collaboration in preparing this scientific book.

I hope that all readers enjoy reading the book and find helpful suggestions for their own work in the future. It will be my pleasure to talk with many readers at one of the next International Workshops on Boolean Problems or at any other place.

Bernd Steinbach
Department of Computer Science
Technische Universität Bergakademie Freiberg
Freiberg, Saxony, Germany

List of Abbreviations

ADD        Algebraic Decision Diagram
AES        Advanced Encryption Standard
AI         Artificial Intelligence
AMD        Advanced Micro Devices
ANF        Algebraic Normal Form
ASCII      American Standard Code for Information Interchange
BDC        Boolean Differential Calculus
BDD        Binary Decision Diagram
BF         Boolean Function
BP         Braided Parallelism
CAM        Content Addressable Memory
CC         Comparison Cell
CDF        Cumulative Distribution Function
CERN       European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
CI         Correlation Immunity
CMOS       Complementary Metal-Oxide-Semiconductor
CPU        Central Processing Unit
CSEE       Common Sub-Expression Elimination
CUDA       Compute Unified Device Architecture
CUDD       Colorado University Decision Diagram
DAG        Directed Acyclic Graph
DES        Data Encryption Standard
DFA        Deterministic Finite Automaton
DM         Discrete Mathematics
DNA        Deoxyribo-Nucleic Acid
DPA        Dual Position Automaton
DSOP       Disjoint Sum Of Products
ECC        Error-Correction Code
EDA        Electronic Design Automation
EM         Expectation and Maximization
EPUSBF     Elementary Polynomial-Unate Symmetric Boolean Function
ESBF       Elementary Symmetric Boolean Function
ESOP       Exclusive-OR Sums Of Products
EU         European Union
FA         Finite Automaton
FLOPS      Floating-Point Operations per Second
FFT        Fast Fourier Transformation
FSM        Finite State Machine
FPGA       Field Programmable Gate Array
Gbps       Gigabits per second
GB         Gigabyte
GC         Garbled Circuit
GM         Gaussian Mixture
GPGPU      General Purpose GPU
GPU        Graphics Processing Unit
HDL        Hardware Description Language
HEP        High Energy Physics
HEPGAME    High Energy Physics Game
IC         Integrated Circuit
IPS        Intrusion Prevention System
ITE        If-Then-Else
IWSBP      International Workshop on Boolean Problems
KiB        Kibibyte
LFSR       Linear Feedback Shift Register
LLR        Log Likelihood Ratio
LNN        Linear Nearest Neighbor
LR         Likelihood Ratio
LRC        Longitudinal Redundancy Check
LSI        Large Scale Integration
LUT        Look-Up Table
MAX-3SAT   Maximum 3-Satisfiability
MCT        Multiple-Controlled Toffoli
MCTS       Monte Carlo Tree Search
ML         Maximum Likelihood
MLE        Maximum Likelihood Estimate
MOSFET     Metal-Oxide-Semiconductor Field-Effect Transistor
MPI        Message Passing Interface
MPMCT      Mixed-Polarity Multiple-Controlled Toffoli
MTBDD      Multi-Terminal Binary Decision Diagram
MVL        Multiple-Valued Logic
MWVG       Multiple Weighted Voting Game
NCT        NOT, CNOT, and Toffoli
NCV        NOT, Controlled-NOT, Controlled-V, and Controlled-V†
NFA        Non-deterministic Finite Automaton
NIDS       Network Intrusion Detection System
NIST       National Institute of Standards and Technology
NMCS       Nested Monte Carlo Search
NOR        Not Or
NP-hard    Non-deterministic Polynomial-time hard
OBBC       Orthogonal Block-Building and Change
OBDD       Ordered Binary Decision Diagram
OpenCL     Open Computing Language
OT         Oblivious Transfer
PC         Propagation Criterion
PCRE       Perl Compatible Regular Expression
PDF        Probability Density Function
PLA        Programmable Logic Array
PPRM       Positive Polarity Reed-Muller
PSDKRO     Pseudo-Kronecker
PUSBF      Polynomial-Unate Symmetric Boolean Function
QC         Quantum Cost
QOBDD      Quasi-Reduced Ordered Binary Decision Diagram
RBDD       Reduced Binary Decision Diagram
RM         Reed-Muller
ROBDD      Reduced Ordered Binary Decision Diagram
SA         Simulated Annealing
SAC        Strict Avalanche Criterion
SAT        Satisfiability
SA-UCT     Simulated Annealing UCT
SBDD       Shared Binary Decision Diagram
SBF        Symmetric Boolean Function
SC         Secure Computation
SIMD       Single Instruction, Multiple Data
SIMT       Single Instruction, Multiple Threads
SISD       Single Instruction, Single Data
SMTBDD     Shared Multi-Terminal Binary Decision Diagram
SMU        String Matching Unit
SOP        Sum Of Products
SPMD       Single Program, Multiple Data
STDPA      String Transition Dual Position Automaton
STU        State Transition Unit
TV         Ternary Vector
TVL        Ternary Vector List
UCB        Upper Confidence Bounds
UCP        Unate Covering Problem
UCT        Upper Confidence bounds applied to Trees
VHM        Vectorial Hamming Measure
VLSI       Very-Large-Scale Integration
VT         Variables Tuple
WVG        Weighted Voting Game
XNOR       Exclusive-NOR
ZBDD       Zero-Suppressed Binary Decision Diagram
ZDD        Zero-Suppressed Binary Decision Diagram

Methods, Algorithms, and Programs

1. General Methods

1.1. A Vector Space Method for Boolean Switching Networks

Mitchell A. Thornton

1.1.1. A Vector Space Method for Boolean Networks

The use of Boolean Algebra or switching theory is ubiquitous in modern information processing tasks, including both hardware and software design and implementation. The switching theory framework is used in a wide variety of modern information processing applications since it conveniently models systems that have two discrete states. In the past as well as the present, the most commonly implemented information processing systems are composed of devices characterized by two steady-state values, such as an electrical relay that can be open or closed, or a Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) operating in either the cutoff or the saturation region. Similarly, the processing of information is modeled using operations that are functions whose domains and ranges are the sets of all two-state values.

While the traditional switching theory is certainly, in part, responsible for the current state of information technology, there is no fundamental reason that information models must be restricted to the mathematical models of switching theory. Other emerging fields of information processing necessarily use fundamentally different mathematical frameworks for modeling both the information itself and the operations that represent the act of processing. This difference in
models leads to an incompatibility, since the theories and techniques developed within the switching algebra framework are not necessarily applicable to alternative information processing systems. One motivation for the vector space model of information processing described here is to develop a theory that is applicable to both conventional and alternative information processing systems. The vector space approach is applicable to multiple-valued, quantum, reversible, and fuzzy logic.

In the area of Multiple-Valued Logic (MVL), it is well known that the use of Boolean Algebra as a mathematical model for information processing operations is not desirable, since Boolean Algebra is insufficient for representing all possible functions when MVL systems have p steady states and p ≠ 2^n. Other switching algebras, such as those attributed to Jan Łukasiewicz or Emil Post, are typically used for modeling MVL systems. Due to these fundamentally different models, difficulties are often encountered in porting methods developed for binary switching networks to MVL networks.

Additionally, the emerging field of quantum computing utilizes the concept of a quantum bit, or qubit, which is modeled as an element or vector in a finite-dimensional Hilbert space. The vector model for the qubit arose from the work of John von Neumann, who used a Hilbert space model for describing the laws of quantum mechanics [238]. The qubit model provides the motivation for investigating the use of a vector space model for the conventional electronic digital circuitry described here.

Reversible logic models are applicable to both conventional electronic and quantum computing systems, since the predominant quantum computing paradigms all exhibit reversibility, and certain forms of conventional electronic systems, such as adiabatic logic, are also reversible.
The current state of reversible logic models is even more unsettled, since a survey of the literature reveals a mix in the usage of both classical switching theory and the linear algebraic notation used by the quantum computing community, with no well-established conventions.

A large percentage of the modern Integrated Circuits (ICs) currently designed and produced are so-called mixed-signal devices. Mixed-signal ICs contain both analog and digital circuitry interfaced within the same device. The mathematical models and methods for analog circuitry are well established, and many pre-date those of digital circuitry. Concepts such as transfer matrices and spectral domain analysis are commonly employed in the design and analysis of analog circuitry; however, these concepts are not widely used for digital circuitry. Although some research has been accomplished with regard to applying these types of analysis to digital circuitry, a chief reason for the lack of widespread usage lies in the underlying switching theory models. As an example, the calculation of the spectral response of a switching function requires that the function first be extracted from a netlist and represented in some compact form before the spectrum can be obtained. Due to the switching theory model, this is an intractable step that prevents widespread use of spectral techniques, regardless of any advantage that a technique itself may afford. Because the vector space model utilizes concepts such as transfer functions, similar to those used in analog circuit design and analysis, the application of methods such as linear systems analysis and spectral methods is more amenable and may become more practical to implement.

Another motivation for the development and use of the vector space model for information processing is one of economy, particularly in the Integrated Circuit (IC) design community, since the use of the vector space model can allow Electronic Design Automation (EDA) tasks to be reformulated, requiring only low-cost or open-source linear algebraic software. Organizations in the business of IC design expend a significant amount of capital annually in maintaining expensive EDA software tool licenses. The EDA software is oftentimes based on proprietary algorithms that use underlying switching theory models.
This expense can be very prohibitive for new start-up companies desiring to engage in the IC design business. If an alternative set of EDA tools were available, based upon the well-established field of numerical linear algebraic software, this expense could be significantly reduced. Because the field of numerical linear algebraic software and algorithms has reached a fair level of maturity, the vector space model described here could potentially offer such a capability. The vector space model allows for the formulation of Boolean problems as linear algebraic problems, and thus the rich set of results in this area of mathematics may be used.

While the switching theory framework for information processing can certainly be characterized as a successful and viable method, there are nevertheless certain common tasks that remain challenging from a computational standpoint. These include problems in synthesis, simulation, verification, testing, and other areas. Many of these tasks can be accomplished through the solution of problems such as Satisfiability (SAT) or other computationally intractable problems. For this reason, many modern methods rely on heuristic algorithms that are often proprietary to the developer. The employment of the vector space model will certainly not provide a tractable solution to these problems; however, the use of the fundamentally different information model may well yield alternative heuristic solutions that are more desirable than those currently in use and based upon switching theory. In a later subsection, we show that the vector space approach is at least as efficient as modern switching theory techniques, both theoretically and with empirical results.

In terms of mixed-signal EDA, the vector space approach offers both the benefit of potentially new and more efficient methods as well as economic advantages. Modern mixed-signal design methods typically use an overarching EDA framework that interfaces complex-valued continuous-function mathematical models for the analog portion with finite discrete-valued mathematical models for the digital portions. Overarching methods that combine the results of these two very different mathematical models can be the critical link that reduces efficiency.
The vector space model for the information processing portions of a mixed-signal IC, while not identical to the complex-valued and continuous models used in the analog portion, is more similar to them in a theoretical sense, and can result in enhanced mixed-signal EDA tools that have the added benefit of using well-established, trusted, and inexpensive numerical linear algebraic algorithms.

It is neither proposed nor anticipated that the community abandon the relatively mature and widely accepted models and methods based upon switching theory. However, it is suggested that the vector space model may allow for certain methods to be implemented that are at least as efficient and that may have some desirable characteristics, as outlined above.

1.1.2. History of Switching Theory

Classical logic is often regarded as the creation of the ancient Greek philosophers Plato and his student Aristotle. One definition of logic, as espoused by Plato and Aristotle, is the science of inference, where inference is the act of reasoning from factual knowledge or evidence. Aristotle is often regarded as the initiator of the syllogistic concept, whereby a proposition known as a conclusion is inferred from two other propositions known as the major and minor premises. A proposition is part of a speech comprising two terms: the predicate and the subject. Propositions may be deemed either true or false. For the purposes of the subject described here, it is important to note that propositions have two steady states: those of being true or false. This two-state characteristic is responsible for the common usage or inclusion of the term logic in reference to aspects of modern information processing systems, although these systems, for the most part, do not have any components or characteristics analogous to inference.

Following the formulation of the syllogism and resulting refinements such as the law of the excluded middle, logic was firmly established as a subject in its own right and as an important component of a classical education from the time of Aristotle to the present day. Progress in the field of logic continued with many significant advancements made by philosophical scholars and thinkers until approximately the mid-fourteenth century. From the mid-fourteenth century through the early nineteenth century, there were few advancements in the field, although it was still regarded as a fundamental component of a classical education. This period of little growth is sometimes referred to as the embryonic period, since there were a few important new ideas but those ideas were not fully developed.
Some scholars of philosophy attribute this lack of growth to the fact that propositions, a basic tenet of logic, are expressed in the inexact and loosely defined language of humans.


This idea was not unnoticed by scholars in the embryonic period; in fact, some of the important ideas that emerged during this period were likely motivated by it. The lack of rigor and specificity present in human language spawned ideas such as the use of concentric rings in a geometric system, intended by the Franciscan priest Raymond Llull to mechanize the process of inference. Another significant idea during the embryonic period was proposed by a group associated with Merton College at Oxford University, who later became known as the Oxford Calculators. The Oxford Calculators attempted to use letters instead of words to express logical calculations. Yet another significant idea during this period is attributed to Thomas Hobbes, who suggested that all logic and reasoning could be reduced to, and expressed as, the mathematical operations of addition and subtraction. These three ideas are very significant, although they were not fully developed and hence not adopted. If the ideas had been fully developed at the time, a mechanized or algorithmic technique (Llull) could have been used to mathematically manipulate (Hobbes) symbolic entities (Oxford Calculators).

It was not until the mid-nineteenth century that significant progress in logic began again in earnest, due to the work of George Boole. What we now commonly refer to as Boolean Algebra was proposed by Boole for the purpose of manipulating symbolic logic expressions in his seminal work [47]. Boole was undoubtedly influenced by Hobbes and others and successfully developed a philosophy that initiated the algebra of sets. In this work, both multiplicative and additive logical operations were defined, and certain rules, such as idempotence, were established. A key factor with respect to the subject here, the modeling and manipulation of information, is that Boole's application remained that of logic, the science of inference. A basic tenet and fundamental concern of Boole's work was that of the proposition, now an expression rather than a human language statement, that is assigned one of two steady-state values of being either true or false.

The application of Boolean Algebra to modern information processing is credited to Claude Shannon, who proposed its use in modeling networks of electrical relays in his 1937 Master of Science (S.M.) thesis at the Massachusetts Institute of Technology [293]. Shannon later generalized these ideas for the modeling of information in communications networks during his tenure at Bell Laboratories [292], firmly implanting the notion of the binary digit, or bit, as the fundamental atomic unit of information. The use of the bit and its manipulation with the postulates and axioms of Boolean Algebra has since become the ubiquitous methodology for modern information processing tasks and is heavily utilized in many areas such as data communications, software development, and digital circuit design.

Switching theory is based upon the ideas of Shannon and comprises a vast amount of theory and techniques regarding the modeling and manipulation of modern transistor-based switching circuits for the purpose of information processing. Due to the very rich philosophical history behind the development of switching theory, this field is commonly referred to as digital or Boolean logic, although in reality it is nothing more than a mathematical framework intended to be applied to switching networks. There is no inclusion of the fundamental concept of inference in this application; hence the use of the term logic, while widely accepted, is perhaps misleading or at least misplaced. It would be more appropriate to refer to the modeling framework initiated by Shannon as switching theory. Nevertheless, the important point is that the common use of the two integral values B = {0, 1} to represent the basic unit of information is arbitrary and a side effect of the application of Boole's (and ultimately Aristotle's) concept of a two-state or binary-valued proposition. From a purely mathematical modeling viewpoint, any structure could be used to model the basic unit of information, and there is no inherent reason that integers must be used over other mathematical structures such as vectors.
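The closing observation, that vectors could serve as the basic unit of information just as well as the integers {0, 1}, can be sketched in Python (an illustrative encoding with names of my own choosing, not the chapter's formalism): each bit becomes a 2-component standard basis vector, and NOT becomes a permutation matrix acting on it.

```python
# Encode the two switching values as standard basis vectors.
ZERO = (1, 0)   # the value 0 as the basis vector e0
ONE = (0, 1)    # the value 1 as the basis vector e1

# The NOT operation becomes a 2x2 permutation matrix.
NOT = ((0, 1),
       (1, 0))

def apply(M, v):
    """Matrix-vector product over a 2-dimensional space."""
    return tuple(sum(M[i][j] * v[j] for j in range(2)) for i in range(2))

print(apply(NOT, ZERO) == ONE)   # True: NOT maps e0 to e1
print(apply(NOT, ONE) == ZERO)   # True: NOT maps e1 to e0
```

Nothing about the truth values changes; only the mathematical structure carrying them does, which is the premise of the vector space method developed in the following section.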

1.1.3. Useful Topics in Linear Algebra

Basic Definitions

Switching theory is based upon the mathematics that Boole originally devised for symbolic logic manipulations. The class of Boolean Algebras can be defined using a variety of basic operators, with a common algebra being defined as ⟨B, +, ·, ¯, 0, 1⟩ where B = {0, 1}, + denotes the addition operator or logical-OR, · denotes the multiplicative operator or logical-AND, ¯ denotes the unary negation or logical-NOT operator, the value 0 serves as the additive identity, and the value 1 serves as the multiplicative identity. This algebra is functionally complete. Multi-bit systems can likewise be constructed through the use of the Cartesian product with B, and the algebra correspondingly holds for such multi-bit systems.

A linear algebra is defined over a vector space. Vector spaces are characterized by their dimension and contain elements known as vectors. Vectors are one-dimensional arrays of values or components, and the number of values comprising a vector defines the vector space dimension. The dimension is not to be confused with the tensor order, which is the number of indices required to specify a tensor. Vectors are tensors of unity order since they require only a single index value to specify a component. Likewise, scalars may be viewed as zero-order tensors and matrices as second-order tensors; in general, higher-ordered tensors are definable. The formal definition of a vector space is given in Definition 1.1.

Definition 1.1 (Vector Space). A vector space consists of a set of vectors and the operations of scaling and addition. In general, a vector space may have a dimension approaching infinity, although here we utilize vector spaces with finite dimension. The scaling operation is a multiplicative operation whose operands are a scalar and a vector; the product is the scaled vector, also a member of the vector space, in which each component is the scalar multiple of the specified scalar and the corresponding original vector component value. The vector addition operation is performed over two operand vectors within the space, and the resultant vector sum is comprised of components formed as the scalar sums of the corresponding components of the operand vectors.

The particular vector spaces we are concerned with are finite-dimensioned Hilbert Spaces.

Definition 1.2 (Hilbert Space). A vector space of dimension k with associated unary norm and binary inner product operations.


Vector Space Mappings

When vectors are transformed or mapped within the same n-dimensional Hilbert Space, a square transformation matrix T characterizes the mapping as T : Hn → Hn where T = [tij]n×n. In this case, the mapping can be considered as a combined scaling and rotation operation over a vector v ∈ Hn. For certain vectors vλ relative to a specific transformation matrix T, the transformation results in a scaled vector λvλ with no rotation occurring, Tvλ = λvλ. Such vectors vλ are referred to as eigenvectors and the corresponding scaling factor λ is the eigenvalue. The set of eigenvalues {λi} may be computed for a specific T through the use of the relationship Tvλ = λvλ. Rearranging this expression, we obtain Tvλ − λvλ = [0] where [0] is the n-dimensional null column vector whose components are all zero-valued.

Definition 1.3 (Characteristic Equation of a Square Matrix). The expression Tvλ − λvλ can be rewritten as (T − λI)vλ where I is the identity matrix. Because this expression is equal to the null vector for the case of vλ, the determinant |T − λI| is zero when the scalar λ is an eigenvalue. This observation leads to the definition of the characteristic equation of T, denoted as CT(λ) = |T − λI|.

The characteristic equation CT(λ) can be used to mathematically define the set of eigenvalues {λi} as given in Definition 1.4.

Definition 1.4 (Eigenvalues of a Square Matrix). The set of eigenvalues corresponding to a square matrix T, denoted as {λi}, are the roots of the characteristic equation CT(λ) = 0.

It should be noted that, in practice, computation of the eigenvalues of a matrix T is generally accomplished with an alternative numerical method rather than through the solution of the characteristic equation, since computational instabilities can occur.

When vectors are transformed or mapped from an n-dimensional Hilbert Space to an m-dimensional Hilbert Space, a non-square transformation matrix T is used to denote the mapping as T : Hn → Hm where T = [tij]n×m and n ≠ m. Clearly, T is not of full rank since n ≠ m; thus the multiplicative inverse does not exist, nor does the matrix have eigenvalues. A characterizing matrix referred to as the Gramian or square Gram matrix is useful for non-square matrices.

Definition 1.5 (Gramian or Square Gram Matrix). When the components of a non-square matrix T = [tij]n×m are complex (i.e., tij ∈ C), the Gramian of T is G = T*T. Correspondingly, when T has real- or integral-valued components only (i.e., tij ∈ R or tij ∈ Z), the Gramian reduces to G = TᵀT.

Definition 1.6 (Singular Values of a Matrix). The singular values {si} corresponding to a matrix T are the positive square roots of the eigenvalues of the Gramian of T.

The singular values are useful for a certain type of decomposition known as the singular value decomposition of the form T = USV*. In the singular value decomposition, U is an n × n unitary matrix, S is an n × m rectangular diagonal matrix whose components sii are the singular values, and V is an m × m unitary matrix.
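Definitions 1.5 and 1.6 can be checked numerically. The sketch below, assuming NumPy is available and using a hypothetical real-valued matrix T, forms the Gramian TᵀT and compares the square roots of its eigenvalues against the singular values returned by a direct singular value decomposition:

```python
import numpy as np

# A hypothetical real-valued, non-square matrix T.
T = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])

# Gramian for the real case (Definition 1.5): G = T^T T.
G = T.T @ T

# Singular values (Definition 1.6): positive square roots of the
# eigenvalues of the Gramian. The rank-deficient Gramian also carries
# numerically-zero eigenvalues, which are clipped and then ignored.
sv_from_gramian = np.sqrt(np.clip(np.linalg.eigvalsh(G), 0.0, None))

# Cross-check against the singular value decomposition T = U S V*.
sv_from_svd = np.linalg.svd(T, compute_uv=False)

print(sorted(sv_from_gramian, reverse=True)[:2])
print(sorted(sv_from_svd, reverse=True))
```

Because G here is 3 × 3 but T has rank 2, one Gramian eigenvalue is (numerically) zero and only the two largest square roots correspond to the singular values reported by the SVD.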

Bra-Ket Notation and the Outer Product

To provide compatibility with the quantum logic and computing community, we utilize the bra-ket notation of Paul Dirac [98]. A slight departure from the quantum logic notation is that we represent atomic data values as row vectors rather than the column vectors commonly used in the quantum computing community. While this choice is arbitrary in terms of the underlying mathematics, it has the advantage of more clearly illustrating the isomorphic relation between truth tables for Boolean functions and transfer matrices in the vector space domain.

A column vector is referred to as a ket and denoted as |v⟩, which is identical to the more standard boldface notation v. Likewise, a row vector vᵀ is referred to as a bra and denoted as ⟨v|. The inner product vᵀw is written as ⟨v|w⟩. The relationship between the norm of a vector and the inner product of the vector with itself is v · v = vᵀv = [L2(v)]² = ‖v‖² = ⟨v|v⟩.

The outer product is a binary multiplicative operation that can be used to multiply two tensors regardless of their respective orders. For the finite-dimensioned Hilbert Spaces we are concerned with in this work, the outer product of two second-order tensors is identical to the Kronecker product of matrices. The outer product of two vectors v and w is denoted as v ⊗ w = u where the dimension of u is the sum of the dimensions of v and w. The use of the bra-ket notation is very convenient in this work since inner products are expressed as ⟨v|w⟩ while the outer product is |v⟩⟨w|. Note that the outer product is not commutative since |v⟩⟨w| ≠ |w⟩⟨v|. The outer product is useful here for transforming several vectors in lower-dimensioned Hilbert Spaces into a single vector in a higher-dimensioned Hilbert Space. When |v⟩ ∈ Hn and |w⟩ ∈ Hm, the outer product |v⟩ ⊗ |w⟩ = |vw⟩ ∈ Hn+m.

The bra-ket notation is also convenient since the orientation of the respective bra or ket within a mathematical expression implicitly indicates whether a multiplication is carried out using the inner or outer product. The expression A|v⟩ indicates that a direct vector-matrix product is to be carried out, resulting in a product vector |w⟩ of the same dimension as that of |v⟩; furthermore, (A|v⟩)ᵀ = ⟨v|Aᵀ for real-valued A. The expression |v⟩A, where |v⟩ is of dimension n and A of dimension m × p, would denote the outer product v ⊗ A and result in a matrix of dimension (nm) × p. Table 1.1 lists the correspondence between traditional linear algebraic notation and bra-ket notation.

1.1.4. Vector Space Information Models

Switching logic variables xi can be expressed as xi = m0 · 0 + m1 · 1 where mi ∈ B. The variable represents an atomic information datum in the form of a binary digit, or bit, and has the value 1 when m0 = 0 and m1 = 1, and likewise the value 0 when m0 = 1 and m1 = 0. This allows for a convenient definition of the negation operation, denoted as x̄, obtained by interchanging the coefficients mi as x̄ = m1 · 0 + m0 · 1.
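As a minimal sketch of this coefficient model (the tuple encoding and helper name below are illustrative, not taken from the text), a bit can be held as its coefficient pair (m0, m1), with negation realized purely by interchanging the coefficients:

```python
# A switching variable x = m0*0 + m1*1 is held as its coefficient pair
# (m0, m1); negation interchanges the coefficients.
def negate(x):
    m0, m1 = x
    return (m1, m0)

ZERO = (1, 0)   # m0 = 1, m1 = 0: the value 0
ONE  = (0, 1)   # m0 = 0, m1 = 1: the value 1

print(negate(ZERO) == ONE, negate(ONE) == ZERO)   # True True
```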


Table 1.1. Traditional and bra-ket notation correspondence.

Operation                   Linear Algebra             Bra-Ket
inner product               v · w = w · v              ⟨v|w⟩ = ⟨w|v⟩
inner product               vᵀw = wᵀv                  ⟨v|w⟩ = ⟨w|v⟩
L2 norm                     L2(v) = (v · v)^(1/2)      ⟨v|v⟩^(1/2)
outer product (vectors)     v ⊗ w                      |v⟩⟨w|
outer product (matrices)    A ⊗ B                      A ⊗ B
direct product              AB                         AB
vector-matrix product       w = Av                     |w⟩ = A|v⟩
vector-matrix product       wᵀ = vᵀA                   ⟨w| = ⟨v|A
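The correspondences above can be exercised numerically. In the sketch below (NumPy assumed, variable names illustrative), combining two bras into a bra over a larger space uses the Kronecker product, while |v⟩⟨w| is a column-times-row product; the asymmetry of the latter exhibits the non-commutativity noted in the text:

```python
import numpy as np

# Bras for the two basis values: <0| = [1 0] and <1| = [0 1].
bra0 = np.array([1, 0])
bra1 = np.array([0, 1])

# Combining two one-bit bras into a bra over the larger space is the
# Kronecker product in this finite-dimensional setting: <0| (x) <1| = <01|.
bra01 = np.kron(bra0, bra1)                 # [0 1 0 0]

# |v><w| as a column-times-row product.
outer_01 = bra0.reshape(-1, 1) @ bra1.reshape(1, -1)   # |0><1|
outer_10 = bra1.reshape(-1, 1) @ bra0.reshape(1, -1)   # |1><0|

# The outer product is not commutative: |v><w| != |w><v|.
print(bra01.tolist())                                  # [0, 1, 0, 0]
print(not np.array_equal(outer_01, outer_10))          # True
```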

The corresponding linear algebraic model for atomic information data values is given in Definition 1.7.

Definition 1.7 (Linear Algebraic Model for Information Values). Atomic information values are modeled as the basis vectors ⟨0| and ⟨1| where ⟨0| = [1 0] and ⟨1| = [0 1]. ⟨0| corresponds to the switching algebra bit valued as 0 while ⟨1| corresponds to the switching algebra bit valued as 1. In general, ⟨x| = [m0 m1].

Just as multi-bit input vectors can be expressed as elements of Bⁿ, a corresponding vector ⟨x| ∈ Hn can be formulated from the individual primary input values ⟨xi| ∈ H through the use of the outer product.

Definition 1.8 (Canonical Basis Vector). Basis vectors are defined as those vectors whose components are zero-valued except for one single unity-valued component.

Observation 1.1 (Specific Information Valuations are Canonical Basis Vectors). The specific valuation of a primary input stimulus vector or primary output response vector for a switching network is a canonical basis vector.

Observation 1.1 yields some useful insights with regard to decomposing vectors in higher-dimensioned Hilbert Spaces into a series of outer product factors. Since vectors representing single data values are canonical basis vectors, it is often useful to express the vector in an alternative number system base, or radix, and then to determine the index of the unity-valued component. When not implicitly clear, a value may be written in parentheses with a subscript (in the decimal number system) that is the number system radix. For example, ⟨(6)10| = ⟨(110)2| = ⟨(20)3|.

For various applications, it has been necessary to define special additional values in conventional Boolean switching algebras, such as the don’t care, often denoted by X, or, as in the case of a Hardware Description Language (HDL), an indeterminate value which is also unfortunately denoted as X. Other common values used in HDLs and other Electronic Design Automation (EDA) environments include high-impedance, usually denoted as Z, and several others [149, 150]. The vector space model proposed here allows such values to be conveniently modeled with natural extensions to the form of the vector representing an information quantity.

The concept of an input don’t care versus internal and output don’t cares has been described in [280]. While these are characterized as different types of don’t cares, they have quite different meanings. The input don’t care is often denoted with a - in cube list format such as the .pla format. For example, the cube denoted as 0-1 in the .pla format indicates that both 001 and 011 are covered; that is, - indicates that the middle primary input takes both binary digit values 0 and 1. In the vector space model, this is denoted as the total vector ⟨t| where ⟨t| = ⟨0| + ⟨1| = [1 0] + [0 1] = [1 1]. Another useful quantity is the null vector, denoted ⟨∅|, indicating that the datum has no value; it is neither ⟨0| nor ⟨1|. The use of ⟨t| and ⟨∅| allows a complete lattice to be formulated with ⟨t| serving as the greatest lower bound (glb), since it covers all other values, and ⟨∅| as the least upper bound (lub), since it covers no other values.
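The cube expansion described above can be sketched directly: modeling the .pla don’t care - with the total vector ⟨t| = [1 1] and combining positions with the outer product yields a vector whose unity-valued components index exactly the covered minterms (the helper name is illustrative):

```python
import numpy as np

# One-bit bras, including the total vector <t| = <0| + <1| used for the
# input don't care '-' of .pla cube lists.
BRA = {'0': np.array([1, 0]),
       '1': np.array([0, 1]),
       '-': np.array([1, 1])}

def cube_vector(cube):
    """Expand a .pla-style cube such as '0-1' into a vector whose
    unity-valued components index exactly the covered minterms."""
    v = np.array([1])
    for ch in cube:
        v = np.kron(v, BRA[ch])   # outer product, topmost input leftmost
    return v

covered = [i for i, c in enumerate(cube_vector('0-1')) if c == 1]
print(covered)   # [1, 3], i.e. the minterms 001 and 011
```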
When a true third value is required, such as high-impedance, the dimension of the Hilbert Space may be increased. As an example, if a switching algebra quantity can take on values from the set {0, 1, Z}, the corresponding vector space model could be ⟨0| = [1 0 0], ⟨1| = [0 1 0], and ⟨Z| = [0 0 1]. As will be explained in a later subsection, values such as ⟨Z| are very useful for modeling switching networks comprised of components such as tri-state buffers or other three-state circuits.

Figure 1.1. Conceptual models of switching networks.

1.1.5. Switching Network Transfer Matrices

Switching networks are conventionally modeled using a set of Boolean switching functions where each of the m network outputs is modeled with a separate switching function. We propose to model such networks as the transformation of a vector ⟨x| ∈ Hn representing the network input stimulus to a corresponding vector ⟨f| ∈ Hm, with the specific functionality of the network captured by a transformation matrix F. Thus a switching network is modeled as the mathematical mapping of a vector from one Hilbert Space to another, F : Hn → Hm. Figure 1.1 contains a conceptual diagram of a switching network modeled conventionally with Boolean Algebra on the left and with the proposed vector space model on the right.

The set of switching functions {f1, f2, . . . , fm} are symbolically denoted using operators from a Boolean Algebra such as ⟨B, +, ·, ¯, 0, 1⟩ where the atomic network elements, or logic gates, correspond to algebraic operators or expressions involving the operators. For example, an AND gate corresponds to the multiplicative operator denoted by · and the XOR gate corresponds to the expression x·ȳ + x̄·y. The primary inputs xi are switching variables over the set B, as are the individual network primary outputs denoted as fi. A specific input stimulus vector may be denoted as an n-dimensional vector whose components are each xi ∈ B, or alternatively, as a single element of Bⁿ. Likewise, the output response vector can be considered to be an element of Bᵐ.

The proposed model utilizes a vector space transformation with a corresponding linear algebra. Definition 1.9 defines the proposed algebra.


Definition 1.9 (Linear Algebraic Model for Switching Networks). The algebra denoted as ⟨H, +, ·, ⟨0|, ⟨1|⟩ is used to model the functionality of a switching network where + denotes vector addition, · represents the inner product, and ⟨0| and ⟨1| represent the canonical basis vectors of H.

The objective of this subsection is to derive the transformation matrix F which appears in the right-hand portion of Figure 1.1. F characterizes a particular switching network and is a single transformation matrix that projects network input stimulus vectors ⟨x| ∈ Hn to corresponding network output response vectors ⟨f| ∈ Hm, or, more concisely, F : Hn → Hm. For consistency with the literature in classical linear systems theory [66] and in the quantum logic community, we interchangeably refer to the transformation matrix F as a transfer matrix, since it can be used to obtain the output response of a network by a multiplicative operation with a corresponding input stimulus and is thus analogous to a transfer function as commonly used in classical linear systems analysis.

The main mathematical difference between quantum logic network models and conventional switching networks modeled in the vector space is that the transfer matrices describing the latter are not unitary. Rather, the transformation matrices are, in general, not of full rank and often they are non-square. For this reason, matrix inverses are undefined and we shall make use of the Moore-Penrose pseudo-inverse [224] when needed.

We shall first consider the derivation of a projection matrix Pi that projects a single vector ⟨xi| ∈ Hn to a corresponding vector ⟨fi| ∈ Hm, or, equivalently, Pi : Hn → Hm. The projection is in the form of a linear transformation; thus, ⟨xi|Pi = ⟨fi|. Taking the outer product of each side of this equation by multiplying with |xi⟩ yields Equation (1.1):

|xi⟩⟨xi| Pi = |xi⟩⟨fi| .    (1.1)

The matrix Pi is of dimension n × m and is, in general, non-square; when it does happen to be square, it is often not of full rank. Thus (|xi⟩⟨xi|)⁻¹ does not usually exist and cannot be formed to solve Equation (1.1). However, the form of |xi⟩⟨xi| is that of a Dirac-delta matrix due to Observation 1.1: ⟨xi| is a canonical basis vector since it represents the ith valuation of an input stimulus vector.

Definition 1.10 (Dirac-delta Matrix). A Dirac-delta matrix is denoted as δpq = [δij]n×m and contains a single unity-valued component defined by a set of indices, pq. The components of the Dirac-delta matrix δij are given by the scalar Dirac-delta function

δij = 1 if i = p and j = q, and δij = 0 otherwise.

Using Definition 1.10, Equation (1.1) can be rewritten as δpq Pi = |xi⟩⟨fi|. Because the row space of δpq consists of n − 1 null row vectors and the pth row vector is a canonical basis vector ⟨q|, n − 1 row vectors of Pi have infinitely many solutions, including the solution of setting them equal to the null vector ⟨∅| ∈ Hm. The qth row vector of Pi is not null and is denoted as ⟨pq|. Likewise, the matrix |xi⟩⟨fi| consists of n − 1 null row vectors with the qth row vector equal to ⟨fi|. Using these observations, we choose the solution of allowing the n − 1 row vectors of Pi to be ⟨∅| and we rewrite Equation (1.1) as

⟨pq| = ⟨fi| .    (1.2)

The preceding derivation allows Lemma 1.1 to be stated.

Lemma 1.1 (ith Row Vector of Pi). A projection matrix Pi for a logic network characterized by a transfer matrix F may be written as an n × m matrix with n − 1 null row vectors ⟨∅| ∈ Hm and with the ith row vector equivalent to the ith valuation of the logic network ⟨fi|, as ⟨pi| = ⟨fi|.

In determining the form of the overall transfer matrix F that characterizes a switching network, it is necessary to formulate the matrix such that the input stimulus ⟨xi|, when multiplied with F, results in the ith valuation of the switching network. Lemma 1.1 proves that the ith row vector of Pi is identically equal to the output response of the logic network.

Observation 1.2 (Complete Input Stimulus Matrix is I). Because each specific input stimulus vector for a logic network is in the form of a canonical basis vector, the collection of all input stimuli vectors forms a complete canonical basis of Hn. The matrix X formed from all possible input stimuli as row vectors, ordered from i = 0 to i = n − 1 from top to bottom, is equivalent to the identity matrix I.

Theorem 1.1 (Transfer Matrix of a Logic Network). The transfer matrix F characterizing a logic network with n primary inputs and m primary outputs is expressed as shown in Equation (1.3):

F = Σ_{i=0}^{2^n − 1} Pi .    (1.3)

Using (1.1) and Observation 1.2 with the result (1.3) of Theorem 1.1, Corollary 1.1 results, allowing the transfer matrix of a logic network to be formulated as a sum of outer products.

Corollary 1.1 (Transfer Matrix as Sum of Outer Products). The transfer matrix for a logic network characterized by F can be formed as a sum of outer products as given in Equation (1.4):

F = Σ_{i=0}^{2^n − 1} |i⟩⟨fi| .    (1.4)
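Corollary 1.1 translates directly into a (deliberately exponential) construction. The sketch below assumes NumPy; the helper name transfer_matrix and the index-based encoding of the response are illustrative choices, not from the text:

```python
import numpy as np

def transfer_matrix(f, n, m):
    """Form F = sum_i |i><f_i| over all 2**n input valuations, where f
    maps an input index to the integer index of the output response."""
    F = np.zeros((2**n, 2**m), dtype=int)
    for i in range(2**n):
        ket_i = np.zeros(2**n, dtype=int)
        ket_i[i] = 1                       # |i>
        bra_f = np.zeros(2**m, dtype=int)
        bra_f[f(i)] = 1                    # <f_i|
        F += np.outer(ket_i, bra_f)
    return F

# 2-input AND gate: the response is 1 only for the input valuation 11.
A = transfer_matrix(lambda i: 1 if i == 3 else 0, n=2, m=1)
print(A.tolist())   # [[1, 0], [1, 0], [1, 0], [0, 1]]
```

Each row of the resulting matrix is the bra of the corresponding truth table entry, which is the isomorphism between truth tables and transfer matrices mentioned earlier.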

1.1.6. Transfer Matrices of Switching Circuits

Transfer Matrices of Logic Gates

Equation (1.3) may be used to determine the transfer matrices for common switching network components serving as atomic operators or logic gates. Example 1.1 illustrates the calculation of the transfer matrix A for a two-input AND gate.

Example 1.1 (Transfer Matrix for 2-input AND Gate). Consider a 2-input AND gate. The four possible input stimuli are ⟨00|, ⟨01|, ⟨10|, and ⟨11|. The projection matrix relationships for the AND gate then become ⟨00|P0 = ⟨0|, ⟨01|P1 = ⟨0|, ⟨10|P2 = ⟨0|, and ⟨11|P3 = ⟨1|. From Lemma 1.1, the projection matrices are of the form:

P0 = [1 0]    P1 = [0 0]    P2 = [0 0]    P3 = [0 0]
     [0 0]         [1 0]         [0 0]         [0 0]
     [0 0]         [0 0]         [1 0]         [0 0]
     [0 0]         [0 0]         [0 0]         [0 1]

The overall transfer matrix A for a two-input AND gate is given by Equation (1.3):

A = Σ_{i=0}^{3} Pi = P0 + P1 + P2 + P3 = [1 0]
                                         [1 0]
                                         [1 0]
                                         [0 1]
This calculation can be carried out for any logic network; however, it is clearly exponentially complex as the number of network inputs increases. For small two- and three-input sub-circuits and structures, the calculation can be easily performed. In a following subsection, we show that the transfer matrix for a larger network can be hierarchically formulated using smaller transfer matrices, and thus the exponential complexity present in Equation (1.3) can be avoided. Three important logic network topological configurations can occur, and transfer matrices are required for these instances, as well as a library of logic gate matrices, in order to employ the hierarchical methodology.

Fan-Out Structure Transfer Matrix

A fan-out is an electrical circuit node in which a single conducting wire carries a signal that drives two or more conductors. Figure 1.2 (a) shows the circuit diagram topology for a fan-out of size two. From a switching circuit point of view, this is analogous to copying an input datum to two or more resultant data. Electrically, there are clearly limitations to fan-out, but a discussion of this topic is beyond the scope of this section. Fan-out, or the copying of information values, is forbidden in quantum logic due to the no-cloning theorem; however, fan-out is permissible and is responsible for circuit area reduction in conventional electronic switching circuits.

Figure 1.2. Circuit structures: (a) fan-out, (b) fan-in, (c) crossover.

The transfer matrix FO for a two-output fan-out is computed using Equation (1.4) as:

FO = |0⟩⟨00| + |1⟩⟨11|

   = [1] [1 0 0 0] + [0] [0 0 0 1]
     [0]             [1]

   = [1 0 0 0]
     [0 0 0 1] .

Fan-In Structure Transfer Matrix

The fan-in structure, depicted in Figure 1.2 (b), is typically used only in special cases. Certain electronic technologies allow for fan-in structures due to other circuit element outputs being at a high-impedance or other state. As an example, certain current-mode circuitry allows a fan-in to operate as an AND-type operation; in those cases, the fan-in transfer matrix is identical to the matrix characterizing the appropriate switching circuit gate. Many voltage-mode circuits can only include fan-in structures when the inputs are driven by disjoint outputs, thereby avoiding a short-circuit situation. In general, fan-ins are not allowed, and thus we model them in this case with null row vectors for the disallowed cases. Null row vectors occur in the transfer matrix when the disallowed input cases are simply excluded in the calculation of the matrix. For these reasons, the fan-in transfer matrix is technology dependent. In the case where fan-ins with differing input stimulus values are disallowed, the transfer matrix may be computed using the relationship in Equation (1.4) as:

FI = |00⟩⟨0| + |11⟩⟨1|

   = [1] [1 0] + [0] [0 1]
     [0]         [0]
     [0]         [0]
     [0]         [1]

   = [1 0]
     [0 0]
     [0 0]
     [0 1] .

Crossover Structure Transfer Matrix

In a later subsection we will describe how the transfer matrix can be constructed for an interconnection of basic operators, or, a switching network. Because the network transfer matrix is dependent upon the topology of the network, the case where conductors cross one another in the plane must be accounted for. Multiple crossovers can be dealt with as a series of single crossovers; thus a fundamental structure to be considered is the single crossing of two conductors that are electrically isolated. Such a structure is depicted in Figure 1.2 (c). The crossover matrix expresses the four input-output relationships ⟨00| → ⟨00|, ⟨01| → ⟨10|, ⟨10| → ⟨01|, and ⟨11| → ⟨11|. Equation (1.4) indicates that the transfer matrix can be computed as:

C = |00⟩⟨00| + |01⟩⟨10| + |10⟩⟨01| + |11⟩⟨11|

  = [1] [1 0 0 0] + [0] [0 0 1 0] + [0] [0 1 0 0] + [0] [0 0 0 1]
    [0]             [1]             [0]             [0]
    [0]             [0]             [1]             [0]
    [0]             [0]             [0]             [1]

  = [1 0 0 0]
    [0 0 1 0]
    [0 1 0 0]
    [0 0 0 1] .
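The defining input-output relationships of the crossover can be checked directly (a minimal sketch, NumPy assumed): multiplying a two-bit bra by C exchanges the two conductors.

```python
import numpy as np

# Crossover transfer matrix C as derived above.
C = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]])

bra0, bra1 = np.array([1, 0]), np.array([0, 1])
bra01 = np.kron(bra0, bra1)   # <01|
bra10 = np.kron(bra1, bra0)   # <10|

# The two conductors are exchanged: <01|C = <10| and <10|C = <01|,
# while <00| and <11| are mapped to themselves.
print(np.array_equal(bra01 @ C, bra10), np.array_equal(bra10 @ C, bra01))
```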

The basic logic network elements and their transfer matrices (rows separated by semicolons) are:

AND:        A  = [1 0; 1 0; 1 0; 0 1]     NAND:      NA = [0 1; 0 1; 0 1; 1 0]
OR:         O  = [1 0; 0 1; 0 1; 0 1]     NOR:       NO = [0 1; 1 0; 1 0; 1 0]
XOR:        X  = [1 0; 0 1; 0 1; 1 0]     XNOR:      NX = [0 1; 1 0; 1 0; 0 1]
NOT:        NI = [0 1; 1 0]               wire/buffer: I = [1 0; 0 1]
fan-out:    FO = [1 0 0 0; 0 0 0 1]       fan-in:    FI = [1 0; 0 0; 0 0; 0 1]
crossover:  C  = [1 0 0 0; 0 0 1 0; 0 1 0 0; 0 0 0 1]

Figure 1.3. Basic logic network elements and transfer matrices.

Other Basic Switching Elements

Other basic switching circuit elements can be computed in a manner similar to that shown in Example 1.1 for the AND gate. Two cases deserving of mention are the non-inverting buffer and the single conductor or pass-through line. In both of these cases, the appropriate transfer matrix is the 2 × 2 identity matrix. The identical transfer matrices indicate that the logic network model produces the same non-transformed outputs, but this does not mean that non-inverting buffers can be replaced with single pass-through conductors. Non-inverting buffers are often included to perform slack-matching timing operations or to restore voltage or current levels to acceptable values. The abstraction level of the switching models described here does not include such information regarding timing and voltage or current levels; thus the transfer matrices remain identical. Both are shown in Figure 1.3 along with several other common basic switching network elements and their respective transfer matrices. It is noted that gates with negated outputs are related to their positive-polarity output counterparts through the relationship Ng = [1] − g, where g indicates the gate-type transfer matrix and [1] denotes the matrix whose components are all unity-valued.
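The relationship Ng = [1] − g can be sketched numerically (NumPy assumed; matrix values taken from Figure 1.3): subtracting the AND and OR transfer matrices from the all-unity matrix yields the NAND and NOR matrices.

```python
import numpy as np

# Positive-polarity transfer matrices for AND and OR (from Figure 1.3).
A = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
O = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])

J = np.ones((4, 2), dtype=int)   # [1]: the all-unity matrix

# Negated-output gates via Ng = [1] - g.
NA = J - A   # NAND
NO = J - O   # NOR
print(NA.tolist())   # [[0, 1], [0, 1], [0, 1], [1, 0]]
```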

module exmp_circ (f1, f2, x1, x2);
  output f1, f2;
  input  x1, x2;
  or  g1 (f2, x1, x2);
  not g2 (f1, f2);
endmodule

Figure 1.4. Example of a logic network: (a) circuit diagram with serial partitions φ1, φ2, φ3, (b) HDL.

Logic Network Transfer Matrices

Logic networks consist of interconnections of the basic elements depicted in Figure 1.3. In general, their transfer matrices can be computed using the relationship in Equation (1.4). Clearly, this method is impractical for larger networks since it requires the formulation and summation of an exponentially large number of outer product terms. A more efficient method for the determination of a transfer matrix for a logic network allows the matrix to be formed through a traversal of a representation of the topological structure of the network. Such representations are commonly given as structural HDL descriptions. Example 1.2 refers to Figure 1.4, which depicts a simple logic network (a) in the form of a circuit diagram and (b) an accompanying Verilog structural description. Such structural HDL forms are textual representations of the corresponding circuit diagrams, and it is a common task for modern EDA algorithms to parse the HDL into an internal data structure representing the topological circuit diagram.

Example 1.2 (Circuit Diagram and Logic Network HDL). A two-input, two-output logic network is depicted as a circuit diagram and in the form of a Verilog structural netlist in Figure 1.4. The circuit diagram in Figure 1.4 is annotated with four vertical dashed lines depicting serial partitions. Within each of these partitions, the logic network elements are in parallel, and thus the lines crossing each partition line each carry individually distinct logic values corresponding to a particular input stimulus vector. These individual logic vectors are elements of H2 and may be combined into a single vector in a higher-dimensioned vector space by multiplying them with the outer product operation. The outer product operation is non-commutative; thus, the order in which the multiplications are applied must be consistent for each partition. In the work presented here, we arbitrarily choose to use the topmost value as the leftmost factor in the outer product calculation. Thus, the input stimulus vector is represented as ⟨x1x2| = ⟨x1| ⊗ ⟨x2|. Likewise, the output response vector is ⟨f1f2| = ⟨f1| ⊗ ⟨f2|.

The partitions are denoted as φ1, φ2, and φ3. The vector describing the logic value at partition φ1 results from a transformation of ⟨x1x2| ∈ H2 to a corresponding vector ⟨w1| ∈ H at the output net of the OR gate. ⟨w1| can thus be obtained through a direct vector-matrix product operation, ⟨x1x2|O = ⟨w1|. Likewise, the values at partition φ2 result from ⟨w1|FO = ⟨w2w3|. Partition φ3 consists of two parallel circuit elements, an inverter (or NOT gate) near the top of the figure and a pass-through element near the bottom, resulting in ⟨f1| and ⟨f2|. The output response may be formed as a single vector through the outer product operation ⟨f1| ⊗ ⟨f2| = ⟨f1f2|. Since ⟨f1| = ⟨w2|N and ⟨f2| = ⟨w3|I, these two relationships may be combined using the outer product, yielding ⟨f1f2| = ⟨w2w3|(N ⊗ I). This observation allows the overall transformation matrix for a particular partition to be formed as the outer product of each of the parallel elements comprising the partition. Furthermore, the overall partition matrices can be combined using the direct matrix product due to the multiplicative relationship among the input stimuli and output responses using the transfer matrix. These observations allow the overall transfer matrix T for the logic network in Figure 1.4 to be computed as:

T = (O)(FO)(N ⊗ I) .
This technique of extracting a transfer matrix directly from a structural netlist is important since it allows the matrix to be formed without resorting to summing an exponential number of matrices as described in the relationship in Equation (1.4). The overall complexity of the computation becomes O(N) where N is the number of basic logic elements. The explicit representation of the transfer matrix for the circuit in Figure 1.4 is given in Equation (1.5) as:

T = (O)(FO)(N ⊗ I)

  = [1 0]               ( [0 1]     [1 0] )
    [0 1]  [1 0 0 0]  · ( [1 0]  ⊗  [0 1] )
    [0 1]  [0 0 0 1]
    [0 1]

  = [1 0 0 0]   [0 0 1 0]
    [0 0 0 1] · [0 0 0 1]
    [0 0 0 1]   [1 0 0 0]
    [0 0 0 1]   [0 1 0 0]

  = [0 0 1 0]
    [0 1 0 0]
    [0 1 0 0]
    [0 1 0 0] .    (1.5)

While the netlist partitioning and traversal method reduces the number of computations required to determine a logic network transfer matrix, the overall matrix remains exponentially large. Fortunately, efficient data structures exist for representing the transfer matrices that, on average, allow them to require greatly reduced storage.
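The partition-based construction can be sketched directly (NumPy assumed; matrix values from Figure 1.3): serial partitions combine by the direct matrix product and parallel elements by the Kronecker (outer) product, reproducing the matrix of Equation (1.5).

```python
import numpy as np

# Element transfer matrices for the network of Figure 1.4.
O  = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])   # OR gate
FO = np.array([[1, 0, 0, 0], [0, 0, 0, 1]])       # two-output fan-out
N  = np.array([[0, 1], [1, 0]])                   # inverter (NOT)
I2 = np.eye(2, dtype=int)                         # pass-through conductor

# Serial partitions combine by the direct matrix product; parallel
# elements within a partition combine by the Kronecker (outer) product.
T = O @ FO @ np.kron(N, I2)
print(T.tolist())   # [[0, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
```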

1.1.7. Switching Network Simulation

Determination of Network Output Response

The transfer matrix characterizes a switching network and provides a method for simulation: the determination of a network output response, given an input stimulus, through a multiplicative operation. Justification is the inverse operation, where an input stimulus is determined given a network transfer matrix and output response. Theorem 1.2 proves that a switching network output response is obtained through a multiplicative operation among the transfer matrix and input stimulus vector.


Lemma 1.2 (Independence of Network Input Vectors). The collection of vectors representing specific valuations of switching network primary input assignments is linearly independent.

Proof. Consider two vectors ⟨xi| and ⟨xj| representing two specific valuations of the n primary inputs of a switching network. A specific valuation is the assignment of either ⟨0| or ⟨1| to each primary input. Expressing the input stimulus vectors as elements of Hⁿ, they are written as ⟨(b0 b1 . . . bn−1)₂| and are in the form of canonical basis vectors over Hⁿ. Canonical basis vectors have a unity-valued norm, since ⟨xi|xi⟩ = 1, and form inner products of the form:

    ⟨xi|xj⟩ = δij = { 1 ,  i = j
                    { 0 ,  i ≠ j .

By definition, two vectors with a non-zero norm and whose inner product is zero are linearly independent.

Theorem 1.2 (Switching Network Output Response). The output response ⟨fi| of a switching network is obtained as the direct vector-matrix product of the characterizing transfer matrix F and an input stimulus vector ⟨xi| as expressed in Equation (1.6).

    ⟨fi| = ⟨xi|F                                        (1.6)

Proof. The transfer matrix F characterizing a logic network is in the form of Equation (1.4). Thus, multiplying this expression with an input stimulus vector ⟨xi| yields:

              2ⁿ−1              2ⁿ−1
    ⟨xi|F = ⟨xi| Σ |xj⟩⟨fj|  =   Σ  ⟨xi|xj⟩⟨fj| .
              j=0               j=0

Using the result of Lemma 1.2, the summation argument is zero-valued for all but the case where i = j, and the result becomes ⟨fi| = ⟨xi|F.

Multiple output responses may be obtained with a single evaluation of Equation (1.6) by assigning the value ⟨t| = ⟨0| + ⟨1| to one or more


switching network primary inputs. Furthermore, all possible network output responses may be obtained by formulating an input stimulus vector of the form ⟨(tt . . . t)₂|, allowing for a convenient method to determine the total network response. Using ⟨t| as a component in an input stimulus vector is analogous to the various symbolic simulation methods that are often employed in simulation algorithms where the switching network is modeled using Boolean algebra. Example 1.3 illustrates how multiple output responses can be obtained through the execution of a single vector-matrix multiplication.

Example 1.3 (Network Simulation). Consider the logic network depicted in Figure 1.4 (a) whose corresponding transfer matrix T is given in Equation (1.5). If it is desired to obtain all achievable network responses when only the topmost primary input x1 is constrained to ⟨0|, the input stimulus vector is formed as:

    ⟨x1x2| = ⟨0t| = ⟨0| ⊗ ⟨t| = [ 1 1 0 0 ] .

Evaluating Equation (1.6) with ⟨0t| results in an output response vector of the form:

                         ⎡ 0 0 1 0 ⎤
    ⟨0t|T = [ 1 1 0 0 ]  ⎢ 0 1 0 0 ⎥
                         ⎢ 0 1 0 0 ⎥
                         ⎣ 0 1 0 0 ⎦

          = [ 0 1 1 0 ]

          = [ 0 1 0 0 ] + [ 0 0 1 0 ]

          = ⟨(1)₁₀| + ⟨(2)₁₀| = ⟨(01)₂| + ⟨(10)₂| .

Thus, with a single calculation, it is determined that the inputs ⟨00| and ⟨01| result in the output responses ⟨01| and ⟨10|. Unfortunately, due to the commutativity of the vector addition operation, it is not possible to determine which of the two specific input valuations represented by the input stimulus ⟨0t| causes one of the specific output valuations to occur. The specific mappings can be obtained through the evaluation of another output response using the input stimulus ⟨00| or ⟨01|. The total output response can be similarly obtained by forming the


total input stimulus vector ⟨tt| and evaluating Equation (1.6) as:

                         ⎡ 0 0 1 0 ⎤
    ⟨tt|T = [ 1 1 1 1 ]  ⎢ 0 1 0 0 ⎥
                         ⎢ 0 1 0 0 ⎥
                         ⎣ 0 1 0 0 ⎦

          = [ 0 3 1 0 ]

          = 3 [ 0 1 0 0 ] + [ 0 0 1 0 ]

          = 3⟨(1)₁₀| + ⟨(2)₁₀| = 3⟨(01)₂| + ⟨(10)₂| .

The total output response is then observed to be three occurrences of ⟨01| and a single occurrence of ⟨10|. The total output response can be very useful since it allows the prompt determination of which output valuations are not possible in a network.
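The two simulations of Example 1.3 can be replayed with a short numpy sketch (a plain-array stand-in for illustration, using the transfer matrix of Equation (1.5)):

```python
import numpy as np

T = np.array([[0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])

zero, one = np.array([1, 0]), np.array([0, 1])
t = zero + one                 # <t| = <0| + <1|

x_0t = np.kron(zero, t)        # <0t| = [1 1 0 0]
x_tt = np.kron(t, t)           # <tt| = [1 1 1 1]

print(x_0t @ T)                # partial response: [0 1 1 0]
print(x_tt @ T)                # total response:   [0 3 1 0]
```

The total response vector directly exposes the two impossible output valuations as its zero-valued components.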

Transfer Matrix Properties

The structure of the transfer matrix representing a switching network can provide useful information concerning characteristics of the network. The transfer matrix can be considered to be a column vector whose components are row vectors that form the set of possible output responses. Likewise, the set of all possible input stimuli is represented as a column vector whose components are row vectors representing the total collection of unique input stimulus vector valuations. As previously observed, each distinct input stimulus vector valuation is a canonical basis vector ⟨xi| ∈ Hⁿ. The column vector of all such input stimuli, when written in the usual order appearing in a truth table, forms the identity matrix I. Multiplying the transfer matrix T with a distinct input valuation simply selects the row vector in T corresponding to the output response resulting from that specific input stimulus. A useful result in terms of implementation of these methods is that the transfer matrix is isomorphic to a conventional truth table representation when the network is modeled using switching algebra. This


    x1 x2 | f1 f2        ⟨x1| ⟨x2| | ⟨f1| ⟨f2|
    ---------------      ----------------------
    0  0  | 1  0         ⟨0|  ⟨0|  | ⟨1|  ⟨0|
    0  1  | 0  1         ⟨0|  ⟨1|  | ⟨0|  ⟨1|
    1  0  | 0  1         ⟨1|  ⟨0|  | ⟨0|  ⟨1|
    1  1  | 0  1         ⟨1|  ⟨1|  | ⟨0|  ⟨1|

    ⟨x1x2| → ⟨f1f2|      ⟨x1x2|    →  ⟨f1f2|
    ---------------      ----------------------
    ⟨00|  →  ⟨10|        [1 0 0 0] → [0 0 1 0]
    ⟨01|  →  ⟨01|        [0 1 0 0] → [0 1 0 0]
    ⟨10|  →  ⟨01|        [0 0 1 0] → [0 1 0 0]
    ⟨11|  →  ⟨01|        [0 0 0 1] → [0 1 0 0]

        ⎡ 0 0 1 0 ⎤
    T = ⎢ 0 1 0 0 ⎥
        ⎢ 0 1 0 0 ⎥
        ⎣ 0 1 0 0 ⎦

Figure 1.5. Truth table isomorphism example.

observation allows any of the conventional methods for compact representation of switching functions, such as a cube list or a Binary Decision Diagram (BDD), to also be employed for the representation of a switching network transfer matrix. The combination of efficient transfer matrix representations with the extraction of the transfer matrix from a structural netlist allows the vector space model to be competitive with conventional switching algebraic approaches. The isomorphic relationship between the switching algebraic and vector space models is stated in the following observation.

Observation 1.3 (Truth Table Isomorphism). A truth table representation for a switching network is isomorphic to the corresponding transfer matrix.

By definition, we model switching constants xi ∈ B as row vectors ⟨xi| ∈ H, that is, 0 → ⟨0| and 1 → ⟨1|. Thus, a simple substitution of switching values by their corresponding vector space models can be accomplished to derive the transfer matrix. Example 1.4 illustrates this principle for the switching network example given in Figure 1.4.

Example 1.4 (Truth Table Isomorphism Example). Figure 1.5 depicts the truth table for the network in Figure 1.4 (a) on the top left and the corresponding transfer matrix on the bottom right, obtained through a simple substitution of switching values in B by their corresponding vector model values in H. Individual valuations in H are combined into


a single higher-dimensioned vector in Hᵐ through the application of the outer product to expand the dimensionality of the vector space.

Example 1.3 indicates that it is not possible for the logic network in Figure 1.4 (a) to ever provide output responses in the form of ⟨00| or ⟨11|. Although this conclusion is based upon the results of network output response calculations, it is possible to determine the number of non-possible output responses, and to identify them, directly from the form of the transfer matrix.

Lemma 1.3 (Allowable Output Responses of a Logic Network). The number of possible distinct output responses of a logic network characterized by a transfer matrix T is equal to 2ᵐ − Nnull, where Nnull is the number of null column vectors comprising T.

Proof. The non-standard approach of indexing the column and row vectors of a matrix by indices in the range [0, 2ⁿ − 1] is employed since it allows the row vector index of a transfer matrix to directly yield the corresponding input stimulus vector that results in a particular output response. From the preceding discussion, each distinct input stimulus vector is a canonical basis vector ⟨i| ∈ Hⁿ. Thus, the output response calculation simply selects a row vector in T as the corresponding output response vector. If a column vector of T is null, no possible input stimulus will result in the output response denoted by ⟨j|, where j is the column vector index of T and j ∈ [0, 2ᵐ − 1] for a network with m primary outputs. The total number of null column vectors comprising T is denoted as Nnull; therefore, the total number of permissible output responses of a switching network characterized by T is 2ᵐ − Nnull.
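Both the truth table isomorphism and Lemma 1.3 are easy to sketch in numpy; the hypothetical helper `basis` builds canonical basis row vectors, and the truth table is that of the Figure 1.5 network:

```python
import numpy as np

def basis(i, m):
    """Canonical basis row vector <i| over H^m (length 2**m)."""
    v = np.zeros(2**m, dtype=int)
    v[i] = 1
    return v

# Truth table of the Figure 1.4 network: row index = input valuation,
# entry = output valuation (m = 2 primary outputs).
outputs = [0b10, 0b01, 0b01, 0b01]
m = 2
# Truth table isomorphism: substituting each output valuation by its
# basis vector yields the transfer matrix row by row.
T = np.array([basis(f, m) for f in outputs])

# Lemma 1.3: achievable responses = 2**m - (number of null columns of T)
n_null = sum(1 for j in range(2**m) if not T[:, j].any())
print(2**m - n_null)   # prints 2
```

For this network the null columns are those of ⟨00| and ⟨11|, matching the infeasible responses found in Example 1.3.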

A single-output switching network is tautological when the output response is ⟨1| regardless of the primary input assignment. From the observations on the structure of a transfer matrix, it is apparent that a tautological network must have a characterizing transfer matrix whose column vector with index j = 0 is |∅∅ . . . ∅⟩ and whose j = 1 indexed column vector is |tt . . . t⟩. Likewise, a contradictory switching network is the inverse of a tautological network and always yields an output response of ⟨0|. The contradictory network must have a characterizing


transfer matrix whose j = 0 column vector is the total vector |tt . . . t⟩ and whose j = 1 column vector is null.

In general, the form of a transfer matrix is T = [tij]N×M, where log₂(N) = n, the number of primary inputs, and log₂(M) = m, the number of primary outputs. In the case where N = M, the network is comprised of an equal number of primary inputs and outputs. Furthermore, when T is of full rank and N = M, a bijective mapping is present since the input stimuli are each uniquely mapped to a corresponding output response. This special class of switching networks is said to be logically reversible and is a subject of interest in the research community due to the result of Landauer [190], which states that such networks, when used to process information, do not dissipate power due to information loss. Depending upon the implementation technology, a reversible switching network may also allow an output response to be applied to the physical implementation of a network and the primary inputs to then produce the resulting input stimulus. Such is not the case for common electronic switching networks implemented in a technology such as static Complementary Metal-Oxide-Semiconductor (CMOS) transistor networks; thus, this class of networks is logically reversible, but not physically reversible. Quantum logic networks are necessarily both logically and physically reversible due to the laws of quantum mechanics. In the case of quantum logic networks, other properties of their transfer matrices are also in place, such as the matrices being unitary and being comprised of complex-valued coefficients.

The vector space model presented in this work for conventional switching networks can thus be viewed as a superset of the special case of reversible switching networks and provides a convenient, mathematically unifying theory for modeling conventional, reversible, and quantum logic networks together with the general case of irreversible switching networks, where N ≠ M is commonly encountered. This unification of mathematical modeling is one advantage of the approach described here, as it provides a means for comparing network functionality among these different forms of physical implementation.


1.1.8. Switching Network Justification

Several EDA tasks require the determination of an input stimulus vector given a characterization of the network and an output response; this is referred to as the justification problem. Performing a network justification using the vector space model requires the solution of ⟨fi| = ⟨xi|T for ⟨xi| when T and ⟨fi| are known. Within the framework of the vector space model, the justification problem is simply the determination of the inverse T⁻¹ if the network is reversible. By definition, the inverse T⁻¹ always exists and is unique for reversible networks since T is square and of full rank. Unfortunately, a large number of switching networks currently in use are irreversible, and thus the justification problem cannot be solved in this manner due to the non-existence of T⁻¹.

Mathematically, the solution of ⟨fi| = ⟨xi|T for ⟨xi| when N ≠ M results in two cases. When N > M, the system is over-constrained and multiple solutions exist. Conversely, when N < M, the system is under-constrained. One approach for the determination of ⟨xi| when N ≠ M is to utilize the Moore-Penrose pseudo-inverse of T, denoted as T⁺ [224]. Equation (1.7) gives the form of the pseudo-inverse.

    T⁺ = { (T*T)⁻¹T* ,  N > M , overspecified
         { T*(TT*)⁻¹ ,  N < M , underspecified          (1.7)

In Equation (1.7), T* denotes the complex-conjugate transpose of T, and the square matrices within the parentheses are the Gram matrices, or Gramians, of T. For the binary switching networks considered here, the components of T are always real-valued, hence T* = Tᵀ, where Tᵀ denotes the transpose. The use of the pseudo-inverse allows for a solution to the justification problem in the form of ⟨xi| = ⟨fi|T⁺. In the case of over-constrained systems, multiple solutions exist, and the solution obtained using the pseudo-inverse is the single solution that has minimal relative error to all the solutions in a least-squared sense. Fortunately, the error is, by definition, zero-valued since it is known that the solutions are all of the form of ⟨0| or ⟨1|; hence, the justification solution using T⁺ is an exact solution. Likewise, in the case of under-constrained systems, the pseudo-inverse provides a best-fit solution in the least-squared error sense to the exact solution. Again, because the exact solution is known to be either ⟨1| or ⟨0|, the least-squared


error is zero-valued. This fortunate set of circumstances allows the pseudo-inverse to be employed, resulting in an exact solution to the justification problem.

The square Gram matrix of T is always invertible, and the positive square roots of the eigenvalues of the Gramians are the singular values of T. For the class of matrices T representing switching networks, the Gramian is always in the form of a diagonal matrix, and the diagonal elements yield further information about the switching network, as given in the following lemmas and theorems.

Lemma 1.4 (Gramian of a Single-output Network). The diagonal elements of the 2 × 2 Gramian of a matrix T characterizing an n-input, single-output switching network are the numbers of maxterms and minterms of the network.

Proof. In general, the transfer matrix T for an n-input, single-output switching network is a 2ⁿ × 2 matrix. This can be written as a 2-dimensional row vector composed of column vectors |f0⟩ and |f1⟩. Due to the definition of ⟨0| and ⟨1|, the sum of the components of column vector |f0⟩ yields the number of maxterms, Nmax, and the sum of the components of column vector |f1⟩ yields the number of minterms, Nmin. Furthermore, the inner product of each column vector |fi⟩ with itself, ⟨fi|fi⟩, is equivalent to the sum of its components. The transfer matrix T has the form

                        ⎡ f0,0       f1,0     ⎤
                        ⎢ f0,1       f1,1     ⎥
    T = [ |f0⟩ |f1⟩ ] = ⎢ f0,2       f1,2     ⎥
                        ⎢   ⋮          ⋮      ⎥
                        ⎣ f0,2ⁿ−1    f1,2ⁿ−1  ⎦

where the ith row vector is of the form ⟨fi| = [ f0,i f1,i ] and is either a maxterm, ⟨0|, or a minterm, ⟨1|. Furthermore, since (f0,i, f1,i) ∈ B and f0,i + f1,i = 1, T can be rewritten as

                            ⎡ 1−f1,0      f1,0     ⎤
                            ⎢ 1−f1,1      f1,1     ⎥
    T = [ |1−f1⟩ |f1⟩ ] =   ⎢ 1−f1,2      f1,2     ⎥
                            ⎢    ⋮          ⋮      ⎥
                            ⎣ 1−f1,2ⁿ−1   f1,2ⁿ−1  ⎦ .


The Gramian of T becomes

                                                       ⎡ 1−f1,0      f1,0     ⎤
           ⎡ 1−f1,0   1−f1,1   . . .   1−f1,2ⁿ−1 ⎤     ⎢ 1−f1,1      f1,1     ⎥
    TᵀT =  ⎣  f1,0     f1,1    . . .    f1,2ⁿ−1  ⎦     ⎢    ⋮          ⋮      ⎥
                                                       ⎣ 1−f1,2ⁿ−1   f1,2ⁿ−1  ⎦

         ⎡ 2ⁿ − ⟨f1|f1⟩     0      ⎤     ⎡ 2ⁿ − Nmin     0    ⎤     ⎡ Nmax    0    ⎤
       = ⎣      0        ⟨f1|f1⟩   ⎦  =  ⎣     0       Nmin   ⎦  =  ⎣  0     Nmin  ⎦ .
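The Gramian result is simple to verify numerically. The sketch below (an illustration, not the chapter's code) uses the transfer matrix of a 2-input AND gate, which has three maxterms and one minterm:

```python
import numpy as np

# Transfer matrix of a 2-input AND gate (n = 2 inputs, single output):
# rows are the responses for inputs 00, 01, 10, 11.
T = np.array([[1, 0],
              [1, 0],
              [1, 0],
              [0, 1]])

G = T.T @ T        # the 2x2 Gramian
print(G)           # diag(Nmax, Nmin) = diag(3, 1)
```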

Theorem 1.3 (Self-Inner Product of a Column Vector of T). The inner product of the ith column vector of T with itself yields the number of output response valuations equal to ⟨i|.

Proof. A network with M primary outputs may be described with M truth tables. A single truth table with M output columns may be written as shown in Figure 1.5. When the M output columns are expressed as a single column of row vectors, each row vector is of the form of a canonical basis vector ⟨i| and contains a single unity-valued component in the ith position, with all other components being zero-valued. Using this observation and generalizing the derivation in the proof of Lemma 1.4 yields the result of the theorem.

Example 1.5 illustrates how the pseudo-inverse of a transfer matrix can be used to justify the inputs of the switching network in Figure 1.4.

Example 1.5 (Justification Example with Pseudo-inverse of T). Using Equation (1.7), the pseudo-inverse is calculated as

         ⎡ 0   0    0    0   ⎤
    T⁺ = ⎢ 0  1/3  1/3  1/3  ⎥
         ⎢ 1   0    0    0   ⎥
         ⎣ 0   0    0    0   ⎦ .


Assuming the output response is ⟨f1f2| = ⟨10|, the corresponding input stimulus can be computed using the pseudo-inverse to justify ⟨10| as

                                  ⎡ 0   0    0    0   ⎤
    ⟨x1x2| = ⟨10|T⁺ = [ 0 0 1 0 ] ⎢ 0  1/3  1/3  1/3  ⎥
                                  ⎢ 1   0    0    0   ⎥
                                  ⎣ 0   0    0    0   ⎦

           = [ 1 0 0 0 ] = ⟨00| .

Accordingly, the justification of ⟨f1f2| = ⟨01| is calculated as

    ⟨x1x2| = ⟨01|T⁺ = [ 0  1/3  1/3  1/3 ]

           = (1/3)(⟨01| + ⟨10| + ⟨11|) .

This result indicates that the three different input stimulus vectors ⟨01|, ⟨10|, and ⟨11| justify ⟨f1f2| = ⟨01|. The coefficient 1/3 is a scaling side-effect of the pseudo-inverse, required to ensure that TT⁺T = T. When an infeasible output response is justified, the null vector results. Consider the output assignment ⟨f1f2| = ⟨11|. The justification calculation becomes

    ⟨x1x2| = ⟨11|T⁺ = [ 0 0 0 0 ] = ⟨∅∅| .


Finally, a total vector can conveniently be used to justify subsets of output responses. As an example, if it is desired to justify the case ⟨f2| = ⟨1| regardless of the valuation of ⟨f1|, an output response vector of the form ⟨t1| can be used in the justification calculation. Likewise, the total justification can be computed using ⟨f1f2| = ⟨tt|:

                                  ⎡ 0   0    0    0   ⎤
    ⟨x1x2| = ⟨tt|T⁺ = [ 1 1 1 1 ] ⎢ 0  1/3  1/3  1/3  ⎥
                                  ⎢ 1   0    0    0   ⎥
                                  ⎣ 0   0    0    0   ⎦

           = [ 1  1/3  1/3  1/3 ]

           = ⟨00| + (1/3)(⟨01| + ⟨10| + ⟨11|) .

In this latter case, the scaling coefficients of the input stimulus vectors yield useful information regarding the distribution of distinct input stimuli. Because ⟨01|, ⟨10|, and ⟨11| share the common coefficient 1/3, it can be concluded that they correspond to the same output response vector.

The Justification Matrix

Calculation of the pseudo-inverse is easily accomplished due to the fact that the Gramian of T is diagonal; thus, its inverse is also diagonal, with the scalar multiplicative inverses of the diagonal values of the Gramian present in the corresponding diagonal locations. Nevertheless, the computation of the pseudo-inverse can be avoided altogether by simply using the transpose of T in place of T⁺. We define the justification matrix in the following definition.

Definition 1.11 (Justification Matrix). The justification matrix for a switching network characterized by a transfer matrix T is denoted as TJ and is defined to be the transpose of the transfer matrix:

    TJ = Tᵀ .                                           (1.8)


The advantage of using the justification matrix is that a justification computation may be accomplished without computing a pseudo-inverse matrix. The disadvantage is that the scaling coefficients are no longer present in the results; thus, it is possible to determine a set of justifying input values for a given output response, but it is no longer possible to determine how they are distributed amongst all possible input stimuli. Example 1.6 illustrates the use of the justification matrix TJ.

Example 1.6 (Justification with the Justification Matrix TJ). Using Equation (1.8), the switching network depicted in Figure 1.4 (a) can be justified for the output response constraint ⟨f2| = ⟨1| as

                                  ⎡ 0 0 0 0 ⎤
    ⟨x1x2| = ⟨t1|TJ = [ 0 1 0 1 ] ⎢ 0 1 1 1 ⎥
                                  ⎢ 1 0 0 0 ⎥
                                  ⎣ 0 0 0 0 ⎦

           = [ 0 1 1 1 ] = ⟨01| + ⟨10| + ⟨11| .

The justification result indicates that three different input stimuli will cause either ⟨01| or ⟨11| to result. In reality, ⟨11| is an infeasible output response, and the three calculated justified input stimuli all result in only one of the two specified output responses, ⟨f1f2| = ⟨01|. This could be further verified with a simulation computation.
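The same computation with the justification matrix is a one-liner in numpy (illustrative sketch; note the absence of the 1/3 scaling coefficients produced by the pseudo-inverse):

```python
import numpy as np

T = np.array([[0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 0, 0]])
TJ = T.T                           # justification matrix, Equation (1.8)

t1 = np.kron([1, 1], [0, 1])       # <t1| = [0 1 0 1]
print(t1 @ TJ)                     # [0 1 1 1] = <01| + <10| + <11|
```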

1.1.9. Algorithms and Implementation

Requirements for Practical Usefulness

For the vector space model to have practical usefulness, it must not incur worse computational complexity in either runtime or storage requirements than would be the case for a conventional switching algebra model. Furthermore, the vector space model must offer some

    .i 2
    .o 2
    .ilb x1 x2
    .olb f1 f2
    .p 3
    00 10
    -1 01
    1- 01
    .e

Figure 1.6. Example of a network: (a) circuit, (b) .pla listing.

advantages as compared to a corresponding switching algebra implementation. The property of truth table isomorphism guarantees that storage complexity will not be worse than that of conventional switching algebra models. Any data structure capable of representing a switching algebra formulation of a network can also represent the vector space model. Two common methods for compactly representing switching functions are cube lists and BDDs. Because the justification matrix is the transpose of the transfer matrix, it is only necessary for one of these structures to be represented for a network of interest.

Cube List Representations of Transfer Matrices

The .pla format is a common cube list representation and is described in detail in [34]. Figure 1.6 (b) contains a sample .pla listing for the example network shown in Figure 1.4 (a). For convenience, the example network circuit diagram is also shown in Figure 1.6 (a). The .pla file can be interpreted as a cube list or two-level characterization of a switching circuit modeled with traditional switching algebras. After the initial labeling information on the lines of the file beginning with the character ‘.’, two arrays of characters are given containing the symbols 0, 1, and -. The leftmost array represents cubes or product terms, and the rightmost array contains information regarding the primary outputs of the network. Here, we use the terminology input array to refer to the leftmost array and output array to refer to the rightmost array.

Although the symbols 0, 1, and - may appear in either the input or the output array, their meaning differs depending on which array they appear in. When a 1 appears in the input array, it represents an instance of the corresponding network input in non-complemented form, while the 0 element represents that variable in complemented form. The appearance of - in the input array denotes that the variable may be considered to be either complemented or non-complemented and thus corresponds to the ⟨t| value in the vector space model. Alternatively, when 0 appears in the output array, it indicates that the corresponding switching function produces a 0 value when the cube in the input array on the same line evaluates to 1, and when 1 appears in the output array, it indicates that the corresponding network output produces a 1. The appearance of - in an output array corresponds to the switching algebra concept of a don’t care, meaning that in a realization of the circuit, the designer is free to assign either 0 or 1 to that output. The use of the - symbol in the input and output arrays often leads to confusion since it has two different meanings depending on which array it appears in.

Some degree of compactness [247, 314] is achieved in cube list descriptions as compared to truth tables, since the appearance of - in the input array allows two or more rows of a truth table to be represented as a single row, or covering cube, in a corresponding .pla description. Because cubes or product terms may be realized with AND gates and the primary output connectivity represented with OR gates, this form is also a direct representation of a Programmable Logic Array (PLA) two-level circuit form.
A direct realization of the .pla listing in Figure 1.6 (b) in a two-level form with NOT, AND, and OR gates is provided in Figure 1.7 (a); Figure 1.7 (b) repeats the associated .pla description for convenience. PLA realizations of switching functions are commonly found in both custom-designed integrated circuits and

    .i 2
    .o 2
    .ilb x1 x2
    .olb f1 f2
    .p 3
    00 10
    -1 01
    1- 01
    .e

Figure 1.7. Network: (a) 2-level implementation, (b) .pla listing.

as programmable structures within FPGAs. Symbolically, cube lists are often denoted using set notation, and the .pla format is simply one possible representation of the more general notion of a cube list. In the case of the example in Figures 1.6 and 1.7, a covering set for each output f1 and f2 is written respectively as f1 = {x̄1x̄2} and f2 = {x1, x2}. Each element of a covering set describes a conjunction (or AND) of switching values, and all elements within a set are combined with a disjunctive (or OR) operation to produce the switching function.

Due to truth table isomorphism, the .pla array is, alternatively, a representation of the transfer matrix of the represented network. The use of the - symbol in the input array causes any particular cube containing a - to represent more than a single row vector in a transfer matrix representation. In the vector space interpretation of a .pla file, each line can be considered as the specification of a particular vector space mapping. In the example given in Figures 1.6 (b) and 1.7 (b), the three lines containing array information represent the following mappings:

    00 10   represents   ⟨00| → ⟨10| ,
    -1 01   represents   ⟨t1| → ⟨01| , and
    1- 01   represents   ⟨1t| → ⟨01| .

To more clearly describe these mappings, Figure 1.8 (a) contains the .pla file and Figure 1.8 (b) shows the corresponding transfer matrix

    .i 2                 transfer matrix        transfer matrix
    .o 2                 row indices            row vectors
    .ilb x1 x2
    .olb f1 f2           ⟨00| → ⟨10|            ⎡ 0 0 1 0 ⎤
    .p 3                 ⟨t1| → ⟨01|            ⎢ 0 1 0 0 ⎥ = T
    00 10                ⟨1t| → ⟨01|            ⎢ 0 1 0 0 ⎥
    -1 01                                       ⎣ 0 1 0 0 ⎦
    1- 01
    .e

        (a)                                          (b)

Figure 1.8. Network: (a) .pla listing, (b) corresponding transfer matrix.

T with the mappings for each row depicted.
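A cube-list-to-transfer-matrix conversion can be sketched directly from these mappings. The helper `cube_to_vec` below is hypothetical (not part of any .pla tool); it maps each cube symbol to its bra vector and combines them with outer products, and overlapping cubes with identical outputs are merged with an element-wise maximum so a doubly covered input valuation is not counted twice:

```python
import numpy as np

zero, one = np.array([1, 0]), np.array([0, 1])
VEC = {'0': zero, '1': one, '-': zero + one}   # '-' is the <t| value

def cube_to_vec(cube):
    """Map a .pla array entry, e.g. '-1', to its bra vector."""
    v = np.array([1])
    for c in cube:
        v = np.kron(v, VEC[c])
    return v

T = np.zeros((4, 4), dtype=int)
for line in ["00 10", "-1 01", "1- 01"]:
    i, o = line.split()
    # every input valuation covered by the cube maps to the cube's output;
    # np.maximum merges overlapping cubes (here -1 and 1- both cover 11)
    T = np.maximum(T, np.outer(cube_to_vec(i), cube_to_vec(o)))
print(T)
```

The result is the transfer matrix of Figure 1.8 (b).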

BDD Representations of Transfer Matrices

The Binary Decision Diagram (BDD) representation of switching functions is a widely used technique and can, on average, provide significant reductions in storage requirements. Although the worst-case complexity remains O(2ⁿ) over all possible switching functions, ordering and reordering rules provide compact representations of many functions. The BDD representing a switching function is isomorphic to a BDD representation of the corresponding transfer matrix. Multi-output networks may be represented with one of various extensions of the BDD model. These include the Shared Binary Decision Diagram (SBDD) and the Multi-Terminal Binary Decision Diagram (MTBDD), also referred to as the Algebraic Decision Diagram (ADD) [21, 68, 216]. The transfer matrix interpretation of decision diagrams replaces the switching constants that annotate the terminal vertices (denoted with square boxes) with their corresponding basis vector values (0 → ⟨0| and 1 → ⟨1|). The non-terminal vertices are labeled with switching function variables that correspond to switching network primary inputs in the switching algebraic interpretation. The non-terminal vertex annotations and the directed

Figure 1.9. Decision diagrams for the example network transfer function: (a) SBDD, (b) MTBDD.

edge labels are replaced with transfer matrix row and column vector indices when BDDs are interpreted as representations of transfer matrices. Figure 1.9 depicts the corresponding transfer matrix interpretation of the SBDD and MTBDD representations of the switching network example depicted in Figure 1.4 (a). In addition to the use of BDD software for the representation of transfer matrices and vectors, some means of performing linear algebraic operations, such as direct and outer products, as BDD-based algorithms must be available. Previous work [69] describes how such linear algebraic operations over BDD-represented vectors and matrices may be implemented efficiently.

Structural Representations of Transfer Matrices

The other commonly used compact representation of a switching function is the realization of that function as a netlist containing symbols for electronic gates that are modeled as switching function operators. For example, an integer multiplier is known to produce an exponentially sized BDD regardless of the variable ordering; hence, circuits in this class are considered hard cases by the EDA community since their accompanying switching function models are unwieldy. A key

Figure 1.10. Network and graph depicting the factored transfer matrix.

advantage of the vector space model presented here is the ability to represent structural netlists as an interconnection of smaller transfer matrices. This allows EDA techniques to be implemented directly over the netlist structure without an intermediate extraction of a switching function representation. In using the structural netlist as a representation of a switching network, each gate appearing in the netlist is replaced with a corresponding transfer matrix. Thus, the topology of the netlist represents a factorization of the overall network transfer matrix. In addition to the network element transfer matrices, it is necessary to include transfer matrices for cases of fan-out, fan-in, and crossover. Figure 1.10 contains a depiction of a logic network example and the corresponding graph depicting the factored transfer matrix. Using the factored transfer matrix graph allows for advantages in the implementation of various EDA techniques, since algorithmic operations can be applied by direct traversal of the factored-form structure without requiring intermediate extraction of the overall network transfer matrix. The factored transfer matrix form shown in Figure 1.10 allows justification to be performed by propagating given output responses from the rightmost side of the factored graph to the leftmost side and multiplying intermediate vector values by internal justification matrices in the form of transposed versions of the element transfer matrices.


Computing the Transfer Matrix from a Structural Netlist

Above, we described how a transfer matrix may be extracted from a structural description of a switching network, such as an HDL netlist, through a process of partitioning, followed by the computation of serial segment transfer matrices, and finally, through a direct multiplication of the segment transfer matrices. Next, we describe the details of automating this method for the calculation of a transfer matrix.

Switching Network Partitioning

The initial stage requires a determination of serial cascade partitions. One method for performing the partitioning phase is to invoke a method commonly used in switching network simulators known as levelization. Levelization assigns integer-valued level numbers to each line in a structural representation. The primary inputs are initially assigned level numbers of zero. These values are then propagated toward the primary outputs. As a network element is encountered, the level number assigned to the element output is computed by incrementing the maximal level number among all the element inputs. After all lines contained within the network are assigned level numbers, partition boundaries or vertical cuts are determined by choosing all lines with the same level number. To illustrate the levelization-based partitioning technique, we use the small benchmark circuit C17. Figure 1.11 contains the structural representation of C17 with levelization values depicted on each line. Partitioning cuts are depicted along the common level numbers and are shown by dashed lines, denoted by χi. The circuit partitions are denoted by φi.
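The levelization rule can be sketched in a few lines of Python. The helper `levelize` and the three-gate netlist below are hypothetical illustrations (not C17 and not the chapter's implementation); the netlist maps each gate output net to its input nets, and unresolved gates are retried in repeated passes for simplicity rather than via an explicit topological sort:

```python
def levelize(netlist, primary_inputs):
    """Assign a level to every net: primary inputs get 0; a gate output
    gets 1 + max(levels of its inputs)."""
    level = {pi: 0 for pi in primary_inputs}
    pending = dict(netlist)
    while pending:
        for out in list(pending):
            ins = pending[out]
            if all(i in level for i in ins):       # all inputs levelized?
                level[out] = 1 + max(level[i] for i in ins)
                del pending[out]
    return level

# g1 = f(a, b); g2 = f(b, c); g3 = f(g1, g2)
lv = levelize({"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]},
              ["a", "b", "c"])
print(lv["g3"])   # prints 2
```

Cut lines are then drawn between groups of nets sharing a common level number.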

Computing the Transfer Matrix and Crossover Detection

The technique of determining partition transfer matrices has been described above and is employed to compute matrices for the network in Figure 1.11. However, an additional set of matrix factors is required to account for crossovers occurring in the structural netlist. One way to account for crossovers is to intersperse permutation matrices, Tχi ,

Figure 1.11. Benchmark C17: (a) circuit with levelization and partition cuts, (b) transfer matrices by partition.

between the cascade stage matrices that include the crossover information. In a situation where no crossovers occur, the permutation matrix is simply I and may be omitted. However, in a case where network lines cross one another within a particular partition, a permutation matrix must be included as a factor in the direct product calculation for the overall transfer matrix. Two crossovers occur in the network depicted in Figure 1.11 (a) and are shown as dashed circles in the diagram. For the C17 benchmark, six partitions and seven cuts are identified. Figure 1.11 (b) contains the network element transfer matrices organized by partition number. The overall transfer matrix for C17 is given by Equation (1.9):

TC17 = Tχ0 Tφ1 Tχ1 Tφ2 Tχ2 Tφ3 Tχ3 Tφ4 Tχ4 Tφ5 Tχ5 Tφ6 Tχ6 .     (1.9)

The partition cut transfer matrices, Tχi , are determined by the use of values we refer to as cut indices. For each partition, we assign values on the immediate right-hand side of each cut line χi that are

Figure 1.12. Portion of the partition φ3 in the C17 circuit with crossovers.

in ascending numerical order from top to bottom, where each cut index annotates a network line that crosses a cut or partition line. Within each partition, the cut indices are then propagated forward until they reach the input of a network element within the partition. If no network element is present on the cut index, it is propagated to the left side of the cut line χi+1 . After the propagation of the cut indices within a partition, the cut indices are read from top to bottom, and their order indicates if and where crossovers occur. In fact, their order defines a permutation matrix. Also, whenever a cut index is propagated entirely to the next cut line χi+1 , it indicates that the transfer matrix for that network line is the identity I since a pass-through line is detected. After the cut indices have been processed for each partition, the overall expression for the transfer matrix for C17 can be modified. Equation (1.9) is simplified since all but one Tχi = I, and the result is given in Equation (1.10):

TC17 = I Tφ1 I Tφ2 Tχ2 Tφ3 I Tφ4 I Tφ5 I Tφ6 I = Tφ1 Tφ2 Tχ2 Tφ3 Tφ4 Tφ5 Tφ6 .     (1.10)

From the cut index analysis, it is necessary to compute the permutation matrix Tχ2 which characterizes the fact that a network line (with cut index 1) crosses two lines (with cut indices 2 and 3). Fortunately, multiple-line crossovers such as this case can always be modeled as a series of single line crossovers, thus the basic crossover transfer matrix C as depicted in Figure 1.3 may be used multiple times in the overall calculation for Tχ2 . To illustrate how C is used multiple times, Figure 1.12 contains a redrawn portion of the C17 circuit containing the portion of partition φ3 where the crossovers occur. Figure 1.13 summarizes the final steps to calculate the transfer matrix of the benchmark circuit C17. The entire set of partition factors

Tφ1 = I ⊗ I ⊗ FO ⊗ I ⊗ I
Tφ2 = I ⊗ NA ⊗ NA ⊗ I
Tχ2 = (I ⊗ C ⊗ I)(I ⊗ I ⊗ C)
Tφ3 = I ⊗ FO ⊗ I ⊗ I
Tφ4 = NA ⊗ NA ⊗ I
Tφ5 = I ⊗ FO ⊗ I
Tφ6 = NA ⊗ NA

TC17 = (I ⊗ I ⊗ FO ⊗ I ⊗ I)(I ⊗ NA ⊗ NA ⊗ I)(I ⊗ C ⊗ I)(I ⊗ I ⊗ C)(I ⊗ FO ⊗ I ⊗ I)(NA ⊗ NA ⊗ I)(I ⊗ FO ⊗ I)(NA ⊗ NA)

Figure 1.13. Transfer matrix of the benchmark C17: (a) partition factors, (b) overall form TC17 , (c) the complete 32 × 4 transfer matrix (not reproduced here).

for C17 is shown in Figure 1.13 (a). The overall form of TC17 is then mathematically modeled as a factored set of network element transfer matrices containing both direct and outer product factors. The complete formula is expressed in Figure 1.13 (b). The overall monolithic transfer matrix for C17 results from evaluating the formula in Figure 1.13 (b) and is shown in Figure 1.13 (c). In practice, the method of parsing, partitioning, and iteratively computing the overall transfer matrix operates directly on a BDD representation rather than calculating the explicit factored form, resulting in a BDD representation of the 32 × 4 transfer matrix TC17 .
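As a cross-check, the factored product of Equation (1.10) can be evaluated numerically. The sketch below assumes a row-vector convention (row index = input combination, column index = output combination) and our own 0/1 encodings of the element transfer matrices I, NA, FO, and C; these are assumptions for illustration, not the chapter's exact matrices.

```python
import numpy as np
from functools import reduce

# Numeric sketch of the factored product (1.10); encodings are assumptions:
I  = np.eye(2, dtype=int)                        # pass-through line
NA = np.array([[0, 1], [0, 1], [0, 1], [1, 0]])  # 2-input NAND: only 11 -> 0
FO = np.array([[1, 0, 0, 0], [0, 0, 0, 1]])      # fan-out: 0 -> 00, 1 -> 11
C  = np.array([[1, 0, 0, 0], [0, 0, 1, 0],
               [0, 1, 0, 0], [0, 0, 0, 1]])      # crossover: swap two lines

def kron(*ms):
    return reduce(np.kron, ms)

# T_chi2 is modeled as a series of two single-line crossovers
T_C17 = (kron(I, I, FO, I, I) @ kron(I, NA, NA, I)
         @ kron(I, C, I) @ kron(I, I, C)
         @ kron(I, FO, I, I) @ kron(NA, NA, I)
         @ kron(I, FO, I) @ kron(NA, NA))
# 5 primary inputs and 2 primary outputs give a 32 x 4 matrix
```

Each row of the resulting 32 × 4 matrix contains exactly one 1, since a deterministic circuit maps every input combination to exactly one output combination.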


Experimental Results

Experiments were carried out using the combinational benchmark circuits from [375] and [53]. The experiments were run on a Dell PowerEdge 2950 multiple-user server containing a dual quad-core Intel Xeon Central Processing Unit (CPU) with a 2.6 GHz core clock speed and hosting the Debian Linux operating system. Reported times are runtime values obtained through system calls to the Linux time function. The Colorado University Decision Diagram (CUDD) software package is used to construct and manipulate the BDDs. The circuit size is reported as the number of primary inputs and outputs. The benchmark circuits were initially converted from their native formats (.pla and .isc) into a structural Verilog netlist similar to that depicted in Figure 1.4. The netlist conversion times are not included in the reported experimental runtimes, although they were negligible. The experiments parse the Verilog file and build the transfer matrix utilizing the BDD data structure. The computation time reported in Table 1.2 includes the time required to parse and partition the netlist and the time to construct the monolithic BDD representing the overall circuit transfer function. By using structures such as decision diagrams or cube lists, the computational complexity of the linear algebraic formulations for switching network representation and manipulation incurs penalties no worse than those of Boolean formulations; in some cases, the alternative approach even offers advantages. Although the theoretical computational complexities of Boolean problems remain, heuristic approaches for their solution may benefit when linear algebraic methods are employed in place of traditional Boolean switching algebraic solutions. The experimental results in Table 1.2 demonstrate that the vector space model provides a practical alternative to switching algebraic solutions and is thus a competitive candidate for use in solving Boolean problems.
It is not expected that the community will abandon the traditional


Table 1.2. Size and computation time of transfer matrices using the BDD data structure.

Name    inputs  outputs  Matrix Size (KB)  Time (ms)
C880      60      26        18,513.56         60.0
C1355     41      32         2,876.28         50.0
C1908     33      25        21,193.72         90.0
C3540     50      22        40,409.64      3,940.0
apex7     49      37            26.39        133.1
dalu      75      16        51,064.39        563.7
x4        94      71            79.84          8.3
apex5    117      88            42.41        296.5
ex4      128      28           143.72        113.9
frg2     143     139           102.63        306.9
i2       201       1             8.36          4.7

approaches grounded in switching theory; however, it is encouraging that some classes of Boolean problems are more conveniently modeled as problems in the vector space.


1.2. Solving Combinatorial Problems Using Boolean Equations

Christian Posthoff and Bernd Steinbach

1.2.1. Boolean Equations

Boolean equations are convenient tools to solve combinatorial problems. Boolean equations are based upon Boolean functions.

Definition 1.12 (Boolean Function). A Boolean function f of n variables is a unique mapping

f (x) : B^n ⇒ B .     (1.11)

It can be represented by a function table, but also by an expression in which the Boolean variables xi are connected by the well-known Boolean operations [247] such as the negation NOT (x̄i ), the conjunction AND (∧), the disjunction OR (∨), the antivalence XOR (⊕), the equivalence XAND (⊙), or the implication IF-THEN (→). The order in which these operations are executed is implicitly defined by priorities or explicitly specified by pairs of parentheses.

Definition 1.13 (Boolean Equation). Let x = (x1 , . . . , xn ), and let f (x) and g(x) be two Boolean (logic) functions; then

f (x) = g(x)     (1.12)

is a Boolean equation. The solution of a Boolean equation is the set S of Boolean vectors bj for which either f (bj ) = 0 and g(bj ) = 0 or f (bj ) = 1 and g(bj ) = 1 holds.

A simple algorithm to solve a Boolean equation (1.12) consists of the sequential substitution of all 2^n Boolean vectors bj into the equation, the calculation of the function values of f (x) and g(x), and the assignment of bj to the solution set S in the cases of 0 = 0 or 1 = 1. The practical execution of this simple algorithm fails for a Boolean equation that depends on more than a certain number of Boolean variables


due to the exponential number of Boolean vectors bj . The problem of storing a huge number of Boolean vectors bj can be simplified by using ternary vectors.

Definition 1.14 (Ternary Vector). Let t = (t1 , . . . , tn ), ti ∈ {0, 1, −}; then t is a ternary vector.

A ternary vector can be understood as an abbreviation of a set of binary vectors. When we replace each − by 0 or 1, we get the several binary vectors generated by this ternary vector. In this way, the vector (0 − 1 −) represents the four binary vectors (0010), (0011), (0110), and (0111). A list (matrix) of ternary vectors, also called a Ternary Vector List (TVL), is a compact representation of a set of binary vectors. The effort to calculate the solution of a Boolean equation can be reduced by utilizing the relations between Boolean operations and set operations. Assume that Sf is the solution set of the Boolean equation f (x) = 1 and Sg is the solution set of the Boolean equation g(x) = 1; then:

• S1 = S̄f is the solution set of f̄ (x) = 1,
• S2 = Sf ∩ Sg is the solution set of f (x) ∧ g(x) = 1,
• S3 = Sf ∪ Sg is the solution set of f (x) ∨ g(x) = 1,
• S4 = Sf Δ Sg is the solution set of f (x) ⊕ g(x) = 1,
• S5 = S̄4 , the complement of Sf Δ Sg , is the solution set of f (x) ⊙ g(x) = 1, and
• S6 = S̄f ∪ Sg is the solution set of f (x) → g(x) = 1.
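A minimal Python sketch of ternary-vector expansion and of the set view of the ∧ operation; the string encoding of ternary vectors is our own illustrative choice, not XBOOLE's layout.

```python
from itertools import product

# A ternary vector is a string over {'0', '1', '-'};
# '-' covers both values of the component.
def expand(tv):
    """All binary vectors covered by one ternary vector."""
    choices = ["01" if c == "-" else c for c in tv]
    return {"".join(p) for p in product(*choices)}

S = expand("0-1-")          # (0-1-): 0010, 0011, 0110, 0111

# Set view of a Boolean operation: the solutions of f(x) AND g(x) = 1
# are the intersection of the two separate solution sets.
Sf, Sg = expand("1---"), expand("--0-")
S_and = Sf & Sg
```

Here `S_and` contains exactly the four vectors with a 1 in the first and a 0 in the third component.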

The programming system XBOOLE [247, 314, 325] implements all these set operations efficiently and conveniently for sets given by ternary vectors. Due to the linearity of the Boolean operations ⊕ and ⊙, a Boolean equation in an arbitrary form (1.12) can be transformed into two different homogeneous forms. The set of solutions of a Boolean equation remains unchanged if the


same function g(x) is added on both sides using the equivalence operation:

f (x) = g(x)                          (1.13)
f (x) ⊙ g(x) = g(x) ⊙ g(x)            (1.14)
f (x) ⊙ g(x) = 1                      (1.15)
hc (x) = 1 .                          (1.16)

The Boolean equation (1.16) is a homogeneous characteristic equation because the function values 1 of the function hc (x) characterize the solution (belong to the solution set). The set of solutions of a Boolean equation also remains unchanged if the same function g(x) is added on both sides using the antivalence operation:

f (x) = g(x)                          (1.17)
f (x) ⊕ g(x) = g(x) ⊕ g(x)            (1.18)
f (x) ⊕ g(x) = 0                      (1.19)
hr (x) = 0 .                          (1.20)

The Boolean equation (1.20) is called a homogeneous restrictive equation because the function values 0 of the function hr (x) restrict the solution (do not belong to the solution set). A literal is a negated or a non-negated Boolean variable. There are different expressions consisting of literals and Boolean operations which represent the same Boolean function. The rules [247] of Boolean Algebra allow equivalent transformations. In special forms [315] of Boolean functions, all literals appear either only in conjunctions or only in disjunctions:

• a disjunctive form consists of disjunctions of conjunctions of literals: f1 (x) = x1 x2 ∨ x3 x4 x6 ∨ x2 x4 x5 ,
• a conjunctive form consists of conjunctions of disjunctions of literals: f2 (x) = (x1 ∨ x2 ) ∧ (x3 ∨ x4 ∨ x6 ) ∧ (x2 ∨ x4 ∨ x5 ) ,
• an antivalence form consists of antivalences of conjunctions of literals: f3 (x) = x1 x2 ⊕ x3 x4 x6 ⊕ x2 x4 x5 , and


• an equivalence form consists of equivalences of disjunctions of literals: f4 (x) = (x1 ∨ x2 ) ⊙ (x3 ∨ x4 ∨ x6 ) ⊙ (x2 ∨ x4 ∨ x5 ) .

Characteristic Boolean equations, where the function on the left-hand side is given as a conjunctive form, have become particularly important in recent years. Solving such an equation is called the Satisfiability (SAT) problem [41]. There are programs [40, 128] which are specialized to solve SAT-problems of several hundreds of Boolean variables. Such programs are called SAT-solvers, and their performance has improved significantly over the last years. SAT-solvers try to find at least one solution of a SAT-problem; only some SAT-solvers are able to calculate all solutions. The literals of a SAT-equation are generally binate. A literal is binate when the associated variable appears in the expression both in the negated and in the non-negated form. These mixed polarities can prevent the satisfiability of the SAT-equation. A literal is called unate if the associated variable appears in an expression either only in the negated or only in the non-negated form. Each characteristic Boolean equation whose left-hand side is given as a conjunctive form of unate literals is satisfiable. An additional challenge for the solution of such equations is to satisfy the equation by assigning values to a minimal number of literals. Such a problem is called a Unate Covering Problem (UCP) [321, 323, 329]. Arbitrary expressions on both sides of a Boolean equation provide a wide field for the modeling of given problems. This freedom can be extended by inequalities.

Definition 1.15 (Boolean Inequalities). Let x = (x1 , . . . , xn ), and let f (x) and g(x) be two Boolean (logic) functions; then

f (x) < g(x)     (1.21)

is a Boolean inequality. The solution of this inequality is the set S of Boolean vectors bj which satisfy f (bj ) = 0 and g(bj ) = 1. Another Boolean inequality is

f (x) ≤ g(x) .     (1.22)

The solution of the Boolean inequality (1.22) is the set S of Boolean vectors bj that contains both the solution set of the Boolean equation (1.12) and that of the Boolean inequality (1.21).


Table 1.3. Function table of Boolean inequalities

f (x)   g(x)   f (x) < g(x)   f (x) ≤ g(x)
  0       0          0              1
  0       1          1              1
  1       0          0              0
  1       1          0              1

The values 1 in the two right columns of Table 1.3 describe which pairs of function values of f (x) and g(x) belong to the solution of the associated inequality. Based on this table, the following homogeneous Boolean equations can be built which have the same set of solutions as the associated inequality:

f (x) < g(x)   ⇔   f̄ (x) ∧ g(x) = 1   ⇔   f (x) ∨ ḡ(x) = 0 ,     (1.23)
f (x) ≤ g(x)   ⇔   f̄ (x) ∨ g(x) = 1   ⇔   f (x) ∧ ḡ(x) = 0 .     (1.24)

Due to the equivalences (1.23) and (1.24), no special algorithms are necessary to solve a Boolean inequality. Sometimes it is easier to specify the properties of a problem using a set of simple Boolean equations instead of a single more complicated Boolean equation.

Definition 1.16 (System of Boolean Equations). A system of Boolean equations consists of a finite number of Boolean equations. The solution of a system of Boolean equations is the set S of Boolean vectors bj which belong to the solution of each Boolean equation of the system.

A system of Boolean equations can be mapped into a single homogeneous Boolean equation that has the same set of solutions. Homogeneous Boolean equations of each equation of the system are built as an intermediate step of this transformation. A single equivalent characteristic Boolean equation of a system of Boolean equations can be constructed as follows:

fj (x) = gj (x), j = 1, . . . , m   ⇔   fj (x) ⊙ gj (x) = 1, j = 1, . . . , m   ⇔   ∧_{j=1}^{m} (fj (x) ⊙ gj (x)) = 1 .     (1.25)


Alternatively, a single equivalent restrictive Boolean equation of a system of Boolean equations can be constructed as follows:

fj (x) = gj (x), j = 1, . . . , m   ⇔   fj (x) ⊕ gj (x) = 0, j = 1, . . . , m   ⇔   ∨_{j=1}^{m} (fj (x) ⊕ gj (x)) = 0 .     (1.26)

Due to these transformations, each system of Boolean equations can be solved by means of the universal algorithm that solves a single Boolean equation. Knowing the solutions Sj of the m Boolean equations of the system, the solution set of the system of Boolean equations can be calculated by:

S = ∩_{j=1}^{m} Sj .     (1.27)

A Boolean equation, a Boolean inequality, and a system of Boolean equations map Boolean functions into a set of Boolean vectors. Vice versa, a set of Boolean vectors can be mapped into a Boolean function. This characteristic function can be constructed based on the compact representation of the solution set by a set of ternary vectors. A conjunction C over the variables x1 , . . . , xk is equal to 1 for all Boolean vectors covered by a ternary vector t using the following rules:

• if ti = 1 then C contains xi ,
• if ti = 0 then C contains x̄i , and
• if ti = − then xi is missing in C .

Example 1.7 (Characteristic Function and Equation). The set S1 describes 16 Boolean vectors by three ternary vectors:

       x1  x2  x3  x4  x5
        1   −   0   −   1
S1 =    0   0   −   1   −   .     (1.28)
        −   1   −   −   0

The associated characteristic function is

f1 (x1 , x2 , x3 , x4 , x5 ) = x1 x̄3 x5 ∨ x̄1 x̄2 x4 ∨ x2 x̄5 .     (1.29)

The characteristic Boolean equation

x1 x̄3 x5 ∨ x̄1 x̄2 x4 ∨ x2 x̄5 = 1     (1.30)

with f1 (x1 , x2 , x3 , x4 , x5 ) (1.29) on the left-hand side has the solution set S1 (1.28).

1.2.2. Solution With Regard to Variables

The power of Boolean equations does not only originate from the mapping between Boolean functions and sets of Boolean vectors; it is increased furthermore by the possibility of solving a Boolean equation with regard to variables. Due to the possible transformations introduced in Subsection 1.2.1, we restrict ourselves to the solution with regard to variables of a characteristic Boolean equation

f (x, y) = 1 .     (1.31)

The aim of the solution of Equation (1.31), where x = (x1 , . . . , xn ) and y = (y1 , . . . , ym ) are two disjoint sets of variables, with regard to the variables y = (y1 , . . . , ym ) is finding Boolean functions f1 (x), . . . , fm (x) such that the set of solutions of (1.31) is equal to the set of solutions of the system of Boolean equations (1.32):

y1 = f1 (x)
y2 = f2 (x)
  ...
ym = fm (x) .     (1.32)

There are equations of the type (1.31) for which no solution with regard to the variables y exists. Derivative operations of the Boolean Differential Calculus (BDC) [247, 317–319] allow us to decide whether solution functions exist and how unique single solution functions fj (x) or lattices of such functions can be calculated.


Theorem 1.4 (Condition to Solve Equation (1.31) with Regard to the Variables y). The Boolean equation (1.31) can be solved with regard to the variables y = (y1 , . . . , ym ) if and only if

max^m_y f (x, y) = 1 .     (1.33)

Proof. A necessary and sufficient condition that the Boolean equation (1.31) can be solved with regard to the variables y = (y1 , . . . , ym ) is that for each pattern of the independent variables x = (x1 , . . . , xn ) at least one Boolean value for each variable yj , j = 1, . . . , m, exists such that Equation (1.31) is satisfied. Exactly this requirement is expressed by the m-fold maximum in (1.33).

The solution function yj = fj (x) is uniquely specified by the given Boolean equation (1.31) if

∂/∂yj ( max^{m−1}_{y\yj} f (x, y) ) = 1 .     (1.34)

In this case the solution function fj (x) can be calculated by

fj (x) = max^m_y ( yj ∧ f (x, y) ) .     (1.35)

If the condition (1.33) is satisfied but not the condition (1.34), then the solution function yj = fj (x) can be chosen out of a lattice of Boolean functions:

fjq (x) ≤ fj (x) ≤ f̄jr (x) ,     (1.36)

which is characterized by the mark functions fjq (x) (infimum, on-set) and fjr (x) (complement of the supremum, off-set). These mark functions can be calculated by

fjq (x) = ¬ max^m_y [ ȳj ∧ f (x, y) ] ,     (1.37)

fjr (x) = ¬ max^m_y [ yj ∧ f (x, y) ] .     (1.38)
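For a tiny instance, the solvability condition (1.33) and the solution formula (1.35) can be sketched with truth tables stored as dictionaries; the equation f(x1, y1) = 1 with f = x1 ⊙ y1 is an illustrative assumption with the unique solution y1 = x1.

```python
from itertools import product

# Truth tables are dictionaries from bit tuples to {0, 1}
# (an illustrative encoding, not XBOOLE's data structures).
def max_y(f, n_x):
    """Single-fold maximum over the last variable y."""
    return {x: f[x + (0,)] | f[x + (1,)]
            for x in product((0, 1), repeat=n_x)}

# f(x1, y1) = x1 XNOR y1; the equation f = 1 forces y1 = x1
f = {(x1, y1): 1 if x1 == y1 else 0
     for x1, y1 in product((0, 1), repeat=2)}

solvable = all(v == 1 for v in max_y(f, 1).values())   # condition (1.33)
f_and_y = {k: v & k[-1] for k, v in f.items()}         # y1 AND f(x, y)
y1_solution = max_y(f_and_y, 1)                        # formula (1.35)
```

The computed solution table maps x1 = 0 to 0 and x1 = 1 to 1, i.e., y1 = x1 as expected.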


1.2.3. Set-Related Problems

Many problems in Discrete Mathematics (DM) deal with finite sets. Let M = {m1 , . . . , mn } be a finite set of n elements and P(M) be the set of all subsets, the power set of M. Then each subset S ⊆ M can be defined by a binary vector (x1 , x2 , . . . , xn−1 , xn ), xi ∈ {0, 1}: if xi = 0 then mi ∉ S; if xi = 1 then mi ∈ S. On one side we have P(M) with the set operations (S̄, ∩, ∪); on the other side the set of all binary vectors B^n . It is not very difficult to transfer the set operations.

• The complement S̄ of a set S with regard to M is given by transforming each single bit using NOT: 0 ⇒ 1, 1 ⇒ 0.
• The intersection S1 ∩ S2 uses the logical AND (∧) component by component.
• The union S1 ∪ S2 uses the logical OR (∨) component by component.

So we can state that (P(M), S̄, ∩, ∪) and (B^n , x̄, ∧, ∨) are isomorphic structures. In addition to this Boolean Algebra, the isomorphic structures (P(M), S̄, ∩, Δ) and (B^n , x̄, ∧, ⊕) as well as (P(M), S̄, ∪, Δ̄) and (B^n , x̄, ∨, ⊙) are Boolean Rings. This use of binary vectors is related to the concept of characteristic functions; each binary vector is the characteristic function of the respective subset.
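The isomorphism between subsets of M and binary vectors can be sketched directly with machine integers, whose bitwise operators realize the component-by-component NOT, AND, and OR; the concrete subsets are illustrative.

```python
# Bit-vector sketch of the isomorphism between (P(M), complement, ∩, ∪)
# and (B^n, NOT, AND, OR) for n = 4.
n = 4
FULL = (1 << n) - 1          # the binary vector of M itself

S1 = 0b1010                  # a subset of M (one bit per element mi)
S2 = 0b0110                  # another subset

complement_S1 = FULL & ~S1   # NOT, component by component
intersection = S1 & S2       # AND  <->  set intersection
union = S1 | S2              # OR   <->  set union
sym_diff = S1 ^ S2           # XOR  <->  symmetric difference (Boolean ring)
```
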


Example 1.8 (Eight Queens on a Chessboard). The first step in solving a problem is its specification by an adequate model. We take as an example the modeling of a well-known combinatorial problem, the placement of eight queens on a chessboard, which is traditionally solved by searching. We transform it into the area of Boolean equations in the following way. We use 64 Boolean variables. The value 1 for the variable a1 indicates that a queen is associated to this field. Due to the rules of chess, a queen threatens all the fields of the same row, the same column, and the respective diagonals. Therefore the other variables a2, a3, . . . , a8 of the column (a), the variables b1, c1, . . . , h1 of the first row, as well as all variables of the respective diagonal b2, c3, . . . , h8 must be equal to 0. Hence the conjunction Ca1 expresses this condition:

Ca1 = a1 ∧ ā2 ā3 ā4 ā5 ā6 ā7 ā8 ∧ b̄1 c̄1 d̄1 ē1 f̄1 ḡ1 h̄1 ∧ b̄2 c̄3 d̄4 ē5 f̄6 ḡ7 h̄8 .     (1.39)

In the same way conjunctions for all other fields can be created. Using these 64 conjunctions all requirements and restrictions of the 8-queenproblem are included in the system of Boolean equations: Ca1 Cb1 Cc1 Cd1 Ce1 Cf 1 Cg1 Ch1

∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨

Ca2 Cb2 Cc2 Cd2 Ce2 Cf 2 Cg2 Ch2

∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨

Ca3 Cb3 Cc3 Cd3 Ce3 Cf 3 Cg3 Ch3

∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨

Ca4 Cb4 Cc4 Cd4 Ce4 Cf 4 Cg4 Ch4

∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨

Ca5 Cb5 Cc5 Cd5 Ce5 Cf 5 Cg5 Ch5

∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨

Ca6 Cb6 Cc6 Cd6 Ce6 Cf 6 Cg6 Ch6

∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨

Ca7 Cb7 Cc7 Cd7 Ce7 Cf 7 Cg7 Ch7

∨ ∨ ∨ ∨ ∨ ∨ ∨ ∨

Ca8 Cb8 Cc8 Cd8 Ce8 Cf 8 Cg8 Ch8

=1 =1 =1 =1 =1 =1 =1 =1.

(1.40)

The system of Boolean equations (1.40), where all conjunctions are completely specified, contains 8 ∗ 8 ∗ (1 + 3 ∗ 7) = 1408 literals of 64 variables. This system can be directly solved using the XBOOLE-Monitor [325]. Figure 1.14 shows one of the 92 solutions on the 8 × 8 board. Each function of the system of equations (1.40) has the structure of a disjunctive form [315]. It is less time-consuming to assign the values 1, 0, or − to the ordered components of one ternary vector instead of


Figure 1.14. Eight queens on a chess board which do not threaten each other (queens on e8, c7, a6, g5, b4, h3, f2, and d1).

Table 1.4. Ternary vectors to describe the assignment of a queen to one field and all restrictions for other fields of the chess board

a1  a2  a3  ...  b1  c1  ...  b2  c3  ...  b4  b5  ...
 1   0   0  ...   0   0  ...   0   0  ...   −   −  ...
 0   1   0  ...   0   −  ...   0   −  ...   −   −  ...

the names of the variables and all operation signs of one conjunction. As shown in Table 1.4, we use vectors of 64 components corresponding to the 64 fields of the board. The setting of a queen on the field a1 is indicated by the value 1 for this field, but the additional effect of this setting is also included in this vector: it is not allowed to set a queen on the other fields of the same column, the first row, and the respective diagonal; a − indicates that there is no restriction for the respective field. The eight vectors for the column (a) are combined by ∨ of these possibilities and describe the solution set Sa of the first equation of (1.40). In the same way we directly write down the solution sets Sb , . . . , Sh for the other columns of the board. Due to (1.27), the solution set S of the system of equations (1.40) must be calculated by the intersection:

S = Sa ∩ Sb ∩ Sc ∩ Sd ∩ Se ∩ Sf ∩ Sg ∩ Sh .     (1.41)
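The column-wise intersection (1.41) can be sketched in Python: each placement of a queen yields one ternary vector, represented here as a pair of sets of fields forced to 1 and to 0 (our own encoding, not XBOOLE's TVL layout), and the ISC-style intersection rejects contradictory pairs.

```python
# Ternary-vector sketch of the eight-queens intersection (1.41).

def attacked(c, r, n=8):
    """Fields threatened by a queen on column c, row r (its own excluded)."""
    cells = {(c, k) for k in range(n) if k != r}        # same column
    cells |= {(k, r) for k in range(n) if k != c}       # same row
    for dc in (-1, 1):
        for dr in (-1, 1):                              # four diagonals
            cc, rr = c + dc, r + dr
            while 0 <= cc < n and 0 <= rr < n:
                cells.add((cc, rr))
                cc, rr = cc + dc, rr + dr
    return cells

def isc(v, w):
    """ISC-style intersection; None if the vectors contradict each other."""
    if v[0] & w[1] or v[1] & w[0]:
        return None
    return (v[0] | w[0], v[1] | w[1])

solutions = [(set(), set())]
for c in range(8):                  # one TVL per board column, as in (1.41)
    column = [({(c, r)}, attacked(c, r)) for r in range(8)]
    solutions = [t for v in solutions for w in column if (t := isc(v, w))]
# len(solutions) counts the valid settings of the eight queens
```

The final list contains the 92 valid settings, in agreement with the text.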


These seven intersection operations can be executed by the operation ISC of XBOOLE. The result is again a TVL that contains the Boolean vectors of the 92 valid settings of the eight queens on the 8 × 8 board. As in the case of the eight queens' problem, very often there is a binary description just from the beginning. However, sometimes the variables of a problem have multiple-valued domains, and a careful modeling is required.

Example 1.9 (Sudoku). A multiple-valued combinatorial problem is Sudoku [248, 326]. It needs a binary modeling. Exactly one of the numbers 1, . . . , 9 must be assigned to each field of a 9 × 9 grid. Some of these numbers, the clue, are given (e.g., see Figure 1.15 (a)), and the remaining numbers must be found. The smaller the number of elements of the clue, the more difficult it is to find a solution. Each solution of a Sudoku must satisfy the following conditions:

1. each row must contain each of the nine numbers 1, . . . , 9 exactly once,
2. each column must contain each of the nine numbers 1, . . . , 9 exactly once,
3. each block of nine positions, surrounded by thick lines in Figure 1.15, must contain each of the nine numbers 1, . . . , 9 exactly once, and
4. each field of the 9 × 9 grid must contain exactly one of the nine numbers 1, . . . , 9.

Here we use another interpretation of a ternary component:

xijk = 1, if the value on the field (i, j) is equal to k ;
xijk = 0, if the value on the field (i, j) is not equal to k ;
xijk = −, if the value k on the field (i, j) is unknown .

A ternary vector can be transformed into a conjunction using the following mapping rules: if the component xijk = 0, then we include the negated variable x̄ijk into the conjunction; if the component xijk = 1, then we include xijk ; and for xijk = −, the variable xijk is omitted.

Figure 1.15. Sudoku: (a) clue: the given 17 values, (b) solved.

All constraints that are caused by the assignment of the value k to a field (i, j) can be stated by one single conjunction Cijk ; for k = 1 on the field (1, 1) we get:

C111 = x111 ∧ x̄112 x̄113 x̄114 x̄115 x̄116 x̄117 x̄118 x̄119 ∧ x̄121 x̄131 x̄141 x̄151 x̄161 x̄171 x̄181 x̄191 ∧ x̄211 x̄311 x̄411 x̄511 x̄611 x̄711 x̄811 x̄911 ∧ x̄221 x̄231 x̄321 x̄331 .

This conjunction completely describes the setting of the value 1 on the field (1, 1) and all its consequences. There are 729 such conjunctions. The requirement to assign one of the nine values to each field without any contradiction to the clue can be described by the following system of Boolean equations:

∧_{(i,j,k) ∈ clue} Cijk = 1
∧_{i=1}^{9} ∧_{j=1}^{9} ( ∨_{k=1}^{9} Cijk ) = 1 .     (1.42)

Solving this system of equations can be done using the XBOOLE-Monitor [325]. So we have a very general procedure for problems such as:

• sets of fault combinations,


Figure 1.16. Analysis regarding Hamiltonian Circuits: (a) graph without a Hamiltonian Circuit; (b) graph that contains a Hamiltonian Circuit; (c) Hamiltonian Circuit of the graph given in (b).

• representation of state sets and verification of sequential machines,
• computation of prime implicants,
• the analysis of arbitrary circuit structures, etc.

Search procedures are no longer necessary to find the final result; the solution of a Boolean equation delivers all solutions.

1.2.4. Graph-Related Problems

Many problems can be expressed by graphs: planning problems, accessibility, path problems, the coloring of edges and nodes in a graph, and many more. We only discuss the Birkhoff Diamond [249] and the passing of the bridges in Koenigsberg (solved by L. Euler), and explain the basic model for the problem of Hamiltonian Circuits. A Hamiltonian Circuit [320], also called a Hamiltonian Cycle, is a cycle in a graph which visits each node exactly once and returns to the starting node. It is well known that the problem of determining whether such paths or cycles exist is an NP-complete problem [364]. Figure 1.16 shows examples of graphs without and with a Hamiltonian


Table 1.5. Rules of Hamiltonian Circuits applied to the edge from node 3 to node 5 in the graph of Figure 1.16 (b)

rule                                                        example
An edge from node i to node j, i.e., xij = 1, prohibits     (x35 ⇒ x̄53 ) = 1
that the reverse edge is used, i.e., xji = 0.
An edge from node i to node j, i.e., xij = 1, prohibits     (x35 ⇒ x̄32 ) ∧ (x35 ⇒ x̄34 ) = 1
all edges to other destination nodes dl , i.e., xidl = 0.
An edge from node i to node j, i.e., xij = 1, prohibits     (x35 ⇒ x̄45 ) = 1
all edges from other source nodes sm , i.e., xsm j = 0.

Circuit. The edges can be expressed by Boolean variables:

xij = 1, if the graph contains an edge from node i to node j;
xij = 0, otherwise.

Using this encoding, the rules of the graph problem can be expressed by Boolean equations. Table 1.5 describes all rules which must be satisfied by a Hamiltonian Circuit and gives the Boolean equation for a selected edge in the graph of Figure 1.16 (b). Again, without any searching procedure, all solutions of the problem are found as the solution of the created system of Boolean equations. Alternatively, the nodes of the graph can be encoded by Boolean variables, and additional bits can express the colors of a coloring problem. An edge is expressed in this model by a conjunction of variables that encode both the source node and the destination node.

1.2.5. Rule-Based Problems

Rule-based problems are very important; at a given point in time they became a crucial issue in the field of Artificial Intelligence (AI). Many efforts have been made to build efficient SAT-solvers [41, 250]. Again, search procedures were a central point, such as breadth-first


search, depth-first search, heuristic search, etc. The rules can be easily transformed into a system of Boolean equations that must finally be solved. Any rule has the format if condition then conclusion, or in a logical way of speaking

x → y .

Each rule x → y can now be transformed into a disjunction:

x → y = x̄ ∨ y .

The validity of several rules can be ensured by using the conjunction. We are able to solve SAT-problems directly by means of ternary vectors as the basic data structure. We will use the following small example:

(x1 ∨ x̄2 ∨ x̄3 )(x2 ∨ x̄4 ∨ x̄5 )(x̄1 ∨ x4 ∨ x5 )(x2 ∨ x3 ∨ x̄5 ) = 1 .     (1.43)

This equation is equivalent to a system of four single equations:

x1 ∨ x̄2 ∨ x̄3 = 1     (1.44)
x2 ∨ x̄4 ∨ x̄5 = 1     (1.45)
x̄1 ∨ x4 ∨ x5 = 1     (1.46)
x2 ∨ x3 ∨ x̄5 = 1 .    (1.47)

The solution S1 of the first equation (1.44) can be expressed by the orthogonal TVL:

       x1  x2  x3  x4  x5
        1   −   −   −   −
S1 =    0   0   −   −   −   .
        0   1   0   −   −

This TVL shows all the vectors that satisfy Equation (1.44). If x1 = 1, then the values of the other variables are not important. If x1 = 0, then x̄2 must be equal to 1, i.e., x2 = 0. Finally, if x1 = 0 and x2 = 1, then x3 must be equal to 0. Double solutions cannot exist.


It is very characteristic that each vector of the TVL includes more information than the previous vectors. The number of vectors in the resulting TVL is equal to the number of variables in the disjunction. In Equation (1.43) each disjunction has three variables (an example of a 3-SAT-problem). If we generate the solution sets for the other three equations (1.45), (1.46), and (1.47), then we finally get four TVLs:

       x1 x2 x3 x4 x5             x1 x2 x3 x4 x5
        1  -  -  -  -              -  1  -  -  -
  S1 =  0  0  -  -  -       S2 =   -  0  -  0  -
        0  1  0  -  -              -  0  -  1  0

       x1 x2 x3 x4 x5             x1 x2 x3 x4 x5
        0  -  -  -  -              -  1  -  -  -
  S3 =  1  -  -  1  -       S4 =   -  0  1  -  -
        1  -  -  0  1              -  0  0  -  0

In order to get the final solution, these four TVLs have to be combined by intersections. Each line of one TVL has to be combined with each line of the next TVL; empty intersections can be omitted. For the first and the second TVL, we get S1 ∩ S2 as follows:

             x1 x2 x3 x4 x5
              1  1  -  -  -
              1  0  -  0  -
  S1 ∩ S2 =   1  0  -  1  0
              0  0  -  0  -
              0  0  -  1  0
              0  1  0  -  -

Now we calculate (S1 ∩ S2) ∩ S3:

                    x1 x2 x3 x4 x5
                     0  0  -  0  -
                     0  0  -  1  0
                     0  1  0  -  -
  (S1 ∩ S2) ∩ S3 =   1  1  -  1  -
                     1  0  -  1  0
                     1  1  -  0  1
                     1  0  -  0  1

and ((S1 ∩ S2) ∩ S3) ∩ S4 yields ten ternary vectors that can be condensed into the final result:

       x1 x2 x3 x4 x5
        0  0  -  -  0
        -  0  1  0  1
  S  =  1  1  -  -  1
        0  1  0  -  -
        1  -  -  1  0

This TVL represents all solutions of the original SAT-problem (1.43). Since the value - represents 0 as well as 1, the equation (1.43) has 18 solutions. The XBOOLE operation Orthogonal Block-Building and Change (OBBC) creates these five ternary vectors from the ten ternary vectors found as the result of the last intersection.
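The whole solution process can be sketched in a few lines of Python (our own illustration, not XBOOLE): a ternary vector is a string over '0', '1', and '-', and the intersection of two TVLs is computed component-wise:

```python
# Sketch (not XBOOLE): solving the 3-SAT example (1.43) by intersecting
# ternary-vector lists. '-' means "don't care".

def intersect(u, v):
    """Combine two ternary vectors; return None if they contradict."""
    out = []
    for a, b in zip(u, v):
        if a == '-':
            out.append(b)
        elif b == '-' or a == b:
            out.append(a)
        else:
            return None            # 0 meets 1: empty intersection
    return ''.join(out)

def intersect_tvls(S, T):
    return [w for u in S for v in T if (w := intersect(u, v)) is not None]

# Orthogonal solution TVLs of the single equations (1.44)-(1.47),
# over the variables (x1, x2, x3, x4, x5):
S1 = ['1----', '00---', '010--']
S2 = ['-1---', '-0-0-', '-0-10']
S3 = ['0----', '1--1-', '1--01']
S4 = ['-1---', '-01--', '-00-0']

S = S1
for T in (S2, S3, S4):
    S = intersect_tvls(S, T)

print(len(S))                                # 10 ternary vectors
print(sum(2 ** v.count('-') for v in S))     # 18 binary solutions
```

Because all intermediate TVLs are orthogonal, the number of binary solutions is obtained by summing 2 to the power of the number of dashes per vector.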

1.2.6. Combinatorial Design

In this area the binary description of problems is well known. It is necessary to transform the requirements of a special problem into the domain of binary equations. As a tiny example we use the circuit of Figure 1.17. All values of the circuit at a certain point in time are specified by the input variables x1, x2, x3, x4, the intermediate variables g1, g2, g3, and the output variables y1, y2. The global task to be solved is the analysis of the circuit. This requires the calculation of the behavior, which may be expressed by a target list of ternary vectors. The local behavior of each gate can be expressed by a simple Boolean equation:

  g1 = ¬x1 ,
  g2 = x1 ∧ x2 ,
  g3 = x3 ∨ x4 ,
  y1 = g1 ⊕ g2 ,
  y2 = y1 ∧ g3 .


Figure 1.17. Combinational circuit.

Using the orthogonal solutions of each single equation:

        x1 g1        x1 x2 g2        x3 x4 g3
  S1 =   0  1   S2 =  0  -  0   S3 =  0  0  0
         1  0         1  0  0         1  -  1
                      1  1  1         0  1  1

        g1 g2 y1        y1 g3 y2
         0  0  0   S5 =  0  -  0
  S4 =   0  1  1         1  0  0
         1  0  1         1  1  1
         1  1  0

the needed global behavior results from the intersection of these five TVLs:

                                 x1 x2 x3 x4 g1 g2 g3 y1 y2
                                  0  -  0  0  1  0  0  1  0
                                  0  -  1  0  1  0  1  1  1
                                  0  -  -  1  1  0  1  1  1
                                  1  0  0  0  0  0  0  0  0
  S = S1 ∩ S2 ∩ S3 ∩ S4 ∩ S5 =    1  0  1  0  0  0  1  0  0      (1.48)
                                  1  0  -  1  0  0  1  0  0
                                  1  1  0  0  0  1  0  1  0
                                  1  1  1  0  0  1  1  1  1
                                  1  1  -  1  0  1  1  1  1


The characteristic function S(x, g, y), which can be constructed as a disjunctive form from the solution set S (1.48), describes the global behavior of the circuit of Figure 1.17 as a white box. The intermediate variables g are omitted in a black-box description of this behavior by means of the m-fold maximum with respect to g:

  S(x, y) = max^m_g S(x, g, y) .   (1.49)

Generally, a system function S(x, y) describes the allowed behavior of a combinational circuit. The synthesis of the circuit structure requires the solution of the system equation S(x, y) = 1 with regard to the output variables y = (y1, ..., ym) in order to find the needed Boolean functions y1 = f1(x), ..., ym = fm(x). The steps to solve this synthesis task are explained in Subsection 1.2.2.
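Because the circuit has only four inputs, the behavior derived above can be cross-checked by plain enumeration. The sketch below (an illustration, not XBOOLE) simulates the five gate equations:

```python
# Sketch: brute-force analysis of the circuit of Figure 1.17, simulating
# the local gate equations for all 16 input vectors.
from itertools import product

table = []
for x1, x2, x3, x4 in product((0, 1), repeat=4):
    g1 = 1 - x1          # inverter
    g2 = x1 & x2         # AND gate
    g3 = x3 | x4         # OR gate
    y1 = g1 ^ g2         # XOR gate
    y2 = y1 & g3         # AND gate
    table.append((x1, x2, x3, x4, g1, g2, g3, y1, y2))

# the white box assigns exactly one internal/output vector per input
print(len(table))   # 16
print(table[0])     # (0, 0, 0, 0, 1, 0, 0, 1, 0)
```

Each of the 16 rows must be covered by one of the ternary vectors of the solution set S (1.48).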

1.2.7. Extremely Complex Boolean Problems

The exponential number 2^n of Boolean vectors of n variables makes it very hard to solve Boolean problems that depend on a very large number of variables. The solution of a problem that depends on 648 Boolean variables was one of the highlights of our research [313, 316, 322, 324, 328, 330]. The definition of this problem is rather short: "A two-dimensional grid is a set Gm,n = [m] × [n]. A grid Gm,n is c-colorable if there is a function χm,n : Gm,n → [c] such that there are no rectangles with all four corners of the same color." It is known from several theorems [110] that rectangle-free 4-colorable grids exist up to certain sizes of the grid. As an example we can state that the grids G22,4, G22,5, ..., G22,10 are rectangle-free 4-colorable, but the grids G22,11, G22,12, ... do not have this property. It was unknown whether there were rectangle-free 4-colorable grids of the sizes 17 × 17, 17 × 18, 18 × 17, and 18 × 18. At first glance this


problem seems to be easy. It is only necessary to fix a color for each of the 18 × 18 = 324 grid points and to check whether all possible rectangles avoid having the same color at their four corners. However, when the first check fails, we have to take the next color pattern into account, and so on, until the test is successful or all patterns have been tested. The grid G18,18 has 18 × 18 = 324 positions, and one of four colors must be selected for each of them. Hence, there is the gigantic number of 4^324 = 1.16798 · 10^195 different patterns altogether in which one of four colors has been assigned to each of the 324 positions of the grid. When we assume that we are able to evaluate one pattern in one nanosecond (10^-9 s) and to spend 100 years (about 3 · 10^9 s), then we must repeat the job 3.8 · 10^176 times in order to know whether there is a valid color pattern and, if yes, which one. So we see that we are going to solve an extremely complex problem.

The function f_ec(x_{ri,ck}, x_{ri,cl}, x_{rj,ck}, x_{rj,cl}) (1.50) depends on four four-valued variables x_{r,c} and has a Boolean result that is true in the case that the colors in the corners of the rectangle selected by the rows ri and rj and by the columns ck and cl are equal to each other:

  f_ec(x_{ri,ck}, x_{ri,cl}, x_{rj,ck}, x_{rj,cl}) =
      ((x_{ri,ck} ≡ 1) ∧ (x_{ri,cl} ≡ 1) ∧ (x_{rj,ck} ≡ 1) ∧ (x_{rj,cl} ≡ 1)) ∨
      ((x_{ri,ck} ≡ 2) ∧ (x_{ri,cl} ≡ 2) ∧ (x_{rj,ck} ≡ 2) ∧ (x_{rj,cl} ≡ 2)) ∨
      ((x_{ri,ck} ≡ 3) ∧ (x_{ri,cl} ≡ 3) ∧ (x_{rj,ck} ≡ 3) ∧ (x_{rj,cl} ≡ 3)) ∨
      ((x_{ri,ck} ≡ 4) ∧ (x_{ri,cl} ≡ 4) ∧ (x_{rj,ck} ≡ 4) ∧ (x_{rj,cl} ≡ 4)) .   (1.50)

The condition that the four corners of the rectangle selected by the rows ri and rj and by the columns ck and cl do not show only one of the colors 1, 2, 3, 4 can be expressed as

  f_ec(x_{ri,ck}, x_{ri,cl}, x_{rj,ck}, x_{rj,cl}) = 0 .   (1.51)

For the whole grid Gm,n, we get:

  ⋁_{i=1}^{m-1} ⋁_{j=i+1}^{m} ⋁_{k=1}^{n-1} ⋁_{l=k+1}^{n} f_ec(x_{ri,ck}, x_{ri,cl}, x_{rj,ck}, x_{rj,cl}) = 0 .   (1.52)


Table 1.6. Mapping of 4-valued colors x to two Boolean variables a and b

  x   a   b
  1   0   0
  2   1   0
  3   0   1
  4   1   1

This model uses 324 variables, and each variable can take 4 values. Because of the intention to apply our previous research in the area of Boolean equations, we transformed the 4-valued model into a Boolean (binary) model. This can be done by coding the four-valued variable x using two Boolean variables a and b. Table 1.6 shows the mapping that has been used. Function (1.53) depends on eight Boolean variables and has a Boolean result equal to 1 if the colors in the corners of the rectangle selected by the rows ri and rj and by the columns ck and cl are identical:

  f_ecb(a_{ri,ck}, b_{ri,ck}, a_{ri,cl}, b_{ri,cl}, a_{rj,ck}, b_{rj,ck}, a_{rj,cl}, b_{rj,cl}) =
      (¬a_{ri,ck} ∧ ¬b_{ri,ck} ∧ ¬a_{ri,cl} ∧ ¬b_{ri,cl} ∧ ¬a_{rj,ck} ∧ ¬b_{rj,ck} ∧ ¬a_{rj,cl} ∧ ¬b_{rj,cl}) ∨
      ( a_{ri,ck} ∧ ¬b_{ri,ck} ∧  a_{ri,cl} ∧ ¬b_{ri,cl} ∧  a_{rj,ck} ∧ ¬b_{rj,ck} ∧  a_{rj,cl} ∧ ¬b_{rj,cl}) ∨
      (¬a_{ri,ck} ∧  b_{ri,ck} ∧ ¬a_{ri,cl} ∧  b_{ri,cl} ∧ ¬a_{rj,ck} ∧  b_{rj,ck} ∧ ¬a_{rj,cl} ∧  b_{rj,cl}) ∨
      ( a_{ri,ck} ∧  b_{ri,ck} ∧  a_{ri,cl} ∧  b_{ri,cl} ∧  a_{rj,ck} ∧  b_{rj,ck} ∧  a_{rj,cl} ∧  b_{rj,cl}) .   (1.53)

The four disjuncts correspond to the four color codes of Table 1.6. The conditions of the rectangle-free 4-color problem on a grid Gm,n are met when the function f_ecb (1.53) is equal to 0 for all rectangles, which can be expressed by

  ⋁_{i=1}^{m-1} ⋁_{j=i+1}^{m} ⋁_{k=1}^{n-1} ⋁_{l=k+1}^{n} f_ecb(a_{ri,ck}, b_{ri,ck}, a_{ri,cl}, b_{ri,cl}, a_{rj,ck}, b_{rj,ck}, a_{rj,cl}, b_{rj,cl}) = 0 .   (1.54)

It is not possible to represent here all the steps we tried and all the difficulties that we met. We only mention some critical observations of the sequence of our approaches.


The equation (1.54) can easily be transformed into a SAT-equation:

  ⋀_{i=1}^{m-1} ⋀_{j=i+1}^{m} ⋀_{k=1}^{n-1} ⋀_{l=k+1}^{n} ¬f_ecb(a_{ri,ck}, b_{ri,ck}, a_{ri,cl}, b_{ri,cl}, a_{rj,ck}, b_{rj,ck}, a_{rj,cl}, b_{rj,cl}) = 1 .   (1.55)

No SAT-solver was able to solve this SAT-problem of 648 variables and 93,636 clauses, even after running for several months. We applied the principle Divide and Conquer and tried to find a solution where tokens of a single color are associated with at least one fourth of the grid positions. The function f_ecb (1.53) can be simplified to f_ecb1 (1.56) for a single color:

  f_ecb1(a_{ri,ck}, a_{ri,cl}, a_{rj,ck}, a_{rj,cl}) = a_{ri,ck} ∧ a_{ri,cl} ∧ a_{rj,ck} ∧ a_{rj,cl} .   (1.56)

Transformed into a SAT-problem, we get:

  ⋀_{i=1}^{m-1} ⋀_{j=i+1}^{m} ⋀_{k=1}^{n-1} ⋀_{l=k+1}^{n} ¬f_ecb1(a_{ri,ck}, a_{ri,cl}, a_{rj,ck}, a_{rj,cl}) = 1 .   (1.57)

The SAT-solver clasp [128] found the first solution of this SAT-equation of 324 Boolean variables and 23,409 clauses immediately. However, the value 0 is assigned to all variables in this solution. Of course, this solution satisfies the condition that the selected color does not appear in all corners of any rectangle. However, we are not interested in this trivial solution; we are looking for a solution where the chosen color covers one fourth of the grid positions. The calculation of all solutions of the SAT-problem (1.57) and the subsequent selection of solutions with a maximal number of values 1 fails again because of time constraints. Hence, the SAT-solver is not able to solve this problem directly. In this situation an extended system of Boolean equations allowed us to finally solve the problem. Due to the four different colors, we divided the grid of 324 fields into 324/4 = 81 regions of four fields which are cyclically located around the center of the grid. The Boolean variables for one of these regions may be r1, r2, r3, and r4. Each color is supposed to appear exactly once in such a region, which can be expressed


1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0

0 1 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 1

1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0

0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 1

0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0

0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 1

1 0 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0

0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0

1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1

1 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0

0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0

0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0

0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0

0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0

0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 0

0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0

0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 0

Figure 1.18. Cyclically reusable, rectangle-free coloring of the grid G18,18 .

by a requirement and a constraint. For one color, at least one of the variables ri, i = 1, ..., 4, must be equal to 1:

  r1 ∨ r2 ∨ r3 ∨ r4 = 1 .   (1.58)

The constraint for one color is that the conjunction of no pair of two variables ri is equal to 1:

  r1 r2 ∨ r1 r3 ∨ r1 r4 ∨ r2 r3 ∨ r2 r4 ∨ r3 r4 = 0 .   (1.59)

The equations (1.58) and (1.59) can be transformed into a single SAT-equation of 7 clauses. The more specific SAT-problem that prohibits any rectangle and fills each of the 81 regions with exactly one value 1 merges the equations (1.57), (1.58), and (1.59); it contains 324 variables in 23,796 clauses. Using the SAT-solver clasp [128] to solve this more restricted SAT-problem, we could find one solution for a single color. This solution could be used three more times for the other colors after the configuration had been rotated by 90 degrees [322, 324]. Figure 1.18 shows one of these extremely rare cyclically reusable, rectangle-free colorings of the grid G18,18.
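The rectangle-free condition for a single color, as used in (1.57), can be checked cheaply when each grid row is stored as a bitmask of the positions carrying that color. The grids below are small made-up examples, not the solution of Figure 1.18:

```python
# Sketch: checking the rectangle-free condition for one color. Each row
# of the grid is a bitmask of the columns that carry the chosen color.

def rectangle_free(rows):
    """False iff two rows share the color in two or more columns."""
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            common = rows[i] & rows[j]
            if common & (common - 1):   # two or more shared 1-bits
                return False
    return True

print(rectangle_free([0b1100, 0b1010, 0b0110]))   # True
print(rectangle_free([0b1100, 0b1100, 0b0001]))   # False: rows 1 and 2
                                                  # repeat two columns
```

Any two rows that share two columns of the same color form a forbidden rectangle, so the pairwise test is equivalent to condition (1.51) restricted to a single color.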


1.2.8. Summary and Comments

• A matrix containing the ternary elements (0, 1, -) was suggested and named by Arkadij Zakrevskij as "troichnaya matritsa" (Russian for ternary matrix) [381]. Such a TVL is a very helpful data structure in the Boolean domain.

• Boolean equations are a powerful instrument for many problems of Discrete Mathematics, not only for digital systems, but for many other problems as well.

• A Boolean inequality or a system of Boolean equations can be mapped to a single Boolean equation having the same solution.

• A Boolean equation is an instrument to map a Boolean function into a set of Boolean vectors and vice versa.

• It is sufficient to create correct models; the appropriate software is available. Both the domain-specific software XBOOLE [247, 314, 325] and the more specialized SAT-solvers [41, 128] can be used.

• The complexity of the solved problems is already extremely high. The simple hint that a problem has exponential complexity is no longer a sufficient reason to leave the problem without the intention of solving it.

• There is a strong relationship between Mathematics and Engineering in solving these problems in a cooperative way. Most likely a course of one semester is not sufficient; two semesters would be better and obviously necessary.

• Both very good theoretical knowledge and programming skills are required in each engineering field.


1.3. Simplification of Extremely Large Expressions

Ben Ruijl, Jos Vermaseren, Aske Plaat, Jaap van den Herik

1.3.1. An Application in High Energy Physics

High Energy Physics is a field that pushes the boundaries of possibilities, mathematically, technologically, and computationally. In String Theory new mathematics has been developed to describe the fundamentals of the universe [15]. At the European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire, CERN) and at other institutes such as Fermilab, high-accuracy particle detectors measure newly created particles from scattering processes [17]. And computationally, results from calculations of the Standard Model of particle physics and next-generation theories, such as supersymmetry, are being compared to observational data to further our knowledge about the fundamental structure of nature [107]. Computational problems arise due to the fact that the expressions derived from the mentioned models contain hard integrals and that their size grows exponentially as more precision is required. As a result, for some processes the detectors at CERN are able to measure a higher precision than the results obtained from theory. The project High Energy Physics Game (HEPGAME) aims to improve the current level of computational accuracy by applying methods from artificial intelligence to solve difficult problems [354]. Our work is based on FORM [182, 183]. FORM is a computer algebra system for highly efficient manipulations of very large expressions. It contains special routines for tensors, gamma matrices, and other physics-related objects. Our results will be part of the next release of FORM. One of the current problems is the simplification of expressions. From


Quantum Field Theory, expressions with millions of terms, taking up gigabytes of memory in intermediate form, are not uncommon. These expressions have to be numerically integrated in order to match the theoretical results with the observations performed. For these large expressions, the integration process could take months. If we are able to reduce the number of operations, we are able to make high-accuracy predictions that were previously computationally unfeasible. In this section we give an overview of the recent results on simplifying expressions. First, we describe two methods which we use to reduce the number of operations, namely Horner's rule for multivariate polynomials and common subexpression elimination. Horner's rule extracts variables outside brackets [143]. For multivariate expressions the order of these variables is called a Horner scheme. Next, we remove the common subexpressions [175]. The problem of finding the order that yields the least number of operations is Non-deterministic Polynomial-time hard (NP-hard) [63]. We will investigate three methods of finding a near-optimal Horner scheme. The first method is Monte Carlo Tree Search (MCTS) using Upper Confidence bounds applied to Trees (UCT) as the best-child criterion. We obtain simplifications that are up to 24 times smaller for our benchmark polynomials [184]. However, UCT is not straightforward, as it introduces an exploration-exploitation constant Cp that must be tuned. Furthermore, it does little exploration at the bottom of the tree. To address both issues, the second method investigated is Nested Monte Carlo Search (NMCS). NMCS does not have the two challenges mentioned. However, our evaluation function turns out to be quite expensive (6 seconds for one of our benchmark polynomials). So, NMCS performs (too) many evaluations to find a path in the tree, rendering it unsuitable for our simplification task.
Third, we make a modification to UCT, which we call Simulated Annealing UCT (SA-UCT). SA-UCT introduces a dynamic exploration-exploitation parameter T(i) that decreases linearly with the iteration number i. SA-UCT causes a gradual shift from exploration at the start of the simulation to exploitation at the end. As a consequence, the final iterations will be used for exploitation, improving their solution


quality. Additionally, more branches reach the final states, resulting in more exploration at the bottom of the tree. Moreover, we show that the tuning of Cp has become easier, since the region with appropriate values for Cp has increased at least tenfold [267]. The main contribution of SA-UCT is that this simplification of tuning allows for the results of our MCTS approach to be obtained much faster. In turn, our process is able to reduce the computation times of numerical integration from weeks to days or even hours. This section is structured as follows. Subsection 1.3.2 shows some related work, Subsection 1.3.3 provides a background on the optimization methods, Subsection 1.3.4 introduces MCTS, Subsection 1.3.5 discusses NMCS, Subsection 1.3.6 describes SA-UCT, Subsection 1.3.7 provides our conclusion, and Subsection 1.3.8 contains a discussion and an outlook for the project HEPGAME.

1.3.2. The History of Simplification

Computer algebra systems, expression simplification, and Boolean problems are closely related. General-purpose packages such as Mathematica and Maple have evolved out of early systems created by physicists and artificial-intelligence researchers in the 1960s. A prime example of the first development is the work of the later Nobel Prize laureate Martinus Veltman, who designed a program for symbolic mathematics, especially for High Energy Physics, called Schoonschip (Dutch for clean ship, or clean up) in 1963. In the course of the 1960s Gerard 't Hooft supported Veltman, with whom he shared the Nobel Prize. The first popular computer algebra systems were muMATH, Reduce, Derive (based on muMATH), and Macsyma. It is interesting to note that FORM [183], the system that we use in our work, is a direct successor of Schoonschip. The Boolean Satisfiability problem, SAT-problem for short, is a central problem in symbolic logic and computational complexity. Ever since Cook's seminal work [72], finding efficient solvers for SAT-problems has driven much progress in computational logic and combinatorial optimization. MCTS has been quite successful in adversary searches and optimization [54]. In the current work, we discuss the application


of MCTS to expression simplification. Curiously, we are not aware of many other works, with the notable exception of Previti et al. [253]. Expression simplification is a widely studied problem. We have already mentioned Horner schemes [175] and Common Sub-Expression Elimination (CSEE) [5], but there are several other methods, such as partial syntactic factorization [194] and Breuer's growth algorithm [52]. Horner schemes and CSEE do not require much algebra: only the commutativity, associativity, and distributivity of the operators are used. Boolean expressions in all four normal forms have these properties too. Hence, the methods explored to simplify expressions can easily be mapped to Boolean expressions. Much research is put into simplifications using more algebraic properties, such as factorization, especially because of its interest to cryptographic research. In Subsection 1.3.6 we will introduce modifications to UCT, in order to make the importance of exploration versus exploitation iteration-number dependent. In the past, related changes have been proposed. For example, Discounted Upper Confidence Bounds (UCB) [177] and Accelerated UCT [137] both modify the average score of a node to discount old wins over new ones. The difference between our method and past work is that the previous modifications alter the importance of exploring based on the history and do not guarantee that the focus shifts from exploration to exploitation. In contrast, this work focuses on the exploration-exploitation constant Cp and on the role of exploration during the simulation.

1.3.3. Methods of Simplification

Horner Schemes

Horner's rule reduces the number of multiplications in an expression by lifting variables outside brackets [63, 143, 175]. For multivariate expressions Horner's rule can be applied sequentially, once for each variable. The order of this sequence is called the Horner scheme. Take for example

  x^2 z + x^3 y + x^3 yz → x^2 (z + x(y(1 + z))) .   (1.60)


Here, first the variable x is extracted (i.e., x^2 and x) and second, y. The number of multiplications is now reduced from 9 to 4. However, the order x, y was chosen arbitrarily. One could also try the order y, x:

  x^2 z + x^3 y + x^3 yz → x^2 z + y(x^3 (1 + z)) ,   (1.61)

for which the number of multiplications is 6. Evidently, this is a suboptimal Horner scheme. There are n! orders of extracting variables, where n is the number of variables, and it turns out that the problem of selecting an optimal ordering is NP-hard [63]. A heuristic that works reasonably well is selecting variables according to how frequently a term with such a variable occurs: occurrence order. A counter-example that shows that this is not always optimal is

  x^50 y + x^40 + y + yz ,   (1.62)

where extracting the most occurring variable y first causes the x^50 and x^40 to end up in different subparts of the polynomial, preventing their common terms from being extracted. We note that ordering the variables according to their highest power or to the sum of their powers in all the terms leads to other counter-examples. Since the only property that Horner's rule requires is that the operators are distributive, it follows that this method could also be used to reduce the number of operations in Boolean expressions with logical operators. However, Horner's rule is most effective when the number of terms is orders of magnitude greater than the number of variables. For expressions with Boolean operations, such a difference quickly leads to redundant subexpressions. For these cases, utilization of the absorption rule of Boolean Algebra may decrease the number of operations more than Horner schemes.
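The effect of an extraction order can be counted mechanically. The following sketch is our own illustration (not part of FORM); it counts the multiplications left by a given Horner scheme, ignoring additions and common subexpressions:

```python
# Sketch: counting multiplications for a multivariate Horner scheme.
# A monomial is a dict {variable: power}; all coefficients are 1, as in
# example (1.60).

def horner_mults(terms, order):
    """Multiplications left after extracting the variables in `order`."""
    if not order:
        # remaining sum of monomials, evaluated naively
        return sum(max(0, sum(t.values()) - 1) for t in terms)
    v, rest = order[0], order[1:]
    with_v = [t for t in terms if t.get(v, 0) > 0]
    without = [t for t in terms if t.get(v, 0) == 0]
    if not with_v:
        return horner_mults(terms, rest)
    m = min(t[v] for t in with_v)          # factor out v^m
    reduced = []
    for t in with_v:
        t2 = {k: p for k, p in t.items() if k != v}
        if t[v] > m:
            t2[v] = t[v] - m
        reduced.append(t2)
    # m multiplications for v^m times the bracket, plus both subproblems
    return m + horner_mults(reduced, order) + horner_mults(without, rest)

# x^2*z + x^3*y + x^3*y*z from example (1.60):
p = [{'x': 2, 'z': 1}, {'x': 3, 'y': 1}, {'x': 3, 'y': 1, 'z': 1}]
print(horner_mults(p, ()))          # 9, the unfactored form
print(horner_mults(p, ('x', 'y')))  # 4, as in (1.60)
print(horner_mults(p, ('y', 'x')))  # 6, as in (1.61)
```

It reproduces the numbers discussed above: 9 multiplications without extraction, 4 for the order x, y, and 6 for the order y, x.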

Common Sub-Expression Elimination

The number of operations can be reduced further by applying Common Sub-Expression Elimination (CSEE). This method is well



Figure 1.19. A common subexpression (shaded) in an associative and commutative tree representation.

known from the fields of compiler construction [5] and computer chess [3], where it is applied to much smaller expressions or subtrees than in High Energy Physics. Figure 1.19 shows an example of a common subexpression in a tree representation of an expression. The shaded expression b × (a + e) appears twice, and its removal means removing one superfluous addition and one multiplication. CSEE is able to reduce both the number of multiplications and the number of additions, whereas Horner schemes are only able to reduce the number of multiplications. We note that there is an interplay between Horner’s rule and CSEE: a particularly good Horner scheme may reduce the number of multiplications the most, but the resulting scheme may expose less common subexpressions than a mediocre Horner scheme. Thus, we need a way to find a Horner scheme that reduces the number of operations the most after both Horner and CSEE have been applied. To achieve this situation, we investigate Monte Carlo Tree Search.
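A minimal way to detect repeated subtrees such as the shaded one in Figure 1.19 is structural hashing of canonical forms. The sketch below is our own illustration; the enclosing expression is invented, since only the shared part b × (a + e) is fixed by the figure:

```python
# Sketch: common-subexpression detection by hashing canonical subtrees.
# Expressions are nested tuples like ('+', a, b) and ('*', a, b); the
# operands of the commutative operators are sorted, so that b*(a+e) and
# (e+a)*b canonicalize identically.

def canon(expr):
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = tuple(sorted((canon(a) for a in args), key=repr))
    return (op,) + args

def count_ops_with_cse(expr):
    """Number of +/* operations when each repeated subtree is computed once."""
    seen = set()
    def walk(e):
        if not isinstance(e, tuple) or e in seen:
            return 0
        seen.add(e)
        op, *args = e
        return (len(args) - 1) + sum(walk(a) for a in args)
    return walk(canon(expr))

shared = ('*', 'b', ('+', 'a', 'e'))        # b*(a+e), as in Figure 1.19
expr = ('+', shared, ('*', 'c', shared))    # hypothetical enclosing expression
print(count_ops_with_cse(expr))             # 4 instead of the naive 6
```

The second occurrence of the shared subtree contributes no operations, which removes one addition and one multiplication, exactly as described above.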

1.3.4. Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) is a tree search method that has been successful in games such as Go, Hex, and other applications with a large state space [14, 192]. It works by selectively building a tree, expanding only branches it deems worthwhile to explore. MCTS consists of four steps, which are displayed in Figure 1.20 [64]. The


Figure 1.20. An overview of the four phases of MCTS: (a) selection, (b) expansion, (c) simulation, and (d) backpropagation.

first step in Figure 1.20 (a) is the selection step, where a leaf or a not fully expanded node is selected according to some criterion (see below). Our choice is node z. In the expansion step in Figure 1.20 (b), a randomly unexplored child of the selected node is added to the tree (node y). In the simulation step in Figure 1.20 (c), the rest of the path to a final node is completed using a random child selection. Finally a score Δ is obtained that signifies the score of the chosen path through the state space. In the back-propagation step in Figure 1.20 (d), this value is propagated back through the tree, which affects the average score (win rate) of a node (see below). The tree is built iteratively by repeating the four steps. In the game of Go, each node represents a player move and in the expansion phase the game is played out, in basic implementations, by random moves. In the best performing implementations heuristics and


pattern knowledge are used to complement a random playout [192]. The final score is 1 if the game is won, and 0 if the game is lost. The entire tree is built, ultimately, to select the best first move. For our purposes, we need to build a complete Horner scheme, variable by variable. As such, each node represents a variable, and the depth of a node in the tree represents the position in the Horner scheme. Thus, in Figure 1.20 (c) the partial Horner scheme is x, z, y, and the rest of the scheme is filled in randomly with unused variables. The score of a path in our case is the improvement of the path on the number of operations: the original number of operations divided by the number of operations after the Horner scheme and CSEE have been applied. We note that for our purposes the entire Horner scheme is important and not just the first variable. In many MCTS implementations UCT (1.63) is chosen as the selection criterion [54, 176]:

  argmax_{children c of s} ( x̄(c) + 2 Cp √( 2 ln n(s) / n(c) ) ) ,   (1.63)

where c is a child node of node s, x̄(c) the average score of node c, n(c) the number of times the node c has been visited, Cp the exploration-exploitation constant, and argmax the function that selects the child with the maximum value. This formula balances exploitation, i.e., picking terms with a high average score, and exploration, i.e., selecting nodes where the child has not been visited often compared to the parent. The constant Cp determines how strong the exploration term is: for high Cp the focus will be on exploration, and for low Cp the focus will be on exploitation.
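A direct transcription of the selection rule (1.63) may look as follows; the Node type and its fields are hypothetical, not part of any particular MCTS library:

```python
# Sketch of the UCT selection rule (1.63); Node is illustration-only.
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    x_bar: float   # average score of the node
    n: int         # visit count of the node

def uct_select(children, n_parent, cp):
    """Return the child maximizing x_bar(c) + 2*Cp*sqrt(2 ln n(s) / n(c))."""
    return max(
        children,
        key=lambda c: c.x_bar + 2 * cp * math.sqrt(2 * math.log(n_parent) / c.n),
    )

kids = [Node(x_bar=0.5, n=10), Node(x_bar=0.4, n=1)]
print(uct_select(kids, 11, 0.0))   # pure exploitation: best average wins
print(uct_select(kids, 11, 1.0))   # exploration: rarely visited child wins
```

With Cp = 0 the rule degenerates to pure exploitation; for a large Cp the rarely visited child dominates, which illustrates the trade-off described above.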

Sensitivity Analysis

Below, we will investigate the effect of

1. different values of Cp and
2. the number of iterations N

on the number of operations by performing a sensitivity analysis.


A polynomial from High Energy Physics (HEP) called HEP(σ) is taken as our benchmark [184]. In Figure 1.21 the result for N = 300 is displayed on the left and that for N = 3000 on the right. On the x-axis we indicate the Cp values ranging from 0.01 to 10; on the y-axis we indicate the number of operations, ranging from 4000 to 6000 (lower is better). We see that for low Cp there are many local minima, as evidenced by the clustering in horizontal lines (thick clusters and sparse clusters). If Cp is high, there is much exploration, and this translates to a diffuse region where there is no convergence to a local minimum. When N is sufficiently high (e.g., N = 3000), we see an intermediate region emerging where the probability of finding the global minimum is high (only one thick line). This region is demarcated by dashed lines on the right-hand side of Figure 1.21.

Figure 1.21. A sensitivity analysis for the expression HEP(σ) of the exploration-exploitation constant Cp horizontally, on the number of operations (lower is better) vertically.

From Figure 1.21 we see that if Cp is in the low or intermediate region, the density of dots is highest near the global minimum. This means that if we select the best results from several simulations, the probability of finding the global minimum is high. To study these effects, we have made scatter plots of the number of operations and Cp , given a number of repetitions R and a number of iterations N , while keeping the total number of iterations R × N constant [352]. The best value of these R × N runs is selected and is shown in Figure 1.22, for HEP(σ). On the top left, we have 30 runs with 100 expansions


(30 × 100), on the top right 18 runs of 167 expansions (18 × 167), on the bottom left 10 runs of 300 expansions (10 × 300) and on the bottom right 3 runs of 1000 expansions (3 × 1000). Each graph contains 4000 measurements (dots) with 3000 iterations in total for each measurement. Thus, this graph is comparable in CPU time to MCTS with 3000 iterations (1 × 3000), displayed in Figure 1.21 on the right. We notice that for all but the bottom right graph, the local minima have almost disappeared for low Cp , so that only a cluster around the global minimum remains. For the bottom right graph the local minima are very sparse. The top right graph and the bottom left graph have the best characteristics: from Cp = 0.01 to Cp = 0.3 the probability of finding the local minimum is higher than the MCTS run with 3000 iterations. The main obstacle is that we do not know in advance how many repetitions or which number of iterations we should use, without computing the scatter plots.

Figure 1.22. A sensitivity analysis for HEP(σ) of the exploration-exploitation constant Cp , and R × N on the number of operations (lower is better).


Table 1.7. MCTS with 1,000 iterations and 10,000 iterations compared to occurrence order and the original number of operations

  polynomial   # vars    original   occurrence   MCTS 1k                MCTS 10k
  res(7,4)         13      29,163        4,968   (3.86 ± 0.10) · 10^3   (3.84 ± 0.01) · 10^3
  res(7,5)         14     142,711       20,210   (1.39 ± 0.01) · 10^4   (1.38 ± 0.03) · 10^4
  res(7,6)         15     587,880       71,262   (4.58 ± 0.05) · 10^4   (4.54 ± 0.01) · 10^4
  HEP(σ)           15      47,424        6,744   (4.11 ± 0.01) · 10^3   (4.09 ± 0.01) · 10^3
  HEP(F13)         24   1,068,153       92,617   (6.60 ± 0.20) · 10^4   (6.47 ± 0.08) · 10^4
  HEP(F24)         31   7,722,027      401,530   (3.80 ± 0.06) · 10^5   (3.19 ± 0.04) · 10^5

Table 1.7 shows the results (from [184]) for MCTS runs on several polynomials. The results are statistical averages. The res(m, n) polynomials are resolvents and are defined by

    res(m, n) = res_x( Σ_{i=0}^{m} a_i x^i , Σ_{i=0}^{n} b_i x^i ) ,

as described in [194]. The HEP polynomials stem from theoretical predictions of scattering processes in High Energy Physics. The Cp used in the above results has been manually tuned to the region where good values are obtained. Additionally, the construction direction of the scheme was selected appropriately (see Subsection 1.3.4). We see that MCTS with 10,000 iterations reduces the number of operations by a factor of 24 compared to the original and by 25% compared to the occurrence order scheme for HEP(F24). For the resolvent res(7,6) the reduction factor is 13 compared to the original and 36% compared to the occurrence order scheme. In practice, numerical integration of these expressions can take weeks; the simplifications are therefore able to reduce the computation time from weeks to days.

Unresolved Issues

There are two issues with the current form of the MCTS algorithm. First, the Cp parameter must be tuned. Sometimes the region of Cp that yields good values is small, so it may be computationally expensive to find an appropriate Cp. Second, trees are naturally asymmetric: there is more exploration at nodes close to the root than at nodes deeper in the tree. Moreover, only a few branches are fully expanded to the bottom. Consequently, the final variables in the scheme are filled in with the variables of a random playout. No optimization is done at the end of these branches. As a result, if a very specific order of moves at the end of the tree yields optimal results, it will not be found by MCTS. The issue can be partially mitigated by adding a new parameter that specifies whether the Horner scheme should be constructed in reverse, so that the variables selected near the root of the tree are actually the last to be extracted [184, 267].

Figure 1.23. res(7,5): differences between forward (left) and backward (right) Horner schemes with SA-UCT and N = 1000.

Figure 1.23 shows the difference between a forward and a backward MCTS search with 1000 updates for the polynomial res(7,5) in a scatter plot. For the forward construction, we see that there is a region in Cp where the results are good: the optimum is often found. The backward scheme, however, does not have a similar range. For other polynomials it may be better to use the backward scheme, as is the case for HEP(σ) and HEP(F13). Currently, there is no known way to predict whether the forward or the backward construction should be used. Thus, this introduces an extra parameter to our algorithm. Even though the scheme direction parameter reduces the problem somewhat, the underlying problem remains that there is little exploration at the end of the tree. To overcome the issues of tuning Cp and the lack of exploration, we have looked at a related tree search method called the Nested Monte Carlo Search.

1.3.5. Nested Monte Carlo Search

The Nested Monte Carlo Search (NMCS) addresses the issue of the exploration bias towards the top of the tree by sampling all children at every level of the tree [62]. In its simplest form, called a level-1 search, a random playout is performed for each child of a node. Next, the child with the best score is selected, and the process is repeated until an end state is reached. This method can be generalized to a level-k search, where the above process is nested: a level-k search chooses the best node from a level-(k − 1) search performed on its children. Thus, if the NMCS level is increased, the top of the tree is optimized in greater detail. Even though NMCS makes use of random playouts, it does so at every depth of the tree as the search progresses. Consequently, there is always exploration near the end of the tree.

Figure 1.24 shows the results for level-2 NMCS for HEP(σ). The average number of operations is 4,189 ± 43. To compare the performance of NMCS to that of MCTS, we study the run-time. Since most of the run-time is spent on the evaluation function, we can compare the number of evaluations instead. A level-2 search for HEP(σ) takes 8,500 evaluations. To be on a par with MCTS, the score should lie between those of MCTS with 1,000 and with 10,000 iterations. However, the score is higher than that of MCTS with 1,000 iterations, and thus we may conclude that the performance of NMCS is inferior to MCTS for HEP(σ). We have performed similar experiments with NMCS on other polynomials, but the resulting average number of operations was always greater than that of MCTS. The likely reason is that we select a low level k: a level-1 search selects the best child using one sample per child, a process that is highly influenced by chance. However, there are some performance issues with using a higher k. To analyze these, we investigate the number of evaluations that a level-k search requires.
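The level-k recursion described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: the score function is a placeholder standing in for the expensive operation-count evaluation, and all names are our own.

```python
import random

def nmcs(state, remaining, level, score):
    """Level-k Nested Monte Carlo Search over variable orderings.

    state: variables chosen so far; remaining: variables still to place.
    score: evaluation of a complete ordering (lower is better).
    Returns (best_score, best_ordering).
    """
    if not remaining:
        return score(state), state
    if level == 0:
        # Random playout: complete the ordering randomly.
        tail = list(remaining)
        random.shuffle(tail)
        full = state + tail
        return score(full), full
    # Sample every child with a level-(k-1) search, keep the best one.
    best = (float("inf"), None)
    for i, var in enumerate(remaining):
        child = nmcs(state + [var], remaining[:i] + remaining[i + 1:],
                     level - 1, score)
        if child[0] < best[0]:
            best = child
    # Commit to the first move of the best continuation, then repeat.
    chosen = best[1][len(state)]
    idx = remaining.index(chosen)
    return nmcs(state + [chosen], remaining[:idx] + remaining[idx + 1:],
                level, score)
```

Note that the nesting is expressed by the recursive call with `level - 1` on every child, which is exactly the sampling of all children at every depth described above.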


Figure 1.24. NMCS level-2 for HEP(σ), taking 8500 evaluations.

The form of our tree is known, since every variable should appear only once. This means that there are n children at the root, n − 1 children of each of these children, and n − d children at depth d. Thus a level-1 search takes n + (n − 1) + (n − 2) + . . . + 1 = n(n + 1)/2 evaluations. It can be shown that a level-k search takes S(n + k, n) evaluations, where S(n, k) denotes the unsigned Stirling number of the first kind. This number grows rapidly: for k = 1 and n = 15, the number of evaluations is 120, and for level k = 2, it takes 8,500 evaluations. For an expression with 100 variables, a level-1 search takes 5,050 evaluations, and a level-2 search takes 13,092,125 evaluations.

The evaluation function is expensive for our polynomials: HEP(F13) takes about 1 second per evaluation and HEP(F24) takes 6.6 seconds. We have experimented with parallelization of the evaluation function, but due to its fine-grained nature, this was unsuccessful. For HEP(F24) a million evaluations would be slow; hence, for practical reasons we have only experimented with a level-1 search. The domains in which NMCS performed well, such as Morpion Solitaire and SameGame, have a cheap evaluation function relative to the tree construction [62]. If the evaluation function is expensive, even a level-1 search takes a long time. Based on the remarks above, we may conclude that for polynomials with a large number of variables, NMCS becomes infeasible.
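These evaluation counts can be reproduced with the standard recurrence for the unsigned Stirling numbers of the first kind; the snippet below is a small check, with function names of our own choosing.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling1(n, k):
    """Unsigned Stirling number of the first kind, via the recurrence
    c(n, k) = (n - 1) * c(n - 1, k) + c(n - 1, k - 1)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return (n - 1) * stirling1(n - 1, k) + stirling1(n - 1, k - 1)

def nmcs_evaluations(n_vars, level):
    """Number of playout evaluations of a level-k NMCS on n_vars variables."""
    return stirling1(n_vars + level, n_vars)
```

For 15 variables this yields 120 evaluations at level 1 and 8,500 at level 2; for 100 variables, 5,050 and 13,092,125, matching the figures quoted in the text.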

1.3.6. Simulated-Annealing-UCT

Since NMCS is unsuitable for simplifying large expressions, we return our focus to MCTS, this time to the UCT best-child criterion. We now consider the role of the exploration-exploitation constant Cp. We notice that at the beginning of the simulation there is as much exploration as at the end, since Cp remains constant throughout the search. For example, the final 100 iterations of a 1,000-iteration MCTS run are used to explore new branches even though we know in advance that there is likely not enough time to reach the final nodes. Thus we would like Cp to change during the simulation, emphasizing exploration early in the search and exploitation towards the end. We introduce a new, dynamic exploration-exploitation parameter T that decreases linearly with the iteration number:

    T(i) = Cp · (N − i) / N ,                                        (1.64)

where i is the current iteration number, N the preset maximum number of iterations, and Cp the initial exploration-exploitation constant at i = 0. We modify the UCT formula to become:

    argmax_{children c of s} [ x̄(c) + 2 T(i) √( 2 ln n(s) / n(c) ) ] ,    (1.65)

where c is a child of node s, x̄(c) is the average score of child c, n(c) the number of visits at node c, and T(i) the dynamic exploration-exploitation parameter of formula (1.64).
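The selection rule of (1.64) and (1.65) can be sketched as follows. This is a minimal sketch: the representation of the child statistics as (average score, visit count) pairs and the function names are our own, and unvisited children are assumed to be expanded before the criterion is applied.

```python
import math

def sa_uct_temperature(cp, i, n_total):
    """Linearly decreasing exploration parameter T(i) of (1.64)."""
    return cp * (n_total - i) / n_total

def sa_uct_best_child(children, n_parent, temperature):
    """Select the index of the child maximizing
    x_bar(c) + 2 * T(i) * sqrt(2 * ln n(s) / n(c))  (formula (1.65)).

    children: list of (average_score, visit_count) pairs.
    """
    def value(child):
        avg, visits = child
        return avg + 2.0 * temperature * math.sqrt(
            2.0 * math.log(n_parent) / visits)
    return max(range(len(children)), key=lambda idx: value(children[idx]))
```

With temperature 0 (the last iteration) the rule degenerates to pure exploitation, i.e., the child with the best average score; early in the search a large T favors rarely visited children.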


The role of T is similar to that of the temperature in Simulated Annealing: in the beginning of the simulation there is much emphasis on exploration, the analogue of allowing transitions to energetically unfavorable states. During the simulation the focus gradually shifts to exploitation, analogous to annealing. Hence, we call our new UCT formula Simulated Annealing UCT (SA-UCT). We notice several improvements over UCT: the final iterations are used effectively, and there is more exploration in the middle and at the bottom of the tree. This is due to more nodes being expanded at lower levels, because T is lowered. As a consequence, more branches reach the end states, so there is exploration near the bottom, where previously there was none apart from the random playouts.

In order to analyze the effect of SA-UCT on the fine-tuning of Cp (the initial temperature), we perform a sensitivity analysis on Cp and N [267]. Figure 1.25 displays the results for the res(7,5) polynomial with 14 variables. Horizontally, we show Cp, and vertically the number of operations (where fewer is better). A total of 4,000 MCTS runs (dots) are performed for Cp between 0.001 and 10. On the left we show the results for UCT and on the right for SA-UCT. Just as in Figure 1.21, we identify a region with local minima for low Cp, a diffuse region for high Cp, and an intermediate region of Cp where good results are obtained. This region becomes wider as the number of iterations N increases, for both SA-UCT and UCT. However, the intermediate region is wider for SA-UCT than for UCT. For N = 1000, the region is [0.1, 1.0] for SA-UCT, whereas it is [0.07, 0.15] for UCT. Thus, SA-UCT makes the region of interest about 11 times larger for res(7,5). This stretching is not just an overall rescaling of Cp: the uninteresting region of low Cp did not grow significantly. For N = 3000, the difference in width of the region of interest is even larger.
Figure 1.26 shows a similar sensitivity analysis for HEP(σ) with 15 variables. We identify the same three regions and see that the region of interest is [0.5, 0.7] for UCT and [0.8, 5.0] for SA-UCT at N = 1000. This means that the region is about 20 times larger relative to the uninteresting region of low Cp, which grew from 0.5 to 0.8. We performed the experiment on more than five other expressions and obtained similar results [267].

Figure 1.25. res(7,5) polynomial with 14 variables: on the x-axis we show Cp and on the y-axis the number of operations.

Figure 1.26. HEP(σ) with 15 variables: on the x-axis we show Cp and on the y-axis the number of operations.

1.3.7. Consequences for High Energy Physics

MCTS is able to find Horner schemes that yield a smaller number of operations than the native occurrence order schemes. For some polynomials, MCTS yields reductions of more than a factor of 24. However, this method has two issues that deserve closer inspection: the fine-tuning of the exploration-exploitation constant Cp and the fact that there is little exploration at the bottom of the tree. We attempted to address these issues by using NMCS, but found that this method is unsuitable for our domain due to a slow evaluation function. Next, we modified Cp to decrease linearly with the iteration number. We call this new selection criterion SA-UCT. SA-UCT caused more branches to reach end states and simplified the tuning of Cp: the region of appropriate Cp was increased at least tenfold. In conclusion, by using SA-UCT we are able to simplify expressions by at least a factor of 24 for our real-world test problems, with a reduced sensitivity to Cp, effectively making numerical integration 24 times faster.

1.3.8. An Outlook on Expression Simplification

Using SA-UCT, the fine-tuning of the exploration-exploitation constant Cp has become easier, but there is still no automatic way of tuning it. A straightforward binary search may be suitable if the region of Cp where good results are obtained is large enough. Moreover, it is still unresolved whether a forward or a backward construction has to be chosen. Perhaps the preferred scheme orientation can be derived from some currently unidentified structure of the polynomial.


Furthermore, to gain a deeper insight into the choices to be made, our methods have to be tested on more expressions from various fields. Currently, we have expressions from High Energy Physics and the resolvents from mathematics, but our methods could also be tried on:

1. a class of Boolean expressions, and
2. results from astrophysics.

Additional work needs to be spent on supporting non-commutative variables and functions. In principle, Horner schemes can be applied to expressions with non-commuting variables if the Horner scheme itself consists only of commuting variables. Furthermore, the common subexpression elimination should take into account that some variables do not commute. For example, in Figure 1.19, the highlighted part is not a common subexpression if b and c are non-commutative.

Next, it is challenging to investigate replacing variables by a linear combination of other variables. These global patterns are not recognized by CSEE and may reduce the number of operations considerably. However, determining which variables to combine linearly might be time-consuming.

In the future, the project HEPGAME will focus on solving a range of different problems in High Energy Physics [354]. Polynomial reduction is a first step, but there are many other challenges, such as solving recurrence relations with a minimal number of generated terms. The solution to this class of problems allows for previously impossible computations of the integrals involved in higher-order correction terms of scattering processes at CERN.


1.4. A Novel Approach of Polynomial Expansions of Symmetric Functions

Danila A. Gorodecky

1.4.1. Preliminaries and Background

The creation of a polynomial expansion (also called Reed-Muller form, EXOR form, or algebraic normal form) of a Boolean Function (BF) belongs to the most complex tasks of the theory of BFs. It is widely used in applications because of its benefits for testability and its compact representation of some classes of BFs.

One of these classes is the class of Symmetric Boolean Functions (SBFs). SBFs have several remarkable properties. These functions can be efficiently utilized in programmable logic arrays [276]; SBFs are perfect benchmarks due to the difficulty they pose for minimization algorithms based on the ideas of Quine-McCluskey [58, 95]. Furthermore, they have many applications in logic synthesis [106, 170, 181, 220], in the synthesis of Field Programmable Gate Arrays (FPGAs) [274], in technology mapping [189, 198, 220], in BDD minimization [220, 285], and in testing [220, 276]. Their welcome cryptographic properties [50, 59, 244] may be the main reason which makes SBFs a preferred class of BFs.

An important advantage of SBFs in comparison with other classes of BFs is that an SBF of n variables can be represented as a vector of (n + 1) binary values. The i-th value of this vector indicates whether the SBF is equal to 1 in the case where the value 1 is assigned to i variables of this function. The index of these (n + 1) Boolean values is called the rank.

A Reed-Muller polynomial is an expression of a BF consisting of conjunctions which are connected by EXOR operations. Each variable appears in these conjunctions either only in non-negated or only in negated form. A positive polarity Reed-Muller polynomial, also called a Zhegalkin polynomial, contains only non-negated variables. A Reed-Muller polynomial of an arbitrary BF is, for each selected polarity, uniquely defined by a vector of 2^n Boolean values which selects the conjunctions (factors) for the given BF of n variables.

The main aim of this section is the generation of positive polarity Reed-Muller forms. Finding such polynomials for given SBFs is a special task: a vector of (n + 1) Boolean values is sufficient for a unique specification of the SBF in this case. There are several methods solving this task. The most efficient methods known so far are:

• the division of the truth table into halves, with complexity O(2^n) [127],
• the Reed-Muller matrix method, having complexity O(2^n) [278],
• the matrix method, with complexity O(n^2) [132], and
• the transeunt triangle method, with complexity O(n^2) [336].

The last two methods are preferable due to their smaller complexity. The last one is well known and has instigated comprehensive research in various fields [56–58, 104, 278, 290, 377]. However, there is some redundancy in the applied computation. A much simpler method to calculate the Reed-Muller polynomial of an SBF is presented in this section. The complexity of this new method is O(n). The new method utilizes the Lucas theorem, which provides the needed combinatorial relationship. This method can be applied to calculate the polynomial expansion of an SBF given in Antivalence Form, as well as for the reverse task, i.e., the calculation of the Antivalence Form based on a known polynomial.

The rest of this section is organized as follows. Subsection 1.4.2 provides the basic definitions and Subsection 1.4.3 describes the combinatorial method, i.e., the generation of the characteristic vector of an SBF and the polynomial vector. Using these results, Subsection 1.4.4 emphasizes the generation of the reduced spectrum. The complexity of the method is evaluated in Subsection 1.4.5 and allows us to reach the improvements explained in Subsection 1.4.6.


1.4.2. Main Definitions

A BF f(x) of n variables, where x = (x1, x2, . . . , xn), whose value is unchanged after swapping any pair of variables xi and xj, where i ≠ j and i, j = 1, 2, . . . , n, is called a Symmetric Boolean Function (SBF). Each SBF S_n^{a1,a2,...,ar}(x) is characterized by the set of symmetry levels A(S) = {a1, a2, . . . , ar}. SBFs are referred to as

    f(x) = S_n^{a1,a2,...,ar}(x) .                                   (1.66)

The SBF (1.66) is equal to 1 if and only if ai of the variables x1, x2, . . . , xn are equal to 1, where 0 ≤ ai ≤ n, 1 ≤ i ≤ r, and 1 ≤ r ≤ n + 1. If r = 1, then a function f(x) = S_n^a(x) is called an Elementary Symmetric Boolean Function (ESBF).

There is a one-to-one correspondence between the SBF f = S_n^{a1,a2,...,ar} and the (n + 1)-bit binary vector π(f) = (π0, π1, . . . , πn). The vector π(f) is also called the carrier vector [58] or the reduced truth vector [278]; its i-th entry πi specifies whether the function f(x) is equal to 1 if i variables are equal to 1, where 0 ≤ i ≤ n. In other words, πi = 1 if and only if i belongs to the set A(f). Formula (1.67) is true for an arbitrary SBF:

    f(x) = ∨_{i=0}^{n} πi S_n^i(x) = ⊕_{i=0}^{n} πi S_n^i(x) ,       (1.67)

because the ESBFs S_n^i(x) are orthogonal to each other.

In a positive polarity Reed-Muller polynomial, referred to as P(f), only non-negated variables appear. Such a polynomial is also called a Zhegalkin polynomial. There are n + 1 special Zhegalkin polynomials E_n^i(x), 0 ≤ i ≤ n, which are called Elementary Polynomial-Unate Symmetric Boolean Functions (EPUSBFs). The polynomial E_n^i(x) combines all C(n, i) conjunctions of i out of n variables by EXOR operations, where C(n, i) denotes the binomial coefficient. Hence, it follows

    E_n^0(x) = 1
    E_n^1(x) = x1 ⊕ x2 ⊕ . . . ⊕ xn
    E_n^2(x) = x1 x2 ⊕ . . . ⊕ x1 xn ⊕ . . . ⊕ x_{n−1} xn
    ...
    E_n^n(x) = x1 x2 . . . xn .

All 2^{n+1} SBFs f = f(x) can be represented as

    f(x) = γ0 ⊕ γ1 (x1 ⊕ x2 ⊕ . . . ⊕ xn) ⊕ γ2 (x1 x2 ⊕ . . . ⊕ x1 xn ⊕ . . . ⊕ x_{n−1} xn) ⊕ . . . ⊕ γn x1 x2 . . . xn ,

where γ(f) = (γ0, γ1, γ2, . . . , γn) is the reduced Zhegalkin (Reed-Muller) spectrum of the SBF. For short, we have

    f(x) = ⊕_{i=0}^{n} γi E_n^i(x) .

Using EXOR operations, q polynomials E_n^i(x) can be combined into a Polynomial-Unate Symmetric Boolean Function (PUSBF)

    f(x) = E_n^{b1,b2,...,bq}(x) ,

where the bj are integer values from the set of polynomial numbers B(E) = {b1, b2, . . . , bq}. The value bj belongs to B(E) if and only if γ_{bj} = 1, where 0 ≤ bj ≤ n, 1 ≤ j ≤ q, and 1 ≤ q ≤ n + 1.

The contribution of this section is a method to transform the reduced truth vector π(f) of an SBF f(x) = S_n^{a1,a2,...,ar}(x) = E_n^{b1,b2,...,bq}(x) into the reduced spectrum γ(f) of this function, i.e., π(S_n^{a1,a2,...,ar}) into γ(E_n^{b1,b2,...,bq}), and backwards, i.e., γ(E_n^{b1,b2,...,bq}) into π(S_n^{a1,a2,...,ar}).


1.4.3. Generation of the Carrier Vector

The combinatorial method that generates the carrier vector π(f) will be introduced inductively. First it is shown how this vector can be created for an EPUSBF f(x) = E_n^b(x). After that the method will be generalized for the PUSBF f(x) = E_n^{b1,b2,...,bq}(x).

Generation of the carrier vector π(E_n^b)

Example 1.10 demonstrates how to generate the carrier vector π(f(x)) of the EPUSBF f(x) = E_n^b(x).

Example 1.10. The carrier vector π = (π0, π1, . . . , π6) of the EPUSBF f(x) = E_6^2(x) will be calculated. Due to the given definition of an EPUSBF we have γ(E_6^2(x)) = (0, 0, 1, 0, 0, 0, 0) and

                  1       2       3       4       5
    E_6^2(x) =  x1x2 ⊕  x1x3 ⊕  x1x4 ⊕  x1x5 ⊕  x1x6
                     ⊕  x2x3 ⊕  x2x4 ⊕  x2x5 ⊕  x2x6
                             ⊕  x3x4 ⊕  x3x5 ⊕  x3x6
                                     ⊕  x4x5 ⊕  x4x6
                                             ⊕  x5x6 .           (1.68)

The conjunctions of E_6^2(x) are arranged in (1.68) in columns such that the number of the column is equal to the number of conjunctions within the column. The polynomial E_6^2(x) contains C(6, 2) = 15 conjunctions of two variables. The entries πi, i = 0, . . . , 6, of the carrier vector π(E_6^2(x)) = (π0, π1, π2, π3, π4, π5, π6) are determined as follows:


— If π0 = 1 then E_6^2 contains S_6^0. The ESBF S_6^0 = 1 if all variables are equal to 0, and this constant term does not appear in the polynomial (1.68). Hence, π0 = 0 for E_6^2.

— If π1 = 1 and π0 = 0 then E_6^2 contains S_6^1. According to the definition, the ESBF S_6^1 = 1 if x1 = 1 and x2 = x3 = . . . = x6 = 0. However, assigning these values to (1.68) results in E_6^2(1, 0, 0, 0, 0, 0) = 0. Hence, π1 = 0 for E_6^2.

— If π2 = 1 and π0 = π1 = 0 then E_6^2 contains S_6^2. According to the definition, the ESBF S_6^2 = 1 if exactly two of the six variables are equal to 1 and the remaining four variables are equal to 0. For all these assignments exactly one conjunction of (1.68) is equal to 1. Hence, from S_6^2 = 1 it follows that E_6^2 = 1, and therefore π2 = 1.

— If π2 = π3 = 1 and π0 = π1 = 0 then E_6^2 contains S_6^2 ⊕ S_6^3. According to the definition, the ESBF S_6^3 = 1 if exactly three of the six variables are equal to 1 and the remaining three variables are equal to 0. For such an assignment three conjunctions of (1.68) are equal to 1; due to this odd number the polynomial E_6^2 of (1.68) is equal to 1, and therefore we have π3 = 1. In the special case S_6^3(1, 1, 1, 0, 0, 0) = 1, the assignment x1 = x2 = x3 = 1 and x4 = x5 = x6 = 0 causes the conjunctions of columns 1 and 2 in (1.68) to be equal to 1.

— If π2 = π3 = π4 = 1 and π0 = π1 = 0 then E_6^2 contains S_6^2 ⊕ S_6^3 ⊕ S_6^4. According to the definition, the ESBF S_6^4 = 1 if, e.g., x1 = x2 = x3 = x4 = 1 and x5 = x6 = 0. In this special case the conjunctions of columns 1, 2, and 3 are equal to 1. Due to the even number of these conjunctions E_6^2(1, 1, 1, 1, 0, 0) = 0, and therefore we have π4 = 0.

— If π2 = π3 = π5 = 1 and π0 = π1 = π4 = 0 then E_6^2 contains S_6^2 ⊕ S_6^3 ⊕ S_6^5. According to the definition, the ESBF S_6^5 = 1 if, e.g., x1 = x2 = x3 = x4 = x5 = 1 and x6 = 0. In this special case the conjunctions of columns 1, 2, 3, and 4 are equal to 1. Due to the even number of these conjunctions E_6^2(1, 1, 1, 1, 1, 0) = 0, and therefore we have π5 = 0.
— If π2 = π3 = π6 = 1 and π0 = π1 = π4 = π5 = 0 then E_6^2 contains S_6^2 ⊕ S_6^3 ⊕ S_6^6. According to the definition, the ESBF S_6^6 = 1 if and only if all six variables xi are equal to 1. All conjunctions of (1.68) are equal to 1 in this case; due to the odd number of conjunctions the polynomial E_6^2 = 1, and therefore π6 = 1.

The results of this calculation are the carrier vector of the EPUSBF E_6^2(x),

    π(E_6^2) = (0, 0, 1, 1, 0, 0, 1) ,

and the polynomial

    P(E_6^2) = S_6^2 ⊕ S_6^3 ⊕ S_6^6 .

It is worth noting that the value of the polynomial depends only on the indices i of the ESBFs S_n^i, i.e., on the number of variables equal to 1. Theorem 1.5 summarizes the reasoning used in Example 1.10.

Theorem 1.5. The i-th entry πi of the carrier vector π = (π0, π1, . . . , πn) of the EPUSBF E_n^b = E_n^b(x1, x2, . . . , xn) can be calculated by

    πi = C(i, b) (mod 2) ,                                           (1.69)

where i = b, . . . , n.

Proof. Here, we distinguish three cases of the relationship between i and b.

1. If i < b, then the number i of values 1 which are assigned to the variables xj is less than the number b of variables in the conjunctions of the EPUSBF E_n^b(x) (see the first and the second cases of Example 1.10). Hence, E_n^b = 0 and πi = 0 for i < b.

2. If i = b, then b variables xj are equal to 1. The EPUSBF E_n^b(x) contains all conjunctions of b of the n variables, so that exactly one of these conjunctions is equal to 1. Hence, E_n^b = 1 and πi = 1 for i = b, where C(b, b) (mod 2) = 1.


3. If i > b, then i out of the n variables of the polynomial E_n^b(x) are equal to 1. Hence, C(i, b) out of the C(n, b) conjunctions of the polynomial E_n^b(x) are equal to 1. If C(i, b) is an odd number and S_n^i(x) = 1, then E_n^b(x) = 1, so that (1.69) is satisfied.
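Theorem 1.5 can be checked by brute force for the polynomial of Example 1.10: evaluate E_6^2 on all 64 assignments and compare with πi = C(i, 2) mod 2. The sketch below is a small verification of this kind; the function names are our own.

```python
from itertools import combinations, product
from math import comb

def epusbf_value(x, b):
    """Evaluate E_n^b(x): the EXOR of all conjunctions of b variables."""
    acc = 0
    for subset in combinations(range(len(x)), b):
        acc ^= int(all(x[j] for j in subset))
    return acc

def carrier_vector(n, b):
    """Theorem 1.5: pi_i = C(i, b) mod 2 (comb(i, b) is 0 for i < b)."""
    return tuple(comb(i, b) % 2 for i in range(n + 1))
```

For n = 6 and b = 2 this reproduces the carrier vector π = (0, 0, 1, 1, 0, 0, 1) found in Example 1.10, and the brute-force evaluation agrees with it on every assignment.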

Corollary 1.2. It follows from Theorem 1.5 that the carrier vector π of the EPUSBF E_n^b can be written in the form

    π = (0, 0, . . . , 0, 1, π_{b+1}, . . . , πn) ,                  (1.70)

where the leading b entries are 0, followed by π_b = 1 and the n − b entries π_{b+1}, . . . , πn.

The Lucas theorem [134] can be utilized to calculate (1.69). It helps to determine whether C(i, b) is an odd or an even number.

Theorem 1.6 (Application of the Lucas Theorem [134]). Let i, b be two integer values with i > b, and let ij and bj be the corresponding bits of the binary representations i = (i_{δ1}, i_{δ1−1}, . . . , i1) and b = (b_{δ2}, b_{δ2−1}, . . . , b1), with δ1 = ⌊log2 i⌋ + 1 and δ2 = ⌊log2 b⌋ + 1. Then

    C(i, b) (mod 2) = 1  ⟺  ij ≥ bj for all j .

Example 1.11. It must be verified whether the value of C(i, b) is an odd or an even number, where i = 11 and b = 2 in case (a) and b = 5 in case (b). The length of i is δ1 = ⌊log2 11⌋ + 1 = 4, so that four bits of i are taken into account: i = (i4, i3, i2, i1) = (1011). The bits of b are specified in the cases (a) and (b) by

(a) δ2 = ⌊log2 2⌋ + 1 = 2, so that b = (b2, b1) = (10), and
(b) δ2 = ⌊log2 5⌋ + 1 = 3, so that b = (b3, b2, b1) = (101).

Figure 1.27 (a) shows that the values i = 11 and b = 2 satisfy the condition of Theorem 1.6, so that the binomial coefficient C(11, 2) is an odd number:

    C(11, 2) (mod 2) = 1 .

    (a)  i = 1 0 1 1            (b)  i = 1 0 1 1
         b =     1 0                 b =   1 0 1
         ij ≥ bj for all j           i3 < b3

Figure 1.27. Identification of binomial coefficients: (a) the odd binomial coefficient C(11, 2); (b) the even binomial coefficient C(11, 5).

Figure 1.27 (b) shows that the values i = 11 and b = 5 do not satisfy the condition of Theorem 1.6, so that the binomial coefficient C(11, 5) is an even number:

    C(11, 5) (mod 2) = 0 .

Example 1.12 demonstrates the benefit of Theorem 1.5 and Theorem 1.6 for the calculation of the carrier vector π of the same polynomial E_6^2(x) already used in Example 1.10.

Example 1.12. The carrier vector π = (π0, π1, . . . , π6) of the EPUSBF f(x) = E_6^2(x) will be calculated. The reduced Zhegalkin spectrum of the polynomial E_6^2(x) is γ(E_6^2) = (0, 0, 1, 0, 0, 0, 0). According to (1.70), π0 = π1 = 0, π2 = 1, and π = (0, 0, 1, π3, π4, π5, π6). Due to Theorem 1.5 it must be verified whether C(3, 2), C(4, 2), C(5, 2), and C(6, 2) are odd or even numbers in order to find π3, π4, π5, and π6. Figure 1.28 shows how the results π3 = 1, π4 = 0, π5 = 0, and π6 = 1 are calculated using Theorem 1.6.

    π3 ⇔ C(3, 2)      π4 ⇔ C(4, 2)      π5 ⇔ C(5, 2)      π6 ⇔ C(6, 2)
    3 = (0 1 1)       4 = (1 0 0)       5 = (1 0 1)       6 = (1 1 0)
    2 = (0 1 0)       2 = (0 1 0)       2 = (0 1 0)       2 = (0 1 0)
    ij ≥ bj for all j i2 < b2           i2 < b2           ij ≥ bj for all j
    π3 = 1            π4 = 0            π5 = 0            π6 = 1

Figure 1.28. Identification of the binomial coefficients C(3, 2), C(4, 2), C(5, 2), and C(6, 2) regarding the even or odd property.

The found carrier vector is π(E_6^2) = (0, 0, 1, 1, 0, 0, 1) and the associated polynomial is

    P(E_6^2) = S_6^2(x) ⊕ S_6^3(x) ⊕ S_6^6(x) .

The demonstrated procedure to calculate the entries πi of the carrier vector π by utilizing the Lucas theorem is called the combinatorial method.
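In code, Theorem 1.6 reduces to a single bitwise test: C(i, b) is odd exactly when every binary digit of b is dominated by the corresponding digit of i, i.e., when i AND b equals b. The sketch below is our own illustration of this observation.

```python
from math import comb  # used only to cross-check the bitwise test

def binom_is_odd(i, b):
    """C(i, b) mod 2 via the Lucas theorem: odd iff the bits of b form a
    subset of the bits of i (this is also 0 whenever b > i)."""
    return 1 if (i & b) == b else 0
```

This reproduces Example 1.11 (C(11, 2) is odd, C(11, 5) is even) and, applied to i = 0, . . . , 6 with b = 2, the carrier vector (0, 0, 1, 1, 0, 0, 1) of Example 1.12.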

Generation of the carrier vector π(E_n^{b1,b2,...,bq})

The combinatorial method to generate the carrier vector of an EPUSBF E_n^b = E_n^b(x1, x2, . . . , xn) can be generalized for an arbitrary PUSBF E_n^{b1,b2,...,bq} = E_n^{b1,b2,...,bq}(x1, x2, . . . , xn).

Theorem 1.7. The i-th entry πi of the carrier vector π(E_n^{b1,b2,...,bq}) = (π0, π1, . . . , πn) of the PUSBF E_n^{b1,b2,...,bq} can be calculated by

    πi = [ C(i, b1) + C(i, b2) + . . . + C(i, bq) ] (mod 2) ,        (1.71)

where i = b1 + 1, . . . , n, and C(i, bj) = 0 for i < bj and j = 1, 2, . . . , q.

The proof of Theorem 1.7 follows from Theorem 1.5.

Example 1.13. The carrier vector π = (π0, π1, . . . , π10) of the PUSBF f(x) = E_10^{5,7,8}(x) will be calculated. Due to the given definition of a PUSBF we have γ(E_10^{5,7,8}) = (0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0) and E_10^{5,7,8} = E_10^5 ⊕ E_10^7 ⊕ E_10^8. Using (1.70), it follows that

    π0 = π1 = π2 = π3 = π4 = 0, π5 = 1 , and
    π(E_10^{5,7,8}) = (0, 0, 0, 0, 0, 1, π6, π7, π8, π9, π10) .

Using (1.71) and Theorem 1.6, it is easy to calculate π6, π7, π8, π9, and π10. Figure 1.29 shows in detail all the steps of these calculations, with the results π6 = π7 = 0 and π8 = π9 = π10 = 1. The found carrier vector of the PUSBF E_10^{5,7,8}(x) is

    π(E_10^{5,7,8}) = (0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1)

and the polynomial

    P(E_10^{5,7,8}) = S_10^5 ⊕ S_10^8 ⊕ S_10^9 ⊕ S_10^10 .
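Theorem 1.7 turns into a few lines of code when combined with the bitwise form of the Lucas test; the sketch below, with names of our own choosing, reproduces Example 1.13.

```python
def pusbf_carrier_vector(n, b_values):
    """Theorem 1.7: pi_i = (C(i, b1) + ... + C(i, bq)) mod 2, with each
    binomial parity computed by the Lucas test C(i, b) odd <=> (i & b) == b.
    The test is automatically 0 for i < b, as required by the theorem."""
    return tuple(
        sum(1 for b in b_values if (i & b) == b) % 2
        for i in range(n + 1)
    )
```

Called with n = 10 and the polynomial numbers {5, 7, 8}, this yields the carrier vector (0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1) of Example 1.13; with n = 6 and {2} it reproduces Example 1.12.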

1.4.4. Generation of the Reduced Spectrum

The combinatorial method to generate the carrier vector π can be modified to generate the reduced spectrum γ(S_n^{a1,a2,...,ar}) of the SBF S_n^{a1,a2,...,ar}. For that purpose Theorem 1.5 and Theorem 1.7 are adapted as follows:

Figure 1.29. Calculation of π6, π7, π8, π9, π10 for π(E_10^{5,7,8}).

108

General Methods
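The bit-pair comparisons of Figure 1.29 amount to parity checks of binomial coefficients. The following minimal sketch reproduces the carrier vector of Example 1.13; the function names are our own, and the bitwise test (i & b) == b is Lucas' theorem, which matches the bit-pair condition of Theorem 1.6:

```python
def binom_is_odd(i, b):
    """Parity of the binomial coefficient C(i, b): by Lucas' theorem it is
    odd exactly if every 1-bit of b also occurs in i, i.e. (i & b) == b."""
    return (i & b) == b

def carrier_vector(n, poly_numbers):
    """pi = (pi_0, ..., pi_n) of the PUSBF E_n^{b1,...,bq}: pi_i is the
    parity of the sum of the binomial coefficients C(i, b_j)."""
    return tuple(sum(binom_is_odd(i, b) for b in poly_numbers) % 2
                 for i in range(n + 1))

# Example 1.13: pi(E_10^{5,7,8})
print(carrier_vector(10, [5, 7, 8]))  # (0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1)
```

The same call with n = 6 and the set {2} reproduces the carrier vector π(E_6^2) = (0, 0, 1, 1, 0, 0, 1) used in Example 1.15 below, so the sketch is consistent with both examples of this section.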

Theorem 1.8. The i-th entry γ_i of the reduced spectrum γ = (γ_0, γ_1, ..., γ_n) of the ESBF S_n^a = S_n^a(x_1, x_2, ..., x_n) can be calculated with the binomial coefficient C(i, a) by

    γ_i = C(i, a) (mod 2) ,        (1.72)

where i = a+1, ..., n.

Theorem 1.9. The i-th entry γ_i of the reduced spectrum γ(S_n^{a_1,a_2,...,a_r}) = (γ_0, γ_1, ..., γ_n) of the SBF S_n^{a_1,a_2,...,a_r} can be calculated by

    γ_i = [ C(i, a_1) + C(i, a_2) + ... + C(i, a_r) ] (mod 2) ,        (1.73)

where i = a_1+1, ..., n, and C(i, a_j) = 0 for i < a_j and j = 1, 2, ..., r.

Corollary 1.3. It follows from Theorem 1.8 and Theorem 1.9 that the reduced spectrum γ = (γ_0, γ_1, ..., γ_n) of the ESBF S_n^{a_1}(x) or the SBF S_n^{a_1,a_2,...,a_r}(x) can be written in the following form:

    γ = (0, 0, ..., 0, 1, γ_{a_1+1}, ..., γ_n) ,        (1.74)

where the first a_1 entries are 0, the entry at position a_1 is 1, and the remaining n − a_1 entries are γ_{a_1+1}, ..., γ_n.

Example 1.14 demonstrates the application of Theorem 1.9.

Example 1.14. The reduced spectrum γ = (γ_0, γ_1, ..., γ_7) of the SBF S_7^{2,3}(x) will be calculated. The carrier vector of the SBF S_7^{2,3}(x) is π(S_7^{2,3}) = (0, 0, 1, 1, 0, 0, 0, 0). From Equation (1.74) of Corollary 1.3 it follows that γ_0 = γ_1 = 0, γ_2 = 1, and γ(S_7^{2,3}) = (0, 0, 1, γ_3, γ_4, γ_5, γ_6, γ_7). Using Theorem 1.9 and Theorem 1.6, the values of γ_j, j = 3, ..., 7, can easily be calculated. A bit pair is highlighted in the binary representation of the binomial coefficients if this pair does not satisfy Theorem 1.6.

    γ_3 = [C(3, 2) + C(3, 3)] (mod 2) = [1 + 1] (mod 2) = 0     (binary: 3 = 11, 2 = 10, 3 = 11)
    γ_4 = [C(4, 2) + C(4, 3)] (mod 2) = [0 + 0] (mod 2) = 0     (binary: 4 = 100)
    γ_5 = [C(5, 2) + C(5, 3)] (mod 2) = [0 + 0] (mod 2) = 0     (binary: 5 = 101)
    γ_6 = [C(6, 2) + C(6, 3)] (mod 2) = [1 + 0] (mod 2) = 1     (binary: 6 = 110)
    γ_7 = [C(7, 2) + C(7, 3)] (mod 2) = [1 + 1] (mod 2) = 0     (binary: 7 = 111)

The found reduced spectrum of the SBF S_7^{2,3}(x) is

    γ(S_7^{2,3}) = (0, 0, 1, 0, 0, 0, 1, 0)

and the associated polynomial is

    P(S_7^{2,3}) = E_7^2(x) ⊕ E_7^6(x) .
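Theorem 1.9 translates directly into a few lines of code. The following sketch uses our own helper names (not part of the book's method) and the bitwise form of Theorem 1.6, (i & a) == a, to test the parity of C(i, a):

```python
def binom_parity(i, a):
    """1 if the binomial coefficient C(i, a) is odd, else 0 (Lucas' theorem)."""
    return 1 if (i & a) == a else 0

def reduced_spectrum(n, symmetry_levels):
    """gamma = (gamma_0, ..., gamma_n) of the SBF S_n^{a1,...,ar} per (1.73)."""
    return tuple(sum(binom_parity(i, a) for a in symmetry_levels) % 2
                 for i in range(n + 1))

# Example 1.14: S_7^{2,3}
print(reduced_spectrum(7, [2, 3]))  # (0, 0, 1, 0, 0, 0, 1, 0)
```

The output agrees with the reduced spectrum γ(S_7^{2,3}) found above.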

1.4.5. The Complexity of the Combinatorial Method

The complexity of the proposed method can be defined as the number of binary operations EXOR (or OR). This complexity is referred to as C1 for the EPUSBF E_n^b(x) (or the ESBF S_n^a(x)) and C2 for the PUSBF E_n^{b_1,b_2,...,b_q}(x) (or the SBF S_n^{a_1,a_2,...,a_r}(x)).

The binary vectors x = (x_t, x_{t-1}, ..., x_1) and y = (y_t, y_{t-1}, ..., y_1) satisfy the relation (x_t, x_{t-1}, ..., x_1) ≥ (y_t, y_{t-1}, ..., y_1) if x_i ≥ y_i for all i = 1, ..., t. The inequality x_i ≥ y_i is equivalent to the Boolean equation

    x_i ∨ ȳ_i = 1 .        (1.75)

Due to (1.75) and the condition of Theorem 1.6, the complexity to calculate whether the binomial coefficient C(i, b) is an odd or even number is (⌊log2 b⌋ + 1). Using (1.70) of Corollary 1.2, the complexity to compute the carrier vector π(E_n^b) = (π_0, π_1, ..., π_n) is

    C1 = (⌊log2 b⌋ + 1) · (n − b) .        (1.76)

In the more general case of a given PUSBF E_n^{b_1,b_2,...,b_q}(x), it follows from Theorem 1.7 that the complexity to calculate the carrier vector π(E_n^{b_1,b_2,...,b_q}) = (π_0, π_1, ..., π_n) is

    C2 = (⌊log2 b_1⌋ + 1) · (n − b_1) + Σ_{i=2..q} (⌊log2 b_i⌋ + 1) · (n − b_i + 1) .        (1.77)

The complexity to calculate the reduced spectrum γ(S_n^a) of the ESBF S_n^a(x) is

    C1 = (⌊log2 a⌋ + 1) · (n − a) ,        (1.78)

and for the reduced spectrum γ(S_n^{a_1,a_2,...,a_r}) of the SBF S_n^{a_1,a_2,...,a_r}(x) it is

    C2 = (⌊log2 a_1⌋ + 1) · (n − a_1) + Σ_{i=2..r} (⌊log2 a_i⌋ + 1) · (n − a_i + 1) ,        (1.79)

respectively.

1.4.6. Discussion of the Reached Improvements

There are several efficient methods to solve the task considered in this section, i.e., the calculation of the reduced spectrum γ(S_n^{a_1,a_2,...,a_r}) = (γ_0, γ_1, ..., γ_n) and the reverse task of the calculation of the carrier vector π(E_n^{b_1,b_2,...,b_q}) = (π_0, π_1, ..., π_n). One of these methods is the transeunt triangle method. It was originally proposed by V. P. Suprun for SBFs [336]. Later this method was generalized for arbitrary Boolean functions [337]. The transeunt triangle method is, so far, the most efficient method for the polynomial expansion of the SBF f = f(x_1, x_2, ..., x_n). It has the complexity O(n^2). The transeunt triangle is iteratively calculated from the given binary vector in the upper line to the single bit in the bottom line by using the EXOR operation (see Example 1.15). The number of EXOR operations defines the complexity CT of the transeunt triangle method and is

    CT = (n^2 + n) / 2 .        (1.80)

Example 1.15 shows the application of the transeunt triangle method for the calculation of a carrier vector π(E_n^b) = (π_0, π_1, ..., π_n).


Example 1.15. The same task as in Example 1.12 has to be solved. The given EPUSBF f(x) = E_6^2(x) has the reduced Zhegalkin spectrum γ(E_6^2) = (0, 0, 1, 0, 0, 0, 0). The calculation of the reduced carrier vector π(E_6^2) will be realized using the transeunt triangle method.

    0  0  1  0  0  0  0
      0  1  1  0  0  0
        1  0  1  0  0
          1  1  1  0
            0  0  1
              0  1
                1

Figure 1.30. Transeunt triangle of the polynomial f(x) = E_6^2(x).

Figure 1.30 shows the transeunt triangle where the values of the reduced spectrum γ(E_6^2) = (0, 0, 1, 0, 0, 0, 0) are assigned to the upper line. The values in line i+1 are calculated by the EXOR of neighboring values of line i. According to the transeunt triangle method, the left side of the triangle corresponds to the carrier vector. Hence, the calculated carrier vector is π(E_6^2) = (0, 0, 1, 1, 0, 0, 1) and the associated polynomial is E_6^2 = S_6^2 ⊕ S_6^3 ⊕ S_6^6.

The complexity to calculate the carrier vector π with the transeunt triangle method is specified by (1.80) and has the value CT = 21 for π(E_6^2). Example 1.12 shows that the same result can be calculated using the combinatorial method. Due to (1.76), the complexity to calculate the carrier vector π(E_6^2) with the new combinatorial method is C1 = 8.

Figure 1.31 shows the comparison between the complexity C1 (1.76) of the combinatorial method proposed in this section and the complexity CT of the transeunt triangle method for the calculation of the reduced carrier vector π of the EPUSBF E_n^b(x). The same results occur for the comparison of these methods for the reverse task to calculate the reduced spectrum γ of a given ESBF S_n^a(x).
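For comparison, the transeunt triangle method of Example 1.15 can be sketched directly from the EXOR scheme of Figure 1.30 (the function name is our own):

```python
def transeunt_left_edge(vector):
    """Left edge of the transeunt triangle built over the given binary vector."""
    row, edge = list(vector), []
    while row:
        edge.append(row[0])
        # each following line is the EXOR of neighboring values
        row = [row[i] ^ row[i + 1] for i in range(len(row) - 1)]
    return tuple(edge)

# Example 1.15: from gamma(E_6^2) to the carrier vector pi(E_6^2)
print(transeunt_left_edge((0, 0, 1, 0, 0, 0, 0)))  # (0, 0, 1, 1, 0, 0, 1)
```

Applying the same function to the carrier vector returns the reduced spectrum again: the triangle construction is an involution, since the binary Pascal matrix is self-inverse modulo 2.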

[Figure 1.31: number of operations (EXOR, OR) plotted over the number of variables n = 10, ..., 100. The transeunt triangle method grows quadratically (55 operations for n = 10 up to 5050 for n = 100), whereas the combinatorial method grows linearly (20 operations for n = 20 up to 420 for n = 100, calculated for b = 16).]

Figure 1.31. Comparison of the complexities of C1 and CT.
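The two curves of Figure 1.31 follow directly from the complexity formulas (1.76) and (1.80); a quick check with b = 16, the setting used in the figure (function names are ours):

```python
from math import floor, log2

def c1(n, b=16):
    """Complexity (1.76) of the combinatorial method for E_n^b."""
    return (floor(log2(b)) + 1) * (n - b)

def ct(n):
    """Complexity (1.80) of the transeunt triangle method."""
    return (n * n + n) // 2

print(c1(80), ct(80))  # 320 3240: roughly a factor of ten at n = 80
```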

It can be seen in Figure 1.31 that the combinatorial method for the EPUSBF E_n^b(x) has the linear complexity O(n). The proposed combinatorial method is already ten times faster than the transeunt triangle method for n = 80 variables. The complexities of the combinatorial method (1.76) were calculated for b = 16, which belongs to the worst cases.

The complexity of the combinatorial method in comparison with the complexity of the transeunt triangle method for the PUSBF E_n^{b_1,b_2,...,b_q}, where q > 1, strongly depends on the polynomial numbers included in the set B(E) = {b_1, b_2, ..., b_q}. Table 1.8 shows some threshold values where the efficiency of the combinatorial method is similar to that of the transeunt triangle method.

Table 1.8. Comparison of the complexities of C2 and CT

    n      r    B(E)             %    C2        CT
    10     3    {2,3,4}          30   53        55
    20     5    {4,...,8}        25   235       210
    30     6    {4,...,9}        20   483       465
    40     7    {8,...,14}       18   836       820
    50     8    {11,...,18}      16   1266      1275
    60     9    {16,...,24}      15   1840      1830
    70     10   {16,...,25}      14   2520      2485
    80     11   {16,...,26}      14   3295      3240
    90     12   {16,...,27}      13   4765      4095
    100    13   {16,...,28}      13   5130      5050
    255    26   {32,...,57}      10   32988     32640
    511    44   {64,...,107}     9    131355    130816
    1023   76   {128,...,203}    7    521960    523766
    2047   135  {256,...,390}    7    2095866   2096128
    4095   242  {512,...,753}    6    8381660   8386560

The columns of Table 1.8 have the following meanings. The number n of the variables of the function E_n^{b_1,b_2,...,b_q}(x) is specified in the first column. The second column r gives the number of polynomial numbers in the set B(E) for which C2 ≈ CT. The third column contains the set of polynomial numbers B(E) for which the complexities of both methods are approximately the same. Each other set of polynomial numbers B(E) provides a lower complexity of the combinatorial method for the number of variables specified in the first column. The fourth column shows the percentage ratio between the cardinality r and the number of variables n specified in the first column. The last two columns show the nearly equal complexities of the combinatorial and the transeunt triangle methods.

The suggested combinatorial method is a new method to calculate the carrier vector π(E_n^{b_1,b_2,...,b_q}) of a given PUSBF E_n^{b_1,b_2,...,b_q}(x) or the reduced spectrum γ(S_n^{a_1,a_2,...,a_r}) of a given SBF S_n^{a_1,a_2,...,a_r}(x). The benefit of the proposed method is its linear complexity O(n), whereas the so far best transeunt triangle method has a quadratic complexity O(n^2) for EPUSBFs or ESBFs. Furthermore, the combinatorial method provides a high efficiency, especially for small sets of polynomial numbers B(E) of PUSBFs or small sets of symmetry levels A(S) of SBFs.

2. Efficient Calculations

2.1. XBOOLE-CUDA - Fast Calculations of Large Boolean Problems on the GPU

Bernd Steinbach and Matthias Werner

2.1.1. Challenges for Boolean Calculations

The technological progress in micro- and nano-electronics leads both to many helpful new applications and to growing requirements for the design of digital systems [313]. Boolean functions are the main instrument for their description [247]. It is well known that the number of function values of a Boolean function grows exponentially with the number of Boolean variables. Hence, solving Boolean problems which depend on many variables is an important challenge for scientists [313].

An important source for improvements in solving Boolean tasks is the utilization of many compute cores for parallel computations. Recent processors contain a small number of cores in the Central Processing Unit (CPU). Significantly more cores are available on the Graphics Processing Unit (GPU). Therefore, the main aim of this paper is the utilization of the GPU for faster Boolean computations.

The resources needed in time and space to solve a Boolean problem are not only determined by the hardware used, but also by the main data structure of the required Boolean functions and the algorithms for their computation. One result of comprehensive research in this field is XBOOLE [314]. Many applications of XBOOLE are shown in [317, 325]. XBOOLE was originally implemented for a single core of the CPU. Alternative methods for faster calculations of Boolean problems were explored in [332, 333]. Werner ported the most time-consuming XBOOLE operations to the GPU in his recent Master's thesis [367]. He used Nvidia's Compute Unified Device Architecture (CUDA) to develop the new library, called XBOOLE-CUDA.

2.1.2. The Concepts Realized in XBOOLE

Main Data Structure: Ternary Vector List (TVL). The solution of a Boolean equation of n variables is a set of up to 2^n binary vectors. XBOOLE reduces this exponential number by means of Ternary Vectors (TVs). A single TV summarizes many binary vectors. In this way a set of binary vectors can be expressed by a smaller number of ternary vectors of a Ternary Vector List (TVL). A TVL can be created from a list of binary vectors by the repeated application of the following two rules:

• Two binary vectors which differ in only one position can be expressed by a single ternary vector that contains, in this position, the dash element (−).

• Two ternary vectors which differ in only one position, and contain in this position the values 0 and 1, can be summarized to a single ternary vector that contains, in this position, the dash element (−).

Example 2.16. The set S of seven binary vectors contains three pairs of binary vectors which can each be merged into a single ternary vector.

        x1 x2 x3 x4 x5         x1 x2 x3 x4 x5
        0  1  0  1  1          0  1  −  1  1
        0  1  1  1  1          0  0  1  −  0
    S = 0  0  1  0  0     =    1  0  0  0  0
        0  0  1  1  0          1  1  −  1  1
        1  0  0  0  0
        1  1  0  1  1
        1  1  1  1  1


Two of these new ternary vectors can be combined into a single ternary vector that contains two − elements. Hence, three ternary vectors are sufficient to store the given seven binary vectors.

        x1 x2 x3 x4 x5         x1 x2 x3 x4 x5
        0  1  −  1  1          −  1  −  1  1
    S = 0  0  1  −  0     =    0  0  1  −  0
        1  0  0  0  0          1  0  0  0  0
        1  1  −  1  1
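The two merging rules of Example 2.16 can be sketched as follows; the list-based representation of ternary vectors and the function names are our own, not XBOOLE's:

```python
def try_merge(u, v):
    """Merge two ternary vectors differing in exactly one position that
    holds 0 in one vector and 1 in the other; otherwise return None."""
    diff = [i for i, (a, b) in enumerate(zip(u, v)) if a != b]
    if len(diff) == 1 and {u[diff[0]], v[diff[0]]} == {'0', '1'}:
        k = diff[0]
        return u[:k] + ('-',) + u[k + 1:]
    return None

def covered_vectors(tvl):
    """A TV with d dashes represents 2^d binary vectors."""
    return sum(2 ** row.count('-') for row in tvl)

# the last merging step of Example 2.16
print(try_merge(('0', '1', '-', '1', '1'), ('1', '1', '-', '1', '1')))
# the final TVL of Example 2.16 still covers all seven binary vectors
print(covered_vectors([('-', '1', '-', '1', '1'),
                       ('0', '0', '1', '-', '0'),
                       ('1', '0', '0', '0', '0')]))  # 7
```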

The construction of ternary vectors reveals that a single ternary vector with d dashes (−) represents 2^d binary vectors. Hence, the exponential increase of the possible number of solution vectors of a Boolean equation caused by a linear increase of the number of variables is exponentially decreased by the utilization of ternary vectors. XBOOLE utilizes this possibility to decrease the memory needed and therefore uses the TVL as the main data structure.

Orthogonality. Two different binary vectors of the same variables are called disjoint or orthogonal. We re-use the term orthogonal as the property of two ternary vectors which do not simultaneously contain any binary vector. Vice versa, two ternary vectors are non-orthogonal if at least one binary vector belongs to both ternary vectors.

    two orthogonal                   two non-orthogonal
    ternary vectors                  ternary vectors
    x1 x2 x3 x4 x5                   x1 x2 x3 x4 x5
    1  0  −  −  1                    0  −  1  0  1
    −  1  1  −  1                    0  1  1  −  1

    no common binary vector          one common binary vector: 0 1 1 0 1

Figure 2.1. Orthogonality of ternary vectors.

Example 2.17. The left two ternary vectors of Figure 2.1 are orthogonal to each other. There is no common binary vector, due to the value x2 = 0 of the first ternary vector and the value x2 = 1 for the second ternary vector.

120

Efficient Calculations

Despite the smaller number of included binary vectors, the two right-hand side TVs of Figure 2.1 are not orthogonal to each other. There is no column for these TVs that contains the combination of the values 0 and 1. Both the replacement of the dash in the first TV by the value 1 and the replacement of the dash in the second TV by the value 0 lead to the same binary vector.

A TVL is called orthogonal if each pair of its TVs is orthogonal to each other. XBOOLE uses orthogonal TVLs because no binary vector can be included more than once in an orthogonal TVL. Additional advantages of orthogonal TVLs arise from their interpretation as a Boolean function.

Computation of TVLs. A TVL is not only a data structure to store a set of binary vectors [381]; these ternary vectors can also be directly manipulated to calculate the results of several set operations [312]. In this way, the benefit of the compact representation is re-used for the computation task, because all binary vectors which are represented by a single ternary vector are computed together without splitting them up into the merged binary vectors. Each ternary element ti is represented by two Boolean values ai and bi as shown in Table 2.1.

Table 2.1. Encoding of the ternary element ti by the two Boolean values ai and bi used in XBOOLE

    ti   ai   bi
    0    0    1
    1    1    1
    −    0    0

A hierarchical system of algorithms for the calculation of all set operations between TVLs is explained in [246, 312]. The key algorithm of this system is the test of whether two TVs are orthogonal to each other. Two ternary elements ti = (ai, bi) and tj = (aj, bj) are not orthogonal to each other if

    (ai ⊕ aj) ∧ bi ∧ bj = 0 .        (2.1)

Hence, three binary operations are sufficient to make this decision. Subsequently, the evaluated TVs are manipulated in the required manner.


Parallel Computation. The orthogonality can be provided by any pair of ternary elements of two TVs. If the number of variables is less than or equal to the number of bits within a machine word of the computer used, the check for orthogonality (2.1) is executed in XBOOLE in parallel for all elements of the TVs ti = (ai, bi) and tj = (aj, bj) using vector operations of the computer:

    (ai ⊕ aj) ∧ bi ∧ bj = 0 .        (2.2)

If (2.2) is satisfied, the two TVs ti = (ai, bi) and tj = (aj, bj) are not orthogonal to each other. This parallel computation of 32, 64, or even more bits, depending on the word length of the computer, contributes strongly to the power of XBOOLE. The parallel computation is used in XBOOLE for almost all operations with TVs. If, for instance, (2.2) detects two non-orthogonal TVs ti = (ai, bi) and tj = (aj, bj), then the TV tisc = (aisc, bisc) of the intersection of these vectors is calculated in XBOOLE using two simple vector operations:

    aisc = ai ∨ aj ,        (2.3)
    bisc = bi ∨ bj .        (2.4)
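A sketch of these encoded, word-parallel checks (2.2) to (2.4), using the encoding of Table 2.1 and Python integers in place of machine words; all names here are our own and not XBOOLE's API:

```python
ENC = {'0': (0, 1), '1': (1, 1), '-': (0, 0)}
DEC = {(0, 1): '0', (1, 1): '1', (0, 0): '-'}

def pack(tv):
    """Pack a ternary vector into the two code words a and b of Table 2.1."""
    a = b = 0
    for t in tv:
        ea, eb = ENC[t]
        a, b = (a << 1) | ea, (b << 1) | eb
    return a, b

def non_orthogonal(tv1, tv2):
    """Check (2.2): the two TVs are not orthogonal iff the expression is 0."""
    a1, b1 = pack(tv1)
    a2, b2 = pack(tv2)
    return ((a1 ^ a2) & b1 & b2) == 0

def intersection(tv1, tv2):
    """Checks (2.3) and (2.4): bitwise OR of the code words.
    Only meaningful for two non-orthogonal TVs."""
    a1, b1 = pack(tv1)
    a2, b2 = pack(tv2)
    a, b = a1 | a2, b1 | b2
    n = len(tv1)
    return ''.join(DEC[((a >> i) & 1, (b >> i) & 1)]
                   for i in range(n - 1, -1, -1))

# the two pairs of Figure 2.1
print(non_orthogonal('10--1', '-11-1'))   # False: orthogonal
print(non_orthogonal('0-101', '011-1'))   # True
print(intersection('0-101', '011-1'))     # '01101', the common binary vector
```

The intersection of the two non-orthogonal TVs of Figure 2.1 is exactly their one common binary vector, 0 1 1 0 1.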

Unlimited Number of Variables. Each ternary element ti is stored by two bits, according to Table 2.1, within the same position of the two machine words a and b of nm bits each. XBOOLE uses more than one such pair of machine words to store a single ternary vector if the solution of a problem requires np variables and np > nm. The number of machine words to store one ternary vector is called TYP. The value of TYP is restricted to the box size (see the paragraph XBOOLE Box System below). XBOOLE does not waste memory space and defines the box size adequately to the word length of the computer. The change of the box size only requires the change of a constant and the rebuilding of the XBOOLE library. In this way, an unlimited number of variables is provided by XBOOLE.

Restricted Sequential Computation. In the case of a large number of variables (np > nm), the decision about orthogonality may require (TYP/2)-times the evaluation of (2.2). However, the orthogonality of the evaluated ternary vectors is detected if (2.2) is not satisfied for one pair of machine words. Hence, this sequential computation is immediately interrupted as soon as it is known that the ternary vectors are orthogonal. The orthogonality of two ternary vectors can mostly be determined by the evaluation of the first pair of machine words due to the parallel check of nm variables. Therefore, XBOOLE also efficiently processes TVLs which depend on more than nm variables.

Space Concept. The required memory to store a single TV is equal to TYP machine words. This entails the conflict that a large value of TYP enables many Boolean variables, but a small value of TYP saves memory space. XBOOLE solves this conflict by means of the space concept. The user of XBOOLE can define an unlimited number of Boolean spaces by fixing the maximal number of Boolean variables for each of these spaces. In this way an extremely large number of Boolean variables can be handled within restricted memory resources. The space concept has one more beneficial effect. The Boolean variables can be assigned to each Boolean space in an arbitrary but fixed order. Each TVL is associated to exactly one Boolean space of XBOOLE. Hence, all TVLs of the same Boolean space have the same order of variables, such that the parallel computation of the ternary vectors can be executed without adjustment of the position of the variables. Almost all XBOOLE operations manipulate only TVLs of the same Boolean space. Only the XBOOLE operation SPACE TRANS transfers a TVL from one space into another by adjusting the columns and, if necessary, assigning additional variables to the target space.

XBOOLE Box System. The memory needed to store a single TVL can vary across a large range due to the exponential number of 2^n different binary vectors of n Boolean variables. XBOOLE does not force the user to predefine the expected memory for one TVL, but implicitly assigns the memory needed to each TVL using the implemented box system. XBOOLE requests an adequate amount of memory space from the operating system, divides this space into boxes of a fixed size, and


chains these boxes into a so-called empty list. If a TVL needs memory to store further ternary vectors, XBOOLE takes one more box from the empty list and adds this box to the chain of boxes of the TVL. If a TVL is no longer needed, the chain of its boxes is chained with the empty chain. Hence, the memory can be re-used without fragmentation into parts of different small sizes.

XBOOLE uses the term object for a single box or a chain of boxes and allows us to create different types of objects. As well as TVLs, XBOOLE also stores all information about a Boolean space within an XBOOLE object. An ordered set of variables, called Variables Tuple (VT), is another XBOOLE object that is used to control some XBOOLE operations. The type of each XBOOLE object is stored by a code within an exclusive position in the box. Three special types of XBOOLE objects are used to manage the XBOOLE system:

1. The variable list (VL) stores the names of all used Boolean variables.

2. The space list (SL) stores pointers to the XBOOLE objects of Boolean spaces.

3. The memory list (ML) stores pointers to TVLs, VTs, or other user-defined objects in the XBOOLE box system.

Form Predicate of TVLs. In addition to the representation of a set of binary vectors, a TVL can be used to represent a Boolean function by one of the four normal forms (the negations shown below are restored to match the TVLs given next, where the element 1 of a D-form or A-form TV denotes a positive literal and the element 0 a negated literal, and vice versa for the K-form and E-form):

1. Disjunctive form (D-form), e.g., f1(x) = x̄1 x̄3 ∨ x1 x̄2 ∨ x2 x3 ,

2. Antivalence form (A-form), e.g., f2(x) = x̄1 x̄3 ⊕ x1 x̄2 ⊕ x2 x3 ,

3. Conjunctive form (K-form), e.g., f3(x) = (x1 ∨ x3) ∧ (x̄1 ∨ x2) ∧ (x̄2 ∨ x̄3) ,

4. Equivalence form (E-form), e.g., f4(x) = (x1 ∨ x3) ⊙ (x̄1 ∨ x2) ⊙ (x̄2 ∨ x̄3) .


Each conjunction of a Boolean function in D-form or A-form is expressed by one ternary vector of the TVL. From the examples of the above enumeration we get the following TVLs:

            x1 x2 x3                   x1 x2 x3
            0  −  0                    0  −  0
    D(f1) = 1  0  −          A(f2) =   1  0  −
            −  1  1                    −  1  1

Similarly, each disjunction of a Boolean function in K-form or E-form is expressed by one ternary vector of the TVL. From the examples of the above enumeration we get the following TVLs:

            x1 x2 x3                   x1 x2 x3
            0  −  0                    0  −  0
    K(f3) = 1  0  −          E(f4) =   1  0  −
            −  1  1                    −  1  1
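A small check of why one orthogonal TVL can serve as both the D-form and the A-form (the rows are taken from the TVLs above; since the rows are pairwise orthogonal, at most one row matches any input, so the OR and the EXOR of the rows coincide):

```python
from itertools import product

TVL = [('0', '-', '0'), ('1', '0', '-'), ('-', '1', '1')]

def matches(row, x):
    """True if the binary vector x belongs to the ternary vector row."""
    return all(t == '-' or int(t) == xi for t, xi in zip(row, x))

for x in product((0, 1), repeat=3):
    hits = sum(matches(r, x) for r in TVL)
    assert hits <= 1                    # rows are pairwise orthogonal
    assert int(hits > 0) == hits % 2    # OR reading equals EXOR reading
print("D-form and A-form of this orthogonal TVL agree on all 8 inputs")
```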

All three pairs of conjunctions of f1 and f2 are orthogonal to each other. Hence, these two functions are identical and the operations ∨ and ⊕ can be exchanged without changing the function. XBOOLE prefers orthogonal TVLs and uses, in this case, the form predicate ODA.

                          x1 x2 x3
                          0  −  0
    ODA(f1) = ODA(f2) =   1  0  −
                          −  1  1

Similarly, all three pairs of disjunctions of f3 and f4 are orthogonal to each other. Hence, these two functions are identical and the operations ∧ and ⊙ can be exchanged without changing the function. XBOOLE prefers orthogonal TVLs and uses, in this case, the form predicate OKE.

                          x1 x2 x3
                          0  −  0
    OKE(f3) = OKE(f4) =   1  0  −
                          −  1  1

Portable Library as a Software Product. XBOOLE consists of more than 100 functions which are combined within a portable library which can be used to solve Boolean problems from a wide field of applications


efficiently. These optimized functions are implemented in C and are available as a software product for many platforms. The XBOOLE functions can be directly used in C and C++, but also in other programming languages. All properties of the XBOOLE functions and the rules for their application are documented in [102]. Many applications of XBOOLE, especially in logic design, are explained in [43]. The XBOOLE library uses the following groups for the available functions:

1. set operations (6),
2. derivative operations (6),
3. converting operations (14),
4. ternary matrix operations (13),
5. test operations (14),
6. operations for sets of variables (9),
7. operations to determine predicates (8),
8. operations to manage XBOOLE objects (12),
9. load and store operations (5),
10. operations to manage the XBOOLE box system (17), and
11. operations to manage XBOOLE errors (4).

XBOOLE-Monitor as a Boolean Pocket Calculator. For users with less experience in programming, the XBOOLE-Monitor has been developed. The XBOOLE-Monitor provides a simple possibility to use almost all functions of the XBOOLE library. The XBOOLE-Monitor can be used by everybody free of charge and can be downloaded from the web page: http://www.informatik.tu-freiberg.de/xboole/.

The XBOOLE-Monitor is intensively used for teaching in universities [331] and is outstandingly suitable for solving practical tasks in a wide field of Boolean applications. The XBOOLE-Monitor supports the user in the fast execution of Boolean calculations and in converting data, as well as in stimulating thinking about solution strategies. The book [325] contains many tasks and explains the solutions using the XBOOLE-Monitor.

2.1.3. Parallel and Serial Architectures

Here, we focus on Nvidia GPUs and the CUDA framework, which is maintained by Nvidia. CUDA can be downloaded for free at: https://developer.nvidia.com/ and supports Windows, Linux, and MacOS. CUDA programs only work with Nvidia GPUs, but the CUDA package also includes an Open Computing Language (OpenCL) compiler. OpenCL programs run on every platform whose manufacturer supports the OpenCL standard of the Khronos Group [169].

Programming GPUs has gained a lot of attention over the last decade. In late 2000 it became possible with DirectX/Direct3D 8.0 to execute one's own programs on the GPU as a General Purpose GPU (GPGPU), although the functionality was very limited. Over the last few years, APIs like CUDA or OpenCL have become easier and more flexible to use. Recently, many researchers have accelerated their simulations or experiments on the GPU. There have been many presentations where the GPU solves a given task much faster than the CPU. However, we will not remove the CPU from our computers. These processing units are converging by combining the strengths of both architectures. The CPU minimizes the latency of sequential workloads and the GPU maximizes throughput by parallelism. Which one is better? It depends on the problem that must be solved.

The different architectures are the answer to the question of how instructions and data for an algorithm can be processed. In 1966, Michael J. Flynn defined a taxonomy to distinguish architectures with

regard to the number of concurrent instructions and data streams [117]. The questions of this taxonomy are:

1. Are single instructions sequentially executed or are several instructions executed in parallel?

2. Is one instruction executed for a single data stream or for several data streams in parallel?

There are algorithms where it is not possible to leverage parallelization, because each instruction depends on the result of the previous instruction. Such a sequential program runs best on a single CPU core, which is skillful and fast in executing Single Instruction, Single Data (SISD) [117]. In contrast, the GPU requires parallel and concurrent programming and involves Single Instruction, Multiple Data (SIMD) and Single Program, Multiple Data (SPMD) [172, p. 51]. Hence, efficient GPU programming only makes sense for algorithms which have a sufficient amount of instruction or data parallelism.

The GPU is an optimized parallel architecture because its native task is highly parallel. GPUs were basically developed to accelerate the graphical representation of data on the screen and naturally have to transform geometry and texture data to colored pixels. Most of these computations are independent from each other due to the data locality. GPUs have been heavily optimized for such tasks over the years. This has been reached by throughput-oriented, many-core architectures with hundreds of compute cores. The main advantage of the GPU is the much higher number of compute cores and the higher memory bandwidth.

Table 2.2 shows the development of Nvidia GPU hardware over the last few years. We still use two C2070 Fermi GPUs from the Tesla series for scientific computing; they have 6 GB memory with Error-Correction Code (ECC) available and lower clocks for running long-term computations reliably. The GPUs GTX 680 and GTX 980 are desktop GPUs. They have less memory and no ECC, but higher clocks, since gaming is supposed to be a short-term activity. These cards are from newer generations with more cores, more transistors, higher Floating-Point Operations per Second (FLOPS), and less power consumption. Their architectures change as well, so that optimized codes


Table 2.2. GPU Specifications

                           C2070 GF100    GTX 680 GK104   GTX 980 GM204
                           Fermi GPU      Kepler GPU      Maxwell GPU
    Release                March 2010     May 2012        January 2015
    Cores                  448            1536            2048
    Base Clock             0.6 GHz        1 GHz           1.1 GHz
    Memory                 6144 MiB       2048 MiB        4096 MiB
    Bandwidth              144 GB/s       192.3 GB/s      224 GB/s
    Thermal Design Power   247 W          195 W           165 W
    Transistors            3.2 billion    3.54 billion    5.2 billion
    Fabrication            40 nm          28 nm           28 nm
    GFLOPS (SP)            1030           3090            4612
    Release Price          $349           $500            $549

for Fermi GPUs may have worse speedups on Kepler GPUs.

The porting of an efficient, sequential CPU program like XBOOLE into an equally efficient, parallel GPU program like XBOOLE-CUDA is no simple task. Due to the differences in architecture and programming paradigms between the CPU and the GPU, success strongly depends on the algorithm to be ported. According to Amdahl's Law [11], the theoretical maximal speedup is determined by the sequential part of a program, even when an infinite number of processor cores can be utilized. Consequently, significant speedups can only be reached on the GPU for programs with a large parallel part.

A GPU consists of several concurrently acting multiprocessors, as illustrated in Figure 2.2 for the Fermi architecture. Each of these multiprocessors is equipped with shared memory, caches, very fast thread-private registers, control units, execution pipelines, and many compute cores, which work in parallel.

[Figure 2.2: block diagram of the Fermi GPU with host interface, GigaThread engine, L2 cache, and six DRAM partitions; each multiprocessor contains an instruction cache, two warp schedulers with dispatch units, a register file (32 768 × 32-bit), CUDA cores with floating-point units, 16 load/store units, four special function units (SFU), an interconnect network, 64 KiB shared memory / L1 cache, and a uniform cache.]

Figure 2.2. Nvidia GPU Fermi architecture [240]; 1 KiB is 1024 bytes.

On Nvidia GPUs all computations are realized by lightweight threads, where each 32 threads are grouped as a warp. Instructions are executed by warps based on the SIMD architecture. Generally, the 32 threads of a warp execute the same instruction at the same time on multiple data. However, warp instructions are serialized by the hardware if divergent control flows, atomic operations, or memory conflicts are present. Nvidia calls this Single Instruction, Multiple Threads (SIMT). Such latencies can be hidden by executing other warp instructions, as long as there are free warps available for the warp schedulers; the methods used for latency hiding and for increasing the core occupancy are thread level parallelism and instruction level parallelism.

The GPU programmer has to manage many slow cores so that they do not obstruct each other, even though they have to share scarce resources. These different paradigms imply many challenges. The GPU programmer is responsible for work balance, cache coherency, and optimal utilization of the memory hierarchies; e.g., coalesced memory access is crucial on a GPU.
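Amdahl's Law mentioned above can be illustrated with a short calculation; this is a generic numeric sketch, not taken from the chapter:

```python
def amdahl_speedup(seq_fraction, cores):
    """Speedup of a program whose sequential part is seq_fraction,
    executed on the given number of cores: 1 / (s + (1 - s)/p)."""
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / cores)

# with a 10% sequential part, even 1024 cores stay below the cap of 10x
print(round(amdahl_speedup(0.10, 1024), 2))  # 9.91
```

Even an infinite number of cores cannot push the speedup beyond 1/s, which is why only programs with a large parallel part profit from the GPU.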


Efficient Calculations


Figure 2.3. CUDA memory access layout.

Figure 2.3 illustrates the access layout of the CUDA memory hierarchies. A block of threads is executed by a multiprocessor, and the threads have access to different memory layers. Communication between the CPU and the GPU requires data transactions between the main memory and the GPU device memory over the PCIe bus. This data transfer is limited by the bandwidth of the PCIe bus, e.g., 8 GB/s for PCIe v2.x.

Table 2.3. Types of GPU Memory

Memory     Position   Cached   Access   Scope
Register   On-chip    –        R/W      Thread
Shared     On-chip    –        R/W      Block
Local      Off-chip   Yes*     R/W      Thread
Global     Off-chip   Yes*     R/W      Global
Constant   Off-chip   Yes      R        Global
Texture    Off-chip   Yes      R        Global
Surface    Off-chip   Yes      R/W      Global

*) with L1/L2 caches since Fermi

Table 2.3 summarizes the properties of the different memory types on the GPU. The GPU itself has internal caches and memory layers


Figure 2.4. CPU and GPU: “Low Latency or High Throughput” [211].

arranged in order of decreasing size and latency. Constant, texture, and surface memory have their own caches with optimized access routines. Global and local memory accesses can be cached since the Fermi architecture (compute capability ≥ 2.0). Figure 2.4 summarizes the different architectures of the CPU and the GPU. While a single CPU core tries to minimize the latency of the instructions within the tasks Ti, the GPU multiprocessor aims for maximal throughput. A warp instruction may need more time due to the lower core frequency, but the pipeline is fed with instructions from different warps Wj to keep the processor busy at all times.

2.1.4. Efficient GPU Programming

Functions running on GPUs are called kernels. Since we are using CUDA for C, these functions are written in C with some CUDA-specific extensions. Such a function is a single program which is executed on multiple data on different multiprocessors of the GPU, so SPMD applies here as well. Instructions are executed in SIMT fashion on a multiprocessor, as explained above. The host (CPU) starts a kernel with a set of parameters including the number of threads per block and the number of blocks. The more multiprocessors a GPU has, the more blocks can be executed in parallel; see Figure 2.5.
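The block count the host passes at a kernel launch is typically derived from the problem size; a C sketch of this computation (the helper name and the 256-thread block size are illustrative choices, not part of XBOOLE-CUDA):

```c
/* Computes the number of thread blocks needed so that
 * blocks * threads_per_block covers all n data elements:
 * the classic ceiling division used for kernel launches. */
unsigned blocks_for(unsigned n, unsigned threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}
```

A kernel for n = 1000 elements with 256 threads per block is then launched with blocks_for(1000, 256) = 4 blocks; the surplus threads of the last block simply test their index against n and return.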



Figure 2.5. Scalable programming model in CUDA.

There are several issues which can have a significant impact on the performance of a kernel. For instance, a thread on a Fermi GPU can allocate at most 63 variables in 32-bit registers. A block contains at most 1024 threads. The shared memory and the L1 cache use the same memory banks; the total of 64 KiB can be configured such that the shared memory uses either 16 KiB or 48 KiB, and the L1 cache gets the remaining size. A single multiprocessor can allocate at most

    32 Threads/Warp × 48 Warps = 1536 Threads .

An overview of CUDA-specific key numbers is given in the cheat sheet in [366]. The speedup of a GPU vs. the CPU is defined by

    S = tAcpu / tAgpu ,

where tAcpu is the runtime of the algorithm Acpu on the CPU and tAgpu is the runtime of the algorithm Agpu on the GPU. After implementing all the kernels correctly, there is the question of how to make things faster. Optimizing kernels involves profiling to obtain performance indicators. However, while optimizing one of them, other performance indicators often become worse. An exploration in [355] shows that even the CUDA optimization strategies are just rules of thumb which do not fit in every case. The effort required for optimization rises after each improvement, while the gain becomes less and less. XBOOLE-CUDA still offers several opportunities for optimization. A kernel is ideally bound either by the peak compute performance or by the peak memory performance of the GPU. If a kernel has a low arithmetic intensity (which is the case for almost all XBOOLE-CUDA kernels), it is probably memory bound, and the memory bandwidth should be optimized for a maximal throughput of data. Hence, a big issue is coalesced memory access and the utilization of registers and caches. Valuable hints regarding optimal algorithms and programs on Nvidia GPUs are given in [370]. Details of the Nvidia Fermi GPU architecture used here are described in [240], and information about memory and instruction latencies of these GPUs is given in [355].
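The resource limits quoted above already allow a first-order occupancy estimate; a simplified C sketch using the Fermi numbers from this section (32 768 registers per multiprocessor, 48 warps of 32 threads). Real occupancy additionally depends on shared memory usage and block granularity, which this sketch ignores:

```c
/* First-order estimate of resident threads per Fermi multiprocessor:
 * limited either by the 32768-entry register file or by the
 * hardware limit of 48 warps * 32 threads = 1536 threads. */
unsigned resident_threads(unsigned regs_per_thread) {
    const unsigned reg_file = 32768;   /* 32-bit registers per SM */
    const unsigned hw_limit = 1536;    /* 48 warps of 32 threads  */
    unsigned by_regs = reg_file / regs_per_thread;
    by_regs -= by_regs % 32;           /* threads run in whole warps */
    return by_regs < hw_limit ? by_regs : hw_limit;
}
```

With the maximum of 63 registers per thread only 512 threads fit into a multiprocessor, a third of the hardware limit; latency hiding then has to come mostly from instruction level parallelism.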

2.1.5. Implementation XBOOLE-CUDA

M. Werner implemented the library XBOOLE-CUDA, which migrates XBOOLE and also utilizes multi-GPU platforms [367]. A wrapper decides, based on the size of the given TVLs, whether the CPU is used for small TVLs or the GPU accelerates the calculation for large TVLs. Additional operations allow the programmer to customize such decisions. Implementations for the CPU and the GPU are realized in XBOOLE-CUDA for all set operations, derivative operations, test operations, and the orthogonalization of TVLs (ORTH). In this way, all recent and future applications benefit from both the speedup of XBOOLE-CUDA and the simple implementation of Boolean algorithms on the high, domain-specific level. The migration from an XBOOLE program to an XBOOLE-CUDA program simply requires the replacement of the library. An overview of the data layout in XBOOLE is given before the parallelization of the intersection algorithm is discussed. A TVL object stores the Ternary Vectors (TVs) and the metadata of a TVL. A TV consists of two integer vectors a and b, since two bits are needed to encode one ternary element (see Table 2.1). If there are 128 variables, then ω = ⌈128/32⌉ = 4 (a, b) words and consequently


TYP = 2 · ω = 8 machine words are needed to store a single TV on a 32-bit system. A TVL is stored in the XBOOLE box system as a dynamically linked array, which contains TVs and pointers to the main memory addresses of the previous and the next box. Hence, it is not useful to copy these linked boxes to the GPU. Moreover, the size of a box is not appropriate for efficient GPU processing due to the small and varying number of TVs in a box. Also the data alignment of a TV on the CPU side is not efficient on the GPU. Hence, the TVL has to be transformed by the CPU before it is uploaded to the GPU. A TVL can require more than the available memory of a GPU. Dividing a TVL into so-called slices allows the computation of TVLs without any restriction of the number of contained TVs. With a slice, only a section of a TVL is transformed and uploaded to the GPU at a time. Slices reduce the parallelization effect and increase the overhead, so that the runtime usually grows. The optimal maximal size of a slice depends on the free memory, the allocation time, and the maximum number of threads a GPU kernel can process [367, p. 77]. The examples are run with at most 64 MiB per slice on 32-bit and 128 MiB on 64-bit systems; 1 mebibyte (MiB) is equal to 2^20 = 1,048,576 bytes. On a platform with n GPUs the bigger input TVL is divided into n parts, if both input TVLs are big enough. Such a part can be treated as a group of slices dedicated to a GPU device. The intersection is one of the easier operations to port; nevertheless, there are pitfalls that can decrease the speedup significantly. Algorithm 2.1 gives an insight into the steps of this operation. ac(w) is defined as the c-th a-word of the TV w, analogous to bc(w) for the c-th b-word. For the parallelization the “bigger” for-loop in lines 2 and 3 is chosen; otherwise there would be fewer threads having more sequential work, which yields a longer runtime.
At first glance, the test for non-orthogonality in line 7 seems to be independent of the other threads. However, this while-loop can be left early by some threads of a warp. This violates the SIMD pattern, i.e., the threads of a warp have to wait until the last one has finished the test. Making things even worse, the further instructions in lines 14 to 18 are


Algorithm 2.1. Intersection (ISC) on GPU

Input: TVL T1, TVL T2, both in ODA-form
Input: ω: number of (a, b) words of a ternary vector
Input: n1 = NTV(T1), n2 = NTV(T2)
Output: TVL R = T1 ∩ T2 in ODA-form

 1: r ← 0                             ▷ initial index value for result vectors
 2: for i = 1, …, n1 do               ▷ parallel-for if NTV(T1) ≥ NTV(T2)
 3:   for j = 1, …, n2 do             ▷ parallel-for if NTV(T1) < NTV(T2)
 4:     c ← ω
 5:     v ← T1,i
 6:     w ← T2,j
 7:     while ((ac(v) ⊕ ac(w)) ∧ bc(v) ∧ bc(w)) = 0 do
 8:                                   ▷ check non-orthogonality
 9:       c ← c − 1
10:       if c = 0 then               ▷ ternary vectors are not orthogonal
11:         break
12:       end if
13:     end while
14:     if c = 0 then                 ▷ in case of non-orthogonal ternary vectors
15:       r ← r + 1                   ▷ index of the new result vector
16:       for c = 1, …, ω do          ▷ store the intersection vector in R
17:         ac(Rr) ← ac(v) ∨ ac(w)
18:         bc(Rr) ← bc(v) ∨ bc(w)
19:       end for
20:     end if
21:   end for
22: end for

executed only for those threads which passed the test. Those threads that failed the test have to wait again. If the latency is very high, e.g., in the case of the DIF operation, sorting the TVLs can enable better speedups. This will be a task for our future work. Algorithm 2.1 requires, in line 15, a single atomic operation on the GPU. Hence, if there are many output vectors to be computed, then the GPU performs badly due to the serialization caused by the atomic operations. A significant pitfall is the data alignment of the (a, b) words on the GPU. XBOOLE stores the (a, b) words on the CPU one after another, see Figure 2.6 (a).
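Table 2.1 with the ternary encoding is not reproduced in this excerpt, so the following CPU-side C sketch assumes a common encoding ('0' → (a, b) = (0, 1), '1' → (1, 1), '−' → (0, 0)); with it, one (i, j) pair step of Algorithm 2.1 reads:

```c
#include <stdint.h>

#define OMEGA 4  /* (a,b) words per TV, e.g. ceil(128 variables / 32) */

typedef struct {
    uint32_t a[OMEGA];
    uint32_t b[OMEGA];
} TV;

/* One pair step of Algorithm 2.1: returns 1 and writes the
 * intersection of v and w into r if they are non-orthogonal,
 * returns 0 if some variable is '0' in one TV and '1' in the other. */
int tv_intersect(const TV *v, const TV *w, TV *r) {
    for (int c = 0; c < OMEGA; ++c)
        if (((v->a[c] ^ w->a[c]) & v->b[c] & w->b[c]) != 0)
            return 0;                    /* orthogonal: empty intersection */
    for (int c = 0; c < OMEGA; ++c) {
        r->a[c] = v->a[c] | w->a[c];     /* lines 17-18 of Algorithm 2.1 */
        r->b[c] = v->b[c] | w->b[c];
    }
    return 1;
}
```

Note that the early return in the first loop is exactly the divergence discussed above: on the GPU, the threads of a warp whose pairs fail the test at different word indices serialize each other.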



Figure 2.6. Memory alignment of ternary vector words—threads are accessing first slice: (a) serial alignment, (b) optimized alignment for the GPU.
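The realignment from layout (a) to layout (b) is essentially a transposition from vector-major to word-major order; a CPU-side C sketch of the idea (simplified: the a- and b-words are treated as one array here, while XBOOLE-CUDA interleaves them):

```c
#include <stddef.h>

/* Transposes n ternary vectors of omega words each from the serial
 * CPU layout in[i*omega + c] (vector-major) to the GPU-friendly
 * layout out[c*n + i] (word-major): thread i then finds word c of
 * its TV next to the words of its warp neighbors, enabling
 * coalesced memory access. */
void realign_for_gpu(const unsigned *in, unsigned *out,
                     size_t n, size_t omega) {
    for (size_t i = 0; i < n; ++i)
        for (size_t c = 0; c < omega; ++c)
            out[c * n + i] = in[i * omega + c];
}
```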

While the serial alignment (a) is good for sequential work, it fails for SIMD on the GPU. One warp instruction can be executed in a single run if the requested 32-bit words are coalesced. Otherwise, several transactions are needed for a single warp instruction, which can degrade the overall performance by almost this factor. Different alignments of the (a, b) words were examined for the intersection algorithm on a GTX 470 in [367, p. 62 ff.]. The best one is illustrated in Figure 2.6 (b); using this alignment, the GPU runtime could be reduced by a factor of 2.5 compared to the sequential alignment [367, p. 64, p. 77]. Since the second input TVL is accessed sequentially by a warp, the realignment of the (a, b) words only suits the first input TVL. Results from the profiler indicate further optimizations [367]. The speedup achieved by XBOOLE-CUDA strongly depends on the executed operation and the size of the data. In the best case a speedup of more than 13 · 10^3 was realized using a GPU-optimized brute force algorithm compared to a standard algorithm on the CPU. Table 2.4 summarizes the arithmetic average (∅), the standard deviation (σ), as well as the measured minimal and maximal speedup of XBOOLE-CUDA operations on a GPU GTX 470 in comparison to the same XBOOLE operations running on a CPU Intel i7 3.06 GHz. Italic values indicate the weakest value in each column.

Table 2.4. XBOOLE-CUDA 32-bit – Speedup with GTX 470 [367]

operation     ∅           σ           minimum     maximum
CPL()         5.04 ×      2.21 ×      2.11 ×      8.11 ×
ISC()         54.90 ×     7.48 ×      43.54 ×     67.32 ×
UNI()         34.25 ×     15.57 ×     13.83 ×     61.44 ×
DIF()         19.63 ×     15.03 ×     2.22 ×      43.15 ×
SYD()         17.61 ×     5.40 ×      11.02 ×     24.74 ×
CSD()         197.58 ×    287.55 ×    3.47 ×      843.54 ×
DERK()        29.95 ×     10.26 ×     19.90 ×     52.63 ×
MINK()        68.78 ×     30.31 ×     41.06 ×     136.75 ×
MAXK()        56.24 ×     16.80 ×     42.25 ×     95.97 ×
DERV()        18.84 ×     2.63 ×      16.24 ×     24.01 ×
MINV()        77.42 ×     22.93 ×     55.63 ×     115.05 ×
MAXV()        43.19 ×     6.36 ×      32.69 ×     52.09 ×
ORTH()        448.48 ×    728.29 ×    21.20 ×     2196.97 ×

2.1.6. Evaluation of XBOOLE-CUDA

The aim of the experiment in this subsection is the comparison of the time needed by the CPU or the GPU to solve a given Boolean problem. Knowing that the beneficial utilization of the GPU requires large problems, we selected the Bishop Problem [286] on a chessboard sized 11 × 10, which depends on 110 Boolean variables. The Boolean space B^110 contains 2^110 = 1.298 · 10^33 binary vectors. The Bishop Problem is defined as follows: Let a chessboard be given with m rows and n columns. Arrange a number of Bishops on the board, not attacking each other, but attacking all empty fields on the board.



Figure 2.7. The Bishop on field (2,3) attacks, on a 4 × 4 chessboard, five fields marked by circles.

Each of the 110 Boolean variables carries the information:

    xi,j = 1, if a Bishop is on the field (i, j),
    xi,j = 0, otherwise.                                          (2.5)

We explain the condition of the Bishop Problem using the simple example of Figure 2.7. A Bishop on the field (2, 3) attacks the fields (1, 2) and (3, 4) on one diagonal as well as the fields (4, 1), (3, 2), and (1, 4) on the second diagonal, which can be expressed by the following equation:

    C2,3 = x2,3 ∧ ¬x1,2 ∧ ¬x3,4 ∧ ¬x4,1 ∧ ¬x3,2 ∧ ¬x1,4 = 1 .     (2.6)

A maximal solution requires one Bishop on each diagonal. For the marked diagonal in Figure 2.7 from the field (1, 2) to the field (3, 4) we get the requirement rule

    Ri = C1,2 ∨ C2,3 ∨ C3,4 = 1 .                                 (2.7)

Such requirement rules Ri must hold for the selected chessboard sized 11 × 10 for all 36 diagonals. Hence, all solutions of the Bishop Problem belong to the solution set of the Boolean equation:

    ⋀_{i=1}^{2(m+n−3)} Ri = 1 .                                   (2.8)

There are 36 functions Ri in the special case of a chessboard sized 11×10. These functions have a disjunctive form. Using the XBOOLE operation ORTH we get 36 TVLs ODA(Ri ). All solutions from the


Bishop Problem result from a sequence of 35 XBOOLE intersection (ISC) operations according to (2.8) for m = 11 and n = 10. There are 66,049 solutions of the 11 × 10 Bishop Problem; 65,536 solutions contain 18 Bishops on the board, 512 solutions contain 19 Bishops, and there is a single solution with 20 Bishops. Figure 2.8 shows one solution for each of these numbers of Bishops. More details about Bishop Problems for other sizes of chessboards are given in [327].
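For the small 4 × 4 board of Figure 2.7 the two defining conditions (no Bishop attacks another, and every empty field is attacked) can still be checked by plain brute force over all 2^16 placements; a C sketch, independent of the XBOOLE/TVL approach used in the experiments:

```c
#define M 4
#define N 4

/* Bishops at (r1,c1) and (r2,c2) share a diagonal iff
 * |r1-r2| == |c1-c2| (and the squares are distinct). */
static int same_diag(int r1, int c1, int r2, int c2) {
    int dr = r1 - r2, dc = c1 - c2;
    if (dr < 0) dr = -dr;
    if (dc < 0) dc = -dc;
    return dr == dc && dr != 0;
}

/* board is a bitmask: bit (r*N + c) set <=> Bishop on field (r,c).
 * A solution: no two Bishops attack each other AND every empty
 * field is attacked by some Bishop (blocking is ignored, in
 * accordance with the conditions C_{i,j} above). */
int is_solution(unsigned board) {
    for (int r1 = 0; r1 < M; ++r1)
        for (int c1 = 0; c1 < N; ++c1) {
            int occ1 = (board >> (r1 * N + c1)) & 1;
            int attacked = 0;
            for (int r2 = 0; r2 < M; ++r2)
                for (int c2 = 0; c2 < N; ++c2)
                    if (((board >> (r2 * N + c2)) & 1) &&
                        same_diag(r1, c1, r2, c2)) {
                        if (occ1) return 0;  /* Bishops attack each other */
                        attacked = 1;
                    }
            if (!occ1 && !attacked) return 0;  /* unattacked empty field */
        }
    return 1;
}
```

Enumerating all 65,536 boards with is_solution makes the compression achieved by the TVL representation tangible: the brute force touches every assignment, while a single ternary vector covers many of them at once.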


Figure 2.8. Selected solutions of the Bishop Problem on a chessboard sized 11 × 10: (a) one of 65,536 positionings with 18 Bishops; (b) one of 512 positionings with 19 Bishops; (c) the single positioning with 20 Bishops.


Depending on the associated diagonal line, the TVLs of the functions Ri in (2.8) contain only two to ten TVs. The intersection operations of (2.8) yield larger TVLs. We calculated these TVLs using the XBOOLE intersection operation (ISC) for the TVLs Ri in the given order on the CPU until a fixed number of TVs occurred in the intermediate solution. We used the values 10^k, k = 2, …, 7, as limits. A larger value of this limit increases the number of TVs in the TVLs for the remaining ISC operations, but decreases the number of required ISC operations that are executed for comparison either on the CPU or on the GPU. Table 2.5 summarizes the experimental results. The column nres contains the numbers of TVs calculated by the ISC operation between the TVL f1 and the TVL f2. The columns nf1 and nf2 contain the associated numbers of TVs. It can be seen that the result fres of the current ISC operation is used as the TVL f1 in the subsequent ISC operation. The column tgpu holds the GPU time for executing the operation gpu ISC() in XBOOLE-CUDA, analogous to the CPU time tcpu for cpu ISC(). Regardless of the chosen value of the limit, the same number of 66,049 solutions is calculated. This number is very small in comparison to all 2^110 = 1.298 · 10^33 possible assignments of Bishop pieces to the 11 × 10 chessboard. The ratio of the number of solutions constitutes only 5.09 · 10^−29. Of course, limit = 10^2 is the best configuration due to the smaller complexity. However, there are XBOOLE programs with much higher complexity, and by using limit = 10^7 the performance for such programs can be demonstrated. The ISC operation between TVLs of 19,605,640 and 1,902,160 TVs needs more than 2 days and 19 hours on the CPU. Using the GPU it is finished after 18 minutes (see Table 2.5). The ISC operation can have two opposite effects. The result of the intersection of two different sets of binary vectors that depend on the same variables will be no larger than each of these sets.
Sets of TVs which depend on the same variables behave similarly, but decompressing TVs can slightly enlarge the number of result vectors. However, the secondary effect of the intersection appears when the used TVLs depend on disjoint sets of variables; in this case, the cross product of the given TVLs must be calculated, so that each TV of the first TVL is combined with each TV of the second TVL.


Table 2.5. Intersections of TVLs to solve the 11 × 10 Bishop Problem: X5650 – XBOOLE vs. C2070 – XBOOLE-CUDA 32-bit

limit   nf1    nf2    nres    nf1 · nf2    tcpu in ms   tgpu in ms (ngpu)
10^2    190    198    6399    3.8 · 10^4   0.8          26.8 (1)
…

Table 2.6 gives more details about the GPU runtime of Table 2.5. At smaller runtimes (tgpu < 3 s) the work is mostly done by the CPU. The average total time tk per GPU of the executed kernels is much lower than the total runtime; the time used for initialization, data transfer, and transformation becomes visible. When the input TVL is divided into slices, the number nk of executed kernels increases. This is also the case when there are too many output vectors and the output TVL must be partitioned as well. The two timelines in Figure 2.9 are captured from the Nvidia profiler. They show the activities of the first and the second GPU. Each of the three rows within a device context shows, top-down, the time to copy data from the host to the device, the time to copy data from the device to the host, and the compute time. Figure 2.9 (a) presents the timeline for the configuration limit = 10^3


with a duration of about 10 s. It can be seen that the time intervals between working kernels are not completely filled with data transfers between the host and the device. The preparation of the data on the CPU to speed up the next running kernel on the GPU requires some of this time. Figure 2.9 (b) uses a much larger scale to show the same information for the largest TVLs for the value limit = 10^7 with a duration of about 18 min. The almost continuous bars in Figure 2.9 (b) depict that both GPUs spend most of the time calculating the solution. Table 2.7 confirms by numbers from the profiler that the utilization of the GPU kernels grows for larger TVLs. While for limit = 10^2 the GPUs are busy only 8 % of the time executing gpu ISC, the utilization rises to 99 % at limit = 10^7. Each invocation of the gpu ISC operation causes memory allocations on the GPU that consume some time. The shorter the runtime and the more invocations of gpu ISC, the greater the influence of these allocations. Table 2.8 contains the runtime measured with the 64-bit compiled application instead of the 32-bit one as in Table 2.5. Table 2.9 compares the measurements of these two tables. Of course, the CPU performs much better with the 64-bit compiled application, whereas the GPU, as a 32-bit architecture, becomes worse. The PCIe 2.0 bus of the computer used for the experiments has a theoretical maximal bandwidth of 5 GB/s. The average bandwidth measured in Table 2.7 achieved approximately 80 % of that maximum. However, for bigger data sets these memcpy operations have no impact on the runtime, since the bottleneck is the data processing on the GPU itself. The improvement of the runtime with the 64-bit application is quite common for compute-intensive tasks. A 64-bit compiled application has access to more general purpose registers than a 32-bit application on a 64-bit CPU.
Running a 32-bit application on a 64-bit Windows also involves the WoW64 (Windows-on-Windows 64-bit) emulation layer, which imposes a small penalty on the runtime as well [271]. On the GPU, there are only 32-bit registers. A 64-bit integer must be


Table 2.7. Profiler results on C2070 GPUs w.r.t. Table 2.5

Limit level           10^2        10^3        10^5        10^7
Total time            9969 ms     10 127 ms   63 893 ms   1 059 950 ms
gpu ISC()             7747 ms     7929 ms     61 977 ms   1 052 439 ms
. . . invocations     10          6           3           1
. . . kernels part    8 %         32 %        88 %        99 %

Tesla C2070 – GPU device #1
Kernel time           1001 ms     3737 ms     59 754 ms   1 049 029 ms
. . . invocations     61          43          27          12
GPU Allocations       349 ms      133 ms      69 ms       19 ms
Memcpy HtoD           261 ms      185 ms      114 ms      171 ms
. . . invocations     79          50          29          18
. . . size            1154 MB     833 MB      488 MB      648 MB
. . . bandwidth       4430 MB/s   4400 MB/s   4180 MB/s   3700 MB/s
Memcpy DtoH           338 ms      244 ms      173 ms      0 ms
. . . invocations     38          25          18          6
. . . size            1311 MB     1002 MB     726 MB      1 MB
. . . bandwidth       3880 MB/s   4020 MB/s   4090 MB/s   3230 MB/s

Tesla C2070 – GPU device #2
Kernel time           831 ms      3595 ms     56 994 ms   1 051 208 ms
. . . invocations     52          39          18          12
GPU Allocations       388 ms      82 ms       43 ms       18 ms
Memcpy HtoD           246 ms      180 ms      115 ms      158 ms
. . . invocations     69          48          27          18
. . . size            1096 MB     825 MB      475 MB      648 MB
. . . bandwidth       4460 MB/s   4470 MB/s   4040 MB/s   4000 MB/s
Memcpy DtoH           252 ms      164 ms      58 ms       0 ms
. . . invocations     29          23          9           6
. . . size            963 MB      656 MB      190 MB      1 MB
. . . bandwidth       3740 MB/s   3910 MB/s   3200 MB/s   3690 MB/s

represented by two 32-bit integers and some additional instructions. Hence, the kernel runtime becomes worse. In Table 2.9, tgpu and tgpu,64 measure the time for the whole ISC operation including the CPU processing, so the negative impact of the 32-bit GPU becomes visible for limit ≥ 10^4. This issue can be solved by adapting the transformation algorithms to support a 64-bit/32-bit conversion, so that the GPU always receives 32-bit data.


Table 2.8. Intersections of TVLs to solve the 11 × 10 Bishop Problem: X5650 – XBOOLE vs. C2070 – XBOOLE-CUDA 64-bit

limit   nf1          nf2         nres         nf1 · nf2     tcpu in ms      tgpu in ms (ngpu)
10^2    190          198         6399         3.8 · 10^4    0.3             65.8 (1)
        6399         150         88 368       9.6 · 10^5    5.4             71.3 (1)
        88 368       150         1 182 923    1.3 · 10^7    81.4            154.6 (1)
        1 182 923    198         9 535 372    2.3 · 10^8    1247.5          720.8 (1)
        9 535 372    190         15 720 600   1.8 · 10^9    5098.8          1196.7 (2)
        15 720 600   520         20 729 155   8.2 · 10^9    14 905.6        1703.4 (2)
        20 729 155   296         19 582 122   6.1 · 10^9    11 911.3        1368.0 (2)
        19 582 122   130         7 989 539    2.5 · 10^9    5211.1          795.5 (2)
        7 989 539    240         653 235      1.9 · 10^9    3638.1          365.0 (2)
        653 235      12          66 049       7.8 · 10^6    23.0            72.9 (1)
Total   –            –           –            –             43 398.0        7829.4

10^3    1620         3300        256 162      5.3 · 10^6    30.2            93.7 (1)
        256 162      3080        9 535 372    7.9 · 10^8    2376.4          728.2 (1)
        9 535 372    1235        17 509 360   1.2 · 10^10   27 527.8        1331.0 (2)
        17 509 360   2450        21 104 003   4.3 · 10^10   87 411.2        2549.2 (2)
        21 104 003   2730        5 864 959    5.8 · 10^10   110 876.7       2663.9 (2)
        5 864 959    324         66 049       1.9 · 10^9    3871.5          317.7 (2)
Total   –            –           –            –             233 392.3       8871.5

10^4    12 640       14 520      2 662 184    1.8 · 10^8    763.7           262.8 (1)
        2 662 184    14 094      19 187 800   3.8 · 10^10   91 100.1        2012.2 (2)
        19 187 800   18 574      12 445 566   3.6 · 10^11   822 211.6       13 007.8 (2)
        12 445 566   5184        66 049       6.5 · 10^10   121 644.7       2548.9 (2)
Total   –            –           –            –             1 036 976.0     18 815.4

10^5    256 162      186 204     17 509 360   4.8 · 10^10   110 461.1       4331.7 (1)
        17 509 360   112 448     12 445 566   2.0 · 10^12   4 433 323.0     70 476.5 (2)
        12 445 566   5184        66 049       6.5 · 10^10   121 746.4       2559.6 (2)
Total   –            –           –            –             4 666 744.5     78 443.8

10^6    1 182 923    1 240 264   19 582 122   1.5 · 10^12   6 992 835.0     108 592.8 (1)
        19 582 122   51 840      66 049       1.0 · 10^12   2 266 480.5     35 975.1 (2)
Total   –            –           –            –             9 260 648.0     145 827.0

10^7    19 605 640   1 902 160   66 049       3.7 · 10^13   214 929 488.0   1 345 034.0 (2)
Total   –            –           –            –             214 934 240.0   1 349 975.1

0 in the expression. Given the ESOPs minimized with EXORCISM, the time required by the proposed procedure (considering all the 65,536 functions) is about 15 seconds. The experimental results show that over 10% of the functions benefit from our strategy. For these functions the cost is reduced by about 30%. Finally, we can note that while the total number of gates increases, the total cost of the final circuits decreases. A significant subset of the results of the second set of experiments is shown in Table 3.13. We have divided the set of variables into two parts where, for a function with n variables, P1 contains the first ⌈n/2⌉ variables and P2 contains all the others. The first column in Table 3.13 reports the name of the benchmark considered. The following two columns provide the size, i.e., the total number of products that contain at least a variable of P1 and at least

Minimization of ESOP Forms for Secure Computation


Table 3.13. Comparison between ESOP and SC-ESOP for some benchmark functions

Benchmark   ESOP    SC-ESOP   gain
addm4       133     126       7
adr4        38      38        0
al2         116     110       6
alcom       86      75        11
apla        142     139       3
b10         198     194       4
b2          707     694       13
dekoder     23      18        5
exp         300     292       8
exps        1850    1773      77
f51m        23      23        0
fout        283     218       65
gary        225     220       5
ibm         175     175       0
in0         225     220       5
in1         707     694       13
lin         1615    1488      127
log8mod     37      37        0
luc         174     167       7
m181        22      22        0
m1          55      53        2
m2          141     135       6
m3          184     175       9
m4          279     270       9
mainpla     3858    3858      0
max1024     268     266       2
max128      248     240       8
p82         94      90        4
pdc         727     719       8
pope        888     728       160
prom2       4180    4050      130
radd        39      37        2
test1       548     547       1
test2       43438   43420     18
test4       5265    4954      311
tial        520     518       2
ti          500     471       29
tms         183     173       10


Several Aspects of Security

a variable of P2, counted in the original ESOP form and in the new SC-ESOP network, respectively. Then, the last column reports the gain of the SC-ESOP network. The SC-ESOP network can be significantly smaller than the original ESOP form; see, for example, the benchmark functions fout and dekoder that show about 22% gain. The experimental results show that the SC-ESOP synthesis can thus be used as a post-processing phase, after the ESOP minimization, for constructing a compact network for two-party secure computation.
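The quantity tabulated in Table 3.13, i.e., the number of products that mention variables of both P1 and P2, can be counted directly when each product is represented by the support mask of its variables; a hedged C sketch (the mask representation is our own illustration, not the data structure of this work):

```c
#include <stdint.h>
#include <stddef.h>

/* Counts the products whose support contains at least one variable
 * of P1 (mask p1) and at least one variable of P2 (mask p2).
 * support[k] has bit i set iff product k uses variable x_i
 * (positive or negated). */
size_t count_spanning_products(const uint32_t *support, size_t n,
                               uint32_t p1, uint32_t p2) {
    size_t cnt = 0;
    for (size_t k = 0; k < n; ++k)
        if ((support[k] & p1) && (support[k] & p2))
            ++cnt;
    return cnt;
}
```

For n = 4 variables with P1 = {x0, x1} (p1 = 0x3) and P2 = {x2, x3} (p2 = 0xC), the product x0x2 (support 0x5) spans both parts, while x0x1 (support 0x3) does not.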

Determination of Almost Optimal Check Bits


3.4. Almost Optimal Check Bits for an Arbitrary Error Model

Günther Nieß        Thomas Kern        Michael Gößel

3.4.1. Error Detection Codes

Codes are used for error detection and correction. In this section we will only be dealing with error detection. It is well-known that different codes detect different errors with different probabilities. Thus a parity code with a single parity check bit cP determined from k data bits u1, u2, …, uk as

    cP = u1 ⊕ u2 ⊕ · · · ⊕ uk

detects all 1-bit, 3-bit, 5-bit, … errors but no 2-bit, 4-bit, 6-bit, … errors. On the other hand, a single check bit cSP defined by a perfect nonlinear function

    cSP = u1 u2 ⊕ u3 u4 ⊕ · · · ⊕ uk−1 uk

detects almost all error vectors with the same probability [133, 158, 362].
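The two check bits can be compared directly on a small example; a C sketch (the 4-bit data word in the note below is an arbitrary choice):

```c
/* Linear parity check bit cP = u1 xor u2 xor ... xor uk. */
int parity_check(const int *u, int k) {
    int c = 0;
    for (int i = 0; i < k; ++i)
        c ^= u[i];
    return c;
}

/* Nonlinear check bit cSP = u1 u2 xor u3 u4 xor ... xor u(k-1) uk
 * (k assumed even). */
int nonlinear_check(const int *u, int k) {
    int c = 0;
    for (int i = 0; i + 1 < k; i += 2)
        c ^= u[i] & u[i + 1];
    return c;
}
```

For k = 4, the 2-bit error changing (1, 0, 1, 1) into (0, 0, 0, 1) leaves cP unchanged (both words have odd weight) but flips cSP from 1 to 0, so it is missed by the parity code and detected by the nonlinear check bit.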

There are 2^(2^k) different Boolean functions f1, f2, …, f_(2^(2^k)) of k data bits

    ci = fi(u1, u2, …, uk),  i ∈ {1, 2, …, 2^(2^k)},

to determine a single check bit ci from the k data bits and to optimize the error detection for a given error model. In this section we show how optimal check bits can be determined for an arbitrary given error model. Errors are modeled by an error graph where the vertices of the error graph are n-ary binary words. Two vertices v1 and v2 are connected by an edge from v1 to v2 if an error of the error model changes the bits of v1 into the bits of v2. This simple


error model allows us to adequately express all combinatorial errors caused by different fault mechanisms. In general it is also possible to assign the probabilities of the corresponding errors to the edges. In this section we assume that the edges are not weighted. An optimal check bit for a given error model is determined by an algorithm, based on graph theory, that minimizes the number of edges of the error graph. Since the edges correspond to the different errors of the error model, the number of remaining errors after determining the optimal check bits is minimal. This section also shows how an additional optimal check bit can be added to the check bits of an already defined code. The typical error models of arbitrary errors, unidirectional errors, 2-bit errors, and adjacent 2-bit errors are considered [120, 174]. Unexpectedly, besides well-known codes, e.g., parity prediction and Bose-Lin codes, a variety of other codes with the same error detection capability exists. It can be demonstrated that the proposed approach is an effective method for the determination of optimal check bits for given error models. This section is organized as follows. The error model is introduced in Subsection 3.4.2, which shows how different types of faults can be expressed by the error graph. Subsection 3.4.3 demonstrates how the error model is used to define an error detection code or to assign additional check bits to a given code. To also be able to handle larger numbers of data bits, we introduce a heuristic for the determination of an additional check bit. An example application of the error graph and of the heuristic to assign additional check bits is presented in Subsections 3.4.4 and 3.4.5 with double-bit errors and byte parity codes.
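Such error graphs are small enough to enumerate exhaustively; a C sketch counting the undirected edges of the error graph of Figure 3.19 (unidirectional 2-bit errors on 4-bit words; the count of 24 follows from the 6 two-bit position pairs times the 4 words that are 0 on both positions):

```c
/* Enumerates the undirected edges of the error graph for
 * unidirectional 2-bit errors on words of `bits` bits:
 * v and w = v | mask are connected whenever mask has exactly
 * two set bits and v is 0 on both of them (0,0 -> 1,1 upward
 * and 1,1 -> 0,0 downward form the same undirected edge). */
int count_unidirectional_2bit_edges(int bits) {
    int edges = 0;
    for (unsigned mask = 0; mask < (1u << bits); ++mask) {
        int pop = 0;                       /* popcount of mask */
        for (int i = 0; i < bits; ++i)
            pop += (mask >> i) & 1;
        if (pop != 2)
            continue;                      /* only 2-bit error patterns */
        for (unsigned v = 0; v < (1u << bits); ++v)
            if ((v & mask) == 0)           /* both error positions are 0 */
                ++edges;
    }
    return edges;
}
```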

3.4.2. Error Model

The error model is described by an error graph. The vertices of the error graph are n-ary binary words. If an error exists altering a word v into a word v′, the vertex v is connected with the vertex v′ by a directed edge from v to v′. If there also exists an error changing v′ into v, the vertices v and v′ are connected by an undirected edge. If the



Figure 3.19. Error graph for block codes of length 4 with unidirectional 2-bit errors.

error appears with the probability p, the edge from v to v′ has the weight p. We assume in this section that the edges are not weighted. Based on the assumption that for every error e which changes the word v into v′ an error exists which transforms v′ into v, the graph is undirected. An error graph for the block length 4 and unidirectional 2-bit errors is given in Figure 3.19. A unidirectional error is an error where only 0 values are corrupted into 1s or only 1s are corrupted into 0s. For example, the two unidirectional errors which modify (0, 1, 0, 0) into (1, 1, 1, 0) and also (1, 1, 1, 0) into (0, 1, 0, 0) are expressed in Figure 3.19 by the edge connecting the vertices (0, 1, 0, 0) and (1, 1, 1, 0). In electrical circuits there are faults changing the correct behavior of the circuit only if some conditions are fulfilled. The single stuck-at fault model is the most frequently used fault model of integrated circuits. For example, a stuck-at-0 fault on a line only changes the behavior if the line should carry a 1. It is well-known that a single

258

Several Aspects of Security

1,1,0

1,0,0

0,1,0

1,1,1

0,0,0

1,0,1

0,1,1

0,0,1

Figure 3.20. Error graph for two stuck-at-0 faults.

stuck-at fault leads to a functional error in only half of the possible configurations. But if there are two stuck-at faults then only in 25% of the possible inputs will these two lines be erroneous, in 50% there will be a single-bit error, and in 25% the fault won’t lead to a functional error. Example 3.33. Stuck-at faults can easily be modeled by an error graph as explained in Figure 3.20. We consider three bits u1 , u2 , u3 . The error graph has the 8 vertices (0, 0, 0), (0, 0, 1), . . . , (1, 1, 1). If we assume that u1 and u2 are stuck-at-0 we obtain the error graph of Figure 3.20. The vertices (1, 1, 0), (1, 0, 0), (0, 1, 0) are erroneously changed into the word (0, 0, 0). Also the data words (1, 1, 1), (1, 0, 1), (0, 1, 1) are changed into (0, 0, 1). Therefore, the vertices (1, 1, 0), (1, 0, 0), (0, 1, 0) are connected by an edge to the vertex (0, 0, 0), and the vertices (1, 1, 1), (1, 0, 1), (0, 1, 1) are connected by an edge with the vertex (0, 0, 1) in the error graph. As another example we consider a bridging fault. Example 3.34. In the simplest case of a non-resistive bridging fault a resistance between the connected lines is neglected. Depending on the circumstances a wired OR or a wired AND may result. The circuit shown in Figure 3.21 performs, as an example, y = x3 in the Galois

Determination of Almost Optimal Check Bits

259

x1

y1 1 2

x2

y2

y3

x3

Figure 3.21. Non-resistive bridging fault example. 0,0,0

0,0,1

1,0,0

1,0,1

0,1,0

0,1,1

1,1,0

1,1,1

Figure 3.22. Non-resistive bridging fault error graph.

field GF(23 ) y1 = (x2 ∨ x3 ) ⊕ x1 , y2 = (x1 ∧ x2 ) ∨ (x1 ∧ x3 ) , y3 = (x1 ∧ x2 ) ⊕ x3 . Assuming a bridging fault between the lines 1 and 2 and that a bridging fault is functionally described by a wired OR, the resulting erroneous functions are y1 = (x2 ∨ x3 ) ⊕ x1 , y2 = x2 ∨ x3 , y3 = (x1 ∧ x2 ) ⊕ x3 . So only the inputs (x1 , x2 , x3 ) = (1, 1, 0) and (0, 0, 1) result in erroneous outputs (y1 , y2 , y3 ) = (0, 1, 1) and (1, 1, 1). The error graph of the bridging fault is shown in Figure 3.22. The vertices of the error graph are outputs of the circuit shown in Figure

260

Several Aspects of Security

3.21. The fault free outputs (0, 0, 1) and (1, 0, 1) of the circuit are modified into the faulty outputs (0, 1, 1) and (1, 1, 1) by the considered bridging fault. Other outputs aren’t affected and have therefore no connecting edge.
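The two error-graph constructions just described can be sketched in a few lines. This is a minimal illustration with names of our own choosing, not code from the section: it builds the undirected graph of Figure 3.19 (unidirectional 2-bit errors on 4-bit words) and the directed graph of Figure 3.20 (two stuck-at-0 faults on three bits).

```python
from itertools import combinations

def unidirectional_2bit_edges(n):
    """Undirected error graph: an edge joins two n-bit words that differ
    by a unidirectional 2-bit error (two 0->1 flips, or equivalently two
    1->0 flips in the opposite direction)."""
    edges = set()
    for w in range(2 ** n):
        v = tuple((w >> p) & 1 for p in range(n))
        zeros = [p for p in range(n) if v[p] == 0]
        for i, j in combinations(zeros, 2):
            u = list(v)
            u[i] = u[j] = 1
            edges.add(frozenset([v, tuple(u)]))
    return edges

def stuck_at0_edges(n, stuck):
    """Directed error graph: edge v -> v' when forcing the positions in
    `stuck` to 0 changes the word v into a different word v'."""
    edges = set()
    for w in range(2 ** n):
        v = tuple((w >> p) & 1 for p in range(n))
        faulty = tuple(0 if p in stuck else b for p, b in enumerate(v))
        if faulty != v:
            edges.add((v, faulty))
    return edges
```

For block length 4 the unidirectional graph contains, among others, the edge between (0, 1, 0, 0) and (1, 1, 1, 0) mentioned in the text, and the stuck-at graph reproduces the six edges of Figure 3.20.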

3.4.3. Determination of Additional Check Bits

In this subsection we describe how additional check bits for error detecting block codes can be determined. We consider two cases.

1. In the first case we determine m optimal check bits c1, c2, ..., cm for k data bits u1, u2, ..., uk. Thereby, we assume that there are 2^k different k-tuples of data bits. When the check bits c1, c2, ..., cm are determined by the k-ary Boolean functions f1, f2, ..., fm from the data bits u1, u2, ..., uk as

c1 = f1(u1, u2, ..., uk) ,
c2 = f2(u1, u2, ..., uk) ,
...
cm = fm(u1, u2, ..., uk) ,

the data bits and the corresponding check bits u1, u2, ..., uk, c1, c2, ..., cm form a code word of the determined code C, which is optimal for the detection of the errors modeled by the error graph.

2. In the second case we assume that the bits (v1, v2, ..., vn) are to be protected by additional check bits. We assume the bits (v1, v2, ..., vn) are already code words of an error detecting or error correcting code Cout. In this case we add optimal additional check bits c1, c2, ..., cm to improve the error detection capability of the code Cout. The check bits are determined as

c1 = g1(v1, v2, ..., vn) ,
c2 = g2(v1, v2, ..., vn) ,
...
cm = gm(v1, v2, ..., vn)

for (v1, v2, ..., vn) ∈ Cout. The code Cout can be considered as an outer code, and when the check bits c1, c2, ..., cm are determined by these Boolean functions, the bits (v1, v2, ..., vn, c1, c2, ..., cm) form a codeword of the determined inner code Cin.

Now we consider the first case and assume that only a single check bit is determined. The determination of more than one check bit is based on the same approach and is not described in this section. For a given data word (u1, u2, ..., uk) there are two vertices (u1, u2, ..., uk, 0) and (u1, u2, ..., uk, 1) with the same data word and with different values of the check bit in the initial error graph. These two vertices, which differ only in the check bit, are called a pair of complementary vertices. To determine the code C, one of the vertices (u1, u2, ..., uk, 1) or (u1, u2, ..., uk, 0) has to be removed from the initial error graph for every data word (u1, u2, ..., uk) ∈ {0, 1}^k. If the vertex (u1, u2, ..., uk, 1) is removed from the error graph, we have 0 = f1(u1, u2, ..., uk), and if (u1, u2, ..., uk, 0) is removed, 1 = f1(u1, u2, ..., uk) holds. To determine the Boolean function f1, one of the vertices of every pair of complementary vertices has to be removed. This can be done in 2^(2^k) different ways. For small values of k, k ≤ 6, it is possible to consider all possibilities and to determine the best codes for the considered examples. For larger values of k a heuristic approach was developed. At every step of the algorithm a vertex of a pair of complementary vertices is removed from the error graph. It is reasonable to remove a vertex with a maximal number of edges connected to this vertex.
Thereby, we also take into account the effect that the removal of the complementary vertex would have, and we define the rank of a vertex v = (u1, u2, ..., uk, c) as

rank(v) = |edges(u1, u2, ..., uk, c̄)| − |edges(u1, ..., uk, c)| .


Thereby, |edges(v)| is the number of incoming and outgoing edges of the vertex v in the current error graph. In every step the algorithm deletes a vertex with a maximal rank, and a new codeword of code C is determined; after 2^k steps code C is complete.

Example 3.35. The vertex (1, 0, 0, 1) in Figure 3.19 has rank 1 since the vertex (1, 0, 0, 1) is connected by two edges with the two vertices (0, 0, 0, 0) and (1, 1, 1, 1), and its complementary vertex (1, 0, 0, 0) is connected to the three vertices (1, 1, 1, 0), (1, 0, 1, 1) and (1, 1, 0, 1); so, according to the definition of the rank, we have rank(1, 0, 0, 1) = |edges(1, 0, 0, 0)| − |edges(1, 0, 0, 1)| = 1. On the other hand, the vertex (0, 0, 0, 1) has rank 3 because rank(0, 0, 0, 1) = |edges(0, 0, 0, 0)| − |edges(0, 0, 0, 1)| = 3. So assigning the codeword (0, 0, 0, 1) has a higher priority, and the pair of vertices (0, 0, 0, 1), (0, 0, 0, 0) has more impact than the pair of vertices (1, 0, 0, 1), (1, 0, 0, 0).

Figure 3.23 illustrates the process of assigning an additional check bit with the proposed heuristic. First the algorithm selects a pair of vertices (v1, v2) where the vertex v1 has the highest rank and v2 is the complementary vertex representing the opposite check bit value of v1. After removing all edges which are connected with v2, the vertex v2 can be removed from the graph. If the check bit function is not yet completely defined, the algorithm continues with the next vertex to be removed. The algorithm repeats the above steps until the code is defined.

Example 3.36. We apply the flow chart of Figure 3.23 to the error graph of Figure 3.19. The vertices (0, 0, 0, 1) and (1, 1, 1, 0) have the highest rank 3. Hence, these vertices are selected as codewords. After removing the vertices (0, 0, 0, 0) and (1, 1, 1, 1) and the connecting edges, the resulting graph of Figure 3.19 shows that the vertices (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 0) and (1, 1, 0, 0) have no connecting edge.
All other vertices have three connecting edges. Hence, the vertices (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 1), (1, 0, 1, 0) and (1, 1, 0, 0) are selected step by step as codewords and all other vertices and edges are removed.


start
find vertex v1 with the highest rank and the pair of vertices (v1, v2) with alternative check bits
remove all edges which are connected to v2
remove the vertex v2 from the graph
graph has 2^k vertices? — no: repeat from the first step; yes: continue
print the found code with all vertices from the graph
stop

Figure 3.23. Flow chart of the heuristic.
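The flow chart can be turned into a short program. The following is a minimal sketch under our own naming, applied to the unidirectional 2-bit error graph of Figure 3.19 (k = 3 data bits plus one check bit in the last tuple position).

```python
from itertools import combinations

def error_graph(n):
    """Vertices and undirected edges for unidirectional 2-bit errors."""
    verts = {tuple((w >> p) & 1 for p in range(n)) for w in range(2 ** n)}
    edges = set()
    for v in verts:
        zeros = [p for p in range(n) if v[p] == 0]
        for i, j in combinations(zeros, 2):
            u = list(v)
            u[i] = u[j] = 1
            edges.add(frozenset([v, tuple(u)]))
    return verts, edges

def heuristic_check_bit(k):
    """Rank-based heuristic of Figure 3.23: determine one check bit for
    k data bits; returns the selected codewords and the edges (undetected
    errors) remaining between them."""
    verts, edges = error_graph(k + 1)

    def deg(v):
        return sum(1 for e in edges if v in e)

    while len(verts) > 2 ** k:
        # pairs of complementary vertices still present in the graph
        pairs = [(v, v[:-1] + (1 - v[-1],)) for v in verts
                 if v[:-1] + (1 - v[-1],) in verts]
        # rank(v1) = |edges(v2)| - |edges(v1)|; keep v1, remove v2
        v1, v2 = max(pairs, key=lambda p: deg(p[1]) - deg(p[0]))
        edges = {e for e in edges if v2 not in e}
        verts.discard(v2)
    return verts, edges
```

For k = 3 this reproduces Example 3.36: eight codewords, among them (0, 0, 0, 1) and (1, 1, 1, 0), with no undetected unidirectional 2-bit errors left.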

The second case is similar, except that only errors are considered which change codewords into codewords of the inner code Cin. Errors which are detected by the outer code Cout can be removed from the initial error graph by removing the vertices (v1, v2, ..., vn, c) whose first n bits do not form a codeword of the outer code, (v1, ..., vn) ∉ Cout.

3.4.4. Detection of Double-Bit Errors

In this subsection we determine an optimal check bit for double-bit error detection. First the quality of the heuristic is evaluated by comparing the codes found by the heuristic with a single linear check bit for the detection of double-bit errors. Then, in Subsection 3.4.5, we determine an additional check bit for byte parity codes. For these codes every byte is checked by a parity bit. To improve the double-bit error detection, a single additional check bit, determined by the method described in Subsection 3.4.3, is added.

To compare the heuristic with another approach, we analyze a linear check bit. In the case of a separable linear code every codeword consists of the data bits u1, u2, ..., uk and one check bit c. The check bit is derived by a function

c(u) = a1 u1 ⊕ a2 u2 ⊕ ... ⊕ ak uk ,

where a1, a2, ..., ak are Boolean constants. They determine whether the data bits u1, u2, ..., uk are included in the calculation of the XOR sum. To obtain the check bit with the best error detection probability for double-bit errors we have to find the optimal set of constants a1, a2, ..., ak. Assuming the errors are equally distributed, we only have to determine the number l of data bits to be included in the XOR sum; which data bits are included determines only the actually detected error vectors and not the error detection probability. The simplified function to calculate the check bit is

cl(u) = u1 ⊕ u2 ⊕ ... ⊕ ul ,

and this check does not detect double-bit errors which are contained only in the bits u1, ..., ul, c or only in the bits ul+1, ul+2, ..., uk. The number of undetected errors is accordingly

f(l, k) = C(l + 1, 2) + C((k + 1) − (l + 1), 2) ,

where C(a, 2) = a(a − 1)/2 denotes the binomial coefficient. If we assume that k is arbitrary but fixed, this function has a minimum for

l = (k − 1)/2 ,

and for even k it can be shown that f(k/2, k) = f((k − 2)/2, k). So we have a linear function

c(u) = u1 ⊕ u2 ⊕ ... ⊕ u⌊(k−1)/2⌋

which defines a check bit for each number of data bits k, and this function has the best probability to detect double-bit errors within the class of linear functions.
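The count f(l, k) of undetected double-bit error vectors and the resulting detection rate can be checked numerically. This is a minimal sketch with illustrative names; it reproduces the linear-check-bit column of Table 3.14.

```python
from math import comb

def undetected(l, k):
    """f(l, k): double-bit error vectors missed when the check bit is the
    XOR of the first l of k data bits (block length k + 1)."""
    return comb(l + 1, 2) + comb((k + 1) - (l + 1), 2)

def best_l(k):
    """Smallest l minimizing f(l, k); equals floor((k - 1) / 2)."""
    return min(range(k + 1), key=lambda l: undetected(l, k))

def detection_rate(k):
    """Fraction of the C(k + 1, 2) double-bit error vectors detected."""
    return 1 - undetected(best_l(k), k) / comb(k + 1, 2)
```

For k = 4 data bits the optimal linear check bit misses f(1, 4) = 4 of the 10 double-bit error vectors, a detection rate of 0.6 as stated below for Figure 3.24.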

Figure 3.24. Error graph of undetected errors of the code with 4 data bits and a linear check bit.

Figure 3.25. Error graph of undetected errors of the code with 4 data bits and a nonlinear check bit.

In Figure 3.24 the error graph of undetected 2-bit errors with l = ⌊(4 − 1)/2⌋ = 1 and the linear check bit function c(u1, u2, u3, u4) = u1 is illustrated. The code has an error detection probability of q = 0.6 for all 2-bit errors. The error graph in Figure 3.25 shows all undetected 2-bit errors for the check bit

ch(u1, u2, u3, u4) = u1 ⊕ u3 ⊕ u1 u2 ⊕ u1 u3 ⊕ u2 u4 ⊕ u3 u4 .

This check bit was determined by the presented graph heuristic. The graph has the same number of edges as the graph for the linear check bit shown in Figure 3.24. For this reason the two codes detect the same number of 2-bit errors, and the error detection probabilities of the codes are equal. The linear code of Figure 3.24 detects only 6 of the 10 possible 2-bit error vectors, whereas the code of Figure 3.25 detects 96 of the 160 possible 2-bit errors. These 96 2-bit errors are distributed over the 10 2-bit error vectors, so the code detects every possible 2-bit error vector with a probability q > 0.

Table 3.14. Codes found by the heuristic compared with linear codes

                   Double-bit error detection rate
 # data bits k   heuristic check bit   linear check bit
       4               0.600                0.600
       5               0.600                0.600
       6               0.565                0.571
       7               0.571                0.571
       8               0.554                0.556
       9               0.556                0.556
      10               0.544                0.545
      11               0.545                0.545
      12               0.537                0.538

In Table 3.14 codes with a linear check bit and codes which are determined with the proposed heuristic are compared. The codes from the heuristic are nonlinear and achieve, in most cases, the same probability to detect double-bit errors as the best linear code. For some odd block lengths the heuristic gives slightly worse results compared to the optimal linear check code.

3.4.5. Additional Check Bit for a Byte-Parity Code

Now we assume a Longitudinal Redundancy Check (LRC) where K groups of b bits are already protected with K parity check bits, and one additional check bit can be added to protect the overall data bits against double-bit errors not detected by the parity groups. The number of data bits is k = K · b and the block length of the code is n = b · K + K + 1. The code naturally detects all error vectors with an odd Hamming weight. A 2-bit error within a codeword is detected by the parity check bits when the error affects two different parity byte groups. If the error modifies two bits within a parity group, the additional check bit should

Table 3.15. Double-bit error detection rates of extended LRC codes

 Parameters of the codes   Double-bit error detection rates of byte-parity codes
  k   b   K    n    heuristic check bit   linear check bit   nonlinear check bit
  8   8   1   10          0.644                0.644               0.600
  8   4   2   11          0.855                0.855               0.818
  8   2   4   13          0.949                0.949               0.923
 16  16   1   18          0.582                0.582               0.556
 16   8   2   19          0.813                0.813               0.789
 16   4   4   21          0.924                0.924               0.905
 16   2   8   25          0.973                0.973               0.960

detect the error as well as possible. For one group there are

C(b + 1, 2) = (b^2 + b)/2

2-bit errors which affect the data bits and the parity check bit of the group and are not detected by the parity bit. We are looking for a check bit that detects most of these errors.

Example 3.37. Let us consider the case of two bytes, each checked by a single parity bit, and an additional check bit determined by the heuristic for better double-bit error protection. The initial error graph has 2^19 vertices and (19 + C(19, 2)) · 2^19 = 99,614,720 edges representing all possible 1-bit or 2-bit errors. After removing the vertices which differ in the parity check bit positions from the group parity, 2^17 vertices are left. After removing the edges which were connected to the removed vertices, 5,898,240 edges representing undetected errors remain. On this graph the heuristic selects 2^16 codewords and removes 2^16 vertices and the connected edges.

In Table 3.15 the error detection probabilities for 2-bit errors of group parity codes for LRC with different group sizes and different additional check bits are compared. The first four columns show the data word size k, the group size b, the number of parity groups K, and the block length n. The next column gives the error detection probability for a group parity code with an additional check bit defined by the previously described heuristic. In the sixth column an additional linear check bit is used, calculated as the XOR sum of the first ⌈(b − 1)/2⌉ data bits of each parity group. The last column gives the error detection probabilities of a


LRC code using an additional nonlinear check bit defined by c(u) = u1 u2 ⊕ u3 u4 ⊕ · · · ⊕ un−1 un. This check bit detects every error vector with a probability of 50% (excluding the error that affects only the check bit). Table 3.15 shows that the additional nonlinear check bit has a lower 2-bit error detection probability than the other check bits. The additional linear check bit and the check bit found by the heuristic detect the same number of edges in the graph representing errors, but the check defined by the heuristic is nonlinear and detects all 2-bit error vectors with a probability q > 0. A 1-bit error can occur simultaneously in two adjacent memory cells [81]. Adjacent 2-bit errors can be detected by the linear check bit if the appropriate data bits for the XOR sum are selected. If the error model reflects the adjacent 2-bit errors, the heuristic will assign a suitable check bit detecting all adjacent errors. Bella Bose and Der Jei Lin have presented codes which have a small fixed number of check bits and detect all unidirectional errors up to the weight l ≤ n [48]. The weight l and the number of required check bits are, in contrast to linear codes, independent of the block length n. The two check bits for detecting unidirectional 2-bit errors are calculated as follows: count the number of 0s in the data bits and take the result modulo 4. Using a byte-parity code for LRC with one group and one additional check bit defined by the heuristic flow of Figure 3.23, the codes can also detect all unidirectional 2-bit errors. The individual codewords may be different from the codes described by Bose and Lin, but the error detection probability for 2-bit errors is the same. The fault model graphs for unidirectional and asymmetric errors are very similar. The edges of the graph for unidirectional errors are undirected and the edges of the graph for asymmetric errors are directed, but they connect the same vertices.
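The linear-check-bit column of Table 3.15 can be reproduced by brute force. The sketch below assumes a word layout of K·b data bits followed by K parity bits and the extra check bit, and takes the XOR of the first ⌈(b − 1)/2⌉ data bits of each group; all names are illustrative.

```python
from itertools import combinations

def lrc_detection_rate(b, K):
    """Fraction of all 2-bit errors detected by a byte-parity (LRC) code
    with K groups of b data bits, one parity bit per group, and one
    additional linear check bit."""
    k, n = b * K, b * K + K + 1
    l = b // 2                     # = ceil((b - 1) / 2) chosen data bits
    detected = total = 0
    for word in range(2 ** k):
        u = [(word >> p) & 1 for p in range(k)]
        par = [sum(u[g * b:(g + 1) * b]) % 2 for g in range(K)]
        chk = sum(u[g * b + p] for g in range(K) for p in range(l)) % 2
        cw = u + par + [chk]
        for i, j in combinations(range(n), 2):   # every 2-bit error
            e = cw[:]
            e[i] ^= 1
            e[j] ^= 1
            # recompute the checks from the (possibly corrupted) data bits
            par2 = [sum(e[g * b:(g + 1) * b]) % 2 for g in range(K)]
            chk2 = sum(e[g * b + p] for g in range(K) for p in range(l)) % 2
            if e[k:k + K] != par2 or e[k + K] != chk2:
                detected += 1
            total += 1
    return detected / total
```

For k = 8 data bits this reproduces the linear column of Table 3.15: 0.644, 0.855 and 0.949 for group sizes 8, 4 and 2.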

4. Exploration of Properties

4.1. On the Relationship of Boolean Function Spectra and Circuit Output Probabilities

Micah A. Thornton

Mitchell A. Thornton

4.1.1. Spectra of Boolean Functions

Boolean function spectra provide information regarding the behavior of the function over all possible inputs. At the other extreme, the function may be completely described by specific valuations of the dependent variables, such as those represented by a truth table or a Binary Decision Diagram. Circuit output probabilities can be used as an alternative set of characterizing parameters for a Boolean function. Furthermore, various different spectra can be formulated as functions of circuit output probabilities. For these reasons, we are motivated to explore the use of circuit output probabilities and show how they are mathematically related to various spectra and other commonly used switching function characterizations. Circuit output probabilities are often used for analysis of the effectiveness of random testing [242] and in noise-resilient computing [378]. A circuit output probability is the likelihood that a switching function f evaluates to a specific value in B = {0, 1} given the probability distributions of each dependent variable in the support set of f, {x1, x2, ..., xn}. Unless otherwise specified, the distributions for each dependent variable are assumed to be equally likely, therefore

P(xi = 0) = P(xi = 1) = 1/2 .


If the switching function f is incompletely specified, or the support set variables are not equally likely to evaluate to 0 or 1, an alternative probability mass function may be used for P(xi) without loss of generality of the results described in this section. Switching function output probabilities are also of interest due to their close relationship to signal switching probabilities (or switching activity factors). Switching activity factors are widely used in the area of power dissipation estimation and synthesis of low power circuits [129, 243, 350]. Switching activity factors are related to output probabilities in that they provide a measure of the likelihood that an output value will change; however, they are distinctly different from the circuit output probabilities used in this work. To visualize the difference, think of switching activity factors as an edge-sensitive measurement, whereas Boolean output probabilities are level-sensitive. The relationship between output probability and spectral coefficients is of interest since it allows the global behavior provided by many spectral coefficients and the local behavior specified by the truth table output vector to be related through a common set of values. These relations can be used to develop an efficient means for computing the Walsh family of spectra [344] and generalized Reed-Muller spectra [343]. Furthermore, switching function output probabilities are also related to wavelet transforms whose spectral coefficients provide behavior over restricted portions of the function range [341]. The Haar transform is used as an example of a wavelet transform to illustrate this property. Past work has resulted in methodologies for computing output probabilities without requiring the use of floating point operations [154]. Output probabilities are also used in reliability and fault tolerance analysis of conventional digital switching circuitry as well as stochastic circuit design [2, 272, 284, 306].

4.1.2. Boolean Function Output Probability

The output probability expression for a Boolean function is a real-valued algebraic equation that specifies the probability that the function is valued at logic-1 given the probabilities that each of the dependent variables is valued at logic-1. Alternatively, the output probability expression is the probability mass function of the event that f = 1 given the probability mass functions for each variable in the support set of f. For completely specified switching functions, the event space consists of 2^n experiments, where n is the number of dependent Boolean variables and each experiment corresponds to a unique value assignment for the n variables that f depends upon. In the case of completely specified switching functions whose dependent variables are uniformly distributed with respect to the event xi = 1, the switching function output probability is simply the percentage of minterms that cause the function to evaluate to logic-1.

Example 4.38 (Output Probability). Consider the completely specified switching function with P(xi) = 1/2, symbolically expressed in Equation (4.1).

f(x) = x1 x3 x6 ∨ x1 x3 x4 x6 ∨ x1 x3 x4 x5 ∨ x1 x2 x4 x6 ∨ x1 x2 x4 x5 ∨ x1 x2 x5   (4.1)

By computing the percentage of minterms of f, it is easily seen that P(f) = 27/64 = 0.421875. Thus, 42.1875% of the possible 2^6 minterms cause f to evaluate to logic-1 values.
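The minterm-counting view of P(f) is easy to sketch. The snippet below is a minimal illustration with a small hypothetical function rather than Equation (4.1); the names are ours.

```python
from itertools import product

def output_probability(f, n):
    """P(f): fraction of the 2**n variable assignments for which the
    switching function f evaluates to 1, assuming P(xi = 1) = 1/2."""
    return sum(f(*x) for x in product((0, 1), repeat=n)) / 2 ** n

# hypothetical 3-variable function: f = (x1 AND x2) OR x3
f = lambda x1, x2, x3: (x1 & x2) | x3
```

This f evaluates to 1 for 5 of the 8 minterms, so P(f) = 5/8 = 0.625.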

4.1.3. Conditional Output Probabilities

The use of circuit output probabilities in digital systems engineering tasks has generally centered around the computation of the probabilities of single functions. In this section we use conditional output probability expressions to show how some interesting relationships may be derived. We also show how output probability values may be related to cover sets of Boolean functions. Unless otherwise noted, the conditioned quantity fc always depends on a subset of the dependent variables of f. Therefore, the event space for the conditional output probability P(f | fc) comprises 2^(n−m) experiments, where n is the number of dependent variables of f and m is the number of mutually dependent variables of f and fc.

It is convenient to describe conditional output probabilities by examining the output probabilities of co-factor functions as used in a particular form of switching function decomposition commonly referred to as the Shannon expansion. Although this decomposition is attributed to the pioneering researcher Claude Shannon, it is noted that the relationship was included in the writings of George Boole himself. The Shannon expansion theorem is well known and is used to provide a definition of the co-factors of f. Equations (4.2), (4.3), and (4.4) state the Shannon expansion theorem.

Theorem 4.10 (Shannon Expansion Theorem). The Shannon expansion of a switching function f(x1, x2, ..., xi−1, xi, xi+1, ..., xn) is a disjunctive combination of two product terms where each term is a product of a literal and a co-factor with respect to the literal. The literals are the chosen variable of decomposition in both positive and negative polarity. Co-factors are commonly expressed as f0 = fx̄i for the negative co-factor and f1 = fxi for the positive co-factor. Equation (4.2) is a statement of the Shannon expansion, and Equations (4.3) and (4.4) define the negative and positive co-factors respectively.

f(x1, x2, ..., xi−1, xi, xi+1, ..., xn) = x̄i f0 + xi f1   (4.2)
f0 = f(x1, x2, ..., xi−1, 0, xi+1, ..., xn)   (4.3)
f1 = f(x1, x2, ..., xi−1, 1, xi+1, ..., xn)   (4.4)

Other useful functions such as smoothing, consensus, and the Boolean derivative (or difference) are commonly used in switching theory and can be defined in terms of the Shannon co-factors. Some applications of these functions include test, synthesis, and formal verification. Because these important functions can be defined in terms of the Shannon expansion co-factors f0 and f1, the output probabilities of the co-factors are related to smoothing, consensus, and the Boolean derivative.

Consider the co-factor f1 of f with respect to a particular variable xi. The output probability is computed as the total number of times f1 = 1 divided by 2^(n−1) if f is a completely specified function of n variables. The resulting co-factor continues to depend upon all variables in the support set of f except xi. In terms of probability theory, P(f1) is simply the conditional probability P(f | xi). Using Bayes’ theorem the conditional probability is expressed as in Equation (4.5). Note that P(xi) ≠ 0 always holds in Equation (4.5) since the co-factor f1 is always computed in terms of a dependent variable xi. The trivial case where xi = 0 and hence P(xi) = 0 indicates that f does not depend upon xi and is thus not of interest in Equation (4.5). Indeed, in this case xi is more properly considered a constant rather than a variable.

P(f | xi) = P(f · xi) / P(xi)   (4.5)

Since it is typical to deal with fully specified functions and equally likely valuations of xi, P(xi) = 1/2 in this case, and Equation (4.5) simplifies to P(f | xi) = 2 P(f · xi). It is possible to compute conditional probabilities for more complex constituent functions, P(f | fc). The use of Bayes’ theorem allows a very succinct method for determining if a given function f covers another function fc. Lemma 4.1.1 states and proves this result.

Lemma 4.1.1 (Covering Functions). A function f covers another function fc if and only if P(f · fc) = P(fc).

Proof. By definition, a cover of function fc by another function f means that for all fc = 1, f is also at a logic-1 value. In terms of conditional probabilities, this is stated as P(f | fc) = 1. Substituting this expression into Equation (4.5) yields the following result:

P(f · fc) = P(fc) .   (4.6)

This substitution is valid for all cases where f and fc have mutual dependence on a common subset of variables. This dependence assumption prevents P(fc) = 0. It is trivial to ensure dependence since it is only necessary to show that at least one common variable is present in the support sets of f and fc.
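The covering test of Lemma 4.1.1 can be sketched by exhaustive evaluation; the functions f and fc below are hypothetical examples, and covers() checks P(f · fc) = P(fc).

```python
from itertools import product

def prob(f, n):
    """Output probability under uniformly distributed inputs."""
    return sum(f(*x) for x in product((0, 1), repeat=n)) / 2 ** n

def covers(f, fc, n):
    """f covers fc  <=>  P(f AND fc) = P(fc), i.e., P(f | fc) = 1."""
    return prob(lambda *x: f(*x) & fc(*x), n) == prob(fc, n)

f  = lambda x1, x2, x3: (x1 & x2) | x3          # hypothetical function
fc = lambda x1, x2, x3: x1 & x2 & (1 - x3)      # a single minterm of f
```

Here covers(f, fc, 3) holds, and the simplified co-factor identity P(f | xi) = 2 P(f · xi) can be checked in the same exhaustive fashion.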


This result can be very useful in synthesis algorithms that require efficient means for checking covers. Depending on the way f and fc are represented, computing the two probability values in Equation (4.6) may prove to be more efficient than other methods, such as checking for intersections using cube lists. Many logic optimization algorithms rely upon repeatedly computing relationships among a set of cubes representing a logic function. A cube is a product of literals where the literals are formed from dependent variables in the support set of a switching function f. When the cube or product term is formed with all n variables in the support set of f, it covers exactly one valuation of f and is referred to as a minterm of f. Therefore, if a cube ci consists of m literals where m < n, then ci covers 2^(n−m) minterms of f. Sets of cubes that collectively cover all minterms of f are referred to as cube lists or cube covers representing f. Well-known EDA algorithms, such as [265], operate over cube lists and typically have the goal of reducing the number of covering cubes in a particular list. In particular, computations present in such algorithms commonly involve determining if pairs of cubes are disjoint, totally redundant, or partially redundant. The objective here is to show that conditional probability values may be used to quickly determine which of these three relationships are present between any pair of cubes from a set that covers a particular function f. The totally redundant cube category can be considered a special case of Lemma 4.1.1. In this case, a cube ci is totally redundant with respect to cube cj if cj covers ci. Alternatively, two cubes ci and cj are disjoint when they each cover mutually exclusive minterms within the range of the Boolean function f. Finally, ci and cj are partially redundant if they each cover intersecting proper subsets of the complete set of minterms of f.
The following lemmas provide the relationships between the conditional probabilities of pairs of cubes which satisfy certain properties.


Lemma 4.1.2 (Totally Redundant Cubes). Given two cubes ci and cj within the cube cover of a function f, P(ci | cj) = 1 if, and only if, ci is totally redundant.

Proof. Since P(cj) ≠ 0, the conditional probability can be computed as

P(ci | cj) = P(ci · cj) / P(cj) .   (4.7)

When cube ci is totally redundant, ci · cj = cj. Substituting this result into Equation (4.7), it is seen that P(ci | cj) = 1.

Lemma 4.1.3 (Disjoint Cubes). Given two cubes ci and cj within the cube cover of a function f, P(ci | cj) = 0 if, and only if, ci and cj are disjoint.

Proof. Since P(cj) ≠ 0, the conditional probability can be computed as

P(ci | cj) = P(ci · cj) / P(cj) .   (4.8)

When the cubes ci and cj are disjoint, ci · cj = 0. Thus, the conditional probability becomes

P(ci | cj) = P(0) / P(cj) = 0 .

Lemma 4.1.4 (Partially Redundant Cubes). Given two cubes ci and cj within the cube cover of a function f, P(ci | cj) = p, where p lies in the interval (0, 1), if, and only if, ci and cj are partially redundant.

Proof. When P(ci | cj) = p ∈ (0, 1), there is necessarily some dependence between ci and cj; therefore they cannot be disjoint and P(ci | cj) ≠ 0. Furthermore, since p < 1, ci is not totally redundant. Hence, ci and cj are partially redundant.

When a cube ci consists of n literals, it is either a minterm of f (denoted as mi) or not, since it corresponds to a single f = 1 or f = 0 valuation. Thus, the conditional probability becomes


P(f | ci) = 0 or P(f | ci) = 1. If P(f | ci) = 1, it is necessarily the case that ci = mi, that is, ci is a minterm of f. This is due to the fact that the event space is reduced in size to 2^(n−n) = 1; hence it consists of a single experiment. The outcome of this experiment is that the function either covers the cube or it does not. Hence, the conditional probability P(f | ci) indicates whether or not the function covers ci. The real value of the probability is the same as the Boolean valuation of the function when it is evaluated with the variable values that satisfy ci, i.e., cause ci = 1. This result is inferred directly from the preceding three lemmas since it is impossible for a cube consisting of n literals and a function to be partially redundant.
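The three cube relationships can be read off a single conditional probability. The sketch below represents a cube as a {variable: value} mapping and assumes uniformly distributed variables, so P(cube) = 2^(−#literals); the names are illustrative.

```python
def cube_conditional(ci, cj):
    """P(ci | cj) for two cubes given as {variable: required value} maps."""
    if any(cj.get(v, b) != b for v, b in ci.items()):
        return 0.0                        # conflicting literal: disjoint
    joint = 2.0 ** -len({**ci, **cj})     # P(ci AND cj)
    return joint / 2.0 ** -len(cj)
```

A value of 1 signals total redundancy (here P(ci | cj) = 1 when the product term ci contains every minterm of cj), 0 signals disjoint cubes, and a value strictly between 0 and 1 signals partial redundancy.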

4.1.4. The Smoothing, Consensus and Boolean Derivative Operators

As briefly mentioned above, there are three other operators which are important in Boolean Algebra, defined in terms of co-factors. These are smoothing (existential quantification), consensus (universal quantification), and the Boolean derivative (Boolean difference). While commonly referred to as functions, it is mathematically more accurate to refer to these as operators since they operate over a function f with respect to a variable xi. Definition 4.33 provides the mathematical definition of these operators over a switching function f with respect to a dependent variable xi.

Definition 4.33 (Smoothing, Consensus, and Boolean Derivative). Smoothing Sxi(f), consensus Cxi(f), and the Boolean derivative ∂f/∂xi are expressed in terms of Shannon expansion co-factors with respect to the dependent variable xi as follows:

Sxi(f) = f0 ∨ f1 ,        (4.9)
Cxi(f) = f0 · f1 ,        (4.10)
∂f/∂xi = f0 ⊕ f1 .        (4.11)

In terms of notation, the smoothing operator is defined with respect to a switching function f and a corresponding variable from the support set, xi . The smoothing operator is an instance of existential


quantification and is denoted here as Sxi(f). Likewise, the consensus of f with respect to dependent variable xi is an instance of universal quantification and is denoted here as Cxi(f). An alternative notation for the smoothing and consensus operators is ∃xi(f) and ∀xi(f). Finally, the Boolean derivative of f with regard to xi (or Boolean difference) is denoted as ∂f/∂xi. It is noted that these three operators are inter-related in the switching domain and formulate the contradiction relation given in Equation (4.12):

C̄xi(f) · Sxi(f) ⊕ ∂f/∂xi = 0 .        (4.12)

In Boolean Differential Calculus (BDC) these three operators are defined as simple derivative operations:

simple derivative:  ∂f/∂xi = f0 ⊕ f1 ,
simple minimum:     min_xi f = f0 ∧ f1 = Cxi(f) ,
simple maximum:     max_xi f = f0 ∨ f1 = Sxi(f) .

The term simple expresses that only a single variable is changed between the two co-factors evaluated. Usually, this term is omitted. The terms minimum and maximum are chosen due to the inequality min_xi f ≤ f ≤ max_xi f.

More details about the BDC are explored in [247, 317–319]. Qualitatively, Sxi (f ) can be considered to be an operator that represents the component of f that is independent of the dependent variable xi . In other words, the smoothing operator represents the behavior of a function f when the dependence of that function on a particular variable, xi , is suppressed. An example application of the use of the smoothing operator is in generating the image or pre-image of switching functions that model the transition relation in a finite state machine representation of a controller circuit. Such image computations are employed in symbolic state machine equivalence checking, model checking, and other formal verification methods.


A qualitative interpretation of Cxi(f) is that of an operator that expresses the component of f when all appearances of dependent variable xi are omitted. The consensus of a Boolean function is the portion of the function that is independent of a variable xi and has the function value 1. Very small output probabilities of the consensus operator over a function f indicate that the function is highly dependent upon variable xi. As an example application, the degree of dependence of a function with respect to a dependent variable has been used as the basis of a heuristic for BDD variable ordering algorithms [214]. The Boolean derivative operator indicates whether or not f is sensitive to changes in the value of a dependent variable xi. As an example, the concept of observability is central in many VLSI test applications. The Boolean derivative provides a measure of the observability of a particular input, or internal, variable. When ∂f/∂xi = 0, a change of the variable xi is not observable at point f in the circuit. Therefore, the overall observability may be characterized by computing the output probability of ∂f/∂xi, with values close to zero indicating that xi is relatively unobservable and hence does not greatly affect the two co-factors. It is suggested that the use of the output probability of these measures could provide useful insights into the relationship between a function and an input variable. In terms of output probability computations, the corresponding output probabilities of these operators can be computed directly from the output probabilities of the co-factors, thus avoiding explicit formulation of the operators themselves. The following Equations (4.13), (4.14), and (4.15) express the relationship of the output probabilities of the co-factors with the output probabilities of the operators:

P(∂f/∂xi) = P(fxi · f̄x̄i) + P(f̄xi · fx̄i) ,        (4.13)
P(Cxi(f)) = P(fxi · fx̄i) ,                          (4.14)
P(Sxi(f)) = 1 − P(f̄xi · f̄x̄i) .                    (4.15)

Equations (4.13), (4.14), and (4.15) indicate that the output probabilities of these three important operators can be easily provided


when the four fundamental probabilities P(fxi · fx̄i), P(fxi · f̄x̄i), P(f̄xi · fx̄i), and P(f̄xi · f̄x̄i) are given.
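These relationships are easy to confirm numerically. The sketch below (Python; the three-variable f is a hypothetical example and all names are ours) forms the co-factors with respect to x1, builds the three operators, and checks their output probabilities against the four fundamental co-factor probabilities.

```python
from itertools import product

def prob(g, n):
    """Output probability of g under uniformly distributed inputs."""
    return sum(g(*x) for x in product((0, 1), repeat=n)) / 2 ** n

# Hypothetical example function; co-factors are taken w.r.t. x1.
f  = lambda x1, x2, x3: (x1 & x2) | x3
f1 = lambda x2, x3: f(1, x2, x3)          # co-factor for x1 = 1
f0 = lambda x2, x3: f(0, x2, x3)          # co-factor for x1 = 0

smooth = lambda a, b: f1(a, b) | f0(a, b)     # S_x1(f)
cons   = lambda a, b: f1(a, b) & f0(a, b)     # C_x1(f)
deriv  = lambda a, b: f1(a, b) ^ f0(a, b)     # df/dx1

# The four fundamental probabilities of co-factor products.
p11 = prob(lambda a, b: f1(a, b) & f0(a, b), 2)
p10 = prob(lambda a, b: f1(a, b) & (1 - f0(a, b)), 2)
p01 = prob(lambda a, b: (1 - f1(a, b)) & f0(a, b), 2)
p00 = prob(lambda a, b: (1 - f1(a, b)) & (1 - f0(a, b)), 2)

assert prob(deriv, 2)  == p10 + p01
assert prob(cons, 2)   == p11
assert prob(smooth, 2) == 1 - p00
print(prob(deriv, 2), prob(cons, 2), prob(smooth, 2))
```

Since every probability here is a multiple of 1/4, the floating-point comparisons are exact.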

4.1.5. Switching Function Spectra

Here we describe how output probabilities are related to various spectra. Different spectra are defined based upon the structure of a particular linear transformation matrix. The vector of spectral coefficients corresponding to a particular switching function and transformation matrix is obtained through the calculation of a vector-matrix direct product, where the vector, termed an output vector, is formulated as all possible function valuations of f in the switching domain, and the particular transformation matrix is chosen based upon the desired spectrum. In this formulation, the vector-matrix product is not computed directly. Rather, each transformation matrix row vector is considered to be an output vector of some other function called a constituent function. By using Boolean relations between the function to be transformed, f, and the various constituent functions fc, output probabilities are computed and used to calculate the spectral coefficients through simple algebraic relations [341, 342, 344–346]. One advantage of this approach is that the spectral coefficients are computed individually, thus avoiding the need to store all 2^n spectral coefficients as well as the switching function output vectors and the transformation matrix itself, which are also exponentially large.

4.1.6. Calculating the Walsh Spectrum

The Walsh spectral values form several different spectra. The chief difference in the various spectra depends only upon the order in which coefficients appear [147]. For example, this set of coefficients may be ordered naturally, in sequency order, in dyadic order, or in revised sequency order. These orders correspond to the Hadamard-Walsh,


the Walsh, the Paley-Walsh, and the Rademacher-Walsh spectra, respectively [38, 147]. In the past, the Walsh-Hadamard transformation has seen much use due to the fact that efficient decompositions of the transformation matrix may be obtained, allowing for fast transform methods [73, 291] to be applied. The Walsh-Hadamard, or naturally-ordered, Walsh transformation matrix can be described recursively as in Equations (4.16) and (4.17). Alternatively, the Kronecker product, denoted symbolically as ⊗, may be used to define the Walsh-Hadamard matrix as given in Equation (4.18).

W1 = [ 1   1 ]
     [ 1  −1 ]                      (4.16)

Wn = [ Wn−1    Wn−1 ]
     [ Wn−1   −Wn−1 ]               (4.17)

Wn = W1 ⊗ Wn−1                      (4.18)

Since the relationships described here apply only to a single Walsh coefficient, the particular coefficient orderings are not important. Each coefficient is described by the particular fc that corresponds to a transformation matrix row vector regardless of the actual location of the row vector in the matrix. The Walsh coefficient is denoted by Wf (fc ) where f denotes that the coefficient is with respect to function f and fc denotes it is the coefficient resulting from the transform matrix row vector corresponding to constituent function fc . It is generally the case that the Walsh spectral values are computed by replacing all occurrences of logic-1 with the integer −1, and all occurrences of logic-0 with the integer +1. The actual calculation of the coefficient is then carried out by using integer arithmetic. In terms of computing an inner-product of a transformation row vector and an output vector of a function to be transformed, it is easy to see that each scalar product to be accumulated is either +1 or −1 in value. Furthermore, a scalar product of −1 will only occur when the functions f and fc have different output values for the same set of input variable assignments.


If we consider the equivalence function given by the Exclusive-NOR (XNOR) operation, a function can be formed whose output is logic-1 if and only if both f and fc are at logic-1 or both at logic-0: the complement of f ⊕ fc. Furthermore, if the output probability of this function is computed, we have the fraction of input assignments where both f and fc simultaneously output the same value, 1 − P(f ⊕ fc). The corresponding Walsh spectral coefficient can then be obtained by an affine scaling of this probability by 2^n, where n is the number of variables in f. This is given in Equation (4.19).

Wf(fc) = 2^n [1 − 2P(f ⊕ fc)]        (4.19)

The Walsh coefficients in Equation (4.19) depend upon the equivalence relation of the function being transformed and the constituent function. However, the spectral coefficient can be computed using Boolean operators other than XOR. This is accomplished by exploiting the probability relationships given in the equations below:

P(f ∨ fc) = P(f · fc) + P(f ⊕ fc) ,        (4.20)
P(f ∨ fc) = P(f) + P(fc) − P(f · fc) .     (4.21)

Substituting these relationships into Equation (4.19) yields the following results:

Wf(fc) = 2^n [1 + 2(P(f · fc) − P(f ∨ fc))] ,        (4.22)
Wf(fc) = 2^n [1 + 4P(f · fc) − 2P(f) − 2P(fc)] ,     (4.23)
Wf(fc) = 2^n [1 − 4P(f ∨ fc) + 2P(f) + 2P(fc)] .     (4.24)

The relationships in these equations may be simplified since P(fc) = 1/2 is easily shown to be true for all fc except the zeroth-ordered coefficient Wf(0), where P(0) = 0. In addition, the higher-ordered Walsh coefficients can be related to the zeroth-ordered coefficient through the expressions given in the following equations:

Wf(0) = 2^n (1 − 2P(f)) ,                    (4.25)
Wf(fc) = 2^n (4P(f · fc) − 1) + Wf(0) ,      (4.26)
Wf(fc) = 2^n (3 − 4P(f ∨ fc)) − Wf(0) .      (4.27)

These relations may be rearranged to compute the output probabilities directly from the Walsh coefficients if so desired.
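As a concrete check of these relations, the following sketch (the function f and constituent function fc are hypothetical examples of ours) computes one Walsh coefficient three ways: by the ±1 inner product, from P(f ⊕ fc), and from the AND-based relation.

```python
from itertools import product

n = 3
points = list(product((0, 1), repeat=n))

def P(g):
    """Output probability under uniformly distributed inputs."""
    return sum(g(x) for x in points) / 2 ** n

# Hypothetical function to transform and one constituent function.
f  = lambda x: x[0] ^ (x[1] & x[2])
fc = lambda x: x[0] ^ x[1]

# Direct inner product: logic-1 -> -1, logic-0 -> +1 in both vectors.
W_direct = sum((-1) ** f(x) * (-1) ** fc(x) for x in points)

# The same coefficient from output probabilities alone.
W_xor = 2 ** n * (1 - 2 * P(lambda x: f(x) ^ fc(x)))
W_and = 2 ** n * (1 + 4 * P(lambda x: f(x) & fc(x)) - 2 * P(f) - 2 * P(fc))

assert W_direct == W_xor == W_and
print(W_direct)
```

Because all probabilities are multiples of 1/8, the three values agree exactly even in floating point.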


4.1.7. Calculating the Reed-Muller Spectrum

The Reed-Muller (RM) family of spectra consists of 2^n distinct transformations. These are generally classified according to a polarity number. The polarity number is used to indicate whether a literal is present in complemented or non-complemented form in the generalized Reed-Muller expression. In terms of the RM spectra, the polarity number can be considered to uniquely define the transformation matrix. The generalized Reed-Muller spectra have been studied and used extensively in the past. The polarity-0 RM transformation matrix Rn is defined in Equations (4.28) and (4.29); the equivalent Kronecker product formulation is given in Equation (4.30). Similar definitions allow for the formation of polarity-j RM transformation matrices where j ≠ 0 and are described in detail in [342].

R1 = [ 1  0 ]
     [ 1  1 ]                       (4.28)

Rn = [ Rn−1    0   ]
     [ Rn−1  Rn−1  ]                (4.29)

Rn = R1 ⊗ Rn−1                      (4.30)

In relating output probabilities to the RM spectra, consideration must be given to the fact that the RM spectrum is computed over the Galois field GF(2) while the output probabilities are real quantities in the interval [0, 1]. Like the Walsh spectrum, each of the rows of the RM transformation matrix may be viewed as constituent function output vectors. The constituent functions turn out to be all possible products of literals, with complementation or lack thereof determined by the polarity number. Therefore, any arbitrary RM spectral coefficient may be computed with the equation below:

Rf(fc) = [2^n P(f · fc)] (mod 2) .        (4.31)

Using the probability relationships given in Subsection 4.1.6, alternative relationships can be derived as given in the following:

Rf(fc) = [2^n (P(f) − P(fc) − P(f ∨ fc))] (mod 2) ,        (4.32)
Rf(fc) = [2^n (P(f ⊕ fc) − P(f ∨ fc))] (mod 2) .           (4.33)
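The polarity-0 recursion and Equation (4.31) can be exercised together in a short sketch (ours; the example function is hypothetical): the RM spectrum is computed by a GF(2) matrix-vector product, and each coefficient is then re-derived from the output probability of f · fc.

```python
from itertools import product

def rm_matrix(n):
    """Polarity-0 Reed-Muller transform matrix, built recursively:
    R1 = [[1, 0], [1, 1]], Rn = [[R(n-1), 0], [R(n-1), R(n-1)]]."""
    if n == 0:
        return [[1]]
    r = rm_matrix(n - 1)
    top = [row + [0] * len(row) for row in r]
    bot = [row + row for row in r]
    return top + bot

n = 3
points = list(product((0, 1), repeat=n))
# Hypothetical 3-variable function given by its output vector.
f_vec = [(x[0] & x[1]) ^ x[2] for x in points]

R = rm_matrix(n)
# Spectrum over GF(2): matrix-vector product reduced modulo 2.
spectrum = [sum(r * v for r, v in zip(row, f_vec)) % 2 for row in R]

# Each coefficient also follows from an output probability: treat the
# matrix row as the output vector of a constituent function fc, then
# Rf(fc) = (2^n * P(f . fc)) mod 2.
for row, coeff in zip(R, spectrum):
    p_and = sum(r & v for r, v in zip(row, f_vec)) / 2 ** n
    assert round(2 ** n * p_and) % 2 == coeff

print(spectrum)
```

For this f = x1·x2 ⊕ x3 the nonzero coefficients sit exactly at the monomials x3 and x1·x2, as expected from its positive-polarity expansion.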


4.1.8. Calculating the Haar Spectrum

The Haar spectrum [4, 145, 159] has not seen as much application in the area of switching theory as the Walsh family or RM transforms described above. However, the Haar spectrum and, in particular, the modified Haar spectrum can provide very interesting information concerning the correlation of a function with various Shannon decompositions [147, 341]. The modified Haar transform is the Haar wavelet transform with the scalar multiplier of the form (1/√2)^n omitted. The transformation matrix Hn is defined in Equations (4.34) and (4.35), where In−1 represents the 2^(n−1) × 2^(n−1) identity matrix.

H1 = [ 1   1 ]
     [ 1  −1 ]                      (4.34)

Hn = [ Hn−1 ⊗ (1   1) ]
     [ In−1 ⊗ (1  −1) ]             (4.35)

The row vectors of the Haar matrix can in fact be described in terms of the AND operation of the function to be transformed and various cubes formed from the literals. This is in contrast to the Walsh and RM transformation matrices, where the respective fc functions were defined totally independently of the function to be transformed. The reason for this difference in row vector definition is that the Haar transform does not give totally global information; rather, it gives information regarding the correlation of a function with its various co-factors. However, like the previous approaches, each Haar coefficient can be computed as an algebraic relation of various probability values and hence the Haar spectrum is also directly linked with output probability calculations. Like the Walsh family of transforms, the output vector of the function to be transformed generally contains integers, with −1 representing logic-1 and +1 representing logic-0. From this viewpoint, we can define the number of matches between a particular transformation matrix row vector and the function's output vector as the number of times the row vector and function vector components are simultaneously equal to −1 or +1. As an example of the modified Haar transform, the transformation matrix for a 3-variable function is shown in Figure 4.1. Each of the constituent Boolean functions is given to the left of its respective row vector.


f              [ 1   1   1   1   1   1   1   1 ]
x1 · f         [ 1   1   1   1  −1  −1  −1  −1 ]
x2 · fx̄1       [ 1   1  −1  −1   0   0   0   0 ]
x2 · fx1       [ 0   0   0   0   1   1  −1  −1 ]
x3 · fx̄1x̄2    [ 1  −1   0   0   0   0   0   0 ]
x3 · fx̄1x2    [ 0   0   1  −1   0   0   0   0 ]
x3 · fx1x̄2    [ 0   0   0   0   1  −1   0   0 ]
x3 · fx1x2     [ 0   0   0   0   0   0   1  −1 ]

Figure 4.1. Example of modified Haar transformation matrix for n = 3.
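The recursion in Equations (4.34) and (4.35) is easy to reproduce. The following sketch (plain Python, list-of-lists matrices; all names are ours) rebuilds the modified Haar matrix — for n = 3 it yields exactly the matrix of Figure 4.1 — and applies it to the ±1-encoded output vector of a hypothetical function.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def haar(n):
    """Modified Haar matrix: H1 = [[1, 1], [1, -1]]; Hn stacks
    H(n-1) (x) [1 1] on top of I(2^(n-1)) (x) [1 -1]."""
    if n == 1:
        return [[1, 1], [1, -1]]
    return (kron(haar(n - 1), [[1, 1]]) +
            kron(identity(2 ** (n - 1)), [[1, -1]]))

H3 = haar(3)
for row in H3:
    print(row)

# Spectrum of a hypothetical function: logic-1 -> -1, logic-0 -> +1.
f_vec = [0, 1, 0, 1, 0, 1, 1, 0]
v = [(-1) ** b for b in f_vec]
spectrum = [sum(h * s for h, s in zip(row, v)) for row in H3]
print(spectrum)
```

Note how the higher-ordered coefficients come from shorter active rows, which is the shrinking range discussed below.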

Note that the matrix contains a 0 value in addition to the 1 and −1 quantities which represent logic levels. Since some of the rows represent constituent functions that are co-factors, the output space is less than 2^3 and the presence of a 0 value acts as a place holder. Tables 4.1 and 4.2 contain: symbols for each of the Haar coefficients, Hi; the values i and n that indicate the size of the constituent function range; and probability expressions that evaluate whether the function to be transformed and the constituent function simultaneously evaluate to logic-0, pm0, or to logic-1, pm1.

Table 4.1. pm0 relationship to Haar spectrum and output probabilities

SYMBOL   i   n   pm0
H0       0   3   P(f̄ · 0̄)
H1       0   3   P(f̄ · x̄1)
H2       1   2   P(f̄ · x̄1 · x̄2) / P(x̄1)
H3       1   2   P(f̄ · x1 · x̄2) / P(x1)
H4       2   1   P(f̄ · x̄1 · x̄2 · x̄3) / P(x̄1 · x̄2)
H5       2   1   P(f̄ · x̄1 · x2 · x̄3) / P(x̄1 · x2)
H6       2   1   P(f̄ · x1 · x̄2 · x̄3) / P(x1 · x̄2)
H7       2   1   P(f̄ · x1 · x2 · x̄3) / P(x1 · x2)

Using the notation introduced in Tables 4.1 and 4.2, we can write an algebraic equation to compute the value of a Haar spectral coefficient


Table 4.2. pm1 relationship to Haar spectrum and output probabilities

SYMBOL   i   n   pm1
H0       0   3   P(f · 0)
H1       0   3   P(f · x1)
H2       1   2   P(f · x̄1 · x2) / P(x̄1)
H3       1   2   P(f · x1 · x2) / P(x1)
H4       2   1   P(f · x̄1 · x̄2 · x3) / P(x̄1 · x̄2)
H5       2   1   P(f · x̄1 · x2 · x3) / P(x̄1 · x2)
H6       2   1   P(f · x1 · x̄2 · x3) / P(x1 · x̄2)
H7       2   1   P(f · x1 · x2 · x3) / P(x1 · x2)

in terms of the output probabilities used to compute pm0 and pm1:

H = 2^(n−i) [2(pm0 + pm1) − 1] .        (4.36)

By applying the same principles as those used to determine the algebraic relationships between output probabilities and the Walsh family and RM transforms, we can formulate the Haar transform relationships. The presence of cofactors in the Haar constituent functions can be handled by using the relationship in Equation (4.5) to represent these quantities as output probabilities of the AND of the function to be transformed with its respective dependent literals. Note also that the maximum absolute value of a Haar spectral coefficient varies depending on the order of the coefficient. This is due to the reduction in the size of the range of the constituent functions containing co-factors.

4.1.9. Switching Function Output Probability

Output probabilities have applications in both the switching and spectral domains. In particular, the concepts that relate the output probabilities of Shannon co-factors and conditional probabilities indicate that many commonly used operations in electronic design automation


algorithms can be formulated in terms of output probability computations. Spectral information can yield characterizing information about the switching functions of interest and this information can be global or multi-resolutional depending upon the choice of the particular transformation employed. Problems with excessive computational complexity can be alleviated through the computation of individual coefficients and selected row vectors of transformation matrices. Recently, efficient methods for the computation of absolute and conditional output probabilities for discrete switching functions have emerged based upon the use of decision diagram structures as reported in [201, 235].


4.2. ROBDD-based Computation of Special Sets with RelView Applications

Rudolf Berghammer and Stefan Bolus

4.2.1. Hard Problems

Many hard problems in discrete mathematics and computer science require the computation of the minimal or maximal sets, respectively, of a generally very large set of sets. An example is the identification of Banks winners [23] in Social Choice Theory. This task calls for the set of maximal transitive sets with respect to a given tournament relation. Also, game-theoretic problems lead to such tasks. For instance, the definition of important power indices is based on the set of minimal winning coalitions [10, 91, 139]. As a final example we want to mention independence systems and matroids. Computing the bases of such structures means we have to determine the set of maximal independent sets, and computing the circuits means we have to determine the set of minimal dependent sets [180]. A Binary Decision Diagram (BDD) provides an efficient way to represent not only a Boolean function but also a set or a set of sets. In [219] the zero-suppressed variant is used for this purpose in a very general context. We have applied other variants of BDDs to solve related but more specific tasks, namely to answer game-theoretic questions (via quasi-reduced BDDs [28, 45, 46]) and to implement relation algebra within the special-purpose Computer Algebra system RelView (via reduced BDDs [30, 31, 195, 212]). This led to an amazing computational power. However, in the case of the RelView system the computation of specific relations (usually called vectors) that represent sets involves difficulties if the set to be represented consists of the minimal or maximal sets of a subset of a powerset 2X. Typically, these difficulties were caused by the use of the set-inclusion relation on 2X, a relation that in the case of a larger base set X possesses a huge implementing BDD, which can only be avoided in fortunate situations (see, e.g., [29]).


Therefore, we looked for possibilities of avoiding set-inclusion relations when computing sets of minimal or maximal sets and of working instead directly on the level of BDDs. The solution we have found is based on the insight that if a (relational) vector v represents a subset A of 2X and the elements of X are interpreted as BDD-labels (or Boolean variables, respectively), then each set S ∈ A corresponds precisely to a satisfying assignment of the BDD-implementation of v. This insight not only allows us to transfer the corresponding algorithms of [219] from the Zero-Suppressed Binary Decision Diagram (ZBDD) to the Reduced Binary Decision Diagram (RBDD), but also to develop refinements which are significantly faster than the original versions. Inheritance conditions are necessary for the refinements to be correct. However, these conditions seem to hold frequently in practice, as shown, for instance, in the examples mentioned at the beginning of this section. In this section we present only the refined algorithms on reduced BDDs and their correctness. Meanwhile they have been integrated into RelView and thus we are able to present an application with regard to this tool. It shall demonstrate the potential of the new algorithms (of the new version of RelView, respectively) and the gain in efficiency achieved thereby.

4.2.2. Fundamentals of ROBDDs

A Reduced Ordered Binary Decision Diagram (ROBDD) is a data structure that allows a very compact representation of a Boolean function. In this subsection we present some basic notions and notations; for more details, see, for example, [55, 363]. Let X be a non-empty and finite set of labels. To simplify matters, for the entire section we assume that the labels are natural numbers and X = {1, 2, . . . , n}, with n ∈ N>0. A BDD over X is a directed, acyclic multigraph with exactly one source, called the root, and exactly two sinks I and O, called the 1-sink and the 0-sink, respectively. Each node v that is not a sink is called an inner node. An inner node v has exactly two outgoing edges, called the 1-edge and the 0-edge, and is labeled with an element from X. By then(v) we denote the successor


of v w.r.t. the 1-edge, by else(v) the successor of v w.r.t. the 0-edge, and by lab(v) the label of v. Again for reasons of simplicity, the two sinks have the special label n + 1, i.e., lab(I) = lab(O) = n + 1. A BDD is called an Ordered Binary Decision Diagram (OBDD) if on each path from the root to a sink the labels of the inner nodes satisfy a given linear ordering on the set X of its labels. In the remainder of this section we always suppose the canonical ordering 1 < 2 < . . . < n. That is, the root has label 1 and the labels strictly increase along all paths. An inner node v of a BDD is redundant if then(v) = else(v). Two different inner nodes v1 and v2 are equivalent if lab(v1) = lab(v2), then(v1) = then(v2), and else(v1) = else(v2). ROBDDs are those OBDDs that contain neither equivalent nor redundant inner nodes. If an OBDD does not contain equivalent inner nodes and each path from the root to a sink consists of n edges, then it is called a Quasi-Reduced Ordered Binary Decision Diagram (QOBDD). Because each node of an OBDD is the root of an OBDD itself, in the following we will not distinguish between the root of an OBDD and the OBDD itself. For OBDDs/roots we always use the letter r, also equipped with indices if more than one OBDD is treated.

Example 4.39. In the literature OBDDs are usually drawn in a layered top-down fashion and without explicit names for the nodes, as shown in all the pictures of this section. The root is placed at the top, while the two sinks are placed on the bottom layer. Inner nodes are drawn as circles and the two sinks are depicted as squares to emphasize their special meaning. The labels of the inner nodes are written inside the circles, but for the sinks the labels n + 1 are omitted. Instead, 1 is written inside the square of the 1-sink and 0 inside that of the 0-sink. Edges are always directed downwards, so that arrows can be omitted. A 1-edge is drawn as a solid line and a 0-edge is drawn as a dashed line.
Figure 4.2 shows two graphs of BDDs. The OBDD of Figure 4.2 (a) is not reduced since it contains two redundant nodes on level 3. It is only a QOBDD. If the redundant nodes and their outgoing edges are removed and their ingoing edges are redirected to their unique successor nodes, then the QOBDD is transformed into the ROBDD of Figure 4.2 (b).


Figure 4.2. Two OBDDs: (a) QOBDD, (b) ROBDD.

Given a ROBDD r over the set of labels X and a label i, each node v with i ≤ lab(v) represents a Boolean function fi,v : Bn−i+1 → B. The definition is by induction. Assuming x = (xi , . . . , xn ) as the list of variables, it starts with the following sink cases: fi,I (x) = 1 and fi,O (x) = 0 .

(4.37)

For an inner node v with lab(v) = k the induction step is as follows:

fi,v(x) = xk fi,then(v)(x) ∨ x̄k fi,else(v)(x) .        (4.38)

Notice that the specific label i = n + 1 implies v ∈ {I, O} and in this case x is the 0-tuple, i.e., the 0-ary function fi,v : B0 → B corresponds to a Boolean value. The Boolean function represented by the ROBDD r is fr := f1,r .

Example 4.40. We consider the ROBDD over the set of labels X = {1, 2, 3} that is depicted in Figure 4.2 (b). If v denotes the left node with label 2 and w denotes the only node with label 3, then we get the following result:

f2,v(x2, x3) = x2 f2,I(x2, x3) ∨ x̄2 f2,w(x2, x3)
             = x2 ∨ x̄2 (x3 f2,I(x2, x3) ∨ x̄3 f2,O(x2, x3))
             = x2 ∨ x̄2 x3 .
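The evaluation of this ROBDD can be checked mechanically. The sketch below encodes the ROBDD of Figure 4.2 (b) — the node names r, v, u, w and the edge targets are our reading of the figure and the examples — and confirms that it realizes the 3-input majority function, in line with the set family computed in Example 4.41 below.

```python
from itertools import product

# ROBDD of Figure 4.2 (b); node names r, v, u, w are ours, with edge
# targets read off from Examples 4.40 and 4.41.
nodes = {
    'r': (1, 'v', 'u'),
    'v': (2, 'I', 'w'),   # left node with label 2
    'u': (2, 'w', 'O'),   # right node with label 2
    'w': (3, 'I', 'O'),
}

def evaluate(node, x):
    """Follow 1-/0-edges for the assignment x[1..n]; 'I' means f = 1."""
    while node not in ('I', 'O'):
        label, hi, lo = nodes[node]
        node = hi if x[label] else lo
    return 1 if node == 'I' else 0

# The represented function is the 3-input majority function: its sets
# of true variables are exactly the family set(1, r) of Example 4.41.
for bits in product((0, 1), repeat=3):
    x = {1: bits[0], 2: bits[1], 3: bits[2]}
    assert evaluate('r', x) == (1 if sum(bits) >= 2 else 0)
print("f is the majority of x1, x2, x3")
```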

In [55] the following canonicity property is shown.


Theorem 4.11. For each Boolean function f(x) and each label/variable ordering there is (up to isomorphism) a unique ROBDD r that represents f(x).

In the same paper the most important algorithms for the efficient manipulation of ROBDDs are also presented. We need three of them:

1. The ite-operation takes a label i ∈ X and two ROBDDs r1 and r2 without inner nodes of label i and yields the ROBDD ite(i, r1, r2) with a new root r3 such that for this new ROBDD the properties lab(r3) = i, then(r3) = r1, and else(r3) = r2 are satisfied.

2. The neg-operation specifies negation on the ROBDD level. It takes a ROBDD r and yields the ROBDD neg(r) that represents f̄r.

3. The diff-operation specifies the difference on the ROBDD level. This operation takes two ROBDDs r1 and r2 and yields the ROBDD diff(r1, r2) that represents fr1 ∧ f̄r2. A realization of diff(r1, r2) is possible as apply(r1, neg(r2), ∧), where apply is the operation of [55] for binary synthesis and the symbol ∧ denotes the conjunction.
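The neg- and diff-operations can be sketched in a few lines on ordered BDDs encoded as nested tuples (our toy encoding; a real package in the style of [55] adds reduction, hashing, and a computed table, none of which is modeled here).

```python
def bdd_apply(op, a, b):
    """Binary synthesis on ordered BDDs encoded as nested tuples
    (label, then, else) with sinks 'I'/'O'; no reduction or caching."""
    if a in ('I', 'O') and b in ('I', 'O'):
        return 'I' if op(a == 'I', b == 'I') else 'O'
    la = a[0] if a not in ('I', 'O') else float('inf')
    lb = b[0] if b not in ('I', 'O') else float('inf')
    top = min(la, lb)
    a1, a0 = (a[1], a[2]) if la == top else (a, a)
    b1, b0 = (b[1], b[2]) if lb == top else (b, b)
    return (top, bdd_apply(op, a1, b1), bdd_apply(op, a0, b0))

def neg(r):
    """Negation: swap the two sinks."""
    if r == 'I': return 'O'
    if r == 'O': return 'I'
    return (r[0], neg(r[1]), neg(r[2]))

def diff(r1, r2):
    """diff(r1, r2) represents f_r1 AND (NOT f_r2)."""
    return bdd_apply(lambda x, y: x and y, r1, neg(r2))

def eval_bdd(r, x):
    while r not in ('I', 'O'):
        r = r[1] if x[r[0]] else r[2]
    return r == 'I'

# Example: f1 = x1 or x2, f2 = x1; their difference is (not x1) and x2.
f1 = (1, 'I', (2, 'I', 'O'))
f2 = (1, 'I', 'O')
d = diff(f1, f2)
assert all(eval_bdd(d, {1: a, 2: b}) == (a == 0 and b == 1)
           for a in (0, 1) for b in (0, 1))
```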

4.2.3. ROBDDs for Subsets of Powersets

The data structure ROBDD has been developed to represent and evaluate Boolean functions. In order to represent finite sets with ROBDDs, the encoding of sets by means of Boolean vectors can be used. In this subsection we concentrate on the representation of subsets of a powerset, where the base set is non-empty and finite. To simplify matters, we again assume X as the base set, i.e., we deal with subsets of 2X. In doing so, the insertion of elements into sets and the removal of elements from sets will play an outstanding role. For S being a set and i being an element, in the following we write S + i and S − i short for S ∪ {i} and S \ {i}, respectively. Let r be a ROBDD over the set of labels X, let i be a label, and sup-


pose that v is a node of r with i ≤ lab(v). If we assign to the elements of X Boolean variables x1, . . . , xn, then each subset S of {i, . . . , n} can be interpreted as an assignment of the variables xj, i ≤ j ≤ n: it assigns the value 1 to the variable xj if and only if j ∈ S. With this interpretation, the set set(i, v) of all satisfying assignments of the Boolean function fi,v : Bn−i+1 → B (defined, as in Subsection 4.2.2, via the variables xi, . . . , xn) can be specified inductively as follows. The sink cases are set(i, O) = ∅ and set(i, I) = {∅} .

(4.39)

If v is not a sink, that is, i < n + 1, then we have

set(i, v) = {S + i | S ∈ set(i + 1, then(v))} ∪ set(i + 1, else(v))        (4.40)

in the case of i = lab(v), and

set(i, v) = {S + i | S ∈ set(i + 1, v)} ∪ set(i + 1, v)        (4.41)

if i < lab(v). Notice that for all inner nodes v the set set(i, v) is a subset of 2{i,...,n}. Due to the canonicity of ROBDDs, for two inner nodes v1 and v2 with the same label k and all labels i with i ≤ k it holds that set(i, v1) = set(i, v2) if and only if v1 = v2.

Definition 4.34. Given a ROBDD r over the set of labels X, a label i ∈ X, and an inner node v such that i ≤ lab(v), we say that v represents the subset A of 2{i,...,n} if A = set(i, v). The subset of 2X that is represented by r is defined as set(1, r).

Let again an inner node v and a label i of a ROBDD r be given. If we additionally assume that i = lab(v), then there is a one-to-one correspondence between the elements of the set set(i, v) and the paths (as sequences of edges) from v to the 1-sink in the QOBDD that results from r by inserting redundant nodes on the respective levels. Each path from v to I in the QOBDD induces a set from set(i, v) by collecting the labels of the initial nodes of its 1-edges. In the other direction, each set S ∈ set(i, v) leads to a path from v to I in the QOBDD by following in each node the 1-edge if the label of the node is in S and the 0-edge if the label is not in S.

Example 4.41. We consider once again the OBDD r over the set of labels X = {1, 2, 3} that is depicted in Figure 4.2. The insertion


of two redundant nodes on the third level yields the QOBDD that is depicted in Figure 4.2 (a). In the QOBDD there are precisely four paths from the root to I and these correspond to the four sets of

set(1, r) = {{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}} .

If v denotes the left node with label 2 of the OBDD of Figure 4.2 (a), then the three paths from v to the sink I lead to the three-element set

set(2, v) = {{2, 3}, {2}, {3}} .

Now the equation

set(1, v) = {{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {2}, {3}}

follows from (4.41). It is very easy to check that the subset set(1, v) of the powerset 2{1,2,3} consists precisely of the satisfying assignments of the Boolean function f1,v : B3 → B that is defined by f1,v(x1, x2, x3) = x2 ∨ x̄2 x3 for all x1, x2, x3 ∈ B.

With the definition of the function set, the three ROBDD-operations introduced in the enumeration at the end of Subsection 4.2.2 obtain a special meaning. The meaning of the difference is summarized in the following lemma.

Lemma 4.5. Let v1 and v2 be inner nodes of a ROBDD r over the set of labels X and assume i ∈ X such that i ≤ lab(v1) and i ≤ lab(v2), then we have the following equation:

set(i, diff(v1, v2)) = set(i, v1) \ set(i, v2) .

(4.42)

For the special case set(i, v1 ) = 2{i,...,n} the set-difference (4.42) describes the complement: set(i, neg(v2 )) = 2{i,...,n} \ set(i, v2 ) .

(4.43)

We do not need this more specific property in the remainder of this section.
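The recursion (4.39)–(4.41) translates directly into code. The following sketch (node names and encoding are ours) enumerates set(i, v) for the ROBDD of Figure 4.2 (b) and reproduces the families from Example 4.41.

```python
n = 3
# ROBDD of Figure 4.2 (b); node names r, v, u, w are ours.
nodes = {'r': (1, 'v', 'u'), 'v': (2, 'I', 'w'),
         'u': (2, 'w', 'O'), 'w': (3, 'I', 'O')}

def lab(v):
    return n + 1 if v in ('I', 'O') else nodes[v][0]

def sets(i, v):
    """The family set(i, v), following the sink cases and the two
    recursive cases for i = lab(v) and i < lab(v)."""
    if v == 'O':
        return set()
    if i == n + 1:                       # v must be the 1-sink here
        return {frozenset()}
    if i < lab(v):                       # variable i is unconstrained
        rest = sets(i + 1, v)
        return {s | {i} for s in rest} | rest
    _, t, e = nodes[v]
    return {s | {i} for s in sets(i + 1, t)} | sets(i + 1, e)

fam = sets(1, 'r')
assert fam == {frozenset(s) for s in [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}]}
assert sets(2, 'v') == {frozenset(s) for s in [{2, 3}, {2}, {3}]}
print(sorted(sorted(s) for s in fam))
```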


4.2.4. Sets of Minimal and Maximal Sets

A Zero-Suppressed Binary Decision Diagram (ZBDD) is a variant of an OBDD. To obtain a ZBDD from the original decision tree of a Boolean function, the first simplification rule — the merging of equivalent nodes — is realized just as for a ROBDD (or a QOBDD). But in the second simplification rule ROBDDs and ZBDDs differ from each other: whereas in a ROBDD all redundant nodes are removed, in a ZBDD all nodes whose 1-edge points to the 0-sink are removed. Using the same interpretation of sets as we have sketched in Subsection 4.2.3, it is shown in [219] how ZBDDs can be used to implement important set-theoretic algorithms. A result of [219] is the public domain ZBDD-library EXTRA [109]. It contains an operation Extra_zddMinimal that computes from the ZBDD-representation of a subset A of 2X the ZBDD-representation of the subset min(A) of 2X that consists of the minimal sets of A, and also an operation Extra_zddMaximal that does the same for the subset max(A) of 2X that consists of the maximal sets of A. We have been able to transfer these operations to ROBDDs and to refine them afterwards, under conditions which frequently appear in practice, into significantly faster versions. This subsection deals with the refinements. They are based on [46] and the PhD thesis [45], where similar algorithms are presented in the context of QOBDDs and simple games to get minimal winning and maximal losing coalitions.

We start our exploration with the minimal sets and assume that the subset A of the powerset 2X is represented by the ROBDD r. The function Minsets(1, r) is specified as Algorithm 4.1 and computes the ROBDD-representation of the subset min(A) of 2X. The input of the function Minsets is a label i ∈ X (including the special label n + 1 of the sinks) and a node v from the ROBDD r such that i ≤ lab(v). To simplify the correctness proof we have dispensed with the use of a computed table, as is usual for BDD-algorithms.
Therefore, the cascade-like recursion of the function Minsets may lead to a lot of identical recursive calls and an exponential running time. But it is an easy exercise to insert a computed table into Minsets that stores the results that have already been computed. Doing so, the initial call Minsets(1, r) leads to a total number of O(|r|) recursive calls of Minsets, where |r| denotes the number of nodes of r.

Algorithm 4.1. Minsets(i, v)
Input: the label i and the node v of a ROBDD of the subset A of the powerset 2^X
Output: ROBDD-representation of min(A)
if i = n + 1 then
    return v
else
    if i < lab(v) then
        return ite(i, O, Minsets(i + 1, v))
    end if
    if i = lab(v) then
        t := Minsets(i + 1, then(v))
        e := Minsets(i + 1, else(v))
        return ite(i, diff(t, e), e)
    end if
end if

For the correctness of the function Minsets it is necessary that A is an upset in the powerset ordering (2^X, ⊆), i.e., for all S, T ∈ 2^X, from S ∈ A and S ⊆ T it follows that T ∈ A. We start the correctness proof with the following auxiliary result about the inheritance of the upset property.

Lemma 4.6. Assume that v is an inner node of the ROBDD r and let i ∈ X be a label such that the set set(i, v) is an upset in (2^{i,...,n}, ⊆). Then we have the following properties:

(a) If i < lab(v), then the set set(i + 1, v) is an upset in (2^{i+1,...,n}, ⊆).

(b) If i = lab(v), then the sets set(i + 1, then(v)) and set(i + 1, else(v)) are upsets in (2^{i+1,...,n}, ⊆).


Proof. First we verify (a), i.e., that the set set(i + 1, v) is an upset in (2^{i+1,...,n}, ⊆) if i < lab(v). To this end we assume that S ∈ set(i + 1, v) and T ∈ 2^{i+1,...,n} such that S ⊆ T. From S ∈ set(i + 1, v) we get S + i ∈ set(i, v) due to the assumption i < lab(v). Now S + i ⊆ T + i in combination with the upset property of set(i, v) yields T + i ∈ set(i, v). Hence, there exists U ∈ set(i + 1, v) with T + i = U + i. Because of i ∉ U and i ∉ T we obtain T = U, i.e., T ∈ set(i + 1, v).

For a proof of (b), i.e., the upset properties of set(i + 1, then(v)) and set(i + 1, else(v)) in the case of i = lab(v), a case distinction is necessary.

(1) If the nodes then(v) and else(v) are sinks, then the claim follows from the fact that {∅} and ∅ are upsets in (2^{i+1,...,n}, ⊆).

(2) If then(v) and else(v) are inner nodes, then the proof is very similar to the one above.

This concludes the proof.

An immediate consequence of Lemma 4.6 is the following important fact: If the subset A := set(1, r) represented by the argument pair (1, r) of the initial call of the function Minsets, i.e., by the ROBDD r, is an upset, then this also holds for the subsets represented by the argument pairs of all recursive calls of Minsets. Hence, the upset property is an invariant of Minsets.

The next lemmas are also auxiliary results for the correctness proof of the function Minsets. This is the place where the upset property comes into play.

Lemma 4.7. Assume that v is an inner node of the ROBDD r and let i ∈ X be a label such that i < lab(v) and the set set(i, v) is an upset in (2^{i,...,n}, ⊆). Then we have the following equation:

    min(set(i + 1, v)) = min(set(i, v)) .        (4.44)

Proof. Inclusion ⊆: S ∈ min(set(i + 1, v)) implies that S ∈ set(i + 1, v) ⊆ set(i, v). To prove the minimality of S ∈ set(i, v) we assume
there is T ∈ set(i, v) such that T ⊆ S. For this T we have i ∉ T. Otherwise it would follow i ∈ S by the upset property and also that there exists U ∈ set(i + 1, v) such that T = U + i. But i ∈ T and i ∉ U imply T − i = U ∈ set(i + 1, v), and in combination with T − i ⊂ S this contradicts S ∈ min(set(i + 1, v)). From i ∉ T and T ∈ set(i, v) we get T ∈ set(i + 1, v), and S ∈ min(set(i + 1, v)) implies the desired result T = S.

Inclusion ⊇: Now we assume S ∈ min(set(i, v)). First, we show i ∉ S. For a proof we assume to the contrary that i ∈ S. Then we get S − i ∈ set(i + 1, v) which, in turn, implies S − i ∈ set(i, v). But the latter property contradicts S ∈ min(set(i, v)). From i ∉ S and S ∈ set(i, v) we get S ∈ set(i + 1, v). To verify that even S ∈ min(set(i + 1, v)) holds, let T ∈ set(i + 1, v) such that T ⊆ S. If we combine i ∉ S with the upset property, then we get i ∉ T. This shows T ∈ set(i, v) and, hence, T = S because of S ∈ min(set(i, v)).

After the case of i < lab(v) we still have to consider the case of i = lab(v), i.e., that i is the label of the node v. The corresponding result looks as follows.

Lemma 4.8. Let v be an inner node of the ROBDD r with the label i := lab(v) and let us suppose that the set set(i, v) is an upset in (2^{i,...,n}, ⊆). Then we have the following equation:

    min(set(i, v)) = {S + i | S ∈ min(T) \ min(E)} ∪ min(E) .        (4.45)

Here the two subsets T and E of the powerset 2^{i+1,...,n} are defined as follows:

    T := set(i + 1, then(v)) and E := set(i + 1, else(v)) .        (4.46)
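The decomposition stated in Lemma 4.8 can be checked independently of the ROBDD machinery on a small explicit family of sets. The following Python sketch (the helper name minimal is hypothetical and not part of the chapter's implementation) verifies equation (4.45) for an upset over X = {1, 2, 3}, split at the label i = 1:

```python
def minimal(family):
    # hypothetical helper: the minimal sets of an explicitly given family
    return {s for s in family if not any(t < s for t in family)}

# An upset A over X = {1, 2, 3}: all supersets of {1} or of {2, 3}.
A = {frozenset(s) for s in ({1}, {1, 2}, {1, 3}, {1, 2, 3}, {2, 3})}
i = 1
T = {s - {i} for s in A if i in s}   # sets reached via the then-branch
E = {s for s in A if i not in s}     # sets reached via the else-branch

lhs = minimal(A)
rhs = {s | {i} for s in minimal(T) - minimal(E)} | minimal(E)
print(lhs == rhs)  # True: equation (4.45) holds for this upset
```

Here min(T) = {∅} and min(E) = {{2, 3}}, so the right-hand side yields {{1}, {2, 3}}, which is exactly min(A).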

Proof. Inclusion ⊆: Since there exists no set S ∈ min(E) such that S + i ∈ min(set(i, v)) is true, we get for all sets in min(set(i, v)) of the form S + i that S ∉ min(E) is true. As a consequence, it suffices to prove the following inclusion:

    min(set(i, v)) ⊆ {S + i | S ∈ min(T)} ∪ min(E) .        (4.47)


To this end we assume an arbitrary set T ∈ min(set(i, v)) to be given. Then we distinguish the following two cases to prove the claimed inclusion:

(1) Let i ∈ T. We define S := T − i and get S ∈ T and T = S + i. In order that T becomes an element of the left part of the union of (4.47), we have to verify S ∈ min(T). So, let U ∈ T be given such that U ⊆ S. Then we have U + i ∈ set(i, v) and U + i ⊆ S + i = T, and this yields U + i = T because of T ∈ min(set(i, v)). Now i ∉ U and i ∈ T show U = T − i = S.

(2) Let i ∉ T. Here we obtain T ∈ E. To establish T as an element of the right part of the union of (4.47), we assume U ∈ E with U ⊆ T. Since also U ∈ set(i, v) is true, from T ∈ min(set(i, v)) we can conclude U = T.

Inclusion ⊇: We split the proof into two parts. To verify the inclusion

    min(set(i, v)) ⊇ {S + i | S ∈ min(T) \ min(E)} ,

we assume S ∈ min(T) \ min(E) and consider S + i. Because of S ∈ min(T) we get S ∈ T and this implies S + i ∈ set(i, v). Now we assume to the contrary that S + i ∉ min(set(i, v)). Since X is finite, there exists T ∈ min(set(i, v)) such that T ⊂ S + i. Again, we distinguish two cases:

(1) Suppose that i ∈ T. Then we obtain T − i ⊂ S and also T − i ∈ T. But this is a contradiction to S ∈ min(T).

(2) Let i ∉ T. Here we have T ⊆ S and it is easy to show that T ∈ min(set(i, v)) not only implies T ∈ E but even T ∈ min(E). To conclude this case, we consider two subcases:

a) If T = S, then we get S ∈ min(E) and this contradicts S ∈ min(T) \ min(E).

b) The remaining case T ⊂ S also contradicts this property. Because the set set(i, v) is an upset in (2^{i,...,n}, ⊆), from T ∈ set(i, v) we get T + i ∈ set(i, v), and from the latter property then T = (T + i) − i ∈ T, so that S is not a minimal set in T as assumed.


Finally, to prove the still necessary inclusion min(set(i, v)) ⊇ min(E), let an arbitrary S ∈ min(E) be given. Then we have S ∈ E and this yields S ∈ set(i, v). To verify that S ∈ min(set(i, v)), we assume T ∈ set(i, v) such that T ⊆ S. From S ∈ E we get i ∉ S, this shows i ∉ T, and the latter yields T ∈ E. Now T = S is a consequence of the minimality of S in E.

In the following last of our series of preparatory lemmas the main part of the correctness proof of the above algorithm is shown.

Lemma 4.9. Assume that the set set(1, r) represented by the ROBDD r is an upset in (2^X, ⊆). Then for all labels i ∈ X and all nodes v of r with i ≤ lab(v) we have the following equation:

    set(i, Minsets(i, v)) = min(set(i, v)) .        (4.48)

Proof. We use induction on k, where the natural number k is defined by k := n − i + 1. For the induction base k = 0 we assume that v is a node with the label i ≤ lab(v). Since k = 0 implies i = n + 1, the node v is a sink. If it is the 1-sink, then we get

    set(i, Minsets(n + 1, I)) = set(i, I) = {∅} = min({∅}) = min(set(i, I)) ,

and if it is the 0-sink, then we get

    set(i, Minsets(n + 1, O)) = set(i, O) = ∅ = min(∅) = min(set(i, O)) .

So, in both cases we have shown the desired equation (4.48).

For the induction step we assume k > 0, i.e., i < n + 1, and that the statement holds for k − 1, i.e., for the results of the functions set and Minsets with the first argument i + 1. Let v be a node with the label i ≤ lab(v). Because of the above calculations we may suppose that v is not a sink. We distinguish two cases:


(1) Let i < lab(v). In this case we have i + 1 ≤ lab(v) and we obtain, via the definition of the function Minsets, the following equation:

    set(i, Minsets(i, v)) = set(i, ite(i, O, Minsets(i + 1, v))) .

From the assumption v ≠ O we get Minsets(i + 1, v) ≠ O and this property implies ite(i, O, Minsets(i + 1, v)) ≠ O. Since i is the label of the root of the ROBDD ite(i, O, Minsets(i + 1, v)), we can calculate as follows:

    set(i, ite(i, O, Minsets(i + 1, v)))
        = {S + i | S ∈ set(i + 1, O)} ∪ set(i + 1, Minsets(i + 1, v))
        = ∅ ∪ set(i + 1, Minsets(i + 1, v))
        = min(set(i + 1, v))
        = min(set(i, v)) .

The first two steps of this calculation use the definition of the function set, then the induction hypothesis is applied, and the last step follows from Lemma 4.7. Altogether, we have shown the desired equation (4.48).

(2) Let i = lab(v). Guided by the code of the function Minsets we introduce two auxiliary ROBDDs t and e:

    t := Minsets(i + 1, then(v)) and e := Minsets(i + 1, else(v)) .

If we again use the two abbreviations T := set(i + 1, then(v)) and E := set(i + 1, else(v)) for the sets T and E introduced in Lemma 4.8, then the induction hypothesis implies the equality

    set(i + 1, t) = min(set(i + 1, then(v))) = min(T)

for the set T and also the equality

    set(i + 1, e) = min(set(i + 1, else(v))) = min(E)

for the set E, by reason of i + 1 ≤ lab(then(v)) and i + 1 ≤ lab(else(v))
and the upset property of set(i + 1, then(v)) as well as set(i + 1, else(v)) in the corresponding powerset orderings. From the definition of the function Minsets we get:

    set(i, Minsets(i, v)) = set(i, ite(i, diff(t, e), e)) .

Similar to the previous case it can be shown that i is the label of the root of the ROBDD ite(i, diff(t, e), e). Hence, we can use the definition of set and get, via Lemma 4.5 and the two equations implied by the induction hypothesis, the following result:

    set(i, Minsets(i, v)) = set(i, ite(i, diff(t, e), e))
        = {S + i | S ∈ set(i + 1, diff(t, e))} ∪ set(i + 1, e)
        = {S + i | S ∈ set(i + 1, t) \ set(i + 1, e)} ∪ set(i + 1, e)
        = {S + i | S ∈ min(T) \ min(E)} ∪ min(E) .

Now Lemma 4.8 again shows the desired equation (4.48).

This completes the proof of the induction step and, hence, also that of the lemma.

As an immediate consequence of Lemma 4.9 we get the following result that shows the correctness of the function Minsets.

Theorem 4.12. If the ROBDD r represents the subset A of 2^X and A is an upset in (2^X, ⊆), then the ROBDD computed by the call Minsets(1, r) represents the subset min(A) of 2^X.

We can compute the ROBDD-representation of the subset max(A) of 2^X from the ROBDD-representation of the subset A of 2^X in a manner very similar to the function Minsets. The corresponding function Maxsets is given as Algorithm 4.2, where we have again dispensed with the use of a computed table. It has to be invoked with the negation of the ROBDD in question. Also for the correctness of the function Maxsets an inheritance condition has to be satisfied: the subset A has to be a downset in the powerset ordering (2^X, ⊆).
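As a point of reference for both operations, the sets min(A) and max(A) can be computed directly on explicit families of sets. The following Python sketch (hypothetical helper names, exponential in |X| and therefore for illustration only) mirrors the specifications that Minsets and Maxsets realize on ROBDDs:

```python
def minsets_explicit(family):
    # minimal sets: no other member is a proper subset
    return {s for s in family if not any(t < s for t in family)}

def maxsets_explicit(family):
    # maximal sets: no other member is a proper superset
    return {s for s in family if not any(s < t for t in family)}

# An upset over {1, 2, 3}: all supersets of {1} or of {2, 3} ...
up = {frozenset(s) for s in ({1}, {1, 2}, {1, 3}, {1, 2, 3}, {2, 3})}
print(minsets_explicit(up) == {frozenset({1}), frozenset({2, 3})})    # True

# ... and a downset over {1, 2, 3}: all subsets of {1, 2} or of {3}
down = {frozenset(s) for s in ((), (1,), (2,), (1, 2), (3,))}
print(maxsets_explicit(down) == {frozenset({1, 2}), frozenset({3})})  # True
```

Such a brute-force version is a convenient oracle for testing a symbolic implementation on small instances.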


Algorithm 4.2. Maxsets(i, v)
Input: the label i and the node v of a ROBDD of the subset A of the powerset 2^X
Output: ROBDD-representation of max(A)
if i = n + 1 then
    return neg(v)
else
    if i < lab(v) then
        return ite(i, Maxsets(i + 1, v), O)
    end if
    if i = lab(v) then
        t := Maxsets(i + 1, then(v))
        e := Maxsets(i + 1, else(v))
        return ite(i, t, diff(e, t))
    end if
end if

The downset property means that for all S, T ∈ 2^X, from S ∈ A and T ⊆ S it follows that T ∈ A. The correctness proof of the function Maxsets is very similar to that of the function Minsets. Therefore, in the following we only formulate the corresponding theorem and omit its proof.

Theorem 4.13. If the ROBDD r represents the subset A of 2^X and A is a downset in (2^X, ⊆), then the ROBDD computed by the call Maxsets(1, neg(r)) represents the subset max(A) of 2^X.

Again, it is an easy exercise to insert a computed table into Maxsets to obtain a total number of recursive calls that is in O(|r|). It should be mentioned that the upset and downset properties are necessary for the correctness of the functions Minsets and Maxsets. We close this subsection with a counter-example for the first function.

Example 4.42. We define by X := {1, 2} the set X of two labels and consider the subset A := {∅, {1, 2}} of the powerset 2^X, which obviously is not an upset in the powerset ordering (2^X, ⊆). The subset A is represented by the ROBDD over X that is depicted in Figure 4.3.

Figure 4.3. A Small ROBDD r.

If r denotes its root and we call Minsets(1, r), then we get the results of the recursive calls as follows, where the variables t and e are those of Algorithm 4.1. For t we get ite(2, I, O) because of

    t = Minsets(2, then(r)) = Minsets(2, ite(2, I, O))
      = ite(2, diff(I, O), O)        (as Minsets(3, I) = I and Minsets(3, O) = O)
      = ite(2, I, O) .

For e we get ite(2, O, I) because of

    e = Minsets(2, else(r)) = Minsets(2, ite(2, O, I))
      = ite(2, diff(O, I), I)        (as Minsets(3, O) = O and Minsets(3, I) = I)
      = ite(2, O, I) .

As a consequence, the ROBDD computed by the call Minsets(1, r) equals the ROBDD r, due to the subsequent calculation:

    Minsets(1, r) = ite(1, diff(t, e), e)
        = ite(1, diff(ite(2, I, O), ite(2, O, I)), ite(2, O, I))
        = ite(1, ite(2, diff(I, O), diff(O, I)), ite(2, O, I))
        = ite(1, ite(2, I, O), ite(2, O, I))
        = r .

The ROBDD r does not represent the subset min(A) = {∅} of the powerset 2^X, but {∅, {1, 2}}, that is, the subset A.
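Example 4.42 can be replayed on explicit families: applying the recursive combinator of Algorithm 4.1 level by level to A = {∅, {1, 2}} returns A itself instead of min(A) = {∅}. A Python sketch of this failure (hypothetical names, independent of the actual ROBDD code):

```python
def minsets_rec(family, labels):
    # the recursion of Algorithm 4.1, replayed on explicit families
    if not labels:
        return set(family)
    i, rest = labels[0], labels[1:]
    T = minsets_rec({s - {i} for s in family if i in s}, rest)
    E = minsets_rec({s for s in family if i not in s}, rest)
    return {s | {i} for s in T - E} | E

A = {frozenset(), frozenset({1, 2})}   # not an upset in (2^X, ⊆)
result = minsets_rec(A, [1, 2])
print(result == A)                     # True: A is returned unchanged,
                                       # although min(A) = {∅}
```

At the top level T = {{2}} and E = {∅}, so the combinator reassembles {1, 2} instead of discarding it, exactly as in the symbolic calculation above.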


4.2.5. Applications Concerning RelView

RelView [30, 31, 260] is a special purpose Computer Algebra system for the manipulation and visualization of relations and for relation-algebraic prototyping and programming. Its core is a ROBDD-based implementation of the constants and operations of heterogeneous relation algebra in the sense of [282, 283], which basically has been developed in the course of the PhD theses [195] and [212]. Meanwhile, the ROBDD-versions of the algorithms of [219] for computing the sets min(A) and max(A) and the refinements we have presented in Subsection 4.2.4 are also implemented in RelView (Version 8.1). This version of RelView was released in September 2012 and is available free of charge via the URL [260].

The implemented operations can be applied to relational vectors, which describe relations with a specific singleton set 1 := {⊥} as the target. Such relations can be considered as Boolean column vectors. Using the typing notation of the relational community with the double arrow, a vector v with source A is typed by v : A ↔ 1. Then v_a is used instead of (a, ⊥) ∈ v, and {a ∈ A | v_a} is defined as the set the vector v represents. In the specific case A = 2^Y of a powerset as the source, the technique applied within RelView to implement relations via ROBDDs allows the immediate transfer of the above mentioned ROBDD-algorithms to four RelView-operations, which yield for the vector representation v : 2^Y ↔ 1 of a subset A of 2^Y the following vector representations:

(1) minsets(v) : 2^Y ↔ 1 of min(A);
(2) minsets upset(v) : 2^Y ↔ 1 of min(A), if A is an upset in (2^Y, ⊆);
(3) maxsets(v) : 2^Y ↔ 1 of max(A);
(4) maxsets downset(v) : 2^Y ↔ 1 of max(A), if A is a downset in (2^Y, ⊆).


Namely, if the set Y has the form {x1, . . . , xn}, the ROBDD r over the set of labels X := {1, . . . , n} implements the vector v, the elements of the set Y are taken as the variables of the Boolean function fr, and the sets of 2^Y are taken as variable assignments, then the specific typing of the vector v implies that A is the set of satisfying assignments of fr. See [30, 31] for details. Altogether, we have set(1, r) = A, as the algorithms demand.

To give an impression of RelView-applications and their running times, in the remainder of this section we present an example from game theory. It has been carried out with RelView Version 8.1 on a Sun Fire V440 workstation with an UltraSPARC-IIIi processor, 1.6 GHz and 16 GByte main memory.

A simple game is a pair (X, A) with a finite set X = {1, 2, . . . , n} of players and a subset A of 2^X that is an upset in (2^X, ⊆). The sets from 2^X are called coalitions and those from A winning coalitions. An important special case is the Weighted Voting Game (WVG). Here every player i ∈ X has a weight w_i ∈ N_{>0} and there is a quota q ∈ N_{>0}. A coalition S ∈ 2^X is winning if and only if Σ_{i∈S} w_i ≥ q. A Multiple Weighted Voting Game (MWVG) is specified using lists of WVGs, and here a coalition is winning if and only if it is winning in every WVG.

Example 4.43. A MWVG with three WVGs is given by the following lists, where in each case the first list element is the quota and the remaining ones (after the semicolon) are, from left to right, the weights of the 27 players:

g1 = [ 255; 29, 29, 29, 29, 27, 27, 14, 13, 12, 12, 12, 12, 12, 10, 10, 10, 7, 7, 7, 7, 7, 4, 4, 4, 4, 4, 3 ]
g2 = [ 14; 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ]
g3 = [ 620; 170, 123, 122, 120, 82, 80, 47, 33, 22, 21, 21, 21, 21, 17, 17, 11, 11, 11, 8, 8, 5, 4, 3, 2, 1, 1 ]

This MWVG models the European Union (EU) after the treaty of Nice (2003), with the players 1, 2, . . .
, 27 corresponding, in the same order, to the (at that time) 27 EU countries Germany, United Kingdom, France, Italy, Spain, Poland, Romania, The Netherlands, Greece, the Czech Republic, Belgium, Hungary, Portugal, Sweden, Bulgaria,
Austria, the Slovak Republic, Denmark, Finland, Ireland, Lithuania, Latvia, Slovenia, Estonia, Cyprus, Luxembourg, and Malta. For instance, the second WVG g2 corresponds to the rule that for the acceptance of a proposal at least 14 of the 27 EU countries have to vote yes.

One of the most important topics is to measure the power of players in simple games. The power indices are one of the most common approaches for expressing the influence of the players. We concentrate on the Holler-Packel power index, introduced in [139]. It assigns to each player i ∈ X a real number δ_i ∈ [0, 1] as follows:

    δ_i := p / q ,        (4.49)

where the numerator of the fraction is defined as

    p := |{S ∈ min(A) | i ∈ S}|        (4.50)

and denotes the number of minimal winning coalitions that contain the player i in question, and the denominator of the fraction is defined as

    q := |{(j, S) ∈ X × min(A) | j ∈ S}|        (4.51)

and denotes the number of so-called swings, that is, the number of possibilities to make a minimal winning coalition lose by the removal of a single player. Hence, for this specification of power only the minimal winning coalitions of the simple game are relevant.

We have used RelView to compute the Holler-Packel power indices for the above MWVG. To this end, we first entered the three lists above into the tool and obtained three vectors v1, v2, v3 : 2^X ↔ 1, where vj represents the set Aj of the winning coalitions of the WVG gj, for j ∈ {1, 2, 3}. In [32] such vectors are called vector models of simple games. Next, we computed v := v1 ∩ v2 ∩ v3. The vector v : 2^X ↔ 1 represents the set A := A1 ∩ A2 ∩ A3 of winning coalitions of the MWVG for the treaty of Nice. RelView showed that |A| = 2,718,752 and that only 599 nodes are necessary to represent this set by a ROBDD.

For computing from the vector model v of the MWVG the vector representation vmin : 2^X ↔ 1 of the set min(A), a set with |min(A)| = 561,820 elements that can be represented by a ROBDD with 611 nodes, the tool reported 0.19 seconds as the running time when using the operation minsets and 0.00 seconds when using the operation minsets upset.

In [32] it is shown how, by means of a simple relation-algebraic expression, the vector vmin can be transformed into a relation M : X ↔ min(A). The single columns of M, when using the Boolean matrix model of a relation, represent the single sets of the set of sets min(A). The step from vmin to M corresponds to the computation of the so-called membership model of a simple game from its vector model, as shown in [32]. In the same paper, it is also demonstrated how a specific feature of the RelView tool allows us to get from M, when it is again considered as a Boolean matrix, the number |M| of its 1-entries as well as the number |M|_i of 1-entries for each row i, i.e., the Holler-Packel power indices, as a comparison with the definition of δ_i shows. The latter numbers are listed in Table 4.3.

Table 4.3. The Holler-Packel power indices computed by RelView

    δ_i         countries i
    0.0483488   Germany
    0.0483487   United Kingdom, France, Italy
    0.0471566   Spain, Poland
    0.0383454   Romania
    0.0376467   The Netherlands
    0.0368833   Greece, Czech Republic, Belgium, Hungary, Portugal
    0.0353909   Sweden, Bulgaria, Austria
    0.0331480   Slovakia, Denmark, Finland, Ireland, Lithuania
    0.0307141   Latvia, Slovenia, Estonia, Cyprus, Luxembourg
    0.0264000   Malta

To present some additional information, Germany is a member of precisely |M|_1 = 486,389 minimal winning coalitions of the EU game (this is the largest such number) and Malta is a member of precisely |M|_27 = 265,584 minimal winning coalitions (this is the smallest such number). All in all there are precisely |M| = 10,060,000 1-entries in M and it took RelView 152.7 seconds to compute M from vmin.
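For small games, the Holler-Packel indices of definition (4.49) can also be computed by brute-force enumeration of all coalitions. The following Python sketch (illustrative only; it is exponential in the number of players and thus unusable for the 27-player EU game, which is exactly why the symbolic ROBDD approach matters) determines the minimal winning coalitions of a toy WVG and derives the indices:

```python
from itertools import combinations

def holler_packel(weights, quota):
    players = range(len(weights))
    win = lambda s: sum(weights[i] for i in s) >= quota
    # minimal winning coalitions: winning, but losing after removing any player
    mwc = [set(c)
           for r in range(1, len(weights) + 1)
           for c in combinations(players, r)
           if win(c) and all(not win(set(c) - {i}) for i in c)]
    q = sum(len(s) for s in mwc)          # number of swings, eq. (4.51)
    return [sum(i in s for s in mwc) / q for i in players]

# Toy WVG [3; 2, 1, 1]: the minimal winning coalitions are {1, 2} and {1, 3}
print(holler_packel([2, 1, 1], 3))  # [0.5, 0.25, 0.25]
```

Player 1 appears in both minimal winning coalitions and therefore accumulates half of the four swings.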


The number of nodes of the ROBDD that implements M is 149,745. We have also applied the newest version of RelView to other problems. Here we only want to mention the computation of Banks winners, a concept from Social Choice theory that we mentioned in the introduction. We refer the interested reader to [27]. A series of experiments with RelView confirms the enormous gain in efficiency obtained by using the operations maxsets and maxsets downset instead of a relation-algebraic specification of the set of maximal sets by means of the set-inclusion relation. The experiments also show that the operation maxsets downset is much faster than maxsets.
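The vector-model idea can be imitated outside RelView: a subset A of 2^Y becomes a Boolean vector with one entry per characteristic tuple of 2^Y. A small Python sketch of this encoding (illustrative naming and ordering only; RelView itself stores such vectors as ROBDDs rather than explicit lists):

```python
from itertools import product

Y = [1, 2, 3]
A = {frozenset({1}), frozenset({1, 2}), frozenset({2, 3})}

# enumerate 2^Y by characteristic 0/1-tuples; one vector entry per subset
subsets = [frozenset(y for y, bit in zip(Y, bits) if bit)
           for bits in product((0, 1), repeat=len(Y))]
v = [s in A for s in subsets]

print(sum(v))  # 3: the vector has one 1-entry (True) per set of A
```

The explicit vector has 2^|Y| entries, which again illustrates why the implicit ROBDD representation is essential for larger Y.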

4.2.6. Concluding Remarks

In this section we have refined two algorithms from the ZBDD-library EXTRA to ROBDD-algorithms that lead (in the versions with computed tables) to a total number of O(|r|) recursive calls, where r is the input ROBDD. The new algorithms proved to be significantly faster than the original ones. For their correctness, inheritance conditions are necessary which are often satisfied in practice. In the meantime, the algorithms are part of the newest version of the Computer Algebra system RelView and we have demonstrated an application that comes from game theory.

Although both algorithms need only O(|r|) recursive calls, this does not mean that their running times are in O(|r|), even if the costs for the computed table are disregarded. The running times suffer from the recursive use of the ROBDD-operation diff, since this operation has a non-constant running time. In Subsection 4.2.4 we have mentioned that our approach is based on [45, 46], where similar QOBDD-based algorithms are developed for problems in simple games. In [45] it is shown that for the specific class of directed simple games the use of diff can be avoided.


4.3. Multiple-valued Functions with Bent Reed-Muller Spectra

Claudio Moraga, Milena Stanković, Radomir S. Stanković

4.3.1. Background

The study of properties of Boolean and multiple-valued functions has drawn the interest of mathematicians, computer scientists and engineers since early in the 20th century, possibly starting with the most obvious properties, addressing monotone functions and balanced functions, and going on to linear and non-linear functions. The study of symmetric functions, starting with the works of Y. Komamiya [179] and G. Epstein [108] in the early 1950s, when the computer era was starting, showed that they constituted very effective building blocks for the realization of complex digital circuits. Several early results were later extended to the multiple-valued case [308]. A similar development may be reported with respect to threshold functions in the early 1960s, possibly starting with Michael Dertouzos' dissertation at MIT in 1964 [93], with a text book appearing in 1967 [196]. In the ternary world the early works of Hanson [136] and Merril [210] should be mentioned. The monograph [225] summarizes many advances in the study of ternary threshold functions.

A comprehensive summary on different classifications of Boolean functions may be found in Chapter 2 of [146]. The chapter includes the classification of Boolean functions based on their Walsh spectra, which was later extended to multiple-valued functions [382] based on the Chrestenson spectra. Spectral techniques applied to digital systems may be traced back to the work of Lechner [191] for the binary case and to Karpovsky [159] for the multiple-valued case. This has continued to be an area of intensive research [160, 228, 310].


A relationship between the Walsh spectral coefficients and the characterizing parameters of binary threshold functions was disclosed in [105]. Similarly, a relationship between the Chrestenson coefficients and parameters of ternary threshold functions was pointed out in [227]. This was later extended to p-valued threshold functions [382].

With respect to non-linear functions, in 1976 Rothaus [263] introduced the most non-linear Boolean functions under the name bent functions, which immediately captured the interest of people working in cryptography. From a classification point of view it is interesting to mention that for n = 2 and n = 4 it has been shown that all Boolean bent functions belong to the same Walsh class or exhibit Walsh Decision Diagrams with the same shape [310]. Extensions of the work of Rothaus to multiple-valued functions were first reported in [185], while in [197] these functions were studied from a spectral point of view. Please refer to [349] for a recent overview of work carried out by mathematicians with the aim of counting, characterizing, and generating generalized bent functions. In [230] it was shown that there are 18 ternary bent functions on one argument, 486 ternary bent functions on two arguments, and 100 five-valued bent functions on one argument, and in [231] it was reported that there are at least 700,000 ternary bent functions on four arguments.

4.3.2. Formalisms

The core of spectral techniques, as they will be used in this section, may be summarized as follows:

• Let f : GF(p)^n → GF(p), p prime, p > 2, be a p-valued function on n arguments and let F denote its value vector, i.e., F = [f(0), f(1), . . . , f(p^n − 1)]^T, where for the numbering of the arguments the isomorphism between the Galois field GF(p^n) and the vector space GF(p)^n is used.

• The correspondence of f and F will be written f ↔ F.

• It is simple to see that for any p^n × p^n non-singular matrix T,
with entries from GF(p), it holds that f ↔ (T · T⁻¹) · F.

• The columns of T may be seen as value vectors of one-place functions in GF(p), and since T is non-singular, these one-place functions are linearly independent and constitute a basis.

• T⁻¹ is the inverse of the numerical version of T and is called the transform matrix.

• The product T⁻¹ · F is known as the spectrum of f and is frequently denoted as Sf.

• The product (T · T⁻¹) · F in GF(p), with T in a symbolic representation and T⁻¹ in a numerical representation, leads to a polynomial expression for f in terms of the basis functions. This is called the spectral transformation of f.

• Furthermore, the product T · Sf, sometimes called the inverse spectrum, returns the value vector F.

For a deeper understanding of spectral techniques, the reader is referred to, e.g., [160, 309]. In this section, the Reed-Muller spectrum of p-valued functions is considered. A particular property of the Reed-Muller transform of a p-valued function is the fact that the spectrum is also a p-valued function. From a formal point of view, the Reed-Muller transform constitutes an automorphism in the set of p-valued functions for any prime p.

Definition 4.35. The matrix T representing the basis of the Reed-Muller transform of p-valued functions is defined as follows:

               1     x      x²       x³      . . .   x^{p−1}
             ⎡ 1     0      0        0       . . .   0            ⎤
             ⎢ 1     1      1        1       . . .   1            ⎥
    T(1) =   ⎢ 1     2      2²       2³      . . .   2^{p−1}      ⎥   mod p ,        (4.52)
             ⎢ 1     3      3²       3³      . . .   3^{p−1}      ⎥
             ⎢ ⋮     ⋮      ⋮        ⋮       ⋱       ⋮            ⎥
             ⎣ 1    p−1   (p−1)²   (p−1)³    . . .  (p−1)^{p−1}   ⎦


    T(n) = [1  x_{n−1}  x_{n−1}²  . . .  x_{n−1}^{p−1}]
           ⊗ [1  x_{n−2}  x_{n−2}²  . . .  x_{n−2}^{p−1}]
           ⊗ . . .
           ⊗ [1  x_0  x_0²  . . .  x_0^{p−1}]   mod p .        (4.53)

Numerically:

    T(n) = (T(1))^{⊗n} ,        (4.54)

where ⊗, as the operation symbol, denotes the Kronecker product and, as the exponent, the n-fold Kronecker product of the argument with itself.

The Reed-Muller transform matrix, denoted as RM, is obtained as the inverse of the numerical version of the basis matrix. Let r be the symbolic representation of a row value vector

    r = [0  1  2  . . .  p − 1] ,

where, as in (4.52),

    r^k = [0^k  1  2^k  3^k  . . .  (p − 1)^k] ,

for any k ∈ GF(p), accepting that 0⁰ = 1. Then [226],

    RM(1) = (p − 1) ·  ⎡ r^{p−1} + [p − 1, p − 1, . . . , p − 1] ⎤
                       ⎢ r^{p−2}                                ⎥
                       ⎢ r^{p−3}                                ⎥        (4.55)
                       ⎢ ⋮                                      ⎥
                       ⎢ r^1                                    ⎥
                       ⎣ r^0                                    ⎦

and

    RM(n) = (RM(1))^{⊗n} .        (4.56)
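Definition 4.35 and equations (4.55), (4.56) can be checked numerically. The sketch below (assuming NumPy; function names and layout are our own, with rows and columns as in the text) builds T(1) and RM(1) for a given prime p and verifies that RM(1) inverts T(1) mod p:

```python
import numpy as np

def T1(p):
    # basis matrix (4.52): entry (x, k) is x^k mod p, with 0^0 = 1
    return np.array([[pow(x, k, p) for k in range(p)] for x in range(p)])

def RM1(p):
    # Reed-Muller transform matrix (4.55)
    r = lambda k: np.array([pow(x, k, p) for x in range(p)])
    rows = [r(p - 1) + (p - 1)] + [r(k) for k in range(p - 2, -1, -1)]
    return ((p - 1) * np.array(rows)) % p

for p in (3, 5):
    print(np.array_equal((T1(p) @ RM1(p)) % p, np.eye(p, dtype=int)))  # True

RMn = np.kron(RM1(3), RM1(3)) % 3   # RM(2) = RM(1) ⊗ RM(1), eq. (4.56)
```

For p = 3 this yields T(1) = [[1,0,0],[1,1,1],[1,2,1]] and RM(1) = [[1,0,0],[0,2,1],[2,2,2]], whose product mod 3 is the identity.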

Lemma 4.10. For any prime p, p > 2, and n > 1, the row with index π_{n−1} = p^{n−1} + p^{n−2} + · · · + p + 1 of T(n) is a row of 1s.

Proof. Equation (4.52) shows that in T(1) the second row, i.e., the row with index 1, is a row of 1s. It is simple to prove that in T(2) it will be the row with index p + 1.


Induction assumption: In T(k), let the row with index π_{k−1} = p^{k−1} + p^{k−2} + · · · + p + 1 be the row of 1s. Moreover, T(k) has dimension p^k.

Induction step: Let k be increased by 1. T(k + 1) = T(1) ⊗ T(k). It is simple to see that T(k + 1) will have a first row-block of height p^k comprising a single T(k) matrix and zero-blocks otherwise, followed by a second row-block consisting of p T(k) matrices; therefore the row of 1s will have index

    p^k + π_{k−1} = p^k + (p^{k−1} + p^{k−2} + · · · + p + 1) ,

which equals π_k. This completes the proof.

Corollary 4.4. Let f : GF(p)^n → GF(p), with F its value vector, and let R denote its Reed-Muller spectrum. Since F = T(n) · R = (T(1)^{⊗n}) · R, the component F(π_{n−1}) has the value corresponding to the sum mod p of all spectral components.

Lemma 4.11. Let f : GF(p)^n → GF(p), with F its value vector, and let R denote its Reed-Muller spectrum, i.e., R = RM(n) · F. Then R(p^n − 1) has the value corresponding to the sum mod p of all function values, scaled by (p − 1)^n.

Proof. Equation (4.55) shows that the last row of RM(1) is a row of 1s, scaled by (p − 1). It is simple to deduce that the last row of RM(n) = (RM(1))^{⊗n} will be a row of 1s scaled by (p − 1)^n accordingly. The assertion follows.

Definition 4.36. The modular weight of a p-valued vector is given by the sum mod p of all its components. The symbol ϕ will be used to denote the modular weight.

Lemma 4.12. The modular weight ϕ of the Kronecker product of two p-valued vectors of length p^n and p^m, respectively, equals the product of the individual modular weights of the corresponding vectors.


Proof. Let N = p^n − 1 and M = p^m − 1. Let V1 = [v_{10}, v_{11}, . . . , v_{1N}]^T and V2 = [v_{20}, v_{21}, . . . , v_{2M}]^T. Moreover, let ⊕ denote the mod p addition. Then

    V1 ⊗ V2 = [v_{10}v_{20}, v_{10}v_{21}, . . . , v_{10}v_{2M},
               v_{11}v_{20}, v_{11}v_{21}, . . . , v_{11}v_{2M},
               . . . ,
               v_{1N}v_{20}, v_{1N}v_{21}, . . . , v_{1N}v_{2M}]^T ,

and

    ϕ(V1 ⊗ V2) = v_{10}v_{20} ⊕ v_{10}v_{21} ⊕ . . . ⊕ v_{10}v_{2M}
               ⊕ v_{11}v_{20} ⊕ v_{11}v_{21} ⊕ . . . ⊕ v_{11}v_{2M}
               ⊕ . . .
               ⊕ v_{1N}v_{20} ⊕ v_{1N}v_{21} ⊕ . . . ⊕ v_{1N}v_{2M}   (mod p)
               = v_{10}(v_{20} ⊕ v_{21} ⊕ . . . ⊕ v_{2M})
               ⊕ v_{11}(v_{20} ⊕ v_{21} ⊕ . . . ⊕ v_{2M})
               ⊕ . . .
               ⊕ v_{1N}(v_{20} ⊕ v_{21} ⊕ . . . ⊕ v_{2M})
               = (v_{10} ⊕ v_{11} ⊕ . . . ⊕ v_{1N})(v_{20} ⊕ v_{21} ⊕ . . . ⊕ v_{2M})
               = ϕ(V1) · ϕ(V2) .
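The multiplicativity of the modular weight stated in Lemma 4.12 is easy to check numerically. The following is a minimal sketch; the helper names `kron` and `modular_weight` and the sample vectors are illustrative choices, not taken from the text:

```python
def kron(v1, v2, p):
    """Kronecker product of two p-valued vectors, entries reduced mod p."""
    return [(a * b) % p for a in v1 for b in v2]

def modular_weight(v, p):
    """Modular weight (Definition 4.36): sum mod p of all components."""
    return sum(v) % p

p = 3
v1 = [1, 2, 1]                      # length 3^1, modular weight 4 ≡ 1 (mod 3)
v2 = [2, 2, 1, 0, 1, 2, 0, 0, 2]    # length 3^2, modular weight 10 ≡ 1 (mod 3)

# Lemma 4.12: ϕ(V1 ⊗ V2) = ϕ(V1) · ϕ(V2)
assert modular_weight(kron(v1, v2, p), p) == \
       (modular_weight(v1, p) * modular_weight(v2, p)) % p
```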

Lemma 4.13. A p-valued function is bent if and only if its circular Vilenkin-Chrestenson spectrum is flat, i.e., all spectral components have magnitude p^{n/2} [185, 230].

The tensor sum (&) is the operator analogous to the Kronecker product, but using additions mod p instead of products as elementary operations [230]. The tensor sum of the value vectors of any two p-valued bent functions produces the value vector of a new p-valued bent function on as many arguments as the sum of the numbers of arguments of the two original functions [230]. Similar results were stated earlier in [185] and [310], however without using the formalism of a tensor sum.

Definition 4.37. Given a matrix with entries from Z_p, its structure is another matrix of the same dimensions, with all entries equal to 1.


Lemma 4.14. Let N = p^n − 1 and M = p^m − 1, m, n ∈ ℕ. Let V1 = [v_{10}, v_{11}, . . . , v_{1N}]^T and V2 = [v_{20}, v_{21}, . . . , v_{2M}]^T with entries from GF(p). Moreover, let 1_{N+1} denote the structure of V1, a column vector of N + 1 1s; similarly, let 1_{M+1} denote the structure of V2. Then

    V1 & V2 = (V1 ⊗ 1_{M+1}) ⊕ (1_{N+1} ⊗ V2) .

Proof.

    V1 & V2 = [v_{10} ⊕ v_{20}, v_{10} ⊕ v_{21}, . . . , v_{10} ⊕ v_{2M},
               v_{11} ⊕ v_{20}, v_{11} ⊕ v_{21}, . . . , v_{11} ⊕ v_{2M},
               . . . ,
               v_{1N} ⊕ v_{20}, v_{1N} ⊕ v_{21}, . . . , v_{1N} ⊕ v_{2M}]^T
            = (V1 ⊗ 1_{M+1}) ⊕ [v_{20}, v_{21}, . . . , v_{2M}, v_{20}, v_{21}, . . . , v_{2M}, . . . , v_{20}, v_{21}, . . . , v_{2M}]^T
            = (V1 ⊗ 1_{M+1}) ⊕ (1_{N+1} ⊗ V2) .

Corollary 4.5. Extending the lemma to the tensor sum of matrices is straightforward.

Corollary 4.6. The tensor sum is associative, but in general not commutative.

Lemma 4.15. Let V1 and V2 be as in Lemma 4.14. Then ϕ(V1 & V2) ≡ 0 mod p.

Proof.

    ϕ(V1 & V2) = ϕ[(V1 ⊗ 1_{M+1}) ⊕ (1_{N+1} ⊗ V2)]
               = ϕ(V1 ⊗ 1_{M+1}) ⊕ ϕ(1_{N+1} ⊗ V2)
               = ϕ(V1)ϕ(1_{M+1}) ⊕ ϕ(1_{N+1})ϕ(V2) .

However, since ϕ(1_{M+1}) ≡ 0 mod p and ϕ(1_{N+1}) ≡ 0 mod p, the assertion follows.
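Lemma 4.14 reduces the tensor sum to componentwise sums in Kronecker order, which makes Lemma 4.15 directly checkable. A small sketch; the function name and the sample vectors are hypothetical:

```python
def tensor_sum(v1, v2, p):
    """Tensor sum (cf. Lemma 4.14): (V1 ⊗ 1_{M+1}) ⊕ (1_{N+1} ⊗ V2),
    i.e., all pairwise sums mod p in Kronecker order."""
    return [(a + b) % p for a in v1 for b in v2]

p = 3
v1 = [0, 1, 2]
v2 = [2, 0, 1]
ts = tensor_sum(v1, v2, p)
assert len(ts) == len(v1) * len(v2)
# Lemma 4.15: the modular weight of a tensor sum is congruent with 0 mod p
assert sum(ts) % p == 0
```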


Definition 4.38. For a given p, let γ be the set of all one-place p-valued bent functions. Let Γ denote the set of all p-valued functions obtained as the tensor sum of one-place bent functions, including repetitions and reorderings.

Lemma 4.16. The elements of Γ are bent, and if V ∈ Γ then ϕ(V) ≡ 0 mod p.

Proof. The elements of Γ are bent, since they are generated as the tensor sum of bent functions. Their modular weight is 0 mod p according to Lemma 4.15.

Remark 4.1. It has been shown [230] that there are 18 ternary one-place bent functions and 100 five-valued one-place bent functions. Direct calculations have proven that the modular weight of the one-place ternary bent functions is not 0 mod 3, whereas the modular weight of the five-valued one-place bent functions is indeed 0 mod 5.

Consequence 4.1. A necessary condition for a Reed-Muller spectrum R to be bent (and belong to Γ) is that ϕ(R) ≡ 0 mod p.

Consequence 4.2. A necessary condition for an n-place p-valued function to have a bent Reed-Muller spectrum in Γ is, from Corollary 4.4, that f(π_{n−1}) = 0.

4.3.3. The Maiorana Class of Bent Reed-Muller Spectra

Rothaus disclosed in his pioneering paper [263] a class of binary bent functions, which J. A. Maiorana effectively improved; there are, however, no clear references for this in the literature, except perhaps [199]. The Maiorana class was extended to the multiple-valued case [185], and a matrix-oriented proof of correctness was presented in [231]. The Maiorana class contains only n-place bent functions, where n is even.


Definition 4.39. Let M = [m_{i,j}], with m_{i,j} = i · j mod p, i, j ∈ Z_p.

Lemma 4.17. Extending the concept of modular weight to matrices,

    ϕ(M) ≡ 0 mod p .    (4.57)

Proof. It is obvious that the 0-th row of M is a row of p zeros, therefore its ϕ is also 0. For the other rows, since p is a prime, GF(p) has no zero divisors, and multiplication by a fixed i ≠ 0 permutes the elements of Z_p. Hence every such row contains a permutation of the elements of Z_p. Therefore ϕ of each of these rows is given by [(p − 1) · p]/2, and this is congruent with 0 mod p, since p − 1 is even. This proves Equation (4.57).

Lemma 4.18. Let M^{[q]} denote the q-fold tensor sum of M with itself. Then

    ϕ(M^{[q]}) ≡ 0 mod p .    (4.58)

Proof. Equation (4.58) follows directly from Lemma 4.17 and Corollary 4.4.

Definition 4.40. Let Π_k denote the Kronecker product of k not necessarily different p × p elementary permutation matrices. Let n = 2q, Q = p^q, and let R denote the Reed-Muller spectrum of an n-place p-valued function. Following [231], if

    R = vec[(M^{[q]} · Π_q) ⊕ (1_Q ⊗ G^T)] ,

where vec is a vectorizing operation connecting the columns of a matrix into a single column vector and G is the value vector of a q-place random p-valued function, then R is an n-place bent Reed-Muller spectrum. It will be said that R belongs to the Maiorana class.

Lemma 4.19. If R is a bent Reed-Muller spectrum of the Maiorana class, then ϕ(R) ≡ 0 mod p.


Proof.

    ϕ(vec[(M^{[q]} · Π_q) ⊕ (1_Q ⊗ G^T)]) = ϕ(M^{[q]} · Π_q) ⊕ ϕ(1_Q ⊗ G^T) .

Since Π_q only reorders the columns of M^{[q]}, ϕ(M^{[q]} · Π_q) = ϕ(M^{[q]}), which after Lemma 4.18 is congruent with 0 mod p. With Lemma 4.12,

    ϕ(1_Q ⊗ G^T) = ϕ(1_Q) · ϕ(G^T) .

Obviously, ϕ(1_Q) ≡ 0 mod p; therefore ϕ(1_Q ⊗ G^T) ≡ 0 mod p. The assertion follows.

Consequence 4.3. If R denotes the Reed-Muller spectrum of an n-place p-valued function, where n is even, then ϕ(R) ≡ 0 mod p is a necessary condition for R to be bent and to belong to the Maiorana class.

Consequence 4.4. A necessary condition for an n-place p-valued function to have a bent Reed-Muller spectrum in the Maiorana class is f(π_{n−1}) = 0. Note that n will be even.

Recall that members of the Γ class are defined for all n > 1, whereas the members of the Maiorana class have an even n. Furthermore, in [231] it was shown that there are bent functions with n even that are not members of the Maiorana class, but are realized in the Γ class. Therefore, the Γ and the Maiorana classes are different. It is easy to see that Consequences 4.1, . . . , 4.4 may be summarized for Γ ∪ Maiorana.

Furthermore, a Maiorana-based Γ-type new class may be constructed by taking the tensor sum of Maiorana Reed-Muller spectra. It is easy


to see that the elements of this class will be n-place with n even, they will satisfy the necessary condition ϕ(R) ≡ 0 mod p, and the functions generating the spectra will satisfy the necessary condition f(π_{n−1}) = 0.

Similarly, the tensor sum of Γ-spectra and Maiorana spectra will allow the generation of new bent spectra, particularly with n odd. They will also satisfy the necessary condition ϕ(R) ≡ 0 mod p, and the functions generating the spectra will satisfy the necessary condition f(π_{n−1}) = 0.
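The matrix M of Definition 4.39 and the weight property of Lemma 4.17 can be verified directly for small primes. A minimal sketch; the function name is an illustrative choice:

```python
def maiorana_matrix(p):
    """M = [m_ij] with m_ij = i * j mod p (Definition 4.39)."""
    return [[(i * j) % p for j in range(p)] for i in range(p)]

# Lemma 4.17: ϕ(M) ≡ 0 mod p; every nonzero row is a permutation of Z_p
for p in (3, 5, 7):
    M = maiorana_matrix(p)
    assert sum(sum(row) for row in M) % p == 0
    assert all(sorted(M[i]) == list(range(p)) for i in range(1, p))
```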

4.3.4. Special Cases when p = 3

Notice that π_{n−1} = p^{n−1} + p^{n−2} + . . . + p + 1 = (p^n − 1)/(p − 1). Since in the ternary case p − 1 = 2, π_{n−1} = (3^n − 1)/2. Therefore, for all n > 1, the necessary condition for a ternary function to have a bent Reed-Muller spectrum is f((3^n − 1)/2) = 0.

Definition 4.41. Let m : GF(p)^n → GF(p)^n be a mirror mapping such that if F = [f(0), f(1), . . . , f(p^n − 2), f(p^n − 1)] then

    mF = [f(p^n − 1), f(p^n − 2), . . . , f(1), f(0)] .

It has been shown [229] that if F is the value vector of a ternary function and RM(F) is bent, then both RM(mF) and mRM(F) are bent.
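The mirror mapping of Definition 4.41 simply reverses the value vector. A minimal sketch using the first function of Table 4.4; the helper name `mirror` is illustrative:

```python
def mirror(F):
    """Mirror mapping m (Definition 4.41): reverse the value vector."""
    return list(reversed(F))

# Value vector [102 000 201]^T from Table 4.4 (a ternary bent function)
F = [1, 0, 2, 0, 0, 0, 2, 0, 1]
assert mirror(mirror(F)) == F        # m is an involution
assert mirror(F) == F                # this particular vector is a palindrome
assert F[(3**2 - 1) // 2] == 0       # necessary condition f((3^n - 1)/2) = 0
```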


The following lemmas for p = 3 and n = 2 have been proven by exhaustive search. Full tables are given in [229].

Lemma 4.20. There are 18 ternary two-place bent functions such that their Reed-Muller spectra are also bent.

Example 4.44. Table 4.4 shows three ternary bent functions and the associated bent Reed-Muller spectra.

Table 4.4. Bent Reed-Muller spectra for ternary bent functions

Ternary bent functions    Bent Reed-Muller spectra
[102 000 201]^T           [120 210 000]^T
[200 101 002]^T           [201 210 000]^T
[001 202 100]^T           [012 210 000]^T

Notice that the functions satisfy the necessary condition f(4) = 0 and, moreover, their value vectors are palindromes. Since they are bent, they may be considered to be bent Reed-Muller spectra of some other functions. Therefore they also satisfy the necessary condition ϕ(F) ≡ 0 mod 3.

Lemma 4.21. Let F denote the value vector of a non-bent ternary function such that RM(F) is bent and RM(F) = [r_0, r_1, r_2, r_3, r_4, 1, r_6, 0, 0]^T. Then, there are 18 ternary functions which satisfy the following conditions:

    RM(mF) = RM(F) ⊕ [222 111 000]^T ,
    mF = F ⊕ [202 111 101]^T .

Example 4.45. Table 4.5 shows three ternary functions which satisfy the conditions of Lemma 4.21.

Lemma 4.22. Let F denote the value vector of a non-bent ternary function such that RM(F) is bent and RM(F) = [r_0, r_1, r_2, r_3, r_4, 1, r_6, 0, 1]^T.


Table 4.5. Bent Reed-Muller spectra with the signature 100

F                  Bent RM(F)         mF                 Bent RM(mF)
[201 101 001]^T    [210 201 000]^T    [100 101 102]^T    [102 012 000]^T
[110 202 210]^T    [121 001 100]^T    [012 202 011]^T    [010 112 100]^T
[120 101 220]^T    [110 101 200]^T    [022 101 021]^T    [002 212 200]^T

Then, there are 18 ternary functions which satisfy the following conditions:

    RM(mF) = RM(F) ⊕ [120 120 120]^T ,
    mF = F ⊕ [102 000 102]^T .

Example 4.46. Table 4.6 shows three ternary functions which satisfy the conditions of Lemma 4.22.

Table 4.6. Bent Reed-Muller spectra with the signature 101

F                  Bent RM(F)         mF                 Bent RM(mF)
[210 202 210]^T    [220 001 001]^T    [012 202 012]^T    [010 121 121]^T
[021 202 021]^T    [020 101 101]^T    [120 202 120]^T    [110 221 221]^T
[201 202 000]^T    [210 111 201]^T    [000 202 102]^T    [000 201 021]^T

Lemma 4.23. Let F denote the value vector of a non-bent ternary function such that RM(F) is bent and RM(F) = [r_0, r_1, r_2, r_3, r_4, 0, r_6, 1, 0]^T. Then, there are 18 ternary functions which satisfy the following conditions:

    RM(mF) = RM(F) ⊕ [210 210 210]^T ,
    mF = F ⊕ [201 000 201]^T .

Example 4.47. Table 4.7 shows three ternary functions which satisfy the conditions of Lemma 4.23.

Table 4.7. Bent Reed-Muller spectra with the signature 010

F                  Bent RM(F)         mF                 Bent RM(mF)
[010 202 112]^T    [022 220 010]^T    [211 202 010]^T    [202 100 220]^T
[020 101 122]^T    [011 020 110]^T    [221 101 020]^T    [221 200 020]^T
[202 202 001]^T    [222 120 210]^T    [100 202 202]^T    [102 000 120]^T

Lemma 4.24. Let F denote the value vector of a non-bent ternary function such that RM(F) is bent and RM(F) = [r_0, r_1, r_2, r_3, r_4, 1, r_6, 1, 0]^T. Then, there are 18 ternary functions which satisfy the following conditions:

    RM(mF) = RM(F) ⊕ [102 021 210]^T ,
    mF = F ⊕ [100 000 002]^T .

Example 4.48. Table 4.8 shows three ternary functions which satisfy the conditions of Lemma 4.24.

Table 4.8. Bent Reed-Muller spectra with the signature 110

F                  Bent RM(F)         mF                 Bent RM(mF)
[011 202 111]^T    [001 211 010]^T    [111 202 110]^T    [100 202 220]^T
[120 101 022]^T    [110 221 110]^T    [220 101 021]^T    [212 212 020]^T
[102 101 202]^T    [120 111 210]^T    [202 101 201]^T    [222 102 120]^T

Lemma 4.25. Let F denote the value vector of a non-bent ternary function such that RM(F) is bent and RM(F) = [r_0, r_1, r_2, r_3, r_4, 2, r_6, 1, 0]^T. Then, there are 18 ternary functions which satisfy the following conditions:

    RM(mF) = RM(F) ⊕ [021 102 210]^T ,
    mF = F ⊕ [002 000 100]^T .

Example 4.49. Table 4.9 shows three ternary functions which satisfy the conditions of Lemma 4.25.


Table 4.9. Bent Reed-Muller spectra with the signature 210

F                  Bent RM(F)         mF                 Bent RM(mF)
[012 202 110]^T    [010 202 010]^T    [011 202 210]^T    [001 001 220]^T
[022 101 120]^T    [002 002 110]^T    [021 101 220]^T    [020 101 020]^T
[102 202 101]^T    [120 222 210]^T    [101 202 201]^T    [111 021 120]^T

Lemma 4.26. Let F denote the value vector of a non-bent ternary function such that RM(F) is bent and RM(F) = [r_0, r_1, r_2, r_3, r_4, 2, r_6, 1, 2]^T. Then, there are 18 ternary functions which satisfy the following conditions:

    RM(mF) = RM(F) ⊕ [120 120 120]^T ,
    mF = F ⊕ [102 000 102]^T .

Example 4.50. Table 4.10 shows three ternary functions which satisfy the conditions of Lemma 4.26.

Table 4.10. Bent Reed-Muller spectra with the signature 212

F                  Bent RM(F)         mF                 Bent RM(mF)
[000 101 201]^T    [000 102 012]^T    [102 101 000]^T    [120 222 102]^T
[110 202 212]^T    [121 022 112]^T    [212 202 011]^T    [211 112 202]^T
[022 000 121]^T    [002 122 212]^T    [121 000 220]^T    [122 212 002]^T

Lemma 4.27. Let F denote the value vector of a non-bent ternary function such that RM(F) is bent and RM(F) = [r_0, r_1, r_2, r_3, r_4, 0, r_6, 2, 1]^T. Then, there are 18 ternary functions which satisfy the following conditions:

    RM(mF) = RM(F) ⊕ [021 102 210]^T ,
    mF = F ⊕ [002 000 100]^T .

Example 4.51. Table 4.11 shows three ternary functions which satisfy the conditions of Lemma 4.27.


Table 4.11. Bent Reed-Muller spectra with the signature 021

F                  Bent RM(F)         mF                 Bent RM(mF)
[000 101 200]^T    [000 120 021]^T    [002 101 000]^T    [021 222 201]^T
[212 202 112]^T    [211 220 121]^T    [211 202 212]^T    [202 022 001]^T
[021 101 020]^T    [020 200 221]^T    [020 101 120]^T    [011 002 101]^T

4.3.5. Properties which Support Future Research

This study of the class of multiple-valued functions with bent Reed-Muller spectra shows that they exhibit quite unusual properties. A general characterization is still missing; however, for the Γ class, the Maiorana class, and their tensor sums, f(π_{n−1}) = 0 (for an n-place function, n > 1) is a necessary condition for f to have a bent Reed-Muller spectrum, and such a spectrum has a modular weight congruent with 0 mod p.

The fact that for n = 2 there are 486 ternary bent functions, which are considered as bent Reed-Muller spectra, makes it possible to study these functions/spectra by inspection and exhaustive searches. Some examples of equivalence classes in the set of two-place ternary functions with bent spectra have been given. Even though the number of bent functions is relatively small with respect to all the functions for given p and n, the number grows too fast to make further inspections and searches possible. Multiple-valued functions with bent Reed-Muller spectra in the Γ or the Maiorana classes are well characterized by the necessary conditions above and allow moving to n > 2.

To go deeper in this study would require combining the Vilenkin-Chrestenson transform in the complex plane with the Reed-Muller transform in a Galois field, since for a Reed-Muller spectrum of a multiple-valued function to be bent, its circular Vilenkin-Chrestenson spectrum must be flat, with magnitude p^{n/2} [230]. An early approach to proving the bentness of the elements of the Maiorana class was shown in [230].

Towards Future Technologies

5. Reversible Circuits

5.1. A Framework for Reversible Circuit Complexity

Mathias Soeken, Nabila Abdessaied, Rolf Drechsler

5.1.1. Investigations Relating to Reversible Functions

In this section we concern ourselves with a special class of Boolean multiple-output functions, called reversible functions, which are those functions f : B^n → B^m that are bijective, i.e., there exists a 1-to-1 mapping from the inputs to the outputs. Clearly, if f is reversible, then n = m. Boolean multiple-output functions that are not reversible are called irreversible. Reversible functions can be implemented in terms of reversible circuits. Reversible functions and circuits play an important role in quantum computation [24] and low-power computing [37].

Much research has investigated the complexity of Boolean functions and Boolean circuits in the past [356, 365]; however, no thorough considerations have been made for reversible functions and circuits so far. Recently, the complexity of synthesis [65] and of equivalence checking [155] have been investigated individually. Based on bounds for the number of gates in reversible circuits, in this section we propose a general framework that is helpful for the analysis of reversible circuit complexity. The first important bound is a linear upper bound with respect to single-target gates [90], the second one is an exponential


lower bound with respect to Toffoli gates [204]. Single-target gates are convenient as a model for the analysis of reversible functions as well as for the description of synthesis algorithms [1, 90], whereas Toffoli gates have been used in practical implementations [94]. Single-target gates can be mapped to a cascade of Toffoli gates using Exclusive-OR Sums Of Products (ESOP) mapping.

This section is structured as follows. We first review ESOP mapping and reversible circuits. In Subsections 5.1.4–5.1.6, we prove the bounds discussed above. Subsection 5.1.7 describes the framework for complexity analysis, and Subsection 5.1.8 illustrates an application of the framework to a better-than-optimal embedding strategy.

5.1.2. Reversible Boolean Functions

Boolean Functions

Let B = {0, 1} denote the Boolean values. Then we refer to B_{n,m} = {f | f : B^n → B^m} as the set of all Boolean multiple-output functions with n inputs and m outputs. There are 2^{m·2^n} such Boolean functions. We write B_n = B_{n,1} and assume that each f ∈ B_n is represented by a propositional formula over the variables x_1, . . . , x_n. Furthermore, we assume that each function f ∈ B_{n,m} is represented as a tuple f = (f_1, . . . , f_m), where f_i ∈ B_n for each i ∈ {1, . . . , m}, and hence f(x) = (f_1(x), . . . , f_m(x)) for each x ∈ B^n.

The on-set of a Boolean function is the set of all assignments that evaluate to true, i.e., on(f) = {x ∈ B^n | f(x) = 1}. Similarly, the off-set is the set of all assignments that evaluate to false, i.e., off(f) = {x ∈ B^n | f(x) = 0}.
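The on-set and off-set can be enumerated in a few lines. A minimal sketch; the helper name `on_set` and the example function are illustrative:

```python
from itertools import product

def on_set(f, n):
    """All assignments x in B^n with f(x) = 1."""
    return {x for x in product((0, 1), repeat=n) if f(*x)}

# Example: f(x1, x2) = x1 AND x2
f = lambda x1, x2: x1 & x2
assert on_set(f, 2) == {(1, 1)}
# The off-set is the complement; together they cover all 2^n assignments
off = {x for x in product((0, 1), repeat=2) if not f(*x)}
assert len(on_set(f, 2)) + len(off) == 2 ** 2
```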


Exclusive Sum of Products

Exclusive-OR Sums Of Products (ESOP) [275] are two-level descriptions for Boolean functions in which a function is composed of k product terms that are combined using the exclusive-OR (EXOR, ⊕) operation. A product term is the conjunction of l_i literals, where a literal is either a propositional variable x^1 = x or its negation x^0 = x̄. ESOPs are the most general form of two-level AND-EXOR expressions:

    f = ⊕_{i=1}^{k} x_{i_1}^{p_{i_1}} ∧ · · · ∧ x_{i_{l_i}}^{p_{i_{l_i}}} .    (5.1)

Several restricted subclasses have been considered in the past, e.g., Positive Polarity Reed-Muller (PPRM) expressions [275] in which all literals are positive. There are further subclasses, and most of them can be defined based on applying the following decomposition rules. An arbitrary Boolean function f(x_1, x_2, . . . , x_n) can be expanded as

    f = x̄_i f_{x̄_i} ⊕ x_i f_{x_i}             (Shannon)
    f = f_{x̄_i} ⊕ x_i (f_{x̄_i} ⊕ f_{x_i})     (positive Davio)
    f = f_{x_i} ⊕ x̄_i (f_{x̄_i} ⊕ f_{x_i})     (negative Davio)

with co-factors

    f_{x̄_i} = f(x_1, . . . , x_{i−1}, 0, x_{i+1}, . . . , x_n) ,    (5.2)
    f_{x_i} = f(x_1, . . . , x_{i−1}, 1, x_{i+1}, . . . , x_n) .    (5.3)
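Applying the positive Davio expansion recursively over all variables yields the PPRM coefficients of a function; in butterfly form this is the well-known fast Reed-Muller (Möbius) transform over GF(2). A minimal sketch; the function name and variable order are illustrative assumptions:

```python
def pprm(truth):
    """PPRM (Reed-Muller) coefficients of a truth table over GF(2),
    computed via the butterfly form of the positive Davio expansion:
    in each stage the upper co-factor is XORed into the lower one."""
    t = list(truth)
    step = 1
    while step < len(t):
        for i in range(len(t)):
            if i & step:
                t[i] ^= t[i ^ step]  # f_x <- f_x XOR f_x-bar
        step <<= 1
    return t

# OR of two variables has truth table [0, 1, 1, 1]; its PPRM is
# a XOR b XOR ab, i.e., coefficient vector [0, 1, 1, 1]
assert pprm([0, 1, 1, 1]) == [0, 1, 1, 1]
# AND has the single product term ab
assert pprm([0, 0, 0, 1]) == [0, 0, 0, 1]
```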

Reversible Circuits

Reversible functions can be realized by reversible circuits that consist of at least n lines and are constructed as cascades of reversible gates that belong to a certain gate library. The most common gate library consists of Toffoli gates or single-target gates.

Definition 5.42 (Reversible Single-target Gate). Given a set of variables X = {x_1, . . . , x_n}, a reversible single-target gate T_g(C, t) with control lines C = {x_{i_1}, . . . , x_{i_k}} ⊂ X, a target line t ∈ X \ C, and a control function g ∈ B_k inverts the variable on the target line, if and


only if g(x_{i_1}, . . . , x_{i_k}) evaluates to true. All other variables remain unchanged. If the definition of g is obvious from the context, it can be omitted from the notation T_g.

Definition 5.43 (Toffoli Gate). Mixed-Polarity Multiple-Controlled Toffoli (MPMCT) gates are a subset of the single-target gates in which the control function g can be represented by g = 1 or by a single product term. Multiple-Controlled Toffoli (MCT) gates in turn are the subset of MPMCT gates in which the product term may only consist of positive literals. The NOT, CNOT, and Toffoli (NCT) library further restricts gates to have at most two control lines.

Using synthesis algorithms it can easily be shown that any reversible function f ∈ B_{n,n} can be realized by a reversible circuit with n lines when using MCT gates. That is, it is not necessary to add any temporary lines (ancillae) to realize the circuit. Ancillae can, however, become necessary if the MCT (or MPMCT) gates are restricted to a given size, e.g., three bits. Note that each single-target gate can be expressed in terms of a cascade of MPMCT or MCT gates, which can be obtained from an ESOP or Positive Polarity Reed-Muller (PPRM) expression [275], respectively.

For drawing circuits, we follow the established conventions of using the symbol ⊕ to denote the target line, solid black circles to indicate positive controls, and white circles to indicate negated controls.

[Figure 5.1 shows three gates on lines x_1, . . . , x_n: (a) a Toffoli gate computing x_n ⊕ (x_1 ∧ · · · ∧ x_{n−1}), (b) a mixed-polarity Toffoli gate computing x_n ⊕ (x̄_1 ∧ · · · ∧ x_{n−1}), and (c) a single-target gate computing x_n ⊕ g(x_1, x_2, . . . , x_{n−1}).]

Figure 5.1. Reversible gates: (a) Toffoli gate with positive controls, (b) Toffoli gate with mixed-polarity controls, (c) single-target gate.

Example 5.52. Figure 5.1 (a) shows a Toffoli gate with n − 1 positive controls, Figure 5.1 (b) shows a Toffoli gate with mixed-polarity control lines, and Figure 5.1 (c) shows the diagrammatic representation of a single-target gate based on Feynman's notation. Figure 5.2 shows different Toffoli gates in a cascade forming a reversible circuit. The annotated values demonstrate the computation of the gates for a given input assignment.

[Figure 5.2 shows a cascade of four gates g1, g2, g3, g4 that maps the input (x1, x2, x3) = (0, 1, 0) via the intermediate values (1, 1, 0), (1, 1, 1), and (1, 0, 1) to the output (y1, y2, y3) = (1, 1, 1).]

Figure 5.2. Reversible circuit.
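The gate semantics of Definitions 5.42 and 5.43 can be simulated in a few lines. The following sketch applies a hypothetical two-gate MCT cascade; the gates are illustrative and are not those of Figure 5.2:

```python
def apply_mct(bits, controls, target):
    """Apply a multiple-controlled Toffoli gate in place:
    flip `target` iff all `controls` (line indices) carry 1."""
    if all(bits[c] for c in controls):
        bits[target] ^= 1
    return bits

bits = [0, 1, 0]
apply_mct(bits, [1], 0)      # CNOT: flips line 0, since line 1 is 1
apply_mct(bits, [0, 1], 2)   # Toffoli: flips line 2, since lines 0 and 1 are 1
assert bits == [1, 1, 1]
```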

5.1.3. Function Classes and Symmetric Groups

A variety of function classes have been found in the past that characterize different properties of Boolean functions. These are particularly convenient for complexity analysis, since good complexity results may be found for specific function classes. This has an impact on practical applications. Unfortunately, almost no function classes are defined for multiple-output functions; however, multiple-output functions can be transformed into Boolean functions using the characteristic function construction. Given a multiple-output function f = (f_1, . . . , f_m) over variables x_1, . . . , x_n, its characteristic function χ_f ∈ B_{n+m} is defined as

    χ_f(x_1, . . . , x_n, y_1, . . . , y_m) = ∧_{i=1}^{m} (ȳ_i ⊕ f_i(x_1, . . . , x_n)) .    (5.4)

Now we can introduce several classes of Boolean functions, keeping in mind that they can be applied to reversible functions via the characteristic function. A function f ∈ B_n is called self-dual, if

    f(x_1, . . . , x_n) = f̄(x̄_1, . . . , x̄_n) .    (5.5)
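Equation (5.4) says that χ_f is true exactly on the pairs (x, y) with y = f(x), so its minterms can be enumerated directly. A minimal sketch, using CNOT as an illustrative reversible function; the helper name is hypothetical:

```python
from itertools import product

def characteristic_truths(f, n):
    """Minterms of the characteristic function (Eq. (5.4)):
    all pairs (x, y) with y = f(x)."""
    return {(x, f(x)) for x in product((0, 1), repeat=n)}

# Illustrative reversible function: CNOT, (x1, x2) -> (x1, x1 XOR x2)
cnot = lambda x: (x[0], x[0] ^ x[1])
chi = characteristic_truths(cnot, 2)
assert len(chi) == 2 ** 2          # exactly one satisfying y per x
assert ((1, 1), (1, 0)) in chi     # f(1, 1) = (1, 0)
```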

Lemma 5.28. Let χ_f ∈ B_{2n} be the characteristic function of a reversible function f. Then χ_f is not self-dual.


Proof. Let F = χ_f. If F is a self-dual function, it must be balanced, i.e., |on(F)| = |off(F)|, because

    |on(F̄(x̄_1, . . . , x̄_n, ȳ_1, . . . , ȳ_n))| = |on(F(x_1, . . . , x_n, y_1, . . . , y_n))|

and

    |on(F̄)| = 2^{2n} − |on(F)| .

Since |on(F)| = 2^n, F can only be balanced if n = 1, and there are two reversible functions on one variable, i.e., inversion (F_1 = x_1 ⊕ y_1) and identity (F_2 = x̄_1 ⊕ y_1). Both F_1 and F_2 are not self-dual.

A function f ∈ B_n is monotone, if

    x ⊆ y ⇒ f(x) ≤ f(y) ,    (5.6)

where x = x_1, . . . , x_n, y = y_1, . . . , y_n, and x ⊆ y if and only if x_i ≤ y_i for all 1 ≤ i ≤ n.

Lemma 5.29. Let χ_f ∈ B_{2n} be the characteristic function of a reversible function f. Then χ_f is not monotone.

Proof. We first observe that in the bit-string truth table representation of χ_f there cannot be more than two successive 1s, since for each x ∈ B^n there is exactly one y such that χ_f(x, y) = 1. The truth table can be partitioned into 2^n blocks, one for each assignment to x, each of size 2^n, in which only one bit is set to 1. The only crucial case that needs to be checked is n = 1, but neither x_1 ⊕ y_1 nor x̄_1 ⊕ y_1 is monotone.

As can be seen, Boolean function classes and the characteristic function construction offer an elegant way to analyze reversible functions. However, it also turns out that some of the most interesting function classes, such as self-dual and monotone functions, are not relevant for reversible functions. With the same arguments it can be seen that characteristic functions of reversible functions cannot be symmetric, unless n = 1, for which both of them are symmetric. There are still many other interesting function classes to explore and relate to reversible logic, e.g., Horn functions, Krom functions, bent functions, and canalizing functions. Yet even more interesting is the


chance to relate Boolean functions to symmetric groups, as is explained in the remainder of this subsection.

Every reversible function f ∈ B_{n,n} can be represented by an element π_f of the symmetric group S_{2^n}, i.e., a permutation over the elements {0, . . . , 2^n − 1}. We have π_f(x) = y whenever f(x_1, . . . , x_n) = (y_1, . . . , y_n), with x and y being the natural-number representations of the bits x_1, . . . , x_n and y_1, . . . , y_n, respectively. This duality has been used for reversible logic synthesis in the last decade [89, 295], but it has also served as a theoretical foundation for reversible logic analysis [303, 334]. We consider 0 to be the smallest element in a permutation and make use of the standard notation of writing explicit permutations using square brackets, e.g., π = [0, 1, 3, 2], and parentheses for the cycle representation, e.g., π = (2, 3).

Using the characteristic function construction one can relate Boolean functions to elements of the symmetric group, since we also have π_f(x) = y for π_f ∈ S_{2^n} if and only if χ_f(x_1, . . . , x_n, y_1, . . . , y_n) = 1 for χ_f ∈ B_{2n}. One interesting property of permutations is self-inversion. A permutation is self-inverse, also called an involution, if it consists only of cycles of length 2 and fixed points. The characteristic function χ_f is self-inverse, if

    χ_f(x_1, . . . , x_n, y_1, . . . , y_n) = χ_f(y_1, . . . , y_n, x_1, . . . , x_n) .

This is an important correlation and can be exploited when implementing algorithms. The self-inversion check may be faster to implement using BDDs for characteristic functions than for the permutation representation.
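The permutation representation is straightforward to compute. A minimal sketch using CNOT as an illustrative reversible function, which happens to yield exactly the permutation π = [0, 1, 3, 2] quoted above; the helper names are hypothetical:

```python
def to_permutation(f, n):
    """Permutation pi_f in S_{2^n} of a reversible f : B^n -> B^n,
    via the natural-number encodings of the bit vectors (MSB first)."""
    def encode(bits):
        return int("".join(map(str, bits)), 2)
    def decode(x):
        return tuple((x >> (n - 1 - i)) & 1 for i in range(n))
    return [encode(f(decode(x))) for x in range(2 ** n)]

cnot = lambda x: (x[0], x[0] ^ x[1])       # (x1, x2) -> (x1, x1 XOR x2)
perm = to_permutation(cnot, 2)
assert perm == [0, 1, 3, 2]                # square-bracket notation
# CNOT is self-inverse (an involution): applying pi twice gives the identity
assert [perm[p] for p in perm] == [0, 1, 2, 3]
```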

5.1.4. Upper Bounds for Single-Target Gate Circuits

Theorem 5.14. Let f ∈ B_{n,n} be reversible and x a variable in f. Then, f can be decomposed as f = g_2 ◦ f′ ◦ g_1 into three functions g_1, f′, g_2 ∈ B_{n,n} such that f′ is a reversible function that does not change x, and g_1 and g_2 can each be realized as a single-target gate that acts on x.


[Figure 5.3 shows a circuit of 2n − 1 single-target gates with control functions g_{2n−1}, . . . , g_2, g_1 on lines x_1, . . . , x_n; their target lines are aligned on a V-shape.]

Figure 5.3. Synthesis based on Young subgroups.

Proof. Reversible functions on n variables are isomorphic to the symmetric group S_{2^n}. Consequently, f corresponds to an element a ∈ S_{2^n}. The element a can be decomposed as a = h_2 v h_1, where both h_1 and h_2 are members of the Young subgroup S_{2^{n−1}} × S_{2^{n−1}}, and v is a member of S_2^{2^{n−1}} [353]. From h_1, h_2, and v one can derive g_1, g_2, and f′ [90].

Corollary 5.7. Each reversible function f ∈ B_{n,n} can be implemented as a reversible circuit with at most 2n − 1 single-target gates.

Proof. When applying Theorem 5.14 to all variables in an iterative manner, f′ will be the identity function after at most n steps, and at most 2n gates have been collected. Since f′ is the identity function in the last step, the last two gates can be combined into one single-target gate.

A truth-table-based algorithm that makes use of the results of Theorem 5.14 and Corollary 5.7 has been presented in [90]. Since the variables are selected in decreasing order, the target lines of the resulting single-target gates are aligned on a V-shape (see Figure 5.3).

5.1.5. Upper Bounds for Toffoli Gate Circuits

Upper Bounds Based on MCT Gates

Tighter upper bounds for reversible circuits can be obtained by combining the synthesis approach outlined above using MCT gates and


upper bounds for the size of PPRM expressions.

Theorem 5.15. An n-bit single-target gate can be realized with at most u(n) = (n − 1)2^{n+1} NCT gates.

Proof. An n-bit single-target gate can be realized with at most u_MCT(n) = 2^{n−1} MCT gates. This follows from the PPRM representation, which is canonical for a given function when disregarding the order of product terms. Hence, there exists a control function g : B^{n−1} → B whose PPRM expression consists of all 2^{n−1} product terms, and therefore

    u_MCT(n) = 2^{n−1} = ∑_{i=0}^{n−1} C(n−1, i)    (5.7)

with the binomial coefficient C(n−1, i) being the total number of product terms that have i literals, or in other words, the total number of Toffoli gates that have i controls.

Let us now consider the number of gates after mapping to the NCT gate library. In [24], it has been shown that each MCT gate with i controls and i ≥ 5 can be decomposed into 8(i − 3) Toffoli gates with two controls. By over-approximating the gates with fewer than 5 control lines, this first yields

    u(n) ≤ ∑_{i=0}^{n−1} 8(i − 3) C(n−1, i) .

Using the identity

    ∑_{i=0}^{n} i C(n, i) = n 2^{n−1}

we obtain

    u(n) ≤ 8 ∑_{i=0}^{n−1} i C(n−1, i) = 8(n − 1)2^{n−2} = (n − 1)2^{n+1} .
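The binomial identity used in the proof, and the closing equality 8(n − 1)2^{n−2} = (n − 1)2^{n+1}, can be checked numerically; a small sketch (the range of n is arbitrary):

```python
from math import comb

for n in range(2, 12):
    # identity: sum_i i*C(n, i) = n * 2^(n-1)
    assert sum(i * comb(n, i) for i in range(n + 1)) == n * 2 ** (n - 1)
    # hence 8 * sum_i i*C(n-1, i) = 8*(n-1)*2^(n-2) = (n-1)*2^(n+1)
    assert 8 * sum(i * comb(n - 1, i) for i in range(n)) == (n - 1) * 2 ** (n + 1)
```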


Upper Bounds Based on MPMCT Gates

Tighter upper bounds for reversible circuits can be obtained by combining the synthesis approach outlined above using MPMCT gates and upper bounds for the size of ESOP expressions.

Theorem 5.16. An n-bit single-target gate can be realized with at most u(n) = 29(n − 4)2^{n−5} NCT gates, if n ≥ 8.

Proof. The best known upper bound on the number of product terms in a minimum ESOP form for an n-variable Boolean function is

    29 · 2^{n−7}   with   n ≥ 7 .    (5.8)

This upper bound was presented in [121]. Hence, the ESOP expression of the control function g : B^{n−1} → B consists of at most 29 · 2^{n−8} product terms. We assume in the worst case that each product term has n − 1 literals, i.e., each MPMCT gate has n − 1 controls. After mapping to Toffoli gates, each such gate yields 8(i − 3) Toffoli gates, with i being the number of controls, i.e., 8(n − 4) Toffoli gates with two controls. Hence

    u(n) = 8(n − 4) · 29 · 2^{n−8} = 29(n − 4)2^{n−5} .

5.1.6. Lower Bounds for Toffoli Gate Circuits

Reversible circuits with n inputs that consist of only one MCT gate can represent n · 2^{n−1} reversible functions: there are n possible positions for the target line, and for each of the remaining n − 1 positions one can either put or not put a control line. With two Toffoli gates one can represent at most (n · 2^{n−1})(n · 2^{n−1}) functions. The actual number is smaller, since for some circuits the order of gates does not matter, or both gates are equal, which corresponds to the empty circuit. In general, one can represent at most (n · 2^{n−1})^k reversible functions with a circuit that


consists of k Toffoli gates. Since there are 2^n! reversible functions, one can derive that there is at least one function that requires

    ⌈ log(2^n!) / log(n · 2^{n−1}) ⌉    (5.9)

gates.

Theorem 5.17. There exists a reversible function for which the smallest circuit realization requires an exponential number of Toffoli gates.

Proof. We show that there exists a constant c such that

⌈ log(2^n !) / log(n · 2^{n−1}) ⌉ ≥ c · 2^n .

We have

⌈ log(2^n !) / log(n · 2^{n−1}) ⌉ ≥ log(2^n !) / log(n · 2^{n−1}) = log₂(2^n !) / log₂(n · 2^{n−1}) ≥ c · 2^n ,

which can be rewritten to

log₂(2^n !) ≥ c · 2^n log₂(n · 2^{n−1}) = c · 2^n (log₂ n + (n − 1)) .

Since log₂ n + (n − 1) < 2n, we are left to prove that log₂(2^n !) ≥ c′ · 2^n n for some constant c′, which we do using induction on n. From the base case we obtain c′ = 1/2. Assume for some n we have log₂(2^n !) ≥ (1/2) · 2^n n; then in the induction step we get

log₂(2^{n+1} !) = Σ_{k=1}^{2^{n+1}} log₂ k
              = Σ_{k=1}^{2^n} log₂ k + Σ_{k=1}^{2^n} log₂(k + 2^n)
              = log₂(2^n !) + Σ_{k=1}^{2^n} log₂(k + 2^n) .   (5.10)


We will now derive a lower bound for the second term in the last expression of (5.10). We have

Σ_{k=1}^{2^n} (log₂(k + 2^n) − log₂ k) = Σ_{k=1}^{2^n} log₂((k + 2^n)/k)
                                      = Σ_{k=1}^{2^n} log₂(1 + 2^n/k)
                                      ≥ Σ_{k=1}^{2^n} 1 = 2^n ,

from which we derive

Σ_{k=1}^{2^n} log₂(k + 2^n) ≥ Σ_{k=1}^{2^n} log₂ k + 2^n = log₂(2^n !) + 2^n .

Plugging this into (5.10) we get

log₂(2^{n+1} !) ≥ log₂(2^n !) + log₂(2^n !) + 2^n
              ≥ 2 · ((1/2) · 2^n n) + 2^n
              = (1/2) · 2^{n+1} (n + 1) .

The proof for Theorem 5.17 in [204] uses m-EXOR gates as the underlying gate library, which generalize MCT gates to have more than one target line. The proof can be carried out analogously for MPMCT gates.
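The lower bound (5.9) and the induction invariant can be evaluated numerically; the sketch below is our own illustration (using `math.lgamma` to compute log₂ of the large factorials):

```python
import math

# Evaluate the lower bound (5.9), ceil(log(2^n !) / log(n * 2^(n-1))),
# and check the induction claim log2(2^n !) >= (1/2) * 2^n * n.

def log2_factorial(m):
    # log2(m!) via the log-gamma function: lgamma(m+1) / ln 2
    return math.lgamma(m + 1) / math.log(2)

def toffoli_lower_bound(n):
    return math.ceil(log2_factorial(2 ** n) / math.log2(n * 2 ** (n - 1)))

# induction invariant from the proof (c' = 1/2), tight at the base case n = 1
for n in range(1, 15):
    assert log2_factorial(2 ** n) >= 0.5 * 2 ** n * n - 1e-6

# the bound grows roughly like 2^n: successive values nearly double
assert toffoli_lower_bound(10) > 1.8 * toffoli_lower_bound(9)
```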

5.1.7. Framework for Complexity Analysis Both bounds that have been presented in the previous subsection are used to motivate a framework for reversible circuit complexity analysis that is illustrated in Figure 5.4. If only single-target gates are considered, one knows that at most linearly many gates are required. When being restricted to Toffoli gates, there are functions that need at least an exponential number of gates. Since single-target gates


Figure 5.4. Reversible circuit complexity: a circuit of single-target gates (linear complexity) is mapped onto a circuit of Toffoli gates via cascades of Toffoli gates obtained from ESOP expressions (exponential complexity).

can be translated to cascades of Toffoli gates, there must be control functions which require exponentially many product terms when being decomposed into ESOP expressions. However, when restricting the ESOP mapping, interesting cases arise. Let us first allow only those single-target gates whose control functions can be mapped to ESOP expressions with a constant number of product terms, e.g., 1 or 2. Then the resulting Toffoli circuits are of linear size. The most interesting and important question is how to determine which class of reversible functions can be represented by these circuits when applying this restriction. We found this to be a difficult research problem during our investigations of this topic. The same idea can be extended to other cases. If, e.g., we allow those single-target gates whose control functions can be mapped to linear-size ESOP expressions, one obtains Toffoli circuits of quadratic size.
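The ESOP-to-Toffoli mapping underlying this framework can be sketched as follows; this is our own illustration (names are ours, and for simplicity only positive-polarity product terms are assumed):

```python
# A single-target gate whose control function is given as an ESOP list of
# cubes becomes one multiple-controlled Toffoli gate per cube: constant-size
# ESOPs give constantly many gates, linear-size ESOPs give linearly many.

def apply_mct(x, controls, target):
    """Apply one MCT gate to the bit vector x (encoded as an integer)."""
    if all((x >> c) & 1 for c in controls):
        x ^= 1 << target
    return x

def single_target_to_cascade(cubes, target):
    """Each ESOP cube (a tuple of control-line indices) yields one gate."""
    return [(cube, target) for cube in cubes]

# control function g(x0, x1) = x0 XOR x0*x1 on lines 0 and 1, target line 2
cubes = [(0,), (0, 1)]
gates = single_target_to_cascade(cubes, 2)

for x in range(8):
    y = x
    for controls, target in gates:
        y = apply_mct(y, controls, target)
    g = ((x >> 0) & 1) ^ (((x >> 0) & 1) & ((x >> 1) & 1))
    assert y == x ^ (g << 2)  # the cascade realizes the single-target gate
```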

5.1.8. Application: “Better than Optimal” Embedding In this subsection we consider half of the V-shaped circuit that has been considered in the previous subsection. That is, we consider reversible circuits with n variables and n single-target gates that have their target lines in subsequent order from the top line to the bottom line. Also, no two single-target gates have their target line in common. It can easily be seen that at most (2^{2^{n−1}})^n different reversible


functions can be realized with such circuits. We will now show that in fact exactly (2^{2^{n−1}})^n different functions can be realized, which implies that these circuits are a canonical representation for this subset of reversible functions.

Lemma 5.30. Reversible circuits with k ≥ n lines that have n single-target gates with pairwise different target lines in increasing order on lines 1 to n can realize exactly (2^{2^{k−1}})^n reversible functions.

Proof. We prove this using induction on n. The base case is simple: since each single-target gate realizes a different function and the target line is fixed on the first line, there are 2^{2^{k−1}} possibilities for choosing the control function. Assuming the claim holds for all circuits with up to n gates, we consider a circuit that has n + 1 gates. The sub-circuit C′ consisting of the first n gates realizes (2^{2^{k−1}})^n functions due to the induction hypothesis. Since the (n + 1)-th gate has its target line on a line that has not been used as a target line in C′, and since there are no two gates that realize the same function, the statement follows.

Corollary 5.8. Reversible circuits with n lines that have n single-target gates with their targets on increasing lines (from the top to the bottom) can realize exactly (2^{2^{n−1}})^n reversible functions.

There are also (2^{2^{n−1}})^n Boolean multiple-output functions in B_{n−1,n}. When realizing these functions as reversible circuits one needs to embed them first into reversible functions, which will have up to 2n − 1 variables. The additional variables are required to ensure that the function is bijective. However, since there is a 1-to-1 correspondence between the half V-shaped circuits on n lines and the functions in B_{n−1,n}, a better than optimal embedding is possible, because in the conventional case there are functions that require at least 2n − 1 lines (e.g., the constant functions). One only needs to define a mapping function from the multiple-output function to the reversible one, as well as an interpretation function for the computed outputs. Based on Lemma 5.30 this embedding technique can be extended to functions in B_{k−1,n} with k ≥ n.
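Corollary 5.8 can be verified exhaustively for small cases; the following brute-force sketch is our own illustration (the encoding of control functions as truth-table tuples is ours):

```python
from itertools import product

# Check for n = 2 and n = 3 that n single-target gates with targets on
# lines 0..n-1 realize exactly (2^(2^(n-1)))^n pairwise different
# reversible functions, i.e., that half V-shaped circuits are canonical.

def half_v_permutation(n, controls):
    """controls[t] is the truth table (length 2^(n-1)) of gate t's control
    function over the other n-1 lines; targets run from line 0 to n-1."""
    perm = []
    for x in range(2 ** n):
        y = x
        for t in range(n):
            rest = [(y >> i) & 1 for i in range(n) if i != t]
            idx = sum(b << j for j, b in enumerate(rest))
            y ^= controls[t][idx] << t
        perm.append(y)
    return tuple(perm)

for n in (2, 3):
    tables = list(product((0, 1), repeat=2 ** (n - 1)))
    perms = {half_v_permutation(n, c) for c in product(tables, repeat=n)}
    assert len(perms) == (2 ** 2 ** (n - 1)) ** n  # all circuits distinct
```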


5.1.9. Open Problems and Future Work We have motivated a framework for the analysis of the complexity of reversible circuits based on two bounds. As a first application of this framework we have presented an idea for a better than optimal embedding. Research in this area is still in its infancy; however, the discussions started in this section provide a starting point for tackling the open problems. One direction for future work is to find a way to derive function classes from a subset of reversible circuits. It is also open whether these will be function classes already known from the literature or whether new ones for the special case of reversible functions need to be defined.


5.2. Gate Count Minimal Reversible Circuits

Jerzy Jegier

Paweł Kerntopf

5.2.1. A Need for New Reversible Benchmarks During the last decade the field of reversible circuit synthesis has been intensively studied and many synthesis algorithms have been developed. However, it is difficult to evaluate the quality of the proposed algorithms, especially for larger reversible functions. Some sequences of reversible functions of an arbitrary number of variables have been proposed as benchmarks, but no minimal circuits are known for them. In this section, we present infinite sequences of functions of an arbitrary number of variables for which we have constructed gate count minimal circuits, and we present proofs of their minimality.

A gate (circuit) is called reversible if there is a one-to-one correspondence between its input and output signals. Research on reversible logic circuits is motivated by advances in quantum computing, nanotechnology and low-power design; therefore, reversible logic synthesis has been intensively studied recently [82, 268]. Attention is focused mainly on the synthesis of circuits built from NOT, CNOT, Toffoli, and generalized Toffoli gates. The cost of a reversible circuit is usually estimated by the gate count (GC) or by a metric called Quantum Cost (QC).

Synthesis of reversible circuits differs substantially from classical logic synthesis. Finding reversible circuits that have the minimal cost for any reversible function is very difficult. Many algorithms have been proposed. However, satisfactory reversible logic synthesis algorithms for arbitrary libraries of gates and arbitrary cost functions have not yet been created. Also, developing techniques for local optimization of synthesized circuits [204, 304] has not led to the problem being solved, because most circuits cannot be reduced by local optimization alone [251]. In addition, NOT, CNOT, and Toffoli (NCT) library synthesis techniques cannot always find the optimal circuits even for small numbers of inputs and outputs. As a result, authors of papers compare the results of synthesis for all three-variable reversible functions, propose their own benchmarks, apply benchmark suites consisting of irreversible functions developed for classical logic circuits, or use randomly generated reversible functions as benchmarks.

The search for better synthesis algorithms capable of finding minimal circuits requires appropriate tests to evaluate their quality. Unfortunately, few provably minimal cost benchmarks have been proposed [203, 269, 369]. Recently, tools capable of synthesizing gate count minimal circuits for four-variable reversible functions have been developed [131]. However, very few optimal circuits have been found for n-variable functions with n > 4, and they are all for short circuits [135]. Also, no proofs have been published on the minimality of individual reversible circuits besides the use of exhaustive calculations. A few sequences of reversible functions of an arbitrary number of variables, e.g., hwb_n and nth_prime_inc, have been proposed as benchmarks, but these functions are quite complex and no minimal gate count or minimal quantum cost circuits are known for them for n > 4. Thus, developing methods of constructing functions with known minimal circuits is needed. The first approaches to constructing sequences of reversible functions were proposed by us in [153], [152]. In this section, we present two infinite sequences of simple self-inverse functions of any number of variables for which we have constructed gate count minimal circuits. Our main aim consists in proving their minimality. The proofs in [153] and here are the first such proofs in the literature.

5.2.2. Preliminaries Below, basic definitions of reversible Boolean functions and their properties are provided.

Definition 5.44. A mapping F : {0, 1}^n → {0, 1}^p, for some positive integers n and p, p ≤ n, is called an (n, p) Boolean function. The Boolean functions f1, f2, . . . , fp defined at every x ∈ {0, 1}^n by F(x) = (f1(x), f2(x), . . . , fp(x)), where x = (x1, x2, . . . , xn), are called the coordinate functions of F.


We shall also write in short (y1, y2, . . . , yp) = F(x1, x2, . . . , xn). Let us denote the sequence of all vectors belonging to {0, 1}^n arranged in lexicographical order by

v0 = (0, . . . , 0, 0), v1 = (0, . . . , 0, 1), . . . , v_{2^n−1} = (1, . . . , 1, 1) .   (5.11)

They will also be written as vi = (v_{i,1}, v_{i,2}, . . . , v_{i,n}), i ∈ {0, . . . , 2^n − 1}.

Definition 5.45. The binary array with 2^n rows and p columns defined by

((f1(v0), f1(v1), . . . , f1(v_{2^n−1})), . . . , (fp(v0), fp(v1), . . . , fp(v_{2^n−1})))   (5.12)

is called the truth table of an (n, p) Boolean function F.

Definition 5.46. An (n, p) function F is called balanced if every vector belonging to {0, 1}^p appears 2^{n−p} times in its truth table. Otherwise, it is called unbalanced.

Of course, in any balanced (n, p) function all (n, 1) coordinate functions are balanced. However, an (n, p) Boolean function with all (n, 1) coordinate functions being balanced is not necessarily a balanced function.

Definition 5.47. A balanced (n, n) function F : {0, 1}^n → {0, 1}^n is called reversible.

In the literature a reversible function is also defined in an equivalent way as a bijective function. Each reversible Boolean function can be considered as a permutation on the set {0, . . . , 2^n − 1}. It is well known that every permutation can be considered as a collection of disjoint cycles. A cycle will be written in the form ⟨a1, a2, . . . , ak⟩, meaning that a1 is mapped onto a2, a2 is mapped onto a3, . . . , and ak is mapped onto a1. The number k is the length of the cycle. To specify a cycle we shall use the decimal equivalents of binary vectors as the elements of the cycles.
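Definitions 5.46–5.48 translate directly into small predicates; the helpers below are our own illustration (names and encodings are ours):

```python
from collections import Counter

# Balancedness of an (n, p) function given as a list of 2^n output tuples,
# plus the reversibility and self-inverse tests for (n, n) functions
# represented as permutations of {0, ..., 2^n - 1}.

def is_balanced(outputs, p):
    n = len(outputs).bit_length() - 1      # 2^n rows
    counts = Counter(outputs)
    return len(counts) == 2 ** p and all(
        counts[v] == 2 ** (n - p) for v in counts)

def is_reversible(perm):
    return sorted(perm) == list(range(len(perm)))

def is_self_inverse(perm):
    return all(perm[perm[x]] == x for x in range(len(perm)))

swap_0_7 = [7, 1, 2, 3, 4, 5, 6, 0]        # the transposition <0, 7> on 3 bits
assert is_reversible(swap_0_7) and is_self_inverse(swap_0_7)

xor_out = [((x & 1) ^ ((x >> 1) & 1),) for x in range(4)]  # balanced (2, 1)
assert is_balanced(xor_out, 1)
```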


Definition 5.48. A Boolean (n, n) reversible function F is called self-inverse if and only if F^{−1} = F.

It is easy to note that every permutation corresponding to a self-inverse reversible function has only cycles of length 1 or 2 (a cycle of length 2 is called a transposition or two-cycle). Thus it can be described in a simple way, for example as a list of its two-cycles. This specification will be used further in the text.

Definition 5.49. Let fk(x1, x2, . . . , xn) denote a coordinate function of an (n, p) function F. The Hamming measure H(fk) is the Hamming distance between the column fk (the k-th output) in the truth table of F and the column corresponding to xk (the k-th input).

The Hamming measure of any coordinate function of a balanced (n, p) function F is an even number [103].

Definition 5.50. The Vectorial Hamming Measure (VHM) of a reversible (n, n) Boolean function F(x) = (f1(x), f2(x), . . . , fn(x)) is the vector of the Hamming measures of the coordinate functions of F, i.e., H(f1, . . . , fn) = (H(f1), H(f2), . . . , H(fn)).

Definition 5.51. The shortened form of a reversible function truth table is the subset of rows of the full truth table that meet the following condition: (f1(vi), . . . , fk(vi), . . . , fn(vi)) ≠ vi, where vi ∈ {v0, v1, . . . , v_{2^n−1}}.

Definition 5.52. A k-input, k-output gate is reversible if it realizes a reversible function.

Below we define multiple-controlled Toffoli gates, which are most commonly used in the literature.

Definition 5.53. A Multiple-Controlled Toffoli (MCT) gate is defined by C^m NOT(C; t) ,

346

x1

Reversible Circuits

y1

x1

y1

x2

y2

x1 x2

y1 y2

x3

y3

x1 x2 x3

y1 y2 y3

x4

y4

x1 x2 x3 x4

y1 y2 y3 y4

x5

y5

Figure 5.5. Graphical symbols for reversible MCT gates.

where C = {x_{k1}, . . . , x_{km}} ⊂ {x1, x2, . . . , xn} is the set of control lines and t = xk ∈ {x1, x2, . . . , xn} is called the target line; t ∉ C. The number of all inputs/outputs of such a gate is m + 1. This gate operates as follows: the value of the target line is inverted if and only if all control lines have value 1, i.e., x_{k1} = x_{k2} = · · · = x_{km} = 1; otherwise its value is unchanged. The reversible function realized by this gate can be written as

C^m NOT(C; xk) = (x1, . . . , xk ⊕ x_{k1} x_{k2} · · · x_{km}, . . . , xn) ,   (5.13)

where k ∉ {k1, . . . , km}. For m = 0 and m = 1 the gates are called NOT and CNOT, respectively. If m = 2 it is called a Toffoli gate, and if m > 2 it is sometimes called a generalized Toffoli gate. A reversible circuit composed only of MCT gates will be called an MCT-circuit. In the rest of this section only MCT-circuits will be considered. Graphical symbols for the above defined gates are shown in Figure 5.5. Vertical lines denote gates. Control lines are denoted by black dots, while the target line contains an XOR symbol.

For any reversible function there exist many reversible circuits implementing it. Thus a cost function has to be defined to evaluate the quality of a circuit. The simplest cost function of a reversible circuit is equal to the total number of gates; it is called the gate count. Other cost functions are also considered, e.g., the quantum cost is equal to the cost of the elementary quantum gates, as each reversible gate can be built from a number of elementary quantum gates. In this section we use the gate count cost function only.

Definition 5.54. The Hamming measure of C^m NOT(C; xk) is equal to H(fk). It will be denoted as H(C^m NOT(C; xk)).

Theorem 5.18. If n is the number of lines in a reversible circuit


and C^m NOT(C; t) is one of the gates in this circuit, then for all k, 1 ≤ k ≤ n,

H(C^m NOT(C; xk)) = 2^{n−m} .   (5.14)

Proof. Since fk = xk ⊕ x_{k1} x_{k2} · · · x_{km}, it holds that fk ≠ xk only when x_{k1} = x_{k2} = · · · = x_{km} = 1. In the truth table for this gate these values appear in 2^{n−m} input vectors. In all these rows the values of xk and fk differ, while in the other rows they are equal. Thus the theorem holds.

Corollary 5.9. From Theorem 5.18 it follows that in (n, n) reversible circuits the Hamming measures of MCT gates are

H(C^0 NOT(t)) = 2^n ,
H(C^1 NOT(C; t)) = 2^{n−1} ,
. . .
H(C^{n−2} NOT(C; t)) = 4 ,
H(C^{n−1} NOT(C; t)) = 2 .   (5.15)
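Theorem 5.18 can be confirmed by brute force over all MCT gates on a few lines; the simulator below is our own sketch (not code from the chapter):

```python
from itertools import combinations

# Verify H(C^m NOT(C; x_k)) = 2^(n-m): count the rows of the gate's truth
# table in which the target column differs from the target input column.

def hamming_measure(n, controls, target):
    flips = 0
    for x in range(2 ** n):
        y = x ^ (1 << target) if all((x >> c) & 1 for c in controls) else x
        flips += ((x >> target) & 1) != ((y >> target) & 1)
    return flips

n = 4
for target in range(n):
    others = [i for i in range(n) if i != target]
    for m in range(n):
        for C in combinations(others, m):
            assert hamming_measure(n, C, target) == 2 ** (n - m)
```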

5.2.3. Constructing Sequences of Reversible Functions Our approach for constructing an infinite sequence of reversible functions of any number of variables greater than three is based on extrapolating structural regularities of minimal circuits. For this aim we use our database of all gate count minimal circuits with three inputs/outputs [165, 166] and our tool for generating all minimal circuits for any four-variable reversible function [338]. The methodology used by us [152] is presented below:

Step 1. Select a four-variable reversible function and find all gate count minimal reversible circuits implementing this function.

Step 2. Among the circuits calculated in Step 1 look for a circuit C with some kind of structural regularity.



Figure 5.6. A gate count minimal circuit for n = 4.

Step 3. Once the circuit C has been found in Step 2, calculate all disjoint cycles of the function F implemented by C.

Step 4. Search the database of all gate count minimal circuits with three inputs/outputs for a circuit C′ with a structure similar to the circuit C found in Step 2.

Step 5. Calculate all disjoint cycles of the function F′ implemented by the circuit C′.

Step 6. If the functions F and F′ have similar cycle structures, try to extrapolate both circuits and the functions implemented by them.

We will illustrate the above formulated approach using an example of a self-inverse reversible function. This is because some circuits implementing such functions have a palindromic structure (i.e., the mirror image of such a circuit is the same as the circuit itself), so they exhibit a simple kind of structural regularity [167]. One palindromic minimal circuit implementing a self-inverse four-variable function is shown in Figure 5.6. Its structure is not only palindromic but also regular in another way: the first and the last four gates have targets on consecutive neighboring lines. The function implemented by this circuit has the following specification in terms of two-cycles:

⟨0, 15⟩⟨13, 14⟩ .   (5.16)

In our database of all gate count minimal circuits for three-variable reversible functions we have found the circuit with a similar structure shown in Figure 5.7, with the function implemented by this circuit having the similar two-cycles

⟨0, 7⟩⟨5, 6⟩ .

(5.17)
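Steps 3 and 5 of the methodology amount to a standard cycle decomposition; the sketch below is our own illustration (fixed points are dropped, so only the two-cycles of a self-inverse function remain):

```python
# Compute the disjoint cycles of a permutation given as a list; cycles of
# length 1 are omitted, matching the two-cycle specifications in the text.

def disjoint_cycles(perm):
    seen, cycles = set(), []
    for start in range(len(perm)):
        if start in seen or perm[start] == start:
            seen.add(start)
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        cycles.append(tuple(cycle))
    return cycles

# the four-variable function (5.16): identity except <0, 15> and <13, 14>
perm = list(range(16))
perm[0], perm[15] = 15, 0
perm[13], perm[14] = 14, 13
assert disjoint_cycles(perm) == [(0, 15), (13, 14)]
```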



Figure 5.7. A gate count minimal circuit for n = 3.

It is easy to note that the two specifications (5.16) and (5.17) can be generalized for any n in the following way:

⟨0, 2^n − 1⟩⟨2^n − 3, 2^n − 2⟩

(5.18)

and the circuits shown in Figures 5.6 and 5.7 can be extrapolated to the circuit for any number of inputs shown in Figure 5.8. However, the above constructed extrapolation is not unique. Examples of different extrapolations of the same pair of three- and four-variable circuits are given in [152]. Using the above outlined approach we have constructed many infinite sequences of reversible functions (see [152, 153]). One of them will be considered in the next subsection.

5.2.4. Minimal Circuits for Selected Functions First we prove the following theorem.

Theorem 5.19. Any MCT-circuit implementing a reversible function (y1, y2, . . . , yn) = F(x1, x2, . . . , xn) has at least n MCT gates if yk ≠ xk for all k = 1, . . . , n.


Figure 5.8. A circuit for any number of variables n.


Proof. Any circuit built from MCT gates which implements a reversible function F has at least one gate with its target on line k if the output signal yk is not equal to the input signal xk. By Definition 5.53 every MCT gate changes a signal only on one line, namely on the line at which its target is placed. Thus, at least n gates are required to change the signals on all n lines.

In search for sequences of reversible functions depending on an arbitrary number of variables we considered the so-called Miller function. Initially this reversible function was proposed as a three-variable function implementing the transposition ⟨3, 4⟩. For the first time it appeared in [213] as a benchmark for the spectral techniques synthesis method. Soon it was named the Miller function [202]. Recently, this name is used for any function being a single transposition of two output vectors with the maximal Hamming distance for the given number of variables (see [281]); e.g., for four variables there are eight such functions:

⟨0, 15⟩, ⟨1, 14⟩, ⟨2, 13⟩, ⟨3, 12⟩, ⟨4, 11⟩, ⟨5, 10⟩, ⟨6, 9⟩, ⟨7, 8⟩ .

We considered generalized Miller functions gmf_n for any number of variables n as the single transpositions ⟨0, 2^n − 1⟩ and constructed circuits for these functions which are minimal with respect to the gate count cost. Examples of such circuits are shown in Figure 5.9 for n = 3, 4, 5, 6; the gate count of these circuits is 2n + 1. Now we will prove that for any n the gate count minimal circuits implementing the n-variable generalized Miller function have 2n + 1 gates. Of course, the function

⟨0, 2^n − 1⟩ ,  n ≥ 3   (5.19)

is self-inverse, and the shortened form of the truth table of the gmf_n function is

(f1(v0), . . . , fn(v0)) = (1, . . . , 1) ,
(f1(v_{2^n−1}), . . . , fn(v_{2^n−1})) = (0, . . . , 0) .   (5.20)

According to Equation (5.20),

f1(v0) ≠ v_{0,1}, f2(v0) ≠ v_{0,2}, . . . , fn(v0) ≠ v_{0,n}


Figure 5.9. Gate count minimal circuits implementing the gmf_n function for n = 3, 4, 5, 6.

and

f1(v_{2^n−1}) ≠ v_{2^n−1,1}, f2(v_{2^n−1}) ≠ v_{2^n−1,2}, . . . , fn(v_{2^n−1}) ≠ v_{2^n−1,n} .

However, f1(vi) = v_{i,1}, f2(vi) = v_{i,2}, . . . , fn(vi) = v_{i,n} for vi ∈ {v1, . . . , v_{2^n−2}}. Thus, the VHM of this function is uniform:

H(f1, . . . , fn) = (2, . . . , 2) .   (5.21)
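Equation (5.21) is easy to confirm by brute force for small n; the following sketch is our own illustration:

```python
# Build gmf_n = <0, 2^n - 1> as a permutation and check that its VHM is
# (2, ..., 2): every output column differs from its input column in
# exactly the two rows v_0 and v_{2^n - 1}.

def gmf(n):
    perm = list(range(2 ** n))
    perm[0], perm[2 ** n - 1] = 2 ** n - 1, 0
    return perm

def vhm(perm, n):
    return tuple(
        sum(((x >> k) & 1) != ((perm[x] >> k) & 1) for x in range(2 ** n))
        for k in range(n)
    )

for n in range(3, 9):
    assert vhm(gmf(n), n) == (2,) * n
```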

To prove the minimality of our circuits implementing the gmf_n function four lemmas are needed.

Lemma 5.31. An MCT-circuit that implements any (n, n) reversible Boolean function with (f1(v0), . . . , fn(v0)) ≠ (0, . . . , 0) contains at least one NOT gate.


Proof. By Definition 5.53, if all inputs are set to 0 then the only way to change the state of any line to 1 is by using at least one gate without control lines, i.e., a NOT gate.

Lemma 5.32. In any MCT-circuit implementing the gmf_n function at least two more gate targets have to be placed on the line with the NOT gate.

Proof. Due to Equation (5.15), the Hamming measure of the gate C^0 NOT(xk) is equal to 2^n. This is the maximum value of the Hamming measure of a gate in an (n, n) reversible circuit. Due to Equation (5.21) the presence of the other gates in the circuit should decrease this measure to 2. But there is no gate that has a Hamming measure equal to 2^n − 2. Hence we need at least two more gates with targets placed on the line k.

Thus, due to Theorem 5.19, any circuit implementing the gmf_n function is built from at least n + 2 MCT gates, including at least one NOT gate.

Corollary 5.10. In any MCT-circuit implementing the gmf_n function there is at least one line with at least three gate targets (by Lemma 5.32), including at least one NOT gate (by Lemma 5.31).

Lemma 5.33. There is no pair of lines in an MCT-circuit implementing the gmf_n function with only a single gate target placed on each of them.

Proof (by contradiction). Let us assume that there is such a pair of lines. Without loss of generality, we may assume that they are lines k and l, where k, l ∈ {1, . . . , n}, k ≠ l. This is equivalent to the existence of a balanced (n, 2) function described by the following truth table:

((v_{0,l}, v_{1,l}, . . . , v_{2^n−1,l}), (fk(v0), fk(v1), . . . , fk(v_{2^n−1}))) .

(5.22)

It is easy to note that the vector (0, 1) appears 2^{n−2} + 1 times in the truth table (5.22). This is because among all vectors (v_{i,l}, fk(vi)), where k, l ∈ {1, . . . , n}, k ≠ l and i = 1, 2, 3, . . . , 2^n − 2 (i.e., vi ∉ {v0, v_{2^n−1}}), 2^{n−2} vectors are equal to (0, 1), and in addition the vector (v_{0,l}, fk(v0)) = (0, 1). Hence, the function described by the truth table


(5.22) is unbalanced, which is in contradiction to our assumption. Thus Lemma 5.33 holds.

Corollary 5.11. In any MCT-circuit implementing the gmf_n function there is no more than one line with a single gate target (by Lemma 5.33), and no circuit implementing the gmf_n function has less than 2n MCT gates (by Lemma 5.32 and Lemma 5.33).

Lemma 5.34. If only one gate target is located on the line k of any MCT-circuit implementing the gmf_n function, then this gate must be placed between two NOT gates in this circuit (see Figure 5.10).

Proof. Due to Equation (5.21) the VHM of the gmf_n function is equal to (2, . . . , 2), and by Theorem 5.18 C^{n−1} NOT(C; xk) is the only MCT gate G for which H(G) = 2. Because yk = fk(v0) = 1 for k = 1, . . . , n, all n − 1 control lines of the gate must be set to 1. Thus there is at least one NOT gate on the left-hand side of gate G. Similarly, because yk = fk(v_{2^n−1}) = 0, there is at least one NOT gate on the right-hand side of gate G. Thus the lemma holds.

Theorem 5.20. Any gate count minimal MCT-circuit realizing the gmf_n function has 2n + 1 gates.

Proof. From Corollary 5.11 we know that there is no gate count minimal circuit implementing the gmf_n function that has less than 2n gates. Let us assume that there exists a circuit having 2n gates. By Lemma 5.34, in this circuit either the two NOT gates are placed on different lines (Case 1) or both NOT gates are placed on the same line (Case 2).

Case 1. This situation is presented in Figure 5.10 (a). By Lemma 5.32, on each of these two lines with a NOT gate there are in total at least three gate targets. Then the number of gates in the circuit is greater than 2n because, due to Corollary 5.10 and the first part of Corollary 5.11, there are two lines with at least three gate targets on each of them, n − 3 lines with at least two gate targets, and one line with one gate target. Hence, the circuit contains at least 2n + 1 gates.


Figure 5.10. Circuits needed for the proof of Theorem 5.20.

Case 2. Let us consider the possibilities of placing the gate C^{n−1} NOT(C; xk) between the gates C^0 NOT(xl) and C^0 NOT(xl) according to Lemma 5.34. An analysis shows that, without loss of generality, all possibilities of placing the gate C^{n−1} NOT(C; xk) between the two NOT gates are those shown in Figure 5.10 (b) and (c) or their mirror images. In both figures there is no gate target between the left NOT gate and the control line l belonging to the set C of the gate C^{n−1} NOT(C; xk). The (n, 2) function at the outputs k and l of the gate C^{n−1} NOT(C; xk), i.e.,

((v_{0,l}, v_{1,l}, . . . , v_{2^n−1,l}), (fk(v0), fk(v1), . . . , fk(v_{2^n−1})))   (5.23)

is unbalanced, where fk is a coordinate function of the gmf_n function and k, l ∈ {1, . . . , n}, k ≠ l. This is in contradiction to our assumption that there is a circuit realizing the gmf_n function having 2n gates. Thus, there are no circuits with less than 2n + 1 gates. Figure 5.11 shows that there are circuits for the gmf_n functions having 2n + 1 gates for any n. Hence, Theorem 5.20 holds.

We have shown that using the approach presented in this section it is possible to prove the gate count minimality of some circuits which implement self-inverse functions. A similar result was obtained earlier in [153] for conservative functions. Both are the first such proofs in the literature. From Theorem 5.20 we get the following result:


Figure 5.11. Gate count minimal circuit for the gmf_n function.

Corollary 5.12. The n-variable function ⟨2^{n−1} − 1, 2^{n−1}⟩ can be implemented with 2n − 1 MCT gates, and this number is minimal.

Proof. The implementation is the circuit shown in Figure 5.11 with the two inverters removed. This circuit is gate count minimal, since if a smaller circuit existed, it could be transformed into a circuit for the generalized Miller function ⟨0, 2^n − 1⟩ by adding two inverters, which would contradict Theorem 5.20.

Similar results can be obtained for the other functions represented by a single cycle whose elements are vectors with the maximal Hamming distance for the given number of variables.
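For n = 3 both minimality claims can be verified exhaustively, since the reachable group has only 8! = 40320 permutations; the breadth-first search below is our own verification sketch (not code from the chapter):

```python
from collections import deque
from itertools import combinations

# BFS over MCT circuits on 3 lines: the minimal gate count of every 3-bit
# reversible function is its BFS distance from the identity. Theorem 5.20
# gives 2n + 1 = 7 for gmf_3 = <0, 7>; Corollary 5.12 gives 2n - 1 = 5 for
# the Miller function <3, 4>.

n = 3
gates = []
for t in range(n):
    others = [i for i in range(n) if i != t]
    for m in range(n):
        for C in combinations(others, m):
            gates.append(tuple(
                x ^ (1 << t) if all((x >> c) & 1 for c in C) else x
                for x in range(2 ** n)))

identity = tuple(range(2 ** n))
dist = {identity: 0}
queue = deque([identity])
while queue:
    p = queue.popleft()
    for g in gates:
        q = tuple(g[p[x]] for x in range(2 ** n))
        if q not in dist:
            dist[q] = dist[p] + 1
            queue.append(q)

gmf3 = (7, 1, 2, 3, 4, 5, 6, 0)       # <0, 7>
miller3 = (0, 1, 2, 4, 3, 5, 6, 7)    # <3, 4>
assert dist[gmf3] == 7 and dist[miller3] == 5
```

Since the NCT library on three lines realizes every 3-bit reversible function, the search also confirms that all 40320 permutations are reachable.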

6. Quantum Circuits

6.1. The Synthesis of a Quantum Circuit

Alexis De Vos

Stijn De Baerdemacker

6.1.1. Quantum Computation A quantum computation [239], acting on w qubits, is described by a unitary 2^w × 2^w matrix U. The synthesis problem of quantum computing consists of decomposing a particular matrix U into simpler matrices. Many decompositions have been proposed in the literature. Most of them are based on basic unitary transformations, well known from elementary particle, nuclear, atomic or molecular physics. Thus the quantum circuit is decomposed into building blocks called controlled ROTATORs, where a ROTATOR is a single-qubit gate represented by a 2 × 2 unitary matrix

cos(θ/2) [1 0; 0 1] − i sin(θ/2) ( n1 [0 1; 1 0] + n2 [0 −i; i 0] + n3 [1 0; 0 −1] ) ,

associated with the physical rotation of a spinor around the unit vector n1 e_x + n2 e_y + n3 e_z over an angle θ. The rotations around the x-axis, y-axis, and z-axis are well known:

Rx = [ cos(θ/2)  −i sin(θ/2) ;  −i sin(θ/2)  cos(θ/2) ] ,
Ry = [ cos(θ/2)  −sin(θ/2) ;  sin(θ/2)  cos(θ/2) ] ,
Rz = [ e^{−iθ/2}  0 ;  0  e^{iθ/2} ] .
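The rotation matrices can be sanity-checked numerically; the helpers below are our own illustration (matrices are written row-wise, as above, with ";" separating rows):

```python
import cmath

# Check that Rx, Ry, Rz are unitary and that rotations about a fixed axis
# compose additively: R_a(alpha) R_a(beta) = R_a(alpha + beta).

def rx(t):
    return [[cmath.cos(t / 2), -1j * cmath.sin(t / 2)],
            [-1j * cmath.sin(t / 2), cmath.cos(t / 2)]]

def ry(t):
    return [[cmath.cos(t / 2), -cmath.sin(t / 2)],
            [cmath.sin(t / 2), cmath.cos(t / 2)]]

def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(a, b, eps=1e-12):
    return all(abs(a[i][j] - b[i][j]) < eps for i in range(2) for j in range(2))

I2 = [[1, 0], [0, 1]]
for r in (rx, ry, rz):
    assert close(matmul(r(0.7), dagger(r(0.7))), I2)   # unitarity
    assert close(matmul(r(0.7), r(0.5)), r(1.2))       # additive angles
```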


This approach leads to powerful synthesis methods [24, 239]. Its major drawback, however, is that it does not illuminate its relationship with classical computing. It is well established that the intermediate step between conventional classical computing and quantum computing is made by classical reversible computing [82]. A classical reversible computer, acting on w bits, is described by a 2^w × 2^w permutation matrix. The synthesis of such a reversible circuit is based on controlled NOTs, i.e., on the single-bit gate represented by the 2 × 2 permutation matrix

[0 1; 1 0] .

As this matrix is neither an Rx matrix, nor an Ry, nor an Rz matrix, the relationship between classical reversible computing and quantum computing is obscured. For this reason, it is natural to introduce controlled NEGATORs and controlled PHASORs, rather than controlled ROTATORs.

6.1.2. Building Blocks A classical reversible logic circuit, acting on w bits, is represented by a permutation matrix, i.e., a member of the finite matrix group P(2^w). A quantum circuit, acting on w qubits, is represented by a unitary matrix, i.e., a member of the infinite matrix group U(2^w). The classical reversible circuits form a subgroup of the quantum circuits. This is a consequence of the group hierarchy

P(n) ⊂ U(n) ,

where n is allowed to have any (positive) integer value. Below, we will construct an arbitrary quantum circuit according to a bottom-up approach. For this purpose we start from the simplest logic operation possible on a single (qu)bit (i.e., w = 1 and thus n = 2^w = 2), which is the IDENTITY operation

u = [1 0; 0 1] .


Next, we consider two different square roots of that 2 × 2 matrix:
\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \]
The former is a permutation matrix and thus represents a classical logic gate, i.e., the NOT gate; the latter is not a permutation matrix, but is a unitary matrix and therefore represents a quantum logic gate, called the PHASE gate. Next, we interpolate between the IDENTITY u and an as yet arbitrary unitary matrix q:
\[ m = (1 - t)\, u + t\, q , \]
where t is a parameter interpolating between u (for t = 0) and q (for t = 1). We impose that m be a unitary matrix. If q² = u, then this leads to the condition that t is complex and of the form
\[ t = \tfrac{1}{2} \left( 1 - e^{i\theta} \right) , \]

where θ is a real parameter [84]. Note that t = 0 for θ = 0 and t = 1 for θ = π. For θ = 2π, the value of t has returned to 0. Now, choosing q = NOT and q = PHASE, respectively, leads to two different one-parameter single-qubit operations:
\[ \begin{pmatrix} \cos(\theta/2)\, e^{i\theta/2} & -i \sin(\theta/2)\, e^{i\theta/2} \\ -i \sin(\theta/2)\, e^{i\theta/2} & \cos(\theta/2)\, e^{i\theta/2} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix} . \]
We denote them with the schematics N(θ) and Φ(θ),

respectively. The former operation we call the NEGATOR gate [88]; the latter we call the PHASOR gate. Each of these two sets of matrices constitutes a continuous group, i.e., a one-dimensional Lie group. Both groups contain the IDENTITY circuit. Indeed:
\[ \mathrm{NEGATOR}(0) = \mathrm{PHASOR}(0) = \mathrm{IDENTITY} . \]
Additionally, by construction, the NOT gate is a NEGATOR and the PHASE gate is a PHASOR. Indeed:
\[ \mathrm{NEGATOR}(\pi) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad \mathrm{PHASOR}(\pi) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} , \]


sometimes abbreviated to the X gate and the Z gate, respectively [302]. For θ = π/2, we have the square root of NOT and the square root of PHASE:
\[ \mathrm{NEGATOR}(\pi/2) = \tfrac{1}{2} \begin{pmatrix} 1+i & 1-i \\ 1-i & 1+i \end{pmatrix} \quad\text{and}\quad \mathrm{PHASOR}(\pi/2) = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} . \]
The former is sometimes referred to as the V gate [273, 368], while the latter is sometimes called the S gate [288]. Finally, for θ = π/4, we have the quartic roots
\[ \mathrm{NEGATOR}(\pi/4) = \tfrac{1}{2\sqrt{2}} \begin{pmatrix} \sqrt{2}+1+i & \sqrt{2}-1-i \\ \sqrt{2}-1-i & \sqrt{2}+1+i \end{pmatrix} \quad\text{and}\quad \mathrm{PHASOR}(\pi/4) = \tfrac{1}{\sqrt{2}} \begin{pmatrix} \sqrt{2} & 0 \\ 0 & 1+i \end{pmatrix} , \]
sometimes called the W gate [273] and the T gate [12, 288], respectively. Now, we consider multiple-qubit (say, w-qubit) circuits. For this purpose, we introduce both the controlled NEGATOR gates and the controlled PHASOR gates. As an example, we give here, for w = 3, the positive-polarity twice-controlled NEGATOR, with the lowermost quantum wire being the target line, represented by the block-diagonal matrix
\[ \begin{pmatrix} 1_{6 \times 6} & & \\ & \cos(\theta/2)\, e^{i\theta/2} & -i \sin(\theta/2)\, e^{i\theta/2} \\ & -i \sin(\theta/2)\, e^{i\theta/2} & \cos(\theta/2)\, e^{i\theta/2} \end{pmatrix} , \]

where 1_{a×a} denotes the a × a unit matrix. Of course, we equally introduce controlled PHASORs, negative-polarity controls, a target on a higher-positioned wire, etc. It turns out [87] that all possible NEGATORs and controlled NEGATORs together generate a group XU(2^w), a subgroup of the unitary group U(2^w). They cannot generate the full U(2^w) group, because the matrix representing a (controlled) NEGATOR has all line sums, i.e., all row sums and all column sums, equal to 1. The product of two matrices with all line sums equal to 1 again yields a unit-line-sum matrix. Therefore, a quantum circuit composed exclusively of (controlled) NEGATORs cannot synthesize a unitary matrix with one or more line sums different from the value 1. Whereas the unitary group U(n) has


n² dimensions, the group XU(n) has only (n − 1)² dimensions and is isomorphic to U(n − 1) [83, 86, 87]. Analogously, a quantum circuit composed exclusively of (controlled) PHASORs can only generate matrices from ZU(2^w), where ZU(n) is the group of diagonal unitary matrices with unit entry at the upper-left corner. The group ZU(n) has only n − 1 dimensions [86]. We can summarize as follows. We find two subgroups of the unitary group U(n):

• XU(n), i.e., all n × n unitary matrices with all of their 2n line sums equal to 1;

• ZU(n), i.e., all n × n diagonal unitary matrices with the upper-left entry equal to 1.

Whereas the infinite unitary group U(n) describes quantum computing, the finite permutation group P(n) describes classical reversible computing. Whereas XU(n) is both a supergroup of P(n) and a subgroup of U(n), in contrast, ZU(n) is a subgroup of U(n) but not a supergroup of P(n):
\[ P(n) \subset XU(n) \subset U(n) , \quad (6.1) \]
\[ ZU(n) \subset U(n) . \quad (6.2) \]

The XU circuits therefore can be considered as circuits between classical and quantum circuits, whereas the ZU circuits are truly nonclassical circuits.
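The interpolation formula, the special values of N(θ) and Φ(θ), and the line-sum argument above can all be checked numerically. The sketch below (all helper names are ours, not from the text) builds N(θ) and Φ(θ) directly from m = (1 − t)u + tq with t = (1 − e^{iθ})/2:

```python
import cmath
import math

IDENTITY = [[1, 0], [0, 1]]
NOT = [[0, 1], [1, 0]]
PHASE = [[1, 0], [0, -1]]

def interpolate(q, theta):
    # m = (1 - t) * IDENTITY + t * q  with  t = (1 - e^{i theta}) / 2
    t = (1 - cmath.exp(1j * theta)) / 2
    return [[(1 - t) * IDENTITY[i][j] + t * q[i][j] for j in range(2)]
            for i in range(2)]

def negator(theta):
    return interpolate(NOT, theta)

def phasor(theta):
    return interpolate(PHASE, theta)

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, eps=1e-12):
    return all(abs(a - b) < eps
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def all_line_sums_one(A, eps=1e-12):
    n = len(A)
    rows = all(abs(sum(A[i][j] for j in range(n)) - 1) < eps for i in range(n))
    cols = all(abs(sum(A[i][j] for i in range(n)) - 1) < eps for j in range(n))
    return rows and cols

# endpoints: NEGATOR(0) = PHASOR(0) = IDENTITY, N(pi) = NOT, Phi(pi) = PHASE
assert close(negator(0), IDENTITY) and close(phasor(0), IDENTITY)
assert close(negator(math.pi), NOT) and close(phasor(math.pi), PHASE)

# roots: V^2 = NOT (V = N(pi/2)) and T^2 = S (T = Phi(pi/4), S = Phi(pi/2))
assert close(mat_mul(negator(math.pi / 2), negator(math.pi / 2)), NOT)
assert close(mat_mul(phasor(math.pi / 4), phasor(math.pi / 4)),
             phasor(math.pi / 2))

# every NEGATOR has all line sums equal to 1, and this property
# is preserved under matrix multiplication (closure of XU)
a, b = negator(0.8), negator(2.3)
assert all_line_sums_one(a) and all_line_sums_one(b)
assert all_line_sums_one(mat_mul(a, b))
```

The last three assertions illustrate why a cascade of (controlled) NEGATORs can never leave XU: unit line sums survive every multiplication.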

6.1.3. First Decomposition of a Unitary Matrix

In Reference [84], the following theorem is proved.

Theorem 6.21. Any U(n) matrix U can be decomposed as
\[ U = e^{i\alpha}\, Z_1 X_1 Z_2 X_2 Z_3 \cdots Z_{p-1} X_{p-1} Z_p , \]
with p ≤ n(n − 1)/2 + 1, where all Z_j are ZU(n) matrices and all X_j are XU(n) matrices.


Figure 6.1. Quantum schematic for U = e^{iα} Z1 X Z2.

In [86], it is proved that a shorter decomposition with p ≤ n exists. Finally, in [85], it is conjectured that an again shorter decomposition, with p ≤ 2, exists. Idel and Wolf [148] have provided a non-constructive proof. In the present paper, we investigate the consequences of the decomposition
\[ U = e^{i\alpha}\, Z_1 X Z_2 . \quad (6.3) \]
Reference [85] provides a numerical algorithm to find the number α and the matrices Z1, X, and Z2 for a given matrix U, based on a Sinkhorn-like approach [298]. According to the ZXZ theorem, Figure 6.1 shows the quantum schematic for (6.3) with w = 3 and thus n = 8. If n is even, then we note the identity
\[ \mathrm{diag}(a, a, a, a, a, \ldots, a, a) = P_0\, \mathrm{diag}(1, a, 1, a, 1, \ldots, 1, a)\, P_0^{-1}\, \mathrm{diag}(1, a, 1, a, 1, \ldots, 1, a) , \]
where a is a short-hand notation for e^{iα} and P0 is the (circulant) permutation matrix
\[ P_0 = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & 0 & \cdots & 0 \end{pmatrix} , \]
i.e., the matrix called the cyclic-shift matrix, which can be implemented with classical reversible gates, i.e., one NOT and w − 1 controlled NOTs [39, 82]. We thus can transform (6.3) into a decomposition exclusively containing XU and ZU matrices:
\[ U = P_0 Z_0 P_0^{-1} Z_1' X Z_2 , \quad (6.4) \]

Figure 6.2. Quantum schematic for U = P0 Z0 P0^{−1} Z1′ X Z2.

where Z0 = diag(1, a, 1, a, 1, ..., 1, a) is a ZU matrix, which can be implemented by a single (uncontrolled) PHASOR gate, and where Z1′ is the product Z0 Z1. Figure 6.2 shows the quantum schematic for the decomposition (6.4).
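The scalar-extraction identity behind (6.4) can be verified directly for, say, n = 4 (a small sketch; the helper names and the chosen angle are ours):

```python
import cmath

n = 4
alpha = 0.9
a = cmath.exp(1j * alpha)

# circulant cyclic-shift permutation matrix P0
P0 = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
P0_inv = [[P0[j][i] for j in range(n)] for i in range(n)]  # transpose

def diag(*entries):
    m = len(entries)
    return [[entries[i] if i == j else 0 for j in range(m)] for i in range(m)]

def mat_mul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

Z0 = diag(1, a, 1, a)
lhs = diag(a, a, a, a)                    # the overall phase e^{i alpha} 1
rhs = mat_mul(mat_mul(mat_mul(P0, Z0), P0_inv), Z0)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(n) for j in range(n))
```

Conjugation by P0 cyclically shifts the diagonal of Z0, so the two diagonals interleave to give a constant diagonal of value a.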

6.1.4. Further Decomposition of a Unitary Matrix

For convenience, we rewrite Equation (6.3) as U = e^{iα_n} L_n X_n R_n, where the left matrix L_n and the right matrix R_n are members of ZU(n) and X_n belongs to XU(n). As a member of the (n − 1)²-dimensional group XU(n), X_n has the following form [87]:
\[ X_n = T_n \begin{pmatrix} 1 & \\ & U_{n-1} \end{pmatrix} T_n^{-1} , \]
where U_{n−1} is a member of U(n − 1) and T_n is an n × n generalized Hadamard matrix. Reference [87] provides the algorithm to find the U_{n−1} matrix corresponding to a given XU(n) matrix X_n. Again according to (6.3), the matrix U_{n−1} can be decomposed as e^{iα_{n−1}} l_{n−1} x_{n−1} r_{n−1}, a product of a scalar, a ZU(n − 1) matrix, an XU(n − 1) matrix, and a second ZU(n − 1) matrix. We thus obtain for X_n the product T_n L_{n−1} X_{n−1} R_{n−1} T_n^{−1}, where
\[ L_{n-1} = \begin{pmatrix} 1 & \\ & e^{i\alpha_{n-1}} l_{n-1} \end{pmatrix}, \quad X_{n-1} = \begin{pmatrix} 1 & \\ & x_{n-1} \end{pmatrix}, \quad R_{n-1} = \begin{pmatrix} 1 & \\ & r_{n-1} \end{pmatrix} . \]

Figure 6.3. Quantum schematic for the decomposition (6.5).

Hence, we have U = e^{iα_n} L_n T_n (L_{n−1} X_{n−1} R_{n−1}) T_n^{−1} R_n. By applying such a decomposition again and again, we find a decomposition
\[ e^{i\alpha_n} L_n T_n L_{n-1} T_{n-1} L_{n-2} \cdots T_2 L_1 X_1 R_1 T_2^{-1} R_2 \cdots R_{n-2} T_{n-1}^{-1} R_{n-1} T_n^{-1} R_n \]
of an arbitrary member of XU(n). As X_1 and R_1 automatically equal the unit matrix 1_{n×n}, we thus obtain
\[ U = e^{i\alpha_n} L_n T_n L_{n-1} T_{n-1} L_{n-2} \cdots T_2 L_1 T_2^{-1} R_2 \cdots R_{n-2} T_{n-1}^{-1} R_{n-1} T_n^{-1} R_n , \quad (6.5) \]

where all n matrices L_j and all n − 1 matrices R_j belong to the (n − 1)-dimensional group ZU(n). The n − 1 matrices T_j are block-diagonal matrices of the form
\[ T_j = \begin{pmatrix} A & \\ & S_j \end{pmatrix} , \]
where A is an arbitrary (n − j) × (n − j) unitary matrix and S_j is a j × j generalized Hadamard matrix. An obvious choice consists of A equal to 1_{(n−j)×(n−j)} and S_j equal to the j × j discrete Fourier transform. For w = 2 (and thus n = 4), Figure 6.3 shows the cascade of six constant matrices, seven ZU circuits, and one overall phase, where the T_j blocks represent the n − 1 constant matrices
\[ T_2 = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1/\sqrt{2} & 1/\sqrt{2} \\ & & 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}, \quad
T_3 = \begin{pmatrix} 1 & & & \\ & 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ & 1/\sqrt{3} & \omega/\sqrt{3} & \omega^2/\sqrt{3} \\ & 1/\sqrt{3} & \omega^2/\sqrt{3} & \omega/\sqrt{3} \end{pmatrix}, \quad
T_4 = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} , \]
with ω equal to the cubic root of unity: ω = e^{i 2π/3} = −1/2 + i√3/2.

Beneath each of the 4n − 2 blocks in Figure 6.3, the number of real parameters of the block is displayed. These numbers sum to 16, i.e., exactly n², the dimensionality of U(n). Hence, the synthesis problem of an arbitrary U(2^w) matrix is reduced to two smaller problems. First, for the given value of w, we have to synthesize the 2^w − 1 circuits T_j. Subsection 6.1.5 provides a solution for this task. Then, for the particular matrix U, we have to synthesize the 2^{w+1} − 1 circuits of type ZU(2^w). The synthesis of an arbitrary ZU(2^w) circuit is discussed in Subsection 6.1.6. We close the present subsection by deriving from (6.5) a dual decomposition. By introducing the matrices
\[ L_j' = T_{j+1} L_j T_{j+1}^{-1} , \qquad R_j' = T_{j+1} R_j T_{j+1}^{-1} \]
for j < n, as well as
\[ L_n' = T_n L_n T_n^{-1} , \qquad R_n' = T_n R_n T_n^{-1} , \]
we indeed find
\[ U = e^{i\alpha_n} T_n^{-1} L_n' T_n L_{n-1}' T_n L_{n-2}' T_{n-1} \cdots T_3 L_1' T_3^{-1} R_2' \cdots T_{n-1}^{-1} R_{n-2}' T_n^{-1} R_{n-1}' T_n^{-1} R_n' T_n , \]
where all matrices L_j' and R_j' belong to a (j − 1)-dimensional subgroup of XU(n). If, in particular, each T_j is composed of the (n − j) × (n − j) unit block combined with the j × j discrete Fourier transform, then this subgroup consists of block-diagonal matrices with an (n − j) × (n − j) unit block and a j × j circulant matrix from XU(j) [39, 71].
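The block form X_n = T_n (1 ⊕ U_{n−1}) T_n^{−1} used in this subsection can be checked numerically: conjugating 1 ⊕ U_{n−1} with the discrete Fourier transform (one possible choice of generalized Hadamard matrix) yields a matrix with all 2n line sums equal to 1, i.e., a member of XU(n). A sketch for n = 4, with U_3 chosen for convenience as the 3 × 3 Fourier matrix (helper names are ours):

```python
import cmath
import math

def fourier(n):
    # n x n discrete Fourier transform, one possible generalized Hadamard matrix
    w = cmath.exp(2j * math.pi / n)
    s = 1 / math.sqrt(n)
    return [[s * w ** (i * j) for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

n = 4
T = fourier(n)
U3 = fourier(n - 1)                        # any member of U(3) would do
D = [[1 if i == j == 0 else 0 for j in range(n)] for i in range(n)]
for i in range(1, n):
    for j in range(1, n):
        D[i][j] = U3[i - 1][j - 1]         # D = 1 (+) U3, block-diagonal

X = mat_mul(mat_mul(T, D), dagger(T))      # X = T (1 (+) U3) T^{-1}
for i in range(n):
    assert abs(sum(X[i][j] for j in range(n)) - 1) < 1e-9    # row sums
    assert abs(sum(X[j][i] for j in range(n)) - 1) < 1e-9    # column sums
```

The first column of T is the constant vector (1, ..., 1)/√n, which the block 1 ⊕ U_3 leaves untouched; this is exactly what pins every line sum to 1.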

Figure 6.4. Quantum schematics for T2, T3, and T4.

6.1.5. Synthesizing a Fourier Circuit

We have the following decomposition:
\[ T_3 = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1/\sqrt{2} & 1/\sqrt{2} \\ & & 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}
\begin{pmatrix} 1 & & & \\ & 1/\sqrt{3} & \sqrt{2}/\sqrt{3} & \\ & \sqrt{2}/\sqrt{3} & -1/\sqrt{3} & \\ & & & i \end{pmatrix}
\begin{pmatrix} 1 & & & \\ & 1 & & \\ & & 1/\sqrt{2} & 1/\sqrt{2} \\ & & 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} . \]
On the other hand, [39] provides a circuit for T4, whereas the quantum schematics for T2 and T3 are introduced in Figure 6.4. The boxes in Figure 6.4 filled with the letter H are either uncontrolled or controlled HADAMARD gates, performing the transformation
\[ \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix} . \]
The empty box in the schematic of the T3 circuit in Figure 6.4 is a controlled
\[ \begin{pmatrix} 1/\sqrt{3} & \sqrt{2}/\sqrt{3} \\ \sqrt{2}/\sqrt{3} & -1/\sqrt{3} \end{pmatrix} \text{-gate} . \]
Its synthesis may benefit from the following decomposition:
\[ \begin{pmatrix} 1/\sqrt{3} & \sqrt{2}/\sqrt{3} \\ \sqrt{2}/\sqrt{3} & -1/\sqrt{3} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
\begin{pmatrix} \cos(\varphi) & -i \sin(\varphi) \\ -i \sin(\varphi) & \cos(\varphi) \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
= e^{-i\varphi}\, \Phi(\pi/2)\, N(2\varphi)\, \Phi(\pi/2) , \]

Figure 6.5. Quantum schematic for a ZU circuit with w = 3.

where φ equals arctan(√2). Also the HADAMARD gate H has an e^{iα} ZXZ decomposition in closed form:
\[ H = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \tfrac{1-i}{\sqrt{2}} \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
\begin{pmatrix} (1+i)/2 & (1-i)/2 \\ (1-i)/2 & (1+i)/2 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
= e^{-i\pi/4}\, \Phi(\pi/2)\, N(\pi/2)\, \Phi(\pi/2)
= N(\pi)\Phi(7\pi/4)N(\pi)\Phi(\pi/4)N(\pi/2)\Phi(\pi/2)
= XZSTXTVS . \]

6.1.6. Synthesizing a ZU Circuit

The decomposition of a matrix Z, an arbitrary member of ZU(n), is straightforward. Indeed, for an even n, the matrix can be written as the following product of four matrices:
\[ \mathrm{diag}(1, a_2, a_3, a_4, a_5, \ldots, a_n) = \mathrm{diag}(1, a_2, 1, a_4, 1, a_6, \ldots, 1, a_n)\, P_0\, \mathrm{diag}(1, 1, 1, a_3, 1, a_5, \ldots, 1, a_{n-1})\, P_0^{-1} , \]
where a_j is a short-hand notation for e^{iα_j}. If n equals 2^w, then the diagonal matrix diag(1, a_2, 1, a_4, 1, a_6, ...) represents 2^{w−1} PHASORs, controlled (w − 1) times, and the diagonal matrix diag(1, 1, 1, a_3, 1, a_5, ...) represents 2^{w−1} − 1 PHASORs, controlled (w − 1) times. Figure 6.5 shows, as an example, a quantum synthesis for w = 3.
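Both closed-form decompositions above, for the controlled gate of the T3 circuit and for the HADAMARD gate, can be verified numerically. In the sketch below, the helper names are ours, and N(θ) and Φ(θ) are built from the interpolation formula of Subsection 6.1.2:

```python
import cmath
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chain(*gates):
    out = [[1, 0], [0, 1]]
    for g in gates:
        out = mat_mul(out, g)
    return out

def negator(theta):
    t = (1 - cmath.exp(1j * theta)) / 2
    return [[1 - t, t], [t, 1 - t]]

def phasor(theta):
    return [[1, 0], [0, cmath.exp(1j * theta)]]

def close(A, B, eps=1e-12):
    return all(abs(a - b) < eps for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def scale(c, A):
    return [[c * x for x in row] for row in A]

# controlled gate of the T3 circuit, with phi = arctan(sqrt(2))
phi = math.atan(math.sqrt(2))
target = [[1 / math.sqrt(3), math.sqrt(2 / 3)],
          [math.sqrt(2 / 3), -1 / math.sqrt(3)]]
decomp = scale(cmath.exp(-1j * phi),
               chain(phasor(math.pi / 2), negator(2 * phi),
                     phasor(math.pi / 2)))
assert close(decomp, target)

# HADAMARD: H = e^{-i pi/4} Phi(pi/2) N(pi/2) Phi(pi/2)
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
zxz = scale(cmath.exp(-1j * math.pi / 4),
            chain(phasor(math.pi / 2), negator(math.pi / 2),
                  phasor(math.pi / 2)))
assert close(zxz, H)

# ... and as the six-gate string XZSTXTVS
seq = chain(negator(math.pi), phasor(7 * math.pi / 4), negator(math.pi),
            phasor(math.pi / 4), negator(math.pi / 2), phasor(math.pi / 2))
assert close(seq, H)
```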

6.1.6. Synthesizing a ZU Circuit The decomposition of a matrix Z, an arbitrary member of ZU(n), is straightforward. Indeed, for an even n, the matrix can be written as the following product of four matrices: diag(1, a2 , a3 , a4 , a5 , ..., an ) = diag(1, a2 , 1, a4 , 1, a6 , ..., 1, an )P0 diag(1, 1, 1, a3 , 1, a5 , ..., 1, an−1 ) P0−1 , where aj is a short-hand notation for eiαj . If n equals 2w , then the diagonal matrix diag(1, a2 , 1, a4 , 1, a6 , . . . ) represents 2w−1 PHASORs, controlled (w − 1) times, and the diagonal matrix diag(1, 1, 1, a3 , 1, a5 , . . . ) represents 2w−1 − 1 PHASORs, controlled (w − 1) times. Figure 6.5 shows, as an example, a quantum synthesis for w = 3.


We thus have a total of 2^w − 1 controlled PHASORs. According to Lemma 7.5 of Barenco et al. [24], each multiply-controlled gate Φ(α) can be replaced by classical gates and three singly-controlled PHASORs Φ(±α/2). According to De Vos and De Baerdemacker [86], each singly-controlled PHASOR Φ(β) can be decomposed into two controlled NOTs and three uncontrolled PHASORs Φ(±β/2). We thus obtain a circuit with a total of 9(2^w − 1) uncontrolled PHASORs. The computer is thus built from uncontrolled PHASORs as its basic blocks, and its precision is directly related to the precision of the PHASOR gate Φ(θ). Above, we assumed that the angle θ can be chosen arbitrarily, with arbitrary precision. In future quantum computer hardware, this will probably not be possible, and only angles θ that are a multiple of some basic angle 2π/k (with k an integer) will be available, e.g., k = 8 if the basic building block is the T gate. Determining the precision of the computer, for a given value of k, remains a challenge for future research.

6.1.7. Summary

We have demonstrated that, thanks to the ZXZ-theorem, an arbitrary quantum circuit, acting on w qubits, can be decomposed into 2^{w+1} − 1 blocks, each described by a 2^w × 2^w matrix from the (2^w − 1)-dimensional Lie group ZU(2^w), a subgroup of U(2^w), separated by 2(2^w − 1) FOURIER circuits. The ZU blocks can be further decomposed into classical gates and a total of 9(2^{w+1} − 1)(2^w − 1) uncontrolled PHASOR gates. As Φ(θ) = H N(θ) H, each uncontrolled PHASOR gate can be substituted by two HADAMARD gates and one uncontrolled NEGATOR gate. Taking into account that the HADAMARD gate is a FOURIER circuit, we have thus provided two synthesis algorithms, based on two different (dual) gate libraries:

• classical gates + FOURIER circuits + the PHASOR gate and

• classical gates + FOURIER circuits + the NEGATOR gate.


6.2. Universal Two-Qubit Quantum Gates

Md. Mazder Rahman, Gerhard W. Dueck

6.2.1. The NCV Gate Library

Elementary quantum gates (quantum primitives) can be represented by unitary matrices [239] that may contain complex entries. While the universal logic gates for conventional classical computation are few (such as the NAND gate), there are many universal gate sets for quantum computation. When the construction of quantum circuits is considered, classical reversible circuits are decomposed [24] into quantum circuits. The quantum gates most commonly used in this decomposition are the NOT, controlled-NOT, controlled-V, and controlled-V† gates. This set of gates is known as the NCV library. The unitary matrices of the NCV gates are shown in Table 6.1.

Table 6.1. Quantum gates and their unitary matrices

NOT (single-qubit gate):
\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]

controlled-NOT (two-qubit gate):
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \]

controlled-V (two-qubit gate):
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1+i}{2} & \frac{1-i}{2} \\ 0 & 0 & \frac{1-i}{2} & \frac{1+i}{2} \end{pmatrix} \]

controlled-V† (two-qubit gate):
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1-i}{2} & \frac{1+i}{2} \\ 0 & 0 & \frac{1+i}{2} & \frac{1-i}{2} \end{pmatrix} \]


The controlled-V and controlled-V† gates are known as the controlled-square-root-of-NOT gates. NOT is a single-qubit gate; controlled-NOT, controlled-V, and controlled-V† are two-qubit gates. NOT and controlled-NOT are self-inverse gates; controlled-V and controlled-V† are the inverse of each other. Logic operations in quantum circuits are unitary transformations, and the evolution of states can be shown by the appropriate matrix-vector multiplications, where a matrix represents the gate and a vector represents the state of the qubit(s). The state of a qubit is represented by a vector in a two-dimensional complex vector space [239] and can be expressed as
\[ |\psi\rangle = \alpha|0\rangle + \beta|1\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} , \quad (6.6) \]
where α and β are complex numbers that satisfy the constraint |α|² + |β|² = 1. The states |0⟩ and |1⟩ are known as computational basis states, analogous to the states 0 and 1 of a classical bit. A generalized two-qubit state can be described as
\[ |\psi\rangle = \lambda_1|00\rangle + \lambda_2|01\rangle + \lambda_3|10\rangle + \lambda_4|11\rangle = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \end{pmatrix} . \quad (6.7) \]
This state is separable as a tensor product of two single-qubit states if and only if λ1 λ4 = λ2 λ3; otherwise, the state is entangled. This condition can be easily visualized by considering the tensor product of two single-qubit states:
\[ \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} \otimes \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha_1\alpha_2 \\ \alpha_1\beta_2 \\ \beta_1\alpha_2 \\ \beta_1\beta_2 \end{pmatrix} , \quad (6.8) \]
which satisfies the condition of separability, as α1α2 · β1β2 = α1β2 · β1α2. For a classical reversible function, if a Multiple-Controlled Toffoli (MCT) [348] circuit is decomposed [24] into a quantum circuit with

Figure 6.6. Toffoli-3 gate: (a) graphical symbol, (b) NCV realization.

NCV gates, then every composite state of the qubits can be shown to be separable during the logic operations. For the NCV gate library, it has been shown that only the four states
\[ |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad |v_0\rangle = \frac{1+i}{2} \begin{pmatrix} 1 \\ -i \end{pmatrix}, \quad\text{and}\quad |v_1\rangle = \frac{1+i}{2} \begin{pmatrix} -i \\ 1 \end{pmatrix} \]
are sufficient to realize classical reversible functions. If a state |v0⟩ or |v1⟩ is applied to the control of a two-qubit NCV gate, then the application results in an entangled state [144, 239]. However, quantum circuits obtained from the NCV decomposition can be shown to be limited to these four values in their logic operations [373]. Quantum circuits that are limited to these four values in logic operations are known as semi-classical quantum circuits. Every classical reversible gate, such as the Toffoli [348] and Fredkin [119] gates, can be implemented with NCV gates. The Toffoli-3 gate (the Toffoli gate with 2 controls and a target) and its NCV realization are shown in Figure 6.6. By replacing the Toffoli-3 gates in the circuit in Figure 6.7 (a) with their NCV realizations, the semi-classical quantum circuit shown in Figure 6.7 (b) is obtained. Quantum cost is the most prevalent criterion for measuring the quality of circuits. It is usually defined by the total number of gates required to implement a quantum circuit. For example, the quantum cost of the circuit shown in Figure 6.7 (b) is 14. However, this circuit is obtained by a heuristic, and it can be further optimized by using post-synthesis methods such as template matching [258] to reduce the cost. The problem of synthesizing quantum circuits is commonly believed


Figure 6.7. Schematics for the benchmark 3_17: (a) classical reversible circuit, (b) semi-classical quantum circuit.

to be intractable; however, no proof is available. Therefore, minimal circuits with few qubits serve as benchmarks for evaluating optimization methods, as well as basic blocks for larger circuits. A circuit is said to be minimal if no other realization with fewer gates realizes the same function. For all three-qubit classical reversible functions, it is possible to obtain minimal semi-classical circuits under the restriction of Definition 6.55.

Definition 6.55. If an arbitrary cascade of quantum gates generates an entangled state as an output for any computational basis state applied as an input, then the cascade is said to be an entangled circuit.

Example 6.53. If we apply the state |001⟩ to the input of the circuit in Figure 6.8 (c), then the resulting output will be entangled.
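The separability condition λ1 λ4 = λ2 λ3 from Equation (6.7), which underlies Definition 6.55, can be turned into a simple numerical test (a sketch; the function names are ours, not from the text):

```python
import math

def tensor(q1, q2):
    # tensor product of two single-qubit states (alpha, beta)
    (a1, b1), (a2, b2) = q1, q2
    return [a1 * a2, a1 * b2, b1 * a2, b1 * b2]

def is_separable(state, eps=1e-12):
    # |psi> = l1|00> + l2|01> + l3|10> + l4|11> is a product state
    # if and only if l1 * l4 == l2 * l3
    l1, l2, l3, l4 = state
    return abs(l1 * l4 - l2 * l3) < eps

# any tensor product of single-qubit states passes the test ...
product_state = tensor((0.6, 0.8j), (1 / math.sqrt(2), 1 / math.sqrt(2)))
assert is_separable(product_state)

# ... while the Bell state (|00> + |11>)/sqrt(2) fails it
s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]
assert not is_separable(bell)
```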

Figure 6.8. Quantum circuits: (a) and (b) are non-entangled; (c) is entangled.

By using an iterative method, all three-qubit minimal semi-classical quantum circuits can be obtained; they are counted in Table 6.2. Column 2 shows the number of semi-classical quantum circuits that realize binary functions. The column #NCV in Table 6.2 also corresponds to the quantum cost of the circuits.


Table 6.2. Results of finding semi-classical NCV circuits

#NCV    #semi-classical (binary)
0       1
1       9
2       51
3       187
4       417
5       714
6       1,373
7       3,176
8       4,470
9       4,122
10      10,008
11      5,036
12      1,236
13      8,340
14      1,180
Total   40,320

6.2.2. The Semi-Classical Two-Qubit Gate Library

As mentioned, there are many possibilities to define two-qubit gates with unitary matrices. The cost of a two-qubit NCV gate is usually considered to be one [24]. With the NCV library, the Fredkin gate [119] can be realized with seven gates, as shown in Figure 6.9 (a). It has been shown that five two-qubit quantum gates are sufficient to implement the quantum Fredkin gate [300]. Gates in the circuit

Figure 6.9. Quantum realizations of the Fredkin gate.

Figure 6.10. Circuit realizations: (a) semi-classical, (b) entangled.

of Figure 6.9 (a) can be reordered, resulting in the circuit shown in Figure 6.9 (b). Since the first two gates in Figure 6.9 (b) act on the same two qubits, they can be replaced by a single two-qubit gate. Similarly, gates {4, 5} in Figure 6.9 (b) can be replaced by another two-qubit gate. The circuit in Figure 6.9 (b) therefore has a quantum cost of 5. Further, by using the concept of merge gates (if both a controlled-NOT and a controlled-V/V† act on the same two qubits in a symmetric pattern, their total cost is counted as one unit) along with the plain NCV gates, an improved quantum cost calculation, as well as a reduction of the search space in the synthesis method, has been shown in [144]. Some potential gate sequences that can form a single two-qubit gate in circuits can be detected by employing post-synthesis methods, as shown for the Fredkin gate. However, this may require a significant amount of time, due to the fact that gates in circuits can be reordered in many different ways [257].

If a sequence of one-qubit and two-qubit quantum primitives in a circuit acts on the same two qubits, then the sequence of gates can be represented by a unitary matrix, and the logic operations of such a two-qubit function can be considered to have unit cost. This cost model is referred to as two-qubit cost. Definition 6.56 specifies semi-classical gates having a two-qubit cost of 1.

Definition 6.56. A sequence of single-qubit and two-qubit NCV gates is said to be a semi-classical quantum gate if it generates non-entangled outputs for all binary inputs.

An iterative method takes NCV gates as primitives and finds all possible two-qubit sequences that comply with Definition 6.56. Such two-qubit gates are obtained by iteratively concatenating NCV gates. In each iteration, only those cascades of NCV gates are considered whose functions have not yet been found. If a resulting cascade does not comply with Definition 6.56, then it is discarded. For example, the sequence of NCV gates shown in Figure 6.10 (a) is a semi-classical two-qubit gate. However, the gate sequence shown in Figure 6.10 (b) is ignored, since it is an entangled realization by Definition 6.55. The results of generating two-qubit semi-classical gates are shown in Table 6.3. The second column shows the number of generated two-qubit semi-classical gates, and the first column shows how many NCV gates are used.

Table 6.3. Semi-classical two-qubit gates

#NCV gates   #Two-qubit semi-classical gates
1            8
2            25
3            47
4            51
5            28
6            8
Total        167

Table 6.4. Minimal three-qubit circuits using two-qubit semi-classical gates

Two-qubit cost   #f_semi-classical   #f_classical
0                1                   1
1                492                 66
2                38,991              893
3                166,532             384
4                139,008             1,920
5                123,072             4,416
6                742,272             2,880
7                1,042,176           5,952
8                464,640             11,904
9                1,233,792           1,152
10               908,928             10,368
11               18,816              384
Total            4,878,720           40,320
Average          7.71                7.57

By using the proposed two-qubit semi-classical gate library, all minimal three-qubit circuits are obtained. Results are shown in Table 6.4


where #f_semi-classical is the number of minimal circuits that realize semi-classical functions and #f_classical is the number of minimal circuits that realize classical functions. Detailed algorithms for finding three-qubit minimal circuits for classical reversible functions can be found in [255]. Table 6.5 also shows the results of minimal circuits for three-qubit functions in which each NCV gate has unit cost. In comparison, it can be observed that the costs of quantum circuits can be reduced if each two-qubit gate has unit cost. With the proposed gate library, a cost reduction of 26.65% is achieved for three-input circuits that realize binary reversible functions.

Table 6.5. Generated three-qubit minimal circuits with NCV gates

Quantum cost (#NCV gates)   #f_semi-classical   #f_classical
0         1           1
1         21          9
2         219         51
3         1,489       187
4         7,389       417
5         26,328      714
6         66,269      1,373
7         125,318     3,176
8         211,248     4,470
9         389,346     4,122
10        632,088     10,008
11        748,016     5,036
12        781,620     1,236
13        872,340     8,340
14        759,436     1,180
15        237,144     -
16        19,680      -
17        768         -
Total     4,878,720   40,320
Average   11.56       10.32

It can be observed that both gate sequences shown in Figure 6.10 result in an identical matrix; therefore, both can act as semi-classical two-qubit gates. That is, if a semi-classical circuit c1 realizes a function f, then there may exist entangled sub-circuits in a circuit c2 that realizes the same function f. This raises the question: can the inclusion of entangled gates lead to circuits with lower cost? It is found that entangled two-qubit gates have the potential to reduce the two-qubit costs of circuits, as illustrated in the following example.


Example 6.54. The circuit with a two-qubit cost of 5 shown in Figure 6.11 (a) is obtained by using the two-qubit semi-classical gate library. The circuit realizes the function
f(0, 1, 2, 3, 4, 5, 6, 7) → (1, 7, 4, 2, 6, 0, 5, 3).
However, for the function f, there is another circuit that includes an entangled gate (the two-qubit gate on the right-hand side), as shown in Figure 6.11 (b), and this circuit has a two-qubit cost of 4.

Figure 6.11. Circuits with different two-qubit costs: (a) semi-classical gates, (b) entangled gates.

6.2.3. A Restricted Two-Qubit Gate Library If a semi-classical quantum circuit is obtained from the decomposition of a classical reversible circuit, then it can be shown that logic operations on each gate in the circuit can be performed without entanglement. However, the gates in circuits can be rearranged in such a way that many sequences of single-qubit and two-qubit gates can be formed as either two-qubit semi-classical or two-qubit entangled gates. Therefore, it may be advantageous to consider two-qubit gates that are not semi-classical. Moreover, Example 6.54 shows that if we use both two-qubit semi-classical and entangled gates, then we can obtain circuits with reduced two-qubit costs. With this motivation, we expanded the two-qubit gate library to include entangled gates as well. The resulting gate library is referred to as a restricted two-qubit gate library. Definition 6.57 specifies the gates of this library.


Table 6.6. Restricted two-qubit gates

#NCV gates   #Restricted two-qubit gates
1            8
2            37
3            103
4            195
5            216
6            164
7            60
Total        783

Definition 6.57. A sequence of single-qubit and two-qubit elementary quantum gates acting on the same two qubits in a circuit is said to be a restricted two-qubit gate if the gate sequence produces at least four two-qubit states that are not entangled.

The motivation for Definition 6.57 is as follows. We only consider circuits that have binary inputs and binary outputs. However, since we are using NCV gates, internal signals may take four values (even with binary inputs). As a consequence, a two-qubit system may be in one of 16 states. Of these 16 states, however, only 4 will be used, since the input to the circuit is binary. Therefore, as long as those 4 states are not entangled, it is possible to use such a sub-circuit, since it has the potential to generate a non-entangled output. By using an iterative search method, a set of restricted two-qubit gates that comply with Definition 6.57 is found. Different realizations of a function are possible, but only one realization for a given function is considered. The equivalence of gate realizations is verified by checking the equivalence of the matrices of the gates. Details on generating restricted two-qubit gates can be found in [259]. The total number of restricted two-qubit gates is 783. Results are shown in the right-hand column of Table 6.6. The results of finding three-qubit minimal circuits are shown in Table 6.7, in which the left-hand column gives the two-qubit cost of the circuits. The results show a significant difference in realizing classical and semi-classical functions by using restricted two-qubit gates


Table 6.7. Minimal three-qubit circuits with restricted two-qubit gates

Two-qubit cost   #f_semi-classical   #f_classical
0                1                   1
1                492                 66
2                38,415              893
3                167,108             384
4                209,952             4,320
5                632,160             4,896
6                1,108,032           4,032
7                1,096,704           14,976
8                1,421,568           0
9                204,288             10,752
Total            4,878,720           40,320
Average          6.58                6.71

and semi-classical two-qubit gates. In realizing classical functions, the number of realizations with costs of 4 and 6 obtained by using restricted two-qubit gates is, on average, double the number of realizations obtained by using semi-classical gates. A dramatic improvement can be noticed for costs of 7 and 9. However, using the restricted two-qubit gate library, we have not found any circuit with a cost of 8 that realizes a classical reversible function. With the restricted two-qubit gate library, the average cost improvements are 11.36% and 34.98% with respect to the semi-classical two-qubit gate library and the NCV library, respectively.
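The basis-state test behind Definitions 6.56 and 6.57 is easy to automate: apply a 4 × 4 gate matrix to each of the four computational basis states and count the non-entangled outputs via λ1 λ4 = λ2 λ3. A sketch (helper names are ours; the entangling example uses a Hadamard-plus-CNOT cascade purely for illustration, not an NCV sequence):

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def non_entangled_basis_outputs(U, eps=1e-12):
    # column k of U is the output state for the basis input |k>
    count = 0
    for k in range(4):
        l1, l2, l3, l4 = (U[i][k] for i in range(4))
        if abs(l1 * l4 - l2 * l3) < eps:
            count += 1
    return count

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
assert non_entangled_basis_outputs(CNOT) == 4       # all outputs separable

s = 1 / math.sqrt(2)
H_on_control = [[s, 0, s, 0], [0, s, 0, s], [s, 0, -s, 0], [0, s, 0, -s]]
bell_maker = mat_mul(CNOT, H_on_control)            # maps basis to Bell states
assert non_entangled_basis_outputs(bell_maker) == 0
```

A gate complying with Definition 6.56 returns 4; Definition 6.57 only demands that at least four of the 16 possible two-qubit input states (here restricted to the binary ones) give a non-entangled output.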

6.2.4. Impact on Toffoli Gates

By using two-qubit semi-classical gates, we found the quantum realizations of the Toffoli-3 gate with positive and negative controls shown in Table 6.8. Toffoli-3 with two negative controls can be realized with 6 NCV gates (NOT, controlled-NOT, controlled-V, and controlled-V†); however, its two-qubit cost is 5 according to our proposed cost model. Moreover, with the restricted two-qubit gate library, the same results are obtained.
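The well-known five-gate NCV realization of Toffoli-3 with positive controls (controlled-V and controlled-V† interleaved with two CNOTs, as in Figure 6.6 (b)) can be verified by multiplying out the 8 × 8 matrices. In the sketch below, the embedding helper and the qubit ordering (qubit 0 = top wire) are our own choices:

```python
def embed(u, target, control=None, n=3):
    # n-qubit matrix of the single-qubit gate u acting on `target`,
    # optionally controlled by qubit `control`
    N = 2 ** n
    M = [[0] * N for _ in range(N)]
    for col in range(N):
        bits = [(col >> (n - 1 - q)) & 1 for q in range(n)]
        if control is not None and bits[control] == 0:
            M[col][col] = 1
            continue
        t = bits[target]
        for new_t in (0, 1):
            row_bits = list(bits)
            row_bits[target] = new_t
            row = sum(bit << (n - 1 - q) for q, bit in enumerate(row_bits))
            M[row][col] += u[new_t][t]
    return M

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

X = [[0, 1], [1, 0]]
V = [[(1 + 1j) / 2, (1 - 1j) / 2], [(1 - 1j) / 2, (1 + 1j) / 2]]
Vd = [[(1 - 1j) / 2, (1 + 1j) / 2], [(1 + 1j) / 2, (1 - 1j) / 2]]

# gate order, first applied first: CV(1->2), CNOT(0->1), CV+(1->2),
# CNOT(0->1), CV(0->2); the matrix product accumulates right to left
gates = [embed(V, 2, control=1), embed(X, 1, control=0),
         embed(Vd, 2, control=1), embed(X, 1, control=0),
         embed(V, 2, control=0)]
circuit = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
for g in gates:
    circuit = mat_mul(g, circuit)

# Toffoli-3 swaps |110> and |111> and is the identity elsewhere
toffoli = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
toffoli[6][6] = toffoli[7][7] = 0
toffoli[6][7] = toffoli[7][6] = 1
assert all(abs(circuit[i][j] - toffoli[i][j]) < 1e-12
           for i in range(8) for j in range(8))
```

The net power of V applied to the target is x_b + x_a − (x_a ⊕ x_b) = 2 x_a x_b, i.e., V² = NOT exactly when both controls are 1.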


Table 6.8. Realizations of Toffoli-3 with positive and negative controls (the reversible and NCV circuit diagrams are omitted here)

Controls                 Quantum cost   Two-qubit cost
two positive controls    5              5
one negative control     5              5
two negative controls    6              5

We observed that, by using an entangled gate in the Toffoli-3 realization with adjacent controls and target, it is possible to transform it into its Linear Nearest Neighbor (LNN) circuit [256], as shown in Figure 6.12 (b). For comparison, the best known LNN circuit of Toffoli-3 is shown in Figure 6.12 (c). Both circuits have the same two-qubit cost.

o0 o1

x1 x2

o2 0

1

2

3

4

x0 x1 (b)

x2

5

6

V

1

2

3

4

5

6

7

8 o0 o1

x1 V

V



two-qubit cost: 6

o2

V†

V

0

x2

8 o0 o1

x0 (c)

7

V

two-qubit cost: 6

o2

Figure 6.12. Toffoli gate: (a) classical reversible circuit, (b) LNN circuit with an entangled gate, (c) the best known LNN circuit.

Figure 6.13. Toffoli-3 with non-adjacent controls: (a) classical circuit, (b) decomposition, (c) LNN-c8 circuit, (d) LNN-c6 circuit.

However, we have found that the Toffoli-3 gate with a non-adjacent control and target, shown in Figure 6.13 (a), can be transformed into an LNN circuit with improved costs under the LNN constraints. Two different heuristics can be considered.

1. Decompose Toffoli-3 and transform the circuit into an LNN circuit: After moving the control of each CNOT gate in the circuit shown in Figure 6.13 (b) towards the target, we obtain the circuit with a two-qubit cost of 8 shown in Figure 6.13 (c); it has 13 NCV gates. By instead moving the target of each CNOT gate in Figure 6.13 (b) towards the control, the resulting circuit can be minimized to the circuit with a two-qubit cost of 6 shown in Figure 6.13 (d), which has 10 NCV gates.

2. Transform Toffoli-3 such that the controls are adjacent to each other and to the target, and then replace the Toffoli-3 with its LNN circuit: If we replace the Toffoli-3 in the circuit shown in Figure 6.14 (b) with the best known non-entangled circuit of Figure 6.12 (c), the resulting circuit has 13 NCV gates and a two-qubit cost of 7. If we instead use the entangled circuit of Figure 6.12 (b), we obtain the minimized circuit with 11 NCV gates shown in Figure 6.14 (d); however, its two-qubit cost is still 7.

Figure 6.14. Minimized Toffoli-3 with non-adjacent controls: (a) classical circuit, (b) decomposition, (c) replaced Toffoli-3 circuit, (d) minimized LNN circuit. [Circuit diagrams omitted.]

In summary, Toffoli-3 with adjacent controls and targets can be realized with a two-qubit cost of 5. This cost does not change if we consider mixed polarity of the controls. However, if the controls (even with mixed polarity) are non-adjacent, then the two-qubit cost increases by only 1, as shown in Figure 6.13 (d).
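The four routes discussed for the non-adjacent case can be collected in one place. A small sketch over the (NCV gate count, two-qubit cost) pairs quoted above, selecting the cheapest option by two-qubit cost:

```python
# (NCV gates, two-qubit cost) for Toffoli-3 with non-adjacent controls,
# as quoted in the text:
options = {
    "decompose, move controls (Fig. 6.13c)":     (13, 8),
    "decompose, move targets (Fig. 6.13d)":      (10, 6),
    "replace by non-entangled LNN (Fig. 6.12c)": (13, 7),
    "replace by entangled LNN (Fig. 6.14d)":     (11, 7),
}
best = min(options, key=lambda name: options[name][1])
print(best)  # → decompose, move targets (Fig. 6.13d)
```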

Bibliography [1]

N. Abdessaied et al. “Upper bounds for reversible circuits based on Young subgroups”. In: Information Processing Letters 114.6 (2014), pp. 282–286. doi: 10.1016/j.ipl.2014.01.003. url: http://dx.d oi.org/10.1016/j.ipl.2014.01.003.

[2]

A. Abdollahi. “Probabilistic Decision Diagrams for Exact Probabilistic Analysis”. In: Proceedings of IEEE/ACM International Conference on Computer-Aided Design. ICCAD. San Jose, CA, USA: IEEE, 2007, pp. 266–272. isbn: 978-1-4244-1381-2. doi: 10.1109/I CCAD.2007.4397276.

[3]

G. M. Adelson-Velsky, V. L. Arlazarov, and M. V. Donskoy. Algorithms for Games. New York, NY, USA: Springer, 1988. isbn: 0-387-96629-3.

[4]

N. Ahmed and K. R. Rao. Orthogonal Transforms for Digital Signal Processing. Berlin Heidelberg: Springer, 1975. isbn: 978-3-64245452-3. doi: 10.1007/978-3-642-45450-9.

[5]

A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques and Tools. Boston, MA, USA: Addison-Wesley, 1986. isbn: 0-201-10088-6.

[6]

R. M. Aiex, G. Mauricio, and C. C. Riberio. “Probability Distribution of Solution Time in GRASP: An Experimental Investigation”. In: Journal of Heuristics 8 (2002), pp. 343–373.

[7]

M. Aitkin, D. Anderson, and J. Hinde. “Statistical Modelling of Data on Teaching Styles (with discussion)”. In: Journal of the Royal Statistical Society. A (general) 144.4 (1981), pp. 419–461. doi: 10.2 307/2981826. url: http://www.jstor.org/stable/2981826.

[8]

M. Aitkin and D. B. Rubin. “Estimation and Hypothesis Testing in Finite Mixture Models”. In: Journal of the Royal Statistical Society. B (methodological) 1 (1985), pp. 67–75. url: http://www.jstor.o rg/stable/2345545.

384

Bibliography

[9]

S. B. Akers. “Binary Decision Diagrams”. In: IEEE Transactions on Computers C-27.6 (June 1978), pp. 509–516. issn: 0018-9340. doi: 10.1109/TC.1978.1675141.

[10]

J. M. Alonso-Meijide and J. Freixas. “A new Power Index Based on Minimal Winning Coalitions Without any Surplus”. In: Decision Support Systems 49.1 (2010), pp. 70–76. doi: 10.1016/j.dss.2010 .01.003.

[11]

G. M. Amdahl. “Validity of the single processor approach to achieving large scale computing capabilities”. In: Proceedings of the April 18-20, 1967, Spring Joint Computer Conference. AFIPS ’67 (Spring). Atlantic City, New Jersey: ACM, 1967, pp. 483–485. doi: 10.1145/1465482.1465560. url: http://doi.acm.org/10.1145/1 465482.1465560.

[12]

M. Amy, D. Maslov, and M. Mosca. “Polynomial-time T -depth Optimization of Clifford+T circuits via Matroid Partitioning”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33.10 (2014). arXiv:1303.2042 [quant-ph], pp. 1476–1489. issn: 0278-0070. doi: 10.1109/TCAD.2014.2341953.

[13]

J. Aoe. Computer Algorithms: String Pattern Matching Strategies. Wiley-IEEE Computer Society Press, 1994. isbn: 978-0-8186-54626.

[14]

B. Arneson, R. Hayward, and P. Henderson. “MoHex Wins Hex Tournament”. In: International Computer Games Association Journal 32.2 (2009), pp. 114–116.

[15]

P. Aspinwall et al. Dirichlet Branes and Mirror Symmetry. American Mathematical Society, Clay Mathematics Institute, 2009. isbn: 978-0821838488. url: http://books.google.nl/books?id=4gvQYr OmRNAC.

[16]

J. T. Astola et al. GPU Computing with Applications in Digital Logic. TICSP 62. Tampere, Finland: Tampere International Center for Signal Processing, 2012. isbn: 978-952-15-2920-7.

[17]

ATLAS Collaboration. “Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC”. In: Physics Letters B 716.1 (2012), pp. 1–29. issn: 03702693. doi: doi . org / 10 . 1016 / j . physletb . 2012 . 08 . 020. url: http://www.sciencedirect.com/science/article/pii/S0370269 31200857X.

Bibliography

385

[18]

G. Ausiello et al. Complexity and Approximation. Berlin Heidelberg: Springer Verlag, 1999. isbn: 978-3-540-65431-5. doi: 10.1007/9783-642-58412-1.

[19]

M. Avdispahi´c, N. Memi´c, and F. Weisz. “Maximal functions, Hardy spaces and Fourier multiplier theorems on unbounded Vilenkin groups”. In: Journal of Mathematical Analysis and Applications 390.1 (2012), pp. 68–73. doi: 10.1016/j.jmaa.2012.01.019.

[20]

AV-TEST - The Independent IT-Security Institute. url: http://w ww.av-test.org/.

[21]

R. I. Bahar et al. “Algebraic Decision Diagrams and Their Applications”. In: Proceedings of the 1993 IEEE/ACM International Conference on Computer-Aided Design. ICCAD. IEEE, 1993, pp. 188– 191. isbn: 0-8186-4490-7. doi: 10.1109/ICCAD.1993.580054.

[22]

J. Balc´ arek, P. Fiˇser, and J. Schmidt. “Techniques for SAT-based Constrained Test Pattern Generation”. In: Microprocessors and Microsystems 37.2 (Mar. 2013), pp. 185–195. doi: 10.1016/j.micpro .2012.09.010.

[23]

J. S. Banks. “Sophisticated Voting Outcomes and Agenda Control”. In: Social Choice and Welfare 1.4 (1985), pp. 295–306. issn: 01761714. doi: 10.1007/BF00649265.

[24]

A. Barenco et al. “Elementary Gates for Quantum Computation”. In: Physical Review A 52 (1995), pp. 3457–3467. doi: http://dx.d oi.org/10.1103/PhysRevA.52.3457.

[25]

M. Becchi and P. Crowley. “An Improved Algorithm to Accelerate Regular Expression Evaluation”. In: Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems. ANCS. Orlando, Florida, USA: ACM, Dec. 2007, pp. 145–154. isbn: 978-1-59593-945-6. doi: 10.1145/1323548 .1323573.

[26]

T. Benaglia et al. “Mixtools: An R Package for Analyzing Mixture Models”. In: Journal of Statistical Software 32.6 (2009), pp. 1–29. issn: 1548-7660. url: http://www.jstatsoft.org/v32/i06.

[27]

R. Berghammer. “Computing and visualizing Banks sets of dominance relations using relation algebra and RelView”. In: The Journal on Logic and Algebraic Programming 82.3–4 (2013), pp. 123–136. issn: 1567-8326. doi: 10.1016/j.jlap.2013.02.001.

386

Bibliography

[28]

R. Berghammer and S. Bolus. “On the Use of Binary Decision Diagrams for Solving Problems on Simple Games”. In: European Journal of Operational Research 222.3 (2012), pp. 529–541. doi: 10.101 6/j.ejor.2012.04.015.

[29]

R. Berghammer and A. Fronk. “Exact Computation of Minimum Feedback Vertex Sets with Relational Algebra”. In: Fundamenta Informaticae 70.4 (2006), pp. 301–316. issn: 0169-2968.

[30]

R. Berghammer, B. Leoniuk, and U. Milanese. “Implementation of Relational Algebra Using Binary Decision Diagrams”. In: Relational Methods in Computer Science. Ed. by H. C. M. de Swart. Vol. 2561. LNCS. Berlin Heidelberg: Springer, 2002, pp. 241–257. isbn: 978-3540-00315-1. doi: 10.1007/3-540-36280-0_17.

[31]

R. Berghammer and F. Neumann. “RelView – An OBDD-Based Computer Algebra System for Relations”. In: Computer Algebra in Scientific Computing. Ed. by V. G. Ganzha, E. W. Mayr, and E. V. Vorozhtsov. Vol. 3718. LNCS. Berlin Heidelberg: Springer, 2005, pp. 40–51. isbn: 978-3-540-28966-1. doi: 10.1007/11555964_4.

[32]

R. Berghammer, A. Rusinowska, and H. de Swart. “Computations on Simple Games Using RelView”. In: Computer Algebra in Scientific Computing. Ed. by V. P. Gerdt et al. Vol. 6885. LNCS. Berlin Heidelberg: Springer, 2011, pp. 49–60. doi: 10.1007/978-3-642-2 3568-9_5.

[33]

Berkeley Logic Synthesis and Verification Group. ABC: A System for Sequential Synthesis and Verification. 2000. url: http://www.e ecs.berkeley.edu/~alanmi/abc/.

[34]

Berkeley PLA Tools: espresso. url: http://www.eecs.berkeley.e du/XRG/Software/Description/platools.html.

[35]

A. Bernasconi et al. “Compact DSOP and Partial DSOP Forms”. In: Theory of Computing Systems 53.4 (2013), pp. 583–608. doi: 10.1007/s00224-013-9447-2.

[36]

G. Berry and R. Sethi. “From Regular Expressions to Deterministic Automata”. In: Theoretical Computer Science. Vol. 48. Elsevier, 1986, pp. 117–126. doi: 10.1016/0304-3975(86)90088-5.

[37]

A. B´erut et al. “Experimental verification of Landauer’s principle linking information and thermodynamics”. In: Nature 483 (2012), pp. 187–189. doi: 10.1038/nature10872.

Bibliography

387

[38]

P. W. Besslich. “Spectral Processing of Switching Functions Using Signal–flow Transformations”. In: Spectral Techniques and Fault Detection. Ed. by M. G. Karpovsky. Boston, Massachusetts: Academic Press, 1985, pp. 91–141. isbn: 978-0-12-400060-5.

[39]

T. Beth and M. R¨ otteler. “Quantum Algorithms: Applicable Algebra and Quantum Physics”. In: Quantum information. Ed. by G. Alber et al. Springer Tracts in Modern Physics. Springer, 2001, pp. 96–150. isbn: 978-3-540-41666-1. doi: 10.1007/3-540-44678-8 _4.

[40]

A. Biere. Lingeling, Plingeling, PicoSAT and PrecoSAT at SAT Race 2010. Tech. rep. 10/1. Altenbergerstr. 69, 4040 Linz, Austria: Institute for Formal Models and Verification, Johannes Kepler University, Aug. 2010.

[41]

A. Biere et al., eds. Handbook of Satisfiability. Vol. 185. Frontiers in Artificial Intelligence and Applications. IOS Press, Feb. 2009. isbn: 978-1-58603-929-5.

[42]

J. Bispo et al. “Regular Expression Matching for Reconfigurable Packet Inspection”. In: Proceedings of 2006 IEEE International Conference on Field Programmable Technology. FPT. Bangkok: IEEE, 2006, pp. 119–126. isbn: 0-7803-9729-0. doi: 10.1109/FPT .2006.270302.

[43]

D. Bochmann and B. Steinbach. Logic Design with XBOOLE. In German: Logikentwurf mit XBOOLE. Berlin: Verlag Technik, 1991. isbn: 3-341-01006-8.

[44]

D. Bogdanov, R. Talviste, and J. Willemson. “Deploying Secure Multi-Party Computation for Financial Data Analysis - (Short Paper)”. In: Financial Cryptography. Vol. 7397. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2012, pp. 57–64. isbn: 978-3-642-32945-6. doi: 10.1007/978-3-642-32946-3_5.

[45]

S. Bolus. “A QOBDD-based Approach to Simple Games”. PhD thesis. University of Kiel, 2012.

[46]

S. Bolus. “Power Indices of Simple Games and Vector-Weighted Majority Games by Means of Binary Decision Diagrams”. In: European Journal of Operational Research 210.2 (2011), pp. 258–272. doi: 10.1016/j.ejor.2010.09.020.

[47]

G. Boole. An Investigation of the Laws of Thought: on Which are Founded the Mathematical Theories of Logic and Probabilities. Reissued by CUP 2009. Cambridge University Press, 1854. isbn: 9781116557343.

388

Bibliography

[48]

B. Bose and D. J. Lin. “Systematic Unidirectional Error-Detecting Codes”. In: IEEE Transactions on Computers 34 (1985), pp. 1026– 1032. issn: 0018-9340. doi: http://doi.ieeecomputersociety.or g/10.1109/TC.1985.1676535.

[49]

K. Brace, R. Rudell, and R. E. Bryant. “Efficient Implementation of a BDD Package”. In: 27th ACM/IEEE Design Automation Conference. Orlando, FL, USA: IEEE, June 1990, pp. 40–45. isbn: 089791-363-9. doi: 10.1109/DAC.1990.114826.

[50]

A. Braeken and B. Preneel. “On the Algebraic Immunity of Symmetric Boolean Functions”. In: Progress in Cryptology - INDOCRYPT 2005. Lecture Notes in Computer Science 3797. Berlin, Heidelberg: Springer, 2005, pp. 35–48. isbn: 978-3-540-30805-8. doi: 10.1007/1 1596219_4.

[51]

R. Brayton and A. Mishchenko. “ABC: An Academic Industrial Strength Verification Tool”. In: Proceedings of the 22dn International Conference on Computer Aided Verification. Lecture Notes in Computer Science 6174. Edinburgh, UK: Springer, 2010, pp. 24– 40. isbn: 978-3-642-14294-9. doi: 10.1007/978-3-642-14295-6_5.

[52]

M. A. Breuer. “Generation of Optimal Code for Expressions via Factorization”. In: Communications of the ACM 12.6 (June 1969), pp. 333–340. issn: 0001-0782. doi: 10.1145/363011.363153. url: http://doi.acm.org/10.1145/363011.363153.

[53]

F. Brglez and H. Fujiwara. “A Neutral Netlist of 10 Combinational Benchmark Circuits and A Target Translator in FORTRAN”. In: Proceedings of the 1985 IEEE International Symposium on Circuits and Systems. ISCAS. IEEE, 1985, pp. 663–698.

[54]

C. Browne et al. “A Survey of Monte Carlo Tree Search Methods”. In: Computational Intelligence and AI in Games, IEEE Transactions on 4.1 (2012), pp. 1–43. issn: 1943-068X. doi: 10.1109/TCIA IG.2012.2186810.

[55]

R. E. Bryant. “Graph-Based Algorithms for Boolean Function Manipulation”. In: IEEE Transactions on Computers C-35.8 (1986), pp. 677–691. issn: 0018-9340. doi: 10.1109/TC.1986.1676819.

[56]

J. T. Butler et al. “Comments on Sympathy: Fast Exact Minimization of Fixed Polarity Reed-Muller Expansion for Symmetric Functions”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 19.11 (2000), pp. 1386–1388. issn: 0278-0070. doi: 10.1109/43.892862.

Bibliography

389

[57]

J. T. Butler et al. “On the number of generators for transeunt triangles”. In: Discrete Applied Mathematics 108.3 (2001), pp. 309–316. doi: 10.1016/S0166-218X(00)00240-7.

[58]

J. T. Butler et al. “On the Use of Transeunt Triangles to Synthesize Fixed-Polarity Reed-Muller Expansions of Functions”. In: Proceedings – Reed-Muller Workshop 2009. RM. Okinawa, Japan, 2009, pp. 119–126.

[59]

A. Canteaut and M. Videau. “Symmetric Boolean Functions”. In: IEEE Transactions on Information Theory 51.8 (2005), pp. 2791– 2811.

[60]

C. Carlet and K. Feng. “An Infinite Class of Balanced Functions with Optimal Algebraic Immunity, Good Immunity to Fast Algebraic Attacks and Good Nonlinearity”. In: Advances in CryptologyASIACRYPT 2008. Ed. by J. Pieprzyk. Vol. 5350. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2008, pp. 425–440. isbn: 978-3-540-89254-0. doi: 10.1007/978-3-540-89255-7_26.

[61]

N. Cascarano et al. “iNFAnt: NFA Pattern Matching on GPGPU Devices”. In: ACM SIGCOMM Computer Communication Review. Vol. 40. 5. New York, NY, US: ACM, Oct. 2010, pp. 20–26. doi: 10.1145/1880153.1880157.

[62]

T. Cazenave. “Nested Monte-Carlo Search”. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence. Ed. by C. Boutilier. IJCAI. Pasadena, California, USA, July 2009, pp. 456– 461. url: http://ijcai.org/papers09/Papers/IJCAI09-083.pdf.

[63]

M. Ceberio and V. Kreinovich. “Greedy Algorithms for Optimizing Multivariate Horner Schemes”. In: SIGSAM Bull. 38.1 (Mar. 2004), pp. 8–15. issn: 0163-5824. doi: 10.1145/980175.980179. url: htt p://doi.acm.org/10.1145/980175.980179.

[64]

G. M.-B. Chaslot et al. “Progressive Strategies for Monte-Carlo Tree Search”. In: New Mathematics and Natural Computation. NMNC 4.03 (2008), pp. 343–357. doi: 10.1142/S1793005708001094. url: http://ideas.repec.org/a/wsi/nmncxx/v04y2008i03p343-357.h tml.

[65]

A. Chattopadhyay, C. Chandak, and K. Chakraborty. “Complexity Analysis of Reversible Logic Synthesis”. In: arXiv abs/1402.0491 (2014). url: http://arxiv.org/abs/1402.0491.

[66]

C.-T. Chen. Linear System Theory and Design. 2nd ed. New York, NY USA: Holt, Rinehart, and Winston, 1984. isbn: 0-03-060289-0.

390

Bibliography

[67]

H. Chen, C. Gomes, and B. Selman. “Formal Models of HeavyTailed Behavior in Combinatorial Search”. In: Principles and Practice of Constraint Programming — CP 2001. Vol. 2239. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2001, pp. 408–421. isbn: 978-3-540-42863-3. doi: 10 . 1007 / 3 - 540 - 455 78-7_28.

[68]

E. Clarke, M. Fujita, and X. Zhao. “Applications of Multi-Terminal Binary Decision Diagrams, Technical Report”. In: IFIP WG 10.5 Workshop on Applications of the Reed-Muller Expansion in Circuit Design. Chiba Makuhari, Japan, 1995, pp. 1–17.

[69]

E. M. Clarke et al. “Multi-Terminal Binary Decision Diagrams: An Efficient Data Structure for Matrix Representation”. In: Proceedings of the 1993 IEEE International Workshop on Logic Synthesis. IWLS. 1993, pp. 1–15.

[70]

A. C. Cohen. Truncated and Censored Samples: Theory and Applications. New York: Marcel Dekker, 1991. isbn: 978-0824784478.

[71]

M. Combescure. “Circulant matrices, Gauss sums and mutually unbiased bases, I. The prime number case”. In: CUBO Mathematical Journal 11 (2009), pp. 73–86.

[72]

S. A. Cook. “The Complexity of Theorem-proving Procedures”. In: Proceedings of the Third Annual ACM Symposium on Theory of Computing. STOC ’71. Shaker Heights, Ohio, USA: ACM, 1971, pp. 151–158. doi: 10.1145/800157.805047. url: http://doi.acm .org/10.1145/800157.805047.

[73]

J. W. Cooley and J. W. Tukey. “An Algorithm for the Machine Calculation of Complex Fourier Series”. In: Mathematics of Computation 19.90 (1965), pp. 297–301. doi: 10 . 2307 / 2003354. url: http://www.jstor.org/stable/2003354.

[74]

F. Corno, M. S. Reorda, and G. Squillero. “RT-level ITC’99 benchmarks and first ATPG results”. In: Design Test of Computers, IEEE 17.3 (July 2000), pp. 44–53. issn: 0740-7475. doi: 10.1109/54.867 894.

[75]

N. T. Courtois. “Fast Algebraic Attacks on Stream Ciphers with Linear Feedback”. In: Advances in Cryptology-CRYPTO 2003. Ed. by E. Biham. Vol. 2729. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2003, pp. 176–194. isbn: 978-3-540-40674-7. doi: 10.1007/978-3-540-45146-4_11.

Bibliography

391

[76]

N. T. Courtois and W. Meier. “Algebraic Attacks on Stream Ciphers with Linear Feedback”. In: Advances in Cryptology-EUROCRYPT 2003. Ed. by E. Biham. Vol. 2656. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2003, pp. 345–359. isbn: 978-3540-14039-9. doi: 10.1007/3-540-39200-9_21.

[77]

N. T. Courtois and J. Pieprzy. “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations”. In: Advances in CryptologyASIACRYPT 2002. Ed. by Y. Zheng. Vol. 2501. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2002, pp. 267–287. isbn: 978-3-540-00171-3. doi: 10.1007/3-540-36178-2_17.

[78]

Cracking DES. Electronic Frontier Foundation (EFF). 1999. url: h ttps://w2.eff.org/Privacy/Crypto/Crypto_misc/DESCracker/.

[79]

Y. Crama and P. L. Hammer. Boolean Models and Methods in Mathematics, Computer Science, and Engineering. 1st ed. Encyclopedia of Mathematics and its Applications 134. Cambridge University Press, 2010. isbn: 978-0-5218-4752-0.

[80]

T. W. Cusick and P. St˘ anic˘ a. Cryptographic Boolean Functions and Applications. 1st ed. Elsevier, 2009. isbn: 978-0-1237-4890-4. doi: 10.1016/B978-0-12-374890-4.00001-X.

[81]

R. Datta and N. A. Touba. “Multiple Bit Upset Tolerant Memory Using a Selective Cycle Avoidance Based SEC-DED-DAEC Code”. In: VLSI Test Symposium, IEEE (May 2007), pp. 349–354. issn: 1093-0167. doi: 10.1109/VTS.2007.40.

[82]

A. De Vos. Reversible Computing: Fundamentals, Quantum Computing, and Applications. Weinheim: Wiley-VCH, 2010. isbn: 9783-527-40992-1.

[83]

A. De Vos and S. De Baerdemacker. “Logics between classical reversible logic and quantum logic”. In: Quantum Physics and Logic, 9th International workshop, Proceedings. Ed. by R. Duncan and P. Panangaden. Brussels, Belgium: Universit´e Libre de Bruxelles (ULB), 2012, pp. 123–128.

[84]

A. De Vos and S. De Baerdemacker. “Matrix Calculus for Classical and Quantum Circuits”. In: ACM Journal on Emerging Technologies in Computing Systems. JETC 11.2 (2014), pp. 1–9. issn: 15504832. doi: 10.1145/2669370. url: http://doi.acm.org/10.1145 /2669370.

392

Bibliography

[85]

A. De Vos and S. De Baerdemacker. “Scaling a unitary matrix”. In: Open Systems & Information Dynamics 21.1450013 (2014). arXiv:1401.7883 [math-ph], pp. 1–21. issn: 1230-1612. doi: 10 . 1 142/S1230161214500139.

[86]

A. De Vos and S. De Baerdemacker. “The decomposition of U(n) into XU(n) and ZU(n)”. In: Proceedings of the 44th International Symposium on Multiple-Valued Logic. ISMVL. Bremen, Germany: IEEE, May 2014, pp. 173–177. doi: 10.1109/ISMVL.2014.38.

[87]

A. De Vos and S. De Baerdemacker. “The NEGATOR as a Basic Building Block for Quantum Circuits”. In: Open Systems & Information Dynamics 20.1 (2013). 1350004. issn: 1230-1612. doi: 10.1142/S12 30161213500042.

[88]

A. De Vos and S. De Baerdemacker. “The Roots of the NOT Gate”. In: Proceedings of the 42nd IEEE International Symposium on Multiple-Valued Logic. ISMVL. Victoria, BC, Canada: IEEE, May 2012, pp. 167–172. isbn: 978-1-4673-0908-0. doi: 10.1109/ISMVL.2 012.14.

[89]

A. De Vos and Y. Van Rentergem. “Reversible computing: from mathematical group theory to electronical circuit experiment”. In: Proceedings of the Second Conference on Computing Frontiers. Ischia, Italy: ACN, May 2005, pp. 35–44. isbn: 1-59593-019-1. doi: 10.1145/1062261.1062270.

[90]

A. De Vos and Y. Van Rentergem. “Young subgroups for reversible computers”. In: Advances in Mathematics of Communications 2.2 (2008), pp. 183–200. issn: 1930-5346. doi: 10.3934/amc.2008.2.1 83.

[91]

J. Deegan and E. W. Packel. “A New Index of Power for Simple n-Person Games”. In: International Journal of Game Theory 7.2 (1978), pp. 113–123. doi: 10.1007/BF01753239.

[92]

A. P. Dempster, N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm”. In: Journal of the Royal Statistical Society. B (methodological) 39.1 (1977), pp. 1– 38.

[93]

M. Dertouzos. Threshold Logic: A Synthesis Approach. 1st ed. Cambridge, USA: MIT Press, 1965. isbn: 978-0262040099.

[94]

B. Desoete and A. De Vos. “A reversible carry-look-ahead adder using control gates”. In: Integration 33.1-2 (2002), pp. 89–104. doi: 10.1016/S0167-9260(02)00051-2. url: http://dx.doi.org/10.1 016/S0167-9260(02)00051-2.

Bibliography

393

[95]

D. L. Dietmeyer. “Generating minimal covers of symmetric functions”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 12.5 (1993), pp. 710–713. issn: 02780070. doi: 10.1109/43.277615.

[96]

W. Diffie and M. E. Hellman. “New Directions in Cryptography”. In: IEEE Transactions on Information Theory 22.6 (1976), pp. 644– 654. doi: 10.1109/TIT.1976.1055638.

[97]

C. Ding, G. Xiao, and W. Shan. The Stability Theory of Stream Ciphers. Lecture Notes in Computer Science 561. Berlin Heidelberg: Springer, 1991. isbn: 978-3-540-54973-4. doi: 10.1007/3-540-5497 3-0.

[98]

P. A. M. Dirac. “A New Notation for Quantum Mechanics”. In: Proceedings of the Cambridge Philosophical Society 35.03 (1939), pp. 416–422. doi: 10.1017/S0305004100021162.

[99]

J. Divyasree, H. Rajashekar, and K. Varghese. “Dynamically Reconfigurable Regular Expression Matching Architecture”. In: Proceedings of International Conference on Application-Specific Systems, Architectures and Processors. ASAP. Leuven: IEEE, July 2008, pp. 120–125. isbn: 978-1-4244-1897-8. doi: 10 . 1109 / ASAP . 2008 .4580165.

[100]

R. Drechsler and B. Becker. Binary Decision Diagrams: Theory and Implementation. Springer, 1998. isbn: 978-0-7923-8193-8. doi: 10.1 007/978-1-4757-2892-7.

[101]

R. Drechsler, A. Finder, and R. Wille. “Improving ESOP-Based Synthesis of Reversible Logic Using Evolutionary Algorithms”. In: Applications of Evolutionary Computation. Ed. by C. Chio et al. Vol. 6625. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2011, pp. 151–161. isbn: 978-3-642-20519-4. doi: 10.1007 /978-3-642-20520-0_16.

[102]

F. Dresig et al. Programming with XBOOLE. In German: Programmieren mit XBOOLE. TU Chemnitz, 1992.

[103]

G. Dueck and D. Maslov. “Reversible Function Synthesis with Minimum Garbage Outputs”. In: Proceedings of the 6th Symposium on Representations and Methodology of Future Computing Technologies. RM. Trier, Germany: University of Trier, Mar. 2003, pp. 154– 161.

394

Bibliography

[104]

G. W. Dueck et al. “A Method to Find the Best Mixed Polarity Reed-Muller Expression Using Transeunt Triangle”. In: International Workshop on Applications of Reed-Muller Expansion in Circuit Design. RM. Starkville, MS, USA, 2001, pp. 82–93.

[105]

C. R. Edwards. “Characterization of threshold functions under the Walsh transform and linear translation”. In: Electronics Letters 11.23 (1975), pp. 563–565. issn: 0013-5194. doi: 10 . 1049 / el : 19 750430.

[106]

C. R. Edwards and S. L. Hurst. “A Digital Synthesis Procedure Under Function Symmetries and Mapping Methods”. In: IEEE Transactions on Computers C-27.11 (1978), pp. 985–997. issn: 0018-9340. doi: 10.1109/TC.1978.1674988.

[107]

J. Ellis et al. “Higgs bosons in a non-minimal supersymmetric model”. In: Physical Review D 39.3 (Feb. 1989), pp. 844–869. doi: 10.1103/PhysRevD.39.844. url: http://link.aps.org/doi/10.1 103/PhysRevD.39.844.

[108]

G. Epstein. “Synthesis of Electronic Circuits for Symmetric Functions”. In: IRE Transactions on Electronic Computers EC-7.1 (1958), pp. 57–60. issn: 0367-9950. doi: 10 . 1109 / TEC . 1958 . 522 2097.

[109]

EXTRA home page. url: http://www.ee.pdx.edu/~alanmi/rese arch/extra.htm.

[110]

S. Fenner et al. Rectangle Free Coloring of Grids. 2009. url: http: //www.cs.umd.edu/~gasarch/papers/grid.pdf.

[111]

D. Ficara et al. “An Improved DFA for Fast Regular Expression Matching”. In: ACM SIGCOMM Computer Communication Review. Vol. 38. 5. Oct. 2008, pp. 31–40.

[112]

D. Ficara et al. “Differential Encoding of DFAs for Fast Regular Expression Matching”. In: IEEE/ACM Transactions on Networking 19.3 (June 2011), pp. 683–694. issn: 1063-6692. doi: 10.1109/TNET .2010.2089639.

[113]

P. Fiˇser and J. Schmidt. “How Much Randomness Makes a Tool Randomized?” In: Proceedings of the 20th International Workshop on Logic and Synthesis. IWLS. San Diego, California, USA, 2011, pp. 136–143.

[114]

P. Fiˇser and J. Schmidt. “On Using Permutation of Variables to Improve the Iterative Power of Resynthesis”. In: 10th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP. Freiberg, Germany, 2012, pp. 107–114. isbn: 978-3-86012-438-3.

Bibliography

395

[115]

P. Fiˇser, J. Schmidt, and J. Balc´ arek. “On Robustness of EDA Tools”. In: Proceedings of the 17th Euromicro Conference on Digital Systems Design. DSD. Verona, Italy: IEEE, 2014, pp. 427–434. doi: 10.1109/DSD.2014.22.

[116]

P. Fiˇser, J. Schmidt, and J. Balc´ arek. “Sources of Bias in EDA Tools and Its Influence”. In: Proc. of 17th IEEE Symposium on Design and Diagnostics of Electronic Systems. DDECS. Warsaw, Poland, 2014, pp. 258–261.

[117]

M. J. Flynn. “Some Computer Organizations and Their Effectiveness”. In: IEEE Trans. Comput. 21.9 (Sept. 1972), pp. 948–960. issn: 0018-9340. doi: 10.1109/TC.1972.5009071. url: http://dx .doi.org/10.1109/TC.1972.5009071.

[118]

M. J. Foster and H. T. Kung. “The Design of Special-Purpose VLSI Chips”. In: Computer 13.1 (1980), pp. 26–40. issn: 0018-9162. doi: 10.1109/MC.1980.1653338.

[119]

E. Fredkin and T. Toffoli. “Conservative Logic”. In: International Journal of Theoretical Physics 21.3–4 (1982), pp. 219–253. issn: 0020-7748. doi: 10.1007/BF01857727.

[120]

E. Fujiwara and D. K. Pradhan. “Error-Control Coding in Computers”. In: Computer 23.7 (July 1990), pp. 63–72. issn: 0018-9162. doi: 10.1109/2.56853.

[121]

A. Gaidukov. “Algorithm to derive minimum ESOP for 6-variable function”. In: 5th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP. 2002, pp. 141–148. isbn: 3-86012-180-4.

[122]

T. Ganegedara, Y. Yang, and V. K. Prasanna. “Automation Framework for Large-Scale Regular Expression Matching on FPGA”. In: Proceedings of 2010 IEEE International Conference on Field Programmable Logic and Applications. FPL. Milano, Italy: IEEE, Sept. 2010, pp. 50–55. isbn: 978-1-4244-7842-2. doi: 10.1109/FPL.2010 .21.

[123]

G. G´ at, ed. Conference on Dyadic Analysis and Related Fields with Applications. DARFA. Nyiregyhyza, Hungary: College of Nyiregyhaza, June 2014.

[124]

G. G´ at, ed. International Conference Dyadic Analysis and Applications. DARFA. Nyiregyhyza, Hungary: College of Nyiregyhaza, Oct. 2013.

[125]

G. G´ at, ed. International Workshop on Theory of the Walsh System and Related Areas. Nyiregyhyza, Hungary: College of Nyiregyhaza, Oct. 2013.

396

Bibliography

[126]

G. G´ at and U. Goginava. “Triangular Fej´er summability of twodimensional Walsh-Fourier series”. In: Analysis Mathematica 40.2 (2014), pp. 83–104. issn: 0133-3852. doi: 10.1007/s10476-014-02 01-z.

[127]

G. P. Gavrilov and A. A. Sapoghenko. Collection task on discrete mathematics. In Russian. Moscow, USSR: Nauka, 1977.

[128]

M. Gebser et al. “clasp: A Conflict-Driven Answer Set Solver”. In: Logic Programming and Nonmonotonic Reasoning. Vol. 4483. Lecture Notes in Computer Science. Tempe, AZ, USA: Springer, 2007, pp. 260–265. isbn: 978-3-540-72199-4. doi: 10.1007/978-3-540-72 200-7_23.

[129]

A. Ghosh et al. “Estimation of Average Switching Activity in Combinational and Sequential Circuits”. In: Proceedings of the 29th ACM/IEEE Design Automation Conference. Anaheim, CA, USA: IEEE, June 1992, pp. 253–259. isbn: 0-8186-2822-7. doi: 10.1109 /DAC.1992.227826.

[130]

J. E. Gibbs. “Walsh spectrometry, a form of spectral analysis well suited to binary digital computation”. In: NPL DES Reports. Teddington, Middlesex, UK: National Physical Laboratory, 1967.

[131]

O. Golubitsky and D. Maslov. “A Study of Optimal 4-bit Reversible Toffoli Circuits and Their Synthesis”. In: IEEE Transactions on Computers 61.9 (Sept. 2012), pp. 1341–1353. issn: 0018-9340. doi: 10.1109/TC.2011.144.

[132]

D. A. Gorodecky and V. P. Suprun. “Polynomial expansion of symmetric Boolean functions”. In: Recent Progress in the Boolean Domain. Ed. by B. Steinbach. Newcastle upon Tyne, UK: Cambridge Scholars Publishing, 2014, pp. 247–262. isbn: 978-1-4438-5638-6.

[133]

M. G¨ ossel and E. Sogomonyan. “A Non-linear Split Error Detection Code”. In: Fundamenta Informaticae 83.1-2 (June 2008), pp. 109– 115. issn: 0169-2968.

[134]

A. Granville. “Arithmetic Properties of Binomial Coefficients I: Binomial coefficients modulo prime powers”. In: Canadian Mathematical Society Conference Proceedings. Vol. 20. 1997, pp. 253–275.

[135]

D. Grosse et al. “Exact Multiple Control Toffoli Network Synthesis with SAT Techniques”. In: IEEE Transactions on CAD of Integrated Circuits and Systems 28.5 (May 2009), pp. 703–715. issn: 0278-0070. doi: 10.1109/TCAD.2009.2017215.

Bibliography

397

[136]

W. H. Hanson. “Ternary threshold logic”. In: IEEE Transactions on Electronic Computers EC-12.3 (1963), pp. 191–197. issn: 0367-7508. doi: 10.1109/PGEC.1963.263530.

[137]

J. Hashimoto et al. “Accelerated UCT and Its Application to Two-Player Games”. In: Advances in Computer Games. Ed. by J. Hashimoto et al. Vol. 7168. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2012, pp. 1–12. isbn: 978-3-642-318658. doi: 10.1007/978-3-642-31866-5_1. url: http://id.nii.ac.j p/0023/00008786.

[138]

J. T. L. Ho and G. G. F. Lemieux. “PERG: A Scalable FPGA-Based Pattern-Matching Engine with Consolidated Bloomier Filters”. In: Proceedings of 2008 IEEE International Conference on Field Programmable Technology. Taipei: IEEE, Dec. 2008, pp. 73–80. isbn: 978-1-4244-3783-2. doi: 10.1109/FPT.2008.4762368.

[139]

M. J. Holler. “Forming Coalitions and Measuring Voting Power”. In: Political Studies 30.2 (1982), pp. 262–271. doi: 10.1111/j.1467-9248.1982.tb00537.x.

[140]

H. H. Hoos and T. Stützle. “Empirical analysis of randomized algorithms”. In: Handbook of Approximation Algorithms and Metaheuristics. Ed. by T. F. Gonzalez. Boca Raton, FL, USA: Chapman & Hall/CRC, 2007, pp. 14.1–14.17. isbn: 978-0262633246.

[141]

H. H. Hoos and T. Stützle. “SATLIB: An Online Resource for Research on SAT”. In: SAT 2000. Ed. by I. P. Gent, H. van Maaren, and T. Walsh. IOS Press, 2000, pp. 283–292. url: www.satlib.org.

[142]

J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. 3rd ed. Addison-Wesley, 2006. isbn: 978-0321462251.

[143]

W. Horner. A New Method of Solving Numerical Equations of All Orders by Continuous Approximation. Dover reprint, 2 vols 1959. W. Bulmer & Co, 1819. url: http://books.google.nl/books?id=2fLpGwAACAAJ.

[144]

W. Hung et al. “Optimal synthesis of multiple output Boolean functions using a set of quantum gates by symbolic reachability analysis”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25.9 (2006), pp. 1652–1663. issn: 0278-0070. doi: 10.1109/TCAD.2005.858352.

[145]

S. L. Hurst. “The Haar Transform in Digital Network Synthesis”. In: Proceedings of the IEEE 11th International Symposium on Multiple-Valued Logic. ISMVL (1981), pp. 10–18.


[146]

S. L. Hurst. The Logical Processing of Digital Signals. New York, USA: Crane Russak & Co., 1979. isbn: 978-0844809076.

[147]

S. L. Hurst, D. M. Miller, and J. C. Muzio. Spectral Techniques in Digital Logic. Orlando, FL, USA: Academic Press, 1985. isbn: 978-0123626806.

[148]

M. Idel and M. M. Wolf. “Sinkhorn normal form for unitary matrices”. In: Linear Algebra and its Applications (2015). arXiv: 1408.5728 [math-ph], pp. 76–84. doi: 10.1016/j.laa.2014.12.031.

[149]

IEEE Design Automation Standards Committee. IEEE Standard Multi Valued Logic System for VHDL Model Interoperability. IEEE Standard 1164-1993. Piscataway, NJ, USA: IEEE Press, 1993. isbn: 0-7381-0991-6. doi: 10.1109/IEEESTD.1993.115571.

[150]

IEEE Design Automation Standards Committee. IEEE Standard Verilog Hardware Description Language. IEEE Standard 1364-2001. Piscataway, NJ, USA: IEEE, 2001. isbn: 0-7381-2826-0. doi: 10.1109/IEEESTD.2001.93352.

[151]

S. Ishihara and S. Minato. “Manipulation of Regular Expressions under Length Constraints Using Zero-Suppressed-BDDs”. In: Proceedings of Asia South Pacific Design Automation Conference. ASP-DAC. IEEE, 1995, pp. 391–396. isbn: 4-930813-67-0. doi: 10.1109/ASPDAC.1995.486250.

[152]

J. Jegier and P. Kerntopf. “Progress Towards Constructing Sequences of Benchmarks for Quantum Boolean Circuits Synthesis”. In: Proceedings of the 14th IEEE International Conference on Nanotechnology. NANO. Toronto, Canada: IEEE Press, Aug. 2014, pp. 250–255. doi: 10.1109/NANO.2014.6967983.

[153]

J. Jegier, P. Kerntopf, and M. Szyprowski. “An Approach to Constructing Reversible Multi-Qubit Benchmarks with Provably Minimal Implementations”. In: Proceedings of the 13th IEEE International Conference on Nanotechnology. NANO. Beijing, China: IEEE, Aug. 2013, pp. 99–104. isbn: 978-1-4799-0675-8. doi: 10.1109/NANO.2013.6721041.

[154]

Y. Jiang et al. “Evaluating the Output Probabilities of Boolean Functions Without Floating Point Operations”. In: Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering. Edmonton, Alberta, Canada: IEEE, 1999, pp. 433–437. isbn: 0-7803-5579-2. doi: 10.1109/CCECE.1999.807237.


[155]

S. P. Jordan. “Strong equivalence of reversible circuits is coNP-complete”. In: Quantum Information & Computation 14.15–16 (Nov. 2014), pp. 1302–1307. issn: 1533-7146. url: http://www.rintonpress.com/xxqic14/qic-14-1516/1302-1307.pdf.

[156]

P. Junod. “On the Complexity of Matsui’s Attack”. In: Selected Areas in Cryptography, 8th Annual International Workshop, SAC 2001 Toronto, Ontario, Canada, August 16–17, 2001 Revised Papers. Ed. by S. Vaudenay and A. M. Youssef. Vol. 2259. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2001, pp. 199–211. isbn: 978-3-540-43066-7. doi: 10.1007/3-540-45537-X_16.

[157]

Y. Kaneta et al. “Dynamic Reconfigurable Bit-Parallel Architecture for Large-Scale Regular Expression Matching”. In: Proceedings of 2010 IEEE International Conference on Field Programmable Technology. Dec. 2010, pp. 21–28.

[158]

M. Karpovsky, K. J. Kulikowski, and A. Taubin. “Robust Codes and Robust, Fault Tolerant Architectures of the Advanced Encryption Standard”. In: Journal of Systems Architecture 53.2-3 (2007), pp. 138–149.

[159]

M. G. Karpovsky. Finite Orthogonal Series in the Design of Digital Circuits. New York, NY, USA: John Wiley, 1976. isbn: 0470150157.

[160]

M. G. Karpovsky, R. S. Stanković, and J. T. Astola. Spectral Logic and Its Applications for the Design of Digital Devices. New York, USA: John Wiley, 2008. isbn: 978-0-471-73188-7.

[161]

Y. Kawanaka, S. Wakabayashi, and S. Nagayama. “A Fast Regular Expression Matching Engine for an FPGA-Based Network Intrusion Detection System”. In: Proceedings of 15th Workshop on Synthesis and System Integration of Mixed Information Technologies. SASIMI. 2009, pp. 88–93.

[162]

Y. Kawanaka, S. Wakabayashi, and S. Nagayama. “A Systolic Regular Expression Pattern Matching Engine and Its Application to Network Intrusion Detection”. In: Proceedings of 2008 IEEE International Conference on Field Programmable Technology. Taipei: IEEE, 2008, pp. 297–300. isbn: 978-1-4244-3783-2. doi: 10.1109/F PT.2008.4762402.

[163]

Y. Kawanaka, S. Wakabayashi, and S. Nagayama. “A Systolic String Matching Algorithm for High-Speed Recognition of a Restricted Regular Set”. In: Proceedings of International Conference on Engineering of Reconfigurable Systems and Algorithms. ERSA. 2009, pp. 151–157.


[164]

Y. Kawanaka, S. Wakabayashi, and S. Nagayama. “Design and FPGA Implementation of a High-Speed String Matching Engine”. In: Proceedings of 14th Workshop on Synthesis and System Integration of Mixed Information Technologies. SASIMI. 2007, pp. 122– 129.

[165]

P. Kerntopf. “Some Remarks on Reversible Logic Synthesis”. In: Proceedings of the 8th International Reed-Muller Workshop. RM. Oslo, Norway: University of Oslo, May 2007, pp. 51–57.

[166]

P. Kerntopf and M. Szyprowski. “On Some Properties of Reversible Boolean Functions”. In: Proceedings of the 8th International Workshop on Boolean Problems. IWSBP. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2008, pp. 25–32. isbn: 978-3-86012-346-1.

[167]

P. Kerntopf and M. Szyprowski. “Symmetry in reversible functions and circuits”. In: Proceedings of the 20th International Workshop on Logic and Synthesis. IWLS. San Diego, USA: University of California, San Diego, June 2011, pp. 67–73.

[168]

F. Kerschbaum et al. “Secure Collaborative Supply-Chain Management”. In: Computer 44.9 (2011), pp. 38–43. doi: 10.1109/MC.2011.224.

[169]

Khronos Group. Khronos OpenCL Registry. 2015. url: https://www.khronos.org/registry/cl/ (visited on 05/11/2015).

[170]

B.-G. Kim and D. L. Dietmeyer. “Multilevel logic synthesis of symmetric switching functions”. In: IEEE Transactions on Computer-Aided Design 10.4 (1991), pp. 436–446. issn: 0278-0070. doi: 10.1109/43.75627.

[171]

D. B. Kirk and W.-M. W. Hwu. Programming Massively Parallel Processors. Burlington, Massachusetts, USA: Morgan Kaufmann, 2012. isbn: 978-0-12-415992-1.

[172]

D. B. Kirk and W.-M. W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2010. isbn: 978-0-12381-472-2.

[173]

S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. “Optimization by Simulated Annealing”. In: Science 220.4598 (1983), pp. 671–680.

[174]

T. Kløve, P. Oprisan, and B. Bose. “The Probability of Undetected Error for a Class of Asymmetric Error Detecting Codes”. In: IEEE Transactions on Information Theory 51.3 (2005), pp. 1202–1205.


[175]

D. E. Knuth. The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1997. isbn: 0-201-89684-2.

[176]

L. Kocsis and C. Szepesvári. “Bandit based Monte-Carlo Planning”. In: ECML-06. LNCS 4212. Berlin, Heidelberg: Springer, 2006, pp. 282–293.

[177]

L. Kocsis and C. Szepesvári. “Discounted UCB. Video Lecture”. Slides: Lectures of PASCAL Second Challenges Workshop 2006. 2006. url: https://www.lri.fr/~sebag/Slides/Venice/Kocsis.pdf.

[178]

V. Kolesnikov and T. Schneider. “Improved Garbled Circuit: Free XOR Gates and Applications”. In: Automata, Languages and Programming. Vol. 5126. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2008, pp. 486–498. isbn: 978-3-540-70582-6. doi: 10.1007/978-3-540-70583-3_40.

[179]

Y. Komamiya. “Theory of relais networks for the transformation between the decimal and binary systems”. In: Bulletin of Electrotechnical Labors 15.8 (1951), pp. 188–197.

[180]

B. Korte and J. Vygen. Combinatorial Optimization: Theory and Algorithms. Berlin Heidelberg: Springer, 2012. isbn: 978-3-642-24487-2.

[181]

V. N. Kravets. “Constructive Multi-Level Synthesis by Way of Functional Properties”. PhD thesis. University of Michigan, MI, USA, 2001.

[182]

J. Kuipers, T. Ueda, and J. Vermaseren. Code Optimization in FORM. 2013. arXiv: 1310.7007 [cs.SC]. url: http://arxiv.org/abs/1310.7007.

[183]

J. Kuipers et al. “FORM version 4.0”. In: CoRR abs/1203.6543 (2012). url: http://arxiv.org/abs/1203.6543.

[184]

J. Kuipers et al. “Improving multivariate Horner schemes with Monte Carlo Tree Search”. In: Computer Physics Communications 184.11 (2013), pp. 2391–2395. doi: 10.1016/j.cpc.2013.05.008. url: http://www.sciencedirect.com/science/article/pii/S0010465513001689.

[185]

P. V. Kumar, R. A. Scholtz, and L. R. Welch. “Generalized bent functions and their properties”. In: Journal of Combinatorial Theory. A 40.1 (1985), pp. 90–107. doi: 10.1016/0097-3165(85)90049-4.


[186]

S. Kumar et al. “Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection”. In: Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. SIGCOMM’06. Pisa, Italy: ACM, Sept. 2006, pp. 339–350. isbn: 1-59593-308-5. doi: 10.1145/1159913.1159952. url: http://doi.acm.org/10.1145/1159913.1159952.

[187]

H. T. Kung. “Why Systolic Architectures?” In: Computer 15.1 (1982), pp. 37–45. doi: 10.1109/MC.1982.1653825.

[188]

R. Kurai et al. “Fast Regular Expression Matching Based on Dual Glushkov NFA”. In: Proceedings of the Prague Stringology Conference 2014. Sept. 2014, pp. 3–16. isbn: 978-80-01-05547-2.

[189]

Y.-T. Lai, S. Sastry, and M. Pedram. “Boolean matching using binary decision diagrams with applications to logic synthesis and verification”. In: Proceedings of the International Conference on Computer-Aided Design. Santa-Clara, CA, USA, 1992, pp. 452–458. isbn: 0-8186-3110-4. doi: 10.1109/ICCD.1992.276313.

[190]

R. Landauer. “Irreversibility and Heat Generation in the Computing Process”. In: IBM Journal of Research and Development 5.3 (1961), pp. 183–191. issn: 0018-8646. doi: 10.1147/rd.53.0183.

[191]

R. J. Lechner. “Harmonic analysis of switching functions”. In: Recent Developments in Switching Theory. Ed. by A. Mukhopadhyay. New York, USA: Academic Press, 1971, pp. 121–228. isbn: 978-0-12-509850-2.

[192]

C.-S. Lee et al. “The Computational Intelligence of MoGo Revealed in Taiwan’s Computer Go Tournaments”. In: IEEE Transactions on Computational Intelligence and AI in Games 1.1 (2009), pp. 73–89. issn: 1943-068X. doi: 10.1109/TCIAIG.2009.2018703. url: http://dx.doi.org/10.1109/TCIAIG.2009.2018703.

[193]

D. L. Lee and F. H. Lochovsky. “HYTREM — A Hybrid Text-Retrieval Machine for Large Databases”. In: IEEE Transactions on Computers 39.1 (1990), pp. 111–123. doi: 10.1109/12.46285.

[194]

C. E. Leiserson et al. “Efficient Evaluation of Large Polynomials”. In: Mathematical Software – ICMS 2010. Vol. 6327. Lecture Notes in Computer Science. Kobe, Japan: Springer, 2010, pp. 342–353. isbn: 978-3-642-15581-9. doi: 10.1007/978-3-642-15582-6_55. url: http://dl.acm.org/citation.cfm?id=1888390.1888464.


[195]

B. Leoniuk. “ROBDD-based Implementation of Relation Algebra with Applications (in German)”. PhD thesis. University of Kiel, 2001.

[196]

P. M. Lewis and C. L. Coates. Threshold Logic. 1st ed. New York, USA: John Wiley & Sons Inc., 1967. isbn: 978-0471532507.

[197]

M. Luis and C. Moraga. “On functions with flat Chrestenson spectra”. In: Proceedings of the 19th IEEE International Symposium on Multiple-Valued Logic. Guangzhou, China: IEEE-CS-Press, 1989, pp. 406–413. isbn: 978-0818689475.

[198]

F. Mailhot and G. De Micheli. “Technology Mapping Using Boolean Matching and Don’t Care Sets”. In: Proceedings of the European Design Automation Conference. EDAC. Glasgow, UK: IEEE, 1990, pp. 212–216. isbn: 0-8186-2024-2. doi: 10.1109/EDAC.1990.136647.

[199]

J. A. Maiorana. “A Class of Bent Functions”. In: R41 Technical Paper (1971).

[200]

D. Malkhi et al. “Fairplay - Secure Two-Party Computation System”. In: USENIX Security Symposium. 2004, pp. 287–302.

[201]

T. W. Manikas, M. A. Thornton, and D. Y. Feinstein. “Modeling System Threat Probabilities Using Mixed-Radix Multiple-Valued Logic Decision Diagrams”. In: Journal of Multiple-Valued Logic and Soft Computing 24.1-4 (2015), pp. 93–108. issn: 1542-3980.

[202]

D. Maslov. “Reversible Logic Synthesis”. PhD thesis. Fredericton, NB, Canada, 2003.

[203]

D. Maslov. Reversible Logic Synthesis Benchmarks Page. url: http://webhome.cs.uvic.ca/~dmaslov/.

[204]

D. Maslov, G. W. Dueck, and M. D. Miller. “Toffoli Network Synthesis with Templates”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24.6 (June 2005), pp. 807–817. issn: 0278-0070. doi: 10.1109/TCAD.2005.847911.

[205]

M. E. McCay, J. T. Butler, and P. Stănică. “Computing Algebraic Immunity by Reconfigurable Computer”. In: Boolean Problems 10th International Workshop. Ed. by B. Steinbach. Vol. 10. Proceedings of the International Workshop on Boolean Problems. Freiberg University of Mining and Technology, 2012, pp. 225–232. isbn: 978-3-86012-438-3.

[206]

K. McElvain. LGSynth93 benchmark set: Version 4.0. distributed as part of the IWLS’93 benchmark distribution. 1993.


[207]

G. J. McLachlan. “On Bootstrapping the Likelihood Ratio Test Statistic for the Number of Components in a Normal Mixture”. In: Journal of the Royal Statistical Society 36.3 (1987), pp. 318–324. doi: 10.2307/2347790.

[208]

C. Mead and L. Conway. Introduction to VLSI Systems. Addison-Wesley, 1979. isbn: 978-0201043587.

[209]

C. Meinel and T. Theobald. Algorithms and Data Structures in VLSI Design: OBDD – Foundations and Applications. Springer, 1998. isbn: 978-3540644866.

[210]

R. D. Merrill. “Some properties of ternary threshold logic”. In: IEEE Transactions on Electronic Computers EC-13.5 (1964), pp. 632–635. issn: 0367-7508. doi: 10.1109/PGEC.1964.263747.

[211]

P. Messmer. CUDA Workshop Dresden - GPU Computing Intro. 2013. url: http://ccoe-dresden.de/?p=445 (visited on 12/08/2013).

[212]

U. Milanese. “On the Implementation of a ROBDD-Based Tool for the Manipulation and Visualization of Relations (in German)”. PhD thesis. University of Kiel, 2003.

[213]

M. D. Miller. “Spectral and Two-Place Decomposition Techniques in Reversible Logic”. In: Proceedings of the 45th Midwest Symposium on Circuits and Systems. Vol. 2. MWSCAS. Tulsa, OK, USA: IEEE, Aug. 2002, pp. 493–496. isbn: 0-7803-7523-8. doi: 10.1109/MWSCAS .2002.1186906.

[214]

S. Minato. “Minimum-Width Method of Variable Ordering for Binary Decision Diagrams”. In: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E75-A.3 (1992), pp. 392–399. url: http://hdl.handle.net/2115/47490.

[215]

S. Minato. “Zero-suppressed BDDs for Set Manipulation in Combinatorial Problems”. In: Proceedings of the 30th International Design Automation Conference. DAC ’93. Dallas, Texas, USA: ACM, 1993. isbn: 0-89791-577-1. doi: 10.1145/157485.164890.

[216]

S. Minato, N. Ishiura, and S. Yajima. “Shared Binary Decision Diagram with Attributed Edges for Efficient Boolean Function Manipulation”. In: Proceedings of the 1990 IEEE/ACM Design Automation Conference. DAC. IEEE, 1990. isbn: 0-89791-363-9. doi: 10.1109 /DAC.1990.114828.


[217]

S. Minato and T. Uno. “Frequentness-Transition Queries for Distinctive Pattern Mining from Time-Segmented Databases”. In: Proceedings of 10th SIAM International Conference on Data Mining. SDM. Apr. 2010, pp. 339–349. url: http://hdl.handle.net/2115/47334.

[218]

J. D. V. Miró. Fast clustering Expectation Maximization algorithm for Gaussian Mixture Models. Tech. rep. url: https://github.com/juandavm/em4gmm.

[219]

A. Mishchenko. An Introduction to Zero-Suppressed Binary Decision Diagrams. Tech. rep. Portland State University, 2001. url: http://www.eecs.berkeley.edu/~alanmi/publications/2001/tech01_zdd_.pdf.

[220]

A. Mishchenko. “Fast Computation of Symmetries in Boolean Functions”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 22.11 (2003), pp. 1588–1593. issn: 0278-0070. doi: 10.1109/TCAD.2003.818371.

[221]

A. Mishchenko and M. Perkowski. “Fast Heuristic Minimization of Exclusive-Sums-of-Products”. In: 5th International Workshop on Applications of the Reed Muller Expansion in Circuit Design. RM. 2001.

[222]

A. Mishchenko et al. “Scalable Don’t-Care-Based Logic Optimization and Resynthesis”. In: ACM Transactions on Reconfigurable Technology and Systems 4.4 (Dec. 2011), 34:1–34:23. issn: 1936-7406. doi: 10.1145/2068716.2068720.

[223]

A. Mitra, W. Najjar, and L. Bhuyan. “Compiling PCRE to FPGA for Accelerating SNORT IDS”. In: Proceedings of 2007 ACM/IEEE Symposium on Architecture for Networking and Communications Systems. New York, NY, USA: ACM, Dec. 2007, pp. 127–136. isbn: 978-1-59593-945-6. doi: 10.1145/1323548.1323571.

[224]

E. H. Moore. “On the Reciprocal of the General Algebraic Matrix”. In: Bulletin of the American Mathematical Society 26.9 (1920), pp. 394–395.

[225]

C. Moraga. “A monograph on ternary threshold logic”. In: Computer Science and Multiple-valued Logic. Theory and Applications. Ed. by D. C. Rine. Amsterdam, Netherlands: North Holland, 1977, pp. 355–394. isbn: 978-0444868824.


[226]

C. Moraga. “Permutations under Spectral Transforms”. In: Proceedings of the 38th IEEE International Symposium on Multiple-Valued Logic. ISMVL. Dallas, Texas: IEEE-CS-Press, 2008, pp. 76–81. isbn: 978-0-7695-3155-7. doi: 10.1109/ISMVL.2008.16.

[227]

C. Moraga. “Spectral characterization of ternary threshold functions”. In: Electronics Letters 15.22 (1979), pp. 712–713. issn: 0013-5194. doi: 10.1049/el:19790506.

[228]

C. Moraga. “Spectral Techniques: The first decade of the XXI Century”. In: Proceedings of the 40th IEEE International Symposium on Multiple-valued Logic. ISMVL. Barcelona, Spain: IEEE, 2010, pp. 3–8. isbn: 978-0-7695-4024-5. doi: 10.1109/ISMVL.2010.10.

[229]

C. Moraga, M. Stanković, and R. S. Stanković. Some Properties of Ternary Functions with Flat Reed-Muller Spectra. Tech. rep. Research Report FSC-2014-04. Asturias, Spain: European Center for Soft Computing, 2014.

[230]

C. Moraga et al. “Contribution to the Study of Multiple-Valued Bent Functions”. In: Proceedings of the 43rd IEEE International Symposium on Multiple-Valued Logic. ISMVL. Toyama, Japan: IEEE, 2013, pp. 340–345. isbn: 978-0-7695-4976-7. doi: 10.1109/ISMVL.2013.21.

[231]

C. Moraga et al. “On the Maiorana Method to Generate Multiple-Valued Bent Functions, Revisited”. In: Proceedings of the 44th IEEE International Symposium on Multiple-Valued Logic. ISMVL. Bremen, Germany: IEEE-CS-Press, 2014, pp. 19–24. isbn: 978-1-4799-3534-5. doi: 10.1109/ISMVL.2014.12.

[232]

B. Moret. “Towards a discipline of experimental algorithmics”. In: Data Structures, Near Neighbor Searches, and Methodology: Fifth and sixth DIMACS implementation challenges. Ed. by M. H. Goldwasser, D. S. Johnson, and C. C. McGeoch. Vol. 59. AMS, 2002, pp. 197–213.

[233]

A. Mukhopadhyay. “Hardware Algorithms for Nonnumeric Computation”. In: IEEE Transactions on Computers C-28.6 (1979), pp. 384–394. issn: 0018-9340. doi: 10.1109/TC.1979.1675378.

[234]

S. Nagayama. “Efficient Regular Expression Matching Method Using ZBDDs”. In: Reed-Muller Workshop 2013. RM. Toyama, Japan, May 2013, pp. 48–54.


[235]

S. Nagayama et al. “On Optimizations of Edge-Valued MDDs for Fast Analysis of Multi-State Systems”. In: IEICE Transactions on Information & Systems E97-D.9 (2014), pp. 2234–2242. issn: 0916-8532. doi: 10.1587/transinf.2013LOP0011.

[236]

H. Nakahara, T. Sasao, and M. Matsuura. “A Regular Expression Matching Circuit Based on a Decomposed Automaton”. In: Reconfigurable Computing: Architectures, Tools and Applications. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2011, pp. 16–28. isbn: 978-3-642-19474-0. doi: 10.1007/978-3-642-19475-7_4.

[237]

G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings. Cambridge University Press, 2002. isbn: 0-521-81307-7.

[238]

J. von Neumann. Mathematical Foundations of Quantum Mechanics. Princeton, NJ, USA: Princeton University Press, 1932.

[239]

M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000. isbn: 9780521635035. url: http://www.bibsonomy.org/bibtex/222bf6f3de23faf420214d738924ac21b/mcclung.

[240]

Nvidia. NVIDIA’s Next Generation CUDA Compute Architecture: Fermi. 2010. url: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf (visited on 12/08/2013).

[241]

K. Nyberg. “S-boxes and Round Functions with Controllable Linearity and Differential Uniformity”. In: Fast Software Encryption. Ed. by B. Preneel. Vol. 1008. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 1995, pp. 111–130. isbn: 978-3-540-60590-4. doi: 10.1007/3-540-60590-8_9.

[242]

K. P. Parker and E. J. McCluskey. “Probabilistic Treatment of General Combinational Networks”. In: IEEE Transactions on Computers C-24.6 (June 1975), pp. 668–670.

[243]

M. Pedram. “Power Minimization in IC Design: Principles and Applications”. In: ACM Transactions on Design Automation of Electronic Systems. TODAES 1.1 (1996), pp. 3–56. doi: 10.1145/225871.225877.

[244]

J. Peng, Q. Wu, and H. Kan. “On Symmetric Boolean Functions With High Algebraic Immunity on Even Number of Variables”. In: IEEE Transactions on Information Theory 57.10 (2011), pp. 7205–7220. issn: 0018-9448.


[245]

B. Pinkas et al. “Secure Two-Party Computation Is Practical”. In: Advances in Cryptology – ASIACRYPT 2009. Vol. 5912. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2009, pp. 250–267. isbn: 978-3-642-10365-0. doi: 10.1007/978-3-642-10366-7_15.

[246]

C. Posthoff and B. Steinbach. Boolean Equations - Algorithms and Programs. In German: Binäre Gleichungen - Algorithmen und Programme. TH Karl-Marx-Stadt, 1979.

[247]

C. Posthoff and B. Steinbach. Logic Functions and Equations - Binary Models for Computer Science. Dordrecht, The Netherlands: Springer, 2004. isbn: 978-1-4020-2937-0.

[248]

C. Posthoff and B. Steinbach. “Sudoku Solutions Using Logic Equations”. In: Boolean Problems, Proceedings of the 8th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP. Freiberg, Germany, 2008, pp. 49–57. isbn: 978-3-86012-346-1.

[249]

C. Posthoff and B. Steinbach. “The Solution of Discrete Constraint Problems Using Boolean Models”. In: Proceedings of the 2nd International Conference on Agents and Artificial Intelligence. Valencia, Spain: INSTICC – Institute for Systems and Technologies of Information, Control and Communication, 2010, pp. 487–493. isbn: 978-989-674-021-4.

[250]

C. Posthoff and B. Steinbach. “The Solution of SAT Problems Using Ternary Vectors and Parallel Processing”. In: International Journal of Electronics and Telecommunications (JET) 57.3 (2011), pp. 233–249. issn: 0867-6747.

[251]

A. K. Prasad et al. “Data Structures and Algorithms for Simplifying Reversible Circuits”. In: ACM Journal on Emerging Technologies in Computing Systems 2.4 (Oct. 2006), pp. 277–293. issn: 1550-4832. doi: 10.1145/1216396.1216399.

[252]

W. H. Press et al. “Gaussian Mixture Models and k-Means Clustering”. In: Numerical Recipes: The Art of Scientific Computing. 3rd ed. New York: Cambridge University Press, 2007. isbn: 978-0-521-88068-8.

[253]

A. Previti et al. “Monte-Carlo Style UCT Search for Boolean Satisfiability”. In: AI*IA 2011: Artificial Intelligence Around Man and Beyond. Ed. by R. Pirrone and F. Sorbello. Vol. 6934. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2011, pp. 177–188. isbn: 978-3-642-23953-3. doi: 10.1007/978-3-642-23954-0_18. url: http://dx.doi.org/10.1007/978-3-642-23954-0_18.


[254]

A. Puggelli et al. “Are Logic Synthesis Tools Robust?” In: Proceedings of the 2011 48th ACM/EDAC/IEEE Design Automation Conference. DAC. IEEE, 2011, pp. 633–638. isbn: 978-1-4503-0636-2.

[255]

M. M. Rahman and G. W. Dueck. “Optimal Quantum Circuits of Three Qubits”. In: 42nd IEEE International Symposium on Multiple-Valued Logic. ISMVL. Victoria, BC, Canada: IEEE, May 2012, pp. 161–166. isbn: 978-1-4673-0908-0. doi: 10.1109/ISMVL.2012.43.

[256]

M. M. Rahman and G. W. Dueck. “Synthesis of Linear Nearest Neighbor Quantum Circuits”. In: 10th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP. 2012, pp. 277–284. isbn: 978-3-86012-488-8.

[257]

M. M. Rahman and G. W. Dueck. “Template Matching in Quantum Circuits Optimization”. In: Proceedings Reed-Muller Workshop. RM. Toyama, Japan, 2013, pp. 75–79.

[258]

M. M. Rahman, G. W. Dueck, and J. D. Horton. “An Algorithm for Quantum Template Matching”. In: Journal on Emerging Technologies in Computing Systems 11.3 (Dec. 2014), 31:1–31:20. issn: 1550-4832. doi: 10.1145/2629537.

[259]

M. M. Rahman et al. “Two-Qubit Quantum Gates to Reduce the Quantum Cost of Reversible Circuit”. In: Proceedings of the 2011 41st IEEE International Symposium on Multiple-Valued Logic. ISMVL. 2011, pp. 86–92. isbn: 978-1-4577-0112-2. doi: 10.1109/ISM VL.2011.56.

[260]

RelView home page. url: http://www.informatik.uni-kiel.de/~progsys/relview.

[261]

R. L. Rivest, A. Shamir, and L. Adleman. “A method for obtaining digital signatures and public-key cryptosystems”. In: Communications of the ACM 21.2 (1978), pp. 120–126. doi: 10.1145/359340.359342.

[262]

H.-C. Roan, W.-J. Hwang, and C.-T. Dan Lo. “Shift-Or Circuit for Efficient Network Intrusion Detection Pattern Matching”. In: Proceedings of International Conference on Field Programmable Logic and Applications. FPL. Madrid, Spain: IEEE, 2006, pp. 785–790. isbn: 1-4244-0312-X. doi: 10.1109/FPL.2006.311314.

[263]

O. S. Rothaus. “On ‘Bent’ Functions”. In: Journal of Combinatorial Theory. A 20.3 (1976), pp. 300–305. doi: 10.1016/0097-3165(76)90024-8.


[264]

R. Rudell. “Dynamic Variable Ordering for Ordered Binary Decision Diagrams”. In: 1993 IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers. ICCAD. Santa Clara, CA, USA: IEEE, Nov. 1993, pp. 42–47. isbn: 0-8186-4490-7. doi: 10.1109/ICCAD.1993.580029.

[265]

R. Rudell and A. L. Sangiovanni-Vincintelli. “Espresso-MV: Algorithms for Multiple-Valued Logic Minimization”. In: Proceedings of the IEEE Custom Integrated Circuits Conference. CICC-85. Portland, Oregon, May 1985, pp. 230–234.

[266]

R. A. Rueppel. Analysis and Design of Stream Ciphers. Berlin Heidelberg: Springer, 1986. isbn: 978-3-642-82867-6. doi: 10.1007/978-3-642-82865-2.

[267]

B. Ruijl et al. “Combining Simulated Annealing and Monte Carlo Tree Search for Expression Simplification”. In: Proceedings of the 6th International Conference on Agents and Artificial Intelligence. Vol. 1. 2014, pp. 724–731. isbn: 978-989-758-015-4. arXiv: 1312.0841 [cs.SC]. url: http://arxiv.org/abs/1312.0841.

[268]

M. Saeedi and I. L. Markov. “Synthesis and Optimization of Reversible Circuits – A Survey”. In: ACM Computing Surveys 45.2 (Feb. 2013), 21:1–21:34. issn: 0360-0300.

[269]

M. Saeedi et al. Reversible Logic Synthesis Benchmarks. url: http://ceit.aut.ac.ir/QDA/benchmarks.

[270]

Y. SangKyun and L. KyuHee. “Optimization of Regular Expression Pattern Matching Circuit Using at-Most Two-Hot Encoding on FPGA”. In: Proceedings of 2010 IEEE International Conference on Field Programmable Logic and Applications. FPL. Milano, Italy: IEEE, Sept. 2010, pp. 40–43. isbn: 978-1-4244-7842-2. doi: 10.1109/FPL.2010.19.

[271]

D. Sanoy. WOW64 - A Comprehensive Reference. 2013.

[272]

N. Saraf and K. Bazargan. “Sequential Logic to Transform Probabilities”. In: Proceedings of IEEE/ACM International Conference on Computer-Aided Design. San Jose, CA, USA: IEEE, 2013, pp. 732–738. doi: 10.1109/ICCAD.2013.6691196.

[273]

Z. Sasanian and D. Miller. “Transforming MCT circuits to NCVW circuits”. In: Reversible Computation. Vol. 7165. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2012, pp. 77–88. isbn: 978-3-642-29516-4. doi: 10.1007/978-3-642-29517-1_7.


[274]

T. Sasao. “An Application of 16-Valued Logic to Design of Reconfigurable Logic Arrays”. In: Proceedings of 37th International Symposium Multiple-Valued Logic. ISMVL. Oslo, Norway: IEEE, 2007, pp. 1–40. isbn: 0-7695-2831-7. doi: 10.1109/ISMVL.2007.7.

[275]

T. Sasao. “AND-EXOR Expressions and Their Optimization”. In: Logic Synthesis and Optimization. Ed. by T. Sasao. Kluwer Academic Publishers, 1993, pp. 287–312. isbn: 978-1-4615-3154-8.

[276]

T. Sasao. “Representations of logic functions using EXOR operators”. In: Representations of Discrete Functions. Ed. by T. Sasao and M. Fujita. Springer US, 1996, pp. 29–54. isbn: 978-1-4612-8599-1. doi: 10.1007/978-1-4613-1385-4_2.

[277]

T. Sasao. Switching Theory for Logic Synthesis. Kluwer Academic Publishers, 1999. isbn: 0-7923-8456-3.

[278]

T. Sasao and J. T. Butler. “The Eigenfunction of the Reed-Muller Transformation”. In: Proceedings of Reed-Muller Workshop. RM. Oslo, Norway, 2007.

[279]

T. Sasao and M. Fujita, eds. Representations of Discrete Functions. Dordrecht, the Netherlands: Kluwer Academic Publishers, 1996. isbn: 978-1-4612-8599-1. doi: 10.1007/978-1-4613-1385-4.

[280]

H. Savoj. “Don’t Cares in Multi-level Network Optimization”. PhD thesis. EECS Department, University of California, Berkeley, 1992. url: http://www.eecs.berkeley.edu/Pubs/TechRpts/1992/2205.html.

[281]

B. Schaeffer et al. “Synthesis of Reversible Functions Based on Products of Exclusive Or Sums”. In: Proceedings of the 43rd IEEE International Symposium on Multiple-Valued Logic. ISMVL. Toyama, Japan: IEEE, May 2013, pp. 35–40. isbn: 978-1-4673-6067-8. doi: 10.1109/ISMVL.2013.54.

[282]

G. Schmidt. Relational Mathematics. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2010. isbn: 9780521762687.

[283]

G. Schmidt and T. Ströhlein. Relations and Graphs: Discrete Mathematics for Computer Scientists. EATCS Monographs on Theoretical Computer Science. Berlin Heidelberg: Springer, 1993. isbn: 9783642779701.

[284]

S. Schober and M. Bossert. “Boolean Functions with Noisy Inputs”. In: Proceedings of IEEE International Symposium on Information Theory. ISIT. Toronto, ON, Canada, 2008, pp. 2347–2350. isbn: 978-1-4244-2256-2. doi: 10.1109/ISIT.2008.4595410.

Bibliography

[285]

C. Scholl et al. “BDD minimization using symmetries”. In: IEEE Transactions on Computer-Aided Design 18.2 (1999), pp. 81–100. issn: 0278-0070. doi: 10.1109/43.743706.

[286]

B. Schwarzkopf. “Without Cover on a Rectangular Chessboard”. In: Feenschach (1990). German title: “Ohne Deckung auf dem Rechteck”, pp. 272–275.

[287]

J. Seberry, X.-M. Zhang, and Y. Zheng. “Nonlinearity and Propagation Characteristics of Balanced Boolean Functions”. In: Information and Computation 119.1 (1995), pp. 1–13.

[288]

P. Selinger. “Efficient Clifford+T approximations of single-qubit operators”. In: Quantum Information & Computation 15.1-2 (2015), pp. 159–180. issn: 1533-7146. url: http://dl.acm.org/citation.cfm?id=2685188.2685198.

[289]

B. Selman, D. Mitchell, and H. Levesque. “Generating Hard Satisfiability Problems”. In: Artificial Intelligence 81 (1996), pp. 17–29.

[290]

J. L. Shafer et al. “Enumeration of Bent Boolean Functions by Reconfigurable Computer”. In: 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. Ed. by R. Sass and R. Tessier. FCCM. Charlotte, NC, USA: IEEE, 2010, pp. 265–272. isbn: 978-0-7695-4056-6. doi: 10.1109/FCCM.2010.48.

[291]

J. L. Shanks. “Computation of the Fast Walsh-Fourier Transform”. In: IEEE Transactions on Computers C-18.5 (1969), pp. 457–459. issn: 0018-9340. doi: 10.1109/T-C.1969.222685.

[292]

C. E. Shannon. “A Mathematical Theory of Communication”. In: Bell System Technical Journal 27 (1948), pp. 379–423.

[293]

C. E. Shannon. “A Symbolic Analysis of Relay and Switching Circuits”. MA thesis. Cambridge, MA, USA: Massachusetts Institute of Technology (MIT), 1937.

[294]

C. E. Shannon. “Communication Theory of Secrecy Systems”. In: Bell System Technical Journal 28.4 (1949), pp. 656–715.

[295]

V. V. Shende et al. “Synthesis of reversible logic circuits”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 22.6 (2003), pp. 710–722. issn: 0278-0070. doi: 10.1109/TCAD.2003.811448.

[296]

W. Shum and J. H. Anderson. “Analyzing and predicting the impact of CAD algorithm noise on FPGA speed performance and power”. In: Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays. FPGA. New York, NY, USA: ACM, 2012, pp. 107–110. isbn: 978-1-4503-1155-7. doi: 10.1145/2145694.2145711.

[297]

R. Sidhu and V. K. Prasanna. “Fast Regular Expression Matching Using FPGAs”. In: The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. FCCM. Rohnert Park, CA, USA: IEEE, Mar. 2001, pp. 227–238. isbn: 0-7695-2667-5.

[298]

R. Sinkhorn. “A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices”. In: The Annals of Mathematical Statistics 35.2 (1964), pp. 876–879. doi: 10.1214/aoms/1177703591.

[299]

R. Smith et al. “Deflating the Big Bang: Fast and Scalable Deep Packet Inspection with Extended Finite Automata”. In: Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication. New York, NY, USA: ACM, Aug. 2008, pp. 207–218. isbn: 978-1-60558-175-0. doi: 10.1145/1402958.1402983.

[300]

J. A. Smolin and D. P. DiVincenzo. “Five two-bit quantum gates are sufficient to implement the quantum Fredkin gate”. In: Physical Review A 53.4 (1996), pp. 2855–2856. doi: 10.1103/PhysRevA.53.2855.

[301]

SNORT Network Intrusion Detection System. Sourcefire Inc. url: http://www.snort.org/.

[302]

M. Soeken, M. D. Miller, and R. Drechsler. “Quantum circuits employing roots of the Pauli matrices”. In: Physical Review A 88.042322 (2013). arXiv:1308.2493 [quant-ph], pp. 1–7. doi: 10.1103/PhysRevA.88.042322.

[303]

M. Soeken et al. “Self-Inverse Functions and Palindromic Circuits”. In: Proceedings Reed-Muller Workshop 2015. RM. Waterloo, Canada, 2015, pp. 1–6.

[304]

M. Soeken et al. “Window Optimization of Reversible and Quantum Circuits”. In: Proceedings of the 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems. DDECS. Vienna, Austria: IEEE, Apr. 2010, pp. 341–345. isbn: 978-1-4244-6613-9. doi: 10.1109/DDECS.2010.5491754.

[305]

F. Somenzi. CUDD Package, Release 2.5.0. Apr. 2014. url: http: //vlsi.colorado.edu/~fabio/CUDD/.

[306]

M. Stanisavljevic, A. Schmid, and Y. Leblebici. “Output Probability Density Functions of Logic Circuits: Modeling and Fault-Tolerance Evaluation”. In: Proceedings of 2010 18th IEEE/IFIP VLSI System on Chip Conference. Madrid, Spain: IEEE, 2010, pp. 328–334. isbn: 978-1-4244-6469-2. doi: 10.1109/VLSISOC.2010.5642682.

[307]

R. S. Stanković and J. T. Astola. Gibbs Derivatives - the First Forty Years. TICSP 39. Tampere, Finland: Tampere International Center for Signal Processing, 2008. isbn: 978-952-15-1973-4.

[308]

R. S. Stanković, J. T. Astola, and K. Egiazarian. “Remarks on Symmetric Binary and Multiple-Valued Logic Functions”. In: Proceedings of the 6th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP. Freiberg, Germany: TU Bergakademie Freiberg, 2004, pp. 83–87. isbn: 3-86012-233-9.

[309]

R. S. Stanković, J. T. Astola, and C. Moraga. Representation of Multiple-Valued Logic Functions. Princeton, USA: Morgan & Claypool Publishers, 2012. isbn: 978-1-60845-942-1. doi: 10.2200/S00420ED1V01Y201205DCS037.

[310]

R. S. Stanković, C. Moraga, and J. T. Astola. Fourier Analysis on Finite Non-Abelian Groups with Applications in Signal Processing and System Design. New York, USA: Wiley-IEEE Press, 2005. isbn: 978-0-471-69463-2. doi: 10.1002/047174543X.

[311]

B. Steinbach, ed. 11th International Workshop on Boolean Problems. IWSBP. Freiberg, Germany, Sept. 2014. isbn: 978-3-86012-488-8.

[312]

B. Steinbach. “Program System to Solve Boolean Equations”. In German: Programmsystem zur Behandlung binärer Gleichungen. Diploma thesis. TH Karl-Marx-Stadt, 1977.

[313]

B. Steinbach, ed. Recent Progress in the Boolean Domain. Newcastle upon Tyne, UK: Cambridge Scholars Publishing, Apr. 2014. isbn: 978-1-4438-5638-6.

[314]

B. Steinbach. “XBOOLE - A Toolbox for Modelling, Simulation, and Analysis of Large Digital Systems”. In: Systems Analysis Modelling Simulation 9.4 (1992), pp. 297–312.

[315]

B. Steinbach and C. Posthoff. “An Extended Theory of Boolean Normal Forms”. In: Proceedings of the 6th Annual Hawaii International Conference on Statistics, Mathematics and Related Fields. Honolulu, Hawaii, 2007, pp. 1124–1139. url: http://www.informatik.tu-freiberg.de/prof2/publikationen/HICSM_2007_ETBNF.pdf.

[316]

B. Steinbach and C. Posthoff. “Artificial Intelligence and Creativity - Two Requirements to Solve an Extremely Complex Coloring Problem”. In: Proceedings of the 5th International Conference on Agents and Artificial Intelligence. Ed. by J. Filipe and A. Fred. Vol. 2. ICAART. Valencia, Spain, 2013, pp. 411–418. isbn: 978-989-8565-39-6.

[317]

B. Steinbach and C. Posthoff. “Boolean Differential Calculus”. In: Progress in Applications of Boolean Functions. Ed. by T. Sasao and J. T. Butler. San Rafael, CA, USA: Morgan & Claypool Publishers, 2010, pp. 55–78. isbn: 978-1-60845-181-4.

[318]

B. Steinbach and C. Posthoff. “Boolean Differential Calculus - Theory and Applications”. In: Journal of Computational and Theoretical Nanoscience 7.6 (2010), pp. 933–981. issn: 1546-1955.

[319]

B. Steinbach and C. Posthoff. Boolean Differential Equations. Morgan & Claypool Publishers, June 2013. isbn: 978-1627052412. doi: 10.2200/S00511ED1V01Y201305DCS042.

[320]

B. Steinbach and C. Posthoff. “Complete Sets of Hamiltonian Circuits for Classification of Documents”. In: Computer Aided System Theory - EUROCAST 2009. Ed. by R. Moreno-Diaz, F. Pichler, and A. Quesada-Arencibia. Lecture Notes in Computer Science 5717. Berlin Heidelberg: Springer, 2009, pp. 526–533. isbn: 978-3-642-04771-8. doi: 10.1007/978-3-642-04772-5_68.

[321]

B. Steinbach and C. Posthoff. “Evaluation and Optimization of GPU Based Unate Covering Algorithms”. In: Computer Aided Systems Theory – EUROCAST 2015. Vol. 9520. Lecture Notes in Computer Science. Switzerland: Springer International Publishing, 2015, pp. 617–624. isbn: 978-3-319-27339-6. doi: 10.1007/978-3-319-27340-2_76.

[322]

B. Steinbach and C. Posthoff. “Extremely Complex 4-Colored Rectangle-Free Grids: Solution of Open Multiple-Valued Problems”. In: Proceedings of the IEEE 42nd International Symposium on Multiple-Valued Logic. ISMVL. Victoria, BC, Canada, 2012, pp. 37–44. isbn: 978-0-7695-4673-5. doi: 10.1109/ISMVL.2012.12.

[323]

B. Steinbach and C. Posthoff. “Fast Calculation of Exact Minimal Unate Coverings on Both the CPU and the GPU”. In: 14th International Conference on Computer Aided Systems Theory – EUROCAST 2013 - Part II. Vol. 8112. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2013, pp. 234–241. isbn: 978-3-642-53861-2. doi: 10.1007/978-3-642-53862-9_30.

[324]

B. Steinbach and C. Posthoff. “Highly Complex 4-Colored Rectangle-free Grids – Solution of Unsolved Multiple-Valued Problems”. In: Multiple-Valued Logic and Soft Computing 24.1–4 (2014), pp. 369–404. issn: 1542-3980.

[325]

B. Steinbach and C. Posthoff. Logic Functions and Equations - Examples and Exercises. Springer Science + Business Media B.V., 2009. isbn: 978-1-4020-9594-8.

[326]

B. Steinbach and C. Posthoff. “Multiple-Valued Problem Solvers - Comparison of Several Approaches”. In: Proceedings of the IEEE 44th International Symposium on Multiple-Valued Logic (ISMVL 2014). ISMVL. Bremen, Germany: IEEE, 2014, pp. 25–31. doi: 10.1109/ISMVL.2014.13.

[327]

B. Steinbach and C. Posthoff. “New Results Based on Boolean Models”. In: Boolean Problems, Proceedings of the 9th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2010, pp. 29–36. isbn: 978-3-86012-404-8.

[328]

B. Steinbach and C. Posthoff. “Solution of the Last Open Four-Colored Rectangle-free Grid - an Extremely Complex Multiple-Valued Problem”. In: Proceedings of the IEEE 43rd International Symposium on Multiple-Valued Logic. ISMVL. Toyama, Japan, 2013, pp. 302–309. isbn: 978-0-7695-4976-7. doi: 10.1109/ISMVL.2013.51.

[329]

B. Steinbach and C. Posthoff. “Sources and Obstacles for Parallelization - a Comprehensive Exploration of the Unate Covering Problem Using Both CPU and GPU”. In: GPU Computing with Applications in Digital Logic. Tampere: Tampere International Center for Signal Processing (TICSP), 2012, pp. 63–96. isbn: 978-952-15-2920-7. doi: 10.13140/2.1.4266.4320.

[330]

B. Steinbach and C. Posthoff. “The Last Unsolved Four-Colored Rectangle-Free Grid: The Solution of Extremely Complex Multiple-Valued Problems”. In: Multiple-Valued Logic and Soft Computing 25.4–5 (2015), pp. 617–624. issn: 1542-3980.

[331]

B. Steinbach and C. Posthoff. “The Solution of Combinatorial Problems using Boolean Equations: New Challenges for Teaching”. In: Open Mathematical Education Notes 5 (2015). Special issue, pp. 1–30. issn: 2303-4882.

[332]

B. Steinbach and M. Werner. “Alternative Approaches for Fast Boolean Calculations Using the GPU”. In: Computational Intelligence and Efficiency in Engineering Systems. Ed. by G. Borowik et al. Vol. 595. Studies in Computational Intelligence. Switzerland: Springer International Publishing, 2015, pp. 17–31. isbn: 978-3-319-15719-1. doi: 10.1007/978-3-319-15720-7_2.

[333]

B. Steinbach and M. Werner. “Fast Boolean Operations Using the GPU”. In: 2nd Asia-Pacific Conference on Computer Aided System Engineering - APCASE 2014, Book of Extended Abstracts. Ed. by Z. Chaczko, F. L. Gaol, and C. Chiu. APCASE. Bali, Indonesia, 2014, pp. 86–89. isbn: 978-0-9924518-0-6.

[334]

L. Storme, A. De Vos, and G. Jacobs. “Group Theoretical Aspects of Reversible Logic Gates”. In: Journal of Universal Computer Science 5.5 (1999), pp. 307–321. doi: 10.3217/jucs-005-05-0307. url: http://www.jucs.org/jucs_5_5/group_theoretical_aspects.

[335]

Y. Sugawara, M. Inaba, and K. Hiraki. “Over 10Gbps String Matching Mechanism for Multi-stream Packet Scanning Systems”. In: Field Programmable Logic and Application. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, Aug. 2004, pp. 484–493. isbn: 978-3-540-22989-6. doi: 10.1007/978-3-540-30117-2_50.

[336]

V. P. Suprun. “Polynomial Expression of Symmetric Boolean Functions”. In: Soviet Journal of Computer and Systems Science 23.6 (1985). In Russian, pp. 88–91. issn: 1064-2307.

[337]

V. P. Suprun. “Table Method for Polynomial Decomposition of Boolean Functions”. In: Kibernetika 1 (1987). In Russian, pp. 116–117.

[338]

M. Szyprowski and P. Kerntopf. “An approach to quantum cost optimization in reversible circuits”. In: Proceedings of the 11th IEEE International Conference on Nanotechnology. NANO. Portland, USA: IEEE Press, Aug. 2011, pp. 1521–1526. isbn: 978-1-4577-1514-3. doi: 10.1109/NANO.2011.6144568.

[339]

A. Takahara. “ASP-DAC 2015 Keynote Speech II: Programmable Network”. In: Proceedings of 20th Asia and South Pacific Design Automation Conference. Jan. 2015, p. 285.

[340]

K. Thompson. “Regular Expression Search Algorithm”. In: Communications of the ACM 11.6 (June 1968), pp. 419–422. doi: 10.1145/363347.363387.

[341]

M. A. Thornton. “Modified Haar Transform Calculation Using Circuit Output Probabilities”. In: Proceedings of the 1997 IEEE International Conference on Information, Communications and Signal Processing. Vol. 1. ICICS. Singapore: IEEE, 1997, pp. 52–58. isbn: 0-7803-3676-3. doi: 10.1109/ICICS.1997.647056.

[342]

M. A. Thornton, R. Drechsler, and D. M. Miller. Spectral Methods in VLSI CAD. Boston, MA, USA: Kluwer Academic Publishers, 2001. isbn: 978-1461355472.

[343]

M. A. Thornton and V. S. S. Nair. “BDD Based Spectral Approach for Reed-Muller Circuit Realisation”. In: IEE Proceedings on Computers and Digital Techniques 143.2 (1996), pp. 145–150. issn: 1350-2387. doi: 10.1049/ip-cdt:19960067.

[344]

M. A. Thornton and V. S. S. Nair. “Efficient Calculation of Spectral Coefficients and Their Applications”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 14.11 (Nov. 1995), pp. 1328–1341. issn: 0278-0070. doi: 10.1109/43.469660.

[345]

M. A. Thornton and V. S. S. Nair. “Efficient Spectral Coefficient Calculation Using Circuit Output Probabilities”. In: Digital Signal Processing: A Review Journal 4.4 (Oct. 1994), pp. 245–254. issn: 1051-2004. doi: 10.1006/dspr.1994.1024.

[346]

M. A. Thornton and V. S. S. Nair. “Fast Reed-Muller Spectrum Computation Using Output Probabilities”. In: Proceedings of the IFIP WG 10.5 Workshop on Applications of the Reed-Muller Expansion in Circuit Design (Aug. 1995), pp. 281–287.

[347]

D. Titterington, A. Smith, and U. Makov. Statistical Analysis of Finite Mixture Distributions. Wiley, 1985. isbn: 978-0471907633.

[348]

T. Toffoli. “Reversible Computing”. In: Automata, Languages and Programming. Vol. 85. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 1980, pp. 632–644. isbn: 978-3-540-10003-4. doi: 10.1007/3-540-10003-2_104.

[349]

N. N. Tokareva. “Generalizations of Bent Functions: A Survey”. In: Journal of Applied and Industrial Mathematics 5.1 (2011), pp. 110–129. issn: 1990-4789. doi: 10.1134/S1990478911010133.

[350]

C. Y. Tsui, M. Pedram, and A. Despain. “Efficient Estimation of Dynamic Power Dissipation Under a Real Delay Model”. In: 1993 IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers. ICCAD. Santa Clara, CA, USA: IEEE, June 1993, pp. 224–228. isbn: 0-8186-4490-7. doi: 10.1109/ICCAD.1993.580061.

[351]

Z. Tu and Y. Deng. “A Conjecture on Binary String and Its Applications on Constructing Boolean Functions of Optimal Algebraic Immunity”. In: Designs, Codes and Cryptography 60.1 (2011), pp. 1–14. issn: 0925-1022. doi: 10.1007/s10623-010-9413-9.

[352]

J. van den Herik et al. “Investigations with Monte Carlo Tree Search for Finding Better Multivariate Horner Schemes”. In: Communications in Computer and Information Science (2014), pp. 3–20. doi: 10.1007/978-3-662-44440-5_1.

[353]

Y. Van Rentergem, A. De Vos, and L. Storme. “Implementing an arbitrary reversible logic gate”. In: Journal of Physics A: Mathematical and General 38.16 (2005), p. 3555. doi: 10.1088/0305-4470/38/16/007. url: http://stacks.iop.org/0305-4470/38/i=16/a=007.

[354]

J. Vermaseren. “HEPGAME ERC Grant Proposal”. 2013. url: http://www.nikhef.nl/~form/maindir/HEPgame/DoW.pdf.

[355]

V. Volkov. Better Performance at Lower Occupancy. 2010. url: http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf (visited on 12/08/2013).

[356]

H. Vollmer. Introduction to Circuit Complexity: A Uniform Approach. Texts in Theoretical Computer Science. An EATCS Series. Berlin Heidelberg: Springer, 1999. isbn: 978-3-540-64310-4. doi: 10 .1007/978-3-662-03927-4.

[357]

Y. Wakaba, M. Inagi, and S. Wakabayashi. “A Practical FPGA Implementation of Regular Expression Matching with Look-Ahead Assertion”. In: Proceedings of International Conference on Engineering of Reconfigurable Systems and Algorithms. ERSA. July 2012, pp. 105–110. isbn: 1-60132-233-X.

[358]

Y. Wakaba et al. “A Flexible and Compact Regular Expression Matching Engine Using Partial Reconfiguration for FPGA”. In: Proceedings of the 16th Euromicro Conference on Digital System Design. DSD. Los Alamitos, CA, USA: IEEE, Sept. 2013, pp. 293–296. doi: 10.1109/DSD.2013.115.

[359]

Y. Wakaba et al. “A Matching Method for Look-Ahead Assertion on Pattern Independent Regular Expression Matching Engine”. In: Proceedings of 17th Workshop on Synthesis and System Integration of Mixed Information Technologies. SASIMI. Mar. 2012, pp. 361–366.

[360]

Y. Wakaba et al. “An Area Efficient Regular Expression Matching Engine Using Partial Reconfiguration for Quick Pattern Updating”. In: IPSJ Transactions on System LSI Design Methodology 7 (Aug. 2014), pp. 110–118. doi: 10.2197/ipsjtsldm.7.110.

[361]

Y. Wakaba et al. “An Efficient Hardware Matching Engine for Regular Expression with Nested Kleene Operators”. In: Proceedings of 2011 IEEE International Conference on Field Programmable Logic and Applications. FPL. Chania, Crete, Greece: IEEE, Sept. 2011, pp. 157–161. isbn: 978-1-4577-1484-9. doi: 10.1109/FPL.2011.36.

[362]

Z. Wang, M. Karpovsky, and B. Sunar. “Multilinear codes for robust error detection”. In: On-Line Testing Symposium, IEEE International (2009), pp. 164–169. doi: 10.1109/IOLTS.2009.5196002.

[363]

I. Wegener. Branching Programs and Binary Decision Diagrams: Theory and Applications. SIAM Monographs on Discrete Mathematics and Applications. SIAM, 2000. isbn: 978-0-89871-458-6. doi: 10.1137/1.9780898719789.

[364]

I. Wegener. Complexity Theory - Exploring the Limits of Efficient Algorithms. Dordrecht, The Netherlands: Springer, 2005. isbn: 978-3-540-27477-3.

[365]

I. Wegener. The complexity of Boolean functions. Wiley, B. G. Teubner, 1987. isbn: 3-519-02107-2.

[366]

M. Werner. CUDA Quick Reference. July 2012. url: http://11235813tdd.blogspot.de/2012/07/cuda-cheatsheet-quick-reference.html (visited on 12/08/2013).

[367]

M. Werner. “Parallelization of XBOOLE-Operations using CUDA”. In German: Parallelisierung von XBOOLE-Operationen mit CUDA. MA thesis. Freiberg University of Mining and Technology, 2014.

[368]

R. Wille and R. Drechsler. Towards a Design Flow for Reversible Logic. Dordrecht, The Netherlands: Springer, 2010. isbn: 978-90-481-9578-7. doi: 10.1007/978-90-481-9579-4.

[369]

R. Wille et al. RevLib: An Online Resource for Reversible Functions and Reversible Circuits. url: http://www.revlib.org.

[370]

N. Wilt. The CUDA Handbook: A Comprehensive Guide to GPU Programming. Pearson Education, 2013. isbn: 9780133261509.

[371]

N. Yamagaki, R. Sidhu, and S. Kamiya. “High-Speed Regular Expression Matching Engine Using Multi-Character NFA”. In: Proceedings of the International Conference on Field Programmable Logic and Applications. Heidelberg, Germany: IEEE, Aug. 2008, pp. 131–136. isbn: 978-1-4244-1960-9. doi: 10.1109/FPL.2008.4629920.

[372]

H. Yamamoto. “Regular Expression Matching Algorithms Using Dual Position Automata”. In: Journal of Combinatorial Mathematics and Combinatorial Computing 71 (Nov. 2009), pp. 103–125.

[373]

S. Yamashita, S.-i. Minato, and D. M. Miller. “Synthesis of Semi-Classical Quantum Circuits”. In: Multiple-Valued Logic and Soft Computing 18.1 (2012), pp. 99–114. issn: 1542-3980.

[374]

L. Yang et al. “Improving NFA-Based Signature Matching Using Ordered Binary Decision Diagrams”. In: Recent Advances in Intrusion Detection. Vol. 6307. Lecture Notes in Computer Science. Berlin Heidelberg: Springer, 2010, pp. 58–78. isbn: 978-3-642-15511-6. doi: 10.1007/978-3-642-15512-3_4.

[375]

S. Yang. Logic Synthesis and Optimization Benchmarks User Guide Version 3.0. User Guide. Microelectronics Center of North Carolina (MCNC), 1991.

[376]

Y.-H. E. Yang and V. K. Prasanna. “Space-Time Tradeoff in Regular Expression Matching with Semi-Deterministic Finite Automata”. In: Proceedings of IEEE INFOCOM 2011. Shanghai, China: IEEE, Apr. 2011, pp. 1853–1861. isbn: 978-1-4244-9919-9. doi: 10.1109/INFCOM.2011.5934986.

[377]

S. N. Yanushkevich et al. Decision Diagram Techniques for Micro- and Nanoelectronic Design, Handbook. Boca Raton, London, New York: CRC Press, 2006. isbn: 978-0-8493-3424-5.

[378]

S. N. Yanushkevich et al. Introduction to Noise-Resilient Computing. 1st. San Rafael, CA, USA: Morgan and Claypool Publishers, 2013. isbn: 9781627050227.

[379]

A. C.-C. Yao. “How to Generate and Exchange Secrets (Extended Abstract)”. In: 27th Annual Symposium on Foundations of Computer Science. IEEE, 1986, pp. 162–167. doi: 10.1109/SFCS.1986.25.

[380]

S. Yusuf and W. Luk. “Bitwise Optimised CAM for Network Intrusion Detection Systems”. In: Proceedings of International Conference on Field Programmable Logic and Applications. IEEE, Aug. 2005, pp. 444–449. isbn: 0-7803-9362-7. doi: 10.1109/FPL.2005.1515762.

[381]

A. Zakrevskij. Logic Equations. In Russian (Logicheskie Uravneniya). Minsk: Nauka i Tehnika, 1975.

[382]

G. Zhang and C. Moraga. “Polynomial Fourier Transforms”. In: Proceedings of the 18th International Symposium on Multiple-Valued Logic. ISMVL. Palma de Mallorca, Spain: IEEE, 1988, pp. 412–419. isbn: 0-8186-0859-5. doi: 10.1109/PGEC.1964.263747.

[383]

W. Zhang and G. Xiao. “Constructions of Almost Optimal Resilient Boolean Functions on Large Even Number of Variables”. In: IEEE Transactions on Information Theory 55.12 (2009), pp. 5822–5831. issn: 0018-9448. doi: 10.1109/TIT.2009.2032736.

[384]

W. Zhang and G. Xiao. “On Constructions of Multi-Output Flattened Functions”. In: Chinese Journal of Electronics 15.1 (2006), pp. 169–172.

List of Authors

Nabila Abdessaied
Cyber-Physical Systems
DFKI GmbH
Bremen, Germany
E-Mail: [email protected]

Rudolf Berghammer
Department of Computer Science
University of Kiel
Kiel, Germany
E-Mail: [email protected]

Rudolf B. Blažek
Faculty of Information Technology
Czech Technical University of Prague
Prague, Czech Republic
E-Mail: [email protected]

Stefan Bolus
Department of Computer Science
University of Kiel
Kiel, Germany
E-Mail: [email protected]

Stelvio Cimato
Department of Computer Science
University of Milan
Crema, Italy
E-Mail: [email protected]

Valentina Ciriani
Department of Computer Science
University of Milan
Crema, Italy
E-Mail: [email protected]

Stijn De Baerdemacker
Inorganic and Physical Chemistry
Ghent University
Ghent, Belgium
E-Mail: [email protected]

Alexis De Vos
Electronics and Information Systems
Ghent University
Ghent, Belgium
E-Mail: [email protected]

Rolf Drechsler
Department of Mathematics and Computer Science
University of Bremen
Bremen, Germany
E-Mail: [email protected]

Gerhard Dueck
Department of Computer Science
University of New Brunswick
Fredericton, Canada
E-Mail: [email protected]

Petr Fišer
Faculty of Information Technology
Czech Technical University of Prague
Prague, Czech Republic
E-Mail: [email protected]

Dušan Gajić
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Danila A. Gorodecky
United Institute of Informatics Problems
National Academy of Sciences of Belarus
Minsk, Belarus
E-Mail: [email protected]

Michael Gössel
Institute of Computer Science
University of Potsdam
Potsdam, Germany
E-Mail: [email protected]

Jerzy Jegier
Division of Open Systems and Open Data
Orange Labs Poland
Warsaw, Poland
E-Mail: [email protected]

Thomas Kern
Automotive Division
Infineon Technologies AG
Neubiberg, Germany
E-Mail: [email protected]

Paweł Kerntopf
Faculty of Physics and Applied Informatics
University of Łódź
Łódź, Poland
E-Mail: [email protected]

Claudio Moraga
Faculty of Computer Science
TU Dortmund University
Dortmund, Germany
E-Mail: [email protected]

Matteo Moroni
Department of Computer Science
University of Milan
Crema, Italy
E-Mail: [email protected]

Shinobu Nagayama
Department of Computer and Network Engineering
Hiroshima City University
Hiroshima, Japan
E-Mail: [email protected]

Günther Nieß
Institute of Computer Science
University of Potsdam
Potsdam, Germany
E-Mail: [email protected]

Aske Plaat
Leiden Institute of Advanced Computer Science
Leiden University
Leiden, The Netherlands
E-Mail: [email protected]

Christian Posthoff
Department of Computing and Information Technology
The University of the West Indies
St. Augustine, Trinidad & Tobago
E-Mail: [email protected]

Miloš Radmanović
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Md. Mazder Rahman
Department of Computer Science
University of New Brunswick
Fredericton, Canada
E-Mail: [email protected]

Ben Ruijl
Leiden Institute of Advanced Computer Science
Leiden University
Leiden, The Netherlands
E-Mail: [email protected]

Jan Schmidt
Faculty of Information Technology
Czech Technical University of Prague
Prague, Czech Republic
E-Mail: [email protected]

Mathias Soeken
Department of Mathematics and Computer Science
University of Bremen
Bremen, Germany
E-Mail: [email protected]

Milena Stanković
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Radomir S. Stanković
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Bernd Steinbach
Institute of Computer Science
Freiberg University of Mining and Technology
Freiberg, Germany
E-Mail: [email protected]

Suzana Stojković
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Micah A. Thornton
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas, USA
E-Mail: [email protected]

Mitchell A. Thornton
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas, USA
E-Mail: [email protected]

Jaap van den Herik
Leiden Institute of Advanced Computer Science
Leiden University
Leiden, The Netherlands
E-Mail: [email protected]

Jos Vermaseren
Department of Theoretical Physics
Nikhef
Amsterdam, The Netherlands
E-Mail: [email protected]

Shin’ichi Wakabayashi
Department of Computer and Network Engineering
Hiroshima City University
Hiroshima, Japan
E-Mail: [email protected]

Matthias Werner
Institute of Computer Science
Freiberg University of Mining and Technology
Freiberg, Germany
E-Mail: [email protected]

Chunhui Wu
Department of Computer Science
Guangdong University of Finance
Guangzhou, P.R. China
E-Mail: [email protected]

Index of Authors

A
Abdessaied, Nabila . . . 327

B
Berghammer, Rudolf . . . 287
Blažek, Rudolf B. . . . 167
Bolus, Stefan . . . 287

C
Cimato, Stelvio . . . 241
Ciriani, Valentina . . . 241

D
De Baerdemacker, Stijn . . . 357
De Vos, Alexis . . . 357
Drechsler, Rolf . . . 327
Dueck, Gerhard W. . . . 369

F
Fišer, Petr . . . 167

G
Gajić, Dušan . . . 150
Gorodecky, Danila A. . . . 96
Gössel, Michael . . . 255

J
Jegier, Jerzy . . . 342

K
Kern, Thomas . . . 255
Kerntopf, Paweł . . . 342

M
Moraga, Claudio . . . 309
Moroni, Matteo . . . 241

N
Nagayama, Shinobu . . . 189
Nieß, Günther . . . 255

P
Plaat, Aske . . . 76
Posthoff, Christian . . . 51

R
Radmanović, Miloš . . . 150
Rahman, Md. Mazder . . . 369
Ruijl, Ben . . . 76

S
Schmidt, Jan . . . 167
Soeken, Mathias . . . 327
Stanković, Milena . . . 309
Stanković, Radomir S. . . . 150, 309
Steinbach, Bernd . . . 51, 117, 220
Stojković, Suzana . . . 150

T
Thornton, Micah A. . . . 269
Thornton, Mitchell A. . . . 3, 269

V
van den Herik, Jaap . . . 76
Vermaseren, Jos . . . 76

W
Wakabayashi, Shin’ichi . . . 189
Werner, Matthias . . . 117
Wu, Chunhui . . . 220

Index

accelerator
  pattern-independent, 203
algebra
  set, 8
algorithm, 334
  encryption, 242
  growth, 79
  in EDA, deterministic, 167
  in EDA, iterative, 178
  in EDA, randomization, 177
  logic optimization, 274
  stochastic model of performance, 179
  synthesis, 274, 328, 330, 368
analysis
  complexity, 331
  fault tolerance, 270
  linear system, 17
  logic, 333
    reversible, 333
  reliability, 270
annealing
  simulated, 91
approach
  synthesis, 336
  vector space, 6
Aristotle, 7
array
  input, 40
  output, 40
  systolic, 190, 198, 202, 218, 219
associativity, 79
automorphism, 311
benchmark, 77, 342, 343, 350
binate, 54
bit, 13, 358, 370
Boole
  George, 8
Boolean Algebra, 3, 4, 276
bound
  greatest lower, 15
  least upper, 15
  lower, 328
    exponential, 328
  upper, 327, 334–336
    best known, 336
    linear, 327
bra, 12
bra-ket, 12
buffer
  tri-state, 15
calculation
  logical, 8
Calculator
  Oxford, 8
cascade, 328–330

check parity, 266 self-inversion, 333 checking equivalence, 277, 327 model, 277 circuit, 340, 348, 350, 352, 353 FOURIER, 368 IDENTITY, 359 Boolean, 327 classical, 361 digital, 309 entangled, 372, 381 four-variable, 349 garbled, 242 integrated, 5 LNN, 381 low power, 270 minimal, 342, 343, 347 gate count, 343, 347, 348 palindromic, 348 multiple-qubit, 360 non-classical, 361 non-entangled, 381 optimal, 342, 343 pattern-specific, 195 quantum, 358, 360, 361, 369–371 arbitrary, 358, 368 semi-classical, 371, 377 synthesis, 371 reversible, 327–330, 334, 336, 339–343, 346, 352, 358, 369 (n,n), 347 classical, 358, 377 cost, 342 sequential, 195 switching, 9

Index

synchronous, 198 synthesized, 342 three-qubit, 375 minimal, 378 three-variable, 349 Toffoli, 339 transistor-based, 9 tri-state, 15 voltage-mode, 21 XU, 361 ZU, 361 circuitry analog, 5 current-mode, 21 digital, 5 class function, 331, 332, 341 Boolean, 332 reversible, 339 classification Boolean functions, 309 co-factor, 272, 273, 276–278, 283–285 negative, 272 positive, 272 code linear, 264 separable, 264 coefficient Chrestenson, 310 Haar, 283–285 spectral, 270, 279, 281 Walsh, 280, 310 cofactor, 246 commutative, 13 commutativity, 79 comparator, 198, 202 complexity, 213, 215, 327 circuit, 327 reversible, 327, 338,

Index

341 computational, 78 exponential, 20 worst-case, 42 computation quantum, 327 computer, 368 quantum, 368 hardware, 368 reversible, 358 computing classical, 358 conventional, 358 low-power, 327 quantum, 4, 342, 357, 358, 361 reversible, 358 classical, 358, 361 concept syllogistic, 7 conclusion, 7 conductor, 22 confidence level, 171 confusion, 233 conjunction, 329 consensus, 276, 277 construction function, 332 characteristic, 332, 333 contradiction, 354 controller, 198 correlation, 283 cost function, 342 gate count, 350 minimal, 342 quantum, 342, 343, 346, 371, 374 two-qubit, 374, 376–382 unit, 376


cost function, 241
cover, 273
    cube, 274
crossover, 22, 45–47
cryptography, 79, 310
cube, 274
    disjoint, 275
    partially redundant, 275
    totally redundant, 275
cube cover, 274
cube list, 274
cycle, 344, 345
    disjoint, 344
    single, 355
Davio
    negative, 329
    positive, 329
decomposition, 272, 357, 362, 364, 366, 367, 369, 377
    NCV, 371
    dual, 365
    Shannon, 283
    singular value, 12
derivative
    Boolean, 272, 276, 278
        simple, 277
design
    low-power, 342
determinate, 11
device
    mixed-signal, 4
diffusion, 233
Dirac
    Paul, 12
disjoint, 274
distance
    Hamming, 345, 355
distributivity, 79


domain
    spectral, 5
don’t care, 15, 40
    input, 15
    internal, 15
    output, 15
duality, 333
eigenvalue, 11
eigenvector, 11
electronic
    conventional, 4
Electronic Design Automation, 167
    ABC tool, 177
    robustness, 167, 177
embedding
    optimal, 341
encoding
    binary, 209–211, 217
    one-hot, 200, 206, 209–211, 214, 216–218
equivalent
    decimal, 344
error
    2-bit, 265, 267, 268
    asymmetric, 268
    combinatorial, 256
    double-bit, 263, 264
    functional, 258
    least-squared, 33, 34
    unidirectional, 257, 268
estimation
    power dissipation, 270
expansion
    Shannon, 206, 272
expectation, 172
experimental algorithm
    qualitative, 171


experimental evaluation
    quantitative, 171
exploitation, 77, 79, 83, 90, 91, 94
exploration, 77–79, 83–85, 87, 88, 90, 91, 94
explosion
    state-space, 205
expression
    ESOP, 336, 339
        linear size, 339
    multivariate, 77
    polynomial, 311
    PPRM, 335
factor
    switching activity, 270
factorization, 44
    partial syntactic, 79
false, 7
fan-in, 21
fan-out, 20
fault
    bridging, 258, 259
    stuck-at-0, 257, 258
fault model
    stuck-at, 257, 258
finite automaton, 191, 199
    deterministic, 192
    non-deterministic, 192
FORM, 76
form
    complemented, 282
    ESOP, 336
    non-complemented, 282
    shortened, 345
formula
    propositional, 328
framework
    analysis, 341
    application, 341
function
    p-valued, 310, 311, 314, 316, 317
        n-place, 316–318
    balanced, 309, 332, 344, 352
    bent, 310, 316, 318, 324, 332
        n-place, 316
        p-valued, 314, 316
        binary, 316
        Boolean, 310
        five-valued, 310
        one-place, 316
        ternary, 310, 320, 324
        two-place, 320
    bijective, 327, 340, 344
    binary, 316
    Boolean, 12, 16, 309, 327, 329, 331, 333, 336, 343, 344
        (n,p), 343
        most non-linear, 310
        multiple-output, 327, 328, 340
        reversible, 343–345
    characteristic, 331–333
        class, 331
    classical, 376, 378, 379
    completely specified, 271
    constituent, 279–285
    coordinate, 343–345, 354
    cost, 249, 346
        gate count, 346
    covering, 273
    Horn, 332
    identity, 334
    irreversible, 327, 343
    Krom, 332
    linear, 309
    logic, 205
    Miller, 350
        generalized, 350, 355
    monotone, 309, 332
    multiple-output, 327, 331
    multiple-valued, 309, 310, 324
    non-bent, 320
        ternary, 320–323
    non-linear, 309, 310
    objective, 247
    one-place, 311
    probability mass, 271
    reversible, 327–334, 336, 340, 342–347, 349, 350, 370, 371
        binary, 376
        classical, 376, 379
        four-variable, 343, 347
        random, 343
        self-inverse, 345, 348
        three-variable, 343, 348
    self-dual, 331, 332
    self-inverse, 343, 345
    semi-classical, 376, 378
    state transition, 192
    switching, 16
    symmetric, 309, 332
    ternary, 319–323
        two-place, 324
    three-qubit, 376
    three-variable, 350
    threshold, 309
        p-valued, 310
        binary, 310
        ternary, 309, 310
    transfer, 17
    unbalanced, 353, 354


Galois field, 259, 310, 324
gate
    AND, 16, 40
    CNOT, 342, 346, 381
    EXOR, 338
    HADAMARD, 366–368
        H, 367
    MCT, 330, 334–336, 338
    MPMCT, 330, 336, 338
    NAND, 369
    NCT, 335
    NCV, 369, 371, 374–376, 378, 379, 381, 382
        two-qubit, 371
    NEGATOR, 359, 360
        controlled, 358, 360
        uncontrolled, 368
    NOT, 40, 342, 346, 351–353, 359
        controlled, 368
    OR, 40
    PHASE, 359
    PHASOR, 359, 363
        controlled, 358, 360, 368
        single-controlled, 368
        uncontrolled, 368
    ROTATOR, 357
        controlled, 358
    S, 360
    T, 360, 368
    V, 360
    W, 360
    XOR, 16
    X, 360
    Z, 360
    classical, 368
    controlled-V, 370
    controlled-V†, 370
    entangled, 380
    Fredkin, 371, 373, 374
    library, 342
    logic, 16, 20, 359, 369
        classical, 359
        quantum, 359
    multiple-controlled, 368
    quantum, 346, 369
        elementary, 369
        semi-classical, 374
        universal, 369
    reversible, 329, 345, 346, 371
        classical, 371
    self-inverse, 370
    single-bit, 358
    single-qubit, 357, 370, 374, 377, 378
        entangled, 377
        semi-classical, 377
    single-target, 327–330, 333–336, 338–340
        reversible, 329
    Toffoli, 328, 329, 335–339, 342, 346, 371
        generalized, 342, 346
        multiple control, 345
    two-qubit, 370, 373, 374, 376–378
        NCV, 373
        entangled, 376, 377
        restricted, 378, 379
        semi-classical, 375–377, 379
gate count, 342, 343, 346–351, 353–355
Gramian, 33, 34, 37
graph
    error, 255–263, 265, 267
group
    (n − 1)-dimensional, 364
    (n − 1)^2-dimensional, 363
    continuous, 359
        Lie, 359
    (2^w − 1)-dimensional, 368
    one-dimensional, 359
    matrix, 358
        finite, 358
        infinite, 358
    multiplicative, 317
    parity, 266
    permutation, 361
        finite, 361
    symmetric, 333, 334
    unitary, 360
        infinite, 361
        U(2^w), 360
        U(n), 360, 361
        XU(2^w), 360
        XU(n), 361
        ZU(n), 361, 364
Hamming weight, 266
hardware, 195
    pattern-independent, 202
hierarchy
    group, 358
Hilbert Space, 10, 11, 13–16
Hobbes
    Thomas, 8
Horner scheme, 77, 79–81
Horner’s rule, 77, 79–81
idempotence, 8
impedance
    high, 15, 21
index
    cut, 46, 47
    input, 350
    output, 350


inequality
    Boolean, 55
inference, 7, 9
infimum, 58
information, 9
    correlation, 283
    totally global, 283
input
    binary, 374
integrated circuit
    mixed-signal, 6
intelligence
    artificial, 78
involution, 333
isomorphism, 310
justification, 26, 33
ket, 12
key
    cryptographic, 242
Kleene closure, 191, 199
Kronecker product, 13, 280, 312–314, 317
levelization, 45
library
    gate, 329, 338, 368, 376
    NCT, 335, 342
    NCV, 369, 371, 373, 379
        entangled, 377
    most common, 329
    restricted, 377
    semi-classical, 377
    two-qubit, 379
        restricted, 379
        semi-classical, 379
line
    ancilla, 330
    cut, 46, 47
    first, 340
    target, 336, 338, 340
literal, 329
literals, 335
Llull
    Raymond, 8
logic, 7, 9
    adiabatic, 4
    Boolean, 9
    classical, 7
    computational, 78
    digital, 9
    fuzzy, 4
    multiple-valued, 4
    quantum, 4, 12, 17, 32
    reversible, 4
    symbolic, 78
Łukasiewicz
    Jan, 4
Maiorana class, 316–318, 324
manipulation
    logic symbolic, 9
mapping
    ESOP, 339
matching
    one-character, 199
    regular expression, 189–192, 194–197, 199, 200, 202–206, 209, 211–214, 216, 218, 219
        pattern-independent, 196
        pattern-specific, 195
    string, 195, 201, 202
matching engine
    pattern-independent, 196, 197
    pattern-specific, 196, 197
matrix
    arbitrary, 365
        U(2^w), 365
    basis, 312
    block-diagonal, 360, 364, 365, 367
    circulant, 365
    cyclic-shift, 362
    Dirac-delta, 18
    Gramian, 12
    Haar, 283
    Hadamard, 363
        generalized, 363, 364
    identity, 11, 283
    justification, 37–39
    left, 363
    non-singular, 310
    non-square, 11
    P, 362
    permutation, 358, 359
        circulant, 362
    pseudo-inverse, 33
    right, 363
    square, 12, 33
        Gram, 12, 33, 34
    transfer, 5, 12, 17–32, 34, 35, 37, 39, 41–49
        factored, 44
    transform, 283, 286, 312
        Reed-Muller, 312
    transformation, 17, 279, 280, 282
    transpose, 33
    unit, 360, 364
    unit-line-sum, 360
    unitary, 17, 32, 357–361, 369, 373, 374
        2 × 2, 357
        arbitrary, 364
        diagonal, 361
        ZU(2^w), 361
    XU, 362
    ZU, 362
maximization, 172
maximum
    simple, 277
maxterm, 34
mean, 174
measure
    Hamming, 345, 346, 350, 352
        vector, 345
mechanics
    quantum, 4, 32
method
    overarching, 6
    spectral, 5
    synthesis, 358
minimization
    ESOP, 247
minimum
    simple, 277
minterm, 34, 274–276
model
    complex-valued, 6
    continuous, 6
    error, 255, 256, 268
        graph, 256
    Hilbert Space, 4
    mathematical, 5
    qubit, 4
    stochastic, 172
    switching algebraic, 30
    switching theory, 5
    vector, 4
    vector space, 4–7, 15, 16, 30, 32, 33, 38–40, 44, 49
model check
    symbolic, 205
Moore-Penrose, 17, 33


nanotechnology, 342
netlist
    structural, 44
network
    communication, 9
    contradictory, 31
    irreversible, 33
    quantum, 32
    reversible, 32, 33
    switching, 9
    tautological, 31
Neumann
    John von, 4
notation
    bra-ket, 13
    standard, 333
number
    natural, 333
    polarity, 282
operation
    IDENTITY, 358
    additive logical, 8
    AND-type, 21
    conjunctive, 41
    disjunctive, 41
    functional, 212, 213
        ITE, 213
    mathematical, 8
    multiplicative logical, 8
    negation, 13
    projection, 212, 214, 215
    relational, 213, 214
    restriction, 212–214
    rotation, 11
    scaling, 11
    single-qubit, 359
        one-parameter, 359
operator, 276
    atomic, 19
    consensus, 278
    multiplicative, 16, 17
    smoothing, 276, 277
optimization
    combinatorial, 78
    local, 342
order
    dyadic, 279
    revised sequency, 279
    sequency, 279
    subsequent, 339
    tensor, 10
output
    non-entangled, 374
    vector, 280

palindrome, 320
permutation, 317, 333, 344, 345
    explicit, 333
    self-inverse, 333
pipeline, 199
Plato, 7
Post
    Emil, 4
predicate, 7
premise, 7
probability
    absolute, 286
    circuit output, 269–271
    conditional, 273, 274, 276, 285, 286
    conditional output, 271
    output, 270–272, 278, 279, 282, 285, 286
    signal switching, 270
problem
    Boolean, 6, 78
    intractable, 6
    justification, 33, 34
    linear algebraic, 6
    open, 341
    research, 339
product
    Cartesian, 10
    inner, 13
    Kronecker, 280
    outer, 13
    scalar, 280
    tensor, 370
    vector-matrix, 279
proposition, 7
    binary-valued, 9
    two-state, 9
pseudo-inverse, 33–35, 37

quantification
    existential, 213, 277
    universal, 277
quantifier
    existential, 276
    universal, 276
quantum bit, 4
Quantum Field Theory, 77
qubit, 4, 357, 368
qubits, 358
reduction
    cost, 247
redundant
    partially, 274
    totally, 274
register, 198
regular expression, 190, 192–199, 202, 203, 209
regular set, 190
regularity
    structural, 347
relation, 205, 209
    state transition, 192, 201
relay
    electrical, 8
representation
    cycle, 333
    permutation, 333
    PPRM, 335
response
    multiple output, 28
    total network, 28
    total output, 29
reversible
    logically, 32
rotation
    x-axis, 357
    y-axis, 357
    z-axis, 357
    physical, 357
row vector, 280
schematic
    quantum, 362
search
    breadth-first, 66
    depth-first, 66
    heuristic, 66
self-inverse, 333
self-inversion, 333
set, 205, 209
    cover, 271
    finite, 59, 192
    support, 273, 274, 276
Shannon, 329
    Claude, 8
Shannon expansion, 272
signal
    input, 342
    output, 342
significance
    statistical, 171
simplification
    expression, 78
simulated annealing, 184
simulation, 79
Sinkhorn theorem, 362
size
    linear, 339
    quadratic, 339
smoothing, 276, 277
solution
    multiple, 33
    single, 33
spectrum
    Γ, 319
    bent, 319, 324
    Boolean function, 269
    Chrestenson, 309
    Haar, 283
        modified, 283
    Hadamard-Walsh, 279
    inverse, 311
    Maiorana, 319
    Paley-Walsh, 280
    Rademacher-Walsh, 280
    Reed-Muller, 270, 282, 311, 313, 316, 318, 320, 324
        bent, 319, 320, 324
        Maiorana, 318
    Vilenkin-Chrestenson, 314, 324
    Walsh, 270, 280, 282, 309
spinor, 357
state
    accepting, 192
    basis, 370
        computational, 370, 372
    composite, 371
    entangled, 370, 371
    initial, 192
    separable, 370, 371
    single-qubit, 370
    steady, 7
    two-qubit, 370, 378
stochastic model
    confidence in, 182
    construction, 172
    EDA tool performance, 179
    Gaussian Mixtures, 172
    random MAX-3SAT valuation, 181
    simulated annealing performance, 184
string, 201
structure
    factored-form, 44
    palindromic, 348
    similar, 348
        cycle, 348
subgroup, 358, 360, 361, 368
    (j − 1)-dimensional, 365
        XU(n), 365
    Young, 334
subject, 7
sum
    tensor, 314–319, 324
supergroup, 361
supersymmetry, 76
supremum, 58
switching theory, 3, 6, 9
    classical, 4
syllogism, 7
synthesis, 241, 272, 342, 357, 358, 365, 366
    algorithm, 342, 343
    circuit, 342
        reversible, 342
    logic, 333, 342
        algorithm, 342
        classical, 342, 343
        reversible, 333, 342
    method, 350
system
    computer algebra, 78
    over-constrained, 33
    under-constrained, 33
systolic array, 198, 202
technique
    algorithmic, 8
    mechanized, 8
    spectral, 310
techniques
    spectral, 350
tensor, 10, 13
    second-order, 10
    unity-order, 10
    zero-order, 10
term
    product, 329, 335, 336, 339
test, 272
    random, 269
theorem
    Bayes, 273
    no-cloning, 21
theory
    linear system, 17
    probability, 273
throughput, 203
tool
    EDA, 6
    mixed-signal, 6
transform
    Fourier, 364, 365
        discrete, 364
    Haar, 270, 283, 285
        modified, 283
        wavelet, 283
    Reed-Muller, 285, 311, 324
    Vilenkin-Chrestenson, 324
    Walsh, 283, 285
    wavelet, 270
transformation
    unitary, 357, 370
        basic, 357
transpose, 33
transposition, 345, 350
    single, 350
true, 7
truncation, 172
truth table, 12, 30, 35, 269, 270, 332, 334, 344, 345, 347, 352
    encrypted, 242
    garbled, 242, 243


unate, 54, 98, 99
value
    steady-state, 8
variance, 174
vector
    basic, 14
    binary, 344
    function, 283
    null, 15, 32
    output, 279, 283
    row, 283, 286, 312
    total, 15, 32
    unit, 357
    value, 310, 311
vector space, 10
    complex, 370
    method, 3
verification
    formal, 272, 277