Use of abstraction and logic in mathematics 1774695006, 9781774695005

“The Use of Abstraction and Logic in Mathematics” is an edited book consisting of 16 contemporaneous open-access article

English Pages 422 [424] Year 2022


Table of contents :
Cover
Title Page
Copyright
DECLARATION
ABOUT THE EDITOR
TABLE OF CONTENTS
List of Contributors
List of Abbreviations
Preface
Chapter 1 Classical Logic and Quantum Logic with Multiple and Common Lattice Models
Abstract
Introduction: Is Logic Empirical?
Kinds of Logic
Lattices
Soundness and Completeness
Discussion
Acknowledgments
References
Chapter 2 A Novel Categorical Approach to Semantics of Relational First-Order Logic
Abstract
Introduction
A Relational First-Order Logic
Category Theory
A Categorical Semantics
An Implementation of the Categorical Semantics
Conclusions
Author Contributions
Funding
References
Chapter 3 Infinitary Classical Logic: Recursive Equations and Interactive Semantics
Introduction
Preliminaries: Positions and Labeled Trees
Infinitary Classical Logic
Interactive Semantics
References
Chapter 4 Formalization of Linear Space Theory in the Higher-Order Logic Proving System
Abstract
Introduction
Preliminaries in HOL
Formalization of Linear Space Theory in HOL4
Conclusion
Acknowledgment
References
Chapter 5 Language and Proofs for Higher-Order SMT (Work in Progress)
Introduction
A Syntax Extension for the SMT-LIB Language
An Extension for the veriT Proof Format
Conclusion and Future Work
Acknowledgment
References
Chapter 6 Alternation Is Strict For Higher-Order Modal Fixpoint Logic
Introduction
Alternating Parity Krivine Automata
APKA and HFL
The Alternation Hierarchy for Alternating Parity Krivine Automata
Discussion
Acknowledgements
References
Chapter 7 Bisimulation in Inquisitive Modal Logic
Introduction
Inquisitive Modal Logic
Inquisitive Bisimulation
An Ehrenfeucht–Fraïssé Theorem
Relational Inquisitive Models
The ~-Invariant Fragment of FO
Conclusion
References
Chapter 8 Graphical Sequent Calculi for Modal Logics
Introduction
The Syntax of Modal Graphs
The Graphical Calculi Kg
Extensions
Graphical and Sequent Calculi
Conclusion
Acknowledgements
References
Chapter 9 Categorical Abstract Algebraic Logic: Meet-Combination of Logical Systems
Abstract
Introduction
Basic Framework
Meet-Combinations
Soundness
c-Completeness
Conservativeness and Consistency
Examples from Classical Propositional Logic
References
Chapter 10 Fuzzy Logic versus Classical Logic: An Example in Multiplicative Ideal Theory
Abstract
Introduction
Preliminaries and Notations
Fuzzy Logic versus Classical Logic: An Example
References
Chapter 11 Link Prediction Using A Probabilistic Description Logic
Abstract
Introduction
Background
Link Prediction with CRALC
Experiments
Conclusion
Acknowledgments
References
Chapter 12 Reasoning about Social Choice and Games in Monadic Fixed-Point Logic
Introduction
The Improvement Graph Structure
Monadic Fixed-Point Logic With Counting
Model Checking Algorithm
Discussion
Acknowledgements
References
Chapter 13 Formal Analysis of 2D Image Processing Filters using Higher-order Logic Theorem Proving
Abstract
Introduction
Contributions of the Paper
Preliminaries
Methods
Results
Discussions
Conclusions
Acknowledgements
References
Chapter 14 GRAN3SAT: Creating Flexible Higher-Order Logic Satisfiability in the Discrete Hopfield Neural Network
Abstract
Introduction
G-Type Random K Satisfiability
GRAN3SAT in the Discrete Hopfield Neural Network
Experimental Setup
Results and Discussion
Conclusions
Author Contributions
Acknowledgments
References
Chapter 15 Design of a Computable Approximate Reasoning Logic System for AI
Abstract
Introduction
Mathematical Logic System Based on Precise Reasoning
Irrationality of a Fuzzy Logic System
Preliminary Knowledge of a Regression Logic Route in Fuzziness Research
Redundancy Theory: Computable Logic, Approximate Reasoning Logic
Generalized Dynamic Logic System Characterized By Machine Learning
Conclusions
Author Contributions
References
Chapter 16 On the Possibility of Correct Concept Learning in Description Logics
Abstract
Introduction
Notation and Semantics of Description Logics
Concept Normalization
A Concept Learning Algorithm
C-Learnability in Description Logics
On Concept Learning Using Queries
Concluding Remarks
Acknowledgements
References
Index
Back Cover

Use of Abstraction and Logic in Mathematics

Edited by: Olga Moreira

ARCLER Press

www.arclerpress.com

Use of Abstraction and Logic in Mathematics Olga Moreira

Arcler Press 224 Shoreacres Road Burlington, ON L7L 2H2 Canada www.arclerpress.com Email: [email protected]

e-book Edition 2023
ISBN: 978-1-77469-700-9 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material sources are indicated. Copyright for individual articles remains with the authors as indicated and published under Creative Commons License. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data, and views articulated in the chapters are those of the individual contributors and not necessarily those of the editors or publishers. Editors or publishers are not responsible for the accuracy of the information in the published chapters or the consequences of their use. The publisher assumes no responsibility for any damage or grievance to persons or property arising out of the use of any materials, instructions, methods, or thoughts in the book. The editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify.

Notice: Registered trademarks of products or corporate names are used only for explanation and identification without intent of infringement.

© 2023 Arcler Press
ISBN: 978-1-77469-500-5 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com

DECLARATION

Some content or chapters in this book are open-access, copyright-free published research works, which are published under Creative Commons License and are indicated with the citation. We are thankful to the publishers and authors of the content and chapters, as without them this book would not have been possible.

ABOUT THE EDITOR

Olga Moreira holds a Ph.D. and M.Sc. in Astrophysics and a B.Sc. in Physics/Applied Mathematics (Astronomy). She is an experienced technical writer and data analyst. As a graduate student, she held two research grants to carry out her work in Astrophysics at two of the most renowned European institutions in the fields of Astrophysics and Space Science (the European Space Agency and the European Southern Observatory). She is currently an independent scientist, peer reviewer, and editor. Her research interests are solar physics, machine learning, and artificial neural networks.

LIST OF CONTRIBUTORS

Mladen Pavičić
Department of Physics-Nanooptics, Faculty of Mathematics and Natural Sciences, Humboldt University of Berlin, Berlin, Germany
Center of Excellence for Advanced Materials and Sensing Devices (CEMS), Photonics and Quantum Optics Unit, Ruđer Bošković Institute, Zagreb, Croatia

Wolfgang Schreiner
Research Institute for Symbolic Computation (RISC), Johannes Kepler University, Altenbergerstraße 69, A-4040 Linz, Austria

William Steingartner
Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia

Valerie Novitzká
Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia

Michele Basaldella
Université d’Aix–Marseille, CNRS, I2M, Marseille, France

Jie Zhang
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

Danwen Mao
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

Yong Guan
College of Information Engineering, Capital Normal University, Beijing 100048, China

Haniel Barbosa
University of Lorraine, CNRS, Inria, and LORIA, Nancy, France

Jasmin Christian Blanchette
University of Lorraine, CNRS, Inria, and LORIA, Nancy, France
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Max-Planck-Institut für Informatik, Saarbrücken, Germany

Simon Cruanes
University of Lorraine, CNRS, Inria, and LORIA, Nancy, France

Daniel El Ouraoui
University of Lorraine, CNRS, Inria, and LORIA, Nancy, France

Pascal Fontaine
University of Lorraine, CNRS, Inria, and LORIA, Nancy, France

Florian Bruse
Universität Kassel, Kassel, Germany

Ivano Ciardelli
Institute for Logic, Language, and Computation, University of Amsterdam

Martin Otto
Department of Mathematics, Logic Group, Technische Universität Darmstadt

Minghui Ma
Institute of Logic and Cognition, Sun Yat-Sen University, Guangzhou, China

Ahti-Veikko Pietarinen
Chair of Philosophy, Tallinn University of Technology, Tallinn, Estonia

George Voutsadakis
School of Mathematics and Computer Science, Lake Superior State University, Sault Sainte Marie, MI 49783, USA

Olivier A. Heubo-Kwegna
Department of Mathematical Sciences, Saginaw Valley State University, 7400 Bay Road, University Center, MI 48710-0001, USA

José Eduardo Ochoa Luna
Escola Politécnica, Universidade de São Paulo, Av. Prof. Mello Morais 2231, São Paulo, SP, Brazil

Kate Revoredo
Departamento de Informática Aplicada, Unirio, Av. Pasteur, 458, Rio de Janeiro, RJ, Brazil

Fabio Gagliardi Cozman
Escola Politécnica, Universidade de São Paulo, Av. Prof. Mello Morais 2231, São Paulo, SP, Brazil

Ramit Das
IMSc (HBNI), Chennai, India

R. Ramanujam
IMSc (HBNI), Chennai, India

Sunil Simon
Department of CSE, IIT Kanpur, Kanpur, India

Adnan Rashid
School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan

Sa’ed Abed
Computer Engineering Department, College of Engineering and Petroleum, Kuwait University, Kuwait City, Kuwait

Osman Hasan
School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan

Yuan Gao
School of Mathematical Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia
School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu 610000, China

Yueling Guo
School of Mathematical Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia
School of Science, Hunan Institute of Technology, Hengyang 421002, China

Nurul Atiqah Romli
School of Mathematical Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia

Mohd Shareduwan Mohd Kasihmuddin
School of Mathematical Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia

Weixiang Chen
School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu 610000, China

Mohd. Asyraf Mansor
School of Distance Education, Universiti Sains Malaysia, Penang 11800, Malaysia

Ju Chen
School of Mathematical Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia
School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu 610000, China

Kaidi Liu
Institute of Uncertainty Information, Hebei University of Engineering, Handan 056038, China

Yancang Li
Institute of Uncertainty Information, Hebei University of Engineering, Handan 056038, China

Rong Cui
Institute of Uncertainty Information, Hebei University of Engineering, Handan 056038, China

Ali Rezaei Divroodi
Education Department, Enghelab 2, 46731-83354 Noshahr, Iran

Quang-Thuy Ha
Faculty of Information Technology, VNU University of Engineering and Technology, 144 Xuan Thuy, Hanoi, Vietnam

Linh Anh Nguyen
Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, No. 19, Nguyen Huu Tho Street, Tan Phong Ward, District 7, Ho Chi Minh City, Vietnam
Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland

Hung Son Nguyen
Division of Knowledge and System Engineering for ICT, Ton Duc Thang University, No. 19, Nguyen Huu Tho Street, Tan Phong Ward, District 7, Ho Chi Minh City, Vietnam

LIST OF ABBREVIATIONS

APKA      Alternating Parity Krivine Automata
ATL       Alternating temporal logic
AI        Artificial Intelligence
ANN       Artificial Neural Networks
CCTV      Closed-circuit television
DLs       Description logics
DHN       Discrete Hopfield Network
DHNN      Discrete Hopfield Neural Network
ES        Exhaustive Search Algorithm
FBRP      Finite best response property
FIP       Finite improvement property
FOL       First-order logic
FR        Fuzzy reasoning
GRANkSAT  G-Type Random k Satisfiability
HO        Higher-order
HFL       Higher-Order Modal Fixpoint Logic
HNN       Hopfield Neural Network
HTAF      Hyperbolic Tangent Activation Function
IIR       Infinite impulse response
IEL       Inquisitive epistemic logic
LCCDE     Linear constant coefficient difference equation
MAPE      Mean absolute percent error
MLFP      Monadic least fixed point logic
NN        Neural networks
OML       Orthomodular lattice
RBFNN     Radial Basis Function Neural Network
ROC       Region of convergence
RMSE      Root mean square error
SMT       Satisfiability modulo theories
TFIDF     Term Frequency-Inverse Document Frequency

PREFACE

Abstraction and logic are the core foundations of mathematics. One of the most important aspects of mathematics is the formulation and proving of theorems. Logic, as a foundation of mathematics, provides a language for formulating theorems and for constructing mathematical proofs. There are many different logics, and they differ in how mathematics can be expressed in them. In propositional logic, for instance, the fundamental logical units are declarative statements (propositions) that are either true or false. This is the simplest and oldest form of mathematical logic. It was developed to deal with relations between propositions, and it allows the construction of compound propositions by introducing logical connectives such as conjunction (“and”), disjunction (“or”), negation (“not”), and the conditional (“if ... then”). First-order logic, in addition to those, also covers predicates and quantification. It uses variables as well as quantifiers and can deal with non-logical objects. Higher-order logic extends the capabilities of first-order logic by having stronger semantics. It features higher-order predicates and, unlike first-order logic, allows quantification over predicates and/or functions. Logic systems constitute a powerful method for representing knowledge and formalizing natural language into a computable format. They can be implemented in software and hardware, and they are the basis of computer automation. We can now program computers to assist us in solving mathematical and scientific problems. The implementation and development of logical systems have enabled important breakthroughs in many fields of science and engineering, and have paved the way for the emergence of data science, machine learning, artificial intelligence, and more. Stronger logic systems have also been developed; infinitary, modal, and quantum logical systems are a few examples.
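As a small illustration of the connectives and quantifiers mentioned above, the following Python sketch enumerates truth tables and checks one tautology (De Morgan's law). It is not taken from any chapter of the book; the function names (`conj`, `disj`, `neg`, `implies`, `truth_table`, `is_tautology`) and the finite domain are assumptions chosen for illustration, and the quantifier part only mimics first-order quantification over a finite set.

```python
from itertools import product

# Propositional connectives as Boolean functions (illustrative names).
def conj(p, q):
    return p and q

def disj(p, q):
    return p or q

def neg(p):
    return not p

def implies(p, q):
    # Material conditional: "if p then q" is false only when p is true and q is false.
    return (not p) or q

def truth_table(formula, arity):
    """List (assignment, value) pairs over all True/False assignments."""
    return [(vals, formula(*vals)) for vals in product([True, False], repeat=arity)]

def is_tautology(formula, arity):
    """A formula is a tautology if it is true under every assignment."""
    return all(value for _, value in truth_table(formula, arity))

# De Morgan's law: not (p and q) is equivalent to (not p) or (not q).
def de_morgan(p, q):
    return neg(conj(p, q)) == disj(neg(p), neg(q))

# First-order flavour: a predicate and quantifiers over a finite domain.
domain = range(1, 6)

def is_even(n):
    return n % 2 == 0

exists_even = any(is_even(n) for n in domain)  # "there exists n such that n is even"
all_even = all(is_even(n) for n in domain)     # "for all n, n is even"

if __name__ == "__main__":
    for vals, value in truth_table(implies, 2):
        print(vals, value)
    print("De Morgan is a tautology:", is_tautology(de_morgan, 2))
    print("exists even:", exists_even, "| all even:", all_even)
```

Exhaustive enumeration works here because a propositional formula with n variables has only 2^n assignments. Quantifiers over infinite domains cannot be checked this way, which is one reason first-order and higher-order logics call for proof systems rather than truth tables.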
This book is focused on the description and implementation of classical and non-classical logical systems, as well as their applications. The first part of the book (chapters 1 to 8) is devoted to describing the implementation and construction of several classical and non-classical logic systems (first-order, higher-order, quantum, infinitary, fixed-point, and modal). Chapter 1 describes the classical and quantum logic systems and shows that both can be used to construct sound and complete meta-structures for dealing with different algebras, and lattices, as their models. Chapter 2 presents a categorical formalization of a variant of first-order logic. Chapter 3 presents an interactive semantics method for derivations in an infinitary logic system. Chapters 4 and 5 are focused on higher-order logic. Chapter 4 presents an implementation of a higher-order logic system (HOL4) for formalizing the linear space theory, while chapter 5 presents the development of an extension of the SMT-LIB language for handling higher-order logic constructs. Chapters 6 to 8 are devoted to fixed-point and modal logic systems. Chapter 6 provides operational semantics to Higher-Order Modal Fixpoint Logic (HFL) in the context of extending parity automata. Chapter 7 presents an inquisitive modal logic system, INQML. Chapter 8 presents a reformulation of graphical calculi for the construction of modal logics.

The second part of the book (chapters 9 to 10) is devoted to comparing classical logic to algebraic and fuzzy logic systems. Chapter 9 explores a novel method of combining logics, called meet-combination, and its role in the construction of a categorical abstract algebraic logic system. Chapter 10 includes an example that shows how a classical argument can fail to work when switching from classical logic to fuzzy logic.

The third part of the book (chapters 11 to 16) includes several applications of abstract logical systems in the fields of data science, social network analysis, neural networks, automata, and artificial intelligence. Chapter 11 proposes a few algorithms for link prediction that combine graph-based and ontological information through the use of probabilistic description logics. Chapter 12 proposes a method for counting user votes based on monadic fixed-point, first-order logic systems. Chapter 13 proposes the use of a higher-order logic theorem proving system for formally analyzing 2-D image processing filters. Chapter 14 presents GRAN3SAT, which capitalizes on both higher-order systematic and non-systematic logical rules in Discrete Hopfield Neural Networks (DHNNs). Chapter 15 proposes a generalized dynamic logic system characterized by machine learning for building an AI. Chapter 16 shows that description logic can be used for concept learning in an information system.

Chapter 1

CLASSICAL LOGIC AND QUANTUM LOGIC WITH MULTIPLE AND COMMON LATTICE MODELS

Mladen Pavičić1,2

1 Department of Physics-Nanooptics, Faculty of Mathematics and Natural Sciences, Humboldt University of Berlin, Berlin, Germany
2 Center of Excellence for Advanced Materials and Sensing Devices (CEMS), Photonics and Quantum Optics Unit, Ruđer Bošković Institute, Zagreb, Croatia

ABSTRACT

We consider a proper propositional quantum logic and show that it has multiple disjoint lattice models, only one of which is an orthomodular lattice (algebra) underlying Hilbert (quantum) space. We give an equivalent proof for the classical logic which turns out to have disjoint distributive and nondistributive ortholattices. In particular, we prove that both classical logic and quantum logic are sound and complete with respect to each of these lattices. We also show that there is one common nonorthomodular lattice that is a model of both quantum and classical logic. In technical terms, that enables us to run the same classical logic on both a digital (standard, two-subset, 0-1-bit) computer and a nondigital (say, a six-subset) computer (with appropriate chips and circuits). With quantum logic, the same six-element common lattice can serve us as a benchmark for an efficient evaluation of equations of bigger lattice models or theorems of the logic.

Citation (APA): Pavičić, M. (2016). Classical logic and quantum logic with multiple and common lattice models. Advances in Mathematical Physics, 2016. (13 pages). Copyright: © Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

INTRODUCTION: IS LOGIC EMPIRICAL?

In his seminal paper “Is Logic Empirical?” [1], Putnam argues that the logic we make use of to handle the statements and propositions of the theories we employ to describe the world around us is uniquely determined by it. “Logic is empirical. It makes ... sense to speak of ‘physical logic.’ We live in a world with a nonclassical logic [of subspaces of the quantum Hilbert space $\mathcal{H}(S)$ which form an orthomodular (non-distributive, non-Boolean) lattice]. Certain statements, just the ones we encounter in daily life, do obey classical logic, but this is so because the corresponding subspaces of $\mathcal{H}(S)$ form a Boolean lattice” [1, Ch. V]. We see that Putnam, in effect, reduces the logic to lattices, while they should only be their models. “[We] just read the logic off from the Hilbert space $\mathcal{H}(S)$” [1, Ch. III]. This technical approach has often been adopted in both classical and quantum logic. In classical logic, it has been known as two-valued interpretation for more than a century. In quantum logic, it was introduced by Birkhoff and von Neumann in 1936 [2] and it is still embraced by many authors [3]. Subsequently, varieties of relational logic formulations, which closely follow lattice ordering relations, have been developed, for example, by Dishkant [4], Goldblatt [5], Chiara [6], Nishimura [7, 8], Mittelstaedt [9], Stachow [10], and Pták and Pulmannová [11]. More recently, Engesser and Gabbay [12] made related usage of nonmonotonic consequence relation, Rawling and Selesnick [13] of binary sequent, Herbut [14] of state-dependent implication of lattice of projectors in the Hilbert space, Tylec and Kuś [15] of partially ordered set (poset) map, and Bikchentaev et al. [16] of poset binary relation. Another version of the Birkhoff-von-Neumann style of viewing propositions as projections in Hilbert space, rather than closed subspaces and their lattices as in the original Birkhoff-von-Neumann paper, has been introduced by Engesser et al. [17].
Recently, other versions of quantum logic have been developed, such as a dynamic quantum logic by Baltag and Smets [18, 19], exogenous quantum propositional logic by Mateus and Sernadas
[20], a categorical quantum logic by Abramsky and Duncan [21, 22], and a projection orthoalgebraic approach to quantum logic by Harding [23]. However, we are interested in nonrelational kinds of logic which combine propositions according to a set of true formulas/axioms and rules imposed on them. The propositions correspond to statements from a theory, say classical or quantum mechanics, and are not directly linked to particular measurement values. Such kinds of logic employ models which evaluate a particular combination of propositions and tell us whether it is true or not. Evaluation means mapping from a set of logic propositions to an algebra, for example, a lattice, through which a correspondence with measurement values emerges, but indirectly. Therefore we shall consider a classical and a quantum logic defined as a set of axioms whose Lindenbaum-Tarski algebras of equivalence classes of expressions from appropriate lattices correspond to the models of the logic. Let us call such a logic an axiomatic logic. An axiomatic logic is a language consisting of propositions and a set of conditions and rules imposed on them called axioms and rules of inference. We shall consider classical and quantum axiomatic logic. We show that an axiomatic logic is wider than its relational logic variety in the sense of having many possible models and not only distributive ortholattice (Boolean algebra) for the classical logic and not only orthomodular lattice for the quantum logic. We shall make use of the PM classical logical system—Whitehead and Russell’s Principia Mathematica axiomatization in Hilbert and Ackermann’s presentation [24] in the schemata form and of Kalmbach’s axiomatic quantum logic [25, 26] (slightly modified by Pavičić and Megill [27, 28]—original Kalmbach axioms A1, A11, and A15 are dropped because they were proven redundant in [29]), as typical examples of axiomatic logic. 
It is well-known that there are many interpretations of the classical logic, for example, two-valued, general Boolean algebra (distributive ortholattice) and set-valued ones [30, Ch. 8, 9]. These different interpretations are tantamount to different models of the classical logic and in this paper and several previous papers of ours we show that they are enabled by different definitions of the relation of equivalence for its different Lindenbaum-Tarski algebras. One model of the classical logic is a distributive numerically valued, mostly two-valued, lattice, while the others are nondistributive nonorthomodular lattices, one of them being the so-called O6 lattice, which can also be given set-valuations [30, Ch. 8, 9].


As for quantum logic, one of its models is an orthomodular lattice, while others are nonorthomodular lattices, one of them being again O6—the common model of both kinds of logic. Within a logic we establish a unique deduction of all logic theorems from valid algebraic equations in a model and vice versa by proving the soundness and completeness of logic with respect to a chosen model. That means that we can infer the distributivity or orthomodularity in one model and disprove them in another by means of the same set of logical axioms and theorems. We can also consider O6 in which both the distributivity and orthomodularity fail; however, particular nondistributive and nonorthomodular conditions pass O6 only to map into the distributivity and orthomodularity through classical and quantum logic in other models of these kinds of logic. We see that logic is at least not uniquely empirical since it can simultaneously describe distinct realities. The paper is organised as follows. In Section 2 we define classical and quantum logic. In Section 3 we introduce distributive (ortho)lattices and orthomodular lattices as well as two nondistributive (one is O6) and four nonorthomodular ones (one is again O6), all of which are our models for classical and quantum logic, respectively. In Section 4, we prove soundness and completeness of classical and quantum logic with respect to the models introduced in Section 3. In Section 5, we discuss the obtained results.

KINDS OF LOGIC

In our axiomatic logic $\mathcal{L}$ the propositions are well-formed formulae (wffs), defined as follows.

We denote elementary, or primitive, propositions by 𝑝0, 𝑝1, 𝑝2,...; we have the following primitive connectives: ¬ (negation) and ∨ (disjunction). 𝑝𝑗 is a wff for 𝑗 = 0, 1, 2, . . .; ¬𝐴 is a wff if 𝐴 is a wff; 𝐴 ∨ 𝐵 is a wff if 𝐴 and 𝐵 are wffs. Operations are defined as follows.

Definition 1 (conjunction). One has
$$A \wedge B \stackrel{\text{def}}{=} \neg(\neg A \vee \neg B). \tag{1}$$

Definition 2 (classical implication). One has
$$A \to_c B \stackrel{\text{def}}{=} \neg A \vee B. \tag{2}$$

Definition 3 (Kalmbach’s implication). One has
$$A \to_3 B \stackrel{\text{def}}{=} (\neg A \wedge B) \vee (\neg A \wedge \neg B) \vee (A \wedge (\neg A \vee B)). \tag{3}$$

Definition 4 (quantum equivalence). One has
$$A \equiv_q B \stackrel{\text{def}}{=} (A \wedge B) \vee (\neg A \wedge \neg B). \tag{4}$$

Definition 5 (classical Boolean equivalence). One has
$$A \equiv_c B \stackrel{\text{def}}{=} (A \to_c B) \wedge (B \to_c A). \tag{5}$$

Connectives bind from weakest to strongest in the order →, ≡, ∨, ∧, ¬.

Let $\mathcal{F}^\circ$ be the set of all propositions, that is, of all wffs. Wffs containing ∨ and ¬ within logic $\mathcal{L}$ are used to build an algebra $\mathcal{F} = \langle \mathcal{F}^\circ, \neg, \vee \rangle$. In $\mathcal{L}$, a set of axioms and rules of inference are imposed on $\mathcal{F}$. From a set of axioms by means of rules of inference, we get other expressions which we call theorems. Axioms themselves are also theorems. A special symbol ⊢ is used to denote the set of theorems. Hence 𝐴 ∈ ⊢ iff 𝐴 is a theorem. The statement 𝐴 ∈ ⊢ is usually written as ⊢ 𝐴. We read this as follows: “𝐴 is provable,” meaning that if 𝐴 is a theorem, then there is a proof of it. We present the axiom systems of our propositional logic in the schemata form (so that we dispense with the rule of substitution).

Definition 6. For $\Gamma \subseteq \mathcal{F}^\circ$ one says that 𝐴 is derivable from Γ and writes $\Gamma \vdash_{\mathcal{L}} A$ or just Γ ⊢ 𝐴 if there is a finite sequence of formulae, the last of which is 𝐴, and each of which is either one of the axioms of $\mathcal{L}$ or is a member of Γ or is obtained from its precursors with the help of a rule of inference of the logic.
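The inductive definition of wffs and the derived connectives can be sketched in a few lines of Python. This is only an illustration: the class names `Var`, `Not`, `Or` and the helpers `conj`, `implies_c`, and `show` are hypothetical, and the derived connectives are assumed to take their usual forms (conjunction via De Morgan, classical implication as ¬𝐴 ∨ 𝐵).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """Primitive proposition p_j."""
    index: int


@dataclass(frozen=True)
class Not:
    """Negation of a wff (primitive connective)."""
    arg: "Wff"


@dataclass(frozen=True)
class Or:
    """Disjunction of two wffs (primitive connective)."""
    left: "Wff"
    right: "Wff"


Wff = Union[Var, Not, Or]


def conj(a: Wff, b: Wff) -> Wff:
    """Derived conjunction, assumed here to be the De Morgan form."""
    return Not(Or(Not(a), Not(b)))


def implies_c(a: Wff, b: Wff) -> Wff:
    """Derived classical implication, assumed here to be (not A) or B."""
    return Or(Not(a), b)


def show(w: Wff) -> str:
    """Render a wff using only the primitive connectives."""
    if isinstance(w, Var):
        return f"p{w.index}"
    if isinstance(w, Not):
        return "¬" + show(w.arg)
    return "(" + show(w.left) + " ∨ " + show(w.right) + ")"


p0, p1 = Var(0), Var(1)
print(show(conj(p0, p1)))       # ¬(¬p0 ∨ ¬p1)
print(show(implies_c(p0, p1)))  # (¬p0 ∨ p1)
```

Every derived connective expands into a term built from ¬ and ∨ alone, which is why the algebra of formulae only needs those two operations.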

Classical Logic

In the classical logic $\mathcal{CL}$, the sign $\vdash_{\mathcal{CL}}$ will denote provability from the axioms and the rule of $\mathcal{CL}$, but we shall omit the subscript when it is obvious from context as, for example, in the following axioms and the rule of inference that define $\mathcal{CL}$.

Axioms

(6)

Rule of Inference (Modus Ponens)

(7)

We assume that the only legitimate way of inferring theorems in $\mathcal{CL}$ is by means of these axioms and the Modus Ponens rule. We make no assumption about valuations of the primitive propositions from which wffs are built but instead are interested in wffs that are valid, that is, true in all possible valuations of the underlying models. Soundness and completeness will show that those theorems that can be inferred from the axioms and the rule are exactly those that are valid.

Quantum Logic

Quantum logic $\mathcal{QL}$ is defined as a language consisting of propositions and connectives (operations) as introduced above and the following axioms and a rule of inference. We will use $\vdash_{\mathcal{QL}}$ to denote provability from the axioms and the rule of $\mathcal{QL}$ and omit the subscript when it is obvious from the context, for example, in the list of axioms and the rule of inference that follow.

Axioms

(8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19)

Rule of Inference (Modus Ponens)

(20)

Soundness and completeness will show that those theorems that can be inferred from the axioms and the rule of inference are exactly those that are valid.

LATTICES

For the presentation of the main result it would be pointless and definitely unnecessarily complicated to work with the full-fledged models, that is, Hilbert space, and the new non-Hilbert models that would be equally complex. It would likewise be too complicated to present complete quantum or classical logic of the second order with all the quantifiers. Instead, we shall deal with lattices and the propositional logic we introduced in Section 2. We start with a general lattice which contains all the other lattices we shall use later on. The lattice is called an ortholattice and we shall first briefly present how one arrives at it starting with Hilbert space.

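A Hilbert lattice is a kind of orthomodular lattice which we define below. In any Hilbert lattice the operation meet, $a \cap b$, corresponds to set intersection, $\mathcal{H}_a \cap \mathcal{H}_b$, of subspaces $\mathcal{H}_a$, $\mathcal{H}_b$ of the Hilbert space $\mathcal{H}$; the ordering relation $a \le b$ corresponds to $\mathcal{H}_a \subseteq \mathcal{H}_b$; the operation join, $a \cup b$, corresponds to the smallest closed subspace of $\mathcal{H}$ containing $\mathcal{H}_a \cup \mathcal{H}_b$; and the orthocomplement $a'$ corresponds to $\mathcal{H}_a^\perp$, the set of vectors orthogonal to all vectors in $\mathcal{H}_a$. Within the Hilbert space there is also an operation which has no parallel in the Hilbert lattice: the sum of two subspaces $\mathcal{H}_a + \mathcal{H}_b$, which is defined as the set of sums of vectors from $\mathcal{H}_a$ and $\mathcal{H}_b$. We also have $\mathcal{H}_a + \mathcal{H}_a^\perp = \mathcal{H}$. One can define all the lattice operations on the Hilbert space itself following the above definitions ($\mathcal{H}_a \cap \mathcal{H}_b = \mathcal{H}_{a \cap b}$, etc.). Thus we have $\mathcal{H}_a \cup \mathcal{H}_b = \overline{\mathcal{H}_a + \mathcal{H}_b} = (\mathcal{H}_a + \mathcal{H}_b)^{\perp\perp} = (\mathcal{H}_a^\perp \cap \mathcal{H}_b^\perp)^\perp$ [33, p. 175], where $\overline{\mathcal{H}_c}$ is the closure of $\mathcal{H}_c$, and therefore $\mathcal{H}_a + \mathcal{H}_b \subseteq \mathcal{H}_a \cup \mathcal{H}_b$. When $\mathcal{H}$ is finite dimensional or when the closed subspaces $\mathcal{H}_a$ and $\mathcal{H}_b$ are orthogonal to each other then $\mathcal{H}_a + \mathcal{H}_b = \mathcal{H}_a \cup \mathcal{H}_b$ [34, pp. 21–29], [25, pp. 66, 67], and [9, pp. 8–16].

The projection associated with $\mathcal{H}_a$ is given by $P_a(x) = y$ for vector $x$ from $\mathcal{H}$ that has a unique decomposition $x = y + z$ for $y$ from $\mathcal{H}_a$ and $z$ from $\mathcal{H}_a^\perp$. The closed subspace belonging to $P$ is $\mathcal{H}_P = \{x \in \mathcal{H} \mid P(x) = x\}$. Let $P_a \cap P_b$ denote a projection on $\mathcal{H}_a \cap \mathcal{H}_b$, $P_a \cup P_b$ a projection on $\mathcal{H}_a \cup \mathcal{H}_b$, and $P_a + P_b$ a projection on $\mathcal{H}_a + \mathcal{H}_b$ if $\mathcal{H}_a \perp \mathcal{H}_b$, and let $P_a \le P_b$ mean $\mathcal{H}_a \subseteq \mathcal{H}_b$. Then $a \cap b$ corresponds to $P_a \cap P_b = \lim_{n\to\infty}(P_a P_b)^n$ [9, p. 20], $a'$ to $I - P_a$, $a \cup b$ to $P_a \cup P_b = I - \lim_{n\to\infty}[(I - P_a)(I - P_b)]^n$ [9, p. 21], and $a \le b$ to $P_a \le P_b$. $a \le b$ also corresponds to either $P_a = P_a P_b$ or $P_a = P_b P_a$ or $P_a - P_b = P_{a \cap b'}$. Two projectors commute iff their associated closed subspaces commute. This means that $a \cap (a' \cup b) \le b$ corresponds to $P_a P_b = P_b P_a$. In the latter case we have $P_a \cap P_b = P_a P_b$ and $P_a \cup P_b = P_a + P_b - P_a P_b$. $a \perp b$, that is, $P_a \perp P_b$, is characterised by $P_a P_b = 0$ [33, pp. 173–176], [25, pp. 66, 67], [9, pp. 18–21], and [35, pp. 47–50].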
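The limit formula for the meet of two projectors can be checked numerically on a small example. The sketch below works in the real space R^3 with hand-picked subspaces; the helper names `matmul`, `meet_projector`, and `close`, the choice of subspaces, and the fixed number of iterations are all illustrative assumptions rather than anything prescribed by the text.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def meet_projector(P, Q, steps=60):
    """Approximate lim (PQ)^n by taking the steps-th power of PQ."""
    M = matmul(P, Q)
    R = M
    for _ in range(steps - 1):
        R = matmul(R, M)
    return R


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


# a = the xy-plane, b = span{e1, (e2 + e3)/sqrt(2)}, c = the xz-plane.
P_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
P_b = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]
P_c = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
P_e1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Non-commuting pair: the powers of P_a P_b approach the projector
# onto the intersection of a and b, which is span{e1}.
noncommuting_meet = meet_projector(P_a, P_b)

# Commuting pair: the meet is the plain product and the join is
# P_a + P_c - P_a P_c.
commuting_meet = matmul(P_a, P_c)
commuting_join = [[P_a[i][j] + P_c[i][j] - commuting_meet[i][j]
                   for j in range(3)] for i in range(3)]

print(close(noncommuting_meet, P_e1))
print(commuting_meet == P_e1, commuting_join == IDENTITY)
```

In the non-commuting case a single product is not a projector, which is why the limit is needed; in the commuting case the product already is the meet.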

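Closed subspaces $\mathcal{H}_a$, $\mathcal{H}_b$, ... as well as the corresponding projectors $P_a$, $P_b$, ... form an algebra called the Hilbert lattice which is an ortholattice. The conditions of the following definition can be easily read off from the properties of the aforementioned Hilbert subspaces or projectors.

Definition 7. An ortholattice, OL, is an algebra $\langle \mathcal{OL}_0, ', \cup, \cap \rangle$ such that the following conditions are satisfied for any $a, b, c \in \mathcal{OL}_0$ [36]:
$$\begin{aligned} a \cup b &= b \cup a, \\ (a \cup b) \cup c &= a \cup (b \cup c), \\ a'' &= a, \\ a \cup (b \cup b') &= b \cup b', \\ a \cup (a \cap b) &= a, \\ a \cap b &= (a' \cup b')'. \end{aligned} \tag{21}$$

In addition, since $a \cup a' = b \cup b'$ for any $a, b \in \mathcal{OL}_0$, we define the greatest and the least element of the lattice:
$$1 \stackrel{\text{def}}{=} a \cup a', \qquad 0 \stackrel{\text{def}}{=} a \cap a', \tag{22}$$
and the ordering relation (≤) on the lattice:
$$a \le b \;\stackrel{\text{def}}{\Longleftrightarrow}\; a \cap b = a \;\Longleftrightarrow\; a \cup b = b. \tag{23}$$

Definition 8 (Sasaki hook). One has
$$a \to_1 b \stackrel{\text{def}}{=} a' \cup (a \cap b). \tag{24}$$

Definition 9 (quantum equivalence). One has
$$a \equiv_q b \stackrel{\text{def}}{=} (a \cap b) \cup (a' \cap b'). \tag{25}$$

Definition 10 (classical equivalence). One has
$$a \equiv_c b \stackrel{\text{def}}{=} (a' \cup b) \cap (b' \cup a). \tag{26}$$

Connectives bind from weakest to strongest in the order →, ≡, ∪, ∩, and ’.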

Definition 11 (Pavičić, [37]). An orthomodular lattice (OML) is an OL in which the following condition (orthomodularity) holds:
$$a \equiv_q b = 1 \;\Rightarrow\; a = b. \tag{27}$$
Every Hilbert space (finite and infinite) and every phase space is orthomodular.

Definition 12 (Pavičić, [38]). (The proof of the opposite claim in [37, Theorem 3.2] is wrong.) A distributive ortholattice (DL) (also called a Boolean algebra) is an OL in which the following condition (distributivity) holds:
$$a \equiv_c b = 1 \;\Rightarrow\; a = b. \tag{28}$$
Every phase space is distributive and, of course, orthomodular since every distributive ortholattice is orthomodular. The opposite directions of metaimplications in (27) and (28) hold in any OL.

Definition 13 (Pavičić and Megill, [27]). An OL in which either of the following conditions (weak orthomodularity) holds

(29)

(30)

is called a weakly orthomodular ortholattice, WOML.

Definition 14 (Pavičić, this paper). A WOML in which the following condition holds
$$(a \to_1 b) \equiv_q (b \to_1 a) = a \equiv_q b \tag{31}$$
is called a WOML1.

Definition 15 (Pavičić, this paper). A WOML1 in which the following condition holds
$$(a \equiv_q b)' \to_1 a' = a \to_1 b \tag{32}$$
is called a WOML2.

Definition 16 (Pavičić, this paper). A WOML in which neither (27), (32), nor (31) holds is called a WOML∗.

Definition 17 (Pavičić and Megill, [27, 39]). An OL in which the following condition (commensurability) holds

(33)

is called a weakly distributive ortholattice, WDL.

Definition 18 (Pavičić and Megill, [27]). A WOML in which the following condition (weak distributivity) holds

(34)

is called a weakly distributive ortholattice, WDL.

Definitions 17 and 18 are equivalent. We give both definitions here in order to, on the one hand, stress that a WDL is a lattice in which all variables are commensurable and, on the other, to show that in WDL the distributivity holds only in its weak form given by (34) which we will use later on.

Definition 19 (Pavičić and Megill, [39]). A WDL in which (28) does not hold is called a WDL∗.

Any finite lattice can be represented by a Hasse diagram that consists of points (vertices) and lines (edges). Each point represents an element in a lattice, and positioning element 𝑎 above element 𝑏 and connecting them by a line means 𝑏 ≤ 𝑎. For example, in Figure 1(a) we have 0 ≤ 𝑥 ≤ 𝑦 ≤ 1. We also see that in this lattice, for example, 𝑥 does not have a relation with either 𝑥’ or 𝑦’.
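A Hasse diagram is just a list of covering edges, from which the full order, the joins, and the meets can be computed. The sketch below encodes O6 as the hexagon described above, assuming the second chain 0 ≤ 𝑦’ ≤ 𝑥’ ≤ 1 (which follows from the orthocomplement reversing the order); the names `COVERS`, `order_from_covers`, `join`, `meet`, and `comparable` are illustrative.

```python
ELEMS = ["0", "x", "y", "y'", "x'", "1"]

# Edges of the diagram, written (lower, upper).
COVERS = [("0", "x"), ("x", "y"), ("y", "1"),
          ("0", "y'"), ("y'", "x'"), ("x'", "1")]

# Orthocomplementation on O6.
COMPL = {"0": "1", "1": "0", "x": "x'", "x'": "x", "y": "y'", "y'": "y"}


def order_from_covers(elems, covers):
    """Reflexive-transitive closure of the covering relation."""
    leq = {(a, a) for a in elems} | set(covers)
    changed = True
    while changed:
        changed = False
        for a, b in list(leq):
            for c, d in list(leq):
                if b == c and (a, d) not in leq:
                    leq.add((a, d))
                    changed = True
    return leq


LEQ = order_from_covers(ELEMS, COVERS)


def join(a, b):
    """Least upper bound of a and b."""
    uppers = [u for u in ELEMS if (a, u) in LEQ and (b, u) in LEQ]
    return next(u for u in uppers if all((u, v) in LEQ for v in uppers))


def meet(a, b):
    """Greatest lower bound of a and b."""
    lowers = [l for l in ELEMS if (l, a) in LEQ and (l, b) in LEQ]
    return next(l for l in lowers if all((k, l) in LEQ for k in lowers))


def comparable(a, b):
    return (a, b) in LEQ or (b, a) in LEQ


print(comparable("x", "x'"), comparable("x", "y'"))  # False False
print(join("x", "y'"), meet("y", "x'"))              # 1 0
```

Computing joins and meets by brute force over all elements is adequate for the small lattices of Figure 1; it is not meant as an efficient general method.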


Figure 1. (a) O6; (b) O7 (Beran, Figure 7b [31]); (c) O8 (Rose–Wilkinson-1 [32]).

The statement “orthomodularity (27) does not hold in WOML∗” reads ∼[(∀𝑎, 𝑏 ∈ WOML∗)((𝑎 ≡ 𝑏 = 1) ⇒ (𝑎 = 𝑏))] which can be written as (∃𝑎, 𝑏 ∈ WOML∗)(𝑎 ≡ 𝑏 = 1 & 𝑎 ≠ 𝑏), where “∼” is a metanegation and “&” a metaconjunction. An example of a WOML∗ is O6 from Figure 1(a) and we can easily check the statement on it. O6 is also an example of a WDL∗ and we can verify the statement “distributivity (28) does not hold in WDL∗” on it, as well. Similarly, “condition (32) does not hold in WOML∗” can be written as (∃𝑎, 𝑏 ∈ WOML∗)(((𝑎 ≡ 𝑏)’ →1 𝑎’) ≠ (𝑎 →1 𝑏)).

Definition 20 (Pavičić, this paper). A WOML1 in which neither (27) nor (32) holds is called a WOML1∗. An example of a WOML1∗ is O7 from Figure 1(b).

Definition 21 (Pavičić, this paper). A WOML2 in which (27) does not hold is called a WOML2∗. An example of a WOML2∗ is O8 from Figure 1(c).
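The existential statements about O6 can be checked by exhaustive search. The sketch below assumes the usual form of the quantum equivalence, (𝑎 ∩ 𝑏) ∪ (𝑎’ ∩ 𝑏’), and uses the textbook distributive law 𝑎 ∩ (𝑏 ∪ 𝑐) = (𝑎 ∩ 𝑏) ∪ (𝑎 ∩ 𝑐) as a stand-in for the distributivity condition; the two-chain encoding of O6 and all function names are illustrative.

```python
from itertools import product

ELEMS = ["0", "x", "y", "y'", "x'", "1"]

# O6 as two chains sharing 0 and 1: 0 <= x <= y <= 1 and 0 <= y' <= x' <= 1.
CHAINS = (["0", "x", "y", "1"], ["0", "y'", "x'", "1"])
COMPL = {"0": "1", "1": "0", "x": "x'", "x'": "x", "y": "y'", "y'": "y"}


def leq(a, b):
    """a <= b iff both lie on one chain with a not above b."""
    return any(a in c and b in c and c.index(a) <= c.index(b) for c in CHAINS)


def join(a, b):
    uppers = [u for u in ELEMS if leq(a, u) and leq(b, u)]
    return next(u for u in uppers if all(leq(u, v) for v in uppers))


def meet(a, b):
    lowers = [l for l in ELEMS if leq(l, a) and leq(l, b)]
    return next(l for l in lowers if all(leq(k, l) for k in lowers))


def equiv_q(a, b):
    """Quantum equivalence, assumed to be (a meet b) join (a' meet b')."""
    return join(meet(a, b), meet(COMPL[a], COMPL[b]))


# Pairs with a equivalent to b at value 1 although a != b:
# witnesses that orthomodularity fails in O6.
om_witnesses = [(a, b) for a, b in product(ELEMS, repeat=2)
                if equiv_q(a, b) == "1" and a != b]

# Triples violating the textbook distributive law.
dist_witnesses = [(a, b, c) for a, b, c in product(ELEMS, repeat=3)
                  if meet(a, join(b, c)) != join(meet(a, b), meet(a, c))]

print(om_witnesses)
print(("y", "x", "y'") in dist_witnesses)
```

For instance, 𝑥 ≡ 𝑦 evaluates to 𝑥 ∪ 𝑦’ = 1 in O6 although 𝑥 ≠ 𝑦, which is exactly the kind of witness the existential statement asks for.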

Lemma 22. OML is properly included in (i.e., it is stronger than) WOML2, WOML2 is properly included in WOML1, and WOML1 is properly included in WOML.

Proof. Equation (29) passes O6, O7, and O8 from Figure 1. Equation (31) passes O7 and O8 but fails in O6. Equation (32) passes O8 but fails in both O6 and O7. Equation (27) fails in O6, O7, and O8. To find the failures and passes we used our program lattice [40].

Lemma 23. OML is included in neither WOML2∗, WOML1∗, nor WOML∗. WOML2∗ is included in neither WOML1∗ nor WOML∗. WOML1∗ is not included in WOML∗.

Proof. The proof follows straightforwardly from the proof of Lemma 22 and the definitions of WOML∗, WOML1∗, WOML2∗, and OML.

According to Definitions 16, 20, 21, and 19, of WOML∗, WOML1∗, WOML2∗, and WDL∗, respectively, these lattices denote set-theoretical differences and that is going to play a crucial role in our proof of completeness in Section 4.2 in contrast to [27] where we considered only WOML without excluding the orthomodular equation. In Section 4.2 we shall come back to this decisive difference between the two approaches. Note that the set differences are not equational varieties. For instance, WOML2∗ is a WOML2 in which the orthomodularity condition does not hold, but we cannot obtain WOML2∗ from WOML2 by adding new equational conditions to those defining WOML. Instead, WOML2∗ can be viewed as a set of lattices in all of which the orthomodularity condition is violated.

Remarks on Implications. As we could see above, the implications do not play any decisive role in the definition of lattices, especially not in the definitions of OML and DL where they do not appear at all, and they also do not play a decisive role in the definition of logic. A few decades ago that was a major issue, though: “I would argue that a ‘logic’ without an implication ... is radically incomplete, and indeed, hardly qualifies as a theory of deduction” (Jay Zeman) [41]. So, an extensive search was undertaken in the seventies and eighties to single out proper implications from possible ones [42–44]. Apart from →1 and →3 it turns out [25] that one can also define →0 (classical), →2 (Dishkant), →4 (non-tollens), and →5 (relevance). In 1987 Pavičić [45] proved that an OL in which 𝑎 →𝑖 𝑏 = 1 ⇒ 𝑎 ≤ 𝑏, 𝑖 = 1, . . . , 5, holds is an OML. In the same paper [45] he also proved that an OL in which 𝑎 →0 𝑏 = 1 ⇒ 𝑎 ≤ 𝑏 holds is a DL. Therefore 5 different but nevertheless equivalent relational kinds of logic could be obtained by linking lattice inequality to 5 implications. With our linking of a single equivalence to lattice equality this ambiguity is avoided and we obtain a uniquely defined axiomatic quantum logic. Note that we have 𝑎 ≡𝑞 𝑏 = (𝑎 →𝑖 𝑏) ∩ (𝑏 →𝑖 𝑎), 𝑖 = 1, . . . , 5, in every OML but not in every OL.

SOUNDNESS AND COMPLETENESS

We shall connect our types of logic with our lattices so as to show that the latter are the models of the former.


Definition 24. One calls $\mathcal{M} = \langle L, h \rangle$ a model if 𝐿 is an algebra and $h : \mathcal{F}^\circ \to L$, called a valuation, is a morphism of formulae $\mathcal{F}^\circ$ into 𝐿, preserving the operations ¬, ∨ while turning them into ’, ∪.

Whenever the base set 𝐿 of a model belongs to O6, WOML∗, WOML1∗, WOML2∗, OML, WDL∗, or DL we say (informally) that the model belongs to WOML∗, ..., DL. In particular, if we say “for all models in O6, WOML∗, ..., DL,” we mean for all base sets in O6, WOML∗, ..., DL and for all valuations on each base set. The term “model” may refer either to a specific pair ⟨𝐿, ℎ⟩ or to all such possible pairs with the base set 𝐿, depending on the context.

Definition 25. One calls a formula $A \in \mathcal{F}^\circ$ valid in the model $\mathcal{M}$ and writes $\vDash_{\mathcal{M}} A$, if ℎ(𝐴) = 1 for all valuations ℎ on the model, that is, for all ℎ associated with the base set 𝐿 of the model. We call a formula $A \in \mathcal{F}^\circ$ a consequence of $\Gamma \subseteq \mathcal{F}^\circ$ in the model $\mathcal{M}$ and write $\Gamma \vDash_{\mathcal{M}} A$ if ℎ(𝑋) = 1 for all 𝑋 in Γ implies ℎ(𝐴) = 1, for all valuations ℎ.
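Validity and consequence in a finite model can be decided by enumerating all valuations. The sketch below uses the two-element Boolean lattice {0, 1} as the base set, with formulae written as nested tuples; the tuple encoding and the names `h`, `valuations`, `is_valid`, and `is_consequence` are illustrative assumptions.

```python
from itertools import product

# Base set: the two-element Boolean lattice.
ELEMS = (0, 1)


def compl(a):
    """Orthocomplement on {0, 1}."""
    return 1 - a


def join(a, b):
    """Join on {0, 1}."""
    return max(a, b)


def variables(f):
    """Indices of the primitive propositions occurring in formula f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))


def h(f, assign):
    """Valuation: a morphism sending 'not' to compl and 'or' to join."""
    if f[0] == "var":
        return assign[f[1]]
    if f[0] == "not":
        return compl(h(f[1], assign))
    if f[0] == "or":
        return join(h(f[1], assign), h(f[2], assign))
    raise ValueError(f"unknown connective: {f[0]}")


def valuations(formulas):
    """All assignments of lattice elements to the variables involved."""
    names = sorted(set().union(*(variables(f) for f in formulas)))
    for values in product(ELEMS, repeat=len(names)):
        yield dict(zip(names, values))


def is_valid(f):
    """h(f) = 1 for every valuation h."""
    return all(h(f, v) == 1 for v in valuations([f]))


def is_consequence(gamma, f):
    """Every valuation sending all of gamma to 1 also sends f to 1."""
    return all(h(f, v) == 1 for v in valuations(list(gamma) + [f])
               if all(h(x, v) == 1 for x in gamma))


p0, p1 = ("var", 0), ("var", 1)
print(is_valid(("or", p0, ("not", p0))))      # True
print(is_consequence([("or", p0, p1)], p0))   # False
```

Swapping in another finite lattice only requires replacing `ELEMS`, `compl`, and `join`; the valuation and the two checks stay the same.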

Soundness

To prove soundness means to prove that all axioms as well as the rules of inference (and therefore all theorems) of $\mathcal{QL}$ and $\mathcal{CL}$ hold in their models. The models of $\mathcal{QL}$ are O6, WOML∗, WOML1∗, WOML2∗, and OML and of $\mathcal{CL}$ are O6, WDL∗, and DL. With the exception of O6, which is a special case of both WOML∗ and WDL∗, they do not properly include each other. For brevity, whenever we do not make it explicit, the notations $\vDash_{\mathcal{M}} A$ and $\Gamma \vDash_{\mathcal{M}} A$ will always be implicitly quantified over all models of the appropriate type, in this section for all proper lattice models $\mathcal{M}$. Similarly, when we say “valid” without qualification, we will mean valid in all models of that type.

The following theorems show that if 𝐴 is a theorem of $\mathcal{QL}$, then 𝐴 will be valid in O6 and any WOML∗, WOML1∗, WOML2∗, or OML model, and if 𝐴 is a theorem of $\mathcal{CL}$, then 𝐴 will be valid in O6 and any WDL∗ or DL model. In [27, 28] we proved the soundness for WOML. Since that proof uses no additional conditions that hold in O6, WOML∗, ..., OML the proof given there for WOML is a proof of soundness for O6, WOML∗, WOML1∗, WOML2∗, and OML, as well. Also, in [27, 28] we proved the soundness for WDL. Since that proof uses no additional conditions that hold in O6, WDL∗, and DL, the proof given there for WDL is a proof of soundness for O6, WDL∗, and DL, as well. Hence, we can prove the soundness of quantum and classical logic by means of WOML and WDL conditions without referring to condition (28), (27), (32), or (31), that is, to any condition in addition to those that hold in the WOML and WDL themselves.

Theorem 26 (soundness of $\mathcal{CL}$). One has
$$\Gamma \vdash_{\mathcal{CL}} A \;\Rightarrow\; \Gamma \vDash_{\mathcal{M}} A. \tag{35}$$

Proof. By Theorem 4.3 of [27] any WDL (in particular, O6, WDL∗, or DL) is a model for $\mathcal{CL}$.

Theorem 27 (soundness of $\mathcal{QL}$). One has
$$\Gamma \vdash_{\mathcal{QL}} A \;\Rightarrow\; \Gamma \vDash_{\mathcal{M}} A. \tag{36}$$

Proof. By Theorem 3.10 of [27] any WOML (in particular, O6, WOML∗, WOML1∗, WOML2∗, or OML) is a model for $\mathcal{QL}$.

The statements $\Gamma \vdash_{\mathcal{CL}} A$ and $\Gamma \vdash_{\mathcal{QL}} A$ in Theorems 26 and 27 express the fact that the axiomatic logic types $\mathcal{CL}$ and $\mathcal{QL}$ correspond to 𝑎 = ℎ(𝐴) = 1 in their lattice models, from O6 and WOML to WDL. That means that we do not arrive at equations of the form 𝑎 = 𝑏 and that starting from Γ ⊢ 𝐴 ≡𝑞 𝐵 we cannot arrive at 𝑎 = ℎ(𝐴) = 𝑏 = ℎ(𝐵) but only at 𝑎 ≡𝑞 𝑏 = 1. We can obtain a better understanding of this through the following properties of OML and DL. The equational theory of OML consists of equality conditions (21) together with the orthomodular equality condition [28]

(37)

which is equivalent to the condition given by (27). We now map each of these OML equations, which are of the form 𝑡 = 𝑠, to the form 𝑡 ≡𝑞 𝑠 = 1. This is possible in any WOML since

(38)

holds in every OL [28] and (21) mapped to the form 𝑡 ≡𝑞 𝑠 = 1 also hold in any OL. Any equational proof in OML can then be simulated in WOML by replacing each axiom reference in the OML proof with its corresponding WOML mapping. Such a mapped proof will make use of just a proper subset of the equations that hold in WOML.

It follows that equations of the form 𝑡 ≡𝑞 𝑠 = 1, where 𝑡 and 𝑠 are such that 𝑡 = 𝑠 holds in OML, cannot determine OML when added to an OL since all such forms pass O6 and an OL is an OML if and only if it does not include a subalgebra isomorphic to O6 [35].

As for $\mathcal{CL}$, the equational theory of distributive ortholattices can be simulated by a proper subset of the equational theory of WDLs since it consists of equality conditions (21) together with the distributivity equation

(39)

which is equivalent to condition (28). As with WOML above, we map these algebra conditions of the form 𝑡 = 𝑠 to the conditions of the form 𝑡 ≡𝑐 𝑠 = 1, which hold in any WDL since the weak distributivity condition given by (34) holds in any WDL. Any equational proof in a DL can then be simulated in WDL by replacing each condition in a DL proof with its corresponding WDL mapping. Such a mapped proof will use only a proper subset of the equations that hold in WDL. Therefore, no set of equations of the form 𝑡 ≡𝑐 𝑠 = 1, where 𝑡 = 𝑠 holds in DL, can determine a DL when added to an OL. Such equations hold in WDL and none of the WDL equations (21) and (39) is violated by O6 which itself violates the distributivity condition [28].

Similar reasoning applies to O6, WOML∗, WOML1, WOML1∗, WOML2, and WOML2∗ which are all WOMLs and to O6 and WDL∗ which are WDLs. Soundness applies to them all through WOML and WDL, and which particular model we shall use for $\mathcal{QL}$ and $\mathcal{CL}$ is determined by a particular Lindenbaum-Tarski algebra which we use for the completeness proof in the next subsection.

Completeness

Our main task in proving the soundness of $\mathcal{QL}$ and $\mathcal{CL}$ in the previous section was to show that all axioms as well as the rules of inference (and therefore all theorems) from $\mathcal{QL}$ and $\mathcal{CL}$ hold in any WOML and WDL, respectively. The task of proving the completeness of $\mathcal{QL}$ and $\mathcal{CL}$ is the opposite one: we have to impose the structures of O6, WOML∗, WOML1∗, WOML2∗, and OML and of O6, WDL∗, and DL on the sets $\mathcal{F}^\circ$ of formulae of $\mathcal{QL}$ and $\mathcal{CL}$, respectively. But here, as opposed to the soundness proof, we shall have as many completeness proofs as there are models. The completeness proofs for O6, WOML∗, WOML1∗, and WOML2∗ can be inferred neither from the proof for OML nor from the proofs for the other two. The same holds for O6, WDL∗, and DL.

We start with a relation of congruence, that is, a relation of equivalence compatible with the operations in $\mathcal{QL}$ and $\mathcal{CL}$. We make use of an equivalence relation to establish a correspondence between formulae of $\mathcal{QL}$ and $\mathcal{CL}$ and formulae of O6, WOML∗, WOML1∗, WOML2∗, and OML and of O6, WDL∗, and DL, respectively. The resulting equivalence classes stand for elements of these lattices and enable the completeness proof of $\mathcal{QL}$ and $\mathcal{CL}$ for them. Our definition of congruence involves special sets of valuations on the lattices O6, O7, and O8 (shown in Figure 1), called $\mathcal{O}_6$, $\mathcal{O}_7$, and $\mathcal{O}_8$ and defined as follows.

Definition 28. Letting O𝑖, 𝑖 = 6, 7, 8, represent the lattices from Figure 1, one defines $\mathcal{O}_i$ as the set of all mappings $o_i : \mathcal{F}^\circ \to \mathrm{O}i$ such that for $A, B \in \mathcal{F}^\circ$, $o_i(\neg A) = o_i(A)'$ and $o_i(A \vee B) = o_i(A) \cup o_i(B)$.

The purpose of $\mathcal{O}_i$ is to let us refine the equivalence class used for the completeness proof, so that the Lindenbaum-Tarski algebras are O6, WOML∗, WOML1∗, and WOML2∗. This is accomplished by conjoining the term $(\forall o_i \in \mathcal{O}_i)[(\forall X \in \Gamma)(o_i(X) = 1) \Rightarrow o_i(A) = o_i(B)]$, 𝑖 = 6, 7, 8, to the equivalence relation definition, meaning that for equivalence we require also that (whenever the valuations 𝑜𝑖 of the wffs in Γ are all 1) the valuations of wffs 𝐴 and 𝐵 map to the same point in the lattice O𝑖. Thus, for example, in O6 wffs 𝐴 ∨ 𝐵 and 𝐴 ∨ (¬𝐴 ∧ (𝐴 ∨ 𝐵)) become members of two separate equivalence classes, which by Theorem 39 amounts to nonorthomodularity of WOML. Without the conjoined term, these two wffs would belong to the same equivalence class. The point of doing this is to provide a completeness proof that is not in any way dependent on the orthomodular law and to show that completeness does not require that any of the underlying models be an OML. The equivalence classes so defined work for WOML1∗ and WOML2∗ as well since $\mathcal{O}_7$ will let (31) through but will not let through either the orthomodularity or (32), and $\mathcal{O}_6$ will let neither the orthomodularity, (32), nor (31) through. $\mathcal{O}_6$ will also let us refine the equivalence class used for the completeness proof of $\mathcal{CL}$, so that the Lindenbaum-Tarski algebras are O6 and WDL∗.
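The separating effect of the conjoined term can be seen by evaluating the two wffs under every valuation into O6. The sketch below takes Γ to be empty, so the premise of the conjoined term holds vacuously, and interprets ¬𝐴 ∧ 𝐶 as the lattice meet of the complement of 𝑜(𝐴) with 𝑜(𝐶), which is what the De Morgan definition of conjunction gives in any ortholattice. The two-chain encoding of O6 and the function names are illustrative assumptions.

```python
from itertools import product

ELEMS = ["0", "x", "y", "y'", "x'", "1"]

# O6 as two chains sharing 0 and 1: 0 <= x <= y <= 1 and 0 <= y' <= x' <= 1.
CHAINS = (["0", "x", "y", "1"], ["0", "y'", "x'", "1"])
COMPL = {"0": "1", "1": "0", "x": "x'", "x'": "x", "y": "y'", "y'": "y"}


def leq(a, b):
    return any(a in c and b in c and c.index(a) <= c.index(b) for c in CHAINS)


def join(a, b):
    uppers = [u for u in ELEMS if leq(a, u) and leq(b, u)]
    return next(u for u in uppers if all(leq(u, v) for v in uppers))


def meet(a, b):
    lowers = [l for l in ELEMS if leq(l, a) and leq(l, b)]
    return next(l for l in lowers if all(leq(k, l) for k in lowers))


def value_of_a_or_b(a, b):
    """Value of the wff A or B when o(A) = a and o(B) = b."""
    return join(a, b)


def value_of_orthomodular_form(a, b):
    """Value of the wff A or (not-A and (A or B)) under the same valuation."""
    return join(a, meet(COMPL[a], join(a, b)))


# Valuations (given by the values of A and B) under which the two wffs
# land on different points of O6.
separating = [(a, b) for a, b in product(ELEMS, repeat=2)
              if value_of_a_or_b(a, b) != value_of_orthomodular_form(a, b)]

print(separating)
```

A single separating valuation is enough to place the two wffs in different equivalence classes once the conjoined term is part of the definition.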


To obtain OML and DL Lindenbaum algebras we will make use of the standard equivalence classes without the conjoined terms. All these equivalence classes are relations of congruence. Theorem 29. The relations of equivalence ≈Γ,QL,𝑖, 𝑖 = 6, 7, 8, or simply ≈𝑖, 𝑖 = 6, 7, 8, defined as (40) are relations of congruence, where

.

Proof. Let us first prove that ≈ is an equivalence relation. 𝐴 ≈ 𝐴 follows from A1 [(8)] of system and the identity law of equality. If Γ ⊢ 𝐴 ≡ 𝐵, we can detach the left-hand side of A12 to conclude Γ ⊢ 𝐵 ≡ 𝐴, through the use of A13 and repeated uses of A14 and R1. From this and commutativity of equality, we conclude 𝐴 ≈ 𝐵 ⇒ 𝐵 ≈ 𝐴. (For brevity we will mostly not mention further uses of A12, A13, A14, and R1 in what follows.) The proof of transitivity runs as follows (𝑖 = 6, 7, 8).

(41)

Γ ⊢ 𝐴 ≡ 𝐶 above follows from A2, and the metaconjunction in the second-to-last line reduces to (𝐴) = (𝐶) by transitivity of equality.

In order to be a relation of congruence, the relation of equivalence must be compatible with the operations ¬ and ∨. These proofs run as follows (𝑖 = 6, 7, 8).

(42)

(43)

In the second step of (42), we used A3. In the second step of (43), we used A4 and A10. For the quantified part of these expressions, we applied the definition of .

Theorem 30. The relation of equivalence ≈Γ,QL,1, or simply ≈1, defined as (44) is a relation of congruence, where

.

Proof. The proof for the relation of equivalence given by (44) is the well-known standard one.

Theorem 31. The relation of equivalence ≈Γ,CL,6, or simply ≈6, defined as (45) is a relation of congruence, where

.

Proof. It is as given in [28].

Theorem 32. The relation of equivalence ≈Γ,CL,2, or simply ≈2, defined as (46) is a relation of congruence, where

.

Proof. The proof for the relation of equivalence given by (46) is the well-known standard one.

Definition 33. The equivalence class for wff 𝐴 under the relation of equivalence ≈ given by (40), (44), (45), and (46) is defined as and one denotes . The equivalence classes define the natural morphism which gives . One writes 𝑎 = (𝐴), 𝑏 = (𝐵), and so forth.

Lemma 34. The relation 𝑎 = 𝑏 on is given by

(47)

Lemma 35. The Lindenbaum-Tarski algebras are WOML∗ (or O6), or WOML1∗, or WOML2∗, or OML, or WDL∗ (O6), or DL; that is, (21) and (30), or (31), or (32), or (27), or (33), or (28) hold for ¬/≈𝑗 and as ′ and ∪, respectively, where, for simplicity, one uses the same symbols (′ and ∪) as for O𝑖, since there are no ambiguous expressions in which the origin of the operations would not be clear from the context.

Proof. For the Γ ⊢ 𝐴 ≡ 𝐵 part of the 𝐴 ≈ 𝐵 definition, the proofs of the ortholattice conditions, (21), follow from A5, A6, A9, the dual of A8, the dual of A7, and DeMorgan’s laws, respectively. (The duals follow from DeMorgan’s laws, derived from A10, A9, and A3.) For (31) and (32) we use Lemma 3.5 from [27] according to which any 𝑡 = 1 condition that holds in OML also holds in any WOML. Program beran [32] shows that the expressions ((𝑎 →1 𝑏) ≡ (𝑏 →1 𝑎 )) ≡ (𝑎 ≡ 𝑏) and ((𝑎 ≡ 𝑏)’ →1 𝑎 ’) ≡ (𝑎 →1 𝑏) reduce to 1 in an OML. By Lemma 3.5 this means that ((𝑎 →1 𝑏) ≡ (𝑏 →1 𝑎 )) ≡ (𝑎 ≡ 𝑏) = 1 and ((𝑎 ≡ 𝑏)’ →1 𝑎 ’) ≡ (𝑎 →1 𝑏) = 1 in any WOML. Now the Γ ⊢ 𝐴 ≡ 𝐵 part from (40) forces these WOML conditions into (31) and (32). For the quantified part of the 𝐴 ≈ 𝐵 definition, lattice O6 is a (proper) WOML. For the OML, we carry out the proof with the relation of

equivalence without the quantified part in (40). Then the Γ ⊢ 𝐴 ≡ 𝐵 part from (40) forces the condition (𝑎 ∪ (𝑎 ’ ∩ (𝑎 ∪ 𝑏))) ≡ (𝑎 ∪ 𝑏) = 1 which holds in any ortholattice into the OM law given by (27).
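The difference between the weak condition (which equals 1) and the orthomodular law itself can be checked by brute force on the hexagon O6. The following Python sketch is only an illustration under stated assumptions: it takes the labeling 0 < x < y < 1 and 0 < y' < x' < 1 for Figure 1(a), and it assumes the usual ortholattice definition a ≡ b = (a ∩ b) ∪ (a' ∩ b'). All helper names are hypothetical.

```python
from itertools import product

# Hexagon O6 as two chains sharing 0 and 1 (assumed labeling of Figure 1(a)).
CHAINS = [["0", "x", "y", "1"], ["0", "y'", "x'", "1"]]
ELEMS = ["0", "x", "y", "y'", "x'", "1"]
COMP = {"0": "1", "1": "0", "x": "x'", "x'": "x", "y": "y'", "y'": "y"}

def leq(a, b):
    """a <= b iff both lie on a common chain with a not above b."""
    return any(a in c and b in c and c.index(a) <= c.index(b) for c in CHAINS)

def join(a, b):
    """Least upper bound of a and b."""
    ubs = [u for u in ELEMS if leq(a, u) and leq(b, u)]
    return next(u for u in ubs if all(leq(u, w) for w in ubs))

def meet(a, b):
    """Greatest lower bound of a and b."""
    lbs = [l for l in ELEMS if leq(l, a) and leq(l, b)]
    return next(l for l in lbs if all(leq(w, l) for w in lbs))

def equiv(a, b):
    """Assumed definition: a ≡ b = (a ∩ b) ∪ (a' ∩ b')."""
    return join(meet(a, b), meet(COMP[a], COMP[b]))

def om_sides(a, b):
    """Both sides of the orthomodular law: a ∪ (a' ∩ (a ∪ b)) and a ∪ b."""
    return join(a, meet(COMP[a], join(a, b))), join(a, b)

pairs = list(product(ELEMS, repeat=2))
# Weak form: (a ∪ (a' ∩ (a ∪ b))) ≡ (a ∪ b) = 1 for every pair.
weak_om_holds = all(equiv(*om_sides(a, b)) == "1" for a, b in pairs)
# Strong form: the two sides are equal for every pair.
om_holds = all(l == r for l, r in (om_sides(a, b) for a, b in pairs))
```

Under these assumptions, `weak_om_holds` evaluates to true while `om_holds` evaluates to false, which matches the claim that the condition with `= 1` holds in an ortholattice that is not orthomodular.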

We stress here that the Lindenbaum-Tarski algebras from Lemma 35 will be uniquely assigned to and via Theorems 42 and 43 in the sense that we have to use the relations of congruence given by (40) and (45) and that we cannot use those given by (44) and (46). For we have to use the latter ones and we cannot use the former ones. This is in contrast to the completeness proof given in [27] where we did not consider the set-theoretical difference WOML∗ but only WOML. But since WOML contains OML (unlike WOML∗), in [27] (unlike in this paper) we can use both relations of congruence (40), (45) and (44), and (46) to prove the completeness. We see that the usage of set-theoretical differences in this paper establishes a correlation between lattice models and equivalence relations for a considered logic as shown in Figure 2.

Figure 2. Lattice models of quantum and classical logic together with the corresponding equivalence relations which define their Lindenbaum-Tarski algebras.

Lemma 36. In the Lindenbaum-Tarski algebra , if 𝑓(𝑋) = 1 for all 𝑋 in Γ implies 𝑓(𝐴) = 1, then Γ ⊢ 𝐴.

Proof. We carry out the proof just for . Proofs for other cases run analogously. Let us assume that 𝑓(𝑋) = 1 for all 𝑋 in Γ implies 𝑓(𝐴) = 1, that is, |𝐴| = 1 = |𝐴| ∪ |𝐴|’ = |𝐴 ∨ ¬𝐴|, where the first equality is from Definition 33, the second equality follows from (22) (the definition of 1 in

an ortholattice), and the third from the fact that ≈ is a congruence. Thus 𝐴 ≈ (𝐴 ∨ ¬𝐴), which by definition means . The same holds for and . This implies, in particular (by dropping the second conjunct), Γ ⊢ 𝐴 ≡ (𝐴 ∨ ¬𝐴). Now in any ortholattice, 𝑎 ≡ (𝑎 ∪ 𝑎’) = 𝑎 holds. By mapping the steps in the proof of this ortholattice identity to steps in a proof in the logic, we can prove ⊢ (𝐴 ≡ (𝐴 ∨ ¬𝐴)) ≡ 𝐴 from axioms A2–A14. (A direct proof of ⊢ (𝐴 ≡ (𝐴 ∨ ¬𝐴)) ≡ 𝐴 is also not difficult.) Detaching the left-hand side (using A12, A13, A14, and R1), we conclude Γ ⊢ 𝐴.

Theorem 37. The orthomodular law does not hold in for models WOML∗ (O6), WOML1∗, and WOML2∗.

Proof. We assume contains at least two elementary (primitive) propositions 𝑝0, 𝑝1, .... We pick a valuation 𝑜 that maps two of them, 𝐴 and 𝐵, to distinct nodes 𝑜(𝐴) and 𝑜(𝐵) of O6 that are neither 0 nor 1 such that 𝑜(𝐴) ≤ 𝑜(𝐵) [i.e., 𝑜(𝐴) and 𝑜(𝐵) are on the same side of hexagon O6 in Figure 1]. From the structure of O6, we obtain 𝑜(𝐴) ∪ 𝑜(𝐵) = 𝑜(𝐵) and 𝑜(𝐴) ∪ (𝑜(𝐴)’ ∩ (𝑜(𝐴) ∪ 𝑜(𝐵))) = 𝑜(𝐴) ∪ (𝑜(𝐴)’ ∩ 𝑜(𝐵)) = 𝑜(𝐴) ∪ 0 = 𝑜(𝐴). Therefore 𝑜(𝐴) ∪ 𝑜(𝐵) ≠ 𝑜(𝐴) ∪ (𝑜(𝐴)’ ∩ (𝑜(𝐴) ∪ 𝑜(𝐵))), that is, 𝑜(𝐴 ∨ 𝐵) ≠ 𝑜(𝐴 ∨ (¬𝐴 ∧ (𝐴 ∨ 𝐵))). This falsifies (𝐴 ∨ 𝐵) ≈ (𝐴 ∨ (¬𝐴 ∧ (𝐴 ∨ 𝐵))), which is an alternative way of expressing the orthomodularity property [45, 46]. Therefore 𝑎 ∪ 𝑏 ≠ 𝑎 ∪ (𝑎’ ∩ (𝑎 ∪ 𝑏)), providing a counterexample to the orthomodular law for . We can follow the steps given above by taking 𝑜(𝐴) = 𝑥 and 𝑜(𝐵) = 𝑦 in Figure 1(a). For O7 and O8 the proofs are analogous. For instance, the orthomodularity is violated in Figure 1(b) for 𝑜(𝐴) = 𝑥 and 𝑜(𝐵) = 𝑦 and in Figure 1(c) for 𝑜(𝐴) = 𝑤 and 𝑜(𝐵) = 𝑦.

Theorem 38. The orthomodular law holds in for an OML model.

Proof. It is well-known.
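The counterexample used in the proof of Theorem 37 can be replayed mechanically. The sketch below is an illustration, not the authors' program: it assumes the hexagon labeling 0 < x < y < 1 and 0 < y' < x' < 1, encodes wffs as nested tuples (a hypothetical encoding), and extends a valuation 𝑜 of the atoms to compound wffs by interpreting ¬, ∨, and ∧ as complement, join, and meet.

```python
# Hexagon O6 as two chains sharing 0 and 1 (assumed labeling of Figure 1(a)).
CHAINS = [["0", "x", "y", "1"], ["0", "y'", "x'", "1"]]
ELEMS = ["0", "x", "y", "y'", "x'", "1"]
COMP = {"0": "1", "1": "0", "x": "x'", "x'": "x", "y": "y'", "y'": "y"}

def leq(a, b):
    """a <= b iff both lie on a common chain with a not above b."""
    return any(a in c and b in c and c.index(a) <= c.index(b) for c in CHAINS)

def join(a, b):
    """Least upper bound of a and b."""
    ubs = [u for u in ELEMS if leq(a, u) and leq(b, u)]
    return next(u for u in ubs if all(leq(u, w) for w in ubs))

def meet(a, b):
    """Greatest lower bound of a and b."""
    lbs = [l for l in ELEMS if leq(l, a) and leq(l, b)]
    return next(l for l in lbs if all(leq(w, l) for w in lbs))

def value(wff, o):
    """Extend the valuation o (atom name -> lattice element) to a wff.

    A wff is an atom name or a tuple ("not", A), ("or", A, B), ("and", A, B).
    """
    if isinstance(wff, str):
        return o[wff]
    op, *args = wff
    if op == "not":
        return COMP[value(args[0], o)]
    if op == "or":
        return join(value(args[0], o), value(args[1], o))
    if op == "and":
        return meet(value(args[0], o), value(args[1], o))
    raise ValueError(f"unknown connective: {op}")

# The two wffs compared in the proof, and the valuation o(A) = x, o(B) = y.
lhs = ("or", "A", "B")
rhs = ("or", "A", ("and", ("not", "A"), ("or", "A", "B")))
o = {"A": "x", "B": "y"}

lhs_value = value(lhs, o)   # o(A ∨ B)
rhs_value = value(rhs, o)   # o(A ∨ (¬A ∧ (A ∨ B)))
```

With this valuation the two wffs are mapped to different nodes of the hexagon, so they cannot fall into the same equivalence class once the quantified part of the congruence is required.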

Theorem 39. The distributive law does not hold in for WDL∗ (O6).

Proof. It is as given in [28].

Schechter [30, Sec. 9.4] gives O6 a set-valued interpretation by assigning {−1, 0, 1} to 1 in Figure 1(a), {−1, 0} to 𝑦, {0, 1} to 𝑥’, {−1} to 𝑥, {1} to 𝑦’, and ⌀ to 0 and calls it the hexagon interpretation. “The hexagon interpretation is not distributive. That fact came as a surprise to some logicians, since the two-valued logic itself is distributive” [30, Sec. 9.5]. Schechter also gives crystal (6 subsets) and Church’s diamond (4 subsets) set-valued interpretations of in his Sections 9.7–13 and 9.14–17.

Theorem 40. The distributive law holds in for a DL model (Boolean algebra).

Proof. It is well-known.

Lemma 41. is a proper WOML∗ (O6), WOML1∗, WOML2∗, OML, WDL∗ (O6), or DL model.

Proof. It follows from Lemma 35.
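The failure of distributivity in the hexagon interpretation quoted above can be verified directly from the six subsets. The following Python sketch is an illustration under one assumption: join and meet are taken as least upper bound and greatest lower bound within the six listed sets (ordered by inclusion), not as plain set union and intersection. The variable names are hypothetical.

```python
# The six subsets of the hexagon interpretation, ordered by inclusion.
BOT, X, YC = frozenset(), frozenset({-1}), frozenset({1})
Y, XC, TOP = frozenset({-1, 0}), frozenset({0, 1}), frozenset({-1, 0, 1})
HEX = [BOT, X, YC, Y, XC, TOP]

def join(a, b):
    """Smallest member of HEX that contains both a and b."""
    ubs = [u for u in HEX if a <= u and b <= u]
    return next(u for u in ubs if all(u <= w for w in ubs))

def meet(a, b):
    """Largest member of HEX that is contained in both a and b."""
    lbs = [l for l in HEX if l <= a and l <= b]
    return next(l for l in lbs if all(w <= l for w in lbs))

# Distributive law a ∩ (b ∪ c) = (a ∩ b) ∪ (a ∩ c), tested at a = y, b = x, c = x'.
left = meet(Y, join(X, XC))             # y ∩ (x ∪ x') = y ∩ 1
right = join(meet(Y, X), meet(Y, XC))   # (y ∩ x) ∪ (y ∩ x') = x ∪ 0
distributive_here = left == right
```

Here `left` is the set assigned to 𝑦 and `right` is the set assigned to 𝑥, so the law fails at this triple. The meet of {−1, 0} and {0, 1} is ⌀ because {0} is not one of the six subsets.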

Now we are able to prove the completeness of and ; that is, if a formula 𝐴 is a consequence of a set of wffs Γ in all O6, WOML∗, WOML1∗, WOML2∗, and OML models and in all O6, WDL∗, and DL models then and , respectively. In particular, when Γ = ⌀, all valid formulae are provable in .

Theorem 42 (completeness of quantum logic). One has

(48)

Proof. means that, in all WOML∗ (O6), WOML1∗, WOML2∗, and OML models, if 𝑓(𝑋) = 1 for all 𝑋 in Γ, then 𝑓(𝐴) = 1 holds. In particular, it holds for , which is a WOML∗ (O6), WOML1∗, WOML2∗, or OML model by Lemma 41. Therefore, in the Lindenbaum-Tarski algebra , if 𝑓(𝑋) = 1 for all 𝑋 in Γ, then 𝑓(𝐴) = 1 holds. By Lemma 36, it follows that Γ ⊢ 𝐴.

Theorem 43 (completeness of classical logic). One has

(49)

Proof. It is as given in [28].

DISCUSSION

We have shown that quantum and classical axiomatic logic are metastructures for dealing with different algebras, in our case lattices, as their models. On the one hand, well-formed formulas in logic can be mapped to equations in different lattices, and on the other, equations from one lattice, which we are more familiar with or which are simpler or easier to handle, can be translated into equations of another lattice, through the logic which they are both models of.

In Section 4 we proved that quantum logic can be modelled by five different lattice models only one of which is orthomodular and that classical logic can be modelled by at least three lattice models only one of which is distributive. As we indicated in [39] there might be many more, possibly infinitely many, different lattice models quantum and classical axiomatic logic can be modelled with. (See also the remarks below Theorem 39.) The models are presented in a chart in Figure 2. The key step that allows the multiplicity of lattice models for both kinds of logic is the refinement of the equivalence relations for the Lindenbaum-Tarski algebras in Theorems 29, 30, 31, and 32. They are also given in the chart where we can see that two different equivalence relations enable O6 to be a model of both quantum and classical logic. This is possible because both the weak orthomodularity (30) and the weak distributivity (34) pass O6 as pointed out below Definition 19. The essence of the equivalence classes of the Lindenbaum-Tarski algebras is that they are determined by special simple lattices, for example, those shown in Figure 1, in which conditions that define definite other lattice models fail. The failure is significant because it proves that the orthomodularity (27) of OML is not needed to prove the completeness of quantum logic for WOML2∗, that neither orthomodularity (27) nor condition (32) is needed to prove the completeness for WOML1∗, and that neither orthomodularity (27) nor condition (32) nor condition (31) is needed for WOML∗.

At the level of logical gates, classical or quantum, with today’s technology for computers and artificial intelligence, we can use only bits and qubits, respectively, that is, only valuations corresponding to two-valued DL (digital, binary, two-valued Boolean algebra) and OML, respectively. And when we talk about logic today, we take for granted that they have the latter valuation—{TRUE, FALSE} in the case of classical logic and Hasse diagrams in the case of quantum logic [40]. This is because a valuation is all we use to implement a logic. In its final application, we do not use a logic as given by its axioms and rules of inferences but as given by its models. So, it would be interesting to investigate how other valuations, that is, various WOMLs and WDLs, might be implemented in complex circuits. That would provide us with the possibility of controlling essentially different algebraic structures (logical models) implemented into radically different hardware (logic circuits consisting of logic gates) by the same logic that we use today with the standard bit and qubit gate technology.

With these possible applications of quantum and classical logic we come back to the question which we started with: “Is Logic Empirical?” We have seen that logic is not uniquely empirical since it can simultaneously describe distinct realities. However, we have also seen (cf. Figure 2) that by means of chosen relations of equivalence we can link particular kinds of “empirical” models to quantum logic on the one hand and classical logic, on the other. Let us therefore briefly review the most recent elaborations on the question given by Bacciagaluppi [47] and Baltag and Smets [19]. They state “quantum logic is suitable as a logic that locally replaces classical logic when used to describe ‘a class of propositions in the context of quantum mechanical experiments.’” Our results show that this point can be supported as follows. The propositions of quantum logic correspond to elements of a Hilbert lattice and are not directly linked to measurement values. Such logic employs models which evaluate particular combinations of propositions and tells us whether they are true or not. Evaluation means mapping from a set of propositions to an algebra (lattice), through which a correspondence with measurement values indirectly emerges. Since the algebra must be an orthomodular lattice and cannot be a Boolean algebra we can say that quantum logic which has an orthomodular lattice as one of its models is “empirical” whenever we theoretically describe quantum measurements, simply because it can be linked to its algebraic model which serves for such a description: an orthomodular Hilbert lattice, that is, the lattice of closed subspaces of a complex Hilbert space.

ACKNOWLEDGMENTS

Support from the Alexander von Humboldt Foundation and the Croatian Science Foundation through project IP-2014-09-7515, as well as CEMS funding by the Ministry of Science, Education and Sports of Croatia, is acknowledged. Computational support was provided by the cluster Isabella of the University Computing Centre of the University of Zagreb and by the Croatian National Grid Infrastructure.

REFERENCES

1. H. Putnam, “Is logic empirical,” in Boston Studies in the Philosophy of Science, R. S. Cohen and M. W. Wartofsky, Eds., vol. V, pp. 216–241, Reidel Publishing Company, Dordrecht, The Netherlands, 1969.
2. G. Birkhoff and J. von Neumann, “The logic of quantum mechanics,” Annals of Mathematics, vol. 37, no. 4, pp. 823–843, 1936.
3. A. Wilce, “Quantum logic and probability theory,” in The Stanford Encyclopedia of Philosophy, E. N. Zalta, Ed., Stanford University, 2012, http://plato.stanford.edu/archives/fall2012/entries/qt-quantlog/.
4. H. Dishkant, “Semantics of the minimal logic of quantum mechanics,” Polish Academy of Sciences. Institute of Philosophy and Sociology. Studia Logica. An International Journal for Symbolic Logic, vol. 30, pp. 23–32, 1972.
5. R. I. Goldblatt, “Semantic analysis of orthologic,” Journal of Philosophical Logic, vol. 3, no. 1-2, pp. 19–35, 1974.
6. M. L. D. Chiara, “Quantum logic,” in Handbook of Philosophical Logic, D. Gabbay and F. Guenthner, Eds., vol. 3, pp. 427–469, D. Reidel, Dordrecht, The Netherlands, 1986.
7. H. Nishimura, “Sequential method in quantum logic,” The Journal of Symbolic Logic, vol. 45, no. 2, pp. 339–352, 1980.
8. H. Nishimura, “Gentzen methods in quantum logic,” in Handbook of Quantum Logic and Quantum Structures, K. Engesser, D. Gabbay, and D. Lehmann, Eds., pp. 227–260, Elsevier, Amsterdam, The Netherlands, 2009.
9. P. Mittelstaedt, Quantum Logic, vol. 126 of Synthese Library, Reidel, London, UK, 1978.
10. E.-W. Stachow, “Completeness of quantum logic,” Journal of Philosophical Logic, vol. 5, no. 2, pp. 237–280, 1976.
11. P. Pták and S. Pulmannová, Orthomodular Structures as Quantum Logics, vol. 44 of Fundamental Theories of Physics, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991.
12. K. Engesser and D. M. Gabbay, “Quantum logic, Hilbert space, revision theory,” Artificial Intelligence, vol. 136, no. 1, pp. 61–100, 2002.
13. J. P. Rawling and S. A. Selesnick, “Orthologic and quantum logic: models and computational elements,” Journal of the ACM, vol. 47, no. 4, pp. 721–751, 2000.
14. F. Herbut, “State-dependent implication and equivalence in quantum logic,” Advances in Mathematical Physics, vol. 2012, Article ID 385341, 23 pages, 2012.
15. T. I. Tylec and M. Kuś, “Non-signaling boxes and quantum logics,” Journal of Physics A: Mathematical and Theoretical, vol. 48, no. 50, Article ID 505303, 17 pages, 2015.
16. A. Bikchentaev, M. Navara, and R. Yakushev, “Quantum logics of idempotents of unital rings,” International Journal of Theoretical Physics, vol. 54, no. 6, pp. 1987–2000, 2015.
17. K. Engesser, D. M. Gabbay, and D. Lehmann, A New Approach to Quantum Logic, vol. 8 of Studies in Logic, College Publications, London, UK, 2007.
18. A. Baltag and S. Smets, “Complete axiomatizations for quantum actions,” International Journal of Theoretical Physics, vol. 44, no. 12, pp. 2267–2282, 2005.
19. A. Baltag and S. Smets, “Quantum logic as a dynamic logic,” Synthese, vol. 179, no. 2, pp. 285–306, 2011.
20. P. Mateus and A. Sernadas, “Weakly complete axiomatization of exogenous quantum propositional logic,” Information and Computation, vol. 204, no. 5, pp. 771–794, 2006.
21. S. Abramsky and R. Duncan, “A categorical quantum logic,” Mathematical Structures in Computer Science, vol. 16, no. 3, pp. 469–489, 2006.
22. S. Abramsky and B. Coecke, “A categorical semantics of quantum protocols,” in Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (LICS ’04), pp. 415–425, July 2004.
23. J. Harding, “A link between quantum logic and categorical quantum mechanics,” International Journal of Theoretical Physics, vol. 48, no. 3, pp. 769–802, 2009.
24. D. Hilbert and W. Ackermann, Principles of Mathematical Logic, Chelsea, New York, NY, USA, 1950.
25. G. Kalmbach, Orthomodular Lattices, Academic Press, London, UK, 1983.
26. G. Kalmbach, “Orthomodular logic,” Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, vol. 20, pp. 395–406, 1974.
27. M. Pavičić and N. D. Megill, “Non-orthomodular models for both standard quantum logic and standard classical logic: repercussions for quantum computers,” Helvetica Physica Acta, vol. 72, pp. 189–210, 1999.
28. M. Pavičić and N. D. Megill, “Is quantum logic a logic?” in Handbook of Quantum Logic and Quantum Structures, K. Engesser, D. Gabbay, and D. Lehmann, Eds., pp. 23–47, Elsevier, Amsterdam, The Netherlands, 2009.
29. M. Pavičić and N. D. Megill, “Binary orthologic with modus ponens is either orthomodular or distributive,” Helvetica Physica Acta, vol. 71, no. 6, pp. 610–628, 1998.
30. E. Schechter, Classical and Nonclassical Logics: An Introduction to the Mathematics of Propositions, Princeton University Press, Princeton, NJ, USA, 2005.
31. L. Beran, Orthomodular Lattices: Algebraic Approach, D. Reidel, Dordrecht, The Netherlands, 1985.
32. N. D. Megill and M. Pavičić, “Equivalencies, identities, symmetric differences, and congruencies in orthomodular lattices,” International Journal of Theoretical Physics, vol. 42, no. 12, pp. 2797–2805, 2003.
33. C. J. Isham, Lectures on Quantum Theory, Imperial College Press, London, UK, 1995.
34. P. R. Halmos, Introduction to Hilbert Space and the Spectral Theory of Spectral Multiplicity, Chelsea, New York, NY, USA, 1957.
35. S. S. Holland Jr., “The current interest in orthomodular lattices,” in Trends in Lattice Theory, J. C. Abbot, Ed., pp. 41–126, Van Nostrand Reinhold, New York, NY, USA, 1970.
36. N. D. Megill and M. Pavičić, “Deduction, ordering, and operations in quantum logic,” Foundations of Physics, vol. 32, no. 3, pp. 357–378, 2002.
37. M. Pavičić, “Nonordered quantum logic and its YES-NO representation,” International Journal of Theoretical Physics, vol. 32, no. 9, pp. 1481–1505, 1993.
38. M. Pavičić, “Identity rule for classical and quantum theories,” International Journal of Theoretical Physics, vol. 37, no. 8, pp. 2099–2103, 1998.
39. M. Pavičić and N. D. Megill, “Standard logics are valuation-nonmonotonic,” Journal of Logic and Computation, vol. 18, no. 6, pp. 959–982, 2008.
40. B. D. McKay, N. D. Megill, and M. Pavičić, “Algorithms for Greechie diagrams,” International Journal of Theoretical Physics, vol. 39, no. 10, pp. 2381–2406, 2000.
41. J. Jay Zeman, “Generalized normal logic,” Journal of Philosophical Logic, vol. 7, no. 2, pp. 225–243, 1978.
42. G. M. Hardegree, “The conditional in abstract and concrete quantum logic,” in The Logico-Algebraic Approach to Quantum Mechanics, C. A. Hooker, Ed., vol. 2, pp. 49–108, D. Reidel, Dordrecht, The Netherlands, 1979.
43. M. Pavičić, “Bibliography on quantum logics and related structures,” International Journal of Theoretical Physics, vol. 31, no. 3, pp. 373–461, 1992.
44. M. Pavičić and N. D. Megill, “Quantum and classical implication algebras with primitive implications,” International Journal of Theoretical Physics, vol. 37, no. 8, pp. 2091–2098, 1998.
45. M. Pavičić, “Minimal quantum logic with merged implications,” International Journal of Theoretical Physics, vol. 26, no. 9, pp. 845–852, 1987.
46. M. Pavičić, “Unified quantum logic,” Foundations of Physics, vol. 19, no. 8, pp. 999–1016, 1989.
47. G. Bacciagaluppi, “Is logic empirical?” in Handbook of Quantum Logic and Quantum Structures, K. Engesser, D. Gabbay, and D. Lehmann, Eds., Quantum Logic, pp. 49–78, Elsevier, Amsterdam, The Netherlands, 2009.

Chapter 2

A NOVEL CATEGORICAL APPROACH TO SEMANTICS OF RELATIONAL FIRST-ORDER LOGIC

Wolfgang Schreiner1, William Steingartner2, and Valerie Novitzká2

1 Research Institute for Symbolic Computation (RISC), Johannes Kepler University, Altenbergerstraße 69, A-4040 Linz, Austria
2 Faculty of Electrical Engineering and Informatics, Technical University of Košice, Letná 9, 042 00 Košice, Slovakia

ABSTRACT

We present a categorical formalization of a variant of first-order logic. Unlike other texts on this topic, the goal of this paper is to give a very transparent and self-contained account without requiring more background than basic logic and set theory. Our focus is to show how the semantics of first-order formulas can be derived from their usual deduction rules. For understanding the core ideas, it is not necessary to investigate the internal term structure of atomic formulas, thus we abstract atomic formulas to (syntactically opaque) relations; in this sense, our variant of first-order logic is “relational”. While the derived semantics is based on categorical principles (even the duality that arises from a symmetry between two ways of looking at something where there is no reason to choose one over the other), it is nevertheless “constructive” in that it describes explicit computations of the truth values of formulas. We demonstrate this by modeling the categorical semantics in the RISCAL (RISC Algorithm Language) system which allows us to validate the core propositions by automatically checking them in finite models.

Keywords: category, functor, RISCAL, relation, relational first-order logic, semantics

Citation: (APA): Schreiner, W., Steingartner, W., & Novitzká, V. (2020). A novel categorical approach to semantics of relational first-order logic. Symmetry, 12(10), 1584. (25 pages).

Copyright: © Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

INTRODUCTION

Most introductions to first-order logic first define the syntax of formulas, then formalize their meaning in the form established in the 1930s by Tarski [1] (essentially what we call today in programming language theory a “denotational semantics” [2]), then introduce a deduction calculus, and finally show the soundness and completeness of this calculus with respect to the semantics: if a formula can be derived in the calculus, it is true according to the semantics, and vice versa. These relationships between truth and derivability have to be established because there is no self-evident link between the semantics of a formula and the deduction rules associated with it. Historically, deduction came first; the soundness of a deduction calculus was established by showing that it could not lead to apparent inconsistencies, i.e., that a formula and its negation could not both be derived in a deduction system. It was Tarski who first gave formulas a meaning that was independent of deduction.

However, as it has done since the 1940s for many other mathematical areas, category theory [3,4,5,6,7], the general theory of mathematical structures, can also shed an alternative light on first-order logic. It does so by considering logical notions as special instances of “universal” constructions, where a value of interest is determined

• first, by depicting the core property that the value shall satisfy; and
• second, by giving a criterion for how to choose a canonical value from all values that satisfy the property.

It was eventually recognized that by such universal constructions the semantics of the connectives of propositional logic could be determined directly from their associated introduction and elimination rules. However, it was not until the late 1960s that Lawvere gained the fundamental insight that this idea could also be applied to the quantifiers of first-order logic [8], thus establishing a direct relationship between its semantics and its proof calculus.

However, this insight has not yet obtained a foothold in basic texts on logic and in basic logic education. The main reason may be that the corresponding material is found mostly in texts on category theory and its applications where it is dispersed among examples of the application of categorical notions without a clear central presentation. Furthermore, the general treatment of first-order logic with terms and variables requires a complex mathematical apparatus [9] which is much beyond the scope of basic introductions. Reasonably compact introductions can be found, e.g., in Section 2.1.10 of [10], in [11], in Section 1.6 of [12] (however, in the context of type theory rather than classical first-order logic), in Section 9.5 of [3] (the treatment of quantifiers only), and in Section 7.1.12 of [7] (again only the treatment of quantifiers).

The goal of this paper is to give a compact introduction to a categorical version of first-order logic that is fully self-contained, only introduces the categorical notions relevant for the stated purpose, and presents them from the point of view of the intended application. For this purpose, it elaborates a simple but completely formalized syntactic and semantic framework of first-order logic that represents the background of the discussion, without gaps and inconsistencies. As a deliberate decision, this framework does not address the syntax and semantics of terms but abstracts atomic formulas to opaque relations; this allows for focusing the discussion on the essentials.
However, to describe a reasonably close relative of first-order logic, this framework is (in contrast to other presentations) not based on relations of fixed arity, i.e., with a fixed number of variables; instead, we consider relations of infinite arity, i.e., with infinitely many variables. However, only finitely many variables may influence the truth value of a relation, which reflects the fact that a classical atomic formula can only reference a finite number of variables. The overall result is a slick and elegant presentation. Because duality has many manifestations in logic and it is widely agreed that a duality is like a “giant symmetry”, that is, a symmetry between theories, we focus on this concept in our approach. For the implementation, we use RISCAL, the RISC Algorithm Language [13], which is a specification

language with an associated software system for describing mathematical algorithms, formally specifying their behavior based on mathematical theories, and validating the correctness of algorithms, specifications, and theories by the execution/evaluation of their formal semantics. The term “algorithm language” indicates that RISCAL is intended to model, rather than low-level code, algorithms (as can be found in textbooks on discrete mathematics) in a high-level language, and to specify the behavior of these algorithms by formal contracts. RISCAL has been developed to validate the correctness of mathematical theories, specifications, and programs, by checking instances of these artifacts on finite domains; applications of RISCAL are for instance in discrete mathematics, number theory, and computer algebra. Software based on formal logic plays an ever-increasing role in areas where a mathematically precise understanding of a subject domain and sound rules for reasoning about the properties of this domain are essential. A prime example is the formal modeling, specification, and verification of computer programs and computing systems, but there are many other applications in areas such as knowledge-based systems, computer mathematics, or the semantic web [14]. Furthermore, the intent of all our projects (namely LogTechEdu, SemTech [15], and others listed in the Funding section) is to further advance education in computer science and related topics. In academic courses for computer science and mathematics, by utilizing the power of modern software based on formal logic and semantics, students shall engage with the material they encounter by actively producing the problem solutions rather than just passively taking them from the lecturer. The remainder of this paper is structured as follows: in Section 2, we define a term-free variant of first-order logic and give it a semantics in the usual style based on set-theoretic notions.
In Section 3, we introduce those categorical notions that are necessary for understanding the following elaboration and discuss their relationships. The core of this paper is Section 4 where we elaborate the categorical formulation of the semantics of our variant of first-order logic. In Section 5, we demonstrate that this semantics is constructive by modeling it in the RISCAL system [13], which allows us to automatically check the core propositions in particular finite models. Section 6 concludes our presentation and gives an outlook on our future work.

A RELATIONAL FIRST-ORDER LOGIC

In this section, we introduce a simplified variant of first-order logic that abstracts from the syntactic structure of atomic formulas and thus does without the concepts of terms, constants, function symbols, predicate symbols, and all of the associated semantic apparatus. Towards this goal, atomic formulas are replaced by relations over assignments (maps of variables to values) that are constrained to only depend on a finite number of variables; we will call such relations “predicates”. Consequently, the semantics of every non-atomic formula is also a relation (i.e., a predicate, as mentioned).

We begin with some standard notions. First, we specify variables and the values that variables hold.

Axiom 1 (Variables and Values). Let Var denote an arbitrary infinite and enumerable set; we call the elements of this set variables. Furthermore, let Val denote an arbitrary non-empty set; we call the elements of this set values.

Next, we define assignments.

Definition 1 (Assignments). We define Ass := Var → Val as the set of all mappings of variables to values (a function space); we call the elements of this set assignments. Thus, for every assignment a ∈ Ass and every variable x ∈ Var, we have a(x) ∈ Val.

We note that an assignment is similar to the concept of state in the theory of formal semantics of programming languages (see, e.g., [16]) where the state is a function from variables to values: to each variable, the state associates its current value.

Definition 2 (Updates). Let a ∈ Ass be an assignment, x ∈ Var a variable, and v ∈ Val a value. We define the update assignment a[x ↦ v] ∈ Ass as follows:

a[x ↦ v](y) := v if y = x, and a[x ↦ v](y) := a(y) otherwise, for every y ∈ Var.

Consequently, a[x ↦ v] is identical to a except that it maps variable x to value v. Based on this, we can formulate the following updating properties.
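Assignments and updates are easy to experiment with in a finite setting. The following Python sketch is an illustration only: it replaces the infinite set Var by a two-element tuple and Val by three integers (both assumptions), and the laws it checks are typical consequences of the update definition chosen for illustration, not necessarily the exact list stated in Proposition 1.

```python
from itertools import product

VARS = ("x", "y")   # finite stand-in for the infinite set Var (assumption)
VALS = (0, 1, 2)    # a small non-empty set Val (assumption)

def assignments():
    """All maps from VARS to VALS, each represented as a dict."""
    return [dict(zip(VARS, vs)) for vs in product(VALS, repeat=len(VARS))]

def update(a, x, v):
    """a[x ↦ v]: a copy of a that maps x to v and agrees with a elsewhere."""
    b = dict(a)
    b[x] = v
    return b

def check_update_laws():
    """Exhaustively check some illustrative update laws on the finite model."""
    for a in assignments():
        for x in VARS:
            # Updating x with its current value changes nothing.
            assert update(a, x, a[x]) == a
            for v1, v2 in product(VALS, repeat=2):
                # The updated variable holds the new value.
                assert update(a, x, v1)[x] == v1
                # A later update of the same variable overrides an earlier one.
                assert update(update(a, x, v1), x, v2) == update(a, x, v2)
                for y in VARS:
                    if x != y:
                        # Updates of distinct variables commute.
                        assert (update(update(a, x, v1), y, v2)
                                == update(update(a, y, v2), x, v1))
    return True
```

Calling `check_update_laws()` runs through all nine assignments of this small model and returns `True` if no law is violated.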

Proposition 1 (Update Properties). Let a ∈ Ass be an assignment, x, y ∈ Var variables, and v, v1, v2 ∈ Val values. Then, we have the following properties:

Proof. Directly from the definitions. □

The properties of assignments listed above (and only these) will be of importance in the subsequent proofs. Now, we turn to the fundamental semantic notions.

Definition 3 (Relations). We define Rel := ℘(Ass) as the set of all sets of assignments; we call the elements of this set “relations”. Consequently, a relation is a set of assignments.

Definition 4 (Variable Independence). We state that relation R ∈ Rel is independent of variable x ∈ Var, written as R ⫫ x, if and only if the following holds:

∀a ∈ Ass, v1, v2 ∈ Val. a[x ↦ v1] ∈ R ⇔ a[x ↦ v2] ∈ R

Consequently, if R ⫫ x, the value of x in any assignment a does not influence whether a is in R. We say that R depends on x if R ⫫ x does not hold. We transfer the central syntactic property of atomic formulas (they can only refer to finitely many variables) to its semantic counterpart.

Definition 5 (Predicates). A relation R ∈ Rel is a predicate if it only depends on finitely many variables. We denote by Pred the set of all predicates and by the subset of all predicates that are independent of x.

Now, we are ready to introduce the central entities of our paper. First, we give a definition of the abstract syntax of formulas.

Definition 6 (Abstract Syntax of Formulas). We define For as that smallest set of abstract syntax trees in which every element F ∈ For is generated by an application of a rule of the following context-free grammar (where P ∈ Pred denotes an arbitrary predicate and x ∈ Var denotes an arbitrary variable):

A Novel Categorical Approach to Semantics of Relational First-Order Logic

37

We call the elements of this set formulas. In this definition, the role of a classic atomic predicate p(t1, …, tn) with argument terms t1, …, tk in which n variables x1, …, xn occur freely is abstracted to a predicate P that depends on variables x1, …, xn. Now, we establish the relationship between the syntax and semantics of formulas.

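Definition 7 (Semantics of Formulas). Let F ∈ For be a formula. We define the relation 〚F〛 ∈ Rel, called the semantics of F, by induction on the structure of F:

〚P〛 := {a ∈ Ass | a ∈ P}
〚⊤〛 := Ass
〚⊥〛 := ∅
〚¬F〛 := {a ∈ Ass | a ∉ 〚F〛}
〚F1 ∧ F2〛 := {a ∈ Ass | a ∈ 〚F1〛 and a ∈ 〚F2〛}
〚F1 ∨ F2〛 := {a ∈ Ass | a ∈ 〚F1〛 or a ∈ 〚F2〛}
〚F1 → F2〛 := {a ∈ Ass | a ∈ 〚F1〛 implies a ∈ 〚F2〛}
〚F1 ↔ F2〛 := {a ∈ Ass | a ∈ 〚F1〛 if and only if a ∈ 〚F2〛}
〚∀x. F〛 := {a ∈ Ass | a[x ↦ v] ∈ 〚F〛 for every v ∈ Val}
〚∃x. F〛 := {a ∈ Ass | a[x ↦ v] ∈ 〚F〛 for some v ∈ Val}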

The above definition is well-defined in that every formula denotes a relation. To show that formulas indeed denote predicates, some more work is required.

Proposition 2 (Quantified Formulas and Variable Independence). For every variable x ∈ Var and formula F ∈ For, we have 〚∀x. F〛 ⫫ x and 〚∃x. F〛 ⫫ x, i.e., the semantics of quantified formulas do not depend on x.

Proof. We prove this proposition by reductio ad absurdum.

First, assume that 〚∀x. F〛 depends on x. Then, we have some assignment a ∈ Ass and some values v1, v2 ∈ Val such that a[x ↦ v1] ∈ 〚∀x. F〛 and a[x ↦ v2] ∉ 〚∀x. F〛. From a[x ↦ v2] ∉ 〚∀x. F〛, we have some v ∈ Val with a[x ↦ v2][x ↦ v] ∉ 〚F〛 and thus a[x ↦ v] ∉ 〚F〛. However, a[x ↦ v1] ∈ 〚∀x. F〛 implies a[x ↦ v1][x ↦ v] ∈ 〚F〛 and thus a[x ↦ v] ∈ 〚F〛, which is a contradiction.

Now, assume that 〚∃x. F〛 depends on x. Then, we have some assignment a ∈ Ass and some values v1, v2 ∈ Val such that a[x ↦ v1] ∈ 〚∃x. F〛 and a[x ↦ v2] ∉ 〚∃x. F〛. From a[x ↦ v1] ∈ 〚∃x. F〛, we have some v ∈ Val with a[x ↦ v1][x ↦ v] ∈ 〚F〛 and thus a[x ↦ v] ∈ 〚F〛. However, a[x ↦ v2] ∉ 〚∃x. F〛 implies a[x ↦ v2][x ↦ v] ∉ 〚F〛 and thus a[x ↦ v] ∉ 〚F〛, which is a contradiction. □
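The statement of Proposition 2 can be explored on a small finite model. The following Python sketch is not part of the formal development: it assumes a finite variable set in place of Var, encodes formulas as nested tuples, uses the standard reading of the connectives, and checks variable independence by brute force. All names (`holds`, `independent`, `LESS`, the tuple tags) are hypothetical.

```python
from itertools import product

VARS = ("x", "y")   # finite stand-in for the infinite set Var (assumption)
VALS = (0, 1, 2)    # small value set Val (assumption)

def update(a, x, v):
    """Return the updated assignment a[x ↦ v]."""
    b = dict(a)
    b[x] = v
    return b

def assignments():
    """All assignments VARS -> VALS, each as a dict."""
    return [dict(zip(VARS, vs)) for vs in product(VALS, repeat=len(VARS))]

def holds(F, a):
    """Decide a ∈ 〚F〛 by recursion on the structure of formula F."""
    tag = F[0]
    if tag == "pred":
        return F[1](a)              # atomic case: a predicate on assignments
    if tag == "top":
        return True
    if tag == "bot":
        return False
    if tag == "not":
        return not holds(F[1], a)
    if tag == "and":
        return holds(F[1], a) and holds(F[2], a)
    if tag == "or":
        return holds(F[1], a) or holds(F[2], a)
    if tag == "imp":
        return (not holds(F[1], a)) or holds(F[2], a)
    if tag == "iff":
        return holds(F[1], a) == holds(F[2], a)
    if tag == "forall":             # F = ("forall", x, body)
        return all(holds(F[2], update(a, F[1], v)) for v in VALS)
    if tag == "exists":             # F = ("exists", x, body)
        return any(holds(F[2], update(a, F[1], v)) for v in VALS)
    raise ValueError(f"unknown formula tag: {tag}")

def independent(F, x):
    """Check 〚F〛 ⫫ x: the value of x never affects membership in 〚F〛."""
    return all(holds(F, update(a, x, v1)) == holds(F, update(a, x, v2))
               for a in assignments() for v1 in VALS for v2 in VALS)

# Hypothetical example predicate "x < y"; it depends on both variables.
LESS = ("pred", lambda a: a["x"] < a["y"])
```

In this model, `LESS` depends on x, whereas both `("forall", "x", LESS)` and `("exists", "x", LESS)` are independent of x, as the proposition predicts; the existential formula still depends on y.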

Proposition 3 (Formula Semantics and Predicates). For every formula F ∈ For, we have 〚F〛 ∈ Pred, i.e., the semantics of F is a predicate.

Proof. The proof proceeds by induction over the structure of F.

• If F = P, we have 〚F〛 = {a ∈ Ass | a ∈ P} = P ∈ Pred.
• If F ∈ {⊤, ⊥}, there are no x, a, v1, v2 such that a[x ↦ v1] ∈ 〚F〛 and a[x ↦ v2] ∉ 〚F〛 because, for F = ⊤, the second condition must be false and, for F = ⊥, the first one; thus, 〚F〛 does not depend on any variable.
• If F = ¬F1, by the induction hypothesis, we may assume that 〚F1〛 depends only on the variables in some finite variable set X. From the definition of 〚F〛, it is then easy to show that 〚¬F1〛 also depends only on the variables in X.
• If F ∈ {F1 ∧ F2, F1 ∨ F2, F1 → F2, F1 ↔ F2}, we may assume by the induction hypothesis that 〚F1〛 depends only on the variables in some finite set X1 while 〚F2〛 depends only on the variables in some finite set X2. From the definition of 〚F〛, it is then easy to show that 〚F〛 depends only on the variables in the finite set X1 ∪ X2.
• If F ∈ {∀x. F1, ∃x. F1}, we may assume by the induction hypothesis that 〚F1〛 depends only on the variables in some finite variable set X. We now show that 〚F〛 depends only on the variables in the finite set X\{x}. We assume that this is not the case and derive a contradiction. From this assumption and Proposition 2, we have a variable y with y ≠ x and y ∉ X on which 〚F〛 depends; thus, we have an assignment a and values v1, v2 such that a[y ↦ v1] ∈ 〚F〛 and a[y ↦ v2] ∉ 〚F〛.

If F = ∀x. F1, from a[y ↦ v2] ∉ 〚F〛, we have a v ∈ Val with a[y ↦ v2][x ↦ v] ∉ 〚F1〛 and thus (since y ≠ x) a[x ↦ v][y ↦ v2] ∉ 〚F1〛. From a[y ↦ v1] ∈ 〚F〛, we know a[y ↦ v1][x ↦ v] ∈ 〚F1〛 and thus a[x ↦ v][y ↦ v1] ∈ 〚F1〛. Thus, 〚F1〛 depends on a variable y ∉ X, which contradicts the induction hypothesis.

If F = ∃x. F1, from a[y ↦ v1] ∈ 〚F〛, we have a v ∈ Val with a[y ↦ v1][x ↦ v] ∈ 〚F1〛 and thus (since y ≠ x) a[x ↦ v][y ↦ v1] ∈ 〚F1〛. From a[y ↦ v2] ∉ 〚F〛, we know a[y ↦ v2][x ↦ v] ∉ 〚F1〛 and thus a[x ↦ v][y ↦ v2] ∉ 〚F1〛. Thus, 〚F1〛 depends on a variable y ∉ X, which contradicts the induction hypothesis. This completes our proof. □

In the following, we transfer the classical model-theoretic notions to our framework.

Definition 8 (Satisfaction). Let a ∈ Ass be an assignment and F ∈ For be a formula. We define a ⊧ F (read: a satisfies F) as follows:

a ⊧ F :⇔ a ∈ 〚F〛

Definition 9 (Validity). Let F ∈ For be a formula. We define ⊧ F (read: F is valid) as follows:

⊧ F :⇔ ∀a ∈ Ass. a ⊧ F

Definition 10 (Logical Consequence). Let F, G ∈ For be formulas. We define F ⊧ G (read: G is a logical consequence of F) as follows:

F ⊧ G :⇔ ∀a ∈ Ass. (a ⊧ F) ⇒ (a ⊧ G)

Definition 11 (Logical Equivalence). Let F, G ∈ For be formulas. We define F ≡ G (read: F and G are logically equivalent) as follows:

F ≡ G :⇔ ∀a ∈ Ass. (a ⊧ F) ⇔ (a ⊧ G)

Proposition 4 (Logical Consequence and Logical Equivalence). Let F, G ∈ For be formulas. Then, we have the following equivalences:

• (F ⊧ G) ⇔ (⊧ F → G)
• (F ⊧ G) ⇔ (〚F〛 ⊆ 〚G〛)
• (F ≡ G) ⇔ (⊧ F ↔ G)
• (F ≡ G) ⇔ (〚F〛 = 〚G〛)

Proof. Directly from the definitions. □

Thus, logical consequence on the meta-level coincides with implication on the formula level and with the subset relation on the semantic level. Furthermore, logical equivalence on the meta-level coincides with equivalence on the formula level and with the equality relation on the semantic level.

In the following, we establish a set-theoretic interpretation of the logical operations of our formula language.

Definition 12 (Complement). We define the complement of relation R ∈ Rel as the relation Ass \ R. Consequently, an assignment is in the complement of R if and only if it is not in R.

Proposition 5 (Propositional Semantics as Set Operations). Let F, F1, F2 ∈ For be formulas. We then have the following equalities:

Proof. Directly from the definition of the semantics.
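The correspondences of Proposition 4 and the set-theoretic reading of the propositional connectives can be checked mechanically on a small model. The Python sketch below is illustrative only: it covers propositional formulas over a finite set of assignments, it assumes the standard set-theoretic readings of the connectives (complement, intersection, union), and all names (`sem`, `valid`, `consequence`, `equivalent`, `X1`, `Y1`) are hypothetical.

```python
from itertools import product

VARS = ("x", "y")   # finite stand-in for Var (assumption)
VALS = (0, 1)       # small value set Val (assumption)
# An assignment is encoded as the tuple (a(x), a(y)).
ASS = frozenset(product(VALS, repeat=len(VARS)))

def sem(F):
    """Return 〚F〛 as a frozenset of assignments (propositional fragment)."""
    tag = F[0]
    if tag == "pred":
        return frozenset(F[1])
    if tag == "top":
        return ASS
    if tag == "bot":
        return frozenset()
    if tag == "not":
        s = sem(F[1])
        return frozenset(a for a in ASS if a not in s)
    s1, s2 = sem(F[1]), sem(F[2])
    if tag == "and":
        return frozenset(a for a in ASS if a in s1 and a in s2)
    if tag == "or":
        return frozenset(a for a in ASS if a in s1 or a in s2)
    if tag == "imp":
        return frozenset(a for a in ASS if a not in s1 or a in s2)
    if tag == "iff":
        return frozenset(a for a in ASS if (a in s1) == (a in s2))
    raise ValueError(f"unknown formula tag: {tag}")

def satisfies(a, F):
    """a ⊧ F: assignment a satisfies formula F."""
    return a in sem(F)

def valid(F):
    """⊧ F: every assignment satisfies F."""
    return all(satisfies(a, F) for a in ASS)

def consequence(F, G):
    """F ⊧ G: every assignment satisfying F also satisfies G."""
    return all((not satisfies(a, F)) or satisfies(a, G) for a in ASS)

def equivalent(F, G):
    """F ≡ G: F and G are satisfied by the same assignments."""
    return all(satisfies(a, F) == satisfies(a, G) for a in ASS)

# Hypothetical example predicates: "x = 1" and "y = 1".
X1 = ("pred", {(1, 0), (1, 1)})
Y1 = ("pred", {(0, 1), (1, 1)})
```

For instance, `consequence(("and", X1, Y1), X1)`, `valid(("imp", ("and", X1, Y1), X1))`, and `sem(("and", X1, Y1)) <= sem(X1)` all agree, which is the first pair of equivalences of Proposition 4 on this model.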



While the above results are quite intuitive, a corresponding set-theoretic interpretation of quantified formulas is not. In the following, we only state the plain result without indication of how it can be intuitively understood; we will delegate this explanation to Section 4, where the categorical framework will provide us with adequate insight.

Proposition 6 (Quantifier Semantics as Set Operations). Let F ∈ For be a formula. We then have the following equalities:

〚∀x. F〛 = ⋃{P ∈ Pred | P ⫫ x ∧ P ⊆ 〚F〛}
〚∃x. F〛 = ⋂{P ∈ Pred | P ⫫ x ∧ 〚F〛 ⊆ P}

In other words, 〚∀x. F〛 is the weakest predicate P (“weakest” in the sense of the largest set) that is independent of x and that satisfies the property P ⊆ 〚F〛, while 〚∃x. F〛 is the strongest predicate P (“strongest” in the sense of the smallest set) that is independent of x and that satisfies the property 〚F〛 ⊆ P.

Proof. The proof is in two stages. First, we take an arbitrary assignment a ∈ Ass and show

a ∈ 〚∀x. F〛 ⇔ a ∈ ⋃{P ∈ Pred | P ⫫ x ∧ P ⊆ 〚F〛}.

⇒: We assume a ∈ 〚∀x. F〛 and prove for P := 〚∀x. F〛

(1) P ∈ Pred
(2) P ⫫ x
(3) P ⊆ 〚F〛
(4) a ∈ P

From Proposition 3, we have (1). From Proposition 2, we have (2). From a ∈ 〚∀x. F〛, we have (4). To show (3), we take arbitrary assignment a0 ∈ P and show a0 ∈ 〚F〛. From a0 ∈ P, we know a0[x ↦ v] ∈ 〚F〛 for v := a0(x). Since a0[x ↦ a0(x)] = a0, we thus know a0 ∈ 〚F〛.

⇐: We assume for some P ∈ Pred

(5) P ⫫ x
(6) P ⊆ 〚F〛
(7) a ∈ P

and prove a ∈ 〚∀x. F〛. For this, we take arbitrary v ∈ Val and prove a[x ↦ v] ∈ 〚F〛. From (6), it suffices to show a[x ↦ v] ∈ P. From (5), we know

(8) ∀v1, v2 ∈ Val. a[x ↦ v1] ∈ P ⇒ a[x ↦ v2] ∈ P

From (7) and a = a[x ↦ a(x)], we know a[x ↦ v0] ∈ P for v0 := a(x). Thus, with (8), we know a[x ↦ v] ∈ P.

Now, we take arbitrary a ∈ Ass and show

a ∈ 〚∃x. F〛 ⇔ a ∈ ⋂{P ∈ Pred | P ⫫ x ∧ 〚F〛 ⊆ P}.

⇒: We assume a ∈ 〚∃x. F〛 and take arbitrary but fixed P ∈ Pred for which we assume

(9) P ⫫ x
(10) 〚F〛 ⊆ P

Our goal is to show a ∈ P. From a ∈ 〚∃x. F〛, we know a[x ↦ v] ∈ 〚F〛 for some v ∈ Val. From (10), we thus know a[x ↦ v] ∈ P. From (9), we thus know a[x ↦ a(x)] ∈ P. Since a[x ↦ a(x)] = a, we thus know a ∈ P.

⇐: We assume

(11) ∀P ∈ Pred. (P ⫫ x ∧ 〚F〛 ⊆ P) ⇒ a ∈ P

and prove a ∈ 〚∃x. F〛. From (11) instantiated with P := 〚∃x. F〛 and Propositions 3 and 2, it suffices to prove 〚F〛 ⊆ 〚∃x. F〛. Take arbitrary assignment a0 ∈ 〚F〛. Since a0[x ↦ a0(x)] = a0, we thus have a0[x ↦ v] ∈ 〚F〛 for v := a0(x) and thus a0 ∈ 〚∃x. F〛. □
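The characterization of quantifiers as the largest and smallest x-independent relations around 〚F〛 can be verified exhaustively on a tiny model. The following Python sketch does so under simplifying assumptions: two variables and two values (so that there are only 16 relations, all of which count as predicates), assignments encoded as tuples, and hypothetical helper names throughout.

```python
from itertools import combinations, product

VARS = ("x", "y")   # finite stand-in for Var (assumption)
VALS = (0, 1)       # small value set Val (assumption)
ASS = frozenset(product(VALS, repeat=len(VARS)))  # assignment = (a(x), a(y))

def upd(a, x, v):
    """Return a[x ↦ v] for an assignment encoded as a tuple."""
    i = VARS.index(x)
    return a[:i] + (v,) + a[i + 1:]

def powerset(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

RELS = powerset(ASS)  # with finitely many variables, every relation is a predicate

def independent(R, x):
    """R ⫫ x: changing the value of x never changes membership in R."""
    return all((upd(a, x, v1) in R) == (upd(a, x, v2) in R)
               for a in ASS for v1 in VALS for v2 in VALS)

def forall(x, R):
    """Direct quantifier semantics: all x-variants of a lie in R."""
    return frozenset(a for a in ASS if all(upd(a, x, v) in R for v in VALS))

def exists(x, R):
    """Direct quantifier semantics: some x-variant of a lies in R."""
    return frozenset(a for a in ASS if any(upd(a, x, v) in R for v in VALS))

def largest_independent_below(x, R):
    """Union of all x-independent relations contained in R."""
    family = [P for P in RELS if independent(P, x) and P <= R]
    return frozenset().union(*family)

def smallest_independent_above(x, R):
    """Intersection of all x-independent relations containing R."""
    family = [P for P in RELS if independent(P, x) and R <= P]
    return ASS.intersection(*family)

def check_quantifier_characterization():
    """Compare both readings for every variable and every relation."""
    return all(forall(x, R) == largest_independent_below(x, R)
               and exists(x, R) == smallest_independent_above(x, R)
               for x in VARS for R in RELS)
```

The two families are never empty (the empty relation and `ASS` are independent of every variable), so the union and the intersection are always well defined in this model.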

CATEGORY THEORY

In this section, we discuss those aspects of category theory that are relevant for the subsequent categorical formulation of our relational first-order logic.

Basic Notions

We begin with the basic notions of category theory.

Definition 13 (Category). A category 𝒞 is a triple ⟨O, A, ∘⟩ of the following components:

• A class O of elements called 𝒞-objects or just objects.
• A class A of elements called 𝒞-arrows or just arrows. Each arrow has a source object and a target object from O; we write f : a → b to indicate that f is an arrow with source a and target b. We write 𝒞(a, b) to denote the class of all arrows of A with source a and target b (called the hom-class of all arrows from a to b). For every object x in O, A contains an arrow idx : x → x called the identity arrow for x.
• A composition, i.e., a binary operation ∘ defined on arrows. For all arrows f : a → b and g : b → c, we have (g ∘ f) : a → c. Furthermore, the composition satisfies the following axioms:
− Associativity: (h ∘ g) ∘ f = h ∘ (g ∘ f), for all arrows f : a → b, g : b → c, h : c → d.
− Identity: idb ∘ f = f = f ∘ ida, for all arrows f : a → b.

Definition 14 (Isomorphism). Let 𝒞 be a category and a, b be 𝒞-objects. Then, we have a ≃ b (read: a and b are isomorphic) if there are 𝒞-arrows f : a → b and g : b → a, called isomorphisms, such that g ∘ f = ida and f ∘ g = idb.

Definition 15 (Subcategory). A category 𝒞 is a subcategory of category 𝒟 if every 𝒞-object is also a 𝒟-object, every 𝒞-arrow is also a 𝒟-arrow, every identity arrow in 𝒞 is also an identity arrow in 𝒟, and g ∘C f = g ∘D f for all 𝒞-arrows f : a → b and g : b → c, where ∘C denotes the composition in 𝒞 and ∘D denotes the composition in 𝒟.

Object Constructions

We now introduce constructions of categorical objects that will subsequently play an important role in the categorical formulation of relational first-order logic.

Definition 16 (Initial and Final Objects). Let 𝒞 be a category. A 𝒞-object 0 is initial if for every 𝒞-object a there exists exactly one arrow 0a : 0 → a. A 𝒞-object 1 is final if for every 𝒞-object a there exists exactly one arrow 1a : a → 1. Thus, with respect to an arbitrary object a, an initial object 0 and a final object 1 come with the arrows 0 → a → 1 given by 0a and 1a.

This construction of initial/final objects is “universal” in the sense that it describes a class of entities (objects and accompanying arrows) that share a common property and picks from this class an entity whose characterizing property is the existence of exactly one arrow from/to every entity of this class. This defines the entity uniquely up to isomorphism. Further instances of such constructions will be given later.

Definition 17 (Product and Coproduct). Let 𝒞 be a category. Then, the triple ⟨a × b, π1, π2⟩ is a product of 𝒞-objects a and b if a × b is a 𝒞-object, the product object, with arrows π1 : a × b → a and π2 : a × b → b, the projections, such that for every triple ⟨c, f, g⟩ with 𝒞-object c and arrows f : c → a and g : c → b there exists exactly one arrow ⟨f, g⟩ : c → a × b such that f = π1 ∘ ⟨f, g⟩ and g = π2 ∘ ⟨f, g⟩, i.e., such that the corresponding diagram commutes.


Dually, the triple ⟨a + b, ι1, ι2⟩ is a coproduct of 𝒞-objects a and b if a + b is a 𝒞-object, the coproduct object, with arrows ι1 : a → a + b and ι2 : b → a + b, the injections, such that, for every triple ⟨c, f, g⟩ with 𝒞-object c and arrows f : a → c and g : b → c, there exists exactly one arrow [f, g] : a + b → c such that f = [f, g] ∘ ι1 and g = [f, g] ∘ ι2, i.e., such that the corresponding diagram commutes.

The product and the coproduct are thus defined by universal constructions analogous to those of the final and the initial object, respectively; thus, products and coproducts are also uniquely defined up to isomorphism.

Definition 18 (Product Arrow). Let 𝒞 be a category with products a1 × a2 and b1 × b2 and arrows f : a1 → b1 and g : a2 → b2. Then, the product arrow f × g : a1 × a2 → b1 × b2 is the arrow ⟨f ∘ π1, g ∘ π2⟩.

Definition 19 (Exponential). Let 𝒞 be a category in which every pair of 𝒞-objects has a product object. Then, the tuple ⟨b^a, evala,b⟩ is an exponential of 𝒞-objects a and b if b^a is a 𝒞-object, the exponential object, with arrow evala,b : b^a × a → b, the evaluation arrow, such that for every 𝒞-object c with arrow f : c × a → b there exists exactly one arrow curryf : c → b^a, the currying arrow, such that f = evala,b ∘ (curryf × ida), i.e., such that the corresponding diagram commutes.


Since the exponential is also defined by a universal construction, it is uniquely defined up to isomorphism.
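To make the universal constructions above tangible, the following Python sketch searches for them in one particular category: the subsets of a three-element set, with exactly one arrow a → b whenever a ⊆ b. This choice of category is an assumption made for illustration, as are all function names. In such an order category, commuting conditions and uniqueness of mediating arrows hold automatically, so only the existence of arrows needs to be checked.

```python
from itertools import combinations

BASE = frozenset({0, 1, 2})   # illustrative base set (assumption)

def powerset(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

OBJECTS = powerset(BASE)

def arrow(a, b):
    """There is exactly one arrow a -> b iff a ⊆ b, and none otherwise."""
    return a <= b

def unique(candidates):
    """Return the single candidate; raise ValueError if there is not exactly one."""
    (obj,) = candidates
    return obj

def initial():
    return unique([z for z in OBJECTS if all(arrow(z, a) for a in OBJECTS)])

def final():
    return unique([t for t in OBJECTS if all(arrow(a, t) for a in OBJECTS)])

def product_of(a, b):
    """Object p with projections to a and b through which every such c factors."""
    return unique([p for p in OBJECTS
                   if arrow(p, a) and arrow(p, b)
                   and all(arrow(c, p) for c in OBJECTS
                           if arrow(c, a) and arrow(c, b))])

def coproduct_of(a, b):
    """Object s with injections from a and b that factors into every such c."""
    return unique([s for s in OBJECTS
                   if arrow(a, s) and arrow(b, s)
                   and all(arrow(s, c) for c in OBJECTS
                           if arrow(a, c) and arrow(b, c))])

def exponential_of(a, b):
    """Object e = b^a: e × a -> b exists, and every c with c × a -> b maps to e."""
    return unique([e for e in OBJECTS
                   if arrow(product_of(e, a), b)
                   and all(arrow(c, e) for c in OBJECTS
                           if arrow(product_of(c, a), b))])
```

In this category the search recovers familiar set operations: the product is the intersection, the coproduct is the union, and the exponential b^a is the union of the complement of a with b. The same pattern reappears in the semantic category of predicates discussed later.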

Functors and Adjunction

Moving on from individual categories, we will now discuss some concepts that address relationships between categories.

Definition 20 (Functor). Let 𝒞 and 𝒟 be categories. A functor F : 𝒞 → 𝒟 is a map that takes every 𝒞-object a to a 𝒟-object F(a) and every 𝒞-arrow f : a → b to a 𝒟-arrow F(f) : F(a) → F(b) such that

• F(ida) = idF(a) for every 𝒞-object a, and
• F(g ∘C f) = F(g) ∘D F(f) for all 𝒞-arrows f : a → b and g : b → c.

Definition 21 (Adjunction, Left and Right Adjoint). Let 𝒞 and 𝒟 be categories with functors F : 𝒞 → 𝒟 and G : 𝒟 → 𝒞. Then, we have F ⊣ G (read: ⟨F, G⟩ is an adjunction, F is a left adjoint of G, G is a right adjoint of F) if for every 𝒞-object a and 𝒟-object b the arrow classes 𝒟(F(a), b) and 𝒞(a, G(b)) are isomorphic, i.e., there exists a bijection between them. This is equivalent to saying that, for every 𝒞-object a and 𝒟-object b, there exist two surjective mappings s1 : 𝒟(F(a), b) → 𝒞(a, G(b)) and s2 : 𝒞(a, G(b)) → 𝒟(F(a), b), i.e.,

• for every 𝒟-arrow g : F(a) → b, we have a 𝒞-arrow f : a → G(b) with s2(f) = g, and
• for every 𝒞-arrow f : a → G(b), we have a 𝒟-arrow g : F(a) → b with s1(g) = f.

Note. This equivalence is a consequence of the Cantor–Schröder–Bernstein theorem, which states that there exists a bijective function between sets A and B if there exist injective functions f : A → B and g : B → A. This implies that such a bijective function also exists if there exist surjective functions f′ : A → B and g′ : B → A because, from these, we can define injective functions B → A and A → B. While the theorem has been formulated for sets, it can also be generalized to classes.

The above formulation will come in handy in proving that two functors represent an adjunction.

Proposition 7 (Equivalence of Adjunctions and Universals). Let 𝒞 and 𝒟 be categories with functors F : 𝒞 → 𝒟 and G : 𝒟 → 𝒞. Then, the condition F ⊣ G is equivalent to each of the following two conditions:

1. For every 𝒞-object a, there is a 𝒞-arrow u : a → G(F(a)), the “universal arrow”, such that, for every 𝒟-object b and 𝒞-arrow f : a → G(b), there exists a 𝒟-arrow gb,f : F(a) → b with f = G(gb,f) ∘ u.

2. For every 𝒟-object b, there is a 𝒟-arrow v : F(G(b)) → b, the “couniversal arrow”, such that, for every 𝒞-object a and 𝒟-arrow g : F(a) → b, there is a 𝒞-arrow fa,g : a → G(b) with g = v ∘ F(fa,g).

Proof. See the proof of Propositions 6 and 7 in [12].



Object Constructions by Adjunction

We conclude this section by demonstrating that the previously described object constructions can also be considered as applications of functors that are determined as left or right adjoints, respectively, of certain basic functors.

Proposition 8 (Initial and Final Object by Adjunction). Let 1 be the “singleton” category with a single object ∗ (and consequently a single arrow id∗ : ∗ → ∗); this category is uniquely defined up to isomorphism. Let 𝒞 be a category with the constant functor C : 𝒞 → 1; this functor is also uniquely defined up to isomorphism. Then, the following holds:

• Let 𝒞-object 0 be initial and the “initial object functor” I0 : 1 → 𝒞 be defined by I0(∗) := 0 and I0(id∗) := id0. Then, we have I0 ⊣ C, i.e., the initial object functor is a left adjoint of the constant functor.
• Let 𝒞-object 1 be final and the “final object functor” F1 : 1 → 𝒞 be defined by F1(∗) := 1 and F1(id∗) := id1. Then, we have C ⊣ F1, i.e., the final object functor is a right adjoint of the constant functor.

Proof. For showing the first statement, we take the initial object 0 with initial object functor I0. We show I0 ⊣ C, i.e., that 𝒞(I0(∗), a) and 1(∗, C(a)) are isomorphic, for arbitrary 𝒞-object a. This follows from 1(∗, C(a)) = 1(∗, ∗), 𝒞(I0(∗), a) = 𝒞(0, a), and the fact that there exists exactly one 1-arrow id∗ : ∗ → ∗ and, since 0 is initial, exactly one 𝒞-arrow f : 0 → a.

For showing the second statement, we take the final object 1 with final object functor F1. We prove C ⊣ F1, i.e., that 1(C(a), ∗) and 𝒞(a, F1(∗)) are isomorphic, for arbitrary 𝒞-object a. This follows from 1(C(a), ∗) = 1(∗, ∗), 𝒞(a, F1(∗)) = 𝒞(a, 1), and the fact that there exists exactly one 1-arrow id∗ : ∗ → ∗ and, since 1 is final, exactly one 𝒞-arrow f : a → 1. □

Proposition 9 (Product and Coproduct by Adjunction). Let 𝒞 be a category. Let the “product category” 𝒞 × 𝒞 be the category whose objects (a, b) are pairs of 𝒞-objects a and b, whose arrows (f, g) : (a, c) → (b, d) are pairs of 𝒞-arrows f : a → b and g : c → d, where the identity arrows are pairs of identity arrows, and where composition is component-wise composition. Let the “diagonal functor” Δ : 𝒞 → 𝒞 × 𝒞 be defined by Δ(a) = (a, a) for every 𝒞-object a and Δ(f) = (f, f) for every 𝒞-arrow f : a → b. Then, the following holds:

• Assume that every pair of 𝒞-objects a and b has a product a × b and let the “product functor” P : 𝒞 × 𝒞 → 𝒞 be defined by P(a, b) := a × b on objects and P(f, g) := f × g on arrows (see Definition 18). Then, we have Δ ⊣ P, i.e., the product functor is a right adjoint of the diagonal functor.
• Assume that every pair of 𝒞-objects a and b has a coproduct a + b and let the “coproduct functor” C : 𝒞 × 𝒞 → 𝒞 be defined on objects by C(a, b) := a + b (and dually on arrows). Then, we have C ⊣ Δ, i.e., the coproduct functor is a left adjoint of the diagonal functor.

Proof. For showing the first statement, we take arbitrary category 𝒞 and functor P satisfying the stated assumption. We show Δ ⊣ P, i.e., that, for arbitrary 𝒞-objects p, a, b, the arrow classes (𝒞 × 𝒞)(Δ(p), (a, b)) and 𝒞(p, P(a, b)) are isomorphic. Since Δ(p) = (p, p) and P(a, b) = a × b, it suffices to find surjections s1 : (𝒞 × 𝒞)((p, p), (a, b)) → 𝒞(p, a × b) and s2 : 𝒞(p, a × b) → (𝒞 × 𝒞)((p, p), (a, b)).

First, we define s1(f, g) := ⟨f, g⟩ where ⟨f, g⟩ : p → a × b is the unique 𝒞-arrow given to us by Definition 17 with property f = π1 ∘ ⟨f, g⟩ and g = π2 ∘ ⟨f, g⟩. Now, we show that, for every 𝒞-arrow h : p → a × b, there exist some 𝒞-arrows f : p → a and g : p → b with s1(f, g) = h. We take f := π1 ∘ h and g := π2 ∘ h. Due to the uniqueness of ⟨f, g⟩, the equalities f = π1 ∘ h and g = π2 ∘ h imply h = ⟨f, g⟩ and thus s1(f, g) = h.

Second, we define s2(h) := (π1 ∘ h, π2 ∘ h). Now, we show that, for every (𝒞 × 𝒞)-arrow (f, g) : (p, p) → (a, b), i.e., for all 𝒞-arrows f : p → a and g : p → b, there exists some 𝒞-arrow h : p → a × b with s2(h) = (f, g), i.e., π1 ∘ h = f and π2 ∘ h = g. Definition 17 can be used to define h.

For showing the second statement, we take arbitrary category 𝒞 and functor C satisfying the stated assumption. We prove C ⊣ Δ, i.e., that, for arbitrary 𝒞-objects a, b, c, the arrow classes 𝒞(C(a, b), c) and (𝒞 × 𝒞)((a, b), Δ(c)) are isomorphic. Since C(a, b) = a + b and Δ(c) = (c, c), it suffices to find surjections s1 : 𝒞(a + b, c) → (𝒞 × 𝒞)((a, b), (c, c)) and s2 : (𝒞 × 𝒞)((a, b), (c, c)) → 𝒞(a + b, c).

First, we define s1(h) := (h ∘ ι1, h ∘ ι2). Now, we prove that for every (𝒞 × 𝒞)-arrow (f, g) : (a, b) → (c, c), i.e., for all 𝒞-arrows f : a → c and g : b → c, there exists some 𝒞-arrow h : a + b → c with s1(h) = (f, g), i.e., h ∘ ι1 = f and h ∘ ι2 = g. Definition 17 can be used to define h.

Second, we define s2(f, g) := [f, g] where [f, g] : a + b → c is the unique 𝒞-arrow given to us by Definition 17 with property f = [f, g] ∘ ι1 and g = [f, g] ∘ ι2. Now, we show that, for every 𝒞-arrow h : a + b → c, there exist some 𝒞-arrows f : a → c and g : b → c with s2(f, g) = h. We take f := h ∘ ι1 and g := h ∘ ι2. Due to the uniqueness of [f, g], the equalities f = h ∘ ι1 and g = h ∘ ι2 imply h = [f, g] and thus s2(f, g) = h. □
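In a category where there is at most one arrow between any two objects, the isomorphism of arrow classes required by Definition 21 reduces to a simple equivalence: an arrow F(a) → b exists if and only if an arrow a → G(b) exists. The Python sketch below uses this to check the two adjunctions of Proposition 9 in the category of subsets of a small set ordered by inclusion. The choice of category and all names (`is_adjunction`, `diagonal`, `meet`, `join`) are illustrative assumptions.

```python
from itertools import combinations

BASE = frozenset({0, 1, 2})   # illustrative base set (assumption)

def powerset(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

OBJECTS = powerset(BASE)                             # objects of the category
PAIRS = [(a, b) for a in OBJECTS for b in OBJECTS]   # objects of the product category

def leq(a, b):
    """An arrow a -> b exists iff a ⊆ b."""
    return a <= b

def leq_pair(p, q):
    """Arrows in the product category are component-wise."""
    return p[0] <= q[0] and p[1] <= q[1]

def diagonal(a):
    """Object map of the diagonal functor Δ."""
    return (a, a)

def meet(p):
    """Object map of the product functor (intersection in this category)."""
    return p[0] & p[1]

def join(p):
    """Object map of the coproduct functor (union in this category)."""
    return p[0] | p[1]

def is_adjunction(F, G, c_objects, d_objects, c_leq, d_leq):
    """Check F ⊣ G for F : C -> D and G : D -> C between order categories:
    an arrow F(a) -> b exists in D iff an arrow a -> G(b) exists in C."""
    return all(d_leq(F(a), b) == c_leq(a, G(b))
               for a in c_objects for b in d_objects)
```

With these definitions, `is_adjunction(diagonal, meet, OBJECTS, PAIRS, leq, leq_pair)` corresponds to Δ ⊣ P and `is_adjunction(join, diagonal, PAIRS, OBJECTS, leq_pair, leq)` to C ⊣ Δ; swapping the roles (for example, pairing the diagonal with the union on the right) fails, which shows that the direction of an adjunction matters.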

Proposition 10 (Exponential by Adjunction). Let 𝒞 be a category in which, for every pair of 𝒞-objects a and b, there exists a product object b × a and an exponential object b^a. For every 𝒞-object a, let the “(unary) product functor” Pa : 𝒞 → 𝒞 be defined on objects by Pa(b) := b × a and the “(unary) exponential functor” Ea : 𝒞 → 𝒞 be defined on objects by Ea(b) := b^a. Then, we have Pa ⊣ Ea, i.e., the exponential functor is a right adjoint of the product functor.

Proof. We take arbitrary category 𝒞, 𝒞-object a, and functors Pa and Ea satisfying the assumption. We show Pa ⊣ Ea, i.e., that, for arbitrary 𝒞-objects b, c, the arrow classes 𝒞(Pa(c), b) and 𝒞(c, Ea(b)) are isomorphic. Since Pa(c) = c × a and Ea(b) = b^a, it suffices to find surjections s1 : 𝒞(c × a, b) → 𝒞(c, b^a) and s2 : 𝒞(c, b^a) → 𝒞(c × a, b).

First, we define s1(f) := curryf. Now, we show that, for every 𝒞-arrow g : c → b^a, there exists some 𝒞-arrow f : c × a → b with s1(f) = g, i.e., curryf = g. We define f := evala,b ∘ (g × ida) and show curryf = g. From the definition of f, we know that the 𝒞-arrow g : c → b^a satisfies the equality f = evala,b ∘ (g × ida). However, Definition 19 implies that the only such 𝒞-arrow is curryf; thus, curryf = g.

Second, we define s2(g) := evala,b ∘ (g × ida). Now, we show that, for every 𝒞-arrow f : c × a → b, there exists some 𝒞-arrow g : c → b^a with s2(g) = f, i.e., evala,b ∘ (g × ida) = f. We define g := curryf, from which Definition 19 proves the goal. □

We are now ready to discuss the central aspects of categorical logic.

A CATEGORICAL SEMANTICS

Based on the concepts introduced in the previous sections, this section elaborates a categorical semantics of our relational version of first-order logic. We advise the reader to consult Figure 1 to grasp the overall framework and the relationship between its various categories and functors.

Figure 1. A Categorical Semantics of Relational First-Order Logic.

Syntactic Category and Formula Functors

We start by introducing the “syntactic category” as follows:

• The objects of this category are the formulas in the set For which was introduced in Definition 6.
• The arrow class A consists of all pairs ⟨F1, F2⟩ of formulas F1, F2 for which F1 ⊧ F2 holds, i.e., for which F2 is a logical consequence of F1, as described in Definition 10. The source object of such an arrow is F1, and its target object is F2. The existence of an arrow f : F1 → F2 thus indicates F1 ⊧ F2. The identity idF : F → F indicates the fact F ⊧ F.
• The composition ∘ denotes relational composition: for all arrows f : F1 → F2 and g : F2 → F3, the existence of the arrow (g ∘ f) : F1 → F3 indicates the transitivity of the relation ⊧.

For every variable x, we also consider that subcategory of the syntactic category whose objects are formulas whose semantics are independent of x (see Definition 4).

For reasons explained below, we will exclude from the syntactic category negations and equivalences, i.e., formulas of form (¬F) and (F1 ↔ F2). We may do so by considering them as the following syntactic shortcuts:

(¬F) := (F → ⊥)
(F1 ↔ F2) := (F1 → F2) ∧ (F2 → F1)

The validity of these shortcuts can be easily shown by proving the corresponding logical equivalences. Consequently, negations and equivalences need not be considered any more in the following, and their semantics need not be explicitly defined. For the other kinds of formulas, we introduce the following (families of) “formula functors” where 1 is the “singleton” category with a single object ∗ (see Proposition 8):

These functors map formulas to formulas, and logical consequences to logical consequences. The formula mappings are naturally defined as follows:


As for the mapping of consequences, we notice that all functors are covariant in their arguments. It is exactly for this reason that negation and equivalence (which do not allow covariance in their arguments) are not modeled as formula functors and that implication (which is only covariant in its second argument) is not modeled by a binary functor but by a family of unary functors. Thus, we have for all formulas F, F1, F2, G, G1, G2 and every variable x the following (easy to prove) properties:

Therefore, the object maps of these functors naturally induce the necessary logical consequences.

Semantic Category and Predicate Functors

Next, we introduce the “semantic category” as follows:

• The objects of this category are the predicates in the set Pred which was introduced in Definition 5 (thus, the objects of this category are relations, i.e., sets).
• The arrow class B consists of all pairs ⟨P1, P2⟩ of predicates P1, P2 for which P1 ⊆ P2 holds, i.e., for which P1 is a subset of P2. The source object of such an arrow is P1, and its target object is P2. The existence of an arrow f : P1 → P2 thus indicates P1 ⊆ P2. The identity idP : P → P indicates the fact P ⊆ P.
• The composition ∘ denotes relational composition: for all arrows f : P1 → P2 and g : P2 → P3, the existence of the arrow (g ∘ f) : P1 → P3 indicates the transitivity of the relation ⊆.

For every variable x, we also consider the subcategory of the semantic category whose objects are predicates that are independent of x (see Definition 4).

Corresponding to the various kinds of formula constructions, we will have the following “predicate functors” (respectively families of functors):

These functors map predicates to predicates and subset relations to subset relations (their detailed definitions will be given later). As we will see, these functors are covariant in their arguments, i.e., we have for all predicates P, P1, P2, Q, Q1, Q2 and every variable x the following properties:

Therefore, the object maps of these functors (defined by the respective predicate operations) naturally induce appropriate arrow maps (the corresponding subset relations).

The Semantic Functor

Now, we introduce the “semantic functor” from the syntactic category to the semantic category, defined as follows:

• For every object F of the syntactic category, i.e., formula F, 〚F〛 denotes the semantics of F as defined in Definition 7, which according to Proposition 3 is a predicate, i.e., indeed an object of the semantic category.
• For every arrow f : F1 → F2 of the syntactic category, i.e., every pair of formulas F1 and F2 with F1 ⊧ F2, we have the arrow 〚f〛 : 〚F1〛 → 〚F2〛 of the semantic category, i.e., the fact 〚F1〛 ⊆ 〚F2〛, which is a direct consequence of Definition 10, which introduces the ⊧ relation.

This semantic functor establishes the relationship between the previously introduced formula functors and predicate functors by the following identities on objects of the semantic category, i.e., predicate identities that will hold for all formulas F, F1, F2 and every variable x:

Categorical Semantics of First-Order Relational Logic

We are now going to elaborate in detail the semantic functors from which all of the above can be shown; this elaboration is inspired by and indeed directly derived from the well-known logical inference rules of first-order logic. The resulting definitions are based on the categorical notions introduced in Section 3, i.e., final and initial objects, products and coproducts, exponentials, and left and right adjoints, respectively. This gives us for every logical operation a “universal” definition of its semantics. Nevertheless, this semantics is also “constructive” in the sense that it is explicitly defined from well-known set-theoretic operations.

Logical Constants

The role of the logical constants in reasoning is exhibited by the following two “rules” which follow directly from Definition 8 (these rules are propositions that are valid for every formula F; they mimic the corresponding inference rules of first-order logic):

F ⊧ ⊤
⊥ ⊧ F

In other words, ⊤ is a logical consequence of every formula F, i.e., ⊤ is the “weakest” formula. Dually, every formula F is a logical consequence of ⊥, i.e., ⊥ is the “strongest” formula. This implies that true(∗) = ⊤ is the final object of the syntactic category and false(∗) = ⊥ is its initial one (see Definition 16). Then, Proposition 8 implies that functor true is the right adjoint of the constant functor while functor false is its left adjoint.

Correspondingly, TRUE(∗) is the final object of the semantic category (the “weakest” predicate, i.e., the predicate which is a superset of every predicate) and FALSE(∗) is its initial object (the “strongest” predicate, i.e., the predicate which is a subset of every predicate). By Proposition 8, functor TRUE is then the right adjoint of the constant functor while functor FALSE is its left adjoint.

Therefore, corresponding to the above rules for formulas, we have the following rules for every predicate P:

P ⊆ TRUE(∗)
FALSE(∗) ⊆ P

Since final and initial objects are unique, these rules actually represent implicit but unique definitions of TRUE(∗) and FALSE(∗) which can be explicitly written as

TRUE(∗) := ⋃{P | P ∈ Pred}
FALSE(∗) := ⋂{P | P ∈ Pred}

i.e., TRUE(∗) is the union of all predicates and FALSE(∗) is their intersection. Thus, we have derived alternative characterizations 〚⊤〛 = TRUE(∗) and 〚⊥〛 = FALSE(∗) that are both constructive and universal (Proposition 5 gives us 〚⊤〛 = Ass and 〚⊥〛 = ∅ from which it is easy to verify these equalities).

Conjunction and Disjunction

The role of conjunction in reasoning is exhibited by the following rules for arbitrary formulas F1, F2, F (the first two mimic the logical inference rules of “elimination”, and the last one mimics the inference rule of “introduction”):

(F1 ∧ F2) ⊧ F1
(F1 ∧ F2) ⊧ F2
if F ⊧ F1 and F ⊧ F2, then F ⊧ (F1 ∧ F2)

Dually, we have the following rules for disjunction:

F1 ⊧ (F1 ∨ F2)
F2 ⊧ (F1 ∨ F2)
if F1 ⊧ F and F2 ⊧ F, then (F1 ∨ F2) ⊧ F

These rules (whose soundness can be established with the help of Definition 8) state that (F1 ∧ F2) is the “weakest” formula F for which both (F ⊧ F1) and (F ⊧ F2) hold and that (F1 ∨ F2) is the “strongest” formula F for which both (F1 ⊧ F) and (F2 ⊧ F) hold. Thus, and(F1, F2) = (F1 ∧ F2) is the product of the objects F1 and F2 of the syntactic category and or(F1, F2) = (F1 ∨ F2) is their coproduct (see Definition 17). Furthermore, by Proposition 9, functor and is the right adjoint of the diagonal functor while functor or is its left adjoint.

Correspondingly, AND(P1, P2) is the product of the objects P1 and P2 of the semantic category (the “weakest” predicate P for which both (P ⊆ P1) and (P ⊆ P2) hold) and OR(P1, P2) is their coproduct (the “strongest” predicate P for which both (P1 ⊆ P) and (P2 ⊆ P) hold). By Proposition 9, functor AND is then the right adjoint of the diagonal functor while functor OR is its left adjoint.

Thus, we have, corresponding to the rules for formulas, the following rules for all predicates P1, P2, P:

AND(P1, P2) ⊆ P1
AND(P1, P2) ⊆ P2
if P ⊆ P1 and P ⊆ P2, then P ⊆ AND(P1, P2)

Dually, we have

P1 ⊆ OR(P1, P2)
P2 ⊆ OR(P1, P2)
if P1 ⊆ P and P2 ⊆ P, then OR(P1, P2) ⊆ P

Since products and coproducts are uniquely defined, these rules actually represent implicit but unique definitions of AND(P1, P2) and OR(P1, P2) which can be explicitly written as follows:

AND(P1, P2) := ⋃{P ∈ Pred | P ⊆ P1 and P ⊆ P2}
OR(P1, P2) := ⋂{P ∈ Pred | P1 ⊆ P and P2 ⊆ P}

This gives us alternative characterizations 〚F1 ∧ F2〛 = 〚F1〛 ∩ 〚F2〛 = AND(〚F1〛, 〚F2〛) and 〚F1 ∨ F2〛 = 〚F1〛 ∪ 〚F2〛 = OR(〚F1〛, 〚F2〛) that are both constructive and universal (Proposition 5 implies 〚F1 ∧ F2〛 = 〚F1〛 ∩ 〚F2〛 and 〚F1 ∨ F2〛 = 〚F1〛 ∪ 〚F2〛 from which it is not difficult to verify these equalities).
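The “weakest”/“strongest” characterizations of the predicate operations can be computed directly on a finite model. The following Python sketch reads “weakest” as the union of all candidate predicates and “strongest” as their intersection (the reading assumed here), and compares the results with plain intersection and union. The finite model and the helper names are assumptions; `TRUE`, `FALSE`, `AND`, and `OR` are named after the functors in the text but are otherwise only illustrative.

```python
from itertools import combinations, product

VARS = ("x", "y")   # finite stand-in for Var (assumption)
VALS = (0, 1)       # small value set Val (assumption)
ASS = frozenset(product(VALS, repeat=len(VARS)))  # assignment = (a(x), a(y))

def powerset(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

PREDS = powerset(ASS)   # with finitely many variables, every relation is a predicate

def big_union(family):
    """Union of a family of predicates (empty family gives the empty set)."""
    return frozenset().union(*family)

def big_intersection(family):
    """Intersection of a family of predicates (empty family gives ASS)."""
    return ASS.intersection(*family)

def TRUE():
    """Final object: the union of all predicates."""
    return big_union(PREDS)

def FALSE():
    """Initial object: the intersection of all predicates."""
    return big_intersection(PREDS)

def AND(p1, p2):
    """Product: the weakest predicate below both p1 and p2."""
    return big_union([p for p in PREDS if p <= p1 and p <= p2])

def OR(p1, p2):
    """Coproduct: the strongest predicate above both p1 and p2."""
    return big_intersection([p for p in PREDS if p1 <= p and p2 <= p])
```

On this model, `AND(p1, p2)` coincides with `p1 & p2` and `OR(p1, p2)` with `p1 | p2` for all 256 pairs of predicates, and `TRUE()` and `FALSE()` are the full and the empty set of assignments, respectively.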

Implication The role of implication in reasoning is exhibited by the following rules for arbitrary formulas F1, F2, F (the first rule mimics the logical inference rules of “implication elimination” or “modus ponens”, the last one mimics the inference rule of “implication introduction”):

These rules (whose soundness can be established with the help of Definition 8) state that (F1 → F2) is the “weakest” formula F for which (F ∧ F1 ⊧ F2) holds. Thus, is the exponential of the -objects F1 and F2 (see Definition 19). Proposition 10 then gives

, i.e., functor us conjunction functor

is the right adjoint of the unary with object map

. Correspondingly, is the product of the -objects P1 and P2 (the “weakest” predicate P for which (P ∩ P1 ⊆ P2) holds; Proposition

10 then gives us

, i.e., functor

is the right adjoint

with object map

of the unary functor

. Thus, corresponding to above rules for formulas, we have the following rules for all predicates P1, P2, P:

Since exponentials are uniquely defined, these rules represent an implicit but unique definition of IMP(P1, P2), which can be explicitly written as follows:

This gives us an alternative characterization 〚F1 → F2〛 = IMP(〚F1〛, 〚F2〛) that is both constructive and universal (Proposition 5 gives the set-theoretic form of 〚F1 → F2〛, from which it is possible to verify this equality).
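The exponential reading of implication can be illustrated in the same brute-force manner. The sketch below assumes the usual set-theoretic reading of implication (the complement of P1 united with P2) and a hypothetical four-element universe; the function names are invented for this illustration and are not taken from the RISCAL model.

```python
from itertools import chain, combinations


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def implication(p1, p2, universe):
    """Assumed set-theoretic reading: complement of P1 united with P2."""
    return (universe - p1) | p2


def weakest_with_modus_ponens(p1, p2, universe):
    """Largest P for which (P intersected with P1) is a subset of P2."""
    candidates = [p for p in powerset(universe) if (p & p1) <= p2]
    return max(candidates, key=len)


UNIVERSE = frozenset(range(4))
ALL_PREDICATES = powerset(UNIVERSE)

# Implicit definition (weakest P admitting modus ponens) vs. explicit form.
EXPONENTIAL_MATCHES = all(
    weakest_with_modus_ponens(a, b, UNIVERSE) == implication(a, b, UNIVERSE)
    for a in ALL_PREDICATES for b in ALL_PREDICATES)

# Adjunction: (P & P1 <= P2) holds exactly when (P <= IMP(P1, P2)) holds.
ADJUNCTION_HOLDS = all(
    ((p & a) <= b) == (p <= implication(a, b, UNIVERSE))
    for p in ALL_PREDICATES for a in ALL_PREDICATES for b in ALL_PREDICATES)
```

The second check is the order-theoretic form of the adjunction between "intersect with P1" and "imply from P1".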

Universal and Existential Quantification

The role of universal quantification in reasoning is exhibited by the following rules for arbitrary formulas F, G provided that the semantics 〚G〛 of G does not depend on x (see Definition 4):

The first rule mimics the logical inference rule of “universal elimination”, the second one mimics the inference rule of “universal introduction” (except that our version of first-order logic does not involve terms and variables and thus copes without variable substitutions). This pair of rules in a nutshell yields that (∀x. F) is the “weakest” formula G from which F is a logical consequence and whose semantics do not depend on x. Dually, we have for existential quantification the following pair of rules:

These rules state that (∃x. F) is the “strongest” formula G that is a logical consequence of F and whose semantics 〚G〛 does not depend on x.

We are now going to derive appropriate categorical characterizations of the corresponding universal and existential functors from the category of all formulas to the subcategory of all those formulas whose semantics do not depend on x. For this, we may notice that, from the above rules, the relations (G ⊧ F) and (F ⊧ G) involve two kinds of relations, a more general relation F that may depend on x and a more special relation G that is independent of x. In order to bring all relations to the “same level”, we introduce a syntactic “injection” functor Ix whose maps are just identities, i.e., Ix(G) = G and Ix(f : F → G) = f : F → G. This allows us to express the above rules as

and dually

Now, the first set of rules matches the assumptions of the second part of Proposition 7 (considering that the satisfaction relation ⊧ denotes the existence of an arrow in the respective categories); thus, the universal functor is a right adjoint of Ix. Likewise, the second set of rules matches the assumptions of the first part of that proposition; thus, the existential functor is a left adjoint of Ix. Summarizing, the universal functor is the right adjoint of the injection functor Ix while the existential functor is its left adjoint.

These categorical considerations can be easily transferred to characterizations of the corresponding functors FORALLx and EXISTSx from the category of all predicates to the subcategory of all those predicates that do not depend on x, with the semantic “injection” functor Jx whose maps are again just identities. We then have

and dually

Now, the first set of rules matches the assumptions of the second part of Proposition 7 (considering that the subset relation ⊆ denotes the existence of an arrow in the respective categories); thus, FORALLx is a right adjoint of Jx. Likewise, the second set of rules matches the assumptions of the first part of that proposition; thus, EXISTSx is a left adjoint of Jx. Summarizing, the universal functor FORALLx is the right adjoint of the injection functor Jx while the existential functor EXISTSx is its left adjoint.

The above rules say that FORALLx(P) is the weakest predicate Q that does not depend on x for which (Q ⊆ P) holds while EXISTSx(P) is the strongest predicate Q that does not depend on x for which (P ⊆ Q) holds. Since left and right adjoints are uniquely defined, these rules represent implicit but unique definitions of FORALLx(P) and EXISTSx(P) which can be explicitly written as follows:

Using Definition 5, this can also be written as follows:

Thus, we have derived alternative characterizations 〚∀x. F〛= FORALLx(〚F〛) and 〚∃x. F〛= EXISTSx(〚F〛) that are both constructive and universal. This is exactly the characterization whose correctness we have proved in Proposition 6.
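The adjoint characterization of the quantifiers can be tried out on a tiny domain. The sketch below makes several assumptions that are not taken from the paper's RISCAL model: assignments are tuples over two variables with values 0 and 1, predicates are sets of assignments, and the names `update`, `forall`, `exists` and `independent_of` are hypothetical.

```python
from itertools import chain, combinations, product

VALUES = (0, 1)   # assumed value domain
NUM_VARS = 2      # assumed number of variables x0, x1
ASSIGNMENTS = list(product(VALUES, repeat=NUM_VARS))
PREDICATES = [frozenset(c) for c in chain.from_iterable(
    combinations(ASSIGNMENTS, k) for k in range(len(ASSIGNMENTS) + 1))]


def update(a, x, v):
    """Assignment a with variable x set to value v."""
    return a[:x] + (v,) + a[x + 1:]


def forall(x, p):
    """Assignments that stay in p whatever value x takes."""
    return frozenset(a for a in ASSIGNMENTS
                     if all(update(a, x, v) in p for v in VALUES))


def exists(x, p):
    """Assignments that are in p for at least one value of x."""
    return frozenset(a for a in ASSIGNMENTS
                     if any(update(a, x, v) in p for v in VALUES))


def independent_of(x, q):
    """True if membership in q never depends on the value of x."""
    return all(update(a, x, v) in q for a in q for v in VALUES)


def check_adjunctions():
    """Check (Q <= P iff Q <= FORALL) and (P <= Q iff EXISTS <= Q)
    for every predicate P and every Q that is independent of x."""
    for x in range(NUM_VARS):
        free = [q for q in PREDICATES if independent_of(x, q)]
        for p in PREDICATES:
            for q in free:
                if (q <= p) != (q <= forall(x, p)):
                    return False
                if (p <= q) != (exists(x, p) <= q):
                    return False
    return True
```

Calling `check_adjunctions()` exhaustively confirms, for this toy domain, that the universal construction behaves as a right adjoint and the existential one as a left adjoint of the inclusion of x-independent predicates.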

AN IMPLEMENTATION OF THE CATEGORICAL SEMANTICS

In this section, we describe how the constructions that we have theoretically modeled in Section 2 can be actually implemented. For this purpose, we use RISCAL (RISCAL is developed at JKU, Linz, Austria, https://www3.risc.jku.at/research/formal/software/RISCAL/, see [13]), the RISC Algorithm Language [13,17], a specification language and an associated software system for modeling mathematical theories and algorithms based on first-order logic and set theory. The language is based on a type system where all types have finite sizes (specified by the user); this allows for fully automatically deciding formulas and verifying the correctness of algorithms for all possible inputs. To this end, the system translates every syntactic phrase into an executable form of its denotational semantics; the RISCAL model checker evaluates these semantics to determine the results of algorithms and the truth values of formulas such as the postconditions of algorithms. Since the domains of RISCAL models have (parameterized but) finite size, the validity of all theorems and the correctness of all algorithms can be fully automatically checked; the system has been mainly employed in educational scenarios [18,19]. Figure 2 gives a screenshot of the software with the RISCAL model that is going to be discussed below.

Figure 2. The RISCAL Software.

Figure 3 and Figure 4 list a RISCAL model of the categorical semantics over a domain of N + 1 variables (identified with the natural numbers 0, …, N) with M + 1 values, for arbitrary model parameters N, M ∈ ℕ0; all theorems over these domains are decidable and can be checked by RISCAL. The RISCAL definitions of domains, functions, and predicates closely correspond to those given in this paper; in particular, we have a domain Pred of predicates (since the number of variables is finite, by definition all relations are predicates) and predicate functions TRUE, FALSE, AND, OR, IMP, FORALL, EXISTS. Different from the categorical formulation, IMP is a binary function, not a family of unary functions; likewise, FORALL and EXISTS are binary functions whose first argument is a variable. Furthermore, we introduce functions NOT and EQUIV for the semantics of negation and equivalence and show by theorems Not and Equiv that they can be reduced to the other functions.

Figure 3. A RISCAL Model of the Categorical Semantics (Part 1).

Figure 4. A RISCAL Model of the Categorical Semantics (Part 2).

All other logical operations are first defined in their usual set-theoretic form. Subsequently, we describe their categorical semantics by a pair of theorems: the first theorem claims that the set-theoretic semantics is equivalent to an implicit definition of the categorical semantics while the second theorem claims equivalence to the corresponding constructive definition. Choosing small parameter values N = 2 and M = 1 (i.e., relations with variables x0, x1, x2 and values 0, 1), RISCAL can easily check the validity of all claims, as demonstrated by the following output:

RISC Algorithm Language 2.6.4 (10 December 2018)
http://www.risc.jku.at/research/formal/software/RISCAL
(C) 2016-, Research Institute for Symbolic Computation (RISC)
This is free software distributed under the terms of the GNU GPL.
Execute “RISCAL -h” to see the available command line options.
-----------------------------------------------------------------
Reading file /usr2/schreine/papers/CategoricalLogic2019/catlogic.txt
Using N=2.
Using M=1.
Computing the value of Ass...
Computing the value of TRUE...
Computing the value of FALSE...
Type checking and translation completed.
Executing True1().
Execution completed (3 ms).
Executing True2().
Execution completed (1 ms).
Executing False1().
Execution completed (0 ms).
Executing False2().
Execution completed (1 ms).
Executing And1(Set[Array[ℤ]],Set[Array[ℤ]]) with all 65536 inputs.
PARALLEL execution with 4 threads (output disabled).
...
Execution completed for ALL inputs (18373 ms, 65536 checked, 0 inadmissible).
Executing And2(Set[Array[ℤ]],Set[Array[ℤ]]) with all 65536 inputs.
PARALLEL execution with 4 threads (output disabled).
46273 inputs (36446 checked, 0 inadmissible, 0 ignored, 9827 open)...
Execution completed for ALL inputs (3576 ms, 65536 checked, 0 inadmissible).
Executing Or1(Set[Array[ℤ]],Set[Array[ℤ]]) with all 65536 inputs.
PARALLEL execution with 4 threads (output disabled).
...
Execution completed for ALL inputs (26889 ms, 65536 checked, 0 inadmissible).
Executing Or2(Set[Array[ℤ]],Set[Array[ℤ]]) with all 65536 inputs.
PARALLEL execution with 4 threads (output disabled).
42676 inputs (32887 checked, 0 inadmissible, 0 ignored, 9789 open)...
Execution completed for ALL inputs (3907 ms, 65536 checked, 0 inadmissible).
Executing Imp1(Set[Array[ℤ]],Set[Array[ℤ]]) with all 65536 inputs.
PARALLEL execution with 4 threads (output disabled).
...
Execution completed for ALL inputs (48592 ms, 65536 checked, 0 inadmissible).
Executing Imp2(Set[Array[ℤ]],Set[Array[ℤ]]) with all 65536 inputs.
PARALLEL execution with 4 threads (output disabled).
...
Execution completed for ALL inputs (9462 ms, 65536 checked, 0 inadmissible).
Executing Not(Set[Array[ℤ]]) with all 256 inputs.
PARALLEL execution with 4 threads (output disabled).
Execution completed for ALL inputs (28 ms, 256 checked, 0 inadmissible).
Executing Equiv(Set[Array[ℤ]],Set[Array[ℤ]]) with all 65536 inputs.
PARALLEL execution with 4 threads (output disabled).
Execution completed for ALL inputs (354 ms, 65536 checked, 0 inadmissible).
Executing Forall1(ℤ,Set[Array[ℤ]]) with all 768 inputs.
PARALLEL execution with 4 threads (output disabled).
Execution completed for ALL inputs (1315 ms, 768 checked, 0 inadmissible).
Executing Forall2(ℤ,Set[Array[ℤ]]) with all 768 inputs.
PARALLEL execution with 4 threads (output disabled).
Execution completed for ALL inputs (512 ms, 768 checked, 0 inadmissible).
Executing Exists1(ℤ,Set[Array[ℤ]]) with all 768 inputs.
PARALLEL execution with 4 threads (output disabled).
Execution completed for ALL inputs (1299 ms, 768 checked, 0 inadmissible).
Executing Exists2(ℤ,Set[Array[ℤ]]) with all 768 inputs.
PARALLEL execution with 4 threads (output disabled).
Execution completed for ALL inputs (461 ms, 768 checked, 0 inadmissible).

These values are, however, the largest ones with which model checking is realistically feasible; choosing, for example, N = 3 and M = 2 gives for checking theorem And1 about 4 × 10^9 possible inputs whose checking on a single processor core would take RISCAL more than two decades.
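The input counts reported in the output for N = 2 and M = 1 can be reproduced with a few lines of arithmetic. This Python sketch assumes that a predicate is a set of arrays of length N + 1 over M + 1 values, which is how the counts 256, 65536 and 768 in the log are read here; the function name `count_inputs` is hypothetical.

```python
def count_inputs(n, m):
    """Number of inputs for unary, binary and quantifier theorems,
    assuming a predicate is a set of arrays of length n + 1
    whose entries range over m + 1 values."""
    variables = n + 1
    values = m + 1
    assignments = values ** variables   # number of arrays
    predicates = 2 ** assignments       # number of sets of arrays
    return {
        "unary": predicates,                    # e.g., theorem Not
        "binary": predicates ** 2,              # e.g., And1, Or1, Imp1, Equiv
        "quantifier": variables * predicates,   # e.g., Forall1, Exists1
    }


COUNTS = count_inputs(2, 1)
```

For these parameters, `COUNTS` holds 256 unary, 65536 binary and 768 quantifier inputs, matching the figures printed by the model checker above. The doubly exponential growth of the number of predicates in N and M is what limits exhaustive checking to very small parameters.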

CONCLUSIONS

In this paper, we developed a novel categorical interpretation of the semantics of a relational (term-less) variant of first-order logic [20] with the goal of aiding the intuitive understanding of these formulas and of laying the seed for tools that illustrate the meaning of these formulas by the visualization of their semantics. The main advantage of this formulation is that, in comparison to previous approaches, it illustrates the various logical constructions (connectives and quantifiers) much more explicitly as categorical notions; this may provide an alternative route to teaching semantics for first-order logic. Furthermore, since this categorical semantics is constructive, we could directly implement (and thus validate) it in the RISCAL software.

We hope that this paper, by its self-contained nature and by focusing on the core principles of categorical logic rather than attempting an exhaustive treatment, contributes to the more widespread dissemination of categorical ideas to students and researchers of logic, its applications, and its automation; in particular, it may provide an alternative view on the semantics of first-order logic by complementing the classical formulation and thus help to gain deeper insights. It remains to be shown, however, whether and how this view can indeed be helpful and illuminating in educational scenarios, i.e., in courses on logic and its applications.

In previous works [21,22,23,24], we have strived to improve the understanding of the formal semantics of programming languages by developing corresponding tools with appropriate visualization techniques, partially also based on categorical principles. Other work of ours [25] has extended this work towards the visualization of the semantics of first-order formulas by pruned evaluation trees, however, based on the classical formulation. Our future work will investigate how the categorical principles outlined in this paper can be transferred to corresponding novel tools and visualization techniques for education in semantics and logic; for instance, in describing software component systems whose semantics are described both from the categorical and the logical side, or in elaborating a precise logical definition of contracts that have to be satisfied for a successful composition of components.

AUTHOR CONTRIBUTIONS

Conceptualization, W.S. (Wolfgang Schreiner), W.S. (William Steingartner) and V.N.; methodology, W.S. (Wolfgang Schreiner) and W.S. (William Steingartner); software, W.S. (Wolfgang Schreiner); validation, W.S. (Wolfgang Schreiner) and W.S. (William Steingartner); formal analysis, W.S. (Wolfgang Schreiner) and W.S. (William Steingartner); investigation, W.S. (Wolfgang Schreiner), W.S. (William Steingartner) and V.N.; resources, W.S. (Wolfgang Schreiner); data curation, W.S. (Wolfgang Schreiner); writing (original draft preparation), W.S. (Wolfgang Schreiner); writing (review and editing), W.S. (William Steingartner); visualization, W.S. (Wolfgang Schreiner); supervision, W.S. (Wolfgang Schreiner) and W.S. (William Steingartner); project administration, W.S. (Wolfgang Schreiner) and W.S. (William Steingartner); funding acquisition, W.S. (Wolfgang Schreiner) and W.S. (William Steingartner). All authors have read and agreed to the published version of the manuscript.

FUNDING

This work was supported by the project KEGA 011TUKE-4/2020: “A development of the new semantic technologies in educating of young IT experts”, also in the frame of the initiative project “Semantic Modeling of Component-Based Program Systems” under the bilateral program “Aktion Österreich–Slowakei, Wissenschafts- und Erziehungskooperation” and by the Johannes Kepler University Linz, Linz Institute of Technology (LIT), Project LOGTECHEDU “Logic Technology for Computer Science Education”.

REFERENCES

1. Tarski, A. The Semantic Conception of Truth and the Foundations of Semantics. Philos. Phenomenol. Res. 1944, 4, 341–376.
2. Schmidt, D.A. Denotational Semantics—A Methodology for Language Development; Allyn and Bacon: Boston, MA, USA, 1986.
3. Awodey, S. Category Theory, 2nd ed.; Oxford University Press: Oxford, UK, 2010.
4. Barr, M.; Wells, C. Category Theory for Computing Science; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1990.
5. Brandenburg, M. Einführung in die Kategorientheorie; Springer Spektrum: Berlin/Heidelberg, Germany, 2017. (In German)
6. Pierce, B.C. Basic Category Theory for Computer Scientists; MIT Press: Cambridge, MA, USA, 1991.
7. Spivak, D.I. Category Theory for the Sciences; MIT Press: Cambridge, MA, USA, 2014.
8. Lawvere, F.W. Adjointness in Foundations. Dialectica 1969, 23, 281–296. Available online: http://www.tac.mta.ca/tac/reprints/articles/16/tr16.pdf (accessed on 9 March 2020).
9. Jacobs, B. Categorical Logic and Type Theory; Studies in Logic and the Foundations of Mathematics; Elsevier: Amsterdam, The Netherlands, 1999; Volume 141.
10. Abramsky, S. Logic and Categories As Tools For Building Theories. J. Indian Counc. Philos. Res. Issue Log. Philos. Today 2010, 27, 277–304.
11. Poigné, A. Category Theory and Logic. In Proceedings of the Category Theory and Computer Programming: Tutorial and Workshop, Guildford, UK, 16–20 September 1985; Lecture Notes in Computer Science; Springer: Berlin, Germany; Volume 240, pp. 103–142.
12. Abramsky, S.; Tzevelekos, N. Introduction to Categories and Categorical Logic. In New Structures for Physics; Lecture Notes in Physics; Springer: Berlin, Germany, 2010; Volume 813, pp. 3–94.
13. RISCAL. The RISC Algorithm Language (RISCAL). Available online: https://www3.risc.jku.at/research/formal/software/RISCAL/ (accessed on 9 July 2020).
14. Schreiner, W.; Reichl, F.X. Mathematical Model Checking Based on Semantics and SMT. Trans. Internet Res. 2020, 16, 4–13.
15. Semtech. Semantic Technologies for Computer Science Education. Available online: https://www3.risc.jku.at/projects/SemTech/ (accessed on 15 July 2020).
16. Nielson, H.R.; Nielson, F. Semantics with Applications: An Appetizer; Undergraduate Topics in Computer Science; Springer: London, UK, 2007.
17. Schreiner, W. The RISC Algorithm Language (RISCAL)—Tutorial and Reference Manual (Version 1.0); Technical Report; RISC, Johannes Kepler University: Linz, Austria, 2017.
18. Schreiner, W. Validating Mathematical Theories and Algorithms with RISCAL. In Proceedings of the 11th Conference on Intelligent Computer Mathematics (CICM 2018), Hagenberg, Austria, 13–17 August 2018; Rabe, F., Farmer, W., Passmore, G., Youssef, A., Eds.; Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence; Springer: Berlin, Germany, 2018; Volume 11006, pp. 248–254.
19. Schreiner, W.; Brunhuemer, A.; Fürst, C. Teaching the Formalization of Mathematical Theories and Algorithms via the Automatic Checking of Finite Models. In Proceedings of the Post-Proceedings ThEdu’17, Theorem Proving Components for Educational Software, Gothenburg, Sweden, 6 August 2017.
20. Schreiner, W.; Novitzká, V.; Steingartner, W. A Categorical Semantics of Relational First-Order Logic; Technical Report; RISC, Johannes Kepler University: Linz, Austria, 2019.
21. Schreiner, W.; Steingartner, W. Visualizing Execution Traces in RISCAL; Technical Report; RISC, Johannes Kepler University: Linz, Austria, 2018.
22. Steingartner, W.; Eldojali, M.A.M.; Radaković, D.; Dostál, J. Software support for course in Semantics of programming languages. In Proceedings of the IEEE 14th International Scientific Conference on Informatics, Poprad, Slovakia, 14–16 November 2017; pp. 359–364.
23. Steingartner, W.; Novitzká, V. Categorical Semantics of Programming Languages. In Selected Topics in Contemporary Mathematical Modeling; Monographs; Czestochowa University of Technology: Czestochowa, Poland, 2017; Volume 331, Chapter 11; pp. 167–192.
24. Steingartner, W.; Novitzká, V. Learning tools in course on semantics of programming languages. In Proceedings of the MMFT 2017—Mathematical Modelling in Physics and Engineering, Poraj, Poland, 18–21 September 2017; pp. 137–142.
25. Schreiner, W.; Steingartner, W. Visualizing Logic Formula Evaluation in RISCAL; Technical Report; RISC, Johannes Kepler University: Linz, Austria, 2018.

Chapter 3

INFINITARY CLASSICAL LOGIC: RECURSIVE EQUATIONS AND INTERACTIVE SEMANTICS

Michele Basaldella
Université d’Aix–Marseille, CNRS, I2M, Marseille, France

In this paper, we present an interactive semantics for derivations in an infinitary extension of classical logic. The formulas of our language are possibly infinitary trees labeled by propositional variables and logical connectives. We show that in our setting every recursive formula equation has a unique solution. As for derivations, we use an infinitary variant of Tait–calculus to derive sequents. The interactive semantics for derivations that we introduce in this article is presented as a debate (interaction tree) between a test T (derivation candidate, Proponent) and an environment ¬S (negation of a sequent, Opponent). We show a completeness theorem for derivations that we call interactive completeness theorem: the interaction between T (test) and ¬S (environment) does not produce errors (i.e., Proponent wins) just in case T comes from a syntactical derivation of S.

Citation (APA): Basaldella, M. (2014). Infinitary Classical Logic: Recursive Equations and Interactive Semantics. In Paulo Oliva (Ed.): Classical Logic and Computation 2014 (15 pages). Copyright: © Creative Commons Attribution 3.0 Unported (https://creativecommons.org/licenses/by/3.0/).

INTRODUCTION

In this article, we present an interactive semantics for derivations (i.e., formal proofs) in a proof system that we call infinitary classical logic.

Infinitary classical logic. The system we consider is an infinitary extension of Tait–calculus [8], a sequent calculus for classical logic which is often used to analyze the proof theory of classical arithmetic and its fragments. In Tait–calculus, formulas are built from positive and negated propositional variables by using disjunctions ∨ and conjunctions ∧ of arbitrary (possibly infinite) arity. Negation ¬ is defined by using a generalized form of De Morgan’s laws. As for derivations, sequents of formulas are derived by means of rules of inference with a possibly infinite number of premises. In Tait–calculus, formulas and derivations, when seen as trees, are well–founded, although not necessarily finitely branching. In this work, we remove the assumption of well–foundedness, and we let formulas and derivations be infinitary in a broader sense. What we get is the system that we call infinitary classical logic.

Recursive equations. The main reason for introducing infinitary classical logic is our interest in studying recursive (formula) equations in a classical context. Roughly speaking, by recursive equation we mean a pair of formulas (v, F), that we write as v = F, where v is an atom and F is a formula which depends on v, i.e., such that v occurs in F. For instance, v = ¬v ∨ v is a recursive equation. A solution of a recursive equation, say v = ¬v ∨ v, is any formula G which is equal to ¬G ∨ G. Solutions of recursive equations are often called recursive types, and they have been studied extensively in the literature (see e.g., [2, 7] and the references therein). In this area, one usually aims at finding a mathematical space in which the (interpretations of) equations have a unique solution (or, at least, a canonical one). In this paper, we define formulas as (possibly infinitary) labeled trees, and we prove existence and uniqueness of solutions of equations within the “space” of the formulas. As it turns out, if G is the solution of a recursive equation (i.e., a recursive type), then G is not well–founded. This fact motivates us to consider infinitary formulas in our broader sense.
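To see why a solution of an equation such as "G equals ¬G ∨ G" cannot be well–founded, one can compute the label of the solution at any position on demand. The Python sketch below is an illustration under assumptions that go beyond what is stated so far: negation is taken to swap ∨ and ∧ and to negate each immediate subformula (a De Morgan style definition), double negation is taken to be the identity, and positions are modeled as tuples of child indices. The function name is hypothetical.

```python
def label_of_solution(position):
    """Label at `position` of the formula G with G = (not G) or G.

    Assumptions of this sketch: child 0 of G is (not G), child 1 is G;
    (not G) = G and (not G), so child 0 of (not G) is G and child 1 is (not G).
    Hence moving to child 0 toggles the polarity and moving to child 1 keeps it.
    Returns None for positions outside the domain of the tree.
    """
    negated = False
    for child in position:
        if child not in (0, 1):
            return None          # G and (not G) have exactly two children
        if child == 0:
            negated = not negated
    return "∧" if negated else "∨"


# Every tuple over {0, 1} is in the domain, so the tree has infinite branches.
SAMPLE_LABELS = [label_of_solution(p) for p in [(), (0,), (0, 0), (1, 0, 1)]]
```

Since every finite sequence of zeros and ones receives a label, the tree has branches of unbounded length, which is the ill–foundedness mentioned above.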

Derivations and tests. Since formulas are infinitary, we let derivations be infinitary as well. We obtain a cut–free sequent calculus in which the solutions of all recursive equations are derivable (in the sense of Theorem 3.16(2)). As expected, since we deal with ill–founded (i.e., non–well–founded) derivations, the price to pay for this huge amount of expressivity is that our calculus is inconsistent (in the sense of Remark 3.17(c)). In spite of this, it is precisely this notion of infinitary derivation that we want to study in this work. To this aim, we introduce a semantics for derivations, as we now explain.

Traditionally, the proof theory of classical logic is centered around the notion of derivability. In this paper, we are interested in analyzing the structure of our infinitary derivations. To this aim, we introduce the notion of test. A test is a tree labeled by logical rules (no sequents), and the fundamental relation between tests and derivations can be informally stated as follows: given a test T and a sequent S, if it is the case that by adding sequent information in an appropriate way to T we obtain a derivation of S, then we say that T comes from a derivation of S. In this article, the syntactical concept that we investigate is “T comes from a derivation of S” rather than the traditional one “S is derivable.” Also, note that our concept is stronger than the usual one: if T comes from a derivation of S, then S is obviously derivable! To grasp the idea behind our concept from another viewpoint, consider the lambda calculus. By the Curry–Howard correspondence, untyped lambda terms can be seen as “tests” for natural deduction derivations, and “T comes from a derivation of S” can be read as “the untyped lambda term T has (simple) type S in the Curry–style type assignment.”

Interactive semantics and completeness. Traditionally, in order to study the concept of derivability, one introduces a notion of model and eventually shows a completeness theorem: the syntactical notion of derivability and the semantical notion of validity (as usual, valid means “true in every model”) coincide. Here, we are interested in proving a completeness theorem as well. But, since we replace the syntactical concept of derivability with “T comes from a derivation of S”, we need to replace the semantical notion of validity with something else. So, we now let our interactive semantics enter the stage.

In a few words, our interactive semantics is organized as follows: first, we introduce the notion of environment ¬S (the negation of a sequent S) and then, we make the test T and the environment ¬S interact. More precisely, we introduce the notion of configuration (a pair of the form (T, ¬S)) and define a procedure which makes configurations evolve from the initial configuration (T, ¬S) by means of a transition relation. As a result, we get a tree of configurations that we call an interaction tree. The procedure which determines the interaction tree is interactive in the sense that it can be seen as a debate between two players: Proponent (the test T) and Opponent (the environment ¬S). Opponent wins the debate if the interaction between T and ¬S produces an error, i.e., a position in the interaction tree generated from (T, ¬S) which is labeled by an error symbol. Otherwise, Proponent wins the debate. Our main result is the interactive completeness theorem:

(*) T comes from a derivation of S if and only if the interaction between T and ¬S does not produce errors.

That is, T comes from a derivation of S if and only if Proponent wins the debate.

The motivation for introducing our semantics comes from our interest in extending the completeness theorem of ludics [5] to logics which are not necessarily polarized fragments of linear logic. The completeness theorem of [1] (called interactive completeness also there) can be stated, up to terminology and notation, in our setting as

(**) T comes from a derivation of S if and only if, for every M ∈ ¬S, the interaction between T and M does not produce errors.

The crucial point is, of course, the RHS of (**). Here, S represents a formula of a polarized fragment of linear logic, and T and M are designs (proof–like objects similar to our tests). In [1], formulas are interpreted as sets of designs such that ¬¬S = S, where ¬S is the set of designs given by: M ∈ ¬S just in case for every T ∈ S, the interaction between T and M does not produce errors (so, the RHS of (**) means T ∈ ¬¬S = S). In [1], the result of the interaction between T and M is determined by a procedure of reduction for designs which reflects the procedure of cut–elimination of the underlying logic.

Of course, it would be very nice if we could use the same approach in our setting! Unfortunately, if one tries to remove the polarities from ludics, then one encounters several technical problems related to cut–elimination (that we do not discuss here). To the present author, the most convenient way to cope with non–polarized logics is to build a new framework from the very beginning, keeping the format of the statement (**) as guiding principle, the rest being sacrificed if necessary. The choice here is purely personal: the present author believes that the significance of ludics is ultimately justified by the completeness theorem, not, e.g., by the fact that the interpretation of the logical formulas is induced by a procedure of reduction for designs. This describes the origin of this work. Finally, we note that the RHS of (*) is just the RHS of (**) in case ¬S is a singleton (and so, it can be identified with ¬S itself). Indeed, this is the case for the interactive semantics presented in this paper.

As for future work, we plan to adapt our interactive semantics to analyze derivations in second–order propositional classical logic.

Outline. This paper is organized as follows. In Section 2 we recall some preliminary notions about labeled trees. In Section 3 we introduce formulas and recursive formula equations, and we prove the existence and the uniqueness of solutions of equations. We also define the notion of derivation and discuss the derivability of some sequents. In Section 4 we present our interactive semantics and prove the interactive completeness theorem.

PRELIMINARIES: POSITIONS AND LABELED TREES

In this section, we recall the basic notions of position and labeled tree. We also establish some notation and terminology that we extensively use in the sequel.

Definition 2.1 (Position, length). Let ℕ be the set of the natural numbers. Let ∞ be any object such that ∞ ∉ ℕ. We call position any function p : ℕ → ℕ ∪ {∞} which satisfies the following property: there exists n ∈ ℕ such that p(i) ∈ ℕ for every i < n and p(i) = ∞ for every i ≥ n. Since ∞ ∉ ℕ, the natural number n above is unique. We call it the length of the position and denote it by lg(p). We use p, q, r, ... to range over the set of all positions.

Notation and Terminology 2.2. Let p, q and r be positions, and let U, V and W be sets of positions.

(1) A position p is also written as the finite sequence of its values p(0), ..., p(lg(p) − 1). In particular, there is a unique position of length 0, which we call the empty position.

(2) The concatenation of p and q is the position p * q defined as follows:

In other words, p * q lists the values of p followed by the values of q. The operation of concatenation is associative (i.e., (p * q) * r = p * (q * r)) and it has the empty position as neutral element. Note also that lg(p * q) = lg(p) + lg(q).

(3) If q = p * t for some position t, we say that q is an extension of p and that p is a restriction of q. If lg(t) ≥ 1 (resp. lg(t) = 1), then we also say that q is a proper (resp. immediate) extension of p and that p is a proper (resp. immediate) restriction of q.

(4) We write … for the set …. Note also that … for every family of sets of positions.

Definition 2.3 (Labeled tree, domain). Let L be a set. Let ∞L be any object such that ∞L ∉ L. A tree labeled by L is a function T from the set of positions to L ∪ {∞L} which satisfies the following properties:

We call the set of all positions p such that T(p) ∈ L the domain of T and we denote it as dom(T).

Let T and U be trees labeled by L. Since T(q) = ∞L for all q ∉ dom(T), the tree T is completely determined by the values it takes on its domain. In particular, T = U if and only if dom(T) = dom(U) and T(p) = U(p) for all p ∈ dom(T).
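The notions of position, length, concatenation, domain and equality of labeled trees can be mirrored in a few lines of Python. This is only a sketch for finite domains under assumptions made here: positions are modeled as tuples of natural numbers (dropping the trailing ∞ values), `None` plays the role of the extra object ∞L, and the class and function names are hypothetical.

```python
INFINITY_LABEL = None  # stands for the extra object outside the label set L


def lg(p):
    """Length of a position (a tuple of natural numbers in this sketch)."""
    return len(p)


def concat(p, q):
    """Concatenation p * q: the values of p followed by the values of q."""
    return tuple(p) + tuple(q)


class LabeledTree:
    """A tree with finite domain, given by its labels on the domain."""

    def __init__(self, labels):
        self.labels = {tuple(p): label for p, label in labels.items()}

    def __call__(self, p):
        # Outside the domain, the tree takes the extra value.
        return self.labels.get(tuple(p), INFINITY_LABEL)

    def dom(self):
        return set(self.labels)

    def __eq__(self, other):
        # Trees are equal iff they have the same domain and agree on it.
        if not isinstance(other, LabeledTree):
            return NotImplemented
        return (self.dom() == other.dom()
                and all(self(p) == other(p) for p in self.dom()))


EXAMPLE = LabeledTree({(): "∨", (0,): "a", (1,): "b"})
```

Representing a tree by a dictionary on its domain reflects the remark that a tree is completely determined by the values it takes on its domain; infinite trees would need a lazily evaluated function instead.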

Notation and Terminology 2.4. Let T and U be trees labeled by L, and let p ∈ dom(T).

(1) We say that p is a leaf of T if there is no q ∈ dom(T) such that q is a proper extension of p.

(2) The subtree of T above p is the tree Tp labeled by L defined by Tp(q) = T(p * q), for every position q. Note that taking the subtree of Tp above q gives the subtree of T above p * q, for every p and q such that p * q ∈ dom(T). We say that U is a subtree of T if U = Tq, for some q ∈ dom(T). If lg(q) = 1, then we also say that U is an immediate subtree of T.

(3) If L is a product, i.e., L = A×B for some sets A and B, then we write TL(p) and TR(p) for the left and the right component of T(p) respectively (i.e., if T(p) = (a,b), then TL(p) = a and TR(p) = b).

(4) We say that T is ill–founded if there exists a function f : ℕ → dom(T) such that f(n + 1) is a proper extension of f(n), for every n ∈ ℕ. We also say that T is well–founded if it is not ill–founded.
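Leaves and subtrees can be sketched in the same style. The code below assumes, for illustration only, that a tree with finite domain is given as a dictionary from positions (tuples) to labels and that the subtree above p is obtained by shifting positions, i.e., the label of the subtree at q is the label of the tree at p followed by q; the function names are hypothetical.

```python
def is_leaf(tree, p):
    """True if no position in the domain is a proper extension of p."""
    return not any(len(q) > len(p) and q[:len(p)] == p for q in tree)


def subtree(tree, p):
    """Subtree above p: its label at q is the label of `tree` at p followed by q."""
    return {q[len(p):]: label for q, label in tree.items() if q[:len(p)] == p}


# A small example tree, given by its labels on its (finite) domain.
TREE = {
    (): "∨",
    (0,): "∧",
    (0, 0): "a",
    (0, 1): "b",
    (1,): "c",
}

# Taking subtrees step by step agrees with taking the subtree in one step.
STEPWISE = subtree(subtree(TREE, (0,)), (1,))
DIRECT = subtree(TREE, (0, 1))
```

Here `STEPWISE` and `DIRECT` are the same one-node tree, which illustrates the compatibility of subtrees with concatenation of positions. Ill–foundedness cannot arise in this finite representation, since it requires an infinite chain of proper extensions inside the domain.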

INFINITARY CLASSICAL LOGIC

In this section, we present our infinitary version of classical logic. In Subsection 3.1 we introduce formulas as possibly infinitary labeled trees, and in Subsection 3.2 we introduce the notion of recursive formula equation and prove that solutions of equations exist and are unique. Finally, in Subsection 3.3 we introduce the concept of derivation and observe some basic properties.

Formulas

We now define the concept of formula and the operation of negation.

Definition 3.1 (Propositional variable). Let + and − be two distinct symbols. Let 𝒱 be a set of pairs of the form (+, v), and let ¬𝒱 be the corresponding set of pairs of the form (−, v). We call the elements of 𝒱 (resp. ¬𝒱) positive (resp. negated) propositional variables, and we denote a generic element (+, v) of 𝒱 (resp. (−, v) of ¬𝒱) by v (resp. ¬v).

Definition 3.2 (Formula). Let ∨ and ∧ be two distinct symbols such that . We call formula any tree T labeled by which satisfies the following property:

(F) for every p ∈ dom(T), if T(p) is a positive or negated propositional variable, then p is a leaf of T.

In the sequel, we use the letters F, G, H, ... to range over formulas.

Notation and Terminology 3.3. Let F be a formula.

(1) Since for every p ∈ dom(F), the subtree Fp is a formula, we say that Fp is a subformula of F and that Fp occurs in F. If lg(p) = 1 then we also say that Fp is an immediate subformula of F.

(2) If the label of F at the empty position is a propositional variable, then we say that F is an atom. If this label is v (resp. ¬v), then we abusively denote F by v (resp. ¬v) and we say that F is a positive (resp. negated) atom.

(3) If F is not an atom, then we call F a compound formula. The set I of natural numbers which index the positions of length 1 in dom(F) is said to be the arity of F. For each i ∈ I, let Fi be the subformula of F above the position of length 1 whose only entry is i. Since for every i ∈ I the formula Fi is an immediate subformula of F, we also denote F by ∨i∈I Fi (resp. ∧i∈I Fi) and we say that F is a disjunctive (resp. conjunctive) formula. If I = {0, ..., n − 1} for some natural number n, then we also write ∨[F0, ..., Fn−1] (resp. ∧[F0, ..., Fn−1]) for F. Obviously, if we write something like “let F be ∨[G0, ..., Gn−1] (resp. ∧[G0, ..., Gn−1])”, then we mean that F is a disjunctive (resp. conjunctive) formula of arity {0, ..., n−1} and that Fk = Gk, for each k < n. Similarly, we may use the expression G0 ∨ G1 (resp. G0 ∧ G1) to denote the disjunctive (resp. conjunctive) formula F whose arity is {0, 1} and such that F0 = G0 and F1 = G1, and so on. Finally, if I = ∅, then we write f (for false) and t (for true) rather than ∨[ ] and ∧[ ] respectively.

Definition 3.4 (Negation). Let F be a formula. The formula ¬F, that we call the negation of F, is defined as follows: the domain of ¬F is dom(F), and the label of ¬F at each position p is the dual of the label of F at p, where dual labels are obtained by exchanging v with ¬v and ∨ with ∧.

We observe that the negation is involutive, i.e., ¬¬F = F, for every formula F. Furthermore, ¬(∨i∈I Fi) = ∧i∈I ¬Fi and ¬(∧i∈I Fi) = ∨i∈I ¬Fi.

According to Definition 3.2, formulas are allowed to be infinitary: they may have an infinite set as arity, and they can also be ill–founded. In Tait’s work [8], the situation is rather different: the arity of a compound formula need not be a subset of N, and only well–founded formulas are considered. Let us discuss our choices. As for the first difference, we restrict attention to sets of natural numbers mainly for expository reasons; a pleasant consequence of this choice is that we can define sequents as formulas (Definition 3.11) rather than finite sets (as in [8]), multi–sets, or sequences of formulas. As for the second one, we have to consider ill–founded formulas because solutions of recursive equations are formulas which always have this property (see Theorem 3.9). We finally observe that the principles of (transfinite) induction and recursion cannot be applied to ill–founded formulas. In particular, even though we are in a classical world, it is not possible to give our formulas a Tarskian definition of truth. Nevertheless, in our setting it is possible to define a reasonable notion of derivation, as we do in Subsection 3.3.

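Recursive Formula Equations

In this subsection, we define the concepts of recursive equation and solution of an equation. We prove that every recursive equation has a unique solution. We also give some concrete examples.

Definition 3.5 (Substitution). Let F and G be formulas, and let v be a positive atom. We define the formula F[G/v] obtained by the substitution of G for v and of ¬G for ¬v in F as follows. Let R := {r ∈ dom(F) : F(r) ∈ {v, ¬v}} and S := dom(F) \ R. We set dom(F[G/v]) := S ∪ (R * dom(G)) and

$$F[G/v](p) := \begin{cases} F(p) & \text{if } p \in S, \\ G(q) & \text{if } p = r * q \text{ with } r \in R \text{ and } F(r) = v, \\ (\neg G)(q) & \text{if } p = r * q \text{ with } r \in R \text{ and } F(r) = \neg v. \end{cases}$$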


The correctness of the previous definition is justified by the following lemma.

Lemma 3.6. Let R and S be as in Definition 3.5.

(a) For every r ∈ R and every position q, r * q ∉ S.

(b) Let V be a set of positions, and let p ∈ R * V. Suppose that there are r, r’ in R and q, q’ in V such that p = r * q and p = r’ * q’. Then, r = r’ and q = q’.

Proof. (a) If r * q ∈ dom(F), then q is the empty position, as r is a leaf of F. Hence, r * q = r ∈ R, and so r * q ∉ S.

(b) Since both r and r’ are restrictions of p, one of them is a restriction of the other. Since r and r’ are leaves of F, we conclude r = r’ (and hence q = q’).

The following proposition easily follows from our definition of substitution.

Proposition 3.7. Let F and G be formulas, and let v be a positive atom. Then:

(1) v[G/v] = G and (¬v)[G/v] = ¬G;
(2) (∨i∈I Fi)[G/v] = ∨i∈I (Fi[G/v]) and (∧i∈I Fi)[G/v] = ∧i∈I (Fi[G/v]);
(3) F[G/v] = F, if neither v nor ¬v is a subformula of F;
(4) ¬(F[G/v]) = (¬F)[G/v].

Definition 3.8 (Recursive formula equation, solution). A recursive formula equation (or recursive equation, or just equation) is an ordered pair of formulas (v,F), that we write as v = F, such that:

(R1) v is a positive atom;
(R2) F is a compound formula;
(R3) F(p) ∈ {v,¬v}, for some p ∈ dom(F).

A solution of v = F is a formula G such that G = F[G/v].

We now discuss the previous definition. To begin with, note that by (R2), a pair of atoms such as (v,a) is not a recursive equation. The reason to exclude such pairs is to avoid considering (v,v), which would be a trivial equation as every formula would be a solution, and (v,¬v), which would have no solution at all. Up to now, our choices are perfectly in line with [3, 2]. In our setting, we also impose the additional condition (R3). The aim of this clause is to exclude pairs of the form (v,F) where neither v nor ¬v actually occurs in F. The reason is that such pairs would be trivial equations as well: by Proposition 3.7(3), F itself would be the unique solution, as we have F[G/v] = F for every formula G.

We now turn attention to the literature on recursive types. In this topic, the theorem which states existence and uniqueness of solutions of recursive equations is perhaps the most important result. One usual way to prove it is to show that the mathematical space (in our case, the set of formulas) forms a complete metric space with respect to some metric. Then, the result follows by applying Banach’s fixpoint theorem, using the fact that the operation of substitution induces a contractive map from the space to itself [3, 2]. Another traditional way to prove that result requires introducing in the space some notion of approximation. One then shows that suitable sequences of “lower” (resp. “upper”) approximations converge to a “lower” (resp. “upper”) solution of the equation. To prove the result, one eventually proves that the two solutions coincide (see e.g., [7, 1] and the references therein). By contrast, in our setting we are able to prove the result in a direct and elementary way; our method does not require explicitly introducing any sort of metric or notion of approximation. We do not claim that our result is new, as there are several similar results in the literature. However, we believe that our proof deserves some attention, as it is quite simple and self–contained.

Theorem 3.9 (Existence and uniqueness of solutions). Every recursive equation v = F has a unique solution G. Furthermore, G is ill–founded.

Proof. Let R and S be as in Definition 3.5. First, we observe that by condition (R2) of Definition 3.8, the empty position belongs to S. In particular, it does not belong to R. Furthermore, by condition (R3), R is non–empty. We now remark that by definition of substitution, if H is a solution of the equation v = F, then the set of positions dom(H) has to satisfy dom(H) = S ∪ (R * dom(H)). So, we are interested in studying those sets of positions X such that X = S ∪ (R * X). To this aim, we define

$$A_0 := S, \qquad A_{n+1} := R * A_n, \qquad A := \bigcup_{n \in \mathbb{N}} A_n.$$

The key property, that we show in a moment, is that A is the unique set such that A = S ∪ (R * A). This fact is known as Arden’s Rule in the literature of formal language theory (see e.g., [6]). We now prove this fact in our setting (our proof is adapted from the one given in [6]).

(i) Arden’s Rule: X = S ∪ (R * X) if and only if X = A.

Proof of (i). If X = A, then S ∪ (R * A) is the union of A0 and of all the sets R * An = An+1, that is, A. To show the converse, let X be such that X = S ∪ (R * X). By induction on n, we show that An ⊆ X for every n, so that A ⊆ X. Finally, if p ∈ X, then p ∈ An for some n ≤ lg(p); this follows by induction on lg(p), as every position in R has length at least 1. Hence, X ⊆ A.

By (i), the domain of any solution of the recursive equation has to be A. However, this fact does not give us any hint about how to define the values of a solution G (i.e., G(p), for p ∈ A). To tackle this problem, we now show the following properties.

(ii) (a) For every n, each p ∈ An can be factorized in a unique way as p = r0 * ⋯ * rn−1 * s, with rk ∈ R for every k < n and s ∈ S.

(b) For every n and every m, if n ≠ m then An and Am are disjoint.

Proof of (ii). (a) By induction on n. If n = 0, then p ∈ A0 = S. By Lemma 3.6(a), for no r ∈ R and no position q we have p = r * q. So, we set s := p. If n = m+1, then p ∈ R * Am. By Lemma 3.6(b), there is a unique r ∈ R and a unique q ∈ Am such that p = r * q. By inductive hypothesis, q can be written in a unique way as q = r1 * ⋯ * rn−1 * s, with rk ∈ R for every 1 ≤ k < n and s ∈ S. Hence, p can be factorized as p = r0 * r1 * ⋯ * rn−1 * s, where r0 := r.

(b) Suppose, for a contradiction, that for some p ∈ A there are n and m such that n < m, p ∈ An and p ∈ Am. Let n be the least natural number for which this fact holds. By Lemma 3.6(a), we have n > 0, as for no r ∈ R and no position t it is the case that p ∈ A0 = S and p = r * t. Suppose now that n > 0, and let p = r * t = r’ * t’ with r, r’ in R, t ∈ An−1 and t’ ∈ Am−1. Since both factorizations are of the same position p, we can apply Lemma 3.6(b) and we obtain r = r’ and t = t’. But then, we have t ∈ An−1 and t ∈ Am−1. This contradicts the minimality of n.

By (ii), it follows that each p ∈ A can be uniquely factorized as

$$p = r^p_0 * \cdots * r^p_{n_p - 1} * s^p,$$

where np is the unique natural number n such that p ∈ An. We are now ready to define G. Let p ∈ A and let mp be the cardinality of the set of all k < np such that F(r^p_k) = ¬v. We set dom(G) := A and

$$G(p) := \begin{cases} F(s^p) & \text{if } m_p \text{ is even}, \\ (\neg F)(s^p) & \text{if } m_p \text{ is odd}. \end{cases}$$

We now show that G is a formula and that it is a solution of , then q ∈ dom(G).

(iii) (1)

, then p is a leaf of G.

(2) If (3) G = F[G/v]. Proof then

of

.

(iii).

(1)

either

Anp ∪ Ak ⊆ A = dom(G).

,

or

. In both cases, t ∈ S. Thus, q ∈

, we have just in case F(s ) ∈ . Since s is a leaf of F and , no proper extension of p is in dom(G). So, p is a leaf of G. . Let p ∈ (3) By (i),

(2)

Let

p

p


dom(G). If np = 0, then p = s p ∈ S and G(p) = F(p) = F[G/v](p). Suppose now that np > 0. Let q be

the proof of (ii)(a)). If

. By definition, s p = s q (see

, then mp = mq and G(p) = G(q) = F[G/v]

, then mp = mq +1 and G(p) = ¬G(q) = F[G/v](p).

(p). If

We now prove that G is the unique solution of the recursive equation . (iv) If H is a formula such that H = F[H/v], then H = G. Proof of (iv). By (i), dom(H) = A. We now prove that for every , for each p ∈ An we have H(p) = G(p). We reason by induction on n. If n = 0, then p ∈ S. Hence, H(p) = F[H/v](p) = F(p) = G(p). Suppose now

. By inductive hypothesis,

that n = m+1. Let q be H(q) = G(q). If

, then H(p) = F[H/v](p) = H(q) = G(q) = G(p). If

, then H(p) = F[H/v](p) = ¬H(q) = ¬G(q) = G(p). Finally, we show that G is ill–founded. (v)

There exists a function , for every . Proof of (v). Recall that R is

every

such that non–empty

and

that

for each

. As

, for every

. So,

. Thus, the function f given by: , has the required property.

for

The proof of Theorem 3.9 is now complete.

Example 3.10. Let us consider the recursive equations

(1) v = v ∨ v, (2) v = v ∧ v, (3) v = ¬v ∨ v.

Let U, W and V be the solutions of the equations (1), (2) and (3) respectively. We have dom(U) = dom(W) = dom(V) = A, where A = {p : p(k) ∈ {0, 1}, for all k < lg(p)}. Moreover, for p ∈ A, we have U(p) = ∨, W(p) = ∧ and V(p) = ∨ if the number of k < lg(p) such that p(k) = 0 is even, and V(p) = ∧ otherwise.

In our setting, equation (3) can be used to represent the equation “X = X → X” which is a well–known example of equation of mixed variance in the literature of recursive types (see e.g., [7]).
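The construction used in the proof of Theorem 3.9 can be sketched for finite right-hand sides F. The sketch below assumes a hypothetical encoding of finite formulas as nested tuples and computes the label of the solution at a given position by walking through F, restarting at the root of F whenever an occurrence of v or ¬v is reached, and recording the parity of the occurrences of ¬v crossed so far. Arities are represented by their size, which is a simplification; the function names and the example equation v = ¬v ∨ v are assumptions made for illustration.

```python
DUAL = {"or": "and", "and": "or"}


def var(name, positive=True):
    """Atom: a positive or negated propositional variable (hypothetical encoding)."""
    return ("var", name, positive)


def solution_label(F, v, p):
    """Label at position p of the solution of the equation v = F.

    Returns None if p is outside the domain of the solution.
    Compound labels are returned as (connective, number of immediate subformulas).
    """
    if F[0] == "var":
        raise ValueError("F must be a compound formula (condition R2)")
    node, flipped = F, False
    for i in p:
        if node[0] == "var":
            return None  # a leaf labeled by a variable other than v has no extension
        children = node[1]
        if not 0 <= i < len(children):
            return None
        node = children[i]
        if node[0] == "var" and node[1] == v:
            # Crossing an occurrence of ¬v flips the polarity; then restart at the root of F.
            flipped ^= not node[2]
            node = F
    if node[0] == "var":
        return ("var", node[1], node[2] != flipped)
    return (DUAL[node[0]] if flipped else node[0], len(node[1]))


# Hypothetical example: the equation v = ¬v ∨ v, read as "X = X → X".
EQ3 = ("or", [var("v", positive=False), var("v")])
```

With this encoding, `solution_label(EQ3, "v", (0,))` is `("and", 2)` and `solution_label(EQ3, "v", (0, 0))` is `("or", 2)`: the label at a position depends only on the parity of the number of zeros it contains.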

Derivations

In this subsection, we introduce the notions of sequent and derivation. We also show some sequents which are derivable in our framework. In this work, we define sequents as special disjunctive formulas. We think that this choice is convenient, as it makes clearer the “duality” between sequents and environments (Definition 4.1(2)). Derivations are defined to be trees labeled by sequents and rules. Our definition of derivation is actually very similar to that of pre–proof in [4] (very roughly, a pre–proof is a not necessarily well–founded derivation in a sequent calculus for classical logic with the ω–rule).

Definition 3.11 (Sequent). We call sequent any disjunctive formula F whose arity is {0,...,n−1}, for some natural number n. The set of all sequents is denoted by SEQ.

Notation 3.12. Let S = ∨[F0,...,Fn−1] be a sequent. Let G be a formula. We write ∨[F0,...,Fn−1,G], and sometimes also S∨G, for the sequent T of arity {0,...,n} whose immediate subformulas are F0, ..., Fn−1 and G, in this order. In particular, if n = 0 then T = f∨G = ∨[G] (recall that f is an abbreviation for ∨[ ]).

Definition 3.13 (Rule). A rule is either an axiom rule or a disjunctive rule or a conjunctive rule.

• An axiom rule is an ordered triple.
• A disjunctive rule is an ordered triple.
• A conjunctive rule is an ordered pair.


The set of all rules is denoted by RULES. Definition 3.14 (Derivation). A derivation is a tree T labeled by SEQ × RULES such that for each p ∈ dom(T) one of the following conditions (ax), (∨) and (∧) holds.

i ∈ I. In the sequel, we use π,ρ,σ,... to range over derivations. We say that π is a derivation of S if = S, and we say that S is derivable if there exists a derivation π of it. Remark

3.15.

Let

π

be

a

derivation.

for some (a)

The subformula property. For every k < n, there exists

Let . such

. that Fk is a subformula of (b) Leaves. The position p is a leaf of π if and only if p is as in (ax) of Definition 3.14, or p is as in (∧) of Definition 3.14 and Fk =t (recall that t is an abbreviation for ∧[]. (c) Rules. The rule πR(p) and the sequent πL(p) completely determine the sequent πL(q), for each q which immediately extends p in dom(π). In other words, if ρ and σ are two derivations of the same sequent S, then dom(ρ) = dom(σ) and ρR(p) = σR(p) for all p ∈ dom(ρ) together imply ρ=σ. Using a more traditional notation for derivations in the sequent calculus, conditions (ax), (∨) and (∧) of Definition 3.14 can be respectively written as

where {F,¬F} = {v,¬v} in (ax), i0 ∈ I in (∨), and in (∧) the expression “... for all i ∈ I” means that there is one premise for each i ∈ I . In particular, if , then there is no premise above the conclusion. The rules of inference displayed above essentially correspond to the normal rules of Tait–calculus [8]. But in contrast with [8], in our setting ill– founded derivations are permitted. As a consequence, we have the following results.
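A possible rendering of these three rules in Tait style is sketched below. It assumes that the conclusion is a sequent S = ∨[F0,...,Fn−1], that k < n names the formula of S on which the rule acts, and that S ∨ G is the extended sequent of Notation 3.12; the layout and the auxiliary names G_i and l are assumptions and need not match the original display.

```latex
\[
\frac{}{\;\vdash S\;}\ (\mathrm{ax})
\quad \text{where } \{F_k, F_l\} = \{v, \neg v\} \text{ for some } k, l < n
\]
\[
\frac{\vdash S \vee G_{i_0}}{\vdash S}\ (\vee)
\quad \text{where } F_k = \bigvee_{i \in I} G_i \text{ and } i_0 \in I
\]
\[
\frac{\cdots \quad \vdash S \vee G_i \quad \cdots \quad \text{for all } i \in I}{\vdash S}\ (\wedge)
\quad \text{where } F_k = \bigwedge_{i \in I} G_i
\]
```

In this reading the formula Fk is kept in the premises, which is what allows the same rule to be applied to it again and again along an ill–founded branch.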


Theorem 3.16.

(1) Let S = ∨[F0,...,Fn−1] be a sequent. Suppose that for some k < n either Fk is a disjunctive formula whose arity is not the empty set, or Fk is a conjunctive formula. Then, S is derivable.

(2) Let v = F be a recursive equation, and let G be its unique solution. Then, the sequent ∨[G] is derivable.

Proof. (1) Let

, for some

T labeled by SEQ×RULES as follows: for all k < lg(p)} and

Suppose now that

. Let i0 ∈ I. We define a tree ,

. We analogously define a tree U labeled , for all k
bool”. The type of the linear space, “linear_space”, is defined as a polymorphic set “:′a -> bool”.

Properties of Linear Space

Properties of linear space can be derived in HOL4. For example, the following properties are formalized in HOL4, where 𝜃 denotes the element “zero” of the linear space.

Property 1. The element “zero” in a linear space is unique.

Property 2. For an arbitrary element 𝑥 in a linear space, the negative element of 𝑥 is unique, and denoted as −𝑥.

Property 3. For arbitrary elements 𝑥, 𝑦 and 𝑧 in a linear space, if 𝑥+𝑦=𝑥+𝑧, then 𝑦=𝑧.

Property 4. For arbitrary elements 𝑥, 𝑦 and 𝑧 in a linear space, if 𝑥+𝑦=𝑧, then 𝑥=𝑧−𝑦.

Property 5. For an arbitrary element 𝑘 in a number field, 𝑘𝜃 = 𝜃.

Property 6. For an arbitrary element 𝑥 in a linear space, 0𝑥 = 𝜃.

Property 7. For an arbitrary element 𝑥 in a linear space, (−1)𝑥 = −𝑥.

Property 8. For an arbitrary element 𝑘 in a number field and arbitrary elements 𝑥, 𝑦 in a linear space, if 𝑘𝑥 = 𝑦 and 𝑘 ≠ 0, then 𝑥 = (1/𝑘)𝑦 [7, 8].
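To illustrate how such properties follow from the axioms, Properties 1 and 3 can be derived on paper as sketched below. The sketch assumes the usual linear space axioms (commutativity and associativity of addition, the zero element as a right identity, and existence of negatives); the symbols θ1 and θ2 are introduced here for illustration only and are not taken from the HOL4 development.

```latex
% Property 1: if \theta_1 and \theta_2 are both zero elements, they coincide.
\begin{align*}
\theta_1 &= \theta_1 + \theta_2 && \text{($\theta_2$ is a zero element)} \\
         &= \theta_2 + \theta_1 && \text{(commutativity)} \\
         &= \theta_2            && \text{($\theta_1$ is a zero element)}
\end{align*}

% Property 3: cancellation, assuming x + y = x + z.
\begin{align*}
y &= y + \theta = y + (x + (-x)) = (y + x) + (-x) = (x + y) + (-x) \\
  &= (x + z) + (-x) = (z + x) + (-x) = z + (x + (-x)) = z + \theta = z
\end{align*}
```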

Formalization of Linear Space Theory in the Higher-Order Logic ...


Proof of Properties of Linear Space

We completed the proofs of 37 properties related to linear space and created 37 corresponding theorems in the linear space theory. As a representative example, we show the proof process for Property 1 (i.e., the element “zero” in a linear space is unique). The proof uses the goal-oriented proof method, and its process is shown in Figure 1.

Figure 1: Proof of Property 1.

The proving process consists of five steps.

Step 1. Give the initial goal of Property 1. Using the element “zero” predefined during the formal modeling of linear space, the initial goal of this property is formally described as “linear_space s ls ==> !x:′a. ?!y:′a. (y = ls0) ∧ (x LP ls0 = x)”, where “ls0” is the symbol of the element “zero”. Algorithm 1 shows the result of entering the initial goal in HOL4.


Algorithm 1: Initial goal g.

Step 2. Start from the initial goal, and assume that the desired property is correct. Then use the tactic DISCH_TAC to simplify the initial goal g, moving the antecedent of the implicative goal g into the assumptions to get sub-goal g1. The process of Step 2 is shown in Algorithm 2.

Algorithm 2: The subgoal g1.

Step 3. Use the tactic GEN_TAC to simplify sub-goal g1, thereby stripping the outermost universal quantifier from the conclusion of sub-goal g1 to obtain sub-goal g2. Algorithm 3 shows the process of Step 3.

Algorithm 3: The subgoal g2.


Step 4. For sub-goal g2, use the function EXISTS_UNIQUE_CONV existing in HOL4 to generate a theorem, that is, “(?!y. (y = ls0) ∧ (x LP ls0 = x)) = (?y. (y = ls0) ∧ (x LP ls0 = x)) ∧ !y y’. ((y = ls0) ∧ (x LP ls0 = x)) ∧ ((y’ = ls0) ∧ (x LP ls0 = x)) ==> (y = y’)”. Then apply the tactic RW_TAC to g2 by using the theorem, so as to get the simplified sub-goal g3. The process of Step 4 is shown in Algorithm 4.

Algorithm 4: The subgoal g3.

Step 5. Apply the tactic RW_TAC to the sub-goal g3 by using an axiom of the definition of linear space, that is, “zero_def: [linear_space s ls] ∣- !x. x LP ls0 = x”, so as to prove the sub-goal g3 directly. The HOL4 system then returns the proved sub-goals g2 and g1 one by one, until the initial goal g is returned. The proof result is given in Algorithm 5.

Algorithm 5: The proof result.

Finally, we generate a theorem named “zero_unique” and save it in the linear space theory by using the theorem saving tool “store_thm”. The result is shown in Algorithm 6.

Algorithm 6: Theorem “zero_unique”.


As the above process shows, a formal proof by the goal-oriented proof method first requires an accurate description of the initial goal. Appropriate tactics are then selected according to the characteristics of the goal at each stage. Finally, the goals are repeatedly simplified by using proved theorems and tactics until the whole formal proof is complete.

CONCLUSION

We have presented the formal modeling of the linear space and the formal proof of its properties in HOL4. Our results enrich the existing theories of HOL4, thus laying the underpinnings for theorem-proving-based verification with the linear space theory for a broad range of applications. Further improvements include the formalization of linear combinations, linear dependence, linear independence, and subspace. The formalization of these theories will contribute to a more powerful formal verification engine in terms of the linear space theory.

ACKNOWLEDGMENT

This research was supported by an international scientific and technological cooperation project (2011DFG13000) backed by The Ministry of Science and Technology of China.


REFERENCES

1. J. R. Harrison, Theorem proving with the real numbers [Ph.D. thesis], University of Cambridge, 1996.
2. A. Habibi, S. Tahar, and A. Ghazel, “Formal modelling of the ADSP2100 processor using HOL,” in Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering, pp. 614–619, May 2002.
3. J. Harrison, “Floating point verification in HOL light: the exponential function,” Formal Methods in System Design, vol. 16, no. 3, pp. 271–305, 2000.
4. B. Akbarpour, S. Tahar, and A. Dekdouk, “Formalization of fixed-point arithmetic in HOL,” Formal Methods in System Design, vol. 27, no. 1-2, pp. 173–200, 2005.
5. K. Slind and M. Norrish, “A brief overview of HOL4,” in Theorem Proving in Higher Order Logics, vol. 5170 of Lecture Notes in Computer Science, pp. 28–32, Springer, Berlin, Germany, 2008.
6. Cambridge Research Center of SRI International, “The HOL System TUTORIAL (for HOL Kananaskis-7),” 2011, http://cdnetworks-kr-1.dl.sourceforge.net/project/hol/hol/kananaskis-7/kananaskis-7-tutorial.pdf.
7. W. Qiu, Advanced Algebra, Higher Education Press, Beijing, China, 1996.
8. Yizhong Lan, Simple Tutorial for Advanced Algebra, Peking University Press, Beijing, China, 2002.

Chapter 5

LANGUAGE AND PROOFS FOR HIGHER-ORDER SMT (WORK IN PROGRESS)

Haniel Barbosa1, Jasmin Christian Blanchette1,2,3, Simon Cruanes1, Daniel El Ouraoui1, Pascal Fontaine1

1 University of Lorraine, CNRS, Inria, and LORIA, Nancy, France
2 Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
3 Max-Planck-Institut für Informatik, Saarbrücken, Germany

Satisfiability modulo theories (SMT) solvers have throughout the years been able to cope with increasingly expressive formulas, from ground logics to full first-order logic modulo theories. Nevertheless, higher-order logic within SMT is still little explored. One main goal of the Matryoshka project, which started in March 2017, is to extend the reasoning capabilities of SMT solvers and other automatic provers beyond first-order logic. In this preliminary report, we describe an extension of the SMT-LIB language, the standard input format of SMT solvers, to handle higher-order constructs. We also discuss how to augment the proof format of the SMT solver veriT to accommodate these new constructs and the solving techniques they require.

Citation (APA): Barbosa, H., Blanchette, J., Cruanes, S., El Ouraoui, D. & Fontaine, P. (2017). Language and Proofs for Higher-Order SMT (Work in Progress). C. Dubois and B. Woltzenlogel Paleo (Eds.): Fifth Workshop on Proof eXchange for Theorem Proving (PxTP 2017). (8 pages).

Copyright: © Creative Commons Attribution 3.0 Unported (https://creativecommons.org/licenses/by/3.0/).

INTRODUCTION

Higher-order (HO) logic is a pervasive setting for reasoning about numerous real-world applications. In particular, it is widely used in proof assistants (also known as interactive theorem provers) to provide trustworthy, machine-checkable formal proofs of theorems. A major challenge in these applications is to automate as much as possible the production of these formal proofs, thereby reducing the burden of proof on the users. An effective approach for stronger automation is to rely on less expressive but more automatic theorem provers to discharge some of the proof obligations. Systems such as HOLyHammer, MizAR, Sledgehammer, and Why3, which provide a one-click connection from proof assistants to first-order provers, have led in recent years to considerable improvements in proof assistant automation [8]. Today, the leading automatic provers for first-order classical logic are based either on the superposition calculus [1, 12] or on CDCL(T) [11]. Those based on the latter are usually called satisfiability modulo theories (SMT) solvers and are the focus of this paper. Our goal, as part of the Matryoshka project, is to extend SMT solvers to natively handle higher-order problems, thus avoiding the completeness and performance issues associated with clumsy encodings.

In this paper, we present our first steps towards two contributions within our established goal: to extend the input (problems) and output (proofs) of SMT solvers to support higher-order constructs. Most SMT solvers support SMT-LIB [5] as an input format. We report on a syntax extension for augmenting SMT-LIB with higher-order functions with partial applications, λ-abstractions, and quantification on higher-order variables (Section 2). Regrettably, there is no standard yet for proof output; each proof-producing solver has its own format. We focus on the proof format of the SMT solver veriT [9]. This solver is known for its very detailed proofs [2,6], which are reconstructed in the proof assistant Isabelle/HOL [7] and in the GAPT system [10]. Proofs in veriT accommodate the formula processing and the proof search performed by the solver. Processing steps are represented using an extensible set of inference rules described by Barbosa et al. [2]. Here, we extend this calculus to support transformations such as β-reduction and congruence with λ-abstractions, which are required by the new constructs that can appear in higher-order problems (Section 3).

The CDCL(T) reasoning performed by veriT is represented by a resolution proof, which consists of the resolution steps performed by the underlying SAT solver and the lemmas added by the theory solvers and the instantiation module. These steps are described in Besson et al. [6]. The part of the proof corresponding to the actual proving will change according to how we solve higher-order problems. In keeping with the CDCL(T) setting, the reasoning is performed in a stratified manner. Currently, the SAT solver handles the propositional reasoning, a combination of theory solvers tackles the ground (variable-free) reasoning, and an instantiation module takes care of the first-order reasoning. Our initial plan is to adapt the instantiation module so that it can heuristically instantiate quantifiers with functional variables and to extend veriT’s underlying modular engine for computing substitutions [4]. Since only modifications to the instantiation module are planned, the only rules that must be adapted are those concerned with quantifier instantiation:

These rules are generic enough to be suitable also for higher-order instantiation. Here, we focus on adapting the rules necessary to suit the new higher-order constructs in the formula processing steps.

A SYNTAX EXTENSION FOR THE SMT-LIB LANGUAGE

At the time of writing, the SMT-LIB standard was at version 2.5 [5], and version 2.6 was in preparation. Although some discussions to extend the SMT-LIB language to higher-order logic have occurred in the past, notably to include λ-abstractions, the format is currently based on many-sorted first-order logic. We report here on a pragmatic extension of the language to accommodate higher-order constructs: higher-order functions with partial applications, λ-abstractions, and quantifiers ranging over higher-order variables. This extension is inspired by the work on TIP (Tools for Inductive Provers) [13], which is another pragmatic extension of SMT-LIB.


SMT-LIB contains commands to define atomic sorts and functions, but no functional sorts. The language is first extended so that functional sorts can be built:

The second line is the addition to the original grammar. We use rather than a special case of to avoid ambiguities with parametric sorts and to have the same notation as the one generally used for functional sorts. The next modification is in the grammar for terms, which essentially adds a rule for λ-abstractions and generalizes the application so that any term can be applied to other terms:

The old rule is now redundant. Higher-order quantification requires no new syntax, since sorts have been extended to accommodate functions. Semantically, the well-sortedness rules in SMT-LIB are extended with the following typing rules for the arrow constructor -> and λ-abstraction:

Here a judgment is composed of two items: on the left-hand side, a signature Σ, which is a tuple of function and constant symbols; on the right-hand side, a term annotated with its type. The notation Σ[x : τ] stands for the signature that maps x to the type τ. If we want to define a function taking an integer as argument and returning a function from integers to integers, it is now possible to write (declare-fun f (Int) (-> Int Int)). The following example illustrates higher-order functions, terms representing a function, and partial applications:

The term (g 1) is a function from Int to Int, in agreement with the sort of g. Then it is applied to 2 in the expression ((g 1) 2) of sort Int. The term (h 1) is a partial application of the binary function h, and is thus a unary function. The term (f (h 1)) is therefore well typed and is an Int. Note that in our presentation all functions of type (-> Int Int ... Int) are equivalent to (-> Int (-> Int (-> ... Int))). This implies, in particular, that in the example above ((g 1) 2) is semantically equal to (g 1 2). More precisely, we may consider the three different declarations of f below:

as the unique form (declare-fun f () (-> Int Int Int)). This follows from -> being right associative. The next example features λ-abstraction:

The term (lambda ((f (-> Int Int)) (x Int)) (f x)) is an anonymous function that takes a function f and an integer x as arguments. It is applied to g and 1, and the fully applied term is stated to be equal to (g 1). The assertion is a tautology (thanks to β-reduction).
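The treatment of functional sorts described above can be mimicked with a few lines of code. This is a minimal sketch, assuming a hypothetical representation of sorts as strings (atomic sorts) or tuples `("->", argument, result)`; the helper names `arrow`, `declare_fun` and `apply_sort` are not part of SMT-LIB or veriT. It shows that the arrow is right associative, that different ways of declaring a binary function yield the same sort, and that partial application is well sorted.

```python
from functools import reduce


def arrow(*sorts):
    """Right-associative arrow: arrow(a, b, c) stands for a -> (b -> c)."""
    *args, result = sorts
    return reduce(lambda res, arg: ("->", arg, res), reversed(args), result)


def declare_fun(arg_sorts, result_sort):
    """Sort of a symbol declared with the given argument sorts and result sort."""
    return arrow(*arg_sorts, result_sort)


def apply_sort(fun_sort, arg_sorts):
    """Sort of an application to the given arguments; fewer arguments give a function sort."""
    for arg in arg_sorts:
        if not (isinstance(fun_sort, tuple) and fun_sort[0] == "->"):
            raise TypeError("too many arguments")
        if fun_sort[1] != arg:
            raise TypeError("argument sort mismatch")
        fun_sort = fun_sort[2]
    return fun_sort


# Hypothetical declarations in the spirit of the example discussed in the text.
g = declare_fun(["Int"], arrow("Int", "Int"))   # g takes an Int and returns a function
h = declare_fun(["Int", "Int"], "Int")          # h is a binary function
f = declare_fun([arrow("Int", "Int")], "Int")   # f takes a function as argument
```

Under these assumptions, `apply_sort(g, ["Int", "Int"])` and `apply_sort(apply_sort(g, ["Int"]), ["Int"])` both yield `"Int"`, matching the remark that ((g 1) 2) and (g 1 2) are interchangeable.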

AN EXTENSION FOR THE VERIT PROOF FORMAT

Our setting is classical higher-order logic as defined by the extended SMT-LIB language above, or abstractly described by the following grammar:

where formulas are terms of Boolean type. We rely on the metatheory defined by Barbosa et al. [2]. Besides the axioms for characterizing Hilbert choice and ‘let’ described there, we add the following axiom for λ-abstraction, where ≃ denotes the equality predicate:

(β) ⊨ (λx. t[x]) s ≃ t[s]

In general, the notation t[x̄ₙ] stands for a term that may depend on distinct variables x̄ₙ; t[s̄ₙ] is the corresponding term where the terms s̄ₙ are simultaneously substituted for x̄ₙ; bound variables in t are renamed to avoid capture. For readability, and because it is natural with a higher-order calculus, we present the rules in curried form, that is, functions can be partially applied, and rules must only consider unary functions.

The notion of context is as in Barbosa et al.: each context entry either fixes a variable x or defines a substitution {x̄ₙ ↦ s̄ₙ}. Abstractly, a context Γ fixes a set of variables and specifies a substitution subst(Γ). The substitution is the identity for ∅ and is defined as follows in the other cases:

subst(Γ, x) = subst(Γ)[x ↦ x]    subst(Γ, x̄ₙ ↦ t̄ₙ) = subst(Γ) ∘ {x̄ₙ ↦ t̄ₙ}

In the first equation, the [x ↦ x] update shadows any replacement of x induced by Γ. We write Γ(t) to abbreviate the capture-avoiding substitution subst(Γ)(t).

Our new set of rules is similar to that in Barbosa et al. The rules TRANS, SKO∃, SKO∀, LET, and TAUT are unchanged. The BIND rule is modified to accommodate the new λ-binder:

The metavariable B ranges over ∀, ∃, and λ. The CONG rule is also modified to accommodate new cases. Compared with the first-order calculus, the left-hand side of an application can be an arbitrarily complex term, and not simply a function or predicate symbol. Rewriting can now occur also on these complex terms. The updated CONG rule is as follows:

The only genuinely new rule is for β-reduction—that is, the substitution of an argument in the body of a λ-abstraction. It is similar in form to the LET rule from the first-order calculus:

Indeed, (let x ≃ u in t) and (λx. t)u are semantically equal.
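To make the β-step concrete, here is a minimal sketch of capture-avoiding substitution and β-normalization on a tiny term representation. The tuple encoding, the helper names and the renaming scheme are assumptions made for illustration; this is not the veriT implementation, and it produces normal forms rather than proof trees.

```python
from itertools import count

# Terms: ("var", name) | ("app", fun, arg) | ("lam", name, body)
_fresh = count()

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}

def subst(t, x, u):
    """Capture-avoiding substitution of u for the variable x in t."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "app":
        return ("app", subst(t[1], x, u), subst(t[2], x, u))
    y, body = t[1], t[2]
    if y == x:                      # x is shadowed below this binder
        return t
    if y in free_vars(u):           # rename the bound variable to avoid capture
        z = f"{y}_{next(_fresh)}"   # assumes no existing name has this shape
        body, y = subst(body, y, ("var", z)), z
    return ("lam", y, subst(body, x, u))

def normalize(t):
    """β-normalization; terminates on simply typed terms, may diverge otherwise."""
    if t[0] == "var":
        return t
    if t[0] == "lam":
        return ("lam", t[1], normalize(t[2]))
    fun = normalize(t[1])
    if fun[0] == "lam":
        return normalize(subst(fun[2], fun[1], t[2]))
    return ("app", fun, normalize(t[2]))

def let(x, u, t):
    """Read (let x ≃ u in t) as the β-redex (λx. t) u."""
    return ("app", ("lam", x, t), u)

p, a, x = ("var", "p"), ("var", "a"), ("var", "x")
body = ("app", ("app", p, x), x)            # p x x
redex = ("app", ("lam", "x", body), a)      # (λx. p x x) a
print(normalize(redex))
```

Normalizing `let("x", a, body)` gives the same result as normalizing `redex`, which mirrors the remark that the two terms are semantically equal.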

Example 1. The derivation tree of the normalization of (λx. p x x) a is as follows:

Example 2. The following tree features a β-redex under a λ-abstraction. Let ; and where Π stands for the subtree

Example 3. The transitivity rule is useful when the applied term reduces to a λ-abstraction. Let Γ1 =

The soundness of the extended calculus is a simple extension of the soundness proof in the technical report by Barbosa et al. [3]. We focus on the extensions. Recall that the proof uses an encoding of terms and context in λ-calculus, based on the following grammar:

As previously, reify(M ≃ N) is defined as . The encoded rules are as follows:

Lemma 1. If the judgment M ≃ N is derivable using the encoded inference system with the theories

Proof. The proof is by induction over the derivation of M ≃ N. We only provide here the three new cases:

CASE BIND with B = λ: The induction hypothesis is . Using (β) and the side condition of the rule, we can also deduce that . Hence by α-conversion this is equivalent to .

CASE CONG: This case follows directly from equality in a higher-order setting.

CASE BETA: This case follows directly from (β) and equality in a higher-order setting.

The remaining cases are similar to Barbosa et al.

The auxiliary functions L(Γ)[t] and R(Γ)[u] are used to encode the judgment of the original inference system Γ ⊲ t ≃ u. They are defined over the structure of the context, as follows:

Lemma 2. If the judgment Γ ⊲ t ≃ u is derivable using the original inference system, the equality L(Γ)[t] ≃ R(Γ)[u] is derivable using the encoded inference system.

Proof. The proof is by induction over the derivation of Γ ⊲ t ≃ u. We give only the three new cases:

CASE BIND with B = λ: The encoded antecedent is M[λy.(λx. s) y] ≃ N[λy. t] (i.e., L(Γ, y, x ↦ y)[s] ≃ R(Γ, y, x ↦ y)[t]), and the encoded succedent is M[λx. s] ≃ N[λy. t]. By the induction hypothesis, the encoded antecedent is derivable. Thus, by the encoded BIND rule, the encoded succedent is derivable.

CASE CONG: Similar to BIND.

CASE BETA: Similar to LET with n = 1.

The remaining cases are similar to Barbosa et al.

Lemma 3 (Soundness of Inferences). If the judgment Γ ⊲ t ≃ u is derivable using the original inference system with the theories

Proof. Using the above updated lemmas, the proof is identical to the one for the original calculus.

CONCLUSION AND FUTURE WORK

We have presented a preliminary extension of the SMT-LIB syntax and of the veriT proof format to support higher-order constructs in SMT problems and proofs. Partial applications, λ-abstractions, and quantification over functional variables can now be understood by a solver compliant with these languages. The only relatively challenging element of these extensions so far concerns the rules for representing detailed proofs of formula processing.

The next step is to extend the generic proof-producing formula processing algorithm from Barbosa et al. [2]. Given the structural similarity between the introduced extensions and the previous proof calculus, we expect this to be straightforward. A more interesting challenge will be to reconstruct these new proofs in proof assistants, to allow full integration of a higher-order SMT solver. Since detailed proofs are produced, and proof checking is guaranteed to have reasonable complexity, we are confident that effective implementations can be produced.

With the foundations in place, the next step will be to implement the automatic reasoning machinery for higher-order formulas and to properly evaluate its effectiveness. Moreover, when providing support for techniques involving, for example, inductive datatypes, we will need to augment the proof format accordingly.

ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their comments. Between the initial version of this document and the current one, the SMT-LIB extension has been greatly influenced by discussions with Clark Barrett and Cesare Tinelli (the SMT-LIB managers, together with Pascal Fontaine), and they should also be considered authors of this syntax extension.


REFERENCES

1. Leo Bachmair & Harald Ganzinger (1994): Rewrite-Based Equational Theorem Proving with Selection and Simplification. Journal of Logic and Computation 4(3), pp. 217–247, doi:10.1093/logcom/4.3.217.
2. Haniel Barbosa, Jasmin Christian Blanchette & Pascal Fontaine (2017): Scalable Fine-Grained Proofs for Formula Processing. In Leonardo de Moura, editor: Conference on Automated Deduction (CADE), LNCS 10395, Springer, pp. 398–412, doi:10.1007/978-3-319-63046-5_25.
3. Haniel Barbosa, Jasmin Christian Blanchette & Pascal Fontaine (2017): Scalable Fine-Grained Proofs for Formula Processing. Research Report, Inria, doi:10.1007/978-3-319-63046-5_25. Available at https://hal.inria.fr/hal-01526841.
4. Haniel Barbosa, Pascal Fontaine & Andrew Reynolds (2017): Congruence Closure with Free Variables. In Axel Legay & Tiziana Margaria, editors: Tools and Algorithms for Construction and Analysis of Systems (TACAS), LNCS 10206, pp. 214–230, doi:10.1007/978-3-662-54580-5_13.
5. Clark Barrett, Pascal Fontaine & Cesare Tinelli (2015): The SMT-LIB Standard: Version 2.5. Technical Report, Department of Computer Science, The University of Iowa. Available at www.SMT-LIB.org.
6. Frédéric Besson, Pascal Fontaine & Laurent Théry (2011): A Flexible Proof Format for SMT: a Proposal. In Pascal Fontaine & Aaron Stump, editors: Workshop on Proof eXchange for Theorem Proving (PxTP). Available at https://hal.inria.fr/hal-00642544.
7. Jasmin Christian Blanchette, Sascha Böhme, Mathias Fleury, Steffen Juilf Smolka & Albert Steckermeier (2016): Semi-intelligible Isar Proofs from Machine-Generated Proofs. Journal of Automated Reasoning 56(2), pp. 155–200, doi:10.1007/s10817-015-9335-3.
8. Jasmin Christian Blanchette, Cezary Kaliszyk, Lawrence C. Paulson & Josef Urban (2016): Hammering towards QED. Journal of Formalized Reasoning 9(1), pp. 101–148, doi:10.6092/issn.1972-5787/4593.
9. Thomas Bouton, Diego Caminha B. de Oliveira, David Déharbe & Pascal Fontaine (2009): veriT: An Open, Trustable and Efficient SMT-Solver. In Renate A. Schmidt, editor: Conference on Automated Deduction (CADE), LNCS 5663, Springer, pp. 151–156, doi:10.1007/978-3-642-02959-2_12.
10. Gabriel Ebner, Stefan Hetzl, Giselle Reis, Martin Riener, Simon Wolfsteiner & Sebastian Zivota (2016): System Description: GAPT 2.0. In Nicola Olivetti & Ashish Tiwari, editors: International Joint Conference on Automated Reasoning (IJCAR), LNCS 9706, Springer, pp. 293–301, doi:10.1007/978-3-319-40229-1_20.
11. Robert Nieuwenhuis, Albert Oliveras & Cesare Tinelli (2006): Solving SAT and SAT Modulo Theories: From an Abstract Davis–Putnam–Logemann–Loveland Procedure to DPLL(T). Journal of the ACM 53(6), pp. 937–977, doi:10.1145/1217856.1217859.
12. Robert Nieuwenhuis & Albert Rubio (2001): Paramodulation-Based Theorem Proving. In Alan Robinson & Andrei Voronkov, editors: Handbook of Automated Reasoning, I, pp. 371–443, doi:10.1016/B978-044450813-3/50009-6.
13. Dan Rosén & Nicholas Smallbone (2015): TIP: Tools for Inductive Provers. In Martin Davis, Ansgar Fehnker, Annabelle McIver & Andrei Voronkov, editors: Logic for Programming, Artificial Intelligence, and Reasoning (LPAR), Springer, pp. 219–232, doi:10.1007/978-3-662-48899-7_16.

Chapter 6

ALTERNATION IS STRICT FOR HIGHER-ORDER MODAL FIXPOINT LOGIC

Florian Bruse
Universität Kassel, Kassel, Germany

We study the expressive power of Alternating Parity Krivine Automata (APKA), which provide operational semantics to Higher-Order Modal Fixpoint Logic (HFL). APKA consist of ordinary parity automata extended by a variation of the Krivine Abstract Machine. We show that the number and parity of priorities available to an APKA form a proper hierarchy of expressive power as in the modal µ-calculus. This also induces a strict alternation hierarchy on HFL. The proof follows Arnold’s (1999) encoding of runs into trees and subsequent use of the Banach Fixpoint Theorem.

Citation (APA): Bruse, F. (2016). Alternation Is Strict For Higher-Order Modal Fixpoint Logic. D. Cantone and G. Delzanno (Eds.): Seventh Symposium on Games, Automata, Logics and Formal Verification (GandALF’16). (15 pages). Copyright: © Creative Commons Attribution 3.0 Unported (https://creativecommons.org/licenses/by/3.0/).

INTRODUCTION

Parity automata provide popular operational semantics for the modal µ-calculus and, hence, for all regular properties over trees. They are equivalent to most other acceptance modes with the exception of Büchi automata [13]. However, since parity automata can only express regular properties, extending their expressive power, or extending them to cover stronger logics, is the subject of ongoing research. For example, visibly pushdown automata [1] allow the addition of a limited pushdown stack but tie the stack operations to different and disjoint parts of the alphabet.

In this paper, we revisit our previous work on extending parity automata by a variant of the Krivine Abstract Machine [9], which incorporates a simply typed lambda calculus into the semantics of the automaton model. The resulting Alternating Parity Krivine Automata (APKA) yield operational semantics for Higher-Order Modal Fixpoint Logic (HFL) [14]. The acceptance condition of APKA is a stair parity condition over an acceptance game. The stair parity condition resembles that of visibly pushdown automata; it is not tied to any alphabet symbols or tree labels, however, but rather emerges via the bookkeeping done by the Krivine Machine part.

This automaton model is very expressive: Properties such as uniform inevitability or the presence of a given property in a level that is a power of two are easily expressible. This expressive power comes at a price, since emptiness of APKA, which is equivalent to satisfiability of HFL-formulae, is undecidable.

A key improvement over the variant of APKA presented in [5] is that in this paper, the state space of the automaton is not restricted to a treelike structure inherited from HFL-formulae, but can take the form of any graph, just like an ordinary parity automaton is less restricted in structure than a formula of the modal µ-calculus. Since in the new variant of APKA, precedence between states representing different fixpoints cannot be inferred from their position in a syntax tree, it is given explicitly via a parity labeling of states.
This has the advantage that the alternation class of an automaton, or that of any equivalent formula, can be defined via the number of its priorities, while for formulae, alternation can be hard to gauge syntactically. Already for the modal µ-calculus, syntactic criteria to define alternation classes can be quite complex [6, 12]. On the automaton side, however, characterization via the number of priorities makes things much easier. Translations from APKA into HFL and vice versa are readily available and any alternation hierarchy for APKA induces an alternation hierarchy on HFL. This settles the question posed in [5] on how to properly define alternation classes for HFL.

We find that for APKA, adding more priorities increases expressive power. The original strictness result for parity automata has a beautiful proof [2] involving the Banach Fixpoint Theorem, which has also been adapted to Fixpoint Logic with Chop (FLC) [10]. Our strictness proof proceeds in a similar manner: Given an infinite binary tree and an APKA of suitable vocabulary, we construct another infinite binary tree which encodes the acceptance game of the run of the APKA. Given a vocabulary tailored to a specific alternation class, we construct an automaton which accepts such a game tree if and only if the original automaton accepts the original tree. This operation induces a contraction in the complete metric space of infinite binary trees, which, by the Banach Fixpoint Theorem, has a fixpoint. We show that on this fixpoint, no automaton with fewer priorities, or with the same number of priorities but flipped parity, can be equivalent to the given meta-automaton.

While strictness of the alternation hierarchy for APKA and, hence, for HFL is not unexpected, such a result is not obvious. It is well known that adding more priorities to a parity automaton or more Rabin pairs to a Rabin automaton increases their expressive power, just as extra fixpoint alternation in the modal µ-calculus does [4]. However, adding extra fixpoint nesting does not always yield more expressive power: The Immerman-Vardi Theorem entails that, over finite ordered structures, first-order logic with least and greatest fixpoints is as strong as first-order logic with only one least fixpoint. Also, the alternation hierarchy of the modal µ-calculus itself collapses to the alternation-free fragment over certain classes of structures, for example the class of infinite words [8] and, more generally, classes of structures with restricted connectivity [7]. Preliminary work also shows that alternation for HFL collapses over finite structures.

It should also be noted that, just like with Fixpoint Logic with Chop [10], formulas that are hard for alternation classes for the modal µ-calculus are not necessarily suitable candidates for higher-order logics. This is because these formulas are not designed for the higher-order features of HFL.

The plan of the paper is as follows: In Section 2, we define APKA and their acceptance condition for infinite binary trees. We have a look at their relation to HFL in Section 3. In the following section, we define alternation classes and present a class of trees that encode runs of APKA from a given alternation class. For each alternation class we also construct meta-automata that accept such a tree encoding a run if and only if the run was accepting. This allows us to prove strictness of the alternation hierarchy. The paper closes with a brief discussion of important points.


ALTERNATING PARITY KRIVINE AUTOMATA

Note that we previously defined APKA differently. This work supersedes earlier definitions in [5]. For ease of exposition, and since the alternation hierarchy argument is developed over the class of fully infinite binary trees, we only consider automata over labeled fully infinite binary trees. The concept of APKA extends naturally to trees of unrestricted branching factor or any class of Kripke structures.

Fix some set 𝒫 of propositions. An infinite binary tree with labels in 𝒫 (just tree or 𝒫-tree from now on) is given by a function 𝒯 from the set {0,1}∗ of all {0,1} words into 2^𝒫. The root of the tree is identified with ε and the left and right successors of t ∈ {0,1}∗ are t0 and t1, respectively. We say that P ∈ 𝒫 holds at t (written 𝒯,t |= P) if P ∈ 𝒯(t). The pair 𝒯,t refers to the subtree induced by t.
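As a small illustration of this view of trees as labeling functions, the following sketch represents a tree as a Python function from words over {0,1} to sets of propositions. The concrete tree (P holds exactly on the first two levels) and all helper names are assumptions made for the example.

```python
from typing import Callable, FrozenSet, Tuple

Tree = Callable[[str], FrozenSet[str]]   # word over {0,1} -> set of propositions

def two_level_tree(word: str) -> FrozenSet[str]:
    """P holds exactly at nodes on the first two levels (words of length < 2)."""
    assert set(word) <= {"0", "1"}
    return frozenset({"P"}) if len(word) < 2 else frozenset()

def holds(tree: Tree, t: str, prop: str) -> bool:
    """The proposition holds at node t if it is in the label of t."""
    return prop in tree(t)

def successors(t: str) -> Tuple[str, str]:
    """Left and right successors t0 and t1 of the node t."""
    return t + "0", t + "1"

def subtree(tree: Tree, t: str) -> Tree:
    """The subtree induced by t, again as a labeling function."""
    return lambda word: tree(t + word)

print(holds(two_level_tree, "", "P"), holds(two_level_tree, "01", "P"))
```

The root is the empty word, so `holds(two_level_tree, "", "P")` asks whether P holds at the root.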

Simple types are defined inductively via τ ::= Pr | τ → τ. We often refer to Pr as ground type. The operator → is right-associative, so any type can be written as τ1 → ··· → τn → Pr. The order ord is defined inductively via ord(Pr) = 0 and ord(τ1 → ··· → τn → Pr) = max(ord(τ1),...,ord(τn)) + 1. The set of types is partially ordered via τ,τ′ < τ → τ′. The intended semantics for the ground type over a tree 𝒯 is a set of subtrees of 𝒯; the intended semantics for a type of the form τ1 → ··· → τn → Pr is that of a monotone function consuming arguments of types τ1,...,τn and returning a set of subtrees of 𝒯.
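The definition of ord can be checked on small examples with the following sketch. The class names `Pr` and `Arrow` and the helper `arrow` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Pr:
    """The ground type Pr."""

@dataclass(frozen=True)
class Arrow:
    """The function type arg -> res (right-associative)."""
    arg: "Type"
    res: "Type"

Type = Union[Pr, Arrow]

def arrow(*types: Type) -> Type:
    """Build t1 -> ... -> tn -> Pr from t1, ..., tn (hypothetical helper)."""
    result: Type = Pr()
    for t in reversed(types):
        result = Arrow(t, result)
    return result

def order(t: Type) -> int:
    """ord(Pr) = 0 and ord(t1 -> ... -> tn -> Pr) = max(ord(ti)) + 1."""
    if isinstance(t, Pr):
        return 0
    args = []
    while isinstance(t, Arrow):      # collect t1, ..., tn along the spine
        args.append(t.arg)
        t = t.res
    return max(order(a) for a in args) + 1

print(order(arrow(Pr(), Pr())))          # Pr -> Pr -> Pr
print(order(arrow(arrow(Pr()), Pr())))   # (Pr -> Pr) -> Pr -> Pr
```

Taking a function as an argument raises the order, while adding further ground-type arguments does not.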

Figure 1: Typing Rules for APKA-transition relations.

Definition. Fix a finite set of states, or fixpoint variables, and a finite set of lambda variables, which is the disjoint union , where . For each 1 ≤ i ≤ n, let ϕi be derived from the grammar where . An Alternating Parity Krivine Automaton (APKA) of index m and order k is a five-tuple of the form where is the initial state, or labels each fixpoint variable with a priority, the of order at most k specify the types of the fixpoints, the type of the initial state is Pr, and δ is the transition relation that maps Xi to ϕi and is such that : Pr for each , according to the typing rules reproduced in Figure 1.

The state space of the automaton is , where sub(ψ) is the set of subformulae of ψ.

Example 1. Let . Let . Let . We will see a run of in Example 2. This automaton corresponds to the HFL-formula (see Section 3 for a definition of HFL) .

Acceptance

In the context of an APKA, an environment is either the empty environment e0 or of the form where the ψi are in Q, i.e., subformulae of δ(X) for some X. We call e′ the parent environment of e, and any environment reachable via the irreflexive, transitive closure of this relation a predecessor of e. A pair (ψj,ej) is called a closure. We set . While the set of environments never appears explicitly, we tacitly assume that at any point during a run of an APKA the only environments in existence are e0 and any environments the automaton has created so far. This also means that all environments have only finitely many predecessors.


A configuration in a run of the automaton over some tree 𝒯 has the form (t,(Q,e),e′,Γ,∆), where t is a subtree of 𝒯, Q is a subformula of δ(X) for some X, e and e′ are environments, Γ is a possibly empty stack of closures, and ∆ is a finite sequence of priorities. In each configuration, if the type of the current closure is τ1 → ··· → τn → Pr, then there are n elements on the stack, and their types are, from bottom to top, τ1,...,τn. The latter invariant is by induction over the definition of the transition semantics.

A run over 𝒯,t0 is a possibly infinite sequence of configurations that begins with the initial configuration (t0,(Xinit,e0),e0,ε,ε) and is produced by a two-player game between players ∃ and ∀. In each configuration, the next configuration is either produced deterministically, or one of the two players picks a successor. A run is accepting if ∃ wins the game according to a winning condition which we state later. The transition semantics from (t,(Q,e),e′,Γ,∆) is as follows:

• If Q is a fixpoint variable X, the automaton transitions towards δ(X). The closures on the stack are, from bottom to top, the closures . The automaton creates a new environment , removes all these closures from the stack (which is now empty) and transitions to (t,(δ(X),e′′),e′′,ε,∆′), where ∆′ is ∆ with the priority of Q appended.
• If Q is of the form (ψ1 ψ2), then the automaton pushes (ψ2,e) on the stack and transitions to the configuration (t,(ψ1,e),e′,Γ·(ψ2,e),∆).
• If Q is a lambda variable x and not of type Pr, then the automaton transitions to (t,e(x),e′,Γ,∆).
• If Q is a lambda variable x and of type Pr, and if e(x) = (Q′,e′′) with e′ ≠ e′′, then the automaton transitions to (t,(Q,e),e′′,Γ,∆′) where e′′ is the parent of e′ and ∆′ is ∆ without the top element.
• If Q is a lambda variable x, of type Pr, and if e(x) = (Q′,e′′) with e′ = e′′, then the automaton transitions to (t,e(x),e′,Γ,∆).
• If Q is of the form ψ1 ∨ ψ2 or ψ1 ∧ ψ2, then the automaton transitions to (t,(ψ1,e),e′,Γ,∆), respectively (t,(ψ2,e),e′,Γ,∆), depending on ∃’s, respectively ∀’s choice.
• If Q is of the form ◇ϕ, respectively □ϕ, then ∃, respectively ∀, chooses a successor t′ ∈ {t0,t1} and the automaton transitions towards (t′,(ϕ,e),e′,Γ,∆).
• If Q is of the form P or ¬P, then ∃ wins if 𝒯,t |= Q and ∀ wins if 𝒯,t ⊭ Q.

By induction, the transition relation alone determines the winner of all finite plays of the game. The winner of an infinite play is determined by the behavior of the priority stack (see the end of this subsection).

Note that, in a departure from the usual way the Krivine Abstract Machine works, we insist that the equivalent of lambda abstraction pop the entire stack via a string of lambda abstractions implicit in each δ(X). While this is no proper restriction in expressive power, it makes it much easier to keep track of which fixpoint is currently being computed (see Definition 4). Before we formalize the winner of an infinite play of the acceptance game, we illustrate the transition semantics via an example.

Example 2. Consider the infinite binary tree where only the first two levels are labeled by P. Since all subtrees on a level are isomorphic, we refer to the root as r and all subtrees of level i as ti. An example run of the automaton A from Example 1 over this tree is depicted in Figure 2. This example is adapted from [10]. The highest priority that occurs infinitely often during the run is 1 and, hence, odd. However, all these occurrences of 1 except the first two are eventually removed from the priority stack and the remaining priorities are all 0. We will see later that this means that the automaton accepts.
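The difference between priorities that merely occur and priorities that stay on the stack can be sketched as follows. The sketch makes strong simplifying assumptions that are not part of the paper: the run is reduced to a list of stack events, the events eventually repeat one fixed block forever, and that block never pops below its own starting point. The event encoding and all function names are hypothetical.

```python
def residue(block):
    """Priorities that one pass through `block` leaves on the stack.

    `block` is a list of events: an int pushes that priority, "pop" removes
    the top entry. Assumption: the block never pops below its own start.
    """
    stack = []
    for event in block:
        if event == "pop":
            stack.pop()          # an IndexError signals a violated assumption
        else:
            stack.append(event)
    return stack

def plain_parity_accepts(cycle):
    """Parity of the highest priority that merely occurs infinitely often."""
    return max(e for e in cycle if e != "pop") % 2 == 0

def stair_parity_accepts(cycle):
    """Parity of the highest priority pushed infinitely often and never popped."""
    left = residue(cycle)
    if not left:
        raise ValueError("the cycle leaves no priority on the stack")
    return max(left) % 2 == 0

# Hypothetical repeating block: push 0, push 1, then remove the 1 again.
cycle = [0, 1, "pop"]
print(plain_parity_accepts(cycle))
print(stair_parity_accepts(cycle))
```

On this block the two conditions disagree: priority 1 occurs in every repetition but never survives, so only the surviving 0 entries matter for the stack-based condition.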


Figure 2: Part of an example run of the APKA from Example 2.

Definition 3. Let be a configuration. If ψi = X then we say that X occurs in Ci. An occurrence of a fixpoint variable is a configuration such that the variable occurs in that configuration. Moreover, is such that ei+1 is new and there is a new priority on top of the priority stack. We say that ei+1 and this stack element are tied to this occurrence of X.

The above means that there is a one-to-one correspondence between environments and occurrences of fixpoint variables: Reading a fixpoint variable X in a configuration entails creation of a new environment, sometimes denoted by eX, and every environment e is created by an occurrence of a fixpoint variable Xe. Moreover, each priority on the priority stack is tied to a unique occurrence of a fixpoint and, hence, environment. The converse does not hold, since priorities can be removed from the priority stack. However, we will see below that environments that correspond to deleted priorities are not relevant to the remainder of a run.

Definition 4. Let be a configuration. The automaton is said to be currently computing the fixpoint X if was created by an occurrence of X. It is currently computing the environment , which is tied to an occurrence of X.

Lemma 5. Let be a run. For some i, let be a configuration in that run.

1. ei is either or a predecessor of ,
2. for , all variable bindings point to closures (ψ,e′) where e′ is a predecessor of , and the analogous property holds for all of its predecessors,
3. all closures (ψ,e′) on the stack are such that e′ is either ei or a predecessor of ei,
4. the sequence of priorities on the priority stack is exactly the sequence of priorities tied to and the sequence of its predecessors.

Proof. The proof is by induction over the sequence of configurations. After the initial state is expanded to its transition relation, the lemma holds. Item 2 needs to be verified only on environment creation, Item 1 only when the fixpoint currently being computed changes. Now consider the form of the current closure (ψi,ei) and assume that the lemma holds so far. Clearly, for modal and boolean operators there is nothing to prove.

If ψi is of the form (ψ′ ψ′′), then ψ′′ is put on the stack and, by assumption, is either from ei or a predecessor environment, so again the new element conforms to Item 3.

If ψi is of the form X, then a new environment ei+1 is created and will be the new environment currently being computed. Moreover, the parent environment of ei+1 is ei. This satisfies Item 1. Since all closures on the stack are from ei or from predecessors of ei, the new environment satisfies Item 2. Since the stack is empty, it fulfills the stack requirements. Moreover, a new priority is added to the priority stack. Since it is tied to the new environment, Item 4 continues to hold.

If ψi is a variable not of ground type, ei switches to a predecessor and all items continue to hold.

If ψi is a variable of ground type, the stack is empty and, hence, Item 3 is satisfied. There are two cases: Either ei(x) = (Q′,e′) with e′ = , or, by Item 2, e′ is a predecessor of ei and, by Item 1, of . In the first case, the next closure will be (Q′,e′) computed in e′, and there is nothing left to prove. In the second case, the automaton transitions towards , where is the parent of and is ∆i with the top priority removed. Hence, Item 4 is satisfied. Since e′ is a predecessor of , it is either equal to or a predecessor of , so Item 1 is also satisfied.

From the definition of the transition relation, we can deduce that the environment which is currently being computed changes in two ways: By entering a new environment from its parent, which corresponds to environment creation, or by returning to the parent environment from an immediate successor environment. This means that, once an environment is left in favor of the parent environment, it will never be returned to and the computation of its fixpoint is finished. Moreover, closures with this environment also never appear again. Hence, if such an environment is permanently left, we say that it is being closed. Formally, a closed environment is one such that a variable of ground type from this environment has been read or, equivalently, the automaton has reached a configuration (t,(Q,e),e′,Γ,∆) such that e′ is the parent of the environment in question. Note that an environment is closed if and only if the corresponding priority has been removed from the priority stack.

Lemma 6. Let e be an environment, let (ψ,e) be a closure of ground type for some configuration and let e be the environment currently being computed. As long as e stays the environment currently being computed, the type order of the current closure never properly decreases. If the computation changes from e to a proper successor and later returns to e for the next time, this happens in a ground-type proper subexpression of ψ.

Proof. By the definition of the transition relation. The only transition that decreases the type order of the current closure is reading a fixpoint variable, which will change the environment currently being computed. Since (ψ,e) is of ground type, the stack must be empty. If the computation leaves e for a proper successor, this is through creation of a new environment or, equivalently, through reading a fixpoint variable.
If the new environment binds a variable of ground type, the closure this variable points to must have been put on the stack between reading (ψ,e) and the environment’s creation. Hence, it must be a proper subexpression of ψ. If the new environment does not bind a variable of ground type, the computation cannot return to e.

Since δ(X) for each X has a finite syntax tree, repeated application of the previous lemma yields that the computation changes to any environment only a finite number of times. Otherwise we would obtain an infinitely descending sequence of subformulae of δ(X) where each subformula is an operand-type strict subformula of the previous. It follows that each environment is either eventually closed, or eventually left permanently. Each environment appears as the environment currently being computed only finitely often. Moreover, each environment can only have finitely many direct successors because creation of a successor of e during a configuration requires the previous configuration to be in e. This means that, during an infinite run, infinitely many environments will not be closed and the corresponding priorities will never be popped from the priority stack.

We define that ∃ wins the acceptance game if the highest priority that occurs infinitely often but is never popped from the stack is even. More formally, consider a run . Consider the subsequence of configurations such that , i.e., a configuration such that ej was created in this configuration, but such that there is no i > j with a configuration with x of ground type, i.e., ej is never closed. By the above considerations, J must be infinite. Then for all n ≥ j, the priority stack ∆j will be an initial segment of ∆n. In particular, this holds for all n ∈ J. Hence, the set (∆j)j∈J is such that ∆j is a prefix of ∆j′ if j ≤ j′. We define that a play is accepting if the highest priority that occurs in the limit of this prefix-ordered chain is even. We say that an automaton accepts a tree , and write , if and only if ∃ has a strategy such that the acceptance game generates an accepting run.

Note that the above constitutes a stair parity condition in the sense that only those priorities contribute to the winning condition that are never removed from the priority stack. Note that this is not the same as just taking the sequence of priorities occurring during the run: It is possible that a high priority occurs infinitely often during the run, but each occurrence is eventually removed from the priority stack. This occurs in Example 2 where priority 1 occurs infinitely often, but is always removed again from the priority stack a few configurations later.

Definition 7. Two APKA are equivalent if and only if they accept the same trees.

Observation 8. For each APKA there is an APKA over the same set of propositions such that for all trees, we have if and only if .

The desired automaton is obtained by increasing the priorities of each state by one and replacing modal and boolean operators by their duals. A proof by induction over the structure of the acceptance game shows that a winning strategy for ∃ in the game for one automaton yields a winning strategy for ∀ in the other, and vice versa.
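The construction just described can be sketched on a toy representation of automata. Everything about the encoding is an assumption made for illustration: formulas are nested tuples, an automaton is reduced to a priority map and a transition map, and literals are swapped along with the operators, which the text does not state explicitly.

```python
DUAL_OP = {"or": "and", "and": "or", "dia": "box", "box": "dia"}

def dual_formula(phi):
    """Swap each boolean and modal operator for its dual.

    Formulas are tuples: ("prop", P), ("nprop", P), ("var", name),
    ("or"/"and", left, right), ("dia"/"box", sub), ("app", fun, arg).
    Assumption (not stated in the text): literals are swapped as well.
    """
    op = phi[0]
    if op == "prop":
        return ("nprop", phi[1])
    if op == "nprop":
        return ("prop", phi[1])
    if op == "var":                 # fixpoint and lambda variables stay as they are
        return phi
    if op in DUAL_OP:
        return (DUAL_OP[op],) + tuple(dual_formula(s) for s in phi[1:])
    if op == "app":
        return ("app", dual_formula(phi[1]), dual_formula(phi[2]))
    raise ValueError(f"unknown operator: {op}")

def dual_automaton(priority, delta):
    """Raise every priority by one and dualize every transition formula."""
    new_priority = {x: p + 1 for x, p in priority.items()}
    new_delta = {x: dual_formula(phi) for x, phi in delta.items()}
    return new_priority, new_delta

priority = {"X": 0}
delta = {"X": ("or", ("prop", "P"), ("dia", ("var", "X")))}
new_priority, new_delta = dual_automaton(priority, delta)
print(new_priority, new_delta)
```

Applying `dual_formula` twice returns the original formula, while the priorities keep their parity after two applications.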

APKA AND HFL

Syntax of HFL

In addition to the set of atomic propositions, fix infinite sets of variables disjoint from and disjoint from that denote variables bound by a λ-expression, respectively a fixpoint quantifier. Separating is usually not done for HFL, but facilitates technical exposition. Lower case letters x,y,... denote variables in , upper case letters X,Y,... those in .

HFL-formulae ϕ are defined by the grammar

where is a simple type. Note that negation is not present explicitly in the logic since it can be eliminated [11]. The binder λ(x v : τ).ϕ binds x in ϕ, the binder σ(X : τ).ϕ with σ ∈ {µ,ν} binds X in ϕ. Let sub(ϕ) be the set of subformulae of ϕ. An HFL-formula is well-named if there is, for each X ∈ , at most one subformula of the form σ(X : τ).ψ and, for each x ∈ , at most one subformula of the form λ(x: τ).ψ.

Figure 3: Additional Typing Rules for HFL.

A variable from or in a formula ϕ is bound if it is bound by a binder of the respective type, and free otherwise. A formula is called closed if it has no free variables and open otherwise. For a well-named formula ϕ and X ∈ ∩sub(ϕ), define fpϕ(X) as the unique subformula ψ of ϕ such that ψ = σX.ψ′ for σ ∈ {µ,ν}. We have a partial order on the fixpoint variables of ϕ via if Y appears freely in fpϕ(X). We say that Y is outermore than X. A variable is outermost among a set of variables if it is maximal in this set with respect to .

We say that ϕ has type τ in a context Σ if Σ ⊢ ϕ : τ can be derived via the typing rules in Figures 1 and 3. Note that the rules concerning variables from are not used. If Σ ⊢ ϕ : τ for some Σ and τ then ϕ is well-typed. A closed formula is well-typed if ∅ ⊢ ϕ : τ. Typing judgments are unique if formulae are annotated with the correct types [14]. We usually omit the type annotations and tacitly assume that all formulae are well-typed and that the type of a formula can be derived from context.

Semantics of HFL

Fix a tree 𝒯. The semantics of types are partially ordered sets defined inductively: the semantics ⟦Pr⟧ of the ground type is the powerset of the set of nodes of 𝒯, ordered by inclusion, and the semantics ⟦τ → τ′⟧ of a function type is the set of monotone functions from ⟦τ⟧ to ⟦τ′⟧. Define the partial order on ⟦τ → τ′⟧ via pointwise comparison: for f, g ∈ ⟦τ → τ′⟧, f ⊑ g if and only if f(x) ⊑ g(x) for all x ∈ ⟦τ⟧.

Note that ⟦Pr⟧ is a boolean algebra and, hence, also a complete lattice. This makes ⟦τ → τ′⟧ also a complete lattice for all τ,τ′. Let ⊔M and ⊓M denote the join and meet, respectively, of the set M ⊆ ⟦τ⟧, and let ⊤τ and ⊥τ denote the maximal and minimal elements of ⟦τ⟧.

Let Σ be a context. An interpretation η is a partial map from the sets of variables such that every variable declared in Σ with type τ is mapped to an element of ⟦τ⟧. Then η[X ↦ f] is the interpretation that maps X to f and agrees with η otherwise, similar for η[x ↦ f].

We define the semantics of HFL over 𝒯 inductively as in Figure 4 (with dual cases left out for space considerations). For well-typed, well-named ϕ of type Pr, we write 𝒯,t ⊨η ϕ if t belongs to the semantics of ϕ under η. We write 𝒯,t ⊨ ϕ if ϕ is closed and η is the empty interpretation. Two formulae are equivalent, written ϕ ≡ ϕ′, if their semantics agree for all η, Σ.
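The order-theoretic setup can be illustrated on a finite structure, where extremal fixpoints of monotone functions on the powerset lattice are reached by iteration. This is a minimal sketch under assumptions: the four-node structure `succ`, the atom interpretation `{2}`, and all function names are hypothetical, and a finite structure replaces the tree fixed in the text.

```python
from itertools import combinations

# Hypothetical finite structure: node -> set of successors.
succ = {0: {1}, 1: {2}, 2: {2}, 3: {3}}
NODES = frozenset(succ)


def powerset(nodes):
    """All subsets of nodes, i.e. the carrier of the ground-type lattice."""
    items = sorted(nodes)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_monotone(f, nodes):
    """Check a <= b implies f(a) <= f(b) for all subsets a, b."""
    subsets = powerset(nodes)
    return all(f(a) <= f(b) for a in subsets for b in subsets if a <= b)


def pointwise_leq(f, g, nodes):
    """Pointwise comparison of two functions on the powerset lattice."""
    return all(f(s) <= g(s) for s in powerset(nodes))


def lfp(f):
    """Least fixpoint of a monotone f, iterating from the bottom element."""
    current = frozenset()
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


def gfp(f, nodes):
    """Greatest fixpoint of a monotone f, iterating from the top element."""
    current = frozenset(nodes)
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


def dia(s):
    """Nodes with at least one successor in s (a diamond modality)."""
    return frozenset(n for n in succ if succ[n] & s)


def reach_step(s):
    """One unfolding of mu X. P or dia X, with P interpreted as {2}."""
    return frozenset({2}) | dia(s)


can_reach_p = lfp(reach_step)          # nodes from which node 2 is reachable
has_infinite_path = gfp(dia, NODES)    # nu X. dia X
```

On this structure the least fixpoint collects the nodes 0, 1 and 2, while node 3 only loops on itself and never reaches node 2.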


Translations between HFL and APKA

Lemma 9. Let ϕ be an HFL-formula of order at most k. Then there is an APKA 𝒜 of order at most k such that, for all trees 𝒯,t, we have 𝒯,t ⊨ ϕ if and only if 𝒜 accepts 𝒯,t.

Figure 4: Semantics of HFL.

Proof. (Sketch) For space considerations, we only give a sketch of the proof. Let ϕ be an HFL-formula.

Since lambda abstraction is implicit for APKA and can only occur directly after a fixpoint, occurrences of lambda abstraction λ f.ψ in ϕ that are not of the form σX.λ f1....λ fn.ψ need to be padded by vacuous fixpoints. If f is of type τ1 and ψ is of type τ2, replace λ f.ψ by σX.λ f.ψ, where X is of type τ1 → τ2 and σ is chosen as convenient.

Next, free lambda variables are removed. For a subformula σX.ψ that contains a free variable f that is not a fixpoint, replace σX.ψ by ((σX.λf′.ψ[f′/f]) f) where f′ is of the same type as f. This is organized such that fixpoints are translated before fixpoints in their subformulae, i.e., from top to bottom. In a third step, any fixpoint of the form σX.λf1....λfn.ψ with ψ of type τ1 → ··· → τm → Pr is changed to its η-long form, i.e., to σX.λf1....λfn.λg1....λgm.ψ′ with ψ′ = (···((ψ g1) g2)···gm).

It is not hard to verify that none of these steps changes the semantics of the formula in question. Let ϕ′ be the resulting HFL-formula; its fixpoint variables serve as the fixpoint states of the automaton. Without loss of generality, ϕ′ has the form σXinit.ϕ′′ for some σ.


For each fixpoint X with defining formula σX.λf1....λfn.ψ, set δ(X) to ψ where all occurrences of formulae of the form σ′X′.ψ′ are replaced by X′, and set τX as the type of X. Then the automaton obtained in this way, with Λ chosen such that each fixpoint is labeled odd or even depending on whether it is a least or a greatest fixpoint, but not lower than any fixpoint in a subformula, is an APKA accepting the same trees as ϕ.

Lemma 10. Let 𝒜 be an APKA of order at most k. Then there is an HFL-formula ϕ of order at most k such that, for all trees 𝒯,t, we have 𝒯,t ⊨ ϕ if and only if 𝒜 accepts 𝒯,t.

We skip the proof for space considerations. It rests on the idea that a fixpoint state X computes a formula of the form σX.λf1....λfn.δ(X), where σ is µ if Λ(X) is odd and ν otherwise. However, the translation is subject to the same exponential blowup in size (but not in order) that occurs when translating ordinary parity automata into the modal µ-calculus. Moreover, further preprocessing is necessary because fixpoints can occur as an operator-operand pair where the operator has a higher priority. In this case, a duplication of arguments is necessary to ensure proper precedence of fixpoints in the syntax tree.

Corollary 11. Emptiness of APKA is undecidable.

Corollary 12. For any finite tree 𝒯 (or any finite Kripke structure), and any APKA 𝒜 of order k, it is decidable in time k-fold exponential in the size of 𝒯 whether 𝒜 accepts 𝒯.

THE ALTERNATION HIERARCHY FOR ALTERNATING PARITY KRIVINE AUTOMATA

Alternation Classes

We define the semantic alternation class via the least number of priorities of any equivalent automaton.

Definition 13. For each n, we define two classes:
• the set of all APKA equivalent to one with at most n priorities such that the highest is even,
• the set of all APKA equivalent to one with at most n priorities such that the highest is odd.

Remark 14. Both classes at level n are included in both classes at level n+1.

Note that the alternation classes are independent of the order of an automaton. For an HFL-formula ϕ, we say that ϕ is in some alternation class if there is an equivalent APKA in that class.

Observation 15. If

Trees Encoding Acceptance Games

For each n ≥ 1, define one set of propositions as {D,C,V,T,F,F0,...,Fn−1} as well as a second set as {D,C,V,T,F,F1,...,Fn}.

Let n ≥ 1. Consider a tree 𝒯 and some APKA 𝒜 with at most n priorities, over the first or the second set of propositions (depending on whether the highest priority is odd or even). We construct a tree T(𝒜,𝒯) over the same set of propositions which encodes the game tree of the acceptance game G(𝒜,𝒯) of 𝒜 over 𝒯. A state labeled by C signals that ∀ picks a successor configuration, a state labeled by D signals that ∃ picks a successor configuration, a state labeled by Fi signals that priority i is added to the priority stack in this configuration and a state labeled by V signals that the top priority is being removed. Configurations where the priority stack is not being manipulated and neither player picks a successor configuration are treated as if ∃ picks a successor, but both subtrees of T(𝒜,𝒯) are isomorphic.

The tree is generated inductively. Each position (t,(Q,e),e′,Γ,∆) in the acceptance game induces a subtree, with the root of the tree being generated by the initial position. At each vertex, exactly one proposition P from the first, respectively the second, set of propositions is true. We say that this vertex is labeled by P.

• The subtree induced by a position with Q of the form X is labeled FΛ(X). Both children are the subtree induced by (t,(δ(X),e′′),e′′,Γ′,∆′) where e′′, Γ′ and ∆′ are as per the transition relation.



The subtree induced by a position with Q of the form

is

labeled V if both is of type Pr and . Otherwise, it is labeled D. Both children are the subtree induced by the successor configuration as per the transition relation.

Figure 5: Part of a tree T(𝒜,𝒯) for the automaton 𝒜 and the tree 𝒯 from Example 2. Omitted subtrees are either isomorphic to their sibling, if present, or not shown for space considerations. Ci refers to the configuration from Figure 2 that induces the subtree and is not part of the label.







• The subtree induced by a position with Q of the form (ψ1ψ2) is labeled D. Both children are the subtree induced by (t,(ψ1,e),e′,Γ(ψ2,e),∆).
• The subtree induced by a position with Q of the form ψ1 ∨ ψ2 is labeled D. The left subtree is the subtree induced by (t,(ψ1,e),e′,Γ,∆), the right subtree is that induced by (t,(ψ2,e),e′,Γ,∆).
• The subtree induced by a position with Q of the form ψ1 ∧ ψ2 is labeled C. The left subtree is the subtree induced by (t,(ψ1,e),e′,Γ,∆), the right subtree is that induced by (t,(ψ2,e),e′,Γ,∆).
• The subtree induced by a position with Q of the form ♦ϕ is labeled D. The left subtree is the subtree induced by (t0,(ϕ,e),e′,Γ,∆), the right subtree is that induced by (t1,(ϕ,e),e′,Γ,∆).
• The subtree induced by a position with Q of the form □ϕ is labeled C. The left subtree is the subtree induced by (t0,(ϕ,e),e′,Γ,∆), the right subtree is that induced by (t1,(ϕ,e),e′,Γ,∆).
• The subtree induced by a position with Q of the form P or ¬P is labeled T if 𝒯,t ⊨ Q and F otherwise. Both children are the subtree induced by (t,(Q,e),e′,Γ,∆) again.

It is easy to verify that this defines an infinite, fully binary tree. Figure 5 shows an example.
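The labelling rules above amount to a case distinction on the top operator of the current formula Q. The following is a minimal sketch of that case distinction only: the tagged-tuple encoding of formulae, the function name `label` and its parameters are hypothetical, and the case of a λ-variable (label V or D) is left out because it depends on the environment.

```python
def label(q, priority, true_atoms):
    """Label of the vertex induced by a position whose formula part is q.

    q          -- tagged tuple, e.g. ("and", q1, q2) or ("state", "X")
    priority   -- dict mapping fixpoint states to priorities (the map Lambda)
    true_atoms -- atomic propositions holding at the current tree node
    """
    tag = q[0]
    if tag == "state":                      # fixpoint state X: label F_Lambda(X)
        return "F" + str(priority[q[1]])
    if tag in ("app", "or", "dia"):         # the existential player moves
        return "D"
    if tag in ("and", "box"):               # the universal player moves
        return "C"
    if tag == "atom":                       # P
        return "T" if q[1] in true_atoms else "F"
    if tag == "natom":                      # not P
        return "F" if q[1] in true_atoms else "T"
    raise ValueError("case not covered by this sketch: " + tag)


# Hypothetical inputs for illustration.
LAMBDA = {"X": 2, "Y": 1}
conj = ("and", ("atom", "P"), ("box", ("state", "X")))
```

For example, `label(conj, LAMBDA, {"P"})` yields C, and the fixpoint state X with priority 2 yields F2.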

Hard Automata

We now construct APKA that are hard for their alternation classes. Following Arnold's [2] and Lange's [10] proofs, these automata accept trees encoding an acceptance game that is won by ∃, respectively ∀. Consider the automata defined for each n ≥ 1 as follows:
• The fixpoint states are {I,O,Xn−1,...,X0}, the type of I is Pr, the type of the other states is Pr → Pr, the initial state is I.

• •

• Again, it is easy to verify that These

automata

are

.

equivalent to the HFL-formulae where the σi are µ, respectively ν depending on the alternation class, and ψ = δ(O).

Definition 16. Consider a play of one of these automata over a tree generated from an acceptance game. A round in this play consists of a configuration where the current closure is O and all subsequent configurations until it is O again. An environment is tied to a round if it is created during that round.

A round begins with the automaton in O. Unless the current tree node is labeled by F, ∀ chooses the right conjunct in δ(O), and ∃, unless the current state is labeled by T, chooses the right disjunct. ∀ then picks the conjunct indicated by the label of the current subtree in the big conjunction and ∃ picks the right part of the implication. Any different choice results in the player making that choice instantly losing the game. One of the players is then in charge of picking a successor subtree. Depending on the conjunct picked by ∀, the game continues in a new instance of O, goes through Xi,...,X0,O for some i or continues with the content of x0. The latter will always lead to another instance of O, as we will see below. In either case, the game continues in the next round.

Observation 17. Each round corresponds to exactly one configuration in the acceptance game of 𝒜 over 𝒯, namely that which induces the subtree of T(𝒜,𝒯) that is current during the first configuration of the round. Furthermore, the current subtree in the game for the hard automaton is labeled by C if and only if the configuration that induces it has a conjunction or a box as the top operator in the formula part of the current closure.

Note that each configuration in a play of the hard automata over suitable trees is part of exactly one round, with the exception of the first two configurations which have current closures (I) and (O⊤). We call a round a V-round if ∀ picks the conjunct with V on the left of the implication, an Fk-round if he picks the conjunct with Fk on the left of the implication, and a plain round if he picks the conjuncts with C or D on the left of the implication. A round is closed if the environment tied to the single occurrence of O during that round is closed. A V-round is always closed immediately.

Lemma 18.
Consider a play of the hard automaton for the Σ-case, respectively for the Π-case, over a tree generated from an acceptance game and let the automaton be at the start of some round, i.e., just before reading another occurrence of O. Let (Ri)i∈I be the sequence of unclosed rounds played so far, in order. Set pΣ(R) = 0 if R is a plain round, set pΠ(R) = 1 if R is a plain round, set pΣ(R) = 0,k,...,1,0 if R is an Fk-round and set pΠ(R) = 1,k+1,...,2,1 if R is an Fk-round. Then the priority stack of the automaton for the Σ-case from bottom to top is the concatenation of the pΣ(Ri) from first to last and the priority stack of the automaton for the Π-case from bottom to top is the concatenation of the pΠ(Ri) from first to last.
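The shape of the priority stack described in Lemma 18 can be made concrete with a short computation. This is a sketch under assumptions: rounds are encoded as the string `"plain"` or as an integer k for an Fk-round (a hypothetical encoding), and pΠ(R) = 1,k+1,...,2,1 is taken to be the value for an Fk-round.

```python
def p_sigma(round_kind):
    """Priorities contributed by one unclosed round in the Sigma-case."""
    if round_kind == "plain":
        return [0]
    k = round_kind
    return [0] + list(range(k, -1, -1))      # 0, k, ..., 1, 0


def p_pi(round_kind):
    """Priorities contributed by one unclosed round in the Pi-case."""
    if round_kind == "plain":
        return [1]
    k = round_kind
    return [1] + list(range(k + 1, 0, -1))   # 1, k+1, ..., 2, 1


def priority_stack(unclosed_rounds, p):
    """Stack from bottom to top: concatenation of p(R_i), first to last."""
    stack = []
    for r in unclosed_rounds:
        stack.extend(p(r))
    return stack


# A plain round, then an F2-round, then another plain round, all unclosed.
rounds = ["plain", 2, "plain"]
sigma_stack = priority_stack(rounds, p_sigma)
pi_stack = priority_stack(rounds, p_pi)
```

For the three rounds above, the Σ-case stack is 0, 0, 2, 1, 0, 0 from bottom to top, and the Π-case stack is 1, 1, 3, 2, 1, 1.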

Moreover, all unclosed environments are tied to unclosed rounds. Tied to any plain round is a single environment for its occurrence of O and it binds x0 to the last environment of the first unclosed round before. Tied to an Fk-round is a sequence of environments for the occurrences of O,Xk,...,X0. The environment for X0 is the last environment, they all bind x0 to x0 of the previous environment except the environment for Xk which binds x0 to (Ox0) in the environment for O of its own round. Here, the initial unfolding for I is considered a dummy round.

Proof. The proof is by induction over the play. At the beginning of the very first round, the priority stack contains only the priority for I and (⊤,e0) is on the stack. A plain round will consume the content from the stack, which is (x0,e) of the previous round, or (⊤,e0) for the very first round, and tie x0 of its single occurrence of O to it. Moreover, it will add 0 to the priority stack. An Fk-round R will also consume (x0,e), respectively (⊤,e0) from the stack and tie x0 of the single occurrence of O to it. During the round, the automaton will unfold Xk and tie (Ox0) of that environment to Xk's x0, then unfold Xk−1,...,X0 and create a chain of x0 pointing to x0 of the environment before. Moreover, it will put the sequence pΣ(R), respectively pΠ(R) on the priority stack. A V-round will put priority 0 on the stack, tie the x0 of its single occurrence of O to x0 of the previous unclosed round and then immediately read it. Consequently, all the environments of the previous unclosed round will be closed, including the occurrence of O, and all the priorities tied to it will be popped. Notably, this will close all unclosed previous plain rounds until the next Fi-round, but nothing more.

Lemma 19. For all automata 𝒜 and all infinite, fully binary trees 𝒯 over the respective set of propositions, the hard automaton accepts T(𝒜,𝒯) if and only if 𝒜 accepts 𝒯.

Proof. We only show one of the two cases, and we only show that ∃ has a winning strategy in the acceptance game for the hard automaton over T(𝒜,𝒯) if she has one for 𝒜 over 𝒯. The other cases are similar. Assume that ∃ has a winning strategy in the latter game.

The correspondence between rounds in the game for T(𝒜,𝒯) and configurations in the game for 𝒜 suggests the following strategy for ∃ in the former game: Stay within subtrees that represent configurations that follow her winning strategy. Since the underlying game is assumed to be winning for ∃ and the root of T(𝒜,𝒯) represents such a configuration by assumption, she can maintain this invariant in any round where she picks the successor configuration. In rounds where ∀ picks the successor configuration, both of his choices must be winning for ∃ in the underlying game for 𝒜 over 𝒯, for otherwise the current configuration would not be winning for ∃. Clearly, following this strategy will guarantee that ∃ wins any finite play of the game for the hard automaton by avoiding a node labeled F.

It remains to show that ∃ wins any infinite play when following the strategy above. This is because the sequence of unclosed nonplain rounds in the game for the hard automaton and the priority stack in the game for 𝒜 correspond like this: If (Fki)i∈I is the sequence of unclosed nonplain rounds, then (ki)i∈I is the priority stack of 𝒜. This follows from an induction over the two plays: Before the first round of the game for T(𝒜,𝒯), the sequence of unclosed rounds is empty, and so is the priority stack of the corresponding configuration of 𝒜. Any plain round will add a 0, the least priority, to the priority stack of the hard automaton and will not change the priority stack of 𝒜. An Fk-round will add k to the priority stack of 𝒜 and will add an unclosed Fk-round to the play of the hard automaton. A V-round will remove one priority k from the priority stack for 𝒜 and will close a number of plain rounds and exactly one nonplain round in the game for the hard automaton. By the induction hypothesis, this is an Fk-round.

Hence, after both plays are finished, the highest priority to occur infinitely often on the stack for 𝒜 is k if and only if there are infinitely many unclosed Fk-rounds, but only finitely many unclosed Fk′-rounds for k′ > k. It follows from Lemma 18 that the highest priority to occur infinitely often on the stack for the hard automaton is k as well. Since ∃ wins the first game by assumption, that number must be even.

Lemma 20. For each n ≥ 1 and every APKA 𝒜 with at most n priorities over the propositions {D,C,V,T,F,F0,...,Fn−1}, there is a unique tree 𝒯 over these propositions with T(𝒜,𝒯) = 𝒯. The same holds for each n ≥ 1 and every APKA with at most n priorities over the propositions {D,C,V,T,F,F1,...,Fn}.

Proof. The sets of all trees over the first set of propositions, respectively over the second, form metric spaces via the metric d(t,t′) = 2^(−i), where i is the first level on which t and t′ differ. It is well known that these spaces are complete [3]. Moreover, on all of these spaces, the mapping f that sends 𝒯 to T(𝒜,𝒯) is a contraction in the sense of the Banach Fixpoint Theorem since the game trees of two trees that differ at level i will coincide at least up to level i+1. This is because the game transitions through I first and takes a full rotation through δ(O) for each level. Hence, by the Banach Fixpoint Theorem, f has a unique fixpoint.

Theorem 21. Proof.

For

the

sake

of .

is

contradiction, By . So

iff contradiction. hence,

assume

that

Lemma 20, there by construction of , which is a

.

A similar proof works for the dual case. Corollary 22. For each Proof. Since , non-strictness of would contradict the previous theorem. The same argument works for the dual case.

DISCUSSION

It is a priori quite surprising that the order of an APKA or an HFL-formula is not of relevance when it comes to its alternation class. In particular, the automata that serve as examples of automata that are hard for their respective classes are of order 1. This is surprising, since for the HFL model-checking problem, which corresponds to acceptance for APKA, complexity is almost exclusively dictated by the order of a formula. We believe that this dichotomy stems from the way the transition relation for APKA is restricted to formulae of ground type. A state that would compute a higher-order function, say of type (Pr → Pr) → (Pr → Pr), actually does not compute the full higher-order function, but its equivalent of type (Pr → Pr) → Pr → Pr at a fixed argument of type Pr. The first case requires computations over the full extent of a higher-order lattice, while in the second case it is sufficient to find an approximation that is good enough for the arguments in question.

ACKNOWLEDGEMENTS

I thank Martin Lange and Étienne Lozes for discussing the matter with me at length.


REFERENCES

1. Rajeev Alur & P. Madhusudan (2004): Visibly pushdown languages. In László Babai, editor: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, ACM, pp. 202–211, doi:10.1145/1007352.1007390.
2. André Arnold (1999): The µ-calculus alternation-depth hierarchy is strict on binary trees. ITA 33(4/5), pp. 329–340, doi:10.1051/ita:1999121.
3. André Arnold & Maurice Nivat (1980): The metric space of infinite trees. Algebraic and topological properties. Fundam. Inform. 3(4), pp. 445–476.
4. Julian C. Bradfield (1996): The Modal mu-calculus Alternation Hierarchy is Strict. In Ugo Montanari & Vladimiro Sassone, editors: CONCUR '96, Concurrency Theory, 7th International Conference, Pisa, Italy, August 26-29, 1996, Proceedings, Lecture Notes in Computer Science 1119, Springer, pp. 233–246, doi:10.1007/3-540-61604-7_58.
5. Florian Bruse (2014): Alternating Parity Krivine Automata. In Erzsébet Csuhaj-Varjú, Martin Dietzfelbinger & Zoltán Ésik, editors: Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25-29, 2014. Proceedings, Part I, Lecture Notes in Computer Science 8634, Springer, pp. 111–122, doi:10.1007/978-3-662-44522-8_10.
6. E. Allen Emerson & Chin-Laung Lei (1986): Efficient Model Checking in Fragments of the Propositional Mu-Calculus (Extended Abstract). In: Proceedings of the Symposium on Logic in Computer Science (LICS '86), Cambridge, Massachusetts, USA, June 16-18, 1986, IEEE Computer Society, pp. 267–278.
7. Julian Gutierrez, Felix Klaedtke & Martin Lange (2014): The µ-calculus alternation hierarchy collapses over structures with restricted connectivity. Theor. Comput. Sci. 560, pp. 292–306, doi:10.1016/j.tcs.2014.03.027.
8. Roope Kaivola (1995): Axiomatising Linear Time Mu-calculus. In Insup Lee & Scott A. Smolka, editors: CONCUR '95: Concurrency Theory, 6th International Conference, Philadelphia, PA, USA, August 21-24, 1995, Proceedings, Lecture Notes in Computer Science 962, Springer, pp. 423–437, doi:10.1007/3-540-60218-6_32.
9. Jean-Louis Krivine (2007): A call-by-name lambda-calculus machine. Higher-Order and Symbolic Computation 20(3), pp. 199–207, doi:10.1007/s10990-007-9018-9.
10. Martin Lange (2006): The alternation hierarchy in fixpoint logic with chop is strict too. Inf. Comput. 204(9), pp. 1346–1367, doi:10.1016/j.ic.2006.05.001.
11. Étienne Lozes (2015): A Type-Directed Negation Elimination. In Ralph Matthes & Matteo Mio, editors: Proceedings Tenth International Workshop on Fixed Points in Computer Science, FICS 2015, Berlin, Germany, September 11-12, 2015, EPTCS 191, pp. 132–142, doi:10.4204/EPTCS.191.12.
12. Damian Niwinski (1997): Fixed Point Characterization of Infinite Behavior of Finite-State Systems. Theor. Comput. Sci. 189(1-2), pp. 1–69, doi:10.1016/S0304-3975(97)00039-X.
13. Michael O. Rabin (1970): Weakly Definable Relations and Special Automata. In Yehoshua Bar-Hillel, editor: Mathematical Logic and Foundations of Set Theory - Proceedings of an International Colloquium Held Under the Auspices of The Israel Academy of Sciences and Humanities, Studies in Logic and the Foundations of Mathematics 59, Elsevier, pp. 1–23, doi:10.1016/S0049-237X(08)71929-3.
14. Mahesh Viswanathan & Ramesh Viswanathan (2004): A Higher Order Modal Fixed Point Logic. In Philippa Gardner & Nobuko Yoshida, editors: CONCUR 2004 - Concurrency Theory, 15th International Conference, London, UK, August 31 - September 3, 2004, Proceedings, Lecture Notes in Computer Science 3170, Springer, pp. 512–528, doi:10.1007/978-3-540-28644-8_33.

Chapter 7

BISIMULATION IN INQUISITIVE MODAL LOGIC

Ivano Ciardelli1 and Martin Otto2
1 Institute for Logic, Language, and Computation, University of Amsterdam
2 Department of Mathematics, Logic Group, Technische Universität Darmstadt

Inquisitive modal logic, INQML, is a generalisation of standard Kripke-style modal logic. In its epistemic incarnation, it extends standard epistemic logic to capture not just the information that agents have, but also the questions that they are interested in. Technically, INQML fits within the family of logics based on team semantics. From a model-theoretic perspective, it takes us a step in the direction of monadic second-order logic, as inquisitive modal operators involve quantification over sets of worlds. We introduce and investigate the natural notion of bisimulation equivalence in the setting of INQML. We compare the expressiveness of INQML and first-order logic, and characterise inquisitive modal logic as the bisimulation invariant fragments of first-order logic over various classes of two-sorted relational structures. These results crucially require non-classical methods in studying bisimulations and first-order expressiveness over non-elementary classes.

Citation: (APA): Ciardelli, I. & Otto, M. (2017). Bisimulation in Inquisitive Modal Logic. J. Lang (Ed.): TARK 2017 (16 pages). Copyright: © Creative Commons Attribution 3.0 Unported (https://creativecommons.org/licenses/by/3.0/).

INTRODUCTION

The recently developed framework of inquisitive logic [8, 2, 6, 4] can be seen as a generalisation of classical logic which encompasses not only statements, but also questions. One reason why this generalisation is interesting is that it provides a novel perspective on the logical notion of dependency, which plays an important role in applications (e.g., in database theory) and which has recently received attention in the field of dependence logic [20]. Indeed, dependency is nothing but a facet of the fundamental logical relation of entailment, once this is extended so as to apply not only to statements, but also to questions [3]. This connection explains the deep similarities existing between systems of inquisitive logic and systems of dependence logic (see [22, 3, 2, 23]).

A different role for questions in a logical system comes from the setting of modal logic: once the notion of a modal operator is suitably generalised, questions can be embedded under modal operators to produce new statements that have no "standard" counterpart. This approach was first developed in [9] in the setting of epistemic logic. The resulting inquisitive epistemic logic (IEL) models not only the information that agents have, but also the issues that they are interested in, i.e., the information that they would like to obtain. Modal formulae in IEL can express not only that an agent knows that p (□p) but also that she knows whether p (□?p) or that she wonders whether p (⊞?p), a statement that cannot be expressed without the use of embedded questions. As shown in [9], several key notions of epistemic logic generalise smoothly to questions: besides common knowledge we now have common issues, the issues publicly entertained by the group; and besides publicly announcing a statement, agents can now also publicly ask a question, which typically results in new common issues.

Thus, IEL may be seen as one step in extending modal logic from a framework to reason about information and information change, to a richer framework which also represents a higher stratum of cognitive phenomena, in particular issues and their raising in a communication scenario.

Of course, like standard modal logic, inquisitive modal logic provides a general framework that admits various interpretations, each suggesting corresponding constraints on models. E.g., [2] suggests interpreting INQML as a logic of action. On this interpretation, a modal formula □?p expresses that whether a certain fact p will come about is determined independently of the agent's choices, while ⊞?p expresses that whether p will come about is fully determined by her choices.

From the perspective of mathematical logic, inquisitive modal logic is a natural generalisation of standard modal logic. There, the accessibility relation of a Kripke model associates with each possible world w ∈ W a set σ(w) ⊆ W of possible worlds, namely, the worlds accessible from w; any formula ϕ of modal logic is semantically associated with a set of worlds, namely, the set of worlds where it is true; modalities then express relationships between these sets: for instance, □ϕ expresses the fact that σ(w) is included in the set of worlds where ϕ is true. In the inquisitive setting, the situation is analogous, but both the entity Σ(w) attached to a possible world and the semantic extension of a formula are sets of sets of worlds, rather than simple sets of worlds. Inquisitive modalities still express relationships between these two objects: □ϕ expresses the fact that ⋃Σ(w) belongs to the semantic extension of ϕ, while ⊞ϕ expresses the fact that Σ(w) is included in the semantic extension of ϕ.

In this manner, inquisitive logic leads to a new framework for modal logic that can be viewed as a generalisation of the standard framework. Clearly, this raises the question of whether and how the classical notions and results of modal logic carry over to this more general setting. In this paper we address this question for the fundamental notion of bisimulation and for two classical results revolving around this notion, namely, the Ehrenfeucht–Fraïssé theorem for modal logic, and van Benthem style characterisation theorems [11, 21, 19, 14]. A central topic of this paper is the role of bisimulation invariance as a unifying semantic feature that distinguishes modal logics from classical predicate logics. As in many other areas, from temporal logics and process logics to knowledge representation in AI and database applications, so also in the inquisitive setting we find that the appropriate notion of bisimulation invariance allows for precise model-theoretic characterisations of the expressive power of modal logic in relation to first-order logic.

Our first result is that the right notion of inquisitive bisimulation equivalence ∼, with finitary approximation levels ∼n, supports a counterpart for INQML of the classical Ehrenfeucht–Fraïssé correspondence. This result is non-trivial in the inquisitive setting, because of some subtle issues stemming from the interleaving of first- and second-order features in inquisitive modal logic.


Theorem 1 (inquisitive Ehrenfeucht–Fraïssé theorem) Over finite vocabularies, the finite levels ∼n of inquisitive bisimulation equivalence correspond to the levels of INQML-equivalence up to modal nesting depth n.

In order to compare INQML with classical first-order logic, we define a class of two-sorted relational structures, and show how such structures encode models for INQML. With respect to such relational structures we find not only a "standard translation" of INQML into two-sorted first-order logic, but also a van Benthem style characterisation of INQML as the bisimulation-invariant fragment of (two-sorted) first-order logic over several classes of models. These results are technically interesting, and they are not available on the basis of classical techniques, because the relevant classes of two-sorted models are non-elementary (in fact, first-order logic is not compact over these classes, as we show). Our techniques yield characterisation theorems both in the setting of arbitrary inquisitive models, and in restriction to just finite ones.

Theorem 2 Inquisitive modal logic can be characterised as the ∼-invariant fragment of first-order logic FO over natural classes of (finite or arbitrary) relational inquisitive models.

Besides the conceptual development and the core results themselves, we think that the methodological aspects of the present investigations also have some intrinsic value. Just as inquisitive logic models cognitive phenomena at a level strictly above that of standard modal logic, so the model-theoretic analysis moves up from the level of ordinary first-order logic to a level strictly between first- and second-order logic. This level is realised by first-order logic in a two-sorted framework that incorporates second-order objects in the second sort in a controlled fashion. This leads us to substantially generalise a number of notions and techniques developed in the model-theoretic analysis of modal logic ([11, 14, 10, 15], among others).

INQUISITIVE MODAL LOGIC

In this section we provide an essential introduction to inquisitive modal logic, INQML [2]. For details and proofs, see §7 of [2].

Foundations of Inquisitive Semantics

Usually, the semantics of a logic specifies truth-conditions for the formulae of the logic. In modal logics these truth-conditions are relative to possible worlds in a Kripke model. However, this approach is limited in an important way: while suitable for statements, it is inadequate for questions. To overcome this limitation, inquisitive logic interprets formulae not relative to states of affairs (worlds), but relative to states of information, modelled extensionally as sets of worlds (viz., those worlds compatible with the given information).

Definition 3 (information states) An information state over a set of worlds W is a subset s ⊆ W.

Rather than specifying when a sentence is true at a world w, inquisitive semantics specifies when a sentence is supported by an information state s: for a statement α this means that the information available in s implies that α is true; for a question µ, it means that the information available in s settles µ. If t and s are information states and t ⊆ s, this means that t holds at least as much information as s: we say that t is an extension of s. If t is an extension of s, everything that is supported at s will also be supported at t. This is a key feature of inquisitive semantics, and it leads naturally to the notion of an inquisitive state (see [5, 18, 9]).

Definition 4 (inquisitive states) An inquisitive state over W is a non-empty set of information states Π ⊆ ℘(W) satisfying
• s ∈ Π and t ⊆ s implies t ∈ Π (downward closure).
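Definitions 3 and 4 translate directly into a check on finite sets. The following is a minimal sketch, assuming finitely many worlds and representing information states as frozensets; the function names are hypothetical.

```python
from itertools import combinations


def subsets(s):
    """All subsets of the information state s, as frozensets."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)}


def downward_closure(states):
    """Smallest downward closed set of information states containing states."""
    return {t for s in states for t in subsets(s)}


def is_inquisitive_state(states):
    """Non-empty and closed under taking subsets (Definition 4)."""
    pi = {frozenset(s) for s in states}
    return bool(pi) and all(t in pi for s in pi for t in subsets(s))


# The single state {1, 2} is not downward closed on its own.
generators = [{1, 2}]
closed = downward_closure(generators)
```

Note that every inquisitive state contains the empty information state, since it is non-empty and closed under subsets.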

Inquisitive Modal Models

A Kripke frame can be thought of as a set W of worlds together with a map σ that equips each world with a set of worlds σ(w)—the set of worlds that are accessible from w—i.e., an information state. Similarly, an inquisitive modal frame consists of a set W of worlds together with an inquisitive assignment, i.e., a map Σ that assigns to each world an inquisitive state. An inquisitive modal model is an inquisitive frame equipped with a valuation function.

Definition 5 (inquisitive modal models) An inquisitive modal frame is a pair F = ⟨W, Σ⟩, where Σ : W → ℘℘(W) associates to each world w ∈ W an inquisitive state Σ(w). An inquisitive modal model is a pair M = ⟨F, V⟩, where F is an inquisitive modal frame, and V : P → ℘(W) is a propositional valuation function. A world-(or state-)pointed inquisitive modal model is a pair consisting of a model M and a distinguished world (or state) in M.

With an inquisitive modal model M we can always associate a standard Kripke model having the same set of worlds and modal accessibility map σ : W → ℘(W) induced by the inquisitive map Σ according to σ(w) := ⋃Σ(w). Under an epistemic interpretation [9, 1], Σ is taken to describe not only an agent’s knowledge, as in epistemic logic, but also her issues, i.e., the questions she is interested in. The agent’s knowledge state at w, σ(w) = ⋃Σ(w), consists of those worlds that are compatible with what the agent knows. The agent’s inquisitive state at w, Σ(w), consists of those information states where her issues are settled.
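As a small illustration of the induced accessibility map, the sketch below computes σ(w) as the union of the information states in Σ(w). It assumes Σ(w) is given as a Python set of frozensets; the function name `knowledge_state` is hypothetical.

```python
def knowledge_state(sigma_w):
    """sigma(w): the union of all information states in Sigma(w)."""
    return frozenset().union(*sigma_w)

# An agent who knows the actual world is 'a' or 'b' and whose issue is which one:
Sigma_w = {frozenset(), frozenset({"a"}), frozenset({"b"})}
print(sorted(knowledge_state(Sigma_w)))
```

Here the knowledge state is {a, b}, although {a, b} itself is not one of the states that settle the agent's issues.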

Inquisitive Modal Logic

The syntax of inquisitive modal logic INQML is given by:

ϕ ::= p | ⊥ | (ϕ ∧ ϕ) | (ϕ → ϕ) | (ϕ ⩖ ϕ) | □ϕ | ⊞ϕ

We treat negation and disjunction as defined connectives (syntactic shorthands) according to ¬ϕ := ϕ → ⊥, and ϕ ∨ ψ := ¬(¬ϕ ∧ ¬ψ). In this sense, the above syntax includes standard propositional formulae in terms of atoms and connectives ∧ and → together with the defined ¬ and ∨. As we will see, the semantics for such formulae will be essentially the same as in standard propositional logic. In addition to standard connectives, our language contains a new connective, ⩖, called inquisitive disjunction. We may read formulae built up by means of this connective as propositional questions. E.g., we read the formula p ⩖ ¬p as the question whether or not p, and we abbreviate this formula as ?p. Finally, our language contains two modalities, □ and ⊞, which are allowed to embed both statements and questions. As we shall see, both these modalities coincide with a standard Kripke box when applied to statements, but crucially differ when applied to questions. Under an epistemic interpretation, □?p expresses the fact that the agent knows whether p, while ⊞?p expresses (roughly) the fact that she wants to find out whether p.


While models for INQML are formally a class of neighbourhood models, the semantics of INQML is very different from neighbourhood semantics for modal logic. As mentioned above, the semantics of INQML is given in terms of support relative to an information state, rather than truth at a possible world.

Definition 6 (semantics of INQML) Let M be an inquisitive modal model, s ⊆ W:

• M, s ⊨ p ⇔ s ⊆ V(p)
• M, s ⊨ ⊥ ⇔ s = ∅
• M, s ⊨ ϕ ∧ ψ ⇔ M, s ⊨ ϕ and M, s ⊨ ψ
• M, s ⊨ ϕ → ψ ⇔ for all t ⊆ s: M, t ⊨ ϕ implies M, t ⊨ ψ
• M, s ⊨ ϕ ⩖ ψ ⇔ M, s ⊨ ϕ or M, s ⊨ ψ
• M, s ⊨ □ϕ ⇔ for all w ∈ s: M, σ(w) ⊨ ϕ
• M, s ⊨ ⊞ϕ ⇔ for all w ∈ s and all t ∈ Σ(w): M, t ⊨ ϕ

As an illustration, consider the support conditions for the formula ?p := p ⩖ ¬p: this formula is supported by a state s in case p is true at all worlds in s (i.e., if the information available in s implies that p is true) or in case p is false at all worlds in s (i.e., if the information available in s implies that p is false). Thus, ?p is supported precisely by those information states that settle whether or not p is true. The following two properties hold generally in INQML:

• Persistency: if M, s ⊨ ϕ and t ⊆ s, then M, t ⊨ ϕ;
• Semantic ex-falso: M, ∅ ⊨ ϕ for all ϕ.

The first principle says that support is preserved as information increases, i.e., as we move from a state to an extension of it. The second principle says that the empty set of worlds (the inconsistent state) vacuously supports everything. Together, these principles imply that the support set of a formula is downward closed and non-empty, i.e., it is an inquisitive state.

Although the primary notion of our semantics is support at an information state, truth at a world is obtained as a defined notion.

Definition 7 (truth) ϕ is true at a world w in a model M, denoted M, w ⊨ ϕ, in case M, {w} ⊨ ϕ.
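The support relation lends itself to a direct recursive evaluator over finite models. The sketch below is illustrative only: it assumes the usual support clauses of inquisitive modal logic (atoms by inclusion in the valuation, ⊥ only at the empty state, implication quantifying over substates, inquisitive disjunction as a plain "or", □ evaluated at σ(w) and ⊞ at every state in Σ(w)), and it encodes formulae as nested tuples with hypothetical tags such as `"ivee"` and `"win"`.

```python
from itertools import combinations

def subsets(s):
    """All subsets of the information state s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def supports(model, s, phi):
    """Support of phi at state s; model = (Sigma, V) with
    Sigma: world -> set of frozensets, V: atom -> set of worlds."""
    Sigma, V = model
    s = frozenset(s)
    op = phi[0]
    if op == "atom":
        return s <= V[phi[1]]
    if op == "bot":
        return not s
    if op == "and":
        return supports(model, s, phi[1]) and supports(model, s, phi[2])
    if op == "imp":  # every substate supporting the antecedent supports the consequent
        return all(not supports(model, t, phi[1]) or supports(model, t, phi[2])
                   for t in subsets(s))
    if op == "ivee":  # inquisitive disjunction
        return supports(model, s, phi[1]) or supports(model, s, phi[2])
    if op == "box":  # evaluate at sigma(w), the union of Sigma(w)
        return all(supports(model, frozenset().union(*Sigma[w]), phi[1]) for w in s)
    if op == "win":  # evaluate at every state in Sigma(w)
        return all(supports(model, t, phi[1]) for w in s for t in Sigma[w])
    raise ValueError(f"unknown connective: {op}")

def true_at(model, w, phi):
    """Truth at a world is support at the singleton state."""
    return supports(model, {w}, phi)

def neg(phi):
    return ("imp", phi, ("bot",))

def whether(phi):
    return ("ivee", phi, neg(phi))
```

The implication clause enumerates all substates, so the evaluator is exponential in the size of the state and is meant for small hand-built examples only.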

Spelling out Definition 7 in the special case of singleton states, we see that standard connectives have the usual truth-conditional behaviour. For modal formulae, we find the following truth-conditions:

Proposition 8 (truth conditions for modal formulae)

• M, w ⊨ □ϕ ⇔ M, σ(w) ⊨ ϕ
• M, w ⊨ ⊞ϕ ⇔ for all t ∈ Σ(w): M, t ⊨ ϕ

Notice that truth in INQML cannot be given a direct recursive definition, as the truth conditions for modal formulae □ϕ and ⊞ϕ depend on the support conditions for ϕ, not just on its truth conditions. For many formulae, support at a state just boils down to truth at each world. We refer to these formulae as truth-conditional.

Definition 9 (truth-conditional formulae) We say that a formula ϕ is truth-conditional if for all models M and information states s: M, s ⊨ ϕ ⇔ M, w ⊨ ϕ for all w ∈ s.

Following [2], we view truth-conditional formulae as statements, and non-truth-conditional formulae as questions. The next proposition identifies a large class of truth-conditional formulae.

Proposition 10 Atomic formulae, ⊥, and all formulae of the form □ϕ and ⊞ϕ are truth-conditional. The class of truth-conditional formulae is closed under all connectives except for ⩖.

Using this fact, it is easy to see that all formulae of standard modal logic, i.e., formulae which do not contain ⩖ or ⊞, receive exactly the same truth conditions as in standard modal logic.

Proposition 11 If ϕ is a formula not containing ⩖ or ⊞, then M, w ⊨ ϕ ⇔ K(M), w ⊨ ϕ in standard Kripke semantics, where K(M) denotes the Kripke model associated with M.


As long as questions are not around, the modality ⊞ also coincides with □, and with the standard box modality. That is, if ϕ is truth-conditional, we have: M, w ⊨ □ϕ ⇔ M, w ⊨ ⊞ϕ ⇔ M, v ⊨ ϕ for all v ∈ σ(w). Thus, the two modalities coincide on statements. However, they come apart when they are applied to questions. For an illustration, consider the formulae □?p and ⊞?p in the epistemic setting: □?p is true iff the information state of the agent, σ(w), settles the question ?p; thus, □?p expresses the fact that the agent knows whether p. By contrast, ⊞?p is true iff any information state t ∈ Σ(w), i.e., any state that settles the agent’s issues, also settles ?p; thus ⊞?p expresses that finding out whether p is part of the agent’s goals.
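The difference between the two modalities on the question ?p can be made concrete with a few set computations. This is a minimal sketch under the assumption that ?p is supported by a state exactly when p holds at all of its worlds or at none of them; the function names are hypothetical.

```python
def settles_whether(state, p_worlds):
    """?p is supported at a state iff p holds everywhere or nowhere in it."""
    state = set(state)
    return state <= p_worlds or not (state & p_worlds)

def box_whether(Sigma_w, p_worlds):
    """Box ?p at w: the knowledge state sigma(w) settles ?p."""
    return settles_whether(set().union(*Sigma_w), p_worlds)

def window_whether(Sigma_w, p_worlds):
    """Window ?p at w: every state in Sigma(w) settles ?p."""
    return all(settles_whether(t, p_worlds) for t in Sigma_w)

# p holds at world 'a' only; the agent's issues are settled by {a} and by {b}.
p_worlds = {"a"}
Sigma_w = {frozenset(), frozenset({"a"}), frozenset({"b"})}
print(box_whether(Sigma_w, p_worlds), window_whether(Sigma_w, p_worlds))
```

In this example the agent does not know whether p, since σ(w) = {a, b} contains both a p-world and a non-p-world, yet every state that settles her issues also settles ?p.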

INQUISITIVE BISIMULATION

An inquisitive modal model can be seen as a structure with two sorts of entities, worlds and information states, which interact with each other. On the one hand, an information state s is completely determined by the worlds that it contains; on the other hand, a world w is determined by the atoms it makes true and the information states which lie in Σ(w). Taking a more behavioural perspective, we can look at an inquisitive modal model as a model where two kinds of transitions are possible: from an information state s, we can make a transition to a world w ∈ s, and from a world w, we can make a transition to an information state s ∈ Σ(w). This suggests a natural notion of bisimilarity, together with its natural finite approximations of n-bisimilarity for n ∈ ℕ. As usual, these notions can equivalently be defined either in terms of back-and-forth systems or in terms of strategies in corresponding bisimulation games. We chose the latter for its more immediate and intuitive appeal to the underlying dynamics of a “probing” of behavioural equivalence.

The game is played by two players, I and II, who act as challenger and defender of a similarity claim involving a pair of worlds w and w′ or information states s and s′ over two models M = ⟨W, Σ, V⟩ and M′ = ⟨W′, Σ′, V′⟩. We denote world-positions as ⟨w, w′⟩ and state-positions as ⟨s, s′⟩, where w ∈ W, w′ ∈ W′ and s ∈ ℘(W), s′ ∈ ℘(W′), respectively. The game proceeds in rounds that alternate between world-positions and state-positions. Playing from a world-position ⟨w, w′⟩, I chooses an information state in the inquisitive state associated to one of these worlds (s ∈ Σ(w) or s′ ∈ Σ′(w′)) and II must respond by choosing an information state on the opposite side, which results in a state-position ⟨s, s′⟩. Playing from a state-position ⟨s, s′⟩, I chooses a world in either state (w ∈ s or w′ ∈ s′) and II must respond by choosing a world from the other state, which results in a world-position ⟨w, w′⟩. A round of the game consists of four moves leading from a world-position to another. In the bounded version of the game, the number of rounds is fixed in advance. In the unbounded version, the game is allowed to go on indefinitely. Either player loses when stuck for a move. The game ends with a loss for II in any world-position ⟨w, w′⟩ that shows a discrepancy at the atomic level, i.e., such that w and w′ disagree on the truth of some p ∈ P. All other plays, including infinite runs of the unbounded game, are won by II.

Definition 12 (bisimulation equivalence) Two world-pointed models M, w and M′, w′ are n-bisimilar, denoted M, w ∼n M′, w′, if II has a winning strategy in the n-round game starting from ⟨w, w′⟩. M, w and M′, w′ are bisimilar, denoted M, w ∼ M′, w′, if II has a winning strategy in the unbounded game starting from ⟨w, w′⟩. Two state-pointed models M, s and M′, s′ are (n-)bisimilar, denoted M, s ∼ M′, s′ (respectively M, s ∼n M′, s′), if every world in s is (n-)bisimilar to some world in s′ and vice versa. Two models M and M′ are globally bisimilar, denoted M ∼ M′, if every world in M is bisimilar to some world in M′ and vice versa.
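For finite models, the existence of a winning strategy for II in the n-round game can be computed by unfolding the back-and-forth conditions. The sketch below is one possible rendering, not the authors' procedure: a model is assumed to be a pair (Sigma, V) with Sigma mapping each world to a set of frozensets and V mapping each atom to a set of worlds, and the name `make_bisim` is hypothetical.

```python
from functools import lru_cache

def make_bisim(M1, M2):
    """Return checkers (worlds, states) for n-bisimilarity between M1 and M2."""
    (S1, V1), (S2, V2) = M1, M2
    atoms = set(V1) | set(V2)

    def atoms_agree(w, v):
        return all((w in V1.get(p, ())) == (v in V2.get(p, ())) for p in atoms)

    @lru_cache(maxsize=None)
    def worlds(n, w, v):
        """II wins the n-round game from the world-position (w, v)."""
        if not atoms_agree(w, v):
            return False
        if n == 0:
            return True
        # I picks a state on either side; II must answer on the other side.
        forth = all(any(states(n - 1, s, t) for t in S2[v]) for s in S1[w])
        back = all(any(states(n - 1, s, t) for s in S1[w]) for t in S2[v])
        return forth and back

    @lru_cache(maxsize=None)
    def states(n, s, t):
        """Every world of s is n-bisimilar to some world of t, and vice versa."""
        return (all(any(worlds(n, w, v) for v in t) for w in s) and
                all(any(worlds(n, w, v) for w in s) for v in t))

    return worlds, states
```

Memoisation via `lru_cache` keeps the recursion polynomial in the number of world pairs and state pairs for each level n.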

AN EHRENFEUCHT–FRAÏSSÉ THEOREM

The crucial role of these notions of equivalence for the model theory of inquisitive modal logic is brought out in a corresponding Ehrenfeucht–Fraïssé theorem. Using the standard notion of the modal depth of a formula, we denote as INQMLn the class of INQML-formulae of depth up to n. It is easy to see that the semantics of any formula in INQMLn is preserved under n-bisimilarity; as a consequence, all of inquisitive modal logic is preserved under full bisimilarity. The following analogue of the classical Ehrenfeucht–Fraïssé theorem shows that, for finite sets P of basic propositions, n-bisimilarity coincides with logical indistinguishability in INQMLn, which we denote as ≡n: M, s ≡n M′, s′ if and only if M, s and M′, s′ support the same formulae of INQMLn.

Theorem 1. Given a finite set P of atomic propositions, for any n ∈ ℕ and inquisitive state-pointed modal models M, s and M′, s′: M, s ∼n M′, s′ ⇔ M, s ≡n M′, s′.

Notice that, by taking s and s′ to be singleton states, we obtain the corresponding connection for world-pointed models as a special case: M, w ∼n M′, w′ ⇔ M, w ≡n M′, w′. As customary, the crucial implication of the theorem, from right to left, follows from the existence of characteristic formulae that define ∼n-classes of worlds, information states and inquisitive states over models, and it is here that the finiteness of P is crucial.

Proposition 13 (characteristic formulae for ∼n-classes) For any world-pointed model M, w over a finite set of atomic propositions P, and for any n ∈ ℕ, there is a formula χ^n_{M,w} of modal depth n s.t. M′, w′ ⊨ χ^n_{M,w} ⇔ M′, w′ ∼n M, w.

Proof. By simultaneous induction on n, we define formulae χ^n_{M,w} for all worlds w, together with auxiliary formulae χ^n_{M,s} and χ^n_{M,Π} for all information states s and inquisitive states Π over M. Given two inquisitive states Π and Π′ in models M and M′, we write M, Π ∼n M′, Π′ if every state s ∈ Π is n-bisimilar to some state s′ ∈ Π′, and vice versa. Dropping reference to the fixed M, we let:


These formulae are of the required modal depth; the conjunctions and disjunctions in the definition are well-defined since, for a given n, there are only finitely many distinct formulae of each of these forms. We can then prove by simultaneous induction on n that these formulae satisfy the following properties: • • •

Let us say that a class C of world-pointed (state-pointed) models is defined by a formula ϕ if C is the set of world-pointed models where ϕ is true (of state-pointed models in which ϕ is supported).

Corollary 14 A class C of world-pointed models is definable in INQML if and only if it is closed under ∼n for some n ∈ ℕ. A class C of state-pointed models is definable in INQML if and only if it is both downward closed and closed under ∼n for some n ∈ ℕ.

RELATIONAL INQUISITIVE MODELS

In this paper, we want to compare the expressive power of inquisitive modal logic with that of first-order logic. However, this is not straightforward. A standard Kripke model can be identified naturally with a relational structure with a binary accessibility relation R and a unary predicate Pi for the interpretation of each atomic sentence pi ∈ P. By contrast, an inquisitive modal model also needs to encode the inquisitive state map Σ : W → ℘℘(W). This map can be identified with a binary relation E ⊆ W × ℘(W). In order to view this as part of a relational structure, however, we need to adopt a two-sorted perspective, and view W and ℘(W) as domains of two distinct sorts. This leads to the following notion.

Relational Inquisitive Models

Definition 15 (relational models) A relational inquisitive modal model is a relational structure M = ⟨W, S, E, ε, (Pi)i∈I⟩, where W and S are sets, E, ε ⊆ W × S, and Pi ⊆ W. With s ∈ S we associate the set s̲ := {w ∈ W | w ε s}, and we require the following conditions, which enforce resemblance with inquisitive modal models:

• Extensionality: if s̲ = s̲′, then s = s′.
• Non-emptiness: for every w, E[w] ≠ ∅, where E[w] := {s ∈ S | wEs}.
• Downward closure: if s ∈ E[w] and t ⊆ s̲, there is an s′ ∈ S such that s̲′ = t and s′ ∈ E[w].

By extensionality, the second sort S can be identified with a domain {s̲ | s ∈ S} ⊆ ℘(W) of sets over the first sort. We will always make this identification and view a relational model as a structure M = ⟨W, S, E, ∈, (Pi)i∈I⟩ where S ⊆ ℘(W) and ∈ is the actual membership relation. In the following, we shall therefore also specify relational models by just M = ⟨W, S, E, (Pi)i∈I⟩ when the fact that S ⊆ ℘(W) and the natural interpretation of ε are understood.

Notice that a relational model M induces a corresponding Kripke model on W. We simply let wRw′ if for some s ∈ S we have wEs and w′ ε s, and we let R[w] := {w′ | wRw′}.

Natural Classes of Relational Models In addition to extensionality and downward closure, we might impose other constraints on a relational model M: in particular, we may require S to be the full powerset of W, or to resemble the powerset from the perspective of each world w.


Definition 16 (classes of relational models) A relational model M = ⟨W, S, E, ∈, (Pi)i∈I⟩ is called:

• full if S = ℘(W);
• locally full if S ⊇ ℘(R[w]) for all w ∈ W.

These conditions suggest different ways of encoding a concrete inquisitive modal model as a relational model.

Definition 17 (relational encodings) Let M = ⟨W, Σ, V⟩ be an inquisitive modal model. We define three relational encodings M^rel, M^lf and M^full of M, each based on W, and with wEs ⇔ s ∈ Σ(w), w ε s ⇔ w ∈ s and Pi = V(pi). The encodings differ in the second sort domain S:

• for M^rel, S := ⋃{Σ(w) | w ∈ W};
• for M^lf, S := ⋃{℘(σ(w)) | w ∈ W};
• for M^full, S := ℘(W).

Clearly, M^rel is the minimal relational counterpart of M, M^lf its minimal counterpart that is locally full, and M^full its unique counterpart that is full.
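The three second-sort domains can be computed directly for a finite model. The sketch below rests on an assumption suggested by the minimality remark above: the minimal encoding takes exactly the states occurring in some Σ(w), the locally full one takes all subsets of each σ(w) = ⋃Σ(w), and the full one takes the whole powerset of W. The function name `second_sort` and the `kind` labels are hypothetical.

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, each as a frozenset."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def second_sort(W, Sigma, kind="rel"):
    """Second-sort domain S of a relational encoding (assumed definitions)."""
    if kind == "rel":    # states occurring in some Sigma(w)
        return set().union(*(Sigma[w] for w in W))
    if kind == "lf":     # all subsets of sigma(w) for each world w
        return set().union(*(powerset(frozenset().union(*Sigma[w])) for w in W))
    if kind == "full":   # the full powerset of W
        return powerset(W)
    raise ValueError(kind)
```

On any finite model the three domains are nested: the minimal one is contained in the locally full one, which is contained in the full one.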

Relational Models and First-Order Logic

A relational inquisitive model supports a two-sorted first-order language having two relation symbols E and ε, and a number of predicate symbols Pi for i ∈ I. It is easy to translate formulae ϕ ∈ INQML to FO-formulae ϕ*(x) in a single free variable x of the second sort in such a way that, if M is an inquisitive modal model and N is any of the above encodings of M, we have, for every state s in the second sort of N: M, s ⊨ ϕ ⇔ N ⊨ ϕ*(s).

This translation can be seen as an analogue of the standard translation of modal logic to first-order logic. The framework of relational inquisitive models thus allows us to view INQML as a syntactic fragment of FO, INQML ⊆ FO, just as standard modal logic ML over Kripke structures may be regarded as a fragment ML ⊆ FO.


Importantly, however, the class of relational inquisitive modal models is not first-order definable in this framework, since the downward closure condition involves a second-order quantification. In other words, we are dealing with first-order logic over non-elementary classes of intended models.

THE ∼-INVARIANT FRAGMENT OF FO

Regarding INQML as a fragment of first-order logic (over relational models, in any one of the above classes), we may think of downward closure and ∼-invariance as characteristic semantic features of this fragment. The core question for the rest of this paper is to which extent INQML may express all properties of worlds that are FO-expressible. In other words, over which classes of models, if any, can INQML be characterised as the bisimulation invariant fragment of first-order logic? In short, for what classes C do we have

(†) INQML ≡ FO/∼

just as ML ≡ FO/∼ by van Benthem’s theorem?

Bisimulation Invariance and Compactness

The inquisitive Ehrenfeucht–Fraïssé theorem, Theorem 1, implies ∼-invariance for all of INQML. By Corollary 14 it further implies expressive completeness of INQMLn for any ∼n-invariant property of world-pointed models. In order to prove (†) in restriction to some particular class C of relational inquisitive models, it is thus necessary and sufficient to show that, for any ϕ(x) ∈ FO, ∼-invariance of ϕ(x) over C implies ∼n-invariance of ϕ(x) over C for some finite n. This may be viewed as a compactness principle for ∼-invariance of first-order properties, which is non-trivial in the non-elementary setting of relational inquisitive models.

Observation 18 For any class C of relational inquisitive models, the following are equivalent:

• INQML ≡ FO/∼ for world properties over C;
• for FO-properties of world-pointed models, ∼-invariance over C implies ∼n-invariance over C for some n.


Interestingly, first-order logic does not satisfy compactness in restriction to the (non-elementary) class of relational inquisitive models. More importantly, over the class of full relational models, violations of compactness can even be exhibited for ∼-invariant formulae.

Observation 19 Over full relational inquisitive models, the absence of infinite R-paths from the designated world w (i.e., well-foundedness of the converse of R at w) is a first-order definable and ∼-invariant property of worlds that is not preserved under ∼n for any n, hence not expressible in INQML. In particular, first-order logic violates compactness over full relational models.

The Characterisation Theorem

In light of Observation 18, Observation 19 means that (†) fails over the class of full relational models. This is not too surprising: on full relational models, FO has access to full-fledged second-order quantification, while INQML can only quantify over subsets within the range of Σ. This is in sharp contrast with our main theorem:

THEOREM 2. Let C be any of the following classes of relational models: the class of all models; of finite models; of locally full models; of finite locally full models. Over each of these classes, INQML ≡ FO/∼, i.e., a property of world-pointed models is definable in INQML over C if and only if it is both FO-definable over C and ∼-invariant over C.

Without recourse to compactness, the most useful tool from first-order model theory for our purposes is the local nature of first-order logic over relational structures, in terms of Gaifman distance. In the setting of a relational model, Gaifman distance is graph distance in the undirected bipartite graph on the sets W of worlds and S of states with edges between any pair linked by E or ε; the ℓ-neighbourhood Nℓ(w) of a world w consists of all worlds or states at distance up to ℓ from w in this sense. It is easy to see that if M, w is a world-pointed relational model and ℓ is even, the restriction of this model to Nℓ(w), denoted M↾Nℓ(w), w, is also a world-pointed relational model.

Figure 1: Generic upgrading pattern.

In light of Observation 18, to show that (†) holds over a class C we need to show that a first-order formula ϕ(x) whose semantics is invariant under ∼ over the class C is in fact invariant under one of the much coarser finite approximations ∼n over C, for some value n depending on ϕ. For this there is a general approach that has been successful in a number of similar investigations, starting from an elementary and constructive proof in [14] of van Benthem’s classical characterisation of basic modal logic [21] and its finite model theory version due to Rosen [19] (for ramifications of this method, see also [15, 10] and [16]). This approach involves an upgrading of a sufficiently high finite level ∼n of bisimulation equivalence to a finite target level ≡q of elementary equivalence, where q is the quantifier rank of ϕ. Concretely, this amounts to finding, for any world-pointed relational model M, w, a fully bisimilar pointed model M*, w* with the property that, if M, w ∼n M′, w′, then M*, w* ≡q M′*, w′*. The diagram in Figure 1 shows how ∼-invariance of ϕ, together with its nature as a first-order formula of quantifier rank q, entails its ∼n-invariance, simply by taking the detour via the lower rung. In the following section, we show how to achieve the required upgradings for various classes of relational models; we use a variation on an upgrading technique from [14], based on an inquisitive analogue of partial tree unfoldings.

Partial Unfolding and Stratification

Theorem 2 boils down to the compactness property expressed in Observation 18 for the relevant classes of relational models. To show this property we make use of a process of stratification, comparable to tree-like unfoldings in standard modal logic.

Definition 20 We say that a relational inquisitive model M is stratified if its two domains W and S consist of essentially disjoint strata •


•

For an even number ℓ and a world w, we say that M is stratified to depth ℓ from w if M↾Nℓ(w) is stratified.

It is not hard to see that any world-pointed relational inquisitive model is bisimilar to one that is stratified. Moreover, for any even number ℓ, a finite world-pointed relational model is bisimilar to one that is finite and stratified to depth ℓ from its distinguished world. If the original model is locally full, the process of partial unfolding leading to an (ℓ-)stratified model preserves local fullness.

Figure 2: Upgrading pattern for Theorem 2.

Observation 21 For relational models that are stratified to depth ℓ for some even ℓ, and for :

This is because, due to stratification and cut-off, the n-round game exhausts all possibilities in the unbounded game.

Proof of Theorem 2. Let C be any one of the classes in the theorem and let ϕ(x) ∈ FOq be ∼-invariant over C. We want to show that ϕ is ∼n-invariant over C for n = 2^q, where q is the quantifier rank of ϕ. The upgrading argument is sketched in Figure 2. Towards its ingredients, consider a world-pointed relational model M, w in C. Since ϕ is ∼-invariant, we can assume w.l.o.g. that M, w is stratified to depth ℓ = n. We define two world-pointed models as follows. Both models contain q distinct isomorphic copies of M as well as of M↾Nℓ(w). In addition, one of them contains a copy of M↾Nℓ(w) with the distinguished world w, while the other contains a copy of M with the distinguished world w. Using an Ehrenfeucht–Fraïssé game argument for FO it is possible to show that these two pointed models are ≡q-equivalent.

Given any two pointed models M, w ∼n M′, w′ in C, we can see that ϕ is preserved between them by chasing the diagram in Figure 2 along the path through the auxiliary models, which are all in C.

CONCLUSION

We have seen the foundations of a model theory for inquisitive modal logic in two main aspects. Firstly, the notion of inquisitive bisimulation equivalence has been established as the appropriate notion of semantic invariance by an Ehrenfeucht–Fraïssé correspondence, which provides a precious tool for studying the expressive power of inquisitive modal logic. Secondly, we have seen that INQML admits model-theoretic characterisations as the bisimulation-invariant fragment of classical first-order logic over certain classes of relational structures with two sorts for worlds and information states. Our result holds both in the general setting, and in restriction to finite models. The model-theoretic challenges arise in dealing with non-elementary classes of models, whose essentially two-sorted nature extends first-order expressiveness in the direction of monadic second-order logic.

Unpublished work [7] indicates that this approach can be taken considerably further: characterisations analogous to those presented here for basic INQML can be obtained for inquisitive epistemic logic, the multi-agent, S5-like variant of INQML. In that setting, the model unfolding procedure that we used here to establish our Theorem 2 can no longer be used, because the resulting structures would no longer satisfy the inquisitive S5 constraints. Instead, new and more complex techniques are needed.


REFERENCES

1. I. Ciardelli (2014): Modalities in the realm of questions: axiomatizing inquisitive epistemic logic. In R. Goré, B. Kooi & A. Kurucz, editors: Advances in Modal Logic (AIML), College Publications, London, pp. 94–113.
2. I. Ciardelli (2015): Questions in Logic. ILLC Dissertation Series DS2016-01, Institute for Logic, Language and Computation, Amsterdam.
3. I. Ciardelli (2016): Dependency as question entailment. In S. Abramsky, J. Kontinen, J. Väänänen & H. Vollmer, editors: Dependence Logic: theory and applications, Springer International Publishing Switzerland, pp. 129–181, doi:10.1007/978-3-319-31803-5_8.
4. I. Ciardelli (2016): Questions as information types. Synthese, doi:10.1007/s11229-016-1221-y.
5. I. Ciardelli, J. Groenendijk & F. Roelofsen (2013): Inquisitive semantics: A new notion of meaning. Language and Linguistics Compass 7(9), pp. 459–476, doi:10.1111/lnc3.12037.
6. I. Ciardelli, J. Groenendijk & F. Roelofsen (2015): On the semantics and logic of declaratives and interrogatives. Synthese 192(6), pp. 1689–1728, doi:10.1007/s11229-013-0352-7.
7. I. Ciardelli & M. Otto (2017): Bisimulation in Inquisitive Modal Logic. Unpublished manuscript.
8. I. Ciardelli & F. Roelofsen (2011): Inquisitive logic. Journal of Philosophical Logic 40, pp. 55–94, doi:10.1007/s10992-010-9142-6.
9. I. Ciardelli & F. Roelofsen (2015): Inquisitive dynamic epistemic logic. Synthese 192(6), pp. 1643–1687, doi:10.1007/s11229-014-0404-7.
10. A. Dawar & M. Otto (2009): Modal characterisation theorems over special classes of frames. Annals of Pure and Applied Logic 161, pp. 1–42, doi:10.1016/j.apal.2009.04.002.
11. V. Goranko & M. Otto (2007): Model Theory of Modal Logic. In P. Blackburn, J. van Benthem & F. Wolter, editors: Handbook of Modal Logic, Elsevier, pp. 249–329, doi:10.1016/S1570-2464(07)80008-5.
12. H. Hansen (2003): Monotone Modal Logics. MSc Thesis, University of Amsterdam.
13. H. Hansen, C. Kupke & E. Pacuit (2009): Neighbourhood structures: Bisimilarity and basic model theory. Logical Methods in Computer Science 2.
14. M. Otto (2004): Elementary Proof of the van Benthem–Rosen Characterisation Theorem. Technical Report 2342, Fachbereich Mathematik, Technische Universität Darmstadt.
15. M. Otto (2004): Modal and guarded characterisation theorems over finite transition systems. Annals of Pure and Applied Logic 130, pp. 173–205, doi:10.1016/j.apal.2004.04.003.
16. M. Otto (2012): Highly acyclic groups, hypergraph covers and the guarded fragment. Journal of the ACM 59(1), doi:10.1145/2108242.2108247.
17. E. Pacuit (2007): Neighborhood semantics for modal logic: An introduction. Lecture notes for a course at ESSLLI.
18. F. Roelofsen (2013): Algebraic foundations for the semantic treatment of inquisitive content. Synthese 190(1), pp. 79–102, doi:10.1007/s11229-013-0282-4.
19. E. Rosen (1997): Modal logic over finite structures. Journal of Logic, Language and Information 6, pp. 427–439, doi:10.1023/A:1008275906015.
20. J. Väänänen (2007): Dependence Logic: A New Approach to Independence Friendly Logic. Cambridge University Press, doi:10.1017/CBO9780511611193.
21. J. van Benthem (1983): Modal Logic and Classical Logic. Bibliopolis, Napoli, doi:10.2307/2274406.
22. F. Yang (2014): On extensions and variants of dependence logic: A study of intuitionistic connectives in the team semantics setting. Ph.D. thesis, University of Helsinki.
23. F. Yang & J. Väänänen (2016): Propositional logics of dependence. Annals of Pure and Applied Logic 167, pp. 557–589, doi:10.1016/j.apal.2016.03.003.

Chapter 8

GRAPHICAL SEQUENT CALCULI FOR MODAL LOGICS

Minghui Ma1 and Ahti-Veikko Pietarinen2

1 Institute of Logic and Cognition, Sun Yat-Sen University, Guangzhou, China
2 Chair of Philosophy, Tallinn University of Technology, Tallinn, Estonia

The syntax of modal graphs is defined in terms of the continuous cut and broken cut following Charles Peirce’s notation in the gamma part of his graphical logic of existential graphs. Graphical calculi for normal modal logics are developed based on a reformulation of the graphical calculus for classical propositional logic. These graphical calculi are of the nature of deep inference. The relationship between graphical calculi and sequent calculi for modal logics is shown by translations between graphs and modal formulas.

Citation: (APA): Ma, M. & Pietarinen, A.-V. (2017). Graphical Sequent Calculi for Modal Logics. Sujata Ghosh and R. Ramanujam: M4M9 EPTCS 243, 2017, pp. 91–103 (13 pages). Copyright: © Creative Commons Attribution 3.0 Unported (https://creativecommons.org/licenses/by/3.0/).


INTRODUCTION

Sequent calculi for normal modal logics can be obtained uniformly from a basic calculus, as has been observed in [23]. The search for generalized cut-free sequent calculi for modal logics has produced display calculus ([3]), hypersequent calculus ([2]), labelled sequent calculus ([10]), hybrid logic calculus ([18]), and deep sequent calculus ([7, 8, 21, 22]). Among these efforts, there are two main approaches. One is the semantic approach; the other is largely syntactic. In the semantic approach, labelled calculi exist for a number of complete modal logics. The syntactic approach does not use labels. Each sequent has an obvious corresponding formula. Ordinary sequent calculi and hypersequent calculi for modal logics are syntactic. Deep inference systems for modal logics, such as deep sequent calculi developed by Brünnler [7, 8] and Stouppa [22], are also largely syntactic. There also exists deep inference for hybrid logic ([19]). The syntax of deep sequents is defined by assuming the negation normal form in classical modal logic and nested sequents. The central idea of deep inference is that deep structures are transformed into appropriate shapes at any position in a derivation that allows the transformation. It has turned out that cut-free sequent calculi can be developed systematically and modularly for normal modal logics.

As is often the case, what is syntactic and what is semantic may interestingly overlap, as in the hybrid and two-sided approaches. In graphical and diagrammatic systems, too, the distinction between syntax and semantics is not razor-sharp, and was not originally meant to be by Peirce, which promises some flexibility when dealing with more complicated and non-standard systems.

The aim of the present paper is to provide a different kind of deep inference system for normal modal logics. The language is given by Peirce’s alpha and gamma graphs as presented in his theory of existential graphs (see e.g. [13, 14, 17, 24]). Graphs are scribed on the sheet of assertion. Inference rules are formulated as transformation rules from one graph to another graph. In non-modal propositional logic (alpha graphs) and first-order logic (beta graphs), there are basically only two general kinds of transformations: insertions to the graphs and erasures from the graphs. In graphical modal logic, there are two additional kinds of transformations: merges and splits. In a sense, merges and splits are also instances of the operations of insertions and erasures. Thus the fundamental proof rules in the modal extensions of graphical logic can also be classified into two general classes. As usual, these operations are allowed only in certain positions in a graph. It is the notion of a position that is made explicit in graphical logic. This makes such graphical calculi the natural home for deep inference.

Peirce’s theory of existential graphs was generalized into conceptual graphs by Sowa [20] in 1984. Since then conceptual graphs have been widely used within artificial intelligence and cognitive science. Diagrammatic reasoning and its history and philosophy have been studied for many years (see e.g. [1, 15, 16]). As far as modal logics are concerned, van den Berg [4] defines a graphical system for modal logic K which is complete with respect to the Hilbert-style axiomatic system of K. Braüner [5] defines a Peircean graphical system for the modal logic S5, which is also complete with respect to the Hilbert-style axiomatic system of S5. This type of graphical system is also extended by Braüner and Øhrstrøm [6] to modal logics S4 and KD45. In distinction from the above works, the graphical systems for modal logics presented in this paper are shown to be equivalent to algebraic sequent systems. This means that a range of modal graphical systems can be developed in a systematic and modular fashion.

THE SYNTAX OF MODAL GRAPHS

We fix a denumerable set Prop of simple propositions, the elements of which are primitive graphs. They occur in a compound graph as basic parts. According to Peirce, the sheet of assertion, or the blank where nothing is scribed, is also a primitive graph. It corresponds to the tautology ⊤. Henceforth, we denote the blank by SA or omit it altogether when no confusion arises. A primitive graph is a simple proposition or the blank (SA). The modal graphs are defined inductively from primitive graphs using two special notations: the continuous cut and the broken cut. The continuous cut means negation. The broken cut means logical contingency (non-necessity). The continuous and broken cuts are uniformly called primitive cuts. There are four combinations of cuts:

(1) Double continuous cut;
(2) Double broken cut;
(3) Possibility cut;
(4) Necessity cut.

The compound cuts consist of two cuts, one nested within the other, with nothing between them. The two primitive cuts and the four compound cuts stated above are uniformly called cuts. They are used as single graph operations that form new graphs from the given ones.

Definition 1. The set of all modal graphs

is defined inductively by:

G ::= p | SA | continuous cut of G | broken cut of G | G1 G2

where p ∈ Prop. The graphs formed from G by the two primitive cuts are read as “the continuous cut of G” and “the broken cut of G” respectively. The graph G1 G2 is called the juxtaposition of G1 and G2 on the sheet of assertion. Henceforth, when we talk about graphs we mean modal graphs. Given two graphs G and H, we also use shorthand notations, including G ⊃ H and G ≡ H.

Definition 2. For any graph G, the parsing tree of G, denoted by T(G), is defined inductively as follows:

1. T(p) is a single root node p.
2. T(SA) is a single root node SA.
3. T(G1G2) is a root node G1G2 with children nodes T(G1) and T(G2).
4. The parsing tree of the continuous cut of G is a root node, the continuous cut of G, with one child node T(G).
5. The parsing tree of the broken cut of G is a root node, the broken cut of G, with one child node T(G).

A partial graph of a graph G is a node in T(G). For any graph G, the history of a node J in T(G), denoted by h(J), is the unique path from the root to J. The position of the root is always on the sheet of assertion. We say that J is a positive (negative) node of T(G) if there is an even (odd) number of cuts in h(J).

A position is a point on the area of a graph (but not on the boundary of a cut). Given any graph G, a position in G is positive (negative) if it is enclosed by an even (odd) number of cuts. Graphs are scribed at positions. No two graphs, or their parts, can be scribed at the same position.


A graph context is a graph G{ } with a single slot { }, the empty context, which can be filled by other graphs. The notation G{H} stands for the graph obtained from the graph context G{ } by filling the slot by H. An occurrence of a graph J in a graph G is called positive (negative), notation G{J +} (G{J − }), if it is a positive (negative) node in T(G).
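The notions of parsing tree, polarity and graph context introduced above can be made concrete with a small sketch. The tuple encoding, the function names and the decision to count only the cuts that strictly enclose a node are assumptions made for illustration; they are not notation from the paper.

```python
# Hypothetical tuple encoding of modal graphs (an assumption, not the paper's notation):
#   ("p", name)       simple proposition
#   ("sa",)           the blank sheet of assertion
#   ("cut", G)        continuous cut of G
#   ("bcut", G)       broken cut of G
#   ("juxt", G1, G2)  juxtaposition of G1 and G2

def nodes(graph, depth=0):
    """Yield (partial graph, number of cuts enclosing it) for each node of the parsing tree."""
    yield graph, depth
    tag = graph[0]
    if tag in ("cut", "bcut"):
        # Entering a primitive cut adds one enclosing cut for everything inside it.
        yield from nodes(graph[1], depth + 1)
    elif tag == "juxt":
        yield from nodes(graph[1], depth)
        yield from nodes(graph[2], depth)

def polarity(depth):
    """Even number of enclosing cuts: positive; odd: negative."""
    return "positive" if depth % 2 == 0 else "negative"

def fill(context, graph):
    """A graph context is modelled as a function from the slot content to a graph."""
    return context(graph)

P, Q = ("p", "p"), ("p", "q")
# Context: continuous cut around (p juxtaposed with a broken cut containing the slot).
context = lambda hole: ("cut", ("juxt", P, ("bcut", hole)))
G = fill(context, Q)
report = [(node[0], polarity(depth)) for node, depth in nodes(G)]
```

In this example the occurrence of q sits inside two cuts and is therefore positive, while p sits inside one cut and is negative.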

THE GRAPHICAL CALCULI KG

Graphical calculi for modal logics are presented by graphical rules. In general, a graphical rule is of the form

where G and H are graphs. The graph G is called the premiss, and H is called the conclusion. On the sheet of assertion, the syntax of graphs becomes diagrammatic. This means that the syntax is two-dimensional, it has no separate notation for parentheses, and that its well-formed graphs are scribed in the ambient space which is continuous, compact, open and non-oriented. The following equalities can be thought of as identifying graphs: The permutation (PM) says that to distinguish positions of H1 and H2 in a partial graph H1H2 of G has no significance. The associativity (AS) says that the order of forming the graphs indicated by the parentheses in these rules is likewise immaterial. After all, these equalities follow from the basic properties of the space and therefore need no separate statement in the system. Likewise, if two graphs, G and H, are asserted on the sheet of assertion, the juxtaposition of them, G H, is at once also asserted. The continuous and broken cuts have different meanings in general. However, the continuous cut of SA is tantamount to the broken cut of SA in the sense that it is impossible to falsify a tautology. Hence we assume the following equality:

This equality says that contradiction is impossible. Its algebraic meaning is the normality condition in modal algebras (Section 5).


Definition 3. The graphical calculus Kg for the minimal normal modal logic K consists of the following axiom and graphical rules:

1. Axiom: SA (the sheet of assertion).
2. Alpha rules:
• Deletion: Every positive partial graph H in a graph G can be deleted.
• Insertion: Any graph can be inserted into a negative position in a graph G.
• Double cut: Any partial graph H of a graph G can be replaced by the double cut of H, and vice versa.
• Iteration/deiteration: In a graph K{GH{J}}, the partial graph G can be iterated or deiterated at any position in H, where H{ } is a broken-cut-free graph context, namely, no broken cut occurs in H{ }.
3. Modal rules:

(K1) and (K2) mean that the necessity cut distributes over juxtaposition. We call the rule (K1) splitting and (K2) merging. (DMN) is the rule of downward monotonicity. A proof of a graph G in Kg is a finite sequence of graphs G0,...,Gn such that Gn = G, and each Gi is either SA or derived from previous graphs by a rule in Kg. A graph G is provable in Kg, notation ⊢Kg G, if it has a proof in Kg. A graphical derivation of H from G is admissible in Kg, if ⊢Kg G implies ⊢Kg H.
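The side conditions of the deletion, insertion and double-cut rules of Definition 3 can be sketched as operations on a tuple encoding of graphs. The encoding, the path convention (a path is a sequence of tuple indices) and the function names are hypothetical. Deletion is modelled here as replacement by the blank SA, which is an assumption about how erasure is represented; iteration/deiteration and the modal rules are not covered.

```python
# Hypothetical encoding: ("p", name), ("sa",), ("cut", G), ("bcut", G), ("juxt", G1, G2).
SA = ("sa",)

def locate(graph, path):
    """Return the partial graph at `path` and the number of cuts enclosing it."""
    depth = 0
    for index in path:
        if graph[0] in ("cut", "bcut"):
            depth += 1
        graph = graph[index]
    return graph, depth

def replace_at(graph, path, new):
    """Return a copy of `graph` with the partial graph at `path` replaced by `new`."""
    if not path:
        return new
    index = path[0]
    return graph[:index] + (replace_at(graph[index], path[1:], new),) + graph[index + 1:]

def delete(graph, path):
    """Deletion: only a positive partial graph may be erased (replaced by SA here)."""
    _, depth = locate(graph, path)
    if depth % 2 != 0:
        raise ValueError("deletion needs a positive partial graph")
    return replace_at(graph, path, SA)

def insert(graph, path, new):
    """Insertion: juxtapose `new` with the partial graph at a negative position."""
    old, depth = locate(graph, path)
    if depth % 2 != 1:
        raise ValueError("insertion needs a negative position")
    return replace_at(graph, path, ("juxt", old, new))

def double_cut(graph, path):
    """Double cut: wrap any partial graph in two continuous cuts (no polarity condition)."""
    old, _ = locate(graph, path)
    return replace_at(graph, path, ("cut", ("cut", old)))

def remove_double_cut(graph, path):
    """The converse direction: strip two directly nested continuous cuts."""
    old, _ = locate(graph, path)
    if old[0] == "cut" and old[1][0] == "cut":
        return replace_at(graph, path, old[1][1])
    raise ValueError("no double cut at this path")
```

A path such as `(1, 2)` selects the second juxtaposed component inside an outer cut; the polarity checks then decide whether deletion or insertion is permitted there.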

Remark 1. The restriction on the context H{ } in (IT) and (DEIT) rules is significant. Iteration/deiteration in a modal context may lead to invalid inferences. For example, consider the following two inferences where the rules (IT) and (DEIT) are applied into the broken cut:

The premisses of (I) and (II) are valid, but their conclusions are not valid in the algebraic semantics for Kg (Section 5). (I) is a counterexample to the validity of iteration into broken cut, and (II) is a counterexample to the validity of deiteration from a broken cut. Lemma 1. The graphs G ⊃ SA and G ⊃ G are derivable in Kg. Proof. We have the following proofs:

This completes the proof.

Proposition 1. The following rules are admissible in Kg:

1. De Morgan rules (DM1), (DM2).
2. Contraposition (CP) and transitivity (TR) rules.
3. Prefixing (PF) and Modus Ponens.
4. Lattice rules.
5. Residuation rules.
6. Distributivity (D1), (D2).
7. Upward monotonicity (UMN).
8. Replacement of equivalents (RE).
9. Necessitation rule (Nec).

Proof. For (DM1) and (DM2), we have the following simple proofs that only use the double-cut rules:

(TR) is shown as follows:


For (D1) and (D2), we have the following proofs ([9]):

The rule (RE) is shown by induction on the construction of J{ } as follows. Assume G ≡ H. If J{ } = { }, the conclusion is the same as the premiss. Suppose J{ } is a cut of J′{ }. By induction hypothesis, we have J′{G} ≡ J′{H}. Then it is easy to show J{G} ≡ J{H}. Assume J{ } = J1J2{ }. By induction hypothesis, we have J2{G} ≡ J2{H}. Then it is easy to show J1J2{G} ≡ J1J2{H}.

The rule (UMN) is obtained from (DMN) by the rule of contraposition (CP). (Nec) is shown by (PF), (UMN) and (TR). The other rules are easily shown. Theorem 1 (Cut-elimination). The following cut-elimination rule

is admissible in Kg. Proof. Clearly ⊢Kg J{SA}.

is provable in Kg. By (RE), we have . Assume

. By (TR), we have

EXTENSIONS

Extensions of Kg can be obtained by adding some characteristic rules. The formulation of these characteristic rules will make use of the cuts, including the six cuts (two primitive and four compound ones) we introduced in Section 2. We say that the occurrence of a cut in a graph is positive (negative) if it is enclosed evenly (oddly) by primitive cuts (continuous or broken cuts). A normal modal graphical calculus is an extension of Kg with a set of graphical rules. Given a set of rules Σ = {Ri | i ∈ I}, the notation KΣ denotes the calculus generated by rules in Σ. Let us have the following rules of transformation as the basic rules for various systems of graphical modal logic:


(D) Any positive necessity cut can be transformed into a possibility cut. Any negative possibility cut can be transformed into a necessity cut.

(T) Any positive continuous cut can be transformed into a broken cut. Any negative broken cut can be transformed into a continuous cut.

(4) Any positive necessity cut can be doubled. Any negative possibility cut can be doubled.

(B) Any positive double broken cut can be deleted. Any double broken cut can be inserted into a negative position.

(5) Any positive double broken cut can be transformed into a necessity cut. Any negative possibility cut can be transformed into a double broken cut.
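As a reading aid, the positive halves of the five rules above can be rendered as sequents. This is a sketch that rests on assumptions: the continuous cut is read as negation and the broken cut as non-necessity (as stated in the syntax section), the necessity cut is taken to be a continuous cut enclosing a broken cut, and the possibility cut a broken cut enclosing a continuous cut. The formula α stands for the reading of the enclosed graph; the labels with superscript + are used here only for illustration.

```latex
% Assumed readings of the cuts, with \alpha the formula read off from the enclosed graph:
%   continuous cut     ~  \neg\alpha
%   broken cut         ~  \neg\Box\alpha
%   necessity cut      ~  \neg\neg\Box\alpha      \equiv \Box\alpha
%   possibility cut    ~  \neg\Box\neg\alpha      \equiv \Diamond\alpha
%   double broken cut  ~  \neg\Box\neg\Box\alpha  \equiv \Diamond\Box\alpha
\begin{align*}
(\mathrm{D}^{+})\quad & \Box\alpha \vdash \Diamond\alpha \\
(\mathrm{T}^{+})\quad & \neg\alpha \vdash \neg\Box\alpha
      \quad\text{(equivalently, } \Box\alpha \vdash \alpha\text{)} \\
(4^{+})\quad & \Box\alpha \vdash \Box\Box\alpha \\
(\mathrm{B}^{+})\quad & \Diamond\Box\alpha \vdash \alpha \\
(5^{+})\quad & \Diamond\Box\alpha \vdash \Box\alpha
\end{align*}
```

Under these readings each sequent is the familiar modal axiom of the same name, or its dual form, which is consistent with the rule names used in the text.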

Definition 4. Let (X) = {(X +),(X −)} for X ∈ {D,T,4,B,5}. We define the following graphical calculi:

Let S be any one of the systems in Definition 4. Let S+ and S− be the systems obtained from S by dropping the negative and positive rules, respectively.

Theorem 2. S+ = S = S−.

Proof. Consider KT+ = Kg(T+). It suffices to show that (T−) is provable in KT+. Assume that […] is provable in KT+. There are two cases:

Case 1. […]. First, it is easy to prove […]. Then we have the following proof: […]

Case 2. […]. We have the following proof: […]

Hence (T−) is provable in KT+. The remaining cases of S are shown similarly.

GRAPHICAL AND SEQUENT CALCULI

The set of all modal formulas is defined by the following inductive rule:

α ::= p | ⊤ | ¬α | α ∧ β | □α

where p ∈ Prop. Other propositional connectives ⊥, ∨, → and ↔ are defined as usual. The dual operator ♦ of □ is defined as ♦α := ¬□¬α. A basic sequent is an expression of the form α ⊢ β.

Definition 5. The basic sequent calculus SK consists of the following axioms and rules:

(1) Axioms: […]
(2) Rules for propositional connectives: […]
(3) Modal rule: […]

By the standard Lindenbaum–Tarski construction, one can easily obtain the following completeness result:

Theorem 3. A sequent is derivable in SK iff it is valid in all modal algebras.

We shall present the translations between the modal language and the graphical language, and then prove the connections between the graphical calculus Kg and the sequent calculus SK.

Definition 6. The translation π, from modal graphs to modal formulas, and the translation σ, from modal formulas to modal graphs, are each defined inductively.

The two translations π and σ are related to each other. The relationship can be presented by the following result:

Proposition 2. There are functions δ and ρ such that the following diagrams commute, i.e., π ∘ σ = δ and σ ∘ π = ρ.

Proof. As we are using them later on, let us first define the two (redundant) functions δ and ρ as follows. Define the function δ inductively by: δ(p) = p, δ(⊤) = ⊤, δ(ϕ1 ∧ ϕ2) = δ(ϕ1) ∧ δ(ϕ2), and […]. By induction on the construction of a modal formula ϕ one can easily show π(σ(ϕ)) = δ(ϕ). Hence (I) commutes. Define the function ρ inductively as follows: […]. By induction on the construction of a graph G one can easily show that σ(π(G)) = ρ(G). Hence (II) commutes.

A formula context is a formula structure α{ } with a single slot { } which can be filled with a formula. Let α{β} be the formula obtained from α{ } by filling the slot by β. The notation α{β+} means that β is positive in α, i.e., β is in the scope of an even number of negation symbols. Similarly we use the notation α{β−}.

Lemma 2. The following hold in SK:

(1) if α{β+} and β ⊢SK γ, then α{β} ⊢SK α{γ}.
(2) if α{β−} and β ⊢SK γ, then α{γ} ⊢SK α{β}.
(3) if β ⊢SK γ and γ ⊢SK β, then α{β} ⊢SK α{γ} and α{γ} ⊢SK α{β}.

Proof. By induction on the construction of α{ }. We sketch the proof of (1) and (2) by simultaneous induction. The case α{ } = { } is obvious. Suppose α{β} := ¬α′{β} and β ⊢SK γ. There are two cases:

Case 1. ¬α ′{β +}. Then α ′{β −}. By induction hypothesis, we have α ′{γ} ⊢SK α ′{β}. Then ¬α ′{β} ⊢SK ¬α ′{γ}.


Case 2. ¬α ′{β −}. Then α ′{β +}. By induction hypothesis, we have α ′{β} ⊢SK α ′{γ}. Then ¬α ′{γ} ⊢SK ¬α ′{β}.

The case α{ } = α1{ } ∧ α2 or α{ } = α1 ∧ α2{ } is obvious. Suppose α{ } = □α′{ } and β ⊢SK γ. Assume α′{β+}. Then by induction hypothesis we have α′{β} ⊢SK α′{γ}. Then by (□) we have □α′{β} ⊢SK □α′{γ}. The case for α′{β−} is similar.

Lemma 3. For any graph G, if ⊢Kg G, then ⊤ ⊢SK π(G).

Proof. Assume ⊢Kg G. Let G0,...,Gn = G be a proof of G. We show ⊤ ⊢SK π(Gi) by induction on i ≤ n. If Gi is SA, clearly we have ⊤ ⊢SK π(Gi). Assume that Gi is obtained from G′ by a rule (R). If (R) is an alpha rule, it is easy to get the conclusion by induction hypothesis and Lemma 2. Suppose that (R) is a modal rule.

(1) (R) = (K1) or (K2). Let […]. By induction hypothesis, we have […]. Clearly […]. By Lemma 2 (3), we get ⊤ ⊢SK π(Gi). The case for (K2) is similar.

(2) (R) = (DMN). Let […] and G′ = J{(H ⊃ K)+}. By induction hypothesis, we have ⊤ ⊢SK π(J{(H ⊃ K)+}), i.e., ⊤ ⊢SK π(J){¬(π(H) ∧ ¬π(K))}. Clearly, ¬(π(H) ∧ ¬π(K)) ⊢SK […]. By Lemma 2 (1), we get ⊤ ⊢SK π(Gi).

Lemma 4. For any formula α, if ⊤ ⊢SK α, then ⊢Kg σ(α).

Proof. By induction on the derivation of ⊤ ⊢ α in SK. The proof is omitted. Lemma 5. For any graph G, ⊢Kg G iff ⊢Kg ρ(G).

Proof. By induction on the proof of G in Kg. The proof is omitted.

Theorem 4. For any graph G, ⊢Kg G iff ⊤ ⊢SK π(G).

Proof. The ‘only if’ part is obtained by Lemma 3. Assume ⊤ ⊢SK π(G). By Lemma 4, we have ⊢Kg σ ◦π(G). By Proposition 2, ⊢Kg ρ(G). By Lemma 5, ⊢Kg G.
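The interplay of π, σ, δ and ρ used in the proof of Theorem 4 can be illustrated by a small sketch. The clauses below are assumptions rather than the paper's definitions: π is taken to read SA as ⊤, juxtaposition as ∧, the continuous cut as ¬ and the broken cut as ¬□, and σ is taken to send □α to a continuous cut enclosing a broken cut. The clauses for δ and ρ are simply what these assumptions force. The tuple encodings are likewise hypothetical.

```python
# Hypothetical encodings (assumptions for illustration only):
#   graphs:   ("p", name), ("sa",), ("cut", G), ("bcut", G), ("juxt", G1, G2)
#   formulas: ("var", name), ("top",), ("not", a), ("and", a, b), ("box", a)

def pi(g):
    """Assumed graph-to-formula translation."""
    tag = g[0]
    if tag == "p":
        return ("var", g[1])
    if tag == "sa":
        return ("top",)
    if tag == "cut":
        return ("not", pi(g[1]))
    if tag == "bcut":  # broken cut read as non-necessity
        return ("not", ("box", pi(g[1])))
    if tag == "juxt":
        return ("and", pi(g[1]), pi(g[2]))
    raise ValueError(tag)

def sigma(a):
    """Assumed formula-to-graph translation."""
    tag = a[0]
    if tag == "var":
        return ("p", a[1])
    if tag == "top":
        return ("sa",)
    if tag == "not":
        return ("cut", sigma(a[1]))
    if tag == "and":
        return ("juxt", sigma(a[1]), sigma(a[2]))
    if tag == "box":  # necessity cut: continuous cut around a broken cut
        return ("cut", ("bcut", sigma(a[1])))
    raise ValueError(tag)

def delta(a):
    """The formula-to-formula map forced by the assumptions above (pi after sigma)."""
    tag = a[0]
    if tag in ("var", "top"):
        return a
    if tag == "not":
        return ("not", delta(a[1]))
    if tag == "and":
        return ("and", delta(a[1]), delta(a[2]))
    if tag == "box":
        return ("not", ("not", ("box", delta(a[1]))))
    raise ValueError(tag)

def rho(g):
    """The graph-to-graph map forced by the assumptions above (sigma after pi)."""
    tag = g[0]
    if tag in ("p", "sa"):
        return g
    if tag == "cut":
        return ("cut", rho(g[1]))
    if tag == "bcut":
        return ("cut", ("cut", ("bcut", rho(g[1]))))
    if tag == "juxt":
        return ("juxt", rho(g[1]), rho(g[2]))
    raise ValueError(tag)

phi = ("and", ("box", ("var", "p")), ("not", ("top",)))
G = ("juxt", ("bcut", ("p", "p")), ("cut", ("sa",)))
round_trip_formula = pi(sigma(phi))
round_trip_graph = sigma(pi(G))
```

Under these assumptions the round trips only introduce double negations and double cuts, which is why Lemma 5 (provability of G and of ρ(G) coincide) is plausible in view of the double-cut rule.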

Definition 7. A modal algebra is an algebra A = (A, ∧, ¬, □, 1) where (A, ∧, ¬, 1) is a Boolean algebra, and □ is a unary operator on A satisfying the conditions:

1. Additivity: for all a, b ∈ A, □(a ∧ b) = □a ∧ □b.
2. Normality: □1 = 1.

Any formula α is interpreted as a function α^A in a modal algebra A. A sequent α ⊢ β is valid in A if α^A ≤ β^A whatever elements of A are assigned to variables in α or β. By the standard Lindenbaum–Tarski construction, one can show the completeness of SK with respect to the class of all modal algebras, i.e., α ⊢SK β if and only if α ⊢ β is valid in all modal algebras (Theorem 3). A graph G is interpreted as the function G^A = π(G)^A. A graph G is valid in a modal algebra A if ⊤ ⊢ π(G) is valid in A. Then one can obtain the following completeness result:

Theorem 5. A graph G is provable in Kg iff it is valid in all modal algebras.

Proof. The soundness is shown by induction on the proof of G. For completeness, assume ⊬Kg G. By Theorem 4, we have ⊤ ⊬SK π(G). By the completeness of SK, there is a modal algebra A in which ⊤ ⊢ π(G) is not valid. Then G is not valid in A.

For any set of modal formulas Σ, let Σ⊢ = {⊤ ⊢ α | α ∈ Σ}. Then we have the basic sequent calculus SKΣ⊢ which is obtained from SK by adding all sequents in Σ⊢ as axioms. Let Alg(Σ) be the class of all modal algebras that validate all sequents in Σ⊢. Then the sequent system SKΣ⊢, if consistent, is sound and complete with respect to Alg(Σ). For any set of modal formulas Σ, consider the set of graphical rules Σg = {⊤ ⊢ σ(α) | α ∈ Σ}. Let KgΣg be the graphical calculus obtained from Kg by adding all rules in Σg. For Σ ⊆ {D, T, 4, B, 5}, where the letters denote the corresponding modal formulas, one can show that the calculus KgΣg is equivalent to SKΣ⊢ by the translation π. The proof is similar to Theorem 4. Moreover, the graphical calculi KgΣg are sound and complete with respect to Alg(Σ).
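The algebraic semantics of Definition 7 can be tried out on a small concrete algebra. The sketch below uses the powerset of a two-element Kripke frame with □a defined as the set of worlds all of whose successors lie in a; that such a powerset algebra is a modal algebra is a standard fact. The frame, the tuple encoding of formulas and the function names are assumptions chosen for illustration, with □ read as in the modal algebra signature of Definition 7.

```python
from itertools import combinations, product

# A hypothetical two-world frame: world 0 sees world 1, world 1 sees nothing.
W = frozenset({0, 1})
R = {(0, 1)}

def subsets(s):
    """All subsets of s, i.e. the carrier of the powerset algebra."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

A = subsets(W)

def box(a):
    """Box of a: the worlds all of whose R-successors belong to a."""
    return frozenset(w for w in W if all(v in a for (u, v) in R if u == w))

def ev(f, val):
    """Interpret a formula as an element of the algebra under valuation `val`."""
    tag = f[0]
    if tag == "var":
        return val[f[1]]
    if tag == "top":
        return W
    if tag == "not":
        return W - ev(f[1], val)
    if tag == "and":
        return ev(f[1], val) & ev(f[2], val)
    if tag == "box":
        return box(ev(f[1], val))
    raise ValueError(tag)

def variables(f):
    """Set of variable names occurring in a formula."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(x) for x in f[1:]))

def valid(alpha, beta):
    """Validity of the sequent alpha |- beta: alpha <= beta under every valuation."""
    names = sorted(variables(alpha) | variables(beta))
    for choice in product(A, repeat=len(names)):
        val = dict(zip(names, choice))
        if not ev(alpha, val) <= ev(beta, val):
            return False
    return True

p, q = ("var", "p"), ("var", "q")
additive = all(box(a & b) == box(a) & box(b) for a in A for b in A)
normal = box(W) == W
k_sequent = valid(("box", ("and", p, q)), ("and", ("box", p), ("box", q)))
t_sequent = valid(("box", p), p)
```

In this frame the additivity and normality conditions hold, the sequent □(p ∧ q) ⊢ □p ∧ □q is valid, and □p ⊢ p fails because the frame is not reflexive, which matches the expectation that the (T) rules are not admissible in the basic calculus.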

CONCLUSION

Graphical calculi for modal logics developed in the present paper are systematic and modular. They are modal graphical versions of Gentzen-style sequent systems. They follow closely Peirce’s original presentation in another sense as well: the rules arise systematically from Peirce’s presentation of broken-cut gamma graphs and their rules (R 467, 478). Only (DMN), (B) and (5) are new. In the basic system Kg, identifying a vacant broken cut with a vacant continuous cut dispenses with necessitation as a primitive rule. Moreover, the basic rules are perfectly symmetrical. Thanks to the diagrammatic syntax, graphs need not assume negation normal form. Thus there are good prospects for developing deep inference proof systems for non-normal and intuitionistic modal logics in a similar fashion. The notions of position in the areas of cuts and the polarity of positions likewise result immediately from the diagrammatic language that these systems are built upon. Thus diagrammatic syntax can be considered to be an advantage when compared to languages and notations that are used in other deep inference systems. Labels are likewise not needed. As for future work, the specific sense of the cut-elimination process suggests that interesting decision procedures can be obtained from proof searches in the proposed calculi. Desirable properties include the subformula property and a syntactic calculation of interpolants, among others.

ACKNOWLEDGEMENTS

We want to thank the three reviewers for their helpful comments. The work of the first author is supported by the Chinese National Foundation for Social Sciences and Humanities (grant no. 16CZX049). The work of the second author is supported by the Academy of Finland (project 1270335) and the Estonian Research Council (project PUT 1305) (Principal Investigator A.-V. Pietarinen).


REFERENCES

1. G. Allwein and J. Barwise (eds.). Logical Reasoning with Diagrams. Oxford University Press, 1996.
2. A. Avron. The method of hypersequents in the proof theory of propositional non-classical logics. In: W. Hodges, M. Hyland, C. Steinhorn, J. Truss (eds.), Logic: From Foundations to Applications. Proceedings of the Logic Colloquium, Keele, UK, 1993, pp. 1–32. Oxford University Press, New York, 1996.
3. N. D. Belnap. Display logic. Journal of Philosophical Logic, 11:375–417, 1982. doi:10.1007/BF00284976
4. H. van den Berg. Modal logics for conceptual graphs. In: Proceedings of First International Conference on Conceptual Structures. LNAI, vol. 699, pp. 413–429. Springer-Verlag, Berlin, 1993.
5. T. Braüner. Peircean graphs for the modal logic S5. In: M.-L. Mugnier and M. Chein (eds.), Conceptual Structures: Theory, Tools and Applications, Proceedings of Sixth International Conference on Conceptual Structures. LNAI, vol. 1453, pp. 255–269. Springer-Verlag, Berlin, 1998. doi:10.1007/BFb0054919
6. T. Braüner and P. Øhrstrøm. Towards a diagrammatic formulation of modal and temporal logic. In: F. Daoud (ed.), Working notes of AAAI’99 Fall Symposium on Modal and Temporal Logic-based Planning for Open Networked Multimedia Systems. AAAI, pp. 61–67. AAAI Press, North Falmouth, 1999.
7. K. Brünnler. Deep inference and symmetry in classical proofs. PhD thesis, Technische Universität Dresden, 2003.
8. K. Brünnler. Deep sequent systems for modal logic. Archive for Mathematical Logic, 48:551–577, 2009. doi:10.1007/s00153-009-0137-3
9. M. Ma and A.-V. Pietarinen. Peirce’s sequent proofs of distributivity. In: S. Ghosh and S. Prasad (eds.), Logic and Its Applications: Proceedings of the 7th Indian Logic Conference, LNCS 10119, 2017. doi:10.1007/978-3-662-54069-5_13
10. S. Negri. Proof analysis in modal logic. Journal of Philosophical Logic, 34:507–544, 2005.
11. C. S. Peirce. Lowell Lectures of 1903. Lecture IV. Manuscript at the Houghton Library of Harvard University, 1903. (R 467)
12. C. S. Peirce. Lowell Lectures of 1903. Syllabus for Certain Topics of Logic. Manuscript at the Houghton Library of Harvard University, 1903. (R 478)
13. A.-V. Pietarinen. Peirce’s diagrammatic logic in IF perspective. In: A. Blackwell, K. Marriott and A. Shimojima (eds.), Diagrammatic Representation and Inference: Third International Conference, Diagrams 2004. LNAI, vol. 2980, pp. 97–111. Springer-Verlag, Berlin, 2004. doi:10.1007/978-3-540-25931-2_11
14. A.-V. Pietarinen. Signs of Logic: Peircean Themes on the Philosophy of Language, Games, and Communication. Springer, Dordrecht, 2006.
15. A.-V. Pietarinen. Moving Pictures of Thought II: Graphs, Games, and Pragmaticism’s Proof. Semiotica, 2011(186), 315–331, 2011. doi:10.1515/semi.2011.058
16. A.-V. Pietarinen. Extensions of Euler Diagrams in Peirce’s Four Manuscripts on Logical Graphs. In: M. Jamnik, Y. Uesaka and S. E. Schwartz (eds.), Diagrammatic Representation and Inference: Ninth International Conference, Diagrams 2016. LNAI, vol. 9781, pp. 139–156. Springer-Verlag, Berlin, 2016. doi:10.1007/978-3-319-42333-3_11
17. D. D. Roberts. The Existential Graphs of Charles S. Peirce. Mouton, The Hague, 1973.
18. J. Seligman. The Logic of Correct Description. In: M. de Rijke (ed.), Advances in Intensional Logic, pp. 107–135. Kluwer, Dordrecht, 1997. doi:10.1007/978-94-015-8879-9_5
19. L. Straßburger. Deep Inference for Hybrid Logic. Proceedings of International Workshop of Hybrid Logic 2007, pp. 13–22.
20. J. F. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, 1984. doi:10.1016/0004-3702(88)90069-0
21. C. Stewart and P. Stouppa. A systematic proof theory for several modal logics. In: R. Schmidt, I. Pratt-Hartmann, M. Reynolds and H. Wansing (eds.), Advances in Modal Logic, vol. 5, pp. 309–333. King’s College Publications, London, 2005.
22. P. Stouppa. A deep inference system for the modal logic S5. Studia Logica, 85(2):199–214, 2007. doi:10.1007/s11225-007-9028-y
23. H. Wansing. Sequent systems for modal logics. In: D. Gabbay and F. Guenthner (eds.), Handbook of Philosophical Logic, vol. 8, 2nd edition, pp. 61–145. Kluwer, Dordrecht, 2002. doi:10.1007/978-94-010-0387-2_2
24. J. Zeman. The Graphical Logic of Charles S. Peirce. Ph.D. dissertation. University of Chicago, 1964.
25. J. Zeman. Peirce’s Graphs. In: D. Lukose et al. (eds.), Proceedings of Fifth International Conference on Conceptual Structures, LNCS 1257, pp. 12–24. Springer-Verlag, Berlin, 1997. doi:10.1007/BFb0027877

Chapter 9

CATEGORICAL ABSTRACT ALGEBRAIC LOGIC: MEET-COMBINATION OF LOGICAL SYSTEMS

George Voutsadakis
School of Mathematics and Computer Science, Lake Superior State University, Sault Sainte Marie, MI 49783, USA

ABSTRACT

The widespread and rapid proliferation of logical systems in several areas of computer science has led to a resurgence of interest in various methods for combining logical systems and in investigations into the properties inherited by the resulting combinations. One of the oldest such methods is fibring. In fibring the shared connectives of the combined logics inherit properties from both component logical systems, and this often leads to inconsistencies. To deal with such undesired effects, Sernadas et al. (2011, 2012) have recently introduced a novel way of combining logics, called meet-combination, in which the combined connectives share only the common logical properties they enjoy in the component systems. In their investigations they provide a sound and concretely complete calculus for the meet-combination based on available sound and complete calculi for the component systems. In this work, an effort is made to abstract those results to a categorical level amenable to categorical abstract algebraic logic techniques.

Citation: (APA): Voutsadakis, G. (2013). Categorical Abstract Algebraic Logic: Meet-Combination of Logical Systems. Journal of Mathematics, 2013. (9 pages).
Copyright: © Creative Commons Attribution 3.0 Unported (https://creativecommons.org/licenses/by/3.0/).

INTRODUCTION

The widespread and rapid proliferation of logical systems in several areas of computer science has led to a resurgence of interest in various methods for combining logical systems and in investigations into the properties inherited by the resulting combinations. One of the oldest methods for combining connectives is fibring [1]. In fibring one combines two logical systems by possibly imposing some sharing of common connectives or identification of connectives from the constituent logical systems. When such interaction occurs, the combined connectives inherit all properties of the components from both logical systems, and this often leads to inconsistencies. A typical example of this strong interaction is the combination of an intuitionistic negation from one logical system with a classical negation from another. The combined connective behaves like a classical negation, and this outcome defeats any intended purpose for the combination.

Fibring has been studied substantially since its original introduction, and both its virtues and its vices are relatively well understood. For instance, in [2] fibring was presented as a categorical construction (see also [3]), in [4] fibred logical systems were investigated from the point of view of preserving completeness, in [5] some work was carried out on the effect of fibring in logics belonging to specific classes of the classical abstract algebraic logic hierarchy [6–8], and more recently, in [9] fibring was employed to obtain some modal logics, first considered in [10], in a structured way and to draw some conclusions regarding their algebraic character.

To avoid some of the drawbacks and undesired effects involved in the application of fibring, Sernadas et al. [11, 12] recently introduced another way of combining logical systems, called meet-combination, in which the combined connectives, instead of inheriting all properties they enjoy in the component logical systems, inherit only those properties that are common to both connectives. A very illuminating example of how this differs from the fibring method is the result of combining two logics, one including a classical conjunction ∧ and one including a classical disjunction ∨, with the intention of obtaining a combined connective “identifying” these two connectives from the component logics. Roughly speaking, if fibring is used, then, since in the combination the combined connective [∧∨] has all properties that are enjoyed by each of the connectives in either logic, the derivation

(1) shows that in the combined logic a single formula entails all other formulas; that is, there are only two possible theories, the empty theory and the entire set of formulas. On the other hand, this derivation would not be valid in the meet-combination of the two logics, since the Property of Disjunction and the Property of Conjunction used above, which hold in the logic with ∨ and in the logic with ∧, respectively, are not shared by ∧ and by ∨, respectively. Commutativity, however, is a shared property, whence the rule 𝜙[∧∨]𝜓/𝜓[∧∨]𝜙 is a derived rule of the meet-combination.

In [11] Sernadas et al. start from a given logical system with a Hilbert style calculus and with a matrix semantics and define a new logic that incorporates all meet-combinations of connectives of the same arity. Moreover, this system includes in a canonical way the connectives of the original logical system. Roughly speaking, the Hilbert calculus of the combination consists of all old Hilbert rules plus two new rules that ensure that the combined connectives inherit the common properties of the component connectives and only those properties. The matrix semantics consists, also roughly speaking, of the direct squares of the matrices in the original matrix semantics. In the main results, [11, Theorems 3.9 and 3.13], it is shown that soundness and a special form of completeness, called concrete completeness, are inherited by the new logic from the original one. Moreover, Sernadas et al. [11] investigate in some detail the case of classical propositional logic, which constitutes the main motivation and paradigmatic example behind their work. Based on classical propositional calculus, they present several interesting examples, which, in addition, serve as illustrations for various sensitive points of the general theory.

In the present paper, we adapt the framework of [11] to a categorical level, using notions and techniques of categorical abstract algebraic logic [13, 14]. Our main goal is providing a framework in which, starting from


a 𝜋-institution whose closure system is axiomatized by a set of rules of inference, we may construct a new 𝜋-institution that includes, in a precise technical sense, natural transformations corresponding to meet-combinations of operations available in the original 𝜋-institution. The closure system of this new 𝜋-institution is created by essentially mimicking the process of [11] to create a new set of rules of inference, suitable for the new sentence functor, and by using this new set of rules to define the inferences in the newly created structure. Under conditions analogous to those imposed by Sernadas et al. in [11], we are also able to establish a form of soundness and a form of restricted completeness for the new system, with respect to a suitably constructed matrix system semantics, under the proviso that these properties are satisfied by the original system. We close this section by providing an outline of the contents of the paper. In Section 2, we introduce the basic notions underlying the framework in which our work will be carried out. The inspiration comes from categorical abstract algebraic logic [13, 14] and, more specifically, uses the notion of a category of natural transformations on a given sentence functor and, implicitly, many aspects of the theory of 𝑁-rule based 𝜋-institutions, where 𝑁 is a category of natural transformations on the sentence functor of the 𝜋-institution under consideration. A recent reference on this material is [15]. The reader should be aware that basic categorical notions are used rather heavily, but the elementary references to the subject [16–18] should be enough for necessary terminology and notation. In Section 3 the basic constructions that take after corresponding constructions in [11] are presented. 
Here the meet-combination of logical systems refers to logical systems based on sentence functors, whose “signatures” are categories of natural transformations on the sentence functors and whose rules of inference and model classes are all categorical in nature. The goal is to work in a framework that is amenable to categorical abstract algebraic logic methods and techniques, so as to be able to consider aspects drawing from both theories. In Sections 4 and 5, we show that a form of soundness and a form of restricted completeness are inherited by the meet-combination, subject to the condition that they are present in the components being combined. These results also yield results on conservativeness and on consistency, which are presented in Section 6.


Finally, based on the thorough work of [11], we present in Section 7 some examples showcasing various aspects of the general theory. These examples are relevant to both the theory developed in [11] and to its extension elaborated on in the present paper and, whenever appropriate, we draw attention to points where the two theories overlap and points where some differences occur.
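The collapse described earlier in this Introduction for the fibred connective [∧∨] can be spelled out step by step. The following is a plausible reconstruction under the assumption that the properties used are disjunction introduction and conjunction elimination; it is not claimed to be the displayed derivation of the original paper.

```latex
% A sketch of how a fibred connective [\land\lor] that inherits ALL properties
% of both \land and \lor makes an arbitrary \phi entail an arbitrary \psi.
\begin{align*}
\phi &\vdash \phi \lor \psi
   && \text{(disjunction introduction, a property of } \lor\text{)} \\
\phi &\vdash \phi\,[\land\lor]\,\psi
   && \text{(the same property, inherited by } [\land\lor]\text{ under fibring)} \\
\phi\,[\land\lor]\,\psi &\vdash \psi
   && \text{(conjunction elimination, inherited from } \land\text{)} \\
\phi &\vdash \psi
   && \text{(transitivity of derivability)}
\end{align*}
```

In the meet-combination neither of the two inherited steps is available, since disjunction introduction fails for ∧ and conjunction elimination fails for ∨, whereas a property such as commutativity holds for both and is therefore kept.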

BASIC FRAMEWORK

In the sequel we consider an arbitrary but fixed category Sign, called the category of signatures, and an arbitrary but fixed Set-valued functor SEN : Sign → Set, called the sentence functor. Also entering the picture in a critical way will be an arbitrary but fixed category 𝑁 of natural transformations on SEN, which we view as the clone of all algebraic operations on SEN. We remind the reader here of the precise definition of such a category, as presented, for example, in [15].

The clone of all natural transformations on SEN is defined to be the locally small category with collection of objects {SEN^𝛼 : 𝛼 an ordinal} and collection of morphisms 𝜏 : SEN^𝛼 → SEN^𝛽 the 𝛽-sequences of natural transformations 𝜏𝑖 : SEN^𝛼 → SEN. Composition

$$\mathrm{SEN}^{\alpha} \xrightarrow{\;\langle \tau_i : i<\beta\rangle\;} \mathrm{SEN}^{\beta} \xrightarrow{\;\langle \sigma_j : j<\gamma\rangle\;} \mathrm{SEN}^{\gamma} \tag{2}$$

is defined by

$$\langle \sigma_j : j<\gamma\rangle \circ \langle \tau_i : i<\beta\rangle = \langle \sigma_j(\langle \tau_i : i<\beta\rangle) : j<\gamma\rangle. \tag{3}$$

A subcategory 𝑁 of this category containing all objects of the form SEN^𝑘 for 𝑘 < 𝜔, and all projection morphisms 𝑝^{𝑘,𝑖} : SEN^𝑘 → SEN, 𝑖 < 𝑘, 𝑘 < 𝜔, with $p^{k,i}_{\Sigma} : \mathrm{SEN}(\Sigma)^k \to \mathrm{SEN}(\Sigma)$ given by

$$p^{k,i}_{\Sigma}(\vec{\phi}) = \phi_i, \quad \text{for all } \vec{\phi} \in \mathrm{SEN}(\Sigma)^k, \tag{4}$$

and such that, for every family {𝜏𝑖 : SEN^𝑘 → SEN : 𝑖 < 𝑙} of natural transformations in 𝑁, the sequence ⟨𝜏𝑖 : 𝑖 < 𝑙⟩ : SEN^𝑘 → SEN^𝑙 is also in 𝑁, is referred to as a category of natural transformations on SEN.

A natural transformation : SEN𝑛 → SEN in 𝑁 is called a constant if, for

all Σ ∈ |Sign| and all

(5)

If 𝜎 : SEN𝑛 → SEN is a constant, then we use a special notation for the value of the constant in SEN(Σ), which is independent of the choice of argument in SEN(Σ)𝑛.

An 𝑁-rule of inference or simply an 𝑁-rule is a pair of the form ⟨{𝜎0,...,𝜎𝑛−1}, 𝜏⟩, sometimes written more legibly 𝜎0,...,𝜎𝑛−1/𝜏, where 𝜎0,...,𝜎𝑛−1, 𝜏 are natural transformations in 𝑁. The elements 𝜎𝑖, 𝑖 < 𝑛, are called the premises and 𝜏 the conclusion of the rule.

An 𝑁-Hilbert calculus is a set of 𝑁-rules. Using the 𝑁-rules of such a calculus, one may define derivations of a natural transformation 𝜎 in 𝑁 from a set Δ of natural transformations in 𝑁. If the calculus is fixed and clear in a particular context, we simply write Δ ⊢ 𝜎 for such a derivation.

Given two functors SEN : Sign → Set and SEN’ : Sign’ → Set, with categories of natural transformations 𝑁, 𝑁’ on SEN, SEN’, respectively, a pair ⟨𝐹, 𝛼⟩, where 𝐹 : Sign → Sign’ is a functor and 𝛼 : SEN → SEN’ ∘ 𝐹 is a natural transformation, is called a translation from SEN to SEN’. Moreover, it is said to be (𝑁, 𝑁’)-epimorphic if there exists a correspondence 𝜎 ⟼ 𝜎’ between the natural transformations in 𝑁 and 𝑁’ that preserves projections (and, thus, also arities), such that, for all 𝜎 : SEN𝑘 → SEN in 𝑁, all Σ ∈ |Sign| and all 𝜑 ∈ SEN(Σ)𝑘,

$$\alpha_{\Sigma}(\sigma_{\Sigma}(\varphi)) = \sigma'_{F(\Sigma)}(\alpha_{\Sigma}^{k}(\varphi)). \tag{6}$$

An (𝑁, 𝑁’)-epimorphic translation from SEN to SEN’ will be denoted by ⟨𝐹, 𝛼⟩ : SEN → SEN’, with the relevant categories 𝑁, 𝑁’ of natural transformations on SEN, SEN’, respectively, understood from context.
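Derivability from a set Δ by means of a set of rules can be pictured as a closure computation. The following Python sketch is only an illustration under strong simplifying assumptions: natural transformations are treated as opaque labels, rules are applied as given (no instantiation or composition), and the names `Rule`, `derivable`, `derives`, and `calculus` are hypothetical and do not come from the paper.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical encoding: a rule is (premises, conclusion); natural
# transformations are represented only by their names.
Rule = Tuple[FrozenSet[str], str]


def derivable(rules: Iterable[Rule], delta: Iterable[str]) -> Set[str]:
    """Return every label obtainable from `delta` by repeatedly applying `rules`."""
    rule_list: List[Rule] = list(rules)
    closed: Set[str] = set(delta)
    changed = True
    while changed:  # iterate until no rule adds a new conclusion
        changed = False
        for premises, conclusion in rule_list:
            if premises <= closed and conclusion not in closed:
                closed.add(conclusion)
                changed = True
    return closed


def derives(rules: Iterable[Rule], delta: Iterable[str], sigma: str) -> bool:
    """Check whether `sigma` is derivable from `delta` in this toy sense."""
    return sigma in derivable(rules, delta)


# A toy calculus: sigma0, sigma1 / tau ; tau / rho ; and one premise-free rule.
calculus: List[Rule] = [
    (frozenset({"sigma0", "sigma1"}), "tau"),
    (frozenset({"tau"}), "rho"),
    (frozenset(), "axiom"),
]
```

For instance, `derives(calculus, {"sigma0", "sigma1"}, "rho")` chains the first two rules, whereas `{"sigma0"}` alone does not yield `tau`.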

An 𝑁-algebraic system consists of

• a functor SEN’ : Sign’ → Set, with a category 𝑁’ of natural transformations on SEN’;
• an (𝑁, 𝑁’)-epimorphic translation ⟨𝐹, 𝛼⟩ : SEN → SEN’.

An 𝑁-matrix system or, simply, 𝑁-matrix is a pair consisting of

• an 𝑁-algebraic system;
• an axiom family 𝑇 ∈ AxFam(SEN’) on SEN’, that is, a collection 𝑇 = {𝑇Σ}Σ∈|Sign’| of subsets 𝑇Σ ⊆ SEN’(Σ), Σ ∈ |Sign’|.

We think of the elements of SEN’(𝐹(Σ)) as truth values for evaluating the natural transformations in 𝑁 and of those of 𝑇𝐹(Σ) as being the designated ones. An 𝑁-matrix semantics is a class of 𝑁-matrices. Given a natural transformation 𝜎 : SEN𝑘 → SEN in 𝑁, we set (7)

where 𝑓 ∈ Sign(Σ, Σ’) and 𝜎 at

semantics

. The matrix

under

, written

satisfies

, written

, if

. An 𝑁-rule 𝜎0,...,𝜎𝑛−1/𝜏 is a rule of an 𝑁-matrix (8)

if

, for all 𝑖 < 𝑛, implies

, for every 𝑁- matrix

, all Σ ∈ |Sign|, all Σ-assignments in , and all 𝑓 ∈ Sign(Σ, Σ’). If the semantics is clear from context, we simply write 𝜎0 ,...,𝜎𝑛−1 ⊨ 𝜏.

In the remainder of this paper, by a logical system, or simply a logic, we understand a pentuple whose components are, in order, as follows:

• Sign is a category;
• SEN : Sign → Set is a sentence functor;
• 𝑁 is a category of natural transformations on SEN;
• the fourth component is an 𝑁-Hilbert calculus;
• the fifth component is an 𝑁-matrix semantics.

MEET-COMBINATIONS

Let a logical system be given. We define its product logical system or, simply, product logic as follows. The product logic has the same signature category Sign as the given logical system.

The sentence functor SEN× : Sign → Set is defined by setting

$$\mathrm{SEN}^{\times}(\Sigma) = \mathrm{SEN}(\Sigma) \times \mathrm{SEN}(\Sigma) \tag{9}$$

for all Σ ∈ |Sign|, and, similarly, for morphisms.

The category 𝑁× of natural transformations on SEN× has the same objects as 𝑁 and its morphisms (SEN × SEN) ≅ SEN𝑛 × SEN𝑛 into SEN

× SEN are pairs of natural transformations 𝜎’, 𝜎’’ : SEN𝑛 → SEN in 𝑁. We call the members of 𝑁× the combined natural transformations or combined operations or, following [11], but rather apologetic for abusing terminology, combined connectives. Given : SEN𝑘 → SEN in 𝑁, we set given in 𝑁×, we set

in 𝑁× and, accordingly, (10)

Every 𝑁-rule 𝑟 = 𝜎0 ,...,𝜎𝑛−1/𝜏 gives rise to an 𝑁×-rule (11) The calculus is an “enrichment” of in the sense that it contains all , and some additional 𝑁×-rules devised for rules of the form , for dealing with the combined operations: •



for each

in 𝑁×, the lifting rule (LFT)

(12) is included in to enforce inheritance by in of all the ’ ’’ common properties of 𝜎 and 𝜎 in ; in 𝑁×, the special for each constant colifting rules (cLFT)

(13) are included in to enforce that should enjoy in only those properties that are common properties of 𝜎’ and 𝜎’’ in . The reason for allowing only the special co-lifting rules (i.e., ones that admit only constants), rather than the (general) co-lifting rules, is that, unless this restriction is imposed, the rules are not in general sound. This will become apparent in the analysis to follow. Before introducing the semantics of , we show, following [11], ’ ’’ that given constant natural transformations 𝜎 , 𝜎 : (SEN×)𝑘 → SEN× in 𝑁, the two combined constructors ⟨𝜎’, 𝜎’’⟩ and ⟨𝜎’’, 𝜎’⟩ are closely related.

Theorem 1 (Sernadas, Sernadas, and Rasga). Let be a logical system. Consider a constant natural transformation and

are interderivable in

.

in 𝑁× and set

. Then

Proof. Apply first cLFT twice and then LFT, in each direction. One gets the following proof:

(14) and be 𝑁-algebraic Let systems with the same underlying sentence functors and the same signature functor component : Sign → Sign’. Let SEN’× = SEN’ × SEN’ : Sign’ → Set be defined, for all Σ ∈ |Sign’|, by (15)

and similarly for morphisms, and let ⟨𝐹, 𝛼 × 𝛽⟩ : SEN → SEN’× be given, for all Σ ∈ |Sign|, by ×

(16) Denote by

the 𝑁×-algebraic system

(17) and

Moreover, given two 𝑁-matrix systems , let

(18) where

, such that, for all Σ ∈ |Sign’|, (19)

The semantics is the class consisting of all 𝑁×-matrix systems of , for having underlying 𝑁-algebraic systems the form respectively, with the same underlying sentence functors and the same signature functor components. The semantics will be called the product semantics, taking after [11]. Finally, we let ⊢× and ⊨× stand for satisfaction and entailment in the . product logic

SOUNDNESS Recall that, given a natural transformation : SEN𝑘 → SEN in 𝑁, we use the notation to denote the natural transformation in 𝑁×. Proposition 2. Let system and consider the product system in

Suppose that where the 𝑖th component Moreover,

for

be a logical .

, and for all

, . Then (20)

all all Σ’ ∈ |Sign| and all 𝑓 ∈ Sign(Σ, Σ’),

(21)

Proof. We have the following equivalences: (22) iff iff

iff This proves the Proposition.

Proposition 3. Let 𝑁-rule ⟨{𝜎0 ,...,𝜎𝑛−1}, 𝜏⟩ is sound in is sound in . Proof.

Suppose

that are in

be a logical system. If the , then the 𝑁×-rule , and

so

that

, and for all 𝑖 < 𝑛. Then, by Proposition 2,

such that

(23)

for all 𝑖 < 𝑛. Thus, by soundness of ⟨{𝜎 ,...,𝜎 }, 𝜏⟩ in 0

and

, we get that

𝑛−1

. Therefore, again by Proposition 2,

and, hence,

is sound for

be

Let

in

and

of ,

.

𝑁× and suppose that . Then, by the definition

(24) Proposition 4. Let lifting rule LFT is sound in Proof. Suppose that

be a logical system. The . in

and

, such that, for some

and

, (25)

This

implies

and

that .

These

imply

that

, whence, by (24),

(26) This proves the soundness of lifting. Let Sign be a category and SEN : Sign → Set a sentence functor with 𝑁 a category of natural transformations on SEN. Recall that a natural transformation 𝜎 : SEN𝑘 → SEN is called a constant if, for all Σ ∈ |Sign|, all and that we use the notation

for this value, which is independent of

,

.

is said to be a 𝑐-semantics if, for all A class of 𝑁-matrix systems and in , every 𝑘 constant 𝜎 : SEN → SEN in 𝑁 and all Σ ∈ |Sign|, (27) is a 𝑐-semantics if and only if every constant Intuitively, a semantics is consistently interpreted as true or false under all matrix systems in the semantics, that is, under all combinations of interpretations and designated truth values included in the semantics. Proposition 5. Let is a 𝑐-semantics. For all constants where × 𝑁 , the special co-lifting rules

be a logical system, in

(28) are sound in Proof. Let

. a constant in 𝑁×, Σ ∈ |Sign|, and , such that, for some Σ’ ∈ |Sign| and 𝑓 ∈ Sign(Σ, Σ’), (29)

Then (recalling the notation for constants) whence

,

(30) Since

is a 𝑐-semantics, we get that the four following relations hold:

(31) Therefore, we obtain that (32) which show that the special co-lifting rules are sound in

.

be a logical Theorem 6 (soundness). Let is a 𝑐-semantics. If is sound, then the product logic system, where is also sound. Proof. We have shown in Proposition 3 that all rules inherited by are . By Proposition 4, the lifting rule is sound in and, since sound in is assumed to be a 𝑐-semantics, by Proposition 5, the special co-lifting rules are sound in . Therefore the product logic is also sound.

C-COMPLETENESS A logic is 𝑐-complete if it is complete with respect to constant natural transformations. More precisely, for all sets Δ ∪ {𝜎} of constants in 𝑁, we have that (33)

Proposition 7. If a logic for all sets of constants Δ ∪ {𝜎} in 𝑁,

is 𝑐-complete, then, (34)

. Then, since includes all 𝑁×-rules of the form Proof. Suppose that , for all , we get that . Therefore, by the 𝑐-completeness of , we

get that . Thus, there exists a model , together with and and such that showing that

. Hence, the model

and is also 𝑐-complete.

Proposition 8. Let that, for some Δ ∪ {𝜎, 𝜏} in 𝑁,

, such that is

. Therefore

,

be a logic and suppose

(35)

Then it is also the case that (36) . By the lifting rule, we must have Proof. Suppose that or . Therefore, by hypothesis, . Suppose, without loss of generality, that the first holds. Thus, there exists a model and

that

, such

(37) Thus, we must have

(38) or

(39)

This implies that either concludes the proof.

or

bears witness to

and

To formulate the following proposition we introduce a convenient notation: given a set Δ of natural transformations in 𝑁×, we write (40)

Proposition 9. Let some set of constants

be a logic and suppose for in 𝑁

×

(41)

Then it is also the case that (42) then, by the special co-lifting property,

Proof. If

. Thus, by hypothesis,

. Hence, there exists and 𝑓 ∈ Sign(Σ, Σ’),

such that

(43) while, at the same time, (44) These relations imply that

(45) whence

.

Theorem 10 (𝑐-completeness). If the logic is 𝑐-complete, then the product logic 𝑐-complete also.

is

Proof. If is 𝑐-complete, then, by Proposition 7, we get that, for all sets of constants Δ ∪ {𝜎} in 𝑁, (46)

Thus, by Proposition 8, for all sets of constants s Δ ∪ {𝜎, 𝜏} in 𝑁, (47)

𝑁, ×

in

Finally, by Proposition 9, we get that, for all sets of constants

(48) This proves that

is 𝑐-complete.

CONSERVATIVENESS AND CONSISTENCY Theorem 11 (conservativeness). Let For every set of natural transformations Δ ∪ {𝜎} in 𝑁, Proof. Suppose

. If

some we get that and, therefore,

be a logic.

(49) is such that, for then,

, whence, by the hypothesis, . This shows that Δ ⊨ 𝜎.

Theorem 12 (consistency). If the logic consistent, then so is the product logic

Proof. This follows directly from conservativeness.

, is .

EXAMPLES FROM CLASSICAL PROPOSITIONAL LOGIC

We present a simple example, essentially borrowed from [11], with a twofold goal: first, to see how the theory of [11] can be easily accommodated in the categorical framework (where it actually becomes a trivial case) and, second, to showcase the difference between the soundness of special co-lifting and the lack of soundness that results from allowing the full power of the general co-lifting rule. Suppose, first, that we are given a logic such that 𝑁 contains two binary natural transformations ∧, ∨ : SEN2 → SEN and two constants T, F that obey the usual laws of conjunction, disjunction, truth, and falsity of classical propositional logic. Then, if A, B ∈ {T, F}2, we have that (50)

This can be shown by observing that the hypothesis yields, by special co-lifting, and . These, by following usual derivations in , yield and , whence, by lifting, to consist, we finally obtain the conclusion. In fact, if we arrange for essentially, of Boolean algebras and evaluations together with Boolean filters, it is the case that (51) where are the two projection natural transformations; that is, “commutativity” is valid in general, not just for constants. However, the derivation (50) cannot be inferred directly from this using 𝑐-completeness, since there are nonconstant natural transformations involved. To illustrate, using the same example, that the general co-lifting rule fails, we may employ Boolean models to show that (52) In fact, note that (53) whereas

(54) the first belonging to the product filter of 2-element Boolean algebras, the second failing to do so. Note, next, that (55) A straightforward computation shows that in the direct product of 2-element Boolean algebras, the left-hand side evaluates to (1, 1), whereas the right-hand side to (1, 0). Even though this serves as a counterexample for an analog of Theorem 1 concerning the exchangeability of components in the context of [11], this problem does not arise in our context. In fact, our reformulation of [11, Theorem 2.1] in the form of Theorem 1 would only ensure that (56) , one has the, possibly Suppose now that in derived, rule 𝜎/𝜏, where 𝜎, 𝜏 are both constants in 𝑁. Then it can be shown that (57) follows from the special co-lifting, whereas lifting In fact, helps establish the opposite direction

(58) Finally, if one has available in a disjunction ∨ and an implication →, both behaving classically, then, since both derived rules (59) are rules of of lifting.

, one obtains the rule

in

by an application
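The contrast between lifting and general co-lifting discussed in this section can be checked by brute force in the direct product of two copies of the two-element Boolean algebra. The sketch below is an illustration under an assumption, namely that a combined operation ⟨𝜎’, 𝜎’’⟩ acts componentwise, with 𝜎’ on first coordinates and 𝜎’’ on second coordinates, and that (1, 1) is the only designated value. All Python names are hypothetical.

```python
from itertools import product


def conj(a: int, b: int) -> int:
    """Conjunction in the two-element Boolean algebra {0, 1}."""
    return a & b


def disj(a: int, b: int) -> int:
    """Disjunction in the two-element Boolean algebra {0, 1}."""
    return a | b


def combine(op1, op2):
    """Combined operation <op1, op2>: op1 on first coordinates, op2 on second."""
    def combined(x, y):
        return (op1(x[0], y[0]), op2(x[1], y[1]))
    return combined


def designated(value) -> bool:
    """Product filter: only (1, 1) is designated."""
    return value == (1, 1)


meet_and_or = combine(conj, disj)  # the combined operation <∧, ∨>
lifted_and = combine(conj, conj)   # ∧ viewed in the product as <∧, ∧>
lifted_or = combine(disj, disj)    # ∨ viewed in the product as <∨, ∨>

PAIRS = list(product((0, 1), repeat=2))  # all values of the product algebra

# Lifting-style check: if both <∧, ∧>(x, y) and <∨, ∨>(x, y) are designated,
# then so is <∧, ∨>(x, y).
lifting_holds = all(
    designated(meet_and_or(x, y))
    for x in PAIRS
    for y in PAIRS
    if designated(lifted_and(x, y)) and designated(lifted_or(x, y))
)

# General co-lifting fails: <∧, ∨>(x, y) can be designated while <∧, ∧>(x, y) is not.
colifting_counterexamples = [
    (x, y)
    for x in PAIRS
    for y in PAIRS
    if designated(meet_and_or(x, y)) and not designated(lifted_and(x, y))
]
```

With x = (1, 1) and y = (1, 0), the combined operation evaluates to (1, 1) while the lifted conjunction evaluates to (1, 0), which is the kind of discrepancy that makes the unrestricted co-lifting rule unsound for nonconstant arguments.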

We close with a generally phrased (rather informally formulated) problem that would be of interest in the context developed in the present work from the point of view of abstract algebraic logic. For more details on

the motivations and the state of the art in that theory, as well as the precise definitions and more insights on the notions employed in the phrasing of this problem, the reader is referred to [13–15] and further references therein. Problem for Investigation. Suppose that we have some knowledge , about the algebraic classification of the 𝜋-institution is a logic in the sense of the present paper, where is possibly satisfying some additional conditions. The closure system the system induced by the set of 𝑁-rules, as detailed in, for example, [15]. What corresponding information may then be drawn about the 𝜋-institution , that corresponds, in a similar manner, to the product ? logic

REFERENCES

1. D. M. Gabbay, “Fibred semantics and the weaving of logics. I. Modal and intuitionistic logics,” Journal of Symbolic Logic, vol. 61, no. 4, pp. 1057–1120, 1996.
2. A. Sernadas, C. Sernadas, and C. Caleiro, “Fibring of logics as a categorial construction,” Journal of Logic and Computation, vol. 9, no. 2, pp. 149–179, 1999.
3. C. Caleiro, W. Carnielli, J. Rasga, and C. Sernadas, “Fibring of logics as a universal construction,” in Handbook of Philosophical Logic, vol. 13, pp. 123–187, 2nd edition, 2005.
4. A. Zanardo, A. Sernadas, and C. Sernadas, “Fibring: completeness preservation,” Journal of Symbolic Logic, vol. 66, no. 1, pp. 414–439, 2001.
5. V. L. Fernández and M. E. Coniglio, “Fibring in the Leibniz hierarchy,” Logic Journal of the IGPL, vol. 15, no. 5-6, pp. 475–501, 2007.
6. W. J. Blok and D. Pigozzi, “Algebraizable logics,” Memoirs of the American Mathematical Society, vol. 77, no. 396, 1989.
7. J. Czelakowski, Protoalgebraic Logics, vol. 10 of Trends in Logic: Studia Logica Library, Kluwer Academic, Dordrecht, The Netherlands, 2001.
8. J. M. Font, R. Jansana, and D. Pigozzi, “A survey of abstract algebraic logic,” Studia Logica, vol. 74, no. 1-2, pp. 13–97, 2003.
9. M. A. Martins and G. Voutsadakis, “Malinowski Modalization, Modalization through Fibring and the Leibniz Hierarchy,” http://www.voutsadakis.com/RESEARCH/papers.html.
10. J. Malinowski, “Modal equivalential logics,” The Journal of Non-Classical Logic, vol. 3, no. 2, pp. 13–35, 1986.
11. A. Sernadas, C. Sernadas, and J. Rasga, “On combined connectives,” Logica Universalis, vol. 5, no. 2, pp. 205–224, 2011.
12. A. Sernadas, C. Sernadas, and J. Rasga, “On meet-combination of logics,” Journal of Logic and Computation, vol. 22, no. 6, pp. 1453–1470, 2012.
13. G. Voutsadakis, “Categorical abstract algebraic logic: algebraizable institutions,” Applied Categorical Structures, vol. 10, no. 6, pp. 531–568, 2002.
14. G. Voutsadakis, “Categorical abstract algebraic logic: equivalent institutions,” Studia Logica, vol. 74, no. 1-2, pp. 275–311, 2003.
15. G. Voutsadakis, “Categorical abstract algebraic logic: algebraic semantics for 𝜋-institutions,” Mathematical Logic Quarterly. In press.
16. M. Barr and C. Wells, Category Theory for Computing Science, Les Publications CRM, Montreal, Canada, 3rd edition, 1999.
17. F. Borceux, Handbook of Categorical Algebra, Vol. I, Encyclopedia of Mathematics and Its Applications, Cambridge University Press, Cambridge, UK, 1994.
18. S. Mac Lane, Categories for the Working Mathematician, Springer, New York, NY, USA, 1971.

Chapter 10

FUZZY LOGIC VERSUS CLASSICAL LOGIC: AN EXAMPLE IN MULTIPLICATIVE IDEAL THEORY

Olivier A. Heubo-Kwegna

Department of Mathematical Sciences, Saginaw Valley State University, 7400 Bay Road, University Center, MI 48710-0001, USA

ABSTRACT

We discuss a fuzzy result by displaying an example that shows how a classical argument fails to work when one passes from classical logic to fuzzy logic. Precisely, we present an example to show that, in the fuzzy context, the fact that the supremum is naturally used in lieu of the union can alter an argument that may work in the classical context.

Citation: (APA): Heubo-Kwegna, O. A. (2016). Fuzzy Logic versus Classical Logic: An Example in Multiplicative Ideal Theory. Advances in Fuzzy Systems, 2016. (5 pages).

Copyright: © Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

INTRODUCTION

In 1971, Rosenfeld became the first classical algebraist to introduce fuzzy algebra, with a paper on fuzzy groups [1]. The introduction of fuzzy groups then motivated several researchers to extend the seminal work of Zadeh [2] on fuzzy subsets of a set to algebraic structures such as rings and modules [3–7]. In that regard, Lee and Mordeson [3, 4] introduced the notions of fractionary fuzzy ideal and invertible fractionary fuzzy ideal and used them to characterize Dedekind domains in terms of the invertibility of certain fractionary fuzzy ideals, leading to the fuzzification of one of the main results in multiplicative ideal theory. Other significant notions introduced to tackle the fuzzification of multiplicative ideal theory are the fuzzy star operation [8] and the fuzzy semistar operation [9, 10] on integral domains.

This paper is concerned with the fuzzification of multiplicative ideal theory in commutative algebra (see, e.g., [1, 3, 5, 8–11]). In commutative ring theory, it is customary to use star operations not only to generalize classical domains, but also to produce a common treatment and deeper understanding of those domains. Examples are the notion of Prüfer ⋆-multiplication domain, which generalizes the notion of Prüfer domain [12], and the notion of ⋆-completely integrally closed domain, which generalizes the notion of completely integrally closed domain [13, 14]. The importance of star operations in the classical theory has led scholars to take an interest in the fuzzy star operations introduced in [8], which were generalized to fuzzy semistar operations in [10]; this generalization has led to the fuzzification of further main results in multiplicative ideal theory.

In this note, we focus on some classical arguments of multiplicative ideal theory that do not hold in the fuzzy context. The example is chosen to show that what appears to be simple, even easy, in the context of classical logic may not be true in the fuzzy context. So, one challenge of fuzzification is to detect any defect or incongruous statement that may at first appear benign but in fact invalidates the argument used to prove a fuzzy statement. Precisely, our example shows how the natural definition in the fuzzy context can be more challenging to work with than its equivalent classical definition. For an overview of all definitions of fuzzy submodules, fuzzy ideals, and fuzzy (semi)star operations (of finite character), the reader may refer to [8–10, 15].

PRELIMINARIES AND NOTATIONS

Recall that an integral domain 𝑅 is a commutative ring with identity and no zero divisors. Hence, its quotient ring 𝐿 is a field. An abelian group (𝑀, +) is an 𝑅-module if there is a mapping 𝑅 × 𝑀 → 𝑀, (𝑟, 𝑥) ⟼ 𝑟𝑥, satisfying the following conditions: 1𝑥 = 𝑥; 𝑟(𝑥 − 𝑦) = 𝑟𝑥 − 𝑟𝑦; and (𝑟𝑡)𝑥 = 𝑟(𝑡𝑥) for all 𝑟, 𝑡 ∈ 𝑅 and 𝑥, 𝑦 ∈ 𝑀, where 1 is the identity of 𝑅. Note that the quotient field 𝐾 of an integral domain 𝑅 is an 𝑅-module. An 𝑅-submodule 𝑁 of an 𝑅-module 𝑀 is a subgroup of 𝑀 such that 𝑟𝑥 ∈ 𝑁 for all 𝑟 ∈ 𝑅 and 𝑥 ∈ 𝑁. For more reading on integral domains and modules, the reader may refer to [7, 15].

Recall also that a star operation on 𝑅 is a mapping 𝐴 ⟼ 𝐴⋆ of 𝐹(𝑅) into 𝐹(𝑅) such that, for all 𝐴, 𝐵 ∈ 𝐹(𝑅) and for all 𝑎 ∈ 𝐿 \ {0},

• (𝑎)⋆ = (𝑎) and (𝑎𝐴)⋆ = 𝑎𝐴⋆;
• 𝐴 ⊆ 𝐵 ⇒ 𝐴⋆ ⊆ 𝐵⋆;
• 𝐴 ⊆ 𝐴⋆ and 𝐴⋆⋆ ≔ (𝐴⋆)⋆ = 𝐴⋆.

For an overview of star operations, the reader may refer to [15, Sections 32 and 34].

A fuzzy subset of 𝐿 is a function from 𝐿 into the real closed interval [0, 1]. We say 𝛼 ⊆ 𝛽 if 𝛼(𝑥) ≤ 𝛽(𝑥) for all 𝑥 ∈ 𝐿. The intersection ⋂𝑖∈𝐼 𝛼𝑖 of the fuzzy subsets 𝛼𝑖 is defined by (⋂𝑖∈𝐼 𝛼𝑖)(𝑥) = ⋀𝑖∈𝐼 𝛼𝑖(𝑥), and the union ⋃𝑖∈𝐼 𝛼𝑖 of the fuzzy subsets 𝛼𝑖 is defined by (⋃𝑖∈𝐼 𝛼𝑖)(𝑥) = ⋁𝑖∈𝐼 𝛼𝑖(𝑥), for every 𝑥 ∈ 𝐿. Let 𝛽𝑡 = {𝑥 ∈ 𝐿 : 𝛽(𝑥) ≥ 𝑡}; then, 𝛽𝑡 is called a level subset of 𝛽. We let 𝜒𝐴 denote the characteristic function of the subset 𝐴 of 𝑅.

A fuzzy subset 𝛽 of 𝐿 is a fuzzy 𝑅-submodule of 𝐿 if 𝛽(𝑥 − 𝑦) ≥ 𝛽(𝑥) ∧ 𝛽(𝑦), 𝛽(𝑟𝑥) ≥ 𝛽(𝑥), and 𝛽(0) = 1, for every 𝑥, 𝑦 ∈ 𝐿 and every 𝑟 ∈ 𝑅. Note that a fuzzy subset 𝛽 of 𝐿 is a fuzzy 𝑅-submodule of 𝐿 if and only if 𝛽(0) = 1 and 𝛽𝑡 is an 𝑅-submodule of 𝐿 for every real number 𝑡 in [0, 1]. Let 𝑑𝑡 denote the fuzzy subset of 𝐿 defined as follows: for each 𝑥 in 𝐿, 𝑑𝑡(𝑥) = 𝑡 if 𝑥 = 𝑑 and 𝑑𝑡(𝑥) = 0 otherwise. We call 𝑑𝑡 a fuzzy singleton. A fuzzy 𝑅-submodule 𝛽 of 𝐿 is finitely generated if 𝛽 is generated by finitely many fuzzy singletons; that is, it is the smallest fuzzy 𝑅-submodule of 𝐿 containing those fuzzy singletons. Throughout this paper, 𝐹𝑧(𝑅) denotes the set of all fuzzy 𝑅-submodules of 𝐿 and 𝑓𝑧(𝑅) denotes the set of all finitely generated fuzzy 𝑅-submodules of 𝐿.

Definition 1 (see [9]). A fuzzy semistar operation on 𝑅 is a mapping ⋆ : 𝐹𝑧(𝑅) → 𝐹𝑧(𝑅), 𝛽 ⟼ 𝛽⋆, which satisfies the following three properties for all 𝛼, 𝛽 ∈ 𝐹𝑧(𝑅) and 0 ≠ 𝑑 ∈ 𝐾:

(⋆1) (𝑑1 ∘ 𝛽)⋆ = 𝑑1 ∘ 𝛽⋆;

(⋆2) 𝛼 ⊆ 𝛽 ⇒ 𝛼⋆ ⊆ 𝛽⋆;

(⋆3) 𝛽 ⊆ 𝛽⋆ and 𝛽⋆⋆ ≔ (𝛽⋆)⋆ = 𝛽⋆.
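The basic operations on fuzzy subsets recalled in this section (inclusion, union as pointwise supremum, intersection as pointwise infimum, level subsets, and fuzzy singletons) can be sketched in Python on a small finite universe. This is only an illustration: the universe, the dictionary encoding, and all function names are assumptions made for the example, and on a finite family the supremum reduces to a maximum, so the sketch cannot exhibit the behavior of infinite unions that matters later in the paper.

```python
from typing import Dict, Hashable, Iterable, List, Set

# Hypothetical encoding: a fuzzy subset is a dict from elements to [0, 1].
FuzzySubset = Dict[Hashable, float]


def is_subset(alpha: FuzzySubset, beta: FuzzySubset) -> bool:
    """alpha ⊆ beta iff alpha(x) <= beta(x) for every x."""
    return all(alpha[x] <= beta[x] for x in alpha)


def fuzzy_union(family: Iterable[FuzzySubset]) -> FuzzySubset:
    """Pointwise supremum (a maximum here, because the family is finite)."""
    members: List[FuzzySubset] = list(family)
    return {x: max(a[x] for a in members) for x in members[0]}


def fuzzy_intersection(family: Iterable[FuzzySubset]) -> FuzzySubset:
    """Pointwise infimum (a minimum here, because the family is finite)."""
    members: List[FuzzySubset] = list(family)
    return {x: min(a[x] for a in members) for x in members[0]}


def level_subset(beta: FuzzySubset, t: float) -> Set[Hashable]:
    """The level subset {x : beta(x) >= t}."""
    return {x for x, value in beta.items() if value >= t}


def singleton(d: Hashable, t: float, universe: Iterable[Hashable]) -> FuzzySubset:
    """Fuzzy singleton d_t: value t at d and 0 elsewhere."""
    return {x: (t if x == d else 0.0) for x in universe}


UNIVERSE = [0, 1, 2, 3]
one_half = singleton(1, 0.5, UNIVERSE)
two_quarter = singleton(2, 0.25, UNIVERSE)
joined = fuzzy_union([one_half, two_quarter])
met = fuzzy_intersection([one_half, two_quarter])
```

Here `joined` takes the value 0.5 at 1 and 0.25 at 2, so its level subset at 0.25 is {1, 2}, while `met` is identically zero.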

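Recall from [9] that a fuzzy semistar operation ⋆ on 𝑅 is union preserving if (⋃𝑖 𝛽𝑖)⋆ = ⋃𝑖 𝛽𝑖⋆. Note that the preservation of union on ⋆ is over a countable set. Now, define a mapping ⋆𝑓 from 𝐹𝑧(𝑅) into 𝐹𝑧(𝑅) as follows:

$$\beta^{\star_f} = \bigcup\{\gamma^{\star} \mid \gamma \in f_z(R) \text{ and } \gamma \subseteq \beta\}. \tag{1}$$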

If ⋆ is a union preserving fuzzy semistar operation on 𝑅, then ⋆𝑓 is a fuzzy semistar operation on 𝑅 [9, Theorem 3.5]. This leads to the following definition.

Definition 2 (see [9]). Let ⋆ be a fuzzy semistar operation on 𝑅.

• If ⋆𝑓 is a fuzzy semistar operation on 𝑅, then ⋆𝑓 is called the fuzzy semistar operation of finite character (or finite type) associated with ⋆.
• ⋆ is called a fuzzy semistar operation of finite character if ⋆ = ⋆𝑓.

Example 3. (1) It is clear by definition that (⋆𝑓)𝑓 = ⋆𝑓; that is, ⋆𝑓 is of finite character whenever ⋆𝑓 is a fuzzy semistar operation on 𝑅 for any fuzzy semistar operation ⋆ on 𝑅.

(2) The constant map 𝛽 ⟼ 𝜒𝐿 is also trivially a fuzzy semistar operation on 𝑅 that is not of finite character.

(3) Let ℤ denote the set of all integers with quotient field ℚ of all rational numbers. Let 𝐿 = [0, 1] be the unit interval (note that the unit interval is a completely distributive lattice). Define ⋆ on the fuzzy ℤ-submodules of ℚ by (2). Then, ⋆ is a fuzzy semistar operation on ℤ of finite character (the reader may refer to [9, Example 3.8. (2)] for the proof of this fact).

FUZZY LOGIC VERSUS CLASSICAL LOGIC: AN EXAMPLE

Recall from [9] that a fuzzy semistar operation ⋆ on 𝑅 is said to be union preserving if (⋃𝑖 𝛽𝑖)⋆ = ⋃𝑖 𝛽𝑖⋆. Note that the preservation of union on ⋆ is over a countable set. Also, recall the following result in [9].

Theorem 4 (see [9, Theorem 3.5]). Let ⋆ be a union preserving fuzzy semistar operation on 𝑅. Then, ⋆𝑓 is a fuzzy semistar operation on 𝑅.

Let 𝑅 be an integral domain with quotient field 𝐿. Recall that 𝐹𝑧(𝑅) denotes the set of all fuzzy 𝑅-submodules of 𝐿 and 𝑓𝑧(𝑅) denotes the set of finitely generated fuzzy 𝑅-submodules of 𝐿. We claim that we could not dispense with the union preserving assumption in Theorem 4, because the fuzzy counterpart of the following classical argument is not available.

The Fuzzy and Classical Statements A Classical Argument Let 𝐵 be a submodule of 𝑅 in 𝐿 and let 𝐼 be a finitely generated submodule and 𝐴 ⊆ 𝐵}, where ⋆ is a classical of 𝑅 in 𝐿 such that semistar operation on 𝑅. Then, 𝐼 is contained in some with 𝐴𝑖 ∈ (𝑅) and 𝐴𝑖 ⊆ 𝐵. This classical argument is a well-known simple argument in multiplicative ideal theory. In fact, suppose we set 𝐼 = ⟨𝑥1,...,⟩ as a finitely generated ideal such that and 𝐴 ⊆ 𝐵}. Then, for each with 𝐴𝑖 ∈ (𝑅) and 𝐴𝑖 ⊆ 𝐵. So, . Now, using the well-known facts that 𝐴 ⊆ 𝐴⋆ , for a classical semistar operation ⋆ on 𝑅, we and . Now, since each 𝐴𝑖 is finitely generated, the obtain that finite sum of 𝐴𝑖’s is also finitely generated and this completes the proof.

The Fuzzy Counterpart of the Above Classical Argument

Let 𝛽 be a fuzzy 𝑅-submodule of 𝐿 and let 𝛼 be a finitely generated fuzzy 𝑅-submodule of 𝐿 such that 𝛼 ⊆ ⋃{𝛾⋆ |𝛾 ∈ 𝑓𝑧(𝑅) and 𝛾 ⊆ 𝛽}, where ⋆ is a fuzzy semistar operation on 𝑅. Then, 𝛼 is contained in some , with 𝛾𝑖 ∈ 𝑓𝑧(𝑅) and 𝛾𝑖 ⊆ 𝛽.

A Counterexample to Negate the Fuzzy Counterpart

We now produce an example to prove that the fuzzy counterpart statement is false. The reason why the counterpart may fail is that the union in the fuzzy context is the supremum. So, the real challenge here is to construct a counterexample that clearly demonstrates the failure of the argument.

The Counterexample Let ⋆ be the fuzzy semistar operation of finite character as defined in Example 3(4): by

(9) for any

.

Let ℚ denote the field of all rational numbers, the quotient field of ℤ. We define 𝛽 : ℚ → [0, 1] (note that the unit interval is a completely distributive lattice), and we use the known fact that 𝛽 is a fuzzy ℤ-submodule of ℚ if and only if 𝛽𝑡 is a ℤ-submodule of ℚ for any 𝑡 ∈ [0, 1] and 𝛽(0) = 1. Let 𝑓 : ℤ → [0, 1] be defined by (4) where sgn is the sign function and |𝑛| denotes the absolute value of 𝑛. It is easy to see that 𝑓(𝑛) → 1 for 𝑛 → ∞ and 𝑓(𝑛) → 0 for 𝑛 → −∞. Consider an infinite sequence of ℤ-submodules of ℚ as follows: (5) Obviously, 2𝑛ℤ is a ℤ-submodule of ℚ for any 𝑛 ∈ ℤ, since 𝑟2𝑛 − 𝑠2𝑛 = (𝑟 − 𝑠)2𝑛 and 𝑟(𝑠2𝑛) = (𝑟𝑠)2𝑛. Moreover, 2𝑛ℤ ⊆ 2𝑚ℤ, whenever 𝑚 < 𝑛. Indeed, 𝑟2𝑛 = (𝑟2𝑛−𝑚)2𝑚, which implies 𝑟2𝑛 ∈ 2𝑚ℤ. Then, one can define 𝛽 for any 𝑥 ∈ ℚ by

(6)

Note that if 𝑥 ∉ 2𝑛ℤ for any 𝑛 ∈ ℤ, then 𝛽(𝑥) = 0. On the other hand,

(7)

Since 0 ∈ 2𝑛ℤ for any 𝑛 ∈ ℤ (and 0 is the unique element having this property), a consequence of the supremum is 𝛽(0) = 1. Thus, 𝛽(𝑥) = 1 if and only if 𝑥 = 0. It should be noted that if 0 < 𝛽(𝑥) < 1, then there exists 𝑛 ∈ ℤ such that 𝛽(𝑥) = 𝑓(𝑛). Indeed, it is easy to see from the definitions of 𝑓 and 𝛽 that there exist 𝑚, 𝑚’ ∈ ℤ such that 0 < 𝑓(𝑚’) < 𝛽(𝑥) < 𝑓(𝑚) < 1 (see the remark above about the convergence of 𝑓 to zero and one); therefore, 𝑥 ∈ 2𝑚’ℤ and 𝑥 ∉ 2𝑚ℤ. Hence, 𝛽(𝑥) = 𝑓(𝑛) for a suitable 𝑛 ∈ ℤ for which 𝑚’ < 𝑛 < 𝑚.
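The behavior of 𝛽 described above can be explored numerically. The sketch below is an illustration under explicit assumptions: the formula for `f` is a hypothetical stand-in (the source only tells us that 𝑓 is built from sgn and |𝑛| and tends to 1 and 0 at +∞ and −∞), chosen to be strictly increasing, and 𝛽 is read as the supremum of 𝑓(𝑛) over all 𝑛 with 𝑥 ∈ 2𝑛ℤ. The helper names are likewise invented for the example.

```python
from fractions import Fraction
from typing import Optional


def f(n: int) -> Fraction:
    """Hypothetical stand-in for f: strictly increasing, f(n) -> 1 as n -> +inf
    and f(n) -> 0 as n -> -inf, with every value strictly between 0 and 1."""
    return Fraction(1, 2) + Fraction(n, 2 * (1 + abs(n)))


def largest_exponent(x: Fraction) -> Optional[int]:
    """Largest n with x in 2^n Z for nonzero x, or None if x is in no 2^n Z."""
    p, q = abs(x.numerator), x.denominator
    if q & (q - 1):                      # denominator is not a power of two
        return None
    if q > 1:                            # x = p / 2^k with p odd
        return -(q.bit_length() - 1)
    return (p & -p).bit_length() - 1     # number of factors 2 in the integer p


def beta(x: Fraction) -> Fraction:
    """beta(x) = sup { f(n) : x in 2^n Z }, with beta(x) = 0 if there is no such n."""
    if x == 0:
        return Fraction(1)               # supremum over all n; attained by no f(n)
    n = largest_exponent(x)
    if n is None:
        return Fraction(0)
    return f(n)                          # f is increasing, so the sup is f(largest n)
```

Under these assumptions, `beta(Fraction(1, 3))` is 0, `beta(Fraction(4))` equals `f(2)`, and `beta(Fraction(0))` is 1 even though no single `f(n)` reaches 1. This is the gap between a supremum and a union that the counterexample exploits.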

To demonstrate that 𝛽 is a fuzzy ℤ-submodule of ℚ, let us first consider the cases 𝑡 = 0 and 𝑡 = 1. One can simply check that 𝛽0 = ℚ and 𝛽1 = {0}. If 𝑡 ∈ (0, 1), then there exists exactly one 𝑛 ∈ ℤ such that 𝑓(𝑛−1) < 𝑡 ≤ 𝑓(𝑛) and 𝛽𝑡 = 2𝑛ℤ. Therefore, each 𝑡-level subset of 𝛽 is a ℤ-submodule of ℚ. Since, in addition, 𝛽(0) = 1, 𝛽 is a fuzzy ℤ-submodule of ℚ. Now, let us consider

(8) where 𝛽 and ⋆ have been defined above. Put . Let us show that ⋆ . Since (𝑥) = 0 for any we have 𝛽 (𝑥) = 0; therefore, . To show the opposite inequality, let 𝑥 ∈ 𝕂 and without loss of generality let and . Consider for 𝑛 = 𝑛0

and + 1, 𝑛0 + 2, . . .. It is easy to see that 𝛼𝑛 is finitely generated ℤ-submodule of ℚ for any 𝑛 > 𝑛0. Moreover, (𝑥) = 𝛽(𝑥) and 𝛼𝑛(𝑥𝑛) = 𝛽(𝑥𝑛), where 𝛽(𝑥𝑛) > 𝛽(𝑥) > 0. Then, by definition of ⋆, we obtain

(9) Now, let us demonstrate the false argument. Let us consider 𝛼 = ⟨𝑥1⟩ for some 𝑥 ∈ 𝕂 \ {0}. Obviously, 𝛼 ∈ (ℤ) and . According to our false argument, it should be true that “since 𝛼 is finitely generated, 𝛼 is contained with 𝛾𝑖 ∈ (ℤ) and 𝛾𝑖 ⊆ 𝛽.” But this is impossible, because in finitely many for any choice of finitely many 𝛾1,...,𝛾𝑛 ∈ 𝑓𝑧(ℤ) we can find 𝑥1,...,𝑥𝑚 ∈ 𝕂 (it is sufficient to consider elements of 𝕂 that are used for generating 𝛾1,...,𝛾𝑛) such that

.

Final Remark The proof of the classical argument holds due to the fact that the classical union is involved allowing the choice of a finitely generated 𝑅-submodule of 𝐿 for each element of 𝐼. However, in the fuzzy counterpart statement, the fuzzy union is defined in terms of the supremum and the technique used in the proof of the classical argument cannot apply in the fuzzy context since clearly does not imply the existence of with . We must also note that the fuzzy counterpart statement is the natural one that grasps some thoughts about the context in which the crisp result can be extended. In fact, the condition of union preserving of fuzzy star operation, that is, , which does not always hold in the fuzzy context is not needed in the crisp case to get a classical finite character semistar operation. This additional condition of union preserving of fuzzy star operation will make our fuzzy counterpart statement true.

REFERENCES

1. A. Rosenfeld, “Fuzzy groups,” Journal of Mathematical Analysis and Applications, vol. 35, pp. 512–517, 1971.
2. L. A. Zadeh, “Fuzzy sets,” Information and Computation, vol. 8, pp. 338–353, 1965.
3. K. H. Lee and J. N. Mordeson, “Fractionary fuzzy ideals and fuzzy invertible fractionary ideals,” Fuzzy Sets and Systems, vol. 5, pp. 875–883, 1997.
4. K. H. Lee and J. N. Mordeson, “Fractionary fuzzy ideals and Dedekind domains,” Fuzzy Sets and Systems, vol. 99, no. 1, pp. 105–110, 1998.
5. W. J. Liu, “Operations on fuzzy ideals,” Fuzzy Sets and Systems, vol. 11, no. 1, pp. 31–41, 1983.
6. P. Lubczonok, “Fuzzy vector spaces,” Fuzzy Sets and Systems, vol. 38, no. 3, pp. 329–343, 1990.
7. H. Matsumura, Commutative Ring Theory, Cambridge University Press, Cambridge, UK, 1986.
8. H. Kim, M. O. Kim, S.-M. Park, and Y. S. Park, “Fuzzy star-operations on an integral domain,” Fuzzy Sets and Systems, vol. 136, no. 1, pp. 105–114, 2003.
9. O. A. Heubo-Kwegna, “Fuzzy semistar operations of finite character on integral domains,” Information Sciences, vol. 269, pp. 366–377, 2014.
10. O. A. Heubo-Kwegna, “Fuzzy semistar operations on integral domains,” Fuzzy Sets and Systems, vol. 210, pp. 117–126, 2013.
11. J. N. Mordeson and D. S. Malik, Fuzzy Commutative Algebra, World Scientific Publishing, Singapore, 1998.
12. G. W. Chang, “Prüfer ∗-multiplication domains, Nagata rings, and Kronecker function rings,” Journal of Algebra, vol. 319, no. 1, pp. 309–319, 2008.
13. D. D. Anderson, D. F. Anderson, M. Fontana, and M. Zahfrullah, “On v-domains and star operations,” Communications in Algebra, vol. 2, pp. 141–145, 2008.
14. E. G. Houston, S. B. Malik, and J. L. Mott, “Characterization of ∗-multiplication domains,” Canadian Mathematical Bulletin, vol. 27, pp. 48–52, 1984.
15. R. Gilmer, Multiplicative Ideal Theory. Corrected Reprint of the 1972 Edition, vol. 90 of Queen’s Papers in Pure and Applied Mathematics, Queen’s University, Kingston, Canada, 1992.

Chapter 11

LINK PREDICTION USING A PROBABILISTIC DESCRIPTION LOGIC

José Eduardo Ochoa Luna1, Kate Revoredo2, and Fabio Gagliardi Cozman1

1 Escola Politécnica, Universidade de São Paulo, Av. Prof. Mello Morais 2231, São Paulo, SP, Brazil
2 Departamento de Informática Aplicada, Unirio, Av. Pasteur, 458, Rio de Janeiro, RJ, Brazil

ABSTRACT

Due to the growing interest in social networks, link prediction has received significant attention. Link prediction is mostly based on graph-based features, with some recent approaches focusing on domain semantics. We propose algorithms for link prediction that use a probabilistic ontology to enhance the analysis of the domain and the unavoidable uncertainty in the task (the ontology is specified in the probabilistic description logic crALC). The scalability of the approach is investigated through a combination of semantic assumptions and graph-based features. We empirically evaluate our proposal and compare it with standard solutions in the literature.

Keywords: Link prediction, Probabilistic logic, Description logics

Citation: (APA): Luna, J. E. O., Revoredo, K., & Cozman, F. G. (2013). Link prediction using a probabilistic description logic. Journal of the Brazilian Computer Society, 19(4), 397-409. (13 pages).

Copyright: © Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0).

INTRODUCTION

Many social, biological, and information systems can be well described as networks, where nodes represent objects (individuals), and links denote the relations or interactions between nodes. Predicting a possible link in a network is an interesting issue that has received significant attention. For instance, one may be interested in finding potential friendships between two persons in a social network, or a potential collaboration between two researchers. In short, link prediction aims at predicting whether two nodes should be connected, given previous information about their relationships or interests.

Mohammad and Mohammed [18] survey representative link prediction methods, classifying them into three groups. In the first group, feature-based methods construct pairwise features to use in classification. The majority of the features are extracted from the graph topology by computing similarity based on the neighborhood of the pair of nodes, or based on ensembles of paths between the pair of nodes [15]. Semantic information has also been used as features [26, 32]. The second group includes probabilistic approaches that model the joint probability for entities in a network by Bayesian graphical models [31]. The third group employs linear algebraic approaches that compute the similarity between nodes in a network by rank-reduced similarity matrices [14].

We present an approach for link prediction that combines Bayesian graphical models and semantic-based features. Hence, our proposal belongs to the first two categories mentioned in the previous paragraph. To represent semantic-based features, we employ a probabilistic description logic called Credal ALC (crALC) [5]. This probabilistic description logic extends the popular logic ALC [27] with probabilistic inclusions. These are sentences, such as P(Professor | Researcher) = 0.4, specifying the probability that an element of the domain is a Professor given that it is a Researcher. Exact and approximate inference algorithms for crALC have been proposed [5], using ideas inherited from the theory of Relational Bayesian Networks [12]. We benefit from such algorithms, and add some techniques to make our approach scalable to real domains. We also present experimental validation of our proposal.

The paper is organized as follows. Section 2 reviews basic concepts of probabilistic description logics and of link prediction. Our proposals for a scalable semantic link prediction approach appear in Sect. 3. Section 4 describes experiments, and Sect. 5 concludes the paper and discusses some future work.

BACKGROUND

This section briefly reviews probabilistic description logics and link prediction methods, with a focus on concepts and techniques that are used later.

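Probabilistic Description Logics and crALC

Description logics (DLs) form a family of representation languages that are typically decidable fragments of first-order logic (FOL) [3]. Knowledge is expressed in terms of individuals, concepts, and roles. The semantics of a description is given by a domain $\mathcal{D}$ (a set) and an interpretation $\cdot^{\mathcal{I}}$ (a functor). Individuals represent objects through names from a set $N_I = \{a, b, \ldots\}$. Each concept in the set $N_C = \{C, D, \ldots\}$ is interpreted as a subset of the domain $\mathcal{D}$. Each role in the set $N_R = \{r, s, \ldots\}$ is interpreted as a binary relation on the domain. An assertion states that an individual belongs to a concept or that a pair of individuals satisfies a role. An ABox is a set of assertions.

A popular description logic is ALC [27]; given its importance to our proposal, we briefly review it here. Constructors in ALC are conjunction ($C \sqcap D$), disjunction ($C \sqcup D$), negation ($\neg C$), existential restriction ($\exists r.C$), and value restriction ($\forall r.C$). Concept inclusions and definitions are denoted respectively by $C \sqsubseteq D$ and $C \equiv D$, where $C$ and $D$ are concepts. Concept $C \sqcup \neg C$ is denoted by $\top$, and concept $C \sqcap \neg C$ is denoted by $\bot$. The semantics of these constructs is given by a domain $\mathcal{D}$ and an interpretation $\cdot^{\mathcal{I}}$ as follows: each individual $a$ is mapped into an element $a^{\mathcal{I}}$; each concept $C$ is mapped into a subset $C^{\mathcal{I}}$ of the domain; each role $r$ is mapped into a binary relation $r^{\mathcal{I}}$ in the domain; moreover,

• $(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$;
• $(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$;
• $(\neg C)^{\mathcal{I}} = \mathcal{D} \setminus C^{\mathcal{I}}$;
• $(\exists r.C)^{\mathcal{I}} = \{x \in \mathcal{D} \mid \exists y : (x, y) \in r^{\mathcal{I}} \wedge y \in C^{\mathcal{I}}\}$;
• $(\forall r.C)^{\mathcal{I}} = \{x \in \mathcal{D} \mid \forall y : (x, y) \in r^{\mathcal{I}} \rightarrow y \in C^{\mathcal{I}}\}$.

Finally, $C \sqsubseteq D$ is interpreted as $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $C \equiv D$ is interpreted as $C^{\mathcal{I}} = D^{\mathcal{I}}$.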
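The set-theoretic reading of the ALC constructors can be made concrete on a finite domain. The following is a minimal sketch, not part of the original paper: concepts are encoded as nested tuples, and the function name `interpret`, the tuple tags, and the toy domain are all illustrative assumptions.

```python
def interpret(concept, domain, concepts, roles):
    """Return the extension of an ALC concept over a finite domain.

    `concepts` maps concept names to sets of elements and `roles` maps role
    names to sets of pairs; both encodings are assumptions of this sketch.
    """
    kind = concept[0]
    if kind == "name":
        return set(concepts[concept[1]])
    if kind == "and":  # conjunction: intersection of extensions
        return (interpret(concept[1], domain, concepts, roles)
                & interpret(concept[2], domain, concepts, roles))
    if kind == "or":  # disjunction: union of extensions
        return (interpret(concept[1], domain, concepts, roles)
                | interpret(concept[2], domain, concepts, roles))
    if kind == "not":  # negation: complement with respect to the domain
        return set(domain) - interpret(concept[1], domain, concepts, roles)
    pairs = roles[concept[1]]
    filler = interpret(concept[2], domain, concepts, roles)
    if kind == "some":  # existential restriction
        return {x for x in domain if any((x, y) in pairs for y in filler)}
    if kind == "all":  # value restriction (vacuously true without successors)
        return {x for x in domain
                if all(y in filler for (s, y) in pairs if s == x)}
    raise ValueError(f"unknown constructor: {kind}")


# Toy interpretation (hypothetical data).
domain = {"bob", "ann", "paper"}
concepts = {"Person": {"bob", "ann"}, "BibItem": {"paper"}}
roles = {"hasPublication": {("bob", "paper")}}

person_with_item = ("and", ("name", "Person"),
                    ("some", "hasPublication", ("name", "BibItem")))
print(interpret(person_with_item, domain, concepts, roles))
```

Only `bob` is both a person and related by `hasPublication` to a bibliographic item, so the printed extension contains `bob` alone.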

An example may be useful. Consider the following concept definition:

$$\mathsf{Researcher} \equiv \mathsf{Person} \sqcap \exists \mathsf{hasPublication}.\mathsf{BibItem}, \tag{1}$$

specifying that researchers are individuals who are persons and who have published a bibliographic item.

Several probabilistic description logics have appeared in the literature [13, 17]; here we just indicate a few representative proposals. Heinsohn [11] and Sebastiani [28] consider probabilistic inclusion axioms such as

$$P_{\mathcal{D}}(\mathsf{Professor}) = \alpha,$$

meaning that a randomly selected object is a Professor with probability $\alpha$. This characterizes a domain-based semantics: probabilities are assigned to subsets of the domain $\mathcal{D}$. Sebastiani also allows inclusions such as $P(\mathsf{Professor}(\mathsf{John})) = \alpha$, specifying probabilities over the interpretations themselves. For example, one interprets $P(\mathsf{Professor}(\mathsf{John})) = 0.001$ as assigning 0.001 to be the probability of the set of interpretations where John is a Professor. The latter characterizes an interpretation-based semantics.

The probabilistic description logic crALC is a probabilistic extension of the description logic ALC that adopts an interpretation-based semantics. It keeps all constructors of ALC, but only allows concept names on the left hand side of inclusions/definitions. Additionally, in crALC one can have probabilistic inclusions such as $P(C \mid D) = \alpha$ or $P(r) = \beta$ for concepts $C$ and $D$, and for role $r$ (in this paper we only consider equality in probabilistic inclusions/definitions). If the interpretation of $D$ is the whole domain, then we simply write $P(C) = \alpha$. The semantics of these inclusions is roughly (a formal definition can be found in Ref. [5]) given by:

$$\forall x \in \mathcal{D} : P(C(x) \mid D(x)) = \alpha, \qquad \forall x, y \in \mathcal{D} : P(r(x, y)) = \beta.$$


We assume that every terminology is acyclic: no concept uses itself (where “use” is the transitive closure of “directly use”; we say that $C$ directly uses $D$ if $D$ appears in the right hand side of an inclusion/definition, or in the conditioning side of a probabilistic inclusion). This assumption allows one to represent any terminology $\mathcal{T}$ through a directed acyclic graph. Such a graph, denoted by $\mathcal{G}(\mathcal{T})$, has each concept name and role name as a node, and if a concept $C$ directly uses concept $D$, that is, if $C$ and $D$ appear respectively in the left and right hand sides of an inclusion/definition, then $D$ is a parent of $C$ in $\mathcal{G}(\mathcal{T})$. Each existential restriction $\exists r.C$ and each value restriction $\forall r.C$ is added to the graph $\mathcal{G}(\mathcal{T})$ as a node, with an edge from $r$ and $C$ to each restriction directly using it. Each restriction node is a deterministic node in that its value is completely determined by its parents. Consider, as an example, a terminology containing the sentence in Expression (1), plus P(Person) = 0.2, P(BibItem) = 0.6, P(hasPublication) = 0.1; its graph is depicted in Fig. 1.

Figure 1. Graph $\mathcal{G}(\mathcal{T})$.

The semantics of crALC is based on probability measures over the space of interpretations, for a fixed domain. To make sure a terminology specifies a single probability measure, a number of additional assumptions are adopted: the domain is assumed finite, fixed, and known; the unique-name assumption and the rigidity assumption for individuals (as usual in first-order probabilistic logic [6]) are assumed; a single concept name appears in the left hand side of any inclusion or definition and in the conditioned side of any probabilistic inclusion; and finally a Markov condition imposes independence of any grounding of concept/role conditional on the groundings of its corresponding parents in the graph $\mathcal{G}(\mathcal{T})$ [5]. Given these assumptions, a set of sentences $\mathcal{T}$ in crALC defines a relational Bayesian network [12] whose underlying graph is exactly $\mathcal{G}(\mathcal{T})$.


Consider the following example. Suppose we have terminology $\mathcal{T}$ and domain $\mathcal{D} = \{\mathsf{bob}, \mathsf{paper}\}$. There are several possible sets of assertions that are obtained by grounding. For instance, {Person(bob), Researcher(bob), BibItem(paper), hasPublication(bob, paper)}. The assumptions discussed in the previous paragraph induce a single probability measure over the set of all assertions (groundings), because they induce a Bayesian network over indicator variables of assertions. For example, for domain $\mathcal{D} = \{\mathsf{bob}, \mathsf{paper}\}$, Fig. 2 depicts the Bayesian network over indicator variables of assertions (for the sake of space, names are abbreviated; for instance, hP denotes hasPublication, b denotes bob, and so on). To simplify notation, the indicator function of assertion C(a) is indicated simply by C(a), instead of the more usual convention IC(a) = true.

Figure 2. Bayesian network over indicator functions of assertions, produced by grounding the terminology $\mathcal{T}$.

Inferences, such as $P(C(a) \mid \mathcal{A})$ for an ABox $\mathcal{A}$, can be computed by grounding, thus generating a Bayesian network where one “slice” is built for each individual. For instance, in the Bayesian network depicted in Fig. 2 two slices are built, one for individual bob and another for individual paper. For large domains, exact probabilistic inference is in general quite hard. Variational algorithms that approximate such probabilities are available in the literature [5].
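For a domain this small, inference by grounding can be carried out by brute-force enumeration. The sketch below is illustrative only and rests on stated assumptions: Expression (1) is taken to define Researcher as a Person with some hasPublication successor that is a BibItem (as its verbal description suggests), root groundings are treated as independent with the probabilities given in the text, and the function name `prob_researcher` is hypothetical. It is not the variational algorithm of Ref. [5].

```python
from itertools import product

DOMAIN = ["bob", "paper"]
# Probabilities of the root concepts and role, taken from the running example.
PRIOR = {"Person": 0.2, "BibItem": 0.6, "hasPublication": 0.1}


def prob_researcher(individual):
    """P(Researcher(individual)) with no evidence, by enumerating all groundings."""
    variables = [("Person", x) for x in DOMAIN]
    variables += [("BibItem", x) for x in DOMAIN]
    variables += [("hasPublication", x, y) for x in DOMAIN for y in DOMAIN]
    total = 0.0
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        # Weight of this world under independence of the root groundings.
        weight = 1.0
        for var, value in world.items():
            p = PRIOR[var[0]]
            weight *= p if value else 1.0 - p
        # Deterministic nodes: the existential restriction, then Researcher.
        has_item = any(world[("hasPublication", individual, y)]
                       and world[("BibItem", y)] for y in DOMAIN)
        if world[("Person", individual)] and has_item:
            total += weight
    return total


# Closed form for comparison: 0.2 * (1 - (1 - 0.1 * 0.6) ** 2) = 0.02328
print(round(prob_researcher("bob"), 5))
```

The enumeration visits 2^8 = 256 worlds here; the number of worlds doubles with every additional grounded variable, which is why the text turns to approximate inference for large domains.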

Link Prediction

The task we are interested in can be defined as follows [15]. One is given a network (a graph) G consisting of a set of nodes V (represented by letters a, b, etc.) and a set of edges E, where an edge represents an interaction between nodes. Interactions may be tagged with times, and the link prediction problem may be one of predicting the existence of edges in a time interval, given the edges observed in another time interval. Here we are interested in a static problem where we are given nodes and edges, except for the edge between two nodes A and B, and we must then predict whether there is an edge between A and B.

Many different tools are used for link prediction, some of which, like matrix factorization, are related to the massive size of datasets; other tools are directly related to the existence of links between nodes. One can use classifiers that, based on network features and measures, classify each tentative link as existing or not [18]; one may also resort to collective classification over the whole set of possible links [7]. Several such techniques are based on computing measures of proximity/similarity between nodes in a network [15, 16]. One of them is the Katz measure [15], a weighted sum of the number of paths in the graph between two given nodes, with higher weights assigned to shorter paths:

$$\mathrm{Katz}(A, B) = \sum_{i=1}^{\infty} \beta^{i} p_i,$$

where $p_i$ is the number of paths of length $i$ connecting A and B, while $\beta \in (0, 1]$ weighs the paths; a small value of $\beta$ favors shorter paths. Another notable proximity measure is the Adamic–Adar measure [1], given by:

$$\mathrm{AdamicAdar}(A, B) = \sum_{Z \in \Gamma(A) \cap \Gamma(B)} \frac{1}{\log |\Gamma(Z)|},$$

where $\Gamma(X)$ is the set of all neighbors of node X. The intuition behind the Adamic–Adar measure is that, instead of simply counting the number of neighbors shared by two nodes, we should emphasize common neighbors that have fewer neighbors themselves.

Other approaches to link prediction consider semantic features. The degree of semantic similarity among entities can be useful to predict links that might be missed by simple topological or frequency-based features [31]. One way of capturing semantic similarity is by considering documents related to nodes in the network. A simple example of semantic similarity is the keyword match count between two authors [10]. A more sophisticated method makes use of well-known techniques such as the TFIDF feature vector representation and the cosine measure to compute similarity [31]. The latter measure, for documents $d_1$ and $d_2$, is obtained by creating vector representations $\vec{v}_1$ and $\vec{v}_2$ that contain word counts weighted by their TFIDF (Term Frequency, Inverse Document Frequency) measures. The similarity measure is then

$$\cos(d_1, d_2) = \frac{\vec{v}_1 \cdot \vec{v}_2}{\lVert \vec{v}_1 \rVert \, \lVert \vec{v}_2 \rVert},$$

where the dot product is used in the numerator and the Euclidean norm is used in the denominator. To recall, the TFIDF weighting scheme assigns to term $t$ a weight in document $d$ given by $\mathrm{TFIDF}_{t,d} = \mathrm{TF}_{t,d} \times \mathrm{IDF}_t$, where $\mathrm{TF}_{t,d}$ is the term frequency in $d$, and $\mathrm{IDF}_t$ is the inverse document frequency of $t$, given by $\log(N / \mathrm{DF}_t)$, for $N$ the total number of documents and $\mathrm{DF}_t$ the number of documents containing the term.

Approaches to link prediction can be understood not only by considering the kinds of tools employed, but also by examining the model that is used to represent the network as a whole. Typically, one assumes some sort of probabilistic mechanism that at least partially explains the existence of edges, perhaps together with domain-specific knowledge (for instance, domain theories about human relationships) [9, 19]. Thus the simplest network model is the Erdős–Rényi random graph: each pair of nodes can be connected with identical probability. More sophisticated models resort to hierarchical specification of link probabilities, or to grouping of nodes within blocks of varying probability. One way to capture the probabilistic structure of a network is through graph-based models such as Markov random fields or Bayesian networks [23]. However, these languages are well suited to express independence relations between a fixed set of random variables; when nodes and links are to be dealt with in graphs, it is best to consider modeling languages that can specify Markov random fields and Bayesian networks over relational structures. Indeed many proposals for link prediction resort to such languages, from seminal work by Getoor et al. [8] and Taskar et al. [29]. The presence of relational structure lets one represent properties of individual nodes, of links, and of communities; one can then compute the probability of specific links, and estimate such probabilities from data.
In this paper, we follow this modeling strategy; the difference between our modeling language and previous proposals is that we adopt a language based on description logics, as already indicated in the previous section. Our interest in models based on description logics is justified given recent results on the importance of ontologies in organizing information that can be used in link prediction [2, 4, 30].

LINK PREDICTION WITH crALC

Given a network G where many links are observed, one is interested in predicting whether a link between nodes a and b exists (presumably the linkage between a and b has not been observed). We address this problem by considering, in addition to topological information about the network, knowledge about the domain concerning network entities. To do so, domain knowledge is represented through a probabilistic ontology using crALC. Among the concepts (NC) and roles (NR) in the ontology, there is a concept that indicates which elements of the domain are nodes in G, and a role that indicates which pairs of elements are linked; hence this concept and this role describe the network itself, while other concepts and roles describe the remaining domain knowledge. In our experience, it is important to explicitly indicate which elements of the domain are nodes, to make sure inference runs only with the required elements (in effect this provides a type that separates network nodes from other elements of the domain).

For example, in a co-authorship network, nodes represent researchers and relationships may be “has a publication with” or “is advised by”. An ontology for such a domain, expressed in crALC, is shown in Fig. 3. The ontology describes publications, using concepts such as Researcher and Publication, and using roles such as hasPublication, hasSameInstitution, and sharePublication. Nodes in the network instantiate a concept (for instance Researcher), while links in the network instantiate a role (for instance sharePublication).

Figure 3. A probabilistic ontology for the co-authorship domain, and an ABox.

The semantic link prediction task proposed in this paper can be described as: compute the probability of an assertion concerning a particular role of interest, given an ABox of asserted concepts and roles involving nodes in the network. Because domain knowledge is expressed with crALC, questions about the probability of assertions can be answered by inference in crALC. For instance, the question “what is the probability that Emily and Ann share a publication given some information about the domain?” can be translated into P(sharePublication(Emily, Ann)|E), where E represents the information about the domain. If this probability is higher than a suitable threshold, then a link is included.

Algorithm 1: Algorithm for link prediction: evidence is the complete set of assertions

Our first link prediction algorithm is described in Algorithm 1. The algorithm starts by going through all pairs of instances of the node concept (that is, all nodes). For each pair, it checks whether a link between the corresponding nodes exists in the network; if not, the probability of the link is computed using the relational Bayesian network extracted from the ontology. If the probability is greater than a threshold, then the corresponding link is added to the set of suggested links. (Alternatively, when the threshold is not given, a list of links, ranked by their probability, can be produced.)

The evidence is the given set of assertions; the size of this set has great impact on inference effort. When inferences are computed, the ontology is turned into a relational Bayesian network, whose grounding is a Bayesian network; each assertion may generate a new slice of nodes in this grounded Bayesian network. Approximate algorithms are necessary for inference; in this work we employ the variational inference method described in Ref. [5]. While one can suppose that more assertions lead to more accurate predictions, the computational effort involved in inference may be so large as to generate bad approximations. Hence it is important to filter out assertions and to focus on the most relevant ones.
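The control flow just described can be sketched as follows. This is a hedged illustration, not the authors' Algorithm 1: the inference step is abstracted into a caller-supplied function `link_probability`, and the stand-in `toy_probability` (a Jaccard overlap of evidence items) as well as the data are hypothetical and do not perform crALC inference.

```python
from itertools import combinations


def predict_links(nodes, links, evidence, link_probability, threshold=None):
    """Suggest links between unlinked node pairs.

    `link_probability(a, b, evidence)` stands for probabilistic inference of
    the link assertion given the evidence; it is supplied by the caller.
    """
    existing = {frozenset(link) for link in links}
    scored = []
    for a, b in combinations(sorted(nodes), 2):
        if frozenset((a, b)) in existing:
            continue  # the link is already observed in the network
        scored.append(((a, b), link_probability(a, b, evidence)))
    if threshold is None:
        # No threshold given: return all candidates ranked by probability.
        return sorted(scored, key=lambda item: item[1], reverse=True)
    return [(pair, p) for pair, p in scored if p > threshold]


def toy_probability(a, b, evidence):
    """Hypothetical stand-in for inference: Jaccard overlap of evidence items."""
    union = evidence[a] | evidence[b]
    return len(evidence[a] & evidence[b]) / len(union) if union else 0.0


nodes = ["ann", "bob", "emily"]
links = [("ann", "bob")]
evidence = {"ann": {"usp"}, "bob": {"usp", "unirio"}, "emily": {"usp"}}
print(predict_links(nodes, links, evidence, toy_probability, threshold=0.6))
```

Passing `threshold=None` yields the ranked list mentioned in the text instead of a thresholded set of suggestions.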


We are interested in predicting a relationship between two specific nodes, a and b. Therefore, assertions directly related to these two objects and to other objects strongly related to them in the network are more relevant for link prediction than assertions on other objects in the network. We can make our link prediction algorithm scalable if we only consider assertions about a, b and about the objects strongly related to them in our inferences. To do so, we must specify the set of elements of the domain that are deemed strongly related to a and b.

Liben-Nowell and Kleinberg [15] compute similarities between two nodes using ensembles of paths between the two nodes (so as to decide whether to include a link between the nodes). It seems reasonable to adopt the same strategy, and define this set to contain nodes in paths between a and b (although we could consider all possible paths between two nodes, computing them could be expensive; hence, we restrict ourselves to a path size of five). Therefore, in Algorithm 1 the evidence must be specialized for each pair of nodes; given a and b, the set of strongly related elements must be constructed and the relevant assertions are then collected into E.

The resulting link prediction algorithm is described in Algorithm 2. Experiments with this algorithm, using real data, are reported in the next section.

Algorithm 2: Algorithm for link prediction: evidence on nodes and strongly related elements of the domain
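The evidence specialization can be sketched as below. This is an illustration under assumptions rather than the authors' Algorithm 2: "path size of five" is read as simple paths with at most five edges, an assertion is kept when any of its arguments belongs to the related set, and the names `related_elements` and `local_evidence` and the tuple encoding of assertions are hypothetical.

```python
def related_elements(adjacency, a, b, max_length=5):
    """Collect the nodes lying on simple paths from a to b with at most
    `max_length` edges; a and b themselves are always included."""
    related = {a, b}

    def extend(path):
        last = path[-1]
        if last == b:
            related.update(path)
            return
        if len(path) - 1 == max_length:
            return  # path budget exhausted without reaching b
        for neighbor in adjacency.get(last, ()):
            if neighbor not in path:
                extend(path + [neighbor])

    extend([a])
    return related


def local_evidence(assertions, related):
    """Keep assertions mentioning at least one related element.

    Assertions are tuples such as ("Researcher", "a") or
    ("hasPublication", "a", "p1") (an encoding assumed for this sketch).
    """
    return [s for s in assertions if any(arg in related for arg in s[1:])]


adjacency = {"a": ["c"], "c": ["a", "d"], "d": ["c", "b"], "b": ["d"],
             "y": ["z"], "z": ["y"]}
assertions = [("Researcher", "a"), ("hasPublication", "c", "p1"),
              ("Researcher", "z")]
related = related_elements(adjacency, "a", "b")
print(sorted(related), local_evidence(assertions, related))
```

The resulting list plays the role of E for the pair (a, b): the assertion about the unrelated node `z` is dropped, so the grounded network only receives evidence near the candidate link.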

EXPERIMENTS

Experiments have been conducted to evaluate our approach to semantic link prediction. A real-world data repository, the Lattes curriculum platform, was used. Our algorithm was combined with state-of-the-art classifiers for link prediction. This section reports the steps involved in this process.


Scenario Description

The Lattes platform is the public repository of Brazilian scientific curricula that consists of approximately a million registered documents. Information is encoded in HTML format, ranging from personal information such as name and professional address to publication lists, administrative tasks, research areas, research projects and advising/advisor information. There is implicit relational information in these web pages; for instance, collaboration networks are built by advising/adviser links, shared publications, and so on. To perform experiments we have randomly selected eight thousand researchers that are associated with eight research areas. Table 1 depicts these research areas.

Table 1. Research areas and number of co-authored collaborations

| Research area | Code | Number |
|---|---|---|
| Agricultural Sciences | A1 | 17,157 |
| Biological Sciences | A2 | 23,222 |
| Exact and Earth Sciences | A3 | 18,440 |
| Human Sciences | A4 | 2,281 |
| Social Sciences | A5 | 4,462 |
| Health Sciences | A6 | 17,255 |
| Engineering | A7 | 10,879 |
| Languages and Arts | A8 | 1,315 |

Assertions were extracted from the Lattes platform concerning these researchers. For instance, if a parser finds that a researcher John has four publications (p1, p2, p3, p4) and a researcher Mary has two (p2, p5), where p2 was done in collaboration with John, then assertions, as the following, are extracted:

A probabilistic ontology was then learned using algorithms in the literature [20, 24]. This ontology comprises 24 probabilistic inclusions and 17 concept definitions. Because learning is mainly concerned with deterministic and probabilistic inclusions, the learned ontology was enlarged with 4 relevant roles. Parts of the final ontology can be seen in Figs. 3 and 4.

Figure 4. A probabilistic ontology for the Lattes domain.

In this probabilistic ontology, concepts and probabilistic inclusions typically denote mutual research interests. In short, in this ontology a ResearcherLattes is a person that has publications, advises other people and participates in examination boards. On the other hand, a SupervisionCollaborator is a probabilistic inclusion which denotes a kind of researcher that was advised by another researcher. The SameInstitution concept denotes researchers that work at the same institution. Similarly, the SameBoard concept denotes researchers that have participated in the same examination boards. The NearCollaborator is a probabilistic inclusion that denotes researchers working at the same institution that have shared publications. The FacultyNearCollaborator is a near collaborator that also participates in the same examination boards. The NullMobilityResearcher concept denotes researchers who have low mobility, i.e., they remain at the same institution where they were advised. The StrongRelatedResearcher denotes a strong relationship between two researchers (advisor and advisee) who also share publications. The concept Researcher indicates whether an element of the domain is a node in the network (hence Researcher is the node concept) and the role sharePublication indicates whether a pair of elements of the domain is linked in the network (hence sharePublication is the link role). Topological graph information was computed using the assertions for Researcher and sharePublication. Figures 5, 6 and 7 depict collaboration networks within research areas in our dataset.


Figure 5. Collaboration patterns in research areas (1,000 researchers): Social Sciences.

Figure 6. Collaboration patterns in research areas (1,000 researchers): Human Sciences.

Figure 7. Collaboration patterns in research areas (1,000 researchers): Languages and Arts.

Figure 8. Lattes collaboration network: subset of collaborations among researchers.


Using this data, link probabilities were computed through inference in the crALC ontology. To illustrate inference, consider Fig. 8, which depicts a subset of collaborations among researchers. If we inspect this collaboration graph we could be interested, for instance, in checking links among researchers from different groups. Since filling forms in the Lattes platform is prone to errors, there is uncertainty regarding real collaborations. Thus, in Fig. 8 one could further investigate whether a link between researcher R (rectangle node) and researcher B (triangle node) is suitable. The probability of a possible link between R and B was computed, P(sharePublication(R, B)|E), where E contains evidence about researchers such as publications, institution, examination board participations and so on. Using evidence one obtains

One could obtain more evidence, such as information about nodes that indirectly connect these two groups (Fig. 8), denoted by I1, I2. Consider:

In order to compare our approach with existing algorithms, topological and semantic features have also been defined, as discussed in the following sections.

Methodology

In this section, we describe our main design choices to run experiments. Given the 8,000 selected researchers, there exist 31,996,000 possible link relationships. To perform link prediction we have considered collaborations based on co-authorship on publications (there are 2,837,206 publications). After analysing these publications we identified 95,100 true positive links among researchers based on co-authorship. Table 1 details true co-authorship collaborations for every research area. Given these true relationships, we have defined three datasets. The first one, Lattes I, where true links for all eight research areas were considered,


provides some general analysis. In the second and third datasets, Lattes II and Lattes III, only true links for one of the eight research areas were considered, allowing some specific analysis. The Biological Sciences and Exact and Earth Sciences research areas were chosen, since they are the ones with the most collaborations. According to cross validation principles, every dataset must be divided into training and validation sets. To avoid skewness (due to unbalanced classes), all datasets were balanced, thus they have the same quantity of positive and negative examples. The positive examples were randomly chosen from the true links and the same number of negative examples were randomly collected, where a negative example means that there is no link between the nodes. Table 2 details the three datasets.

Table 2. Lattes datasets: number of positive (+) and negative (−) examples

| Name | # Examples (+/−) |
|---|---|
| Lattes I (General) | 90,000 |
| Lattes II (Biological Sciences) | 20,000 |
| Lattes III (Exact and Earth Sciences) | 18,000 |
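The balancing step can be sketched as follows; this is an illustrative reading of the procedure, not the authors' code. The helper names `count_pairs` and `balanced_examples`, the use of rejection sampling for negatives, and the toy data are assumptions. The first function also reproduces the count of candidate pairs for 8,000 researchers, n(n − 1)/2.

```python
import random


def count_pairs(n):
    """Number of unordered node pairs, n(n - 1)/2."""
    return n * (n - 1) // 2


def balanced_examples(nodes, true_links, n_positive, seed=0):
    """Sample n_positive true links and as many non-links, labelled 1 and 0."""
    rng = random.Random(seed)
    node_list = sorted(nodes)
    positives = {frozenset(link) for link in true_links}
    if n_positive > len(positives):
        raise ValueError("not enough positive links")
    if n_positive > count_pairs(len(node_list)) - len(positives):
        raise ValueError("not enough negative pairs")
    chosen_pos = rng.sample(sorted(tuple(sorted(p)) for p in positives),
                            n_positive)
    negatives = set()
    while len(negatives) < n_positive:  # rejection sampling of non-links
        pair = frozenset(rng.sample(node_list, 2))
        if pair not in positives:
            negatives.add(pair)
    chosen_neg = sorted(tuple(sorted(p)) for p in negatives)
    return [(p, 1) for p in chosen_pos] + [(p, 0) for p in chosen_neg]


print(count_pairs(8000))  # 31996000 candidate pairs, as stated in the text
examples = balanced_examples(range(10), [(0, 1), (1, 2), (2, 3)], 2, seed=1)
print(examples)
```

Rejection sampling is cheap here because true links are a small fraction of all pairs (95,100 out of 31,996,000 in the Lattes data), so almost every sampled pair is a valid negative.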

Although we can use probabilistic inference to decide whether there is a link between two nodes, to perform comparisons with previous approaches we resort to a classification approach. This paradigm allows us to combine several metrics (topological, semantic and probabilistic) as features of a classification algorithm. In this sense, we can compare which feature is more relevant by adding, deleting and combining features and observing the classification results. To perform classification we resort to the logistic regression algorithm, which outputs values between 0 and 1 (due to the logistic function) and spares us from doing feature normalization. A threshold of 0.5 was used to decide a classification.

The features used in the classification for link prediction (defined in Sect. 2.2) are commonly extracted from topological graph properties such as neighborhood and paths between nodes. In addition, numerical features are also computed from joint probability distributions and semantics. The two baseline graph-based numerical features, the Katz and Adamic–Adar measures, have been used in our experiments. For the first one, since computing all paths (up to i = ∞) is expensive, we only consider paths of length at most four (i ≤ 4).
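The two topological features can be sketched in a few lines. This is a minimal illustration under assumptions: the graph is an adjacency dictionary of neighbor sets, the truncated Katz score counts walks (paths that may revisit nodes, as in the usual adjacency-matrix formulation), the natural logarithm is used for Adamic–Adar, and the value of β and the toy graph are arbitrary choices.

```python
import math


def katz(adjacency, a, b, beta=0.05, max_length=4):
    """Truncated Katz score: sum over i <= max_length of beta**i * p_i,
    where p_i counts walks of length i from a to b."""
    walks = {a: 1}  # number of walks of the current length from a to each node
    score = 0.0
    for i in range(1, max_length + 1):
        longer = {}
        for node, count in walks.items():
            for neighbor in adjacency.get(node, ()):
                longer[neighbor] = longer.get(neighbor, 0) + count
        walks = longer
        score += beta ** i * walks.get(b, 0)
    return score


def adamic_adar(adjacency, a, b):
    """Sum of 1 / log(degree) over the common neighbors of a and b."""
    common = set(adjacency.get(a, ())) & set(adjacency.get(b, ()))
    return sum(1.0 / math.log(len(adjacency[z]))
               for z in common if len(adjacency[z]) > 1)


# Toy graph: a and b share neighbors c (degree 3) and d (degree 2).
adjacency = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b", "e"},
             "d": {"a", "b"}, "e": {"c"}}
print(katz(adjacency, "a", "b", beta=0.5))  # 0.5**2 * 2 + 0.5**4 * 9 = 1.0625
print(adamic_adar(adjacency, "a", "b"))     # 1/log(3) + 1/log(2)
```

In the toy graph the less connected common neighbor `d` contributes more to the Adamic–Adar score than `c`, matching the intuition stated in Sect. 2.2.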


We have also considered semantic features. In this work, for each researcher a document with the words appearing in the titles of his or her publications (removing stop words) is considered. Thus, a researcher is represented as a set of words, which allows us to compute two features based on semantic similarity:

• The keyword match count between two researchers [10].
• The cosine between the TFIDF feature vectors of two researchers [31].

Finally, the probability P(r(x, y)|E), given by our probabilistic description logic model, is also used as a numerical feature in the classification model, in order to investigate whether it can improve the classification approach for link prediction.
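Both semantic features can be computed directly from the per-researcher word lists. The sketch below is illustrative: it assumes raw term counts for TF, the natural logarithm in IDF, and already tokenized titles with stop words removed; the function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def keyword_match(words_a, words_b):
    """Number of distinct words shared by two researchers' documents."""
    return len(set(words_a) & set(words_b))


def tfidf_vectors(documents):
    """Map each document name to a sparse TFIDF vector (term -> weight),
    with TFIDF = TF * log(N / DF)."""
    n = len(documents)
    df = Counter(term for doc in documents.values() for term in set(doc))
    vectors = {}
    for name, doc in documents.items():
        tf = Counter(doc)
        vectors[name] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Cosine of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical title words for three researchers.
documents = {"ann": ["fuzzy", "logic", "logic"],
             "bob": ["logic", "networks"],
             "eve": ["graphs"]}
vectors = tfidf_vectors(documents)
print(keyword_match(documents["ann"], documents["bob"]))
print(cosine(vectors["ann"], vectors["bob"]))
```

Note that a term occurring in every document receives IDF log(1) = 0 and therefore does not contribute to the cosine, which is the usual effect of the weighting.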

Results

In order to evaluate the suitability of our approach in predicting co-authorships in the Lattes dataset, several experiments were run. The experiments were performed in three stages, considering incrementally topological, semantic and probabilistic-logic scores.

In the first stage we evaluate topological scores. Two baseline scores, Katz and Adamic–Adar, have been used as features in the logistic regression algorithm. After a ten-fold cross validation process, the classification algorithm yielded results on accuracy which are depicted in Table 3 (stage 1).

Table 3. Classification results for datasets Lattes I, Lattes II and Lattes III on accuracy (%) for baseline features: Adamic–Adar (Adamic), Katz, Word matching (Match), Cosine, crALC, and a combination of them

| Stage | Feature | Lattes I (acc.) | Lattes II (acc.) | Lattes III (acc.) |
|---|---|---|---|---|
| 1 | Adamic | 83.34 ± 1.87 | 82.5 ± 1.35 | 81.23 ± 1.46 |
| 1 | Katz | 85.4 ± 1.07 | 87.7 ± 0.91 | 84.43 ± 0.84 |
| 1 | Adamic + Katz | 85.9 ± 1.12 * | 87.75 ± 1.03 * | 85.44 ± 0.78 * |
| 2 | Match | 75.42 ± 1.66 | 73.42 ± 2.66 | 72.8 ± 0.47 |
| 2 | Cosine | 89.35 ± 1.28 | 90.4 ± 1.37 | 86.7 ± 0.85 * |
| 2 | Adamic + Katz + Match + Cosine | 91.63 ± 1.23 * | 90.69 ± 1.23 * | 86.3 ± 0.12 |
| 3 | crALC | 93.3 ± 0.79 | 94.2 ± 1.48 | 89.72 ± 1.67 |
| 3 | Adamic + Katz + Match + Cosine + crALC | 93.89 ± 0.83 * | 94.46 ± 0.83 * | 90.2 ± 0.72 * |

An asterisk (*) marks the best result in the corresponding stage.

For all three Lattes datasets, the Katz feature yields the best accuracy when the two topological features are used in isolation. Katz has been shown to be among the most effective topological measures for the link prediction task [15]. Furthermore, when we combine the two features, accuracy improves on all three datasets.

In the second stage, we evaluate two features based on semantic similarity and their combination with topological features. Results on accuracy for these semantic features are depicted in Table 3 (stage 2). The cosine similarity feature performs better than the keyword matching feature and outperforms the two former topological features. When we combine all four features together, there is an improvement in accuracy for datasets Lattes I and Lattes II. Dataset Lattes III was indifferent to the combination of all four features.

Finally, in the third stage, a probabilistic feature based on crALC was introduced into the model. Results on accuracy for this feature are depicted in Table 3 (stage 3), showing it performs better than all other features. Moreover, there is significant improvement in accuracy for datasets Lattes I and Lattes II when all five features are combined.

It is worth noting that the probabilistic logic feature used in isolation outperforms all other features and allows us to improve the classification model for link prediction on accuracy. It could be argued that such performance stems from the evidence used in probabilistic inferences, but a similar analysis could be done for topological and semantic features. They use information that is missing in a probabilistic description logic setting. In conclusion, despite the fact that all features have different approaches, experimental results showed that they can be successfully used together.

Nothing prevents us from defining ad-hoc probabilistic networks to estimate link probabilities.
However, by doing so we would have to define a large propositionalized network (a relational Bayesian network) [25] or estimate local probabilistic networks [31]. These approaches do not scale well, since computing probabilistic inference for large networks is expensive. To overcome these performance and scalability issues, we resort to lifted inference in the probabilistic description logic, which is based on variational methods tuned by evidence defined according to each node’s neighborhood. Thus, for a network of 10,000 possible nodes, if evidence is given for 5 nodes (the neighborhood of a given link candidate), then there are only 6 slices which have messages interchanged. To instantiate the overall network, we use local evidence to perform inference for every link candidate, i.e., neighborhood evidence is instantiated accordingly. In our experiments, the average runtime for inference in a 10,000-node network was 168 ms. Table 4 shows runtime results for larger networks, which demonstrate the scalability of our approach. A direct grounding of the ontology into a propositional Bayesian network would generate an unmanageably large model.

Table 4. Average runtime for inference, considering the number of nodes in the network

# Nodes       Runtime (ms)
10,000        168
100,000       175
10,000,000    185

CONCLUSION

In this paper, we have introduced a link prediction method that combines graph-based and ontological information through the use of a probabilistic description logic. Given a collaborative network, we encode interests and graph features through a probabilistic ontology. To predict links, we resort to probabilistic inference; thus we combine and extend previous work on relational probabilistic models of link prediction and on ontology-based link prediction. To make the proposal scalable, we propose a novel strategy for approximating link probabilities: for each pair of nodes, we focus only on evidence collected along paths between them. Our proposal was evaluated on an academic domain, where links among researchers were predicted. Moreover, the approach was successfully compared with graph-based and semantic-based features.

Compared to previous work, our approach employs a rich ontology (as opposed to simple is-a terminologies) that can encode substantial information about the domain. Hierarchical structure can be encoded together with knowledge about specific nodes in a network—we plan to explore richer ontologies in the future. Moreover, our proposal attains better scalability than previous proposals that have tried to explore probabilistic relational models for similar purposes.

ACKNOWLEDGMENTS

The third author is partially supported by CNPq. The work reported here has received substantial support from FAPESP Grant 2008/03995-5 and FAPERJ Grant E-26/111484/2010. Thanks to Jesus Pascual Mena Chalco for providing us with datasets and figures of the Lattes research areas.

Notes

• This algorithm was first discussed in Ref. [25], and later refined, together with Algorithm 2, in Refs. [22] and [21]; the presentation is further refined here. Some experiments and results reported here appeared in those preliminary publications; in this paper we also describe novel experiments with significantly larger datasets.
• The problem of class skewness, i.e., imbalance in the class distribution, gives rise to poor performance of a supervised learning algorithm [18]. To cope with this issue, existing research suggests several approaches, such as balancing the training sample by up-sampling or down-sampling.

REFERENCES

1. Adamic L, Adar E (2001) Friends and neighbors on the web. Soc Netw 25:211–230
2. Aljandal W, Bahirwani V, Caragea D, Hsu H (2009) Ontology-aware classification and association rule mining for interest and link prediction in social networks. In: AAAI 2009 Spring symposium on social semantic web: where web 2.0 meets web 3.0. Stanford, CA
3. Baader F, Nutt W (2007) Basic description logics. In: Description logic handbook. Cambridge University Press, Cambridge, pp 47–100
4. Caragea D, Bahirwani V, Aljandal W, Hsu W (2009) Ontology-based link prediction in the livejournal social network. In: SARA’09, p 1
5. Cozman FG, Polastro RB (2009) Complexity analysis and variational inference for interpretation-based probabilistic description logics. In: Proceedings of the twenty-fifth conference annual conference on uncertainty in artificial intelligence (UAI-09). AUAI Press, Corvallis, Oregon, pp 117–125
6. Fagin R, Halpern JY, Megiddo N (1990) A logic for reasoning about probabilities. Inf Comput 87:78–128
7. Getoor L, Diehl CP (2005) Link mining: a survey. ACM SIGKDD Explor Newsl 7(2):3–12
8. Getoor L, Friedman N, Koller D, Taskar B (2002) Learning probabilistic models of link structure. J Mach Learn Res 3:679–707
9. Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM (2010) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233
10. Hasan MA, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: Proceedings of SDM 06 workshop on link analysis, counterterrorism and security
11. Heinsohn J (1994) Probabilistic description logics. In: International conference on uncertainty in artificial intelligence, pp 311–318
12. Jaeger M (2002) Relational Bayesian networks: a survey. Linkoping Electr Artic Comput Inf Sci 6
13. Klinov P (2008) Pronto: A non-monotonic probabilistic description logic reasoner. In: The semantic web research and applications, pp 822–826
14. Kunegis J, Lommatzsch A (2009) Learning spectral graph transformations for link prediction. In: Proceedings of the ICML, pp 561–568
15. Liben-Nowell D, Kleinberg J (2007) The link prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031
16. Lu L, Zhou T (2011) Link prediction in complex networks: a survey. Physica A 390:1150–1170
17. Lukasiewicz T, Straccia U (2008) Managing uncertainty and vagueness in description logics for the semantic web. Semant Web J 6(4):291–308
18. Mohammad A, Mohammed J (2011) A survey of link prediction in social networks. In: Social network data analytics, pp 243–275
19. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167–256
20. Ochoa-Luna J, Revoredo K, Cozman F (2011) Learning probabilistic description logics: a framework and algorithms. In: Proceedings of the MICAI, LNCS, vol 7094. Springer, Berlin, pp 28–39
21. Ochoa-Luna J, Revoredo K, Cozman F (2012) An experimental evaluation of a scalable probabilistic description logics approach for semantic link prediction. In: Bobillo F et al (eds) Proceedings of the 8th international workshop on uncertainty reasoning for the semantic web, vol 900. CEUR-WS.org, Shanghai, China, pp 63–74
22. Ochoa-Luna J, Revoredo K, Cozman F (2012) A scalable semantic link prediction approach through probabilistic description logics. In: Proceedings of 9th artificial intelligence national meeting (ENIA)
23. Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco
24. Revoredo K, Ochoa-Luna J, Cozman F (2010) Learning terminologies in probabilistic description logics. In: da Rocha Costa A, Vicari R, Tonidandel F (eds) Advances in artificial intelligence, SBIA 2010. Lecture Notes in Computer Science, vol 6404. Springer, Berlin/Heidelberg, pp 41–50
25. Revoredo K, Ochoa-Luna J, Cozman F (2011) Semantic link prediction through probabilistic description logics. In: Bobillo F et al (eds) Proceedings of the 7th international workshop on URSW, vol 778, pp 87–97
26. Sachan M, Ichise R (2011) Using semantic information to improve link prediction results in network datasets. Int J Comput Theory Eng 3:71–76
27. Schmidt-Schauss M, Smolka G (1991) Attributive concept descriptions with complements. Artif Intell 48:1–26
28. Sebastiani F (1994) A probabilistic terminological logic for modelling information retrieval. In: ACM conference on research and development in information retrieval (SIGIR), pp 122–130
29. Taskar B, Wong MF, Abbeel P, Koller D (2003) Link prediction in relational data. In: Proceedings of neural information processing systems
30. Thor A, Anderson P, Raschid L, Navlakha S, Saha B, Khuller S, Zhang XN (2011) Link prediction for annotation graphs using graph summarization. In: The semantic web-ISWC, pp 714–729
31. Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction. In: Proceedings of the 2007 seventh IEEE ICDM. IEEE Computer Society, Washington, DC, USA, pp 322–331. doi:10.1109/ICDM.2007.108
32. Wohlfarth T, Ichise R (2008) Semantic and event-based approach for link prediction. In: Proceedings of the 7th international conference on practical aspects of knowledge management

Chapter 12

REASONING ABOUT SOCIAL CHOICE AND GAMES IN MONADIC FIXED-POINT LOGIC

Ramit Das1, R. Ramanujam1, Sunil Simon2
1 IMSc (HBNI), Chennai, India
2 Department of CSE, IIT Kanpur, Kanpur, India

Whether in normal form games, fair allocations, or voter preferences in voting systems, a certain pattern of reasoning is common. From a particular profile, an agent or a group of agents may have an incentive to shift to a new one. This induces a natural graph structure, which we call the improvement graph, on the strategy space of these systems. We suggest that monadic fixed-point logic with counting, an extension of monadic first-order logic on graphs with fixed-point and counting quantifiers, is a natural specification language on improvement graphs, and thus for a class of properties that can be interpreted across these domains. The logic has an efficient model checking algorithm (in the size of the improvement graph).

Citation: (APA): Das, R., Ramanujam, R. & Simon, S. (2019). Reasoning about Social Choice and Games in Monadic Fixed-Point Logic. L.S. Moss (Ed.): TARK 2019. (15 pages). Copyright: © Creative Commons Attribution 3.0 Unported (https://creativecommons.org/licenses/by/3.0/).

INTRODUCTION

A logical study of game theory aims at exposing the assumptions and reasoning that underlie the basic concepts of game theory. This involves the study of individual, rational, strategic decision making between presented alternatives (in the non-cooperative setting). One potential form of reasoning in such a situation is to envisage all possible strategic choices by others, consider one’s own response to each, then others’ response to it in their turn, and so on ad infinitum, with Nash equilibria representing fixed-points of such iteration. Such reasoning, which we might call improvement dynamics, is similar to but distinct from rational decision making under uncertainty; it is also similar to but distinct from epistemic reasoning. The former is about optimization, selecting the ‘best’ option in light of one’s information; the latter is about ‘higher order information’ involving information about others’ information etc. Improvement dynamics intends to yield the same end results as these, but works at a more operational, computational level, and reasoning about it can be seen as reasoning at the level of computations searching for equilibria.

In this sense, logic is seen as a succinct language for describing computational structure, rather than as a deductive system of reasoning by agents. In spirit, the role of such logics is similar to that of logics in descriptive complexity theory. If we were to talk of the descriptive complexity of game theoretic equilibrium notions, it would need to account for the implicit improvement dynamics embedded in the solution concept.

Interestingly, several contexts in social choice theory embed such improvement dynamics as well. When we aggregate individual choices or preferences into social choices or preferences, or decide on social action (like resource allocation) based on individual preferences, once again we see implicit improvement dynamics.
If a particular profile of voter preferences yields a specific electoral outcome, one can consider a voter announcing a revised (and altered) preference to force a different outcome. Two agents might exchange their allocated goods to move to a new allocation, if they perceive advantage in doing so. Again, these can be seen as offers and counter-offers, perhaps leading to an equilibrium, or not. Some of these situations involve individual improvements, some (like pairs of agents swapping goods) involve coalitions, but they have the same underlying computational structure.

In this paper, we suggest that monadic fixed-point logic (with counting) is a suitable language for reasoning about this computational structure underlying games and social choice contexts. This is an extension of first order logic with monadic least fixed-point operators and counting. In this, we follow the spirit of descriptive complexity, where extensions of first order logic describe complexity classes. Formulas offer concise descriptions of reasoning embedded in improvement dynamics.

Why bother? When we have a common language across contexts, we can employ a form of reasoning common in one (say normal form games) in another (say fair resource allocations) and thus transfer results and techniques. We show that the idea of improvement under swaps corresponds to a certain form of strong equilibria and coalitional improvement in games. Dynamics in iterated voting again corresponds to improvement dynamics in games. In such cases, when the structures studied possess interesting properties such as the finite improvement property or weak acyclicity, we get certificates of existence of equilibria. Interesting subclasses of games (such as potential games) possess such properties, and by “transfer” we can look for similar subclasses in social choice contexts, and vice versa.

The choice of monadic fixed-point logic is also motivated by the fact that it admits an efficient model checking algorithm. The monadic least fixed-point operator, iterating over subsets of strategy profiles, suffices for improvement dynamics. Counting can help us constrain paths succinctly: though counting is first order expressible, such an expression would be prohibitively long.

Thus the contribution of this paper is modest and simple. The reasoning discussed is familiar, that of improvement dynamics in normal form games, and we express it in monadic fixed-point logic with counting. In the process, we can study the same properties in different contexts, such as normal form games, fair allocations and electoral systems. We also present a model checking algorithm for the logic.

Logic and Game Theory

Various logical formalisms have been used in the literature to reason about games and strategies. Action indexed modal logics have often been used to analyse finite extensive form games where the game representation is interpreted as models of the logical language [8, 5, 6]. A dynamic logic framework can then be used to describe games and strategies in a compositional manner [33, 19, 34] and encode existence of equilibrium strategies [22]. Alternating-time temporal logic (ATL) [1] and its variants [23, 41, 12] constitute a popular framework to reason about strategic ability in games, especially infinite game structures defined by unfoldings of finite graphs. These formalisms are useful to analyse strategic ability in terms of existence of strategies satisfying certain properties (for example, winning strategies and equilibrium strategies). Some of the above logical formalisms are also able to make assertions about partial specifications that strategies have to conform to in order to constitute a stable outcome.

In this work we suggest a framework to reason about the dynamics involved in iteratively updating strategies and to analyse the resulting convergence properties. The authors of [7] consider dynamics in reasoning about games in the same spirit as ours and describe it in fixed-point logic. But crucially, the dynamics there is on iterated announcements of players’ rationality, and belief revision in response to it. Moreover, they discuss extensive form games rather than normal form games. However, they do advocate the use of the fixed-point extension of first order logic for reasoning about games.

Monadic least fixed point logic (MLFP) is an extension of first-order logic which is well studied in finite model theory [38]. It is a restriction of first order logic with least fixed point in which only unary relation variables are allowed. MLFP is an expressive logic for which, on finite relational structures, model checking can be solved efficiently [15]. It is also known that MLFP is expressive enough to describe various interesting properties of games on finite graphs. MLFP can also naturally describe the transitive closure of a binary relation, which makes it an ideal logical framework to analyse the dynamics involved in updating strategies and its convergence properties. We extend MLFP with counting and refer to the resulting logic as MLFPC: when $\alpha$ is a formula with one first order free variable, $C_x\alpha \le k$ asserts that the number of elements in the domain satisfying $\alpha$ is at most $k$. Clearly, this is expressible in first order logic with equality, but at the expense of succinctness.
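To make the succinctness remark concrete, here is one standard way of expanding the counting quantifier into first order logic with equality. The bound variables $x_1,\dots,x_{k+1}$ are fresh names introduced for this illustration; the expansion is a sketch of the usual pigeonhole encoding, not a formula taken from the text.

```latex
C_x\,\alpha \le k
\;\equiv\;
\forall x_1 \cdots \forall x_{k+1}
\Big(
  \bigwedge_{1 \le i \le k+1} \alpha(x_i)
  \;\rightarrow\;
  \bigvee_{1 \le i < j \le k+1} x_i = x_j
\Big)
```

The right-hand side needs $k+1$ quantifiers and one equality for each pair of variables, so its length grows with $k$, whereas the counting formula only mentions the numeral $k$.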
In the literature on first order logic with arithmetical predicates [37], it is customary to consider two-sorted structures to distinguish between domain elements and the counts, but since our domain elements are always profiles, there is no need for such caution.

It is well known that a variety of contexts in the mathematical social sciences can be formulated in terms of improvement dynamics leading to equilibria (of some kind). Our observation here is that the deployment of the MLFPC logic can help to unify algorithmic techniques across these contexts. Rather than devise an algorithm for each problem of this kind, definability in MLFPC can at once give a uniform algorithm, which could then be fine-tuned. Admittedly, when we present contexts as diverse as normal form games, allocations in social choice theory or voting rules, all in one uniform framework, we only get a broad-strokes description of the models, and the literature on these contexts varies widely in details. We hope to convince the reader that, a priori, MLFPC has sufficient expressiveness to capture interesting variations. Our hope is to delineate the logical resources needed to express the variations, but that will require more work ahead.

THE IMPROVEMENT GRAPH STRUCTURE

Improvement dynamics is a natural notion to study in the context of any situation involving strategic interaction of agents. In this section we formalise this dynamics in terms of a data structure called the improvement graph. We consider three specific application domains: strategic form games, voting theory and allocation of indivisible items. We show how improvement graphs can be interpreted in these applications and argue that the analysis of the structure acts as the basis for reasoning about strategic interaction.

Let $[n] = \{1,\dots,n\}$ denote the set of $n$ agents. Each agent $i$ is associated with a finite set of choices $S_i$. A profile of choices (one for each agent) induces an outcome in the strategic interaction. Let $S$ denote the set of all choice profiles, $O$ denote the set of all outcomes and $s(O)$ denote the outcome associated with the profile $s \in S$. Each agent $i \in [n]$ is associated with a preference ordering over the outcome set: ${\prec_i} \subseteq O \times O$. This ordering induces a preference ordering over profiles as follows: for $s, s' \in S$, $s \prec_i s'$ if and only if $s(O) \prec_i s'(O)$. For a choice profile $s = (s_1,\dots,s_n)$, we use the standard notation $s_{-i}$ to denote the $(n-1)$-tuple arising from $s$ in which the choice of agent $i$ is removed.

The associated improvement graph is the directed graph $G = (V,E)$ where $V = S$ and $E \subseteq V \times [n] \times V$. We denote the triple $(s,i,s') \in E$ by $s \to_i s'$. The edge relation $E$ satisfies the condition: for $i \in [n]$ and $s,s' \in S$, we have $s \to_i s'$ if and only if $s_{-i} = s'_{-i}$ and $s \prec_i s'$. An improvement path in $G$ is a maximal sequence of profiles $s^1 s^2 \cdots$ such that for every $j > 0$ there is a player $k_j$ such that $s^j \to_{k_j} s^{j+1}$. Note that here we use deviation by a single player to define the improvement graph. We could easily extend the definition to deviation by a subset of players; this interpretation might be more relevant in certain domains.
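As an illustration of the definition above, the following Python sketch builds the improvement graph of a small two-agent example under unilateral deviations and lists its sink nodes. The payoff table, the helper names `improvement_graph` and `sinks`, and the 0-based agent indices are assumptions made for this example; they do not come from the text.

```python
from itertools import product


def improvement_graph(choices, prefers):
    """Return (profiles, edges) of the improvement graph.

    choices: one list of choices per agent (the sets S_i).
    prefers(i, s, t): True when agent i strictly prefers profile t to s.
    Each edge is a triple (s, i, t), read as "agent i improves from s to t".
    """
    profiles = list(product(*choices))
    edges = []
    for s in profiles:
        for i, options in enumerate(choices):
            for c in options:
                if c == s[i]:
                    continue
                t = s[:i] + (c,) + s[i + 1:]  # only agent i changes her choice
                if prefers(i, s, t):
                    edges.append((s, i, t))
    return profiles, edges


def sinks(profiles, edges):
    """Profiles without outgoing improvement edges."""
    has_outgoing = {s for s, _, _ in edges}
    return [s for s in profiles if s not in has_outgoing]


# Hypothetical coordination example: both agents do better when they match.
payoff = {("a", "a"): (2, 2), ("b", "b"): (1, 1),
          ("a", "b"): (0, 0), ("b", "a"): (0, 0)}


def prefers(i, s, t):
    return payoff[t][i] > payoff[s][i]


profiles, edges = improvement_graph([["a", "b"], ["a", "b"]], prefers)
print(sinks(profiles, edges))
```

In this toy example the graph has four improvement edges, all leading from the mismatched profiles to the matched ones, and the two matched profiles are the sinks.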

Strategic Form Games

A strategic form game is given by the tuple $T = ([n], \{S_i\}_{i \in [n]}, O, \lambda, \{\prec_i\}_{i \in [n]})$ where the set of strategies $S_i$ for agent $i \in [n]$ can be viewed as its set of choices. For $S = S_1 \times \cdots \times S_n$, the function $\lambda : S \to O$ associates an outcome to every strategy profile. In this paper, we consider only finite strategic form games. The notions of best response and Nash equilibrium are standard: $s_i$ is a best response to $s_{-i}$ if for all $s'_i \in S_i$, $(s_i, s_{-i}) \not\prec_i (s'_i, s_{-i})$; $s$ is a Nash equilibrium if for all $i \in [n]$, $s_i$ is a best response to $s_{-i}$. Existence of Nash equilibrium and computation of an equilibrium profile (when it exists) are important questions in the context of strategic form games.

Given a strategic form game $T$, let $G_T$ denote the improvement graph associated with $T$ (as defined above). Improvement paths in $G_T$ correspond to maximal sequences of strategy profiles that arise by allowing players to make unilateral profitable deviations, i.e., deviations that improve their choice according to their preference ordering. We say that a game has the finite improvement property (FIP) if every improvement path in $G_T$ is finite [31]. In an improvement path $s^1 s^2 \cdots$, if for each edge $s^j \to_{k_j} s^{j+1}$ in the sequence the choice $s^{j+1}_{k_j}$ is a best response of agent $k_j$ to $s^j_{-k_j}$, then the path is called a best response improvement path. We can analogously define the finite best response property (FBRP): a game has the FBRP if every best response improvement path is finite. FIP not only guarantees the existence of Nash equilibrium, but also ensures the stronger property that a decentralised local search mechanism converges to an equilibrium outcome. Various natural classes of resource allocation games like congestion games [36], fair cost sharing games and restrictions of polymatrix games [2] are known to have the FIP. A weakening of FIP was proposed by Young [42] which insists on the existence of a finite improvement path starting from any initial strategy profile. Classes of strategic form games that satisfy this property are called weakly acyclic games.
Note that weak acyclicity ensures that a randomised local search procedure almost surely converges to an equilibrium outcome [28]. Examples of classes of games which have this property include congestion games with player specific payoff functions [30], certain internet routing games [16] and network creation games [25].

As we can see, the improvement graph presents a data structure for analysing normal form games. It captures the epistemic reasoning underlying player choices: if I were to consider a particular profile of choices by all of us, I would rather choose another strategy to improve my payoff; in that case, agent j would revise her choice; and so on, until we reach a profile from where none of us has any reason to deviate. Such reasoning is closely related to pre-play negotiations studied by game theorists.
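On a finite improvement graph, both properties discussed in this section reduce to simple graph checks: the FIP holds exactly when the graph has no cycle, and weak acyclicity holds exactly when a sink is reachable from every node. The sketch below assumes the graph is given as a hypothetical successor dictionary `succ` with edge labels dropped; the function names are illustrative, and the recursive search is only suitable for small graphs.

```python
def has_fip(nodes, succ):
    """True iff the finite graph is acyclic, i.e. every improvement path is finite."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on the DFS stack / finished
    colour = {v: WHITE for v in nodes}

    def visit(v):
        colour[v] = GREY
        for w in succ.get(v, ()):
            if colour[w] == GREY:          # back edge: a cycle exists
                return False
            if colour[w] == WHITE and not visit(w):
                return False
        colour[v] = BLACK
        return True

    return all(colour[v] != WHITE or visit(v) for v in nodes)


def is_weakly_acyclic(nodes, succ):
    """True iff some sink (equilibrium) is reachable from every node."""
    good = {v for v in nodes if not succ.get(v)}   # the sinks themselves
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v not in good and any(w in good for w in succ.get(v, ())):
                good.add(v)
                changed = True
    return len(good) == len(nodes)


# Hypothetical graph: profiles 1 and 2 improve into each other, but 2 can also reach sink 3.
nodes = [1, 2, 3]
succ = {1: [2], 2: [1, 3], 3: []}
print(has_fip(nodes, succ), is_weakly_acyclic(nodes, succ))
```

The example graph is weakly acyclic without having the FIP, which is the gap between the two notions described above.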

Voting Systems

Consider an electorate consisting of a set $[n] = \{1,\dots,n\}$ of $n$ voters and a set $C$ of $m$ candidates. A voting rule considers the preference of each voter over the candidates and chooses a subset of winning candidates of size $k$ (since $k$ among $m$ candidates have to be elected). The strategy sets for all voters are the same; we write $S$ for this common set. The voting rule specifies which candidates win given the complete preferences of all voters, so each element of the outcome set $O$ is a set of winning candidates. We assume that each voter $i$ has a preference ordering $\prec_i$ over the outcome set $O$. Thus the voting system, denoted $L$ below, can be given by a tuple collecting these components.

The improvement graph $G_L$ associated with $L$ is as before: $G_L = (V,E)$ where $V = S^n$, the set of strategy profiles of voters; $E \subseteq V \times [n] \times V$ is the improvement relation, given for voter $i$ by: $s \to_i s'$ if $s_{-i} = s'_{-i}$ and $s \prec_i s'$.

Voting equilibria have been studied by Myerson and Weber [32]. In general, one speaks of the bandwagon effect in an election if voters become more inclined to vote for a given candidate as her standing in pre-election polls improves, or the underdog effect, if voters become less inclined to vote for a candidate as her standing improves. Myerson and Weber suggest that equilibrium arises when the voters, acting in accordance with both their preferences for the candidates and their perceptions of the relative chances of candidates in contention for victory, generate an election result that justifies their perceptions. Note that the improvement path again gives us the possibility of ‘interaction’ arising from voter preferences, and we can analyse this in the context of specific voting rules.

Given that agents may have an incentive to strategically misreport their preferences, it is natural to study the convergence dynamics when voting is modelled as a game. Iterative voting [26, 35, 29] is a formalism that is useful to analyse the strategic dynamics when at each turn a voter is allowed to alter her vote based on the current outcome, until the process converges to an outcome from which no voter wants to deviate. In general, the outcome of iterative voting may depend on the order of voters’ changes. Again, voters act myopically, without knowing the others’ preferences. This dynamics is again reflected by the improvement path as discussed here, and sink nodes correspond to Nash equilibria. Thus, given a voting rule, it is natural to ask what equilibria are reachable from a given vote profile.
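The iterative voting dynamics described above can be sketched in Python for one deliberately simple setting. Everything specific here is an assumption made for illustration: a single winner (k = 1), the plurality rule with ties broken by candidate order, ballots that name one candidate rather than a full ranking, and the names `plurality_winner` and `iterative_voting`. A round cap is included because such dynamics need not converge in general.

```python
def plurality_winner(ballots, candidates):
    """Assumed rule: most votes wins; ties go to the candidate listed first."""
    return max(candidates,
               key=lambda c: (sum(b == c for b in ballots), -candidates.index(c)))


def iterative_voting(truth, candidates, order, max_rounds=100):
    """Let voters revise their ballots one at a time, in the given order.

    truth[i] is voter i's true ranking, best candidate first.
    Returns (ballots, converged).
    """
    ballots = [ranking[0] for ranking in truth]   # start from truthful votes
    for _ in range(max_rounds):
        improved = False
        for i in order:
            current = plurality_winner(ballots, candidates)
            for c in candidates:
                trial = ballots[:i] + [c] + ballots[i + 1:]
                winner = plurality_winner(trial, candidates)
                # voter i moves only if the new winner is strictly better for her
                if truth[i].index(winner) < truth[i].index(current):
                    ballots, improved = trial, True
                    break
        if not improved:
            return ballots, True    # a sink of the improvement graph
    return ballots, False           # round cap reached without settling


candidates = ["a", "b", "c"]
truth = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
final, converged = iterative_voting(truth, candidates, order=[0, 1, 2])
print(final, converged)
```

In this run the third voter abandons her favourite to prevent her least preferred candidate from winning, after which no voter can improve the outcome by changing her ballot alone.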

Allocation of Indivisible Goods

An important problem often studied in economics and computer science is the allocation of resources among rational agents. This problem is fundamental and has practical implications in various applications including college admissions, organ exchange and spectrum assignment. In this paper, we consider the setting where there is a set $[n]$ of agents and a set $A = \{a_1,\dots,a_m\}$ of $m$ indivisible items. An allocation $\pi$ assigns to each agent $j \in [n]$ a bundle $\pi(j) \subseteq A$ of items, with no item assigned to more than one agent. In the most general setting, each agent $i$ has a preference ordering $\prec_i$ over the allocations. Thus an instance of an allocation problem can be specified as a tuple $\mathcal{A} = ([n], A, \{\prec_i\}_{i \in [n]})$. Let $\Pi$ denote the set of all allocations. In this setting, each allocation $\pi$ can be viewed as defining an outcome, and agents have a preference ordering over such outcomes. In a typical allocation problem, it is often assumed that the preference ordering for each agent $i$ depends only on the bundle of items assigned to agent $i$.

A special case of the above setting is when $n = m$ (i.e. the number of agents and the number of items are the same) and $\pi$ is required to be a bijection. An instance of such an allocation problem $\mathcal{A}$ along with an initial allocation $\pi_0$ defines the well known Shapley-Scarf housing market [39]. If agents are allowed to exchange items with each other, stability of an allocation is a very natural notion to consider. Core stable outcomes are defined as allocations in which no group of agents has an incentive to exchange their items as part of an internal redistribution within the coalition. The improvement graph structure can capture the dynamics involved in such a sequence of item exchanges in a natural manner. The associated improvement graph can be defined as $G_H = (V,E)$ where $V = \Pi$. Since the deviation involves exchange of goods among a subset of players (rather than a unilateral deviation by a single player), the edge relation is indexed with a subset $u \subseteq [n]$. That is, for $\pi,\pi' \in \Pi$ and $u \subseteq [n]$, we have $\pi \to_u \pi'$ if $\pi \prec_i \pi'$ for all $i \in u$ and, for all $j \notin u$, $\pi(j) = \pi'(j)$.

A finite path in the improvement graph corresponds to a finite sequence of exchanges that converge to a stable outcome. An important question is whether stable allocations always exist and whether a finite sequence of exchanges can converge to such an allocation. For the housing market, it is known that a simple and efficient procedure, often termed Gale’s Top Trading Cycle, can compute an allocation that is core stable. The allocation constructed in this manner also satisfies desirable properties like strategy proofness and Pareto optimality. The question of whether decentralised swap dynamics converges to a stable allocation has been studied in various related models of resource allocation. The work in [14] analyses optimality in the setting where pairs of agents exchange items or services. The influence of neighbourhood structures on item exchange and on convergence to stable and optimal allocations is studied in [20, 13]. Exchange dynamics with restricted preference orderings are considered in [40, 17].

Note that the improvement graph here is different from the ones we discussed earlier in a crucial sense. When agents in u swap goods, the allocation for other players outside u is unaffected. If each agent’s preference ordering depends only on the valuation of the bundle of items that the agent is allocated, then their satisfaction is unchanged. However, agent 1 may swap goods with 2 and then use some of the goods acquired to make a swap deal with 3, thus leading to interesting causal chains. In effect the entire space of allocations may be tentatively explored by the agents, and the interesting question is whether they settle to an ‘equilibrium’. Again the improvement path offers interesting dynamics, and we can ask whether, from any allocation, we can reach one where no agent wishes to make a swap. In the case where the preference orderings depend on externalities [10, 18], the situation is similar to what we observed in the context of strategic form games. Apart from stability, notions of fairness like envy-freeness, proportionality and maximin share guarantee are also well studied in the context of allocation of indivisible items [11, 9].
Analysis of the improvement graph is also useful in the context of fairness notions. Existence of a finite improvement path terminating in a fair allocation would indicate the possibility of convergence to a fair allocation in terms of an exchange dynamics.
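The exchange dynamics discussed in this section can be illustrated with a minimal Python sketch for the housing market case, restricted to coalitions of size two. The representation (`alloc[i]` is the house held by agent i, `rank[i][h]` is the position of house h in agent i's ranking, 0 being best), the function name `swap_dynamics` and the example data are all assumptions for illustration. The result is stable against pairwise swaps only, which is weaker than core stability.

```python
from itertools import combinations


def swap_dynamics(alloc, rank):
    """Repeat mutually beneficial pairwise swaps until none remains.

    Each swap moves two agents strictly up their rankings and leaves all
    other agents untouched, so the total rank strictly decreases and the
    loop terminates.
    """
    alloc = list(alloc)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(alloc)), 2):
            i_gains = rank[i][alloc[j]] < rank[i][alloc[i]]
            j_gains = rank[j][alloc[i]] < rank[j][alloc[j]]
            if i_gains and j_gains:
                alloc[i], alloc[j] = alloc[j], alloc[i]
                improved = True
    return alloc


# Hypothetical market with three agents and three houses 0, 1, 2.
rank = [
    [1, 0, 2],   # agent 0 ranks house 1 first, then 0, then 2
    [0, 1, 2],   # agent 1 ranks house 0 first
    [1, 2, 0],   # agent 2 ranks house 2 first
]
print(swap_dynamics([0, 1, 2], rank))
```

Here agents 0 and 1 exchange houses once and the dynamics stops; agent 2 already holds her top choice and never takes part.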

Remark

We have set up an improvement graph from three different models used in the mathematical social sciences. This is of course so that a uniform logical language can be used to specify properties of all these. One natural alternative to consider would be to translate all the models into one, games being the natural choice, and then induce the improvement graph over the defined model. This is certainly possible, but in general this can lead to an increase in the size of the graph. Moreover, since we hope to use MLFPC not only to unify these contexts but also to differentiate them (in terms of logical resources needed), such a reduction would not be helpful.

MONADIC FIXED-POINT LOGIC WITH COUNTING

In this section we present the monadic fixed-point and counting extension of first order logic interpreted over improvement graphs. The use of the fixed-point extension is motivated by the fact that we wish to express properties like acyclicity of the graph, which is not first-order expressible. As we will see below, we need the fixed-point quantifier to range only over collections of nodes, and hence monadic fixed-point quantifiers suffice for our purpose. The counting extension helps us to count nodes in a subgraph, or along a path; this helps us express notions like fairness of schedules, which is of relevance in specifying improvement dynamics. An alternative formalism would be the transitive closure extension of first order logic. But as Grohe has shown [21], monadic lfp logic is strictly less expressive than transitive closure logic, and hence we prefer a minimal extension of first order logic that serves our purposes. Note that the counting extension does not add expressiveness but only succinctness. This is of use when we discuss concurrent deviation by a subset of players.

MLFPC Syntax

Let σ be a first order relational vocabulary. Let $S_1, S_2, \ldots$ be a sequence of monadic relation symbols, such that for each i, $S_i \notin \sigma$. These are the second order fixed-point variables of the logic.

The set of all MLFPC formulas, $\Phi_{MLFPC}$, can be defined inductively as follows:

• Every FO-atomic formula α over $\sigma \cup \{S_1, S_2, \ldots\}$ is an MLFPC formula. $fv^1(\alpha)$ = the set of first order free variables in α; $fv^2(\alpha)$ = the set of all relation symbols $S_i$ occurring in α; $fv(\alpha) = fv^1(\alpha) \cup fv^2(\alpha)$.
• If α, β are MLFPC formulas then so are ∼α, α ∧ β and α ∨ β.
• If α is an MLFPC formula, $x \in fv^1(\alpha)$ and $k \in \mathbb{N}$, then so are ∃xα, ∀xα and $C_x \alpha \le k$.
• If α is an MLFPC formula, $S_i \in fv^2(\alpha)$, $x \in fv^1(\alpha)$, u is a new first order variable, and $S_i$ occurs positively in α, then $[\mathrm{lfp}_{S_i,x}\, \alpha](u)$ is an MLFPC formula.

The restriction to positive second order variables in the lfp operator is essential to provide an effective semantics to the logic. It is a standard way of ensuring monotonicity, given that we do not have an effective procedure to test whether a given first-order formula is monotone on the class of finite σ structures [27]. It should be noted that the use of positive second order variables in no way restricts us to contexts where equilibria are guaranteed to exist. Equilibria are given by graph properties, and these variables allow us to collect sets of vertices monotonically.

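MLFPC Semantics

To interpret formulas, we extend σ-structures with interpretations for the free first order and second order variables (the latter from the given sequence $S_1, S_2, \ldots$). Let $\mathfrak{A}$ be a σ-structure, which has domain A. The semantics for the first order formulas of MLFPC are standard. The semantics for the count and the fixed point operator are given below.

$\mathfrak{A} \models C_x \alpha \le k$ iff $|\{a \in A \mid \mathfrak{A}[x \mapsto a] \models \alpha\}| \le k$.

$\mathfrak{A} \models [\mathrm{lfp}_{S_i,x}\, \alpha](u)$ iff $u^{\mathfrak{A}} \in \mathrm{lfp}(f_\alpha)$, where $f_\alpha : \wp(A) \to \wp(A)$ is an operator such that $f_\alpha(S) = \{a \in A \mid \mathfrak{A}[S_i \mapsto S,\, x \mapsto a] \models \alpha\}$.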

The lfp quantifier induces an operator on the powerset of elements on the structure ordered by inclusion. The positivity restriction ensures that the operator is monotone and hence least fixed-points exist.
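To make the iteration behind this remark explicit, the stages of the least fixed-point computation can be written out as follows. This is the standard construction for a monotone operator on a finite powerset; the symbol $f$ for the operator induced by the lfp quantifier and the stage notation $S^{j}$ are our own, introduced only for illustration.

```latex
\begin{align*}
S^{0} &= \emptyset, \\
S^{j+1} &= f(S^{j}) \quad (j \ge 0), \\
S^{0} &\subseteq S^{1} \subseteq S^{2} \subseteq \cdots \subseteq A
  && \text{(by monotonicity of } f\text{)}, \\
\mathrm{lfp}(f) &= S^{m} \text{ for the least } m \text{ with } S^{m+1} = S^{m},
  \quad m \le |A|.
\end{align*}
```

Since every strict inclusion adds at least one element of A, the chain stabilises after at most |A| steps, which is what makes the iteration usable as an algorithm on finite structures.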


Properties

Since the models of interest are improvement graphs, first order variables range over nodes in the graph, monadic second order variables range over subsets of nodes and the vocabulary consists of binary relations $E_u$, where u ⊆ [n]. When |u| = 1 and u = {i}, we will simply write the relation as $E_i$. We write formulas of the form E(x, y) to denote $\bigvee_{i \in [n]} E_i(x, y)$.

We now write special formulas that will be of interest in the sequel.

• $\mathit{sink}(x) = \forall y.\, \sim E(x, y)$
• $\mathit{trap}(S, x) = \forall y.\,(E(y, x) \Rightarrow S(y))$
• $\mathit{acyclic} = \forall u.\,[\mathrm{lfp}_{S,x}\, \mathit{trap}](u)$
• $\mathit{reach}(S, x) = \mathit{sink}(x) \lor \exists y\,(E(x, y) \land S(y))$
• $\mathit{weakly\text{-}acyclic} = \forall u.\,[\mathrm{lfp}_{S,x}\, \mathit{reach}](u)$

Now consider the formulas interpreted over improvement graphs of normal form games. The formula sink defines the set of sink nodes, and these are exactly the Nash equilibria of the associated game. The sentence acyclic is true exactly when the improvement graph is acyclic and hence such games have the finite improvement property (as every improvement path is finite). To see that the sentence captures acyclicity, note the action of the lfp operator: at the zeroth iteration, we get all nodes with in-degree 0; we then get all nodes whose incoming edges all come from nodes with in-degree 0; and so on. Eventually it collects exactly the nodes that cannot be reached from any cycle. Since the sentence applies to every node, we infer that the graph does not contain any cycle. For weak acyclicity, we require that there exists a finite improvement path starting from every node. Here the lfp operator picks up sink nodes at the zeroth iteration, then all nodes that have a sink node as successor, and so on. Eventually it collects all nodes that start finite improvement paths. The sentence asserts that every node has this property.
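The two fixed-point computations described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the improvement graph is given as a set of directed edges; the helper names (`lfp`, `analyse`) and the example graphs are ours and do not come from the paper. Following the prose, `reach` adds a node once one of its successors has already been collected.

```python
def lfp(nodes, step):
    """Iterate a monotone operator from the empty set until it stabilises."""
    current = frozenset()
    while True:
        nxt = frozenset(x for x in nodes if step(current, x))
        if nxt == current:
            return current
        current = nxt


def analyse(nodes, edges):
    """Return (sinks, acyclic, weakly_acyclic) for a finite directed graph."""
    preds = {x: {a for (a, b) in edges if b == x} for x in nodes}
    succs = {x: {b for (a, b) in edges if a == x} for x in nodes}

    def sink(x):
        # no outgoing improvement edge
        return not succs[x]

    def trap(S, x):
        # every predecessor of x is already in S
        return preds[x] <= S

    def reach(S, x):
        # x is a sink, or some successor of x is already in S
        return sink(x) or bool(succs[x] & S)

    everything = frozenset(nodes)
    sinks = {x for x in nodes if sink(x)}
    return sinks, lfp(nodes, trap) == everything, lfp(nodes, reach) == everything
```

On the path 0 → 1 → 2 both sentences hold. Adding the edge 1 → 0 creates a cycle, so acyclic fails, while weak acyclicity still holds because every node can reach the sink 2.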


Note that these formulas capture equilibria, FIP and weak acyclicity on improvement graphs per se, irrespective of whether they arise from normal form games, voting systems, or resource allocations. But there were some differences in the way these graphs were generated, and we turn our attention to these differences. In improvement graphs of resource allocation systems, we defined the edge relation to be $s \to_u s'$, where u ⊆ [n]. Note that an analogous definition in the case of normal form games leads us naturally to a concurrent setting, and we get what are called k-equilibria in coordination games. Let s and s' be strategy profiles in a normal form game, and let u ⊆ [n]. Define

$s \to_u s'$ when $s[j] = s'[j]$ for all $j \notin u$ and each agent in u is better off at s' than at s. That is, with the choices of the other agents fixed, the coalition of agents in u can coordinate their choices and deviate to get a better outcome. Thus k-equilibria are nodes from which no coalition of at most k agents can profitably deviate. We can then define a coalitional k-improvement path, where at each step a coalition of at most k agents deviates, which leads us further to a k-FIP. [3] shows that a class of uniform coordination games has this property.

• $\mathit{sink}_u(x) = \forall y.\, \sim E_u(x, y)$
• $\mathit{sink}_k(x) = \bigwedge_{u \subseteq [n],\, |u| \le k} \mathit{sink}_u(x)$
• $k\text{-}\mathit{edge}(x, y) = \bigvee_{u \subseteq [n],\, |u| \le k} E_u(x, y)$
• $k\text{-}\mathit{trap}(S, x) = \forall y.\,(k\text{-}\mathit{edge}(y, x) \Rightarrow S(y))$
• $k\text{-}\mathit{FIP} = \forall u.\,[\mathrm{lfp}_{S,x}\, k\text{-}\mathit{trap}](u)$

Note that the disjunctions are large, exponential in k. Since we have a counting operator, we could add further structure to nodes, prising out the individual strategies of players and then use the counting quantifier over these to get a succinct formula, linear in k. Observe that these formulas apply to coalitions of k agents performing swaps in resource allocation systems, and we thus uniformly transfer notions from the concurrent setting of games to resource allocation systems as well. Proceeding further, we get a similar notion for voting systems, where again a subset of voters can get together and agree on revising their expressed preferences, thus leading us to coalitional improvement paths and coalitional FIP.
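As a concrete illustration of coalitional edges, the sketch below builds the k-edge relation for a tiny two-player coordination game and reads off the k-equilibria as the nodes without outgoing k-edges. The game, the payoff function and the function names are hypothetical, and we assume that a coalition deviates only when every member is strictly better off.

```python
from itertools import combinations, product


def coalition_edges(profiles, payoff, n, k):
    """Edges s -> t in which some coalition u with |u| <= k changes only its
    own choices and every member of u strictly gains (our assumption)."""
    edges = set()
    for size in range(1, k + 1):
        for u in combinations(range(n), size):
            for s, t in product(profiles, repeat=2):
                fixed_outside = all(s[j] == t[j] for j in range(n) if j not in u)
                gains = all(payoff(i, t) > payoff(i, s) for i in u)
                if s != t and fixed_outside and gains:
                    edges.add((s, t))
    return edges


def k_equilibria(profiles, payoff, n, k):
    """Profiles with no outgoing k-edge, i.e. the nodes satisfying sink_k."""
    edges = coalition_edges(profiles, payoff, n, k)
    return {s for s in profiles if not any(a == s for (a, _) in edges)}


# Hypothetical game: both players choose 0 or 1; matching on 1 pays 2 each,
# matching on 0 pays 1 each, and a mismatch pays 0.
profiles = list(product([0, 1], repeat=2))


def payoff(i, s):
    if s[0] != s[1]:
        return 0
    return 1 if s[0] == 0 else 2
```

In this example `k_equilibria(profiles, payoff, 2, 1)` contains both matching profiles, whereas with `k = 2` only `(1, 1)` survives, because the two players can jointly leave `(0, 0)`. The loop over all coalitions of size at most k mirrors the large disjunction in the k-edge formula.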


In general, we might want to specify reachability of a set of distinguished nodes satisfying some property. For instance, in the context of allocation systems, we are interested in nodes that are envy free. An agent i envies an agent j at node x if there exists a node y such that i prefers y to x and the allocation for i at y is the same as the allocation for j at x. A node is envy-free if no player envies another at that node. We might then want to assert that an envy free node is reachable from any node. Note that we only need to enrich the first order vocabulary to speak of x[i], y[j] etc. to express envy-freeness, and the lfp operator is sufficient to specify reachability of such nodes.

• $\mathit{reach}_\varphi(S, x) = \varphi(x) \lor \exists y\,(E(x, y) \land S(y))$
• $\varphi\text{-}\mathit{reachable} = \forall u.\,[\mathrm{lfp}_{S,x}\, \mathit{reach}_\varphi](u)$

In the context of voting systems, a similar strengthening of the syntax of atomic formulas can lead to amusing specification of reachability of singular profiles in which all voters have identical first and last preference but disagree on all candidates ranked by them in between. Further, the counting quantifier can give us interesting relaxations of improvement dynamics. For instance consider the following specifications:

• $\mathit{reach}(S, x) = \mathit{sink}(x) \lor \exists y\,(E(x, y) \land S(y))$
• $\mathit{path\text{-}count} = C_u([\mathrm{lfp}_{S,x}\, \mathit{reach}](u)) \le 5$

This specifies that at most 5 nodes have finite improvement paths originating from them. Clearly, such specifications are of greater interest in voting systems than in the others.

Here the lfp operator is in the scope of the counting quantifier. In the following specification, we have them the other way about.

• $\mathit{count\text{-}trap}(S, x) = C_y E(y, x) < k \Rightarrow (\forall z.\,(E(z, x) \Rightarrow S(z)))$
• $\mathit{special} = \forall u.\,[\mathrm{lfp}_{S,x}\, \mathit{count\text{-}trap}](u)$
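The two ways of combining counting with the fixed-point operator can be sketched as follows. This is a hedged illustration on an explicit edge set: `path_count` counts the members of the reach fixed point (counting outside lfp), while `special` evaluates a fixed point whose body itself counts incoming edges (counting inside lfp). The function names and the example graph are ours.

```python
def lfp(nodes, step):
    """Least fixed point of a monotone step function, computed from below."""
    current = frozenset()
    while True:
        nxt = frozenset(x for x in nodes if step(current, x))
        if nxt == current:
            return current
        current = nxt


def path_count(nodes, edges):
    """Number of nodes that start a finite improvement path."""
    succs = {x: {b for (a, b) in edges if a == x} for x in nodes}

    def reach(S, x):
        return not succs[x] or bool(succs[x] & S)

    return len(lfp(nodes, reach))


def special(nodes, edges, k):
    """Evaluate forall u.[lfp count-trap](u) for the given bound k."""
    preds = {x: {a for (a, b) in edges if b == x} for x in nodes}

    def count_trap(S, x):
        # the implication is vacuous when x has k or more incoming edges
        return len(preds[x]) >= k or preds[x] <= S

    return lfp(nodes, count_trap) == frozenset(nodes)


# Hypothetical graph: a 2-cycle {0, 1} next to the path 2 -> 3.
example_nodes = {0, 1, 2, 3}
example_edges = {(0, 1), (1, 0), (2, 3)}
```

Here only nodes 2 and 3 start finite improvement paths. With bound 1 every node with an incoming edge satisfies count-trap vacuously, so `special` holds, while with bound 2 the body coincides with trap on this graph and the cycle makes it fail.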


Consider the housing market model mentioned in the previous section. This framework corresponds to a resource allocation problem where a valid allocation corresponds to a bijection between agents and resources and each agent is initially assigned an item. Stability is an important solution concept in this setting. A quite natural dynamics is one in which agents start with the initial allocation and repeatedly exchange items, provided the exchange is a profitable move for all the agents involved. The obvious question is whether this process converges to a stable outcome. If we model the associated improvement dynamics graph as exchange between pairs of agents, then each action corresponds to resolving a blocking pair. The formula sink then picks out the 2-stable outcomes, and acyclic asserts that every sequence of blocking-pair resolutions leads to a 2-stable outcome. If we interpret the improvement dynamics graph as capturing exchange of items within a coalition of agents, then these sentences correspond to existence of, and convergence to, core-stable outcomes. For the housing allocation problem, the existence of a core-stable outcome is guaranteed by the Top Trading Cycle algorithm. There are variants of the housing allocation problem where stable outcomes are not always guaranteed to exist (for instance, in the presence of externalities). In this case, by using the counting quantifier, we can make interesting assertions about the number of nodes that have finite improvement paths originating from them.
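For readers unfamiliar with the Top Trading Cycle algorithm mentioned above, here is a minimal sketch. It assumes strict preferences over all items and exactly one initial item per agent; the data layout (`prefs`, `owner_of`) and the three-agent example are hypothetical.

```python
def top_trading_cycles(prefs, owner_of):
    """prefs[agent] lists all items, best first; owner_of[item] is the
    initial owner. Returns the final allocation as a dict agent -> item."""
    remaining = set(prefs)
    owner = dict(owner_of)  # items still on the market -> current owner
    allocation = {}
    while remaining:
        # every remaining agent points at its best item still on the market
        best = {a: next(i for i in prefs[a] if i in owner) for a in remaining}
        # follow the pointers from any agent until some agent repeats
        path, agent = [], next(iter(remaining))
        while agent not in path:
            path.append(agent)
            agent = owner[best[agent]]
        cycle = path[path.index(agent):]
        # agents on the cycle trade along it and leave with their new items
        for member in cycle:
            allocation[member] = best[member]
        for member in cycle:
            remaining.discard(member)
            del owner[allocation[member]]
    return allocation


# Hypothetical market: a and b each prefer the other's house, c prefers ha.
prefs = {"a": ["hb", "ha", "hc"], "b": ["ha", "hb", "hc"], "c": ["ha", "hb", "hc"]}
owner_of = {"ha": "a", "hb": "b", "hc": "c"}
```

In this example a and b swap houses in the first round and c keeps its own house. In terms of the improvement graph, the resulting allocation is a sink for coalitional exchange.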

MODEL CHECKING ALGORITHM

In this section we discuss the model checking problem for MLFPC. Given $\varphi \in \Phi_{MLFPC}$ and a σ-structure $\mathfrak{A}$, the model checking problem is to check if $\mathfrak{A} \models \varphi$. We show that the logic admits an efficient model checking procedure (Algorithm 1). The main idea is to use the model checking algorithm for FO and modify it accordingly for the newly introduced quantifiers of the count and the least fixed point.

R. Das, R. Ramanujam & S. Simon

Algorithm 1: Checking of 𝔄 ⊨ φ
Input: 𝔄, φ
Output: Sol ⊆ A^{fv^1(φ)}
Sub_φ = {subformulas of φ} /* ordered via subformula ordering ≤s */
Q = {q_α | α ∈ Sub_φ}
for α ∈ Sub_φ
    switch type of α
    i. case α is an atomic formula
        q_α is computed by reading off the structure 𝔄
    ii. case α = ∼β
        q_α is just the negation of q_β /* since β ≤s α, q_β is already computed */
    iii. case α = β1 ∧ β2
        Let P = fv^1(β1) ∩ fv^1(β2), Q = fv^1(β1) \ P, R = fv^1(β2) \ P
        for x ∈ A^|P|, y ∈ A^|Q|, z ∈ A^|R|
            if q_β1(x, y) = 1 and q_β2(x, z) = 1
                q_α(x, y, z) = 1
            else
                q_α(x, y, z) = 0
    iv. case α = ∃yβ
        for x ∈ A^|fv^1(α)|, a ∈ A
            if q_β(x, a) = 1
                q_α(x) = 1
    v. case α = C_y β ≤ k
        Let count = 0
        for x ∈ A^|fv^1(α)|
            for a ∈ A
                if q_β(x, a) = 1 then count = count + 1
            if count ≤ k then q_α(x) = 1
            count = 0
    vi. case α = [lfp_{Si,x} β](u)
        lfp(α) /* subroutine call */

Algorithm 2: lfp(α) - Subroutine call for computing the least fixed point
Input: α = [lfp_{Si,x} β](u)
Output: q_α
Let fv(β) = {x, y, Si, Y}, fv(α) = {u, y, Y}
for a ∈ A^|y|
    iter = {}, f_β^a = {}
    do
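To give a feel for how such a procedure can be realised, here is a small Python sketch of an MLFPC evaluator. It is not Algorithm 1 itself: instead of filling tables bottom-up over the subformula ordering, it evaluates a formula recursively under an assignment, and its lfp case is our own reading of the fixed-point iteration. The tuple encoding of formulas and the name `holds` are assumptions made for illustration.

```python
def holds(phi, dom, rels, env):
    """Evaluate phi over the finite domain dom. rels maps relation names to
    sets of tuples; env maps first order variables to elements and
    fixed-point variables to frozensets of elements."""
    op = phi[0]
    if op == "rel":      # ("rel", name, var, ..., var)
        return tuple(env[v] for v in phi[2:]) in rels[phi[1]]
    if op == "in":       # ("in", S, x) stands for S(x)
        return env[phi[2]] in env[phi[1]]
    if op == "not":
        return not holds(phi[1], dom, rels, env)
    if op == "and":
        return holds(phi[1], dom, rels, env) and holds(phi[2], dom, rels, env)
    if op == "or":
        return holds(phi[1], dom, rels, env) or holds(phi[2], dom, rels, env)
    if op == "exists":   # ("exists", x, body)
        return any(holds(phi[2], dom, rels, {**env, phi[1]: a}) for a in dom)
    if op == "forall":   # ("forall", x, body)
        return all(holds(phi[2], dom, rels, {**env, phi[1]: a}) for a in dom)
    if op == "count_le":  # ("count_le", x, body, k) stands for C_x body <= k
        _, x, body, k = phi
        return sum(holds(body, dom, rels, {**env, x: a}) for a in dom) <= k
    if op == "lfp":      # ("lfp", S, x, body, u) stands for [lfp_{S,x} body](u)
        _, S, x, body, u = phi
        current = frozenset()
        while True:      # at most |dom| rounds when S occurs positively
            nxt = frozenset(a for a in dom
                            if holds(body, dom, rels, {**env, S: current, x: a}))
            if nxt == current:
                return env[u] in current
            current = nxt
    raise ValueError(f"unknown operator {op!r}")


# trap(S, x) = forall y.(E(y, x) => S(y)), with the implication unfolded
trap = ("forall", "y", ("or", ("not", ("rel", "E", "y", "x")), ("in", "S", "y")))
acyclic = ("forall", "u", ("lfp", "S", "x", trap, "u"))
sink = ("forall", "y", ("not", ("rel", "E", "x", "y")))
at_most_one_sink = ("count_le", "x", sink, 1)
```

For instance, `holds(acyclic, [0, 1, 2], {"E": {(0, 1), (1, 2)}}, {})` evaluates the acyclicity sentence on a three-node path. The sketch recomputes the fixed point for every value of u, which a table-based procedure like Algorithm 1 avoids; it is meant to show the semantics, not to match the stated running time.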

Theorem 1. Given φ and $\mathfrak{A}$, Algorithm 1 decides if $\mathfrak{A} \models \varphi$ in polynomial time.

Proof. Given $\varphi \in \Phi_{MLFPC}$, let $Sub_\varphi$ denote the set of subformulas of φ and $\le_s$ denote the corresponding subformula ordering. The idea follows the algorithms outlined for model checking procedures in FO and combines them with what is known about the least fixed point. We maintain a polynomial time computable relational list Q of polynomial size. We will basically follow the proof for first order logic and argue similarly for the count and least fixed point operators introduced. Let A be the underlying domain of $\mathfrak{A}$. For each $\alpha \in Sub_\varphi$ and each tuple $\bar{a}$ of elements of A assigned to the free first order variables of α, $Q_\alpha(\bar{a}) = 1$ if $\mathfrak{A} \models \alpha[\bar{a}]$, and 0 otherwise.

Induction Hypothesis. For all $\alpha \in Sub_\varphi$, $Q_\alpha$ is of size $|A|^{O(1)}$ and can be computed in time $|A|^{O(1)}$.

Base Case. When α is an atomic formula, $Q_\alpha$ can be directly computed from $\mathfrak{A}$. Since each of the relations defined in $\mathfrak{A}$ is polynomial in the size of A, it takes a linear pass across these relations to compute $Q_\alpha$. Thus, we can conclude that the total time taken is also polynomial in the size of A.


Induction Case

• $\alpha = \sim\beta$: By I.H., since $\beta \le_s \alpha$, we would have already computed $Q_\beta$, from which $Q_\alpha$ can be computed by an algorithm running in time $O(|A|^{|fv^1(\beta)|})$.
• $\alpha = \beta_1 \land \beta_2$: By I.H., we have in polynomial time been able to maintain the polynomial sized lists $Q_{\beta_1}$ and $Q_{\beta_2}$. Then the procedure outlined in the algorithm would take time $O(|A|^{|fv^1(\alpha)|})$ and maintains a list of similar size.
• $\alpha = \exists x \beta$: By I.H., we have in polynomial time been able to maintain the polynomial sized list $Q_\beta$. The algorithm outlined would take time $O(|A|^{|fv^1(\beta)|})$, which would also happen to be the size of the list maintained by $Q_\beta$.
• $\alpha = C_x \beta \le k$: By I.H., we have in polynomial time been able to maintain the polynomial sized list $Q_\beta$. The outlined procedure computes $Q_\alpha$ in time $O(|A|^{|fv^1(\beta)|})$.
• $\alpha = [\mathrm{lfp}_{S_i,x}\, \beta](u)$: First we note, $fv(\alpha) = (fv(\beta) \setminus \{S_i, x\}) \cup \{u\}$. What essentially happens here is that the second order variable $S_i$ gets used to generate an inductive relation via the fixed point computation and the newly introduced first order variable u is utilised to check validity of these formulas. So, unfortunately, we cannot directly use $Q_\beta$ to give a polynomial time procedure to generate $Q_\alpha$. We would rather use the fact that the least fixed point computation is a polynomial time computation even in the case of first order logic with the count introduced. If we look at the inductive procedure to compute the fixed point, we see that the monadic fixed point operator starts at the empty set and then converges to a subset of A, and at each stage the number of elements of the output set increases by at least 1. Therefore the iteration runs for at most |A| stages. Now for the call to model checking at each stage, we notice that since the formulas get fixed values assigned at each model checking call, we actually operate an FO-model checking call, which we know takes $O(|A|^{|\beta|})$ time. We have a precomputation of $Q_\beta$, which we can additionally make use of to reduce the time in the following manner.


For each choice of b ∈ A the entire instance gets fixed, and thus in a linear pass through |β| we would get to know whether the choice of b satisfies the formula or not. Therefore each such check takes time at most O(|β|). The entire lfp computation thus consists of at most |A| stages, each of which performs one such check for every b ∈ A, and so takes time polynomial in |A|. We are interested in the data complexity of our procedure, where we can ignore the size of the formula, which is typically much smaller than the model over which the model checking procedure is run (this reflects practical circumstances). So we can conclude that our procedure runs in time polynomial in |A|.

Complexity of Algorithm 1. The algorithm iterates over all subformulas in increasing order. The worst case running time of the lfp procedure for inputs

, where . Thus the running time of

Algorithm 1 is . For the specific properties mentioned in section 3.3, note that the corresponding MLFPC formulas refer to one second order variable and two first order variables. Thus for all the properties mentioned in section 3.3, the model checking procedure runs in time

.

In the context of improvement graphs, if there are n agents and at most m choices for each agent, the size of the associated improvement graph is $O(m^n)$. Since it is possible to have a compact representation for certain subclasses of strategic form games, for instance, polymatrix games [24], the size of the improvement graph structure can be exponential in the representation of the game. Thus the model checking procedure, while polynomial in the size of the underlying improvement graph, can in principle be exponential in the size of the game representation. This observation may not be very surprising since even for restricted classes of games like 0/1 polymatrix games, checking for the existence of a Nash equilibrium is known to be NP-complete [4].
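The gap between the two sizes is easy to see numerically. The sketch below compares the number of nodes of the improvement graph with the number of payoff entries in a polymatrix representation, assuming the usual encoding with one m × m payoff table per ordered pair of distinct players; the function names are ours.

```python
def improvement_graph_nodes(n, m):
    """One node per strategy profile: m choices for each of n agents."""
    return m ** n


def polymatrix_entries(n, m):
    """Assumed encoding: an m x m payoff table for every ordered pair of
    distinct players, so n * (n - 1) tables in total."""
    return n * (n - 1) * m * m


# (agents, representation size, graph size) for games with m = 3 choices
rows = [(n, polymatrix_entries(n, 3), improvement_graph_nodes(n, 3))
        for n in (2, 5, 10, 20)]
```

With 10 agents and 3 choices each, the representation has 810 entries while the improvement graph has 59049 nodes, so a procedure that is polynomial in the graph can still be exponential in the input that describes the game.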

DISCUSSION

We see this paper as a preliminary investigation, hopefully leading to a descriptive complexity theoretic study of fundamental notions in games and interaction. It is clear that fixed-point computations underlie the reasoning in a wide variety of such contexts, and logics with least fixed-point operators are natural vehicles of such reasoning. We expect that this is a minimal language for improvement dynamics, but with further vocabulary restrictions that need to be worked out. Proceeding further, we would like to delineate bounds on the use of logical resources for game theoretic reasoning. For instance, one natural question is the characterization of equilibrium dynamics definable with at most one second order (fixed-point) variable. Expressiveness needs to be sharpened from the perspective of models as well. We would like to characterize the class of improvement graphs for different subclasses of games, resource allocation systems and voting rules, considering the wide variety of details in the literature. This would in general necessitate enriching the logical language and we wish to consider minimal extensions. Another important issue is the identification of subclasses that avoid the navigation of huge improvement graphs. Potential games provide an interesting subclass and they correspond to some appropriate allocation rules and forms of voting (under specific election rules). But these are only specific exemplifying instances; studying the structure of formulas and their models will (hopefully) lead us to many such correspondences. An important direction is the study of infinite strategy spaces. Clearly the model checking algorithm needs a finite presentation of the input, but this is possible and it is then interesting to explore convergence of fixed-point iterations.

ACKNOWLEDGEMENTS

We thank the reviewers for their insightful comments. We thank Anup Basil Mathew for discussions on improvement dynamics. Sunil Simon was partially supported by grant MTR/2018/001244.


REFERENCES

1. R. Alur, T. A. Henzinger & O. Kupferman (2002): Alternating-Time Temporal Logic. Journal of the ACM 49, pp. 672–713, doi:10.1145/585265.585270.
2. Krzysztof R. Apt, Bart de Keijzer, Mona Rahn, Guido Schäfer & Sunil Simon (2017): Coordination games on graphs. International Journal of Game Theory 46(3), pp. 851–877, doi:10.1007/s00182-016-0560-8.
3. Krzysztof R. Apt & Sunil Simon (2015): A classification of weakly acyclic games. Theory and Decision 78(4), pp. 501–524, doi:10.1007/978-3-642-33996-7_1.
4. Krzysztof R. Apt, Sunil Simon & Dominik Wojtczak (2015): Coordination Games on Directed Graphs. In: Proc. of 15th International Conference on Theoretical Aspects of Rationality and Knowledge, doi:10.4204/EPTCS.215.6.
5. J. van Benthem (2001): Games in dynamic epistemic logic. Bulletin of Economic Research 53(4), pp. 219–248, doi:10.1111/1467-8586.00133.
6. J. van Benthem (2002): Extensive games as process models. Journal of Logic Language and Information 11, pp. 289–313, doi:10.1023/A:1015534111901.
7. Johan van Benthem & Amélie Gheerbrant (2010): Game Solution, Epistemic Dynamics and Fixed-Point Logics. Fundamenta Informaticae 100(1-4), pp. 19–41, doi:10.3233/FI-2010-261.
8. G. Bonanno (2001): Branching Time Logic, Perfect Information Games and Backward Induction. Games and Economic Behavior 36(1), pp. 57–73, doi:10.1006/game.1999.0812.
9. Sylvain Bouveret, Yann Chevaleyre & Nicolas Maudet (2016): Fair Allocation of Indivisible Goods, chapter 12. Handbook of Computational Social Choice, Cambridge University Press.
10. Simina Branzei, Ariel D. Procaccia & Jie Zhang (2013): Externalities in cake cutting. In: Proceedings of the 23rd IJCAI, pp. 55–61.
11. E. Budish (2011): The combinatorial assignment problem: Approximate competitive equilibrium from equal incomes. Journal of Political Economy 119(6), pp. 1061–1103, doi:10.1086/664613.
12. Krishnendu Chatterjee, Thomas A. Henzinger & Nir Piterman (2010): Strategy logic. Information and Computation 208(6), pp. 677–693, doi:10.1016/j.ic.2009.07.004.
13. Y. Chevaleyre, U. Endriss & N. Maudet (2017): Distributed Fair Allocation of Indivisible Goods. Artificial Intelligence 242, pp. 1–22, doi:10.1016/j.artint.2016.09.005.
14. Anastasia Damamme, Aurélie Beynier, Yann Chevaleyre & Nicolas Maudet (2015): The power of swap deals in distributed resource allocation. In: Proceedings of 14th International Conference on Autonomous Agents and Multiagent Systems, pp. 625–633.
15. H. Ebbinghaus & J. Flum (1999): Finite Model Theory. Springer, doi:10.1007/3-540-28788-4.
16. R. Engelberg & M. Schapira (2011): Weakly-Acyclic (Internet) Routing Games. In: Proc. 4th International Symposium on Algorithmic Game Theory (SAGT11), Lecture Notes in Computer Science 6982, Springer, pp. 290–301, doi:10.1007/978-3-642-24829-0_26.
17. Etsushi Fujita, Julien Lesca, Akihisa Sonoda, Taiki Todo & Makoto Yokoo (2015): A Complexity Approach for Core-Selecting Exchange with Multiple Indivisible Goods under Lexicographic Preferences. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence, pp. 907–913, doi:10.1613/jair.1.11254.
18. M. Ghodsi, H. Saleh & M. Seddighin (2018): Fair Allocation of Indivisible Items With Externalities. CoRR abs/1805.06191. Available at http://arxiv.org/abs/1805.06191.
19. V. Goranko (2003): The Basic Algebra of Game Equivalences. Studia Logica 75(2), pp. 221–238, doi:10.1023/A:1027311011342.
20. Laurent Gourvès, Julien Lesca & Anaëlle Wilczynski (2017): Object allocation via swaps along a social network. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI17), pp. 213–219, doi:10.24963/ijcai.2017/31.
21. Martin Grohe (1994): The Structure of Fixed Point Logics. Albert-Ludwigs-Universität.
22. P. Harrenstein, W. van der Hoek, J.J. Meyer & C. Witteveen (2003): A Modal Characterization of Nash Equilibrium. Fundamenta Informaticae 57(2-4), pp. 281–321.
23. W. van der Hoek, W. Jamroga & M. Wooldridge (2005): A Logic for Strategic Reasoning. Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 157–164, doi:10.1145/1082473.1082497.
24. E.B. Janovskaya (1968): Equilibrium Points in Polymatrix Games. Litovskii Matematicheskii Sbornik 8, pp. 381–384.
25. Bernd Kawald & Pascal Lenzner (2013): On Dynamics in Selfish Network Creation. In: Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, ACM, pp. 83–92, doi:10.1145/2486159.2486185.
26. O. Lev & J. S. Rosenschein (2012): Convergence of iterative voting. In: Proceedings of AAMAS-2012, pp. 611–618.
27. Leonid Libkin (2013): Elements of finite model theory. Springer, doi:10.1007/978-3-662-07003-1.
28. J.R. Marden, G. Arslan & J.S. Shamma (2007): Regret based dynamics: convergence in weakly acyclic games. In: Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), IFAAMAS, pp. 194–201, doi:10.1145/1329125.1329175.
29. Reshef Meir, Maria Polukarov, Jeffrey S. Rosenschein & Nicholas R. Jennings (2017): Iterative voting and acyclic games. Artificial Intelligence 252, pp. 100–122, doi:10.1016/j.artint.2017.08.002.
30. I. Milchtaich (1996): Congestion Games with Player-Specific Payoff Functions. Games and Economic Behaviour 13, pp. 111–124, doi:10.1006/game.1996.0027.
31. D. Monderer & L. S. Shapley (1996): Potential Games. Games and Economic Behaviour 14, pp. 124–143, doi:10.1006/game.1996.0044.
32. Roger B. Myerson & Robert J. Weber (1993): A Theory of Voting Equilibria. The American Political Science Review 87(1), pp. 102–114, doi:10.2307/2938959.
33. R. Parikh (1985): The logic of games and its applications. Annals of Discrete Mathematics 24, pp. 111–140, doi:10.1016/S0304-0208(08)73078-0.
34. R. Ramanujam & S. Simon (2008): Dynamic logic on games with structured strategies. In: Proceedings of the 11th International Conference on Principles of Knowledge Representation and Reasoning (KR-08), AAAI Press, pp. 49–58.
35. A. Reijngoud & U. Endriss (2012): Voter response to iterated poll information. In: Proceedings of AAMAS-2012, pp. 635–644.
36. R. W. Rosenthal (1973): A Class of Games Possessing Pure-Strategy Nash Equilibria. International Journal of Game Theory 2(1), pp. 65–67, doi:10.1007/BF01737559.
37. Nicole Schweikardt (2005): Arithmetic, first-order logic, and counting quantifiers. ACM Transactions on Computational Logic 6(3), pp. 634–671, doi:10.1145/1071596.1071602.
38. Nicole Schweikardt (2006): On the expressive power of monadic least fixed point logic. Theoretical Computer Science 350, pp. 325–344, doi:10.1016/j.tcs.2005.10.025.
39. L. S. Shapley & H. Scarf (1974): On cores and indivisibility. Journal of Mathematical Economics 1(1), pp. 23–37, doi:10.1016/0304-4068(74)90033-0.
40. Zhaohong Sun, Hideaki Hata, Taiki Todo & Makoto Yokoo (2015): Exchange of Indivisible Objects with Asymmetry. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 97–103.
41. D. Walther, W. van der Hoek & M. Wooldridge (2007): Alternating-time Temporal Logic with Explicit Strategies. In: Proceedings of the 11th Conference on Theoretical Aspects of Rationality and Knowledge (TARK2007), pp. 269–278, doi:10.1145/1324249.1324285.
42. H. Peyton Young (1993): The evolution of conventions. Econometrica 61(1), pp. 57–84, doi:10.2307/2951778.

Chapter 13

FORMAL ANALYSIS OF 2D IMAGE PROCESSING FILTERS USING HIGHER-ORDER LOGIC THEOREM PROVING

Adnan Rashid1, Sa'ed Abed2, and Osman Hasan1

1 School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan
2 Computer Engineering Department, College of Engineering and Petroleum, Kuwait University, Kuwait City, Kuwait

ABSTRACT

Two-dimensional (2D) image processing systems are concerned with the processing of images represented as 2D arrays and are widely used in medicine, transportation and many other autonomous systems. The dynamics of these systems are generally modeled using 2D difference equations, which are mathematically analyzed using the 2D z-transform. It mainly involves a transformation of the difference equations-based models of these systems to their corresponding algebraic equations, mapping the 2D arrays (2D discrete-time signals) over the (z1,z2)-domain. Finally, these (z1,z2)-domain representations are used to analyze various properties of these systems, such as transfer function and stability. Conventional techniques, such as paper-and-pencil proof methods and computer-based simulation techniques, cannot assert the accuracy of the analysis of these filters due to their inherent limitations like human error proneness, limited computational resources and approximations of the mathematical expressions and results. In this paper, as a complementary technique, we propose to use formal methods, in particular higher-order logic (HOL) theorem proving, for formally analyzing the image processing filters. These methods can overcome the limitations of the conventional techniques and thus ascertain the accuracy of the analysis. In particular, we formalize the 2D z-transform based on the multivariate theories of calculus using the HOL Light theorem prover. Moreover, we formally analyze a generic (L1,L2)-order 2D infinite impulse response image processing filter. We illustrate the practical effectiveness of our proposed approach by formally analyzing a second-order image processing filter.

Citation: (APA): Rashid, A., Abed, S. E., & Hasan, O. (2022). Formal analysis of 2D image processing filters using higher-order logic theorem proving. EURASIP Journal on Advances in Signal Processing, 2022(1), 1-18. (18 pages).

Copyright: © Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

INTRODUCTION

Two-dimensional (2D) image processing systems [1, 2] typically involve image filtering, editing, enhancement, compression and restoration of images represented as 2D arrays (2D discrete-time signals). Image processing filters [2] are the fundamental components of 2D image processing systems and are widely used for image filtering. These filters are categorized as high-pass, band-pass and low-pass filters based on the range of frequencies they allow to pass. For example, a high-pass filter permits a range of frequencies greater than a certain threshold. Moreover, these filters are widely used in autonomous vehicles [3, 4] and medicine [5]. For example, they are used to perform various image processing tasks for controlling autonomous vehicles, such as noise reduction, color normalization, histogram equalization and edge detection, to enhance the quality of the images captured using various devices such as closed-circuit television (CCTV) and webcams [6]. Similarly, they are widely used in medicine for performing various image pre- and post-processing tasks, such as image quality enhancement, noise removal and image smoothing [5].

The dynamics of these image processing systems are generally modeled using 2D difference equations. Next, the 2D z-transform is used to mathematically analyze these systems. It mainly involves a transformation of the difference equations-based models of these systems to their corresponding algebraic equations, using the definition and various classical properties of the 2D z-transform, while mapping 2D arrays over the (z1,z2)-domain. Finally, these (z1,z2)-domain representations are used to analyze various properties of these image processing systems like transfer function and stability [2].

Conventionally, the image processing filters have been analyzed using paper-and-pencil proof techniques and computer-based symbolic and numerical methods. However, in the former case, the analysis is error-prone due to the highly involved human manipulation, particularly for larger and more complex image processing systems, and thus we cannot ascertain an absolute accuracy of the analysis in this approach. Similarly, the latter approaches suffer from some inherent limitations. For example, the symbolic methods involve a large number of unverified symbolic procedures residing in the core of the associated tools [7]. Similarly, the numerical techniques include a finite number of iterations due to the limited power of the computing machines. Moreover, they are based on mathematical results that are approximated due to the finite precision arithmetic of computers. Therefore, these conventional approaches cannot be trusted when analyzing the image processing filters utilized in various safety-critical areas, such as autonomous driving and medicine.

Formal methods [8] are system analysis techniques that are based on developing a mathematical model of the system using logic and verifying its various properties using deductive reasoning. Higher-order logic (HOL) theorem proving [9, 10] is a widely utilized formal method for analyzing many safety-critical systems. In this paper, we propose a HOL theorem proving-based framework for analyzing the image processing filters. In particular, we formalize the 2D z-transform based on the multivariate theories of calculus using the HOL Light theorem prover [11]. The main motivation for selecting HOL Light for the proposed formalization is the presence of the fundamental libraries of multivariate calculus, vectors and matrices, which are required to formally analyze the 2D image processing systems.

CONTRIBUTIONS OF THE PAPER
The novel contributions of the paper are:

• Formalization of the 2D z-transform and its region of convergence (ROC).
• Formal verification of various classical properties of the 2D z-transform, such as linearity, shifting in time-domain, scaling in (z1,z2)-domain and complex conjugation.
• Formal analysis of a generic (L1,L2)-order 2D IIR image processing filter.
• Formal analysis of a second-order image processing filter.

PRELIMINARIES This section provides an introduction to the HOL Light theorem prover and the formalization of some fundamental concepts from the multivariate calculus libraries of HOL Light that facilitate the understanding of the rest of the paper.

HOL Light Theorem Prover
HOL Light [12] is a proof assistant for developing proofs of mathematical concepts written as theorems in higher-order logic. HOL Light is implemented in the strongly typed functional programming language ML [13]. A theorem is a statement that is formalized as an axiom or can be derived from already verified theorems using inference rules. Soundness is assured in a theorem proving environment, as every new theorem is verified using the primitive inference rules or previously verified theorems. HOL Light provides extensive support for theories, such as Boolean algebra, arithmetic, real numbers, vectors and matrices, which are extensively used in our formalization. Indeed, one of the motivations for selecting the HOL Light theorem prover for the proposed framework is the availability of extensive libraries of vectors and matrices.

Multivariable Calculus Theories in HOL Light
This section presents an introduction to some fundamental concepts formalized in HOL Light, such as summability, infinite summation and vector summation, and some HOL Light notations that help in understanding the rest of the paper. An N-dimensional vector in HOL Light is formalized as a column matrix whose individual elements are real numbers. All vector operations are then considered as matrix manipulations. Most of the theorems in the multivariable calculus theories of HOL Light are proved for functions with an arbitrary data type of ℝ^N → ℝ^M. Similarly, complex numbers can be described as ℝ^2 instead of defining a new data type. The HOL Light symbol &: ℕ → ℝ represents an injection of natural numbers to real numbers. Similarly, the symbol Cx: ℝ → ℂ typecasts real numbers to complex numbers. The symbols Re: ℂ → ℝ and Im: ℂ → ℝ represent the real and imaginary components of a complex number, respectively. The HOL Light symbol % captures the scalar multiplication of a vector or matrix. Similarly, a matrix–vector multiplication is modeled as ∗∗ in HOL Light. The generalized summation over an arbitrary function fn: A → ℝ^N is formalized in HOL Light as follows:

Definition 1 Generalized Summation of Vector
⊢def ∀ st fn. vecsum st fn = (lambda k. summ st (λ x. fn x$k))

where vecsum accepts a set st: A → bool over which the summation occurs and a function fn of data type A → ℝ^N and returns a generalized vector summation over the set st. Here, the HOL Light function summ provides a finite summation for a function fn over real numbers. For example, the mathematical expression fn(0) + fn(1) + … + fn(n) is described in HOL Light as vecsum (0..n) fn.

Definition 2 Summs
⊢def ∀ st fn lt. (fn summs lt) st ⇔ ((λ n. vecsum (st INTER (0..n)) fn) → lt) sequentially

The HOL Light function summs accepts a set of natural numbers st: ℕ → bool, a function fn: ℕ → ℝ^N and a limit value lt: ℝ^N and returns the traditional mathematical expression $\sum_{k \in st} fn(k) = lt$. Here, INTER captures the intersection of two sets. Similarly, sequentially represents a net providing a sequential growth of a function f, i.e., f(k), f(k+1), f(k+2), …. This is mainly used in modeling the concept of an infinite summation. We provide the formalization of the summability of a function fn: ℕ → ℝ^N over st: ℕ → bool, which ensures that there exists some limit value L: ℝ^N such that $\sum_{k \in st} fn(k) = L$, in HOL Light as:

Definition 3 Summability of a Function
⊢def ∀ fn st. summable st fn ⇔ (∃ lt. (fn summs lt) st)

The limit of a function fn: A → ℝ^N is formalized as:

Definition 4 Limit of a Function
⊢def ∀ net fn. limt net fn = (ε lt. (fn → lt) net)

where the function limt takes a net with components of data type A and a function fn and returns a limit value lt: ℝ^N to which fn converges at the given net. It is formalized using the Hilbert choice operator ε. Similarly, the concept tends to (→) is formalized in HOL Light as:

Definition 5 Tends to
⊢def ∀ fn lt net. (fn → lt) net ⇔ ∀e. &0 < e ⇒ eventually (λ x. dist (fn x, lt) < e) net

Now, we provide a formalization of an infinite summation, which is used in the formal definition of the 2D z-transform presented in Sect. 5.1.

Definition 6 Infinite Summation of a Function
⊢def ∀ fn st. inftsumm st fn = (ε lt. (fn summs lt) st)

where the HOL Light function inftsumm accepts st: num → bool specifying the starting point and a function fn of data type ℕ → ℝ^N, and returns a limit value lt: ℝ^N to which the infinite summation of fn converges from the given st.

Next, we formally verify the equivalence of the infinite summation (Definition 6) to its alternate form in terms of a sequential limit as the following HOL Light theorem:

Theorem 1 Relationship Between Infinite Summation and the Sequential Limit
⊢ ∀ st fn. inftsumm st fn = limt sequentially (λ k. vecsum (st INTER (0..k)) fn)
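The idea behind Theorem 1, that an infinite summation is the limit of its finite partial sums, can be illustrated numerically. The following is only a floating-point sketch, not part of the formalization: the helper partial_sums and the geometric test sequence are hypothetical names chosen for illustration, and a finite number of terms can only suggest the limit, not prove it.

```python
def partial_sums(fn, upto):
    """Return [fn(0), fn(0)+fn(1), ..., fn(0)+...+fn(upto)].

    This mirrors the sequence (lambda k. vecsum (0..k) fn) whose
    sequential limit appears in Theorem 1, for scalar-valued fn.
    """
    total = 0.0
    sums = []
    for k in range(upto + 1):
        total += fn(k)
        sums.append(total)
    return sums


def geometric(k):
    # Hypothetical test sequence: (1/2)^k, whose infinite sum is 2.
    return 0.5 ** k


sums = partial_sums(geometric, 50)
limit_estimate = sums[-1]  # the partial sums approach 2 from below
```

The formal theorem holds for any summable function; the sketch only shows one convergent instance.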

METHODS
Figure 1 depicts our proposed method for analyzing the image processing filters using HOL theorem proving. The user provides the 2D difference equation that models the dynamics of the image processing system to be analyzed. This 2D difference equation is modeled in higher-order logic using the multivariate calculus theories of HOL Light. In the next step, we formalize the 2D z-transform that is required for mathematically analyzing the image processing systems. It mainly transforms the difference equation-based models of these systems to their corresponding algebraic equations, using the definition and various classical properties of the 2D z-transform, such as linearity, shifting and scaling, while mapping 2D arrays over the (z1,z2)-domain. Finally, these (z1,z2)-domain representations are used to analyze various properties of these systems, such as the transfer function and the solution of the corresponding difference equations.

Figure 1: Proposed framework.

RESULTS


Formalization of the 2D z-transform
The 2D z-transform of a 2D discrete-time function (2D array) f(n1,n2) is mathematically expressed as follows [2]:

$$F(z_1, z_2) = \sum_{n_1=0}^{\infty} \sum_{n_2=0}^{\infty} f(n_1, n_2)\, z_1^{-n_1} z_2^{-n_2} \qquad (1)$$

where f is a function of type ℕ → ℕ → ℂ, and z1 and z2 are complex variables. The limits from 0 to ∞ make Eq. (1) a mathematical representation of the unilateral 2D z-transform. We have opted for this representation based on the same motivation that was considered for the one-dimensional z-transform [14] and the Laplace transform [15]. We formalize the 2D z-transform [Eq. (1)] in HOL Light as follows:

Definition 7 2D z-Transform
⊢def ∀ f z1 z2. z_transform_2d f z1 z2 = inftsumm (from 0) (λ n1. inftsumm (from 0) (λ n2. f n1 n2 / (z1 pow n1 ∗ z2 pow n2)))

where z_transform_2d accepts a function f of type ℕ → ℕ → ℂ and two complex variables z1: ℂ and z2: ℂ and returns a complex number, which represents the 2D z-transform of f according to Eq. (1).

An essential issue with the applicability of the 2D z-transform of f(n1,n2) is the existence of F(z1,z2), which is not guaranteed due to the presence of the infinite summations in Eq. (1). Thus, we need to identify conditions for the existence of the 2D z-transform. The set of all those values of z1 and z2 for which the infinite summations converge and F(z1,z2) is finite (or summable) is known as the ROC. It is mathematically expressed as follows:

$$\mathrm{ROC} = \{ (z_1, z_2) \in \mathbb{C}^2 : |F(z_1, z_2)| < \infty \} \qquad (2)$$

We formalize the ROC of the 2D z-transform as follows:

Definition 8 Region of Convergence (ROC)
⊢def ∀ f n1. ROC_2d f n1 = {(z1, z2) | ¬(z1 = Cx (&0)) ∧ ¬(z2 = Cx (&0)) ∧ z_tr_summable f z1 z2 n1 ∧ z_tr_td_summable f z1 z2}

where ROC_2d accepts a function f: ℕ → ℕ → ℂ and n1 capturing the starting point of the outer summation of the 2D z-transform [Eq. (1)] and returns a set of nonzero values of variables z1 and z2 for which the 2D z-transform of f exists. It is necessary to specify the associated ROC_2d to compute the 2D z-transform. Moreover, the functions z_tr_summable and z_tr_td_summable capture the summability of the function f for the inner and the outer (double) summations, respectively, and are formalized in HOL Light as follows:

Definition 9 Summability of Function for Inner Summation
⊢def ∀ f z1 z2 n1. z_tr_summable f z1 z2 n1 = (∀ n1. summable (from 0) (λ n2. f n1 n2 / (z1 pow n1 ∗ z2 pow n2)))

Definition 10 Summability of Function for Outer (Double) Summation
⊢def ∀ f z1 z2. z_tr_td_summable f z1 z2 = summable (from 0) (λ n1. inftsumm (from 0) (λ n2. f n1 n2 / (z1 pow n1 ∗ z2 pow n2)))

Moreover, we verify two fundamental properties of the ROC, namely the linearity of the ROC and the scaling of the ROC, which are quite helpful for formally verifying the classical properties of the 2D z-transform in Sect. 5.2.

Theorem 2 Linearity of ROC
⊢ ∀ z1 z2 a b f g n1.
  [A1]: (z1, z2) IN ROC_2d f n1 ∧
  [A2]: (z1, z2) IN ROC_2d g n1
  ⇒ (z1, z2) IN ROC_2d (λ n1 n2. a ∗ f n1 n2) n1 INTER ROC_2d (λ n1 n2. b ∗ g n1 n2) n1

Theorem 3 Scaling of ROC
⊢ ∀ z1 z2 a f n1.
  [A]: (z1, z2) IN ROC_2d f n1
  ⇒ (z1, z2) IN ROC_2d (λ n1 n2. f n1 n2 / a) n1

Theorem 2 ensures that if (z1, z2) is inside ROC_2d f n1 and ROC_2d g n1 for functions f and g, then it is also inside the intersection of the ROCs of the scaled versions of these functions. Similarly, Theorem 3 provides the scaling property with respect to the division by a complex number a.
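To make the definition of the 2D z-transform and the role of the ROC more concrete, the double summation can be approximated numerically by truncating both sums. This is a hedged sketch rather than the formal definition: the function z_transform_2d_truncated, the number of terms and the separable test array are assumptions introduced here. For this particular array, the double geometric series is known to converge when |z1| > 0.5 and |z2| > 0.25, so the closed form is only valid inside that region.

```python
def z_transform_2d_truncated(f, z1, z2, n_terms=60):
    """Approximate the unilateral 2D z-transform by a finite double sum.

    Mirrors the nesting of the formal definition: an outer sum over n1
    of an inner sum over n2 of f(n1, n2) / (z1**n1 * z2**n2).
    """
    if z1 == 0 or z2 == 0:
        # The formal ROC also excludes z1 = 0 and z2 = 0.
        raise ValueError("z1 and z2 must be nonzero")
    outer = 0j
    for n1 in range(n_terms):
        inner = 0j
        for n2 in range(n_terms):
            inner += f(n1, n2) / (z1 ** n1 * z2 ** n2)
        outer += inner
    return outer


def f(n1, n2):
    # Hypothetical first-quadrant test array: 0.5^n1 * 0.25^n2.
    return (0.5 ** n1) * (0.25 ** n2)


z1, z2 = 2 + 1j, 1 - 1j  # a point with |z1| > 0.5 and |z2| > 0.25
approx = z_transform_2d_truncated(f, z1, z2)
closed_form = 1 / ((1 - 0.5 / z1) * (1 - 0.25 / z2))
```

Outside the region of convergence the truncated sum keeps growing with n_terms, which is the numerical counterpart of the summability conditions in Definitions 9 and 10.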

Formal Verification of the Classical Properties of the 2D z-transform
We use Definitions 7 and 8 and Theorems 2 and 3 for verifying some of the classical properties of the 2D z-transform in HOL Light. This verification plays a vital role in reducing the effort required for analyzing image processing systems, as described later in Sects. 5.3 and 5.4.

Linearity of the 2D z-Transform
The linearity of the 2D z-transform is mainly used in decomposing complex (larger) systems into subsystems or combining smaller systems into larger ones having different scaling inputs. It can be mathematically expressed as follows: If Z[f(n1,n2)]=F(z1,z2) and Z[g(n1,n2)]=G(z1,z2), then the following holds:

$$Z[a\, f(n_1, n_2) \pm b\, g(n_1, n_2)] = a\, F(z_1, z_2) \pm b\, G(z_1, z_2) \qquad (3)$$

The 2D z-transform of a linear combination of 2D sequences (or arrays) is equal to the linear combination of the 2D z-transforms of the individual arrays. We verify the linearity property in HOL Light as:

Theorem 4 Linearity of the 2D z-Transform
⊢ ∀ f g z1 z2 a b n1.
  [A1]: (z1, z2) IN ROC_2d f n1 ∧
  [A2]: (z1, z2) IN ROC_2d g n1
  ⇒ z_transform_2d (λ n1 n2. a ∗ f n1 n2 ± b ∗ g n1 n2) z1 z2 = a ∗ z_transform_2d f z1 z2 ± b ∗ z_transform_2d g z1 z2

where a: ℂ and b: ℂ are arbitrary complex constants. Assumptions A1 and A2 capture the regions of convergence of functions f and g, respectively. The proof of the above theorem is mainly based on Theorem 2 and the linearity of the infinite summation along with some complex arithmetic reasoning.
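The linearity property can be spot-checked numerically on truncated sums. The sketch below is illustrative only: zt2 is a hypothetical truncated transform, and the arrays, constants and evaluation point are arbitrary assumptions chosen so that the sums converge. A numerical check at one point does not replace the formal proof, which covers all functions and all points of the ROC.

```python
def zt2(f, z1, z2, n_terms=40):
    """Truncated unilateral 2D z-transform (finite double sum)."""
    return sum(f(n1, n2) / (z1 ** n1 * z2 ** n2)
               for n1 in range(n_terms) for n2 in range(n_terms))


def f(n1, n2):
    # Hypothetical decaying test array.
    return 0.5 ** (n1 + n2)


def g(n1, n2):
    # A second hypothetical decaying test array.
    return (n1 + 1) * 0.3 ** n1 * 0.2 ** n2


a, b = 2 - 1j, 0.5 + 3j            # arbitrary complex constants
z1, z2 = 1.5 + 0.5j, -2 + 1j       # a point where both sums converge

# Transform of the linear combination ...
lhs = zt2(lambda n1, n2: a * f(n1, n2) + b * g(n1, n2), z1, z2)
# ... against the linear combination of the transforms.
rhs = a * zt2(f, z1, z2) + b * zt2(g, z1, z2)
```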

Shifting Property of the 2D z-Transform
The shifting property of the 2D z-transform is mostly used for analyzing 2D linear shift-invariant (LSI) systems. In particular, it is used to solve the difference equations capturing the dynamics of these systems. The shifting property expresses the transform of the shifted signal f(n1−m1,n2−m2) in terms of its 2D z-transform F(z1,z2).

If Z[f(n1,n2)]=F(z1,z2) and assuming f(−n1,n2)=0, f(n1,−n2)=0 and f(−n1,−n2)=0, ∀n1=1,2,…,m1 and ∀n2=1,2,…,m2, i.e., f(n1,n2) is nonzero in the first quadrant only, then the shifting of a 2D array is mathematically expressed as follows:

$$Z[f(n_1 - m_1, n_2 - m_2)] = z_1^{-m_1} z_2^{-m_2}\, F(z_1, z_2) \qquad (4)$$

We formally verify the above property in HOL Light as:

Theorem 5 Shifting in Time Domain
⊢ ∀ f z1 z2 m1 m2 n1.
  [A1]: (z1, z2) IN ROC_2d f n1 ∧
  [A2]: in_fst_quad_2d f
  ⇒ z_transform_2d (λ n1 n2. f (n1 - m1) (n2 - m2)) z1 z2 = z_transform_2d f z1 z2 / (z1 pow m1 ∗ z2 pow m2)

where the function in_fst_quad_2d ensures that the function f is nonzero in the first quadrant only and is formalized in a relational form, i.e., f (n1 - m1, n2 - m2), ∀ m1 m2. m1 < n1, m2 < n2. The verification of Theorem 5 is mainly based on the properties of complex numbers along with two properties regarding the negative offset of series and infinite summation. More details about the proof process of this theorem can be found in our proof script.

Scaling in (z1,z2)-domain Property of the 2D z-Transform
The scaling property of the 2D z-transform results in shrinking or expansion of the (z1,z2)-domain, i.e., the 4D complex (z1,z2)-plane. If Z[f(n1,n2)]=F(z1,z2), then two different types of scaling are defined as:

$$Z[h_1^{n_1} h_2^{n_2} f(n_1, n_2)] = F(h_1^{-1} z_1, h_2^{-1} z_2) \qquad (5)$$

$$Z[w_1^{-n_1} w_2^{-n_2} f(n_1, n_2)] = F(w_1 z_1, w_2 z_2) \qquad (6)$$

If h1 and h2 are positive real numbers, then the scaling is interpreted as expansion of the 4D complex (z1,z2)-plane. On the other hand, multiplication by $w_1^{-n_1} w_2^{-n_2}$ [Eq. (6)] shrinks the (z1,z2)-domain.

We verify the above properties in HOL Light as:

Theorem 6 Scaling in (z1,z2)-Domain (Positive/Expansion)
⊢ ∀ f z1 z2 n1 h1 h2.
  [A1]: (inv h1 ∗ z1, inv h2 ∗ z2) IN ROC_2d f n1 ∧
  [A2]: (z1, z2) IN ROC_2d f n1
  ⇒ z_transform_2d (λ n1 n2. h1 pow n1 ∗ h2 pow n2 ∗ f n1 n2) z1 z2 = z_transform_2d f (inv h1 ∗ z1) (inv h2 ∗ z2)

Theorem 7 Scaling in (z1,z2)-Domain (Negative/Shrinking)
⊢ ∀ f z1 z2 n1 w1 w2.
  [A1]: (w1 ∗ z1, w2 ∗ z2) IN ROC_2d f n1 ∧
  [A2]: (z1, z2) IN ROC_2d f n1
  ⇒ z_transform_2d (λ n1 n2. w1 pow (-n1) ∗ w2 pow (-n2) ∗ f n1 n2) z1 z2 = z_transform_2d f (w1 ∗ z1) (w2 ∗ z2)
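The shifting property (Theorem 5) and the expansion form of the scaling property (Theorem 6) can likewise be spot-checked on truncated sums. In this sketch, zt2, the test array, the shift amounts and the scaling factors are all hypothetical choices; the shifted array is set to zero outside the first quadrant, which is the numerical counterpart of the first-quadrant assumption. The shifting check is exact only up to a negligible truncation tail.

```python
def zt2(f, z1, z2, n_terms=60):
    """Truncated unilateral 2D z-transform (finite double sum)."""
    return sum(f(n1, n2) / (z1 ** n1 * z2 ** n2)
               for n1 in range(n_terms) for n2 in range(n_terms))


def f(n1, n2):
    # Hypothetical first-quadrant test array.
    return 0.5 ** n1 * 0.4 ** n2


z1, z2 = 1.5 + 0.5j, -2 + 1j

# Shifting: delay f by (m1, m2), with zeros outside the first quadrant.
m1, m2 = 2, 3


def shifted(n1, n2):
    return f(n1 - m1, n2 - m2) if n1 >= m1 and n2 >= m2 else 0.0


shift_lhs = zt2(shifted, z1, z2)
shift_rhs = zt2(f, z1, z2) / (z1 ** m1 * z2 ** m2)

# Scaling (expansion form): multiply f by h1^n1 * h2^n2.
h1, h2 = 1.2, 0.8 + 0.1j
scale_lhs = zt2(lambda n1, n2: h1 ** n1 * h2 ** n2 * f(n1, n2), z1, z2)
scale_rhs = zt2(f, z1 / h1, z2 / h2)
```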

Complex Conjugation Property of the 2D z-Transform
The complex conjugation property facilitates an easy manipulation of the 2D z-transform of conjugated arrays. It is mathematically expressed as follows:

$$Z[f^{*}(n_1, n_2)] = F^{*}(z_1^{*}, z_2^{*}) \qquad (7)$$

where f∗(n1,n2) represents the complex conjugate of an array f(n1,n2). The corresponding formalization of the complex conjugation property in HOL Light is given as follows:

Theorem 8 Complex Conjugation
⊢ ∀ f z1 z2 n1.
  [A]: (cnj z1, cnj z2) IN ROC_2d f n1
  ⇒ z_transform_2d (λ n1 n2. cnj (f n1 n2)) z1 z2 = cnj (z_transform_2d f (cnj z1) (cnj z2))
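A short numerical illustration of the conjugation property follows. As before, zt2 and the complex-valued test array are assumptions made for this sketch, and the check is performed at a single, arbitrarily chosen point.

```python
def zt2(f, z1, z2, n_terms=40):
    """Truncated unilateral 2D z-transform (finite double sum)."""
    return sum(f(n1, n2) / (z1 ** n1 * z2 ** n2)
               for n1 in range(n_terms) for n2 in range(n_terms))


def f(n1, n2):
    # Hypothetical complex-valued, geometrically decaying test array.
    return (0.5 + 0.2j) ** n1 * (0.3j) ** n2


z1, z2 = 1.5 + 0.5j, -2 + 1j

# Transform of the conjugated array at (z1, z2) ...
lhs = zt2(lambda n1, n2: f(n1, n2).conjugate(), z1, z2)
# ... against the conjugate of the transform at the conjugated point.
rhs = zt2(f, z1.conjugate(), z2.conjugate()).conjugate()
```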

Formal Verification of a (L1,L2)-order 2D Infinite Impulse Response (IIR) Image Processing Filter
2D digital filters [1] are integral components of image processing systems. Their main responsibilities include the decomposition of an image into multiple frequency bands, restricting a 2D array/signal to a certain frequency band and providing the input–output relationship of these systems. For example, a low-pass filter allows a range of frequencies less than a certain threshold [2]. The analysis of an image processing filter mainly involves developing its mathematical model using a 2D difference equation. The next step is to apply the 2D z-transform on both sides of the difference equation. Finally, the definition and the classical properties of the 2D z-transform are used to perform transfer function-based analysis of the given filter.

The impulse response of a discrete-time system captures its behavior for the scenario when the Dirac delta function is acting as an input array [2]. 2D image processing infinite impulse response (IIR) filters have a nonzero impulse response function over an infinite length of time. For these filters, the present output depends on the present input and all previously computed input and output values. Mathematically, the 2D image processing filters are described using the following difference equation [16]:

(8) where α(l1,l2) and β(k1,k2) are input and output coefficients, respectively. The output array y(n1,n2) is a linear combination of the previous K1−1 and K2−1 output samples, the present input x(n1,n2), and L1−1 and L2−1 previous input samples. Moreover, for the shift-invariant filter, α(l1,l2) and β(k1,k2) are complex constants. Therefore, Eq. (8) is known as a linear constant coefficient difference equation (LCCDE). The 2D z-transform of an (L1,L2)th difference represented in the form of f(n1,n2) is given as: (9) The corresponding transfer function of the 2D IIR filter is mathematically expressed as [16]:

(10) To formally verify the transfer function of the 2D filter [Eq. (10)], we formalize the (L1,L2)th difference as follows:

Definition 11 Formalization of the (L1,L2)th Difference
⊢def ∀ f c L1 L2 n1 n2. l1l2th_difference f c L1 L2 n1 n2 = vecsum (0..L1) (λ l1. vecsum (0..L2) (λ l2. c l1 l2 ∗ f (n1 - l1) (n2 - l2)))

The function l1l2th_difference accepts a function f: ℕ → ℕ → ℂ, coefficients of the difference equation c l1 l2, the order (L1, L2) of the 2D difference equation and the variables n1 and n2 and returns the (L1,L2)th difference. It uses the function vecsum twice to capture the double summation.

Next, we formalize a general LCCDE [Eq. (8)] as follows:

Definition 12 Formalization of the LCCDE
⊢def ∀ y x M1 M2 N1 N2 n1 n2 a b. LCCDE x y a b M1 M2 N1 N2 n1 n2 ⇔ y (n1, n2) = l1l2th_difference y a M1 M2 n1 n2 - l1l2th_difference x b N1 N2 n1 n2

Next, we verify the 2D z-transform of the (L1,L2)th difference [Eq. (9)] as:

Theorem 9 The 2D z-Transform of the (L1,L2)th Difference
⊢ ∀ f c L1 L2 z1 z2 n1.
  [A1]: (z1, z2) IN ROC_2d f n1 ∧
  [A2]: in_fst_quad_2d f
  ⇒ z_transform_2d (λ n1 n2. l1l2th_difference f c L1 L2 n1 n2) z1 z2 = z_transform_2d f z1 z2 ∗ vecsum (0..L1) (λ l2. vecsum (0..L2) (λ l1. z1 cpow - Cx (&l1) ∗ z2 cpow - Cx (&l2) ∗ c l1 l2))

where Assumption A1 ensures that (z1,z2) is in the region of convergence of the function f. Assumption A2 implies that the function f is in the first quadrant. Finally, the conclusion provides the 2D z-transform of the (L1,L2)th difference. The verification of the above theorem is mainly based on induction on L1 and L2 and Theorems 2 and 4 along with the following lemma about the summability of the (L1,L2)th difference.

Lemma 1 Summability of the (L1,L2)th Difference
⊢ ∀ f c L1 L2 z1 z2 n1.
  [A1]: (z1, z2) IN ROC_2d f n1 ∧
  [A2]: in_fst_quad_2d f
  ⇒ (z1, z2) IN ROC_2d (λ n1 n2. l1l2th_difference f c L1 L2 n1 n2) n1

To verify the transfer function of the 2D filter [Eq. (10)], we have to ensure that the 2D input and output arrays exist in the first quadrant only. Moreover, the denominator of Eq. (10) should be nonzero. We formalize both these requirements in HOL Light as follows:

Definition 13 First Quadrant Input and Output 2D Arrays for LCCDE
⊢def in_fst_quad_2d_lccde x y ⇔ in_fst_quad_2d x ∧ in_fst_quad_2d y

Definition 14 ROC LCCDE
⊢def ∀ x y K1 K2 lst n1. ROC_2d_LCCDE x y K1 K2 lst n1 = (ROC_2d x n1) INTER (ROC_2d y n1) DIFF {(z1, z2) | vecsum (0..K1) (λ k2. vecsum (0..K2) (λ k1. z1 cpow - Cx (&k1) ∗ z2 cpow - Cx (&k2) ∗ EL k1 lst)) = Cx (&0)} DIFF {(z1, z2) | z_transform_2d x z1 z2 = Cx (&0)}

where the function in_fst_quad_2d_lccde (Definition 13) accepts the input and output 2D arrays x and y and asserts the first quadrant condition for both arrays. Similarly, ROC_2d_LCCDE (Definition 14) provides the ROC of the input and output 2D arrays. It uses the HOL Light function DIFF to exclude all values of the denominator, where the transfer function of the 2D IIR filter becomes undefined. Now, we provide the formal verification of the transfer function of a 2D IIR filter in HOL Light as follows:

Theorem 10 Transfer Function of a 2D IIR Filter
⊢ ∀ x y a b L1 L2 K1 K2 z1 z2 n1.
  [A1]: (z1, z2) IN ROC_2d_LCCDE x y K1 K2 blst n1 ∧
  [A2]: in_fst_quad_2d_lccde x y ∧
  [A3]: (∀ n1 n2. LCCDE x y a b L1 L2 K1 K2 n1 n2)
  ⇒ z_transform_2d y z1 z2 / z_transform_2d x z1 z2 = vecsum (0..K1) (λ k2. vecsum (0..K2) (λ k1. z1 cpow - Cx (&k1) ∗ z2 cpow - Cx (&k2) ∗ a k1 k2)) / vecsum (0..L1) (λ l2. vecsum (0..L2) (λ l1. z1 cpow - Cx (&l1) ∗ z2 cpow - Cx (&l2) ∗ b l1 l2))

Assumption A1 provides the ROC for the LCCDE. Assumption A2 ensures that the input and output 2D arrays are in the first quadrant. Assumption A3 captures the time-domain model of the 2D IIR filter, i.e., the LCCDE [Eq. (8)]. Finally, the conclusion presents the transfer function of the 2D IIR filter. The proof process of the above theorem is based on the linearity and shifting properties of the 2D z-transform (Theorems 4 and 5) and the summability of the (L1,L2)th difference (Lemma 1) along with some complex arithmetic reasoning. Theorem 10 provides the transfer function of a generic 2D IIR image processing filter and is quite useful in the verification of the second-order 2D medical image processing filter described in Sect. 5.4.
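The relationship between a difference equation and its transfer function can be illustrated numerically by computing an impulse response recursively and comparing its truncated 2D z-transform with a ratio of two finite sums. The sketch below is not the formalized LCCDE: it assumes one common textbook convention, in which the output coefficient at (0,0) normalizes the recursion, the remaining output terms are subtracted, and the transfer function is the input-coefficient sum divided by the output-coefficient sum. The dictionaries alpha and beta, both function names and the coefficient values are hypothetical.

```python
def impulse_response(alpha, beta, size):
    """Recursively compute h(n1, n2) on a size-by-size first-quadrant grid.

    Assumed recursion (with x a unit impulse at the origin):
      beta[0,0] * y(n1,n2) = sum alpha[l] * x(n - l)
                             - sum_{k != (0,0)} beta[k] * y(n - k)
    """
    h = [[0j] * size for _ in range(size)]
    for n1 in range(size):
        for n2 in range(size):
            acc = 0j
            for (l1, l2), a in alpha.items():
                if (n1, n2) == (l1, l2):  # impulse hits only at n = l
                    acc += a
            for (k1, k2), b in beta.items():
                if (k1, k2) != (0, 0) and n1 >= k1 and n2 >= k2:
                    acc -= b * h[n1 - k1][n2 - k2]
            h[n1][n2] = acc / beta[(0, 0)]
    return h


def transfer_function(alpha, beta, z1, z2):
    """Ratio of the two finite sums in negative powers of z1 and z2."""
    num = sum(a * z1 ** (-l1) * z2 ** (-l2) for (l1, l2), a in alpha.items())
    den = sum(b * z1 ** (-k1) * z2 ** (-k2) for (k1, k2), b in beta.items())
    return num / den


# Hypothetical first-order example:
# y(n1,n2) = x(n1,n2) + 0.5*y(n1-1,n2) + 0.25*y(n1,n2-1)
alpha = {(0, 0): 1.0}
beta = {(0, 0): 1.0, (1, 0): -0.5, (0, 1): -0.25}

size = 40
h = impulse_response(alpha, beta, size)
z1, z2 = 2 + 0j, 2j  # chosen so that the impulse response series converges
series = sum(h[n1][n2] / (z1 ** n1 * z2 ** n2)
             for n1 in range(size) for n2 in range(size))
H = transfer_function(alpha, beta, z1, z2)
```

The point (z1, z2) is chosen where the denominator is nonzero and the series converges, which loosely mirrors the role of the set differences in the ROC definition for the LCCDE.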

Formal Verification of a Second-order 2D Image Processing Filter
To illustrate the practical utilization and effectiveness of the proposed formalization of the 2D z-transform, we apply it to formally analyze a second-order image processing filter that is widely used for performing various tasks, such as noise removal [1], image smoothing [2] and quality enhancement [5]. A second-order image processing filter is graphically represented by the flow graph shown in Fig. 2. A flow graph is a collection of branches (directed connections) and nodes (input and output 2D arrays), where nodes are connected using branches. The constants in Fig. 2 represent the gain of each branch, whereas z1^−1 and z2^−1 represent the shift right (horizontal delay) and shift up (vertical delay) operations, respectively. We can mathematically describe this filter using the following linear difference equation.

Figure 2: Flow graph of a second-order image processing filter.

(11)


Alternatively, Eq. (11) can be represented as:

(12) The transfer function corresponding to the difference equation-based model [Eq. (11)] is given as:

(13) Alternatively, the above equation can be represented as:

(14) To verify the transfer function expressed in Eq. (13), we need to formalize the difference equation-based model of the filter [Eq. (11)], which is given in HOL Light as:

Definition 15 Difference Equation-Based Model of the Second-Order Filter
⊢def ∀ y x n1 n2 a b. second_order_filter x y a b n1 n2 ⇔ y (n1, n2) = l1l2th_difference y a 2 2 n1 n2 - l1l2th_difference x b 0 0 n1 n2

where a and b are the coefficients of input and output 2D arrays. The function second_order_filter accepts input and output 2D arrays, their coefficients a and b and returns the linear difference equation describing the second-order image processing filter. Now, we formally verify the transfer function [Eq. (13)] in HOL Light as follows:

Theorem 11 Transfer Function of a Second-Order Filter

Assumption A1 provides the ROC for the difference equation-based model of the second-order filter. Assumption A2 ensures that the input and output 2D arrays x and y are in the first quadrant. Assumption A3 asserts that the input and output coefficients are constant. Assumptions A4 and A5 ensure that the complex variables z1 and z2 are nonzero. Assumption A6 captures the time-domain model of the second-order filter, i.e., Eq. (11). Finally, the conclusion presents the transfer function of the second-order filter. The verification of the above theorem is mainly based on Theorem 10 along with some complex arithmetic reasoning. Theorem 11 is the formal verification result of the second-order image processing filter based on our formalization of the 2D z-transform described in Sects. 5.1 and 5.2.

Now, a specialized case of a second-order image processing filter is graphically represented by the flow graph shown in Fig. 3. This filter can be mathematically represented by setting the values of the gains of each branch in Eq. (11), as follows.

Figure 3: Flow graph of a specialized second-order image processing filter.


(15) The transfer function corresponding to the difference equation-based model [Eq. (15)] is described as: (16) We formally verify the transfer function [Eq. (16)] as:

Theorem 12 Transfer Function of a Specialized Second-Order Filter

Assumption A1 captures the ROC for the difference equation-based model of the specialized second-order filter. Assumption A2 asserts the first quadrant conditions on the input and output 2D arrays x and y. Assumptions A3 and A4 ensure that the complex variables z1 and z2 are nonzero. Assumption A5 presents the time-domain model of the specialized second-order filter, i.e., Eq. (15). Finally, the conclusion captures the transfer function of the specialized second-order filter. The verification of the above theorem is done almost automatically using Theorem 11, which illustrates the effectiveness of our proposed approach.

Figure 4: Root map for the specialized second-order filter.

Next, we implement the transfer function of the specialized second-order filter, verified as Theorem 12, in Python. In particular, we plot the poles of the transfer function, i.e., the roots of the characteristic equation [denominator of Eq. (16)], on the complex plane. Figure 4 provides the root map capturing the poles of the transfer function, and their placement with respect to the unit circle in the complex plane can be used for analyzing the 2D stability of the corresponding system. In the case of the specialized second-order filter (Fig. 4), the presence of poles inside the unit circle provides a sufficient condition for the stability of the corresponding system. However, in the case of poles outside the unit circle, the corresponding system will be unstable. Similarly, the one-dimensional (1D) stability can be analyzed by evaluating the characteristic equation for all z1 with z2=1 and observing the placement of the poles in the complex z1 plane.
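The 1D check described above, fixing z2 = 1 and locating the poles in the z1 plane, can be sketched as follows. This is not the authors' Python script, and the coefficients are not those of Eq. (16); beta is a hypothetical set of second-order denominator coefficients, and the denominator is assumed to have the form of a double sum of beta(k1,k2) z1^(-k1) z2^(-k2). With z2 = 1, multiplying the denominator by z1^2 yields a quadratic in z1, which is solved with the quadratic formula.

```python
import cmath


def z1_poles_at_z2_equal_1(beta):
    """Roots in z1 of the assumed denominator with z2 fixed to 1.

    beta maps (k1, k2) with 0 <= k1 <= 2 to a coefficient. Setting
    z2 = 1 collapses the k2 index, giving c0 + c1/z1 + c2/z1**2 = 0,
    i.e. c0*z1**2 + c1*z1 + c2 = 0 for nonzero z1.
    """
    c = [0j, 0j, 0j]
    for (k1, _k2), value in beta.items():
        if not 0 <= k1 <= 2:
            raise ValueError("this sketch handles second order in z1 only")
        c[k1] += value  # z2**(-k2) equals 1 when z2 = 1
    if c[0] == 0:
        raise ValueError("leading coefficient must be nonzero")
    disc = cmath.sqrt(c[1] ** 2 - 4 * c[0] * c[2])
    return [(-c[1] + disc) / (2 * c[0]), (-c[1] - disc) / (2 * c[0])]


def inside_unit_circle(poles):
    """True when every pole has magnitude strictly less than one."""
    return all(abs(p) < 1 for p in poles)


# Hypothetical coefficients; with z2 = 1 they give z1**2 - 0.9*z1 + 0.2 = 0.
beta = {(0, 0): 1.0, (1, 0): -0.5, (1, 1): -0.4, (2, 0): 0.1, (2, 2): 0.1}
poles = z1_poles_at_z2_equal_1(beta)
stable_slice = inside_unit_circle(poles)
```

For these coefficients the two poles lie at 0.4 and 0.5, both inside the unit circle. The check covers only the z2 = 1 slice, so it says nothing by itself about other values of z2.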

DISCUSSIONS
The distinguishing feature of our proposed framework, as compared to the traditional analysis techniques, is that all verified theorems are of a generic nature, i.e., all of the functions and variables involved in these theorems are universally quantified and thus can be specialized based on the requirements of the analysis of an image processing filter of any order. For example, Theorem 10 provides the verification of the transfer function of a generic (L1,L2)-order 2D IIR image processing filter, and it can be directly used for analyzing an image processing filter of any order, such as the second-order filter (Theorems 11 and 12). We only need to specialize the gains in Eqs. (8), (9) and (10) of an image processing filter based on a particular scenario, whereas, in the case of computer-based simulations, we need to model each filter individually based on its corresponding order, which can add a lot of complexity for higher-order filters. Thus, the generic nature of the formalized theorems in our proposed approach makes it better than the traditional analysis methods.

Another advantage of our proposed approach is the inherent soundness of the theorem proving technique. It ensures that all the required assumptions are explicitly present along with the theorem. These assumptions are often ignored in the conventional simulation-based analysis, and their absence may affect the accuracy of the corresponding analysis. For example, for a given system (second-order image processing filter), if we do not incorporate the constraints captured as Assumptions A3, A4 and A5 of Theorem 11 and Assumptions A3 and A4 of Theorem 12, it may lead to some undesired results; for example, it may result in a transfer function that characterizes a stable system as an unstable one.

One of the main limitations of the proposed approach is the significant user involvement in the proposed formalization of the z-transform, due to the undecidable nature of higher-order logic. However, we have developed simplifiers, such as ROC_SIMP_TAC, DIFF_EQ_SIMP_TAC and TRANS_FUN_TAC, that significantly reduce the user guidance in the reasoning process. More details of the proof process can be viewed in our proof script.

CONCLUSIONS 2D image processing systems include processing of the images, such as image filtering, enhancement, compression and restoration. These systems are typically analyzed using the 2D z-transform. This paper proposed a HOL theorem proving-based framework for formally analyzing 2D image processing filters. In particular, we formalized the 2D z-transform and formally verified its various classical properties, such as linearity, shifting in time, scaling in (z1,z2)-domain and complex conjugation. Moreover, we formally analyzed a generic 2D IIR image processing filter. Finally, to demonstrate the practical utilization and effectiveness of the proposed framework, we presented the formal analysis of a second-order image processing filter.


In the future, we aim to formalize the 2D inverse z-transform [16], which will enable us to find the time-domain solutions of the difference equation-based models of the image processing systems. Another future direction is to formalize the 2D convolution [2], which can greatly simplify the reasoning about systems-of-systems [16].

ACKNOWLEDGEMENTS This work was supported and funded by Kuwait University, Research Project No. (EO 07/19).


REFERENCES
1. J.S. Lim, Two-Dimensional Signal and Image Processing (Prentice Hall, Englewood Cliffs, 1990)
2. J.W. Woods, Multidimensional Signal, Image, and Video Processing and Coding (Elsevier, Amsterdam, 2006)
3. R. Hussain, S. Zeadally, Autonomous cars: research results, issues, and future challenges. IEEE Commun. Surv. Tutor. 21(2), 1275–1313 (2018)
4. H. Blasinski, J. Farrell, T. Lian, Z. Liu, B. Wandell, Optimizing image acquisition systems for autonomous driving. Electron. Imaging 2018(5), 161–1 (2018)
5. C. Behrenbruch, S. Petroudi, S. Bond, J. Declerck, F. Leong, J. Brady, Image filtering techniques for medical image post-processing: an overview. Br. J. Radiol. 77(suppl–2), 126–132 (2004)
6. G. Hemalatha, C. Sumathi, Preprocessing techniques of facial image with median and Gabor filters, in Information Communication and Embedded Systems (IEEE, 2016), pp. 1–6
7. A.J. Durán, M. Pérez, J.L. Varona, The Misfortunes of a Mathematicians' Trio using Computer Algebra Systems: Can We Trust? CoRR. arXiv:1312.3270 (2013)
8. O. Hasan, S. Tahar, Formal Verification Methods. Encyclopedia of Information Science and Technology (IGI Global Pub, Hershey, 2015), pp. 7162–7170
9. J. Harrison, Handbook of Practical Logic and Automated Reasoning (Cambridge University Press, Cambridge, 2009)
10. M.J. Gordon, HOL: a proof generating system for higher-order logic, in VLSI Specification, Verification and Synthesis. SECS, vol. 35 (Springer, Berlin, 1988), pp. 73–128
11. J. Harrison, HOL light: a tutorial introduction, in Formal Methods in Computer-Aided Design. LNCS, vol. 1166 (Springer, 1996), pp. 265–269
12. J. Harrison, HOL light: a tutorial introduction, in Proceedings of the First International Conference on Formal Methods in Computer-Aided Design (FMCAD'96). Lecture Notes in Computer Science, vol. 1166, ed. by M. Srivas, A. Camilleri (Springer, Berlin, 1996), pp. 265–269
13. L. Paulson, ML for the Working Programmer (Cambridge University Press, Cambridge, 1996)
14. U. Siddique, M.Y. Mahmoud, S. Tahar, On the formalization of z-transform in HOL, in Interactive Theorem Proving (Springer, 2014), pp. 483–498
15. S.H. Taqdees, O. Hasan, Formalization of Laplace transform using the multivariable calculus theory of HOL light, in Logic for Programming Artificial Intelligence and Reasoning (Springer, 2013), pp. 744–758
16. D.E. Dudgeon, Multidimensional Digital Signal Processing (Prentice Hall, Englewood Cliffs, 1983)

Chapter 14

GRAN3SAT: CREATING FLEXIBLE HIGHER-ORDER LOGIC SATISFIABILITY IN THE DISCRETE HOPFIELD NEURAL NETWORK

Yuan Gao 1,2, Yueling Guo 1,3, Nurul Atiqah Romli 1, Mohd Shareduwan Mohd Kasihmuddin 1, Weixiang Chen 2, Mohd. Asyraf Mansor 4 and Ju Chen 1,2

1 School of Mathematical Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia
2 School of Medical Information Engineering, Chengdu University of Traditional Chinese Medicine, Chengdu 610000, China
3 School of Science, Hunan Institute of Technology, Hengyang 421002, China
4 School of Distance Education, Universiti Sains Malaysia, Penang 11800, Malaysia

ABSTRACT

One of the main problems in representing information in the form of nonsystematic logic is the lack of flexibility, which leads to potential overfitting. Although nonsystematic logic improves the representation of the conventional k Satisfiability, the formulations of the first, second, and third-order logical structures are very predictable. This paper proposed a novel higher-order logical structure, named G-Type Random k Satisfiability, by capitalizing on the new random feature of the first, second, and third-order clauses. The proposed logic was implemented into the Discrete Hopfield Neural Network as a symbolic logical rule. The proposed logic in Discrete Hopfield Neural Networks was evaluated using different parameter settings, such as different orders of clauses, different proportions between positive and negative literals, relaxation, and differing numbers of learning trials. Each evaluation utilized various performance metrics, such as learning error, testing error, weight error, energy analysis, and similarity analysis. In addition, the flexibility of the proposed logic was compared with current state-of-the-art logic rules. Based on the simulation, the proposed logic was reported to be more flexible, and produced higher solution diversity.

Keywords: G-Type Random k Satisfiability; artificial neural network; Hopfield Neural Network; flexibility; random dynamics

Citation: (APA): Gao, Y., Guo, Y., Romli, N. A., Kasihmuddin, M. S. M., Chen, W., Mansor, M. A., & Chen, J. (2022). GRAN3SAT: Creating Flexible Higher-Order Logic Satisfiability in the Discrete Hopfield Neural Network. Mathematics, 10(11), 1899. (28 pages).
Copyright: © Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

INTRODUCTION

Artificial Intelligence (AI) is a field of modelling intelligence that integrates technical science, theory development, mathematics, computer science, physics, and biology. AI has many applications [1,2,3,4,5], which include Artificial Neural Networks (ANN). The conventional ANN consists of interconnected neurons divided into input and output layers that are connected by synaptic weights. Generally, the input neurons receive information in the form of a problem or data, which is processed by the intermediate layer to generate the final neuron state that corresponds to the solution of the problem. This feature makes ANNs a suitable platform for solving, and improving the solution of, a given optimization problem. In 1982, Hopfield [6] proposed one of the earliest variants of ANN, namely the Hopfield Neural Network (HNN), which consists of a single-layer feedback neural network. In this discussion, we only consider the application of the Discrete Hopfield Neural Network (DHNN) in solving the optimization problem. The DHNN is a two-value nonlinear dynamic system that has multiple inputs, and the firing of the output is based solely on pre-defined threshold values. Structurally, a sufficient condition for the stability of the DHNN is that the weight coefficient matrix is symmetric and has zero diagonal elements. By capitalizing on the coefficient matrix of the DHNN, the solution capacity creates associative memory behavior that mimics actual human intelligence. The final neuron state of the DHNN can be interpreted
in terms of an energy function, where the absolute minimum energy signifies the optimal solution for a given optimization problem. Thus, the correctness of the final energy function in a DHNN is highly dependent on the value of the synaptic weight assigned by the network. One of the main weaknesses of the conventional DHNN is the convergence issue resulting from the lack of capacity as the number of neurons increases. This is because the conventional DHNN has no symbolic rule to govern the modeling of the Discrete Hopfield Network (DHN), which causes the network to keep re-estimating the synaptic weight until the optimal synaptic weight is reached. Without a proper symbolic rule, the DHNN keeps iterating until the final neuron state achieves an optimal state that corresponds to the global minimum energy. To remedy the situation, Abdullah [7] proposed a logical rule in ANN by mapping the connection of the neuron with a valid (or near valid) interpretation. This is the earliest effort to introduce an effective method to find the optimal synaptic weight that corresponds to the optimal final neuron state. This work is the origin of the term “Wan Abdullah method”, in which the synaptic weight is obtained by comparing the final energy with the cost function of the logic. This research direction was continued by Sathasivam [8], who proposed the Horn Satisfiability (HornSAT) logical rule in DHNNs. The proposed DHNN utilizes effective neuron relaxation to ensure that the final neuron state is not trapped in a suboptimal state. Note that this study was the first computer simulation of logic programming in DHNNs, and the result shows that logical rules can indeed be embedded into DHNNs. However, the impact of different logical rules in DHNNs is poorly understood because HornSAT has limited usability in terms of structure.
Thus, there is a need for a different structure of logical rules in which each clause is not limited to at most one positive literal. Kasihmuddin et al. [9] proposed the first systematic logical rule, namely 2 Satisfiability (2SAT), in DHNNs. The proposed logical rule has two literals per clause; the literals within a clause are joined by disjunction, and the clauses are joined by conjunction. This logical rule was embedded into the DHNN by comparing the cost function with the Lyapunov energy function. With the aid of a genetic algorithm, the learning of 2SAT in DHNNs can be carried out effectively. Mansor et al. [10] extended the order of the logical rule by proposing 3 Satisfiability (3SAT) in DHNNs. In this context, the cost function is compared with the third-order Lyapunov energy function to obtain the third-order synaptic weight. The proposed study managed to obtain an optimal value for the global minima ratio, despite the associative memory of the
DHNN with 3SAT increasing exponentially. This research was pivotal to the application of systematic SAT in ANNs. Alzaeemi et al. [11] proposed 2SAT in the Radial Basis Function Neural Network (RBFNN) by calculating the center and width that correspond to the output weight. The implementation of 2SAT in an RBFNN was reported to yield a small iteration error during learning. Note that the proposed study has been comprehensively compared with the state-of-the-art DHNN in [12], showing the compatibility of systematic logical rules with various types of ANNs. In another development, Kasihmuddin et al. [13] proposed the first non-satisfiability logical rule, namely Maximum k Satisfiability in DHNNs, by considering the nonzero cost function during the learning phase. The proposed research was shown to achieve an optimal global minima ratio with lower learning error. Systematic logic has also been applied to logic mining that classifies various real-life problems. Despite the successful implementation of systematic logic in DHNNs, systematic logic lacks variety in its clauses and produces less neuron variation during the retrieval phase of DHNNs. Thus, there is an urgent need for a logical rule that contains clauses of different orders to be embedded into a DHNN. A great diversity of nonsystematic logical rules has recently been proposed. Sathasivam et al. [14] proposed the first Random 2 Satisfiability (RAN2SAT) in DHNNs, where first-order and second-order clauses form the whole logical formulation. Interestingly, the result of the experiment shows that the first-order clause creates more logical inconsistency than the second-order clause. This implies that as the number of first-order clauses increases, the DHNN is unable to complete the learning phase, resulting in a suboptimal retrieval phase. This research was further extended by Karim et al. [15], where the higher-order RANkSAT was proposed by adding third-order logic.
This study offers an interesting insight because different variants of RAN3SAT, such as RAN1,3SAT and RAN2,3SAT, were proposed in DHNNs. In that paper, all the variants of RAN3SAT were compared with the systematic logical rule to obtain the optimal global minima ratio. This is an interesting result because we are able to validate that DHNNs can “behave” according to the nonsystematic logical rule. In order to optimize RAN3SAT in DHNNs, Bazuhair et al. [16] proposed the Election Algorithm to improve both the learning and retrieval phases. The proposed RAN3SAT is considered the best hybrid DHNN because the proposed method has low learning error, a high variation value, and a high global minima ratio. In another development, Alway et al. [17] contributed to the development of nonsystematic logical rules by proposing Major 2
Satisfiability (MAJ2SAT). The proposed logical rule capitalizes on a high proportion of second-order logic in comparison with other logical clauses. Based on the result reported in that paper, MAJ2SAT creates more variation in terms of logical rules, and has a very low similarity value compared to systematic logical rules. However, the existing nonsystematic logical rules do not take into account the random occurrence of the clauses that make up the final formulation. In this context, nonsystematic logic must have the ability to cover all the solution sets bounded by the higher-order logical clause. In this paper, we introduce G-Type Random k Satisfiability (GRANkSAT), which capitalizes on both higher-order systematic and nonsystematic logical rules in DHNNs. The higher-order systematic logical rule provides storage capacity to GRAN3SAT, whereas the higher-order nonsystematic logical rule provides a more diversified third-order logical connection. This is the first attempt to leverage both logical rules in a DHNN, which we believe can represent all sets of logical rules that have been previously proposed. The main contributions of this paper are as follows:

• We propose a novel logical rule, namely G-Type Random 3 Satisfiability, or GRAN3SAT, by randomly generating the first-order, second-order, and third-order satisfiability logical rules. By incorporating a third-order clause, the capacity of the proposed logic increases.
• We implement GRAN3SAT into a DHNN by minimizing the logical inconsistency of the logical rule that corresponds to the zero-cost function. The derived cost function that corresponds to GRAN3SAT will be used to compute the synaptic weight of the network.
• We conduct various extensive analyses to examine the behavior of the proposed GRAN3SAT. The final neuron state for various case studies will be evaluated based on different initial neuron states, parameter perturbation, different trial runs, and relaxation. Various performance metrics, such as learning error, synaptic weight error, energy profile, test error, and similarity metric, will be reported to justify the behavior of the proposed GRAN3SAT.
• We compare the proposed GRAN3SAT with state-of-the-art systematic and nonsystematic logical rules.

The organization of this paper is as follows. Section 2 provides an overview of the structure of the novel GRANkSAT. Section 3 explains the
implementation of GRANkSAT into a DHNN. The experimental setup and performance evaluation metrics used throughout the simulation are presented in Section 4. Section 5 discusses and analyzes the behavior and performance of DHNN-GRANkSAT under different parameters and phases, and compares it with several established logical structures. Finally, Section 6 presents the conclusions and future work.

G-TYPE RANDOM K SATISFIABILITY

GRANkSAT is a nonsystematic logical structure expressed in conjunctive normal form (CNF). GRANkSAT consists of a series of clauses with random literals, and the numbers of clauses and states of literals are randomly determined. In this case, GRAN3SAT mainly consists of k-SAT (k ≤ 3), where k-SAT is made up of a set of x literals and a set of y clauses. Each literal takes a value in {1, −1}, which represents TRUE or FALSE. The general structure of GRAN3SAT (PGRAN3SAT) is given as follows:

(a) A set of x literals: A1, A2, A3, …, Ax.

(b) A collection of clause numbers: U = {Nc1, Nc2, Nc3, …, Ncω}, whereby

(1)

(2)

(3)

where mi is the number of third-order clauses, ni is the number of second-order clauses, and ki is the number of first-order clauses.

(c) A random number j, where j ∈ [1, ω] and j ∈ N, which corresponds to the set of clauses Ncj.

(d) The third-order logic clause is as follows:

(4)

(e) The second-order logic clause is defined as:

(5)

(f) The first-order logic clause is stated as:

(6)

Thus, the general formulation for GRANkSAT, or PGRAN3SAT, based on the above features is as follows:

(7)

where mj > 0, nj ≥ 0, kj ≥ 0. Based on (4)–(6), the states of the literals are determined randomly, where Ai ∈ {Ai, ¬Ai}. Examples of PGRAN3SAT with different random structures are as follows:

(8)

(9)

According to Equations (8) and (9), the formula is satisfied, or PGRAN3SAT = 1, if all the clauses in the formulation are satisfied. Another interesting point about Equation (7) is the randomness in representing the clauses of PGRAN3SAT. In this case, the formulation is not limited to Equations (8) and (9); many other combinations are possible for a fixed total number of literals. Equation (7) differs from the RAN3SAT formulation proposed by Karim et al. [15]. In RAN3SAT, the proportion of the clauses in Equations (4)–(6) is pre-determined, although the state of each literal remains random. Thus, the random feature of that formulation does not extend to the proportion of the clauses. In this paper, we propose a higher-order logical rule, PGRAN3SAT, that includes a third-order clause (refer to Equation (4)), while the occurrence of each clause order remains random. In other words, PGRAN3SAT is expected to provide more logical flexibility in terms of clauses and literals. In this case, any information in a combinatorial problem (such as logic mining) will be represented randomly in the form of a one-dimensional to three-dimensional system. This feature helps practitioners represent any combinatorial problem in a more flexible formulation. Therefore, the proposed PGRAN3SAT is a breakthrough in modelling neurons in ANN.
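The structure described in this section can be illustrated with a short Python sketch. It is not the authors' generator: the function names `random_gran3sat` and `is_satisfied`, the signed-integer encoding of literals, and the rule that each variable is used exactly once are assumptions made purely for illustration. The sketch draws clause orders at random while keeping at least one third-order clause (in line with mj > 0), negates each literal with probability 0.5, and checks whether a bipolar assignment satisfies every clause.

```python
import random


def random_gran3sat(num_vars, rng):
    """Group variables 1..num_vars into clauses of random order 1, 2 or 3.

    Each clause is a list of signed integers: +i stands for A_i and -i for NOT A_i.
    Assumption (not from the source): every variable appears in exactly one clause.
    """
    if num_vars < 3:
        raise ValueError("at least 3 variables are needed for one third-order clause")
    variables = list(range(1, num_vars + 1))
    rng.shuffle(variables)

    orders = [3]                     # keep m_j > 0: one third-order clause is guaranteed
    remaining = num_vars - 3
    while remaining > 0:
        order = rng.randint(1, min(3, remaining))
        orders.append(order)
        remaining -= order
    rng.shuffle(orders)              # random position of each clause order

    clauses, pos = [], 0
    for order in orders:
        chunk = variables[pos:pos + order]
        pos += order
        # Random literal state: positive or negated with equal probability.
        clauses.append([v if rng.random() < 0.5 else -v for v in chunk])
    return clauses


def is_satisfied(clauses, state):
    """Return True if every clause holds; state maps variable index to +1 or -1."""
    return all(
        any(state[abs(lit)] == (1 if lit > 0 else -1) for lit in clause)
        for clause in clauses
    )
```

Calling `random_gran3sat(10, random.Random(0))` returns one random clause arrangement over ten variables; a different seed gives a different mix of first-, second-, and third-order clauses, which is the kind of structural randomness the section describes.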

GRAN3SAT IN THE DISCRETE HOPFIELD NEURAL NETWORK

The DHNN is a variant of ANNs that has no hidden layers [18]; it can be used to solve various optimization problems. A DHNN consists of bipolar neurons whose state is represented by {1, −1}. The conventional neuron update with a pre-defined threshold is as follows:

(10)

where Wijk is the synaptic weight between neuron i, neuron j, and neuron k, Si is the state of neuron i, and δi is the threshold. The synaptic weight deserves particular attention, as it refers to the degree of connection between multiple neurons. The synaptic weights are symmetric, Wij = Wji and Wijk = Wjik = Wjki = Wikj = Wkij = Wkji, and there is no self-feedback connection, Wii = Wjj = 0 and Wiii = Wjjj = Wkkk = 0. When a higher-order connection is added to Equation (10), any connection that involves the same neuron more than once has zero synaptic weight. PGRAN3SAT can be implemented into the DHNN (GRAN3SAT) by assigning a variable to each neuron. Collectively, the neurons are grouped randomly into clauses of the forms in Equations (4)–(6) until the total number of neurons is reached. The cost function χPGRAN3SAT for the implementation of PGRAN3SAT into the DHNN is as follows:

(11)

(12)

To fully implement the modelling of PGRAN3SAT in the DHNN, the cost function χPGRAN3SAT that is associated with PGRAN3SAT must be zero. In other words, the DHNN must find at least one interpretation that corresponds to a zero cost function. By finding at least one consistent interpretation, the optimal synaptic weight for GRAN3SAT via [19] can be found. However, if χPGRAN3SAT ≠ 0, PGRAN3SAT is not considered satisfied, which results in a nearly random synaptic weight. Since achieving χPGRAN3SAT = 0 is vital to ensure that the DHNN can retrieve the correct final neuron state, effective learning methods must be employed during the learning phase of the DHNN.
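The role of the zero cost function can be made concrete with a small sketch. Since Equations (11) and (12) are not reproduced here, the code below assumes the product-of-inconsistencies form commonly used in Wan Abdullah-style logic programming: each literal contributes (1 − S)/2 if it is positive and (1 + S)/2 if it is negated, and a clause's cost is the product of these factors. The names `clause_cost` and `cost_function` and the signed-integer clause encoding are hypothetical.

```python
def clause_cost(clause, state):
    """Cost of one clause: 1.0 if every literal is false, 0.0 otherwise.

    clause: list of signed integers (+i for A_i, -i for NOT A_i).
    state:  dict mapping variable index to a bipolar value, +1 or -1.
    Assumed form (not taken from the source equations): product over literals of
    (1 - S)/2 for a positive literal and (1 + S)/2 for a negated literal.
    """
    cost = 1.0
    for lit in clause:
        s = state[abs(lit)]
        cost *= 0.5 * (1 - s) if lit > 0 else 0.5 * (1 + s)
    return cost


def cost_function(clauses, state):
    """Total cost: the number of violated clauses; zero means a consistent interpretation."""
    return sum(clause_cost(clause, state) for clause in clauses)
```

Under this assumed form the cost simply counts violated clauses, so a learning method that drives it to zero has found an interpretation that satisfies every first-, second-, and third-order clause at once.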

The probability of finding a satisfying interpretation for PGRAN3SAT is as follows:

(13)

where θ is the probability value and the accompanying term is the probability of satisfying the k-order clause. During the retrieval phase, the DHNN iteratively updates the neurons from the initial state to the final state via the local field formula and the activation function. Equations (14) and (15) represent the local field formula and the neuron state update formula, respectively. Since the Hyperbolic Tangent Activation Function (HTAF) has non-linear properties, it is widely used as the activation function in artificial neural networks [20].

(14)

(15)

where Si and its updated counterpart represent the initial state and the updated state of neuron i, respectively. Wijk, Wij, and Wi represent the third-order, second-order, and first-order synaptic weights of the DHNN, respectively. The main motivation for using PGRAN3SAT in GRAN3SAT is to obtain more diverse final states that reflect various logical rules during the retrieval phase. For instance, by using Equation (15), the final neuron state is connected through the various types of clauses stated in Equations (4)–(6). Thus, the final neuron state can be evaluated using the Lyapunov energy function as follows:

(16)

(17)

Since each logical order provides a fixed energy value, we can obtain the absolute minimum energy of PGRAN3SAT by calculating it as shown in Equation (17). Note that we can obtain the optimal synaptic weight by comparing Equation (16) with Equation (11), as long as the learning phase of GRAN3SAT obtains at least one interpretation that corresponds to χPGRAN3SAT = 0. By iteratively updating the neuron state via Equations (14) and (15), the network always converges to the nearest local minimum solution. As a result of the random nature of the proposed PGRAN3SAT, both energy values in Equations (16) and (17) will fluctuate and have different proportions from those in the research of Karim et al. [15], where the absolute final energy can be pre-determined. Despite having a different clause arrangement from [15], the choice of literals in PGRAN3SAT is similar to that in [14] and [15], where random literals are the building blocks of the clauses in Equations (4)–(6). In order to distinguish the global minimum solution from the local minimum solution, the convergence property of the proposed GRAN3SAT must satisfy the following condition:

(18)

where Tol is a pre-determined tolerance value of GRAN3SAT. In this context, condition (18) determines whether the final neuron state exhibits the behavior that satisfies PGRAN3SAT.
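The retrieval-phase loop described above (local field, sign update through the hyperbolic tangent, energy, and tolerance check) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary storage keyed by sorted index tuples, the asynchronous sweep order, and the convention of counting each unordered connection once in the energy are all assumptions, so the coefficients may differ from those in Equations (14)–(17).

```python
import math
from itertools import combinations


def local_field(i, S, W3, W2, W1):
    """Third-order local field of neuron i.

    W3, W2, W1 are dicts keyed by sorted index tuples (i, j, k), (i, j) and by i.
    Missing keys count as zero weight, which also covers the no-self-connection rule.
    """
    others = [j for j in range(len(S)) if j != i]
    h = W1.get(i, 0.0)
    h += sum(W2.get(tuple(sorted((i, j))), 0.0) * S[j] for j in others)
    h += sum(
        W3.get(tuple(sorted((i, j, k))), 0.0) * S[j] * S[k]
        for j, k in combinations(others, 2)
    )
    return h


def update(S, W3, W2, W1):
    """One asynchronous sweep: S_i becomes +1 if tanh(h_i) >= 0, otherwise -1."""
    S = list(S)
    for i in range(len(S)):
        S[i] = 1 if math.tanh(local_field(i, S, W3, W2, W1)) >= 0 else -1
    return S


def energy(S, W3, W2, W1):
    """Lyapunov-style energy, each unordered connection counted once (assumed convention)."""
    e = -sum(w * S[i] * S[j] * S[k] for (i, j, k), w in W3.items())
    e -= sum(w * S[i] * S[j] for (i, j), w in W2.items())
    e -= sum(w * S[i] for i, w in W1.items())
    return e


def is_global_minimum(final_energy, min_energy, tol=0.001):
    """Tolerance check in the spirit of condition (18); the value of tol is illustrative."""
    return abs(final_energy - min_energy) <= tol
```

As a usage example, the single clause (A0 ∨ A1 ∨ A2) under the product-form cost gives, in this convention, `W3 = {(0, 1, 2): 0.125}`, a weight of −0.125 for each pair, and a weight of 0.125 for each single neuron. Starting from the violating state `[-1, -1, -1]`, one call to `update` moves the network to a state whose energy equals the minimum value of −0.125, so `is_global_minimum` accepts it.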

Figure 1 illustrates the schematic diagram of the implementation of PGRAN3SAT to the DHNN (GRAN3SAT). Generally, the schematic diagram can be divided into the learning phase and the retrieval phase. Before the learning phase, random clause arrangement for PGRAN3SAT was determined and was converted into Boolean algebra. After assigning each clause in PGRAN3SAT with a neuron, GRAN3SAT is required to assign the neuron state that satisfies the cost function in Equation (11). Thus, the optimal neuron assignment will help us compute the optimal synaptic weight that will be used in the retrieval phase.

Figure 1: Schematic diagram for GRAN3SAT.

EXPERIMENTAL SETUP

To further investigate the behavior of the proposed research, GRAN3SAT will be evaluated based on different parameters and learning settings. Four different simulations, covering different numbers of clauses, different proportions of literals (positive or negative), different learning trials, and different numbers of iterations, will be tested in this paper. Each part will undergo three assessments, which are the learning phase, retrieval phase, and similarity index. The details for each simulation are as follows:

• Different numbers of clauses. In this section, we evaluate and analyze the impact of different order logics on GRAN3SAT by using performance metrics at each phase, and determine the impact of parameter perturbation on GRAN3SAT.
• Different proportions of literals. In this section, we evaluate the impact of different proportions of literals (positive or negative) on GRAN3SAT by using performance metrics at each phase, and determine the impact of parameter perturbation on GRAN3SAT.
• Different learning trials. In this section, we evaluate the impact of different learning trials on GRAN3SAT using the performance metrics of each phase. This simulation provides a basis for efficiency improvement in the subsequent learning algorithm.
• Different numbers of iterations. In this section, we evaluate the impact of Sathasivam relaxation on GRAN3SAT via the performance metrics of each phase to obtain the optimal parameters.
• Flexibility analysis of the logic structure. In this section, we compare GRAN3SAT with several established logical rules in terms of flexibility of the logical rule.

All the experiments will be simulated using MATLAB 2016a on the 64-bit Windows 10 operating system. Table 1 shows the parameters involved in each experiment.

Table 1: Parameters for the proposed GRAN3SAT

Each simulation will be evaluated using six types of performance metrics. The metrics are based on process evaluation in the learning phase (learning error analysis), outcome evaluation in the learning phase (weight analysis), process evaluation in the retrieval phase (energy analysis), outcome evaluation in the retrieval phase (global solution analysis), similarity index evaluation (solution diversity analysis), and structural error metrics (logic flexibility analysis). Table 2, Table 3, Table 4 and Table 5 present a list of parameters involved in all performance evaluation metrics.

Table 2: List of parameters in the learning phase

Table 3: List of parameters in the retrieval phase

Table 4: Parameters involved in the similarity index

Table 5: Parameters involved in structure evaluation

RMSE (root mean square error), MAE (mean absolute error) [21], and MAPE (mean absolute percent error) [22] are statistical metrics that can be used as evaluation metrics for machine learning [23]. RMSE has been used as a standard statistical metric to measure the performance of models. In addition, MAE is one of the most direct measures of prediction error; the smaller the MAE value, the better the model. MAPE measures the accuracy of the proposed model as a percentage. Compared with MAE, RMSE is more sensitive to outliers because the errors are squared before averaging. In the learning phase, we measure the fitness of neuron states and examine the satisfied clauses that generate the optimal synaptic weights. Equations (19)–(21) will be used to measure the fitness of the neuron, whereas the error in the synaptic weight will be evaluated based on Equations (22) and (23). Table 2 shows the parameters used in synaptic weight analysis.

(19)

(20)

(21)

(22)

(23)

In the retrieval phase, the energy analysis is used to determine the efficiency of GRAN3SAT [24]. Equations (24) and (25) represent the formulation for RMSE and MAE of the neuron during the retrieval phase. The quality of GRAN3SAT solutions is evaluated using the RMSE, MAE, and ZM formulas for the test errors, which are defined in Equations (26)–(28). In addition, Table 3 describes the parameters used in the retrieval phase.
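For orientation, the three error statistics named above have standard textbook forms, sketched below in Python. These are the generic definitions, not the chapter's Equations (19)–(28), which apply them to neuron fitness, synaptic weights, and energy and may be normalized differently. The function names are hypothetical, and `mape` here returns a fraction rather than a percentage.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: squaring makes large errors weigh more."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

RMSE is never smaller than MAE on the same data, and the two coincide only when all errors have the same magnitude. This is why an MAE-to-RMSE ratio close to 1:1 can be read as an absence of outliers.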

(24)

(25)

(26)

(27)

(28)

The similarity index quantifies the relationship between the final state of the neuron and the ideal neuron state during the retrieval phase of GRAN3SAT. The definition of the ideal neuron state is as follows:

(29)

where A is the positive literal and ¬A is the negative literal existing in each clause of PGRAN3SAT. Note that Equation (29) will consider the final neuron state that achieves global minimum energy. In this case, the Jaccard index SJaccard [25] will be used to evaluate the quality of the final neuron state:

(30)

in which the ideal neuron state is compared with the current final neuron state. Note that a lower value of SJaccard is favored, since it shows higher diversity of the final neuron state. To evaluate the flexibility of the logic structure, this paper proposes Equations (31)–(34) to quantify the degree of change in the logic structure from the perspectives of the number of clauses and the literal state. Equations (31) and (32) represent the number of clauses that fluctuate during the learning phase of GRAN3SAT, and Equations (33) and (34) represent the error resulting from the number of negative literals. In addition, Table 5 presents the parameters involved in evaluating the flexibility of the logic structure.
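The Jaccard index mentioned above can be sketched for bipolar states as follows. Because Equation (30) and Table 4 are not reproduced here, the counting convention is an assumption: positions where both states are +1 count as agreement, positions where exactly one state is +1 count as disagreement, and positions where both are −1 are ignored. The function name `jaccard_similarity` is hypothetical.

```python
def jaccard_similarity(ideal, final):
    """Jaccard index between two bipolar (+1/-1) neuron states of equal length.

    Assumed convention: l = both +1, m = ideal +1 and final -1, n = ideal -1 and final +1;
    the index is l / (l + m + n), and positions where both are -1 are not counted.
    """
    l = sum(1 for a, b in zip(ideal, final) if a == 1 and b == 1)
    m = sum(1 for a, b in zip(ideal, final) if a == 1 and b == -1)
    n = sum(1 for a, b in zip(ideal, final) if a == -1 and b == 1)
    denominator = l + m + n
    return l / denominator if denominator else 0.0
```

With this convention, identical states that contain at least one +1 give an index of 1, and the index falls as the final state departs from the ideal state, which matches the reading that a lower SJaccard indicates more diverse final states.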

(31)

(32)

(33)

(34) Figure 2 shows the overall implementation of the proposed GRAN3SAT. Figure 2 can be divided into two parts. First, the implementation process represents the actual process of the proposed GRAN3SAT from the learning phase to the retrieval phase. Second, the process line describes the phases of the parameter influence with different performance evaluation metrics.

Figure 2: Implementation flowchart for the proposed GRAN3SAT.

RESULTS AND DISCUSSION

To evaluate the effectiveness of the proposed model, GRAN3SAT will be evaluated from four perspectives: different orders of clauses, different proportions between positive and negative literals, different numbers of iterations, and different numbers of learning trials. These perspectives will be evaluated based on various performance metrics. After finding the best setting from Section 5.1, Section 5.2, Section 5.3 and Section 5.4, the proposed GRAN3SAT will be compared with the existing models stated in Section 4.

The Effect of Different Types of Clauses

The purpose of this section is to analyze the influence of the numbers of first-order (ki), second-order (ni), and third-order (mi) clauses on the performance of the proposed GRAN3SAT. Since (ki, ni, mi) are randomly generated, the proportions of the clauses will be adjusted based on the proportions of third-order (α), second-order (β), and first-order (γ) clauses; the cases are shown in Table 6.

Table 6: Different cases for the GRAN3SAT model

Figure 3 demonstrates the performance of different GRAN3SAT models in terms of MAElearn, RMSElearn, MAPElearn, MAEweight, and RMSEweight during the learning phase of the DHNN. In order to assess the actual capability of GRAN3SAT with different proportions, the Exhaustive Search Algorithm (ES) was implemented with all models during the learning phase. This learning method was proposed by [26], where the algorithm relies on trial and error to achieve a minimized cost function χPGRAN3SAT = 0. This causes the values of MAElearn and RMSElearn for all GRAN3SAT models to increase as the number of neurons increases. Based on Figure 3a,b, the GRAN3SAT that has the highest proportion of α has the lowest values of MAElearn and RMSElearn. This shows that the third-order clause has the capability to reduce the learning error of the proposed GRAN3SAT. This pattern is supported by the high error for the GRAN3SAT models that have high values of β and γ. Another interesting perspective is that as the number of first-order clauses increased, the performance of GRAN3SAT during the learning phase deteriorated. This is due to the difficulty the ES has in finding a consistent interpretation that satisfies a GRAN3SAT with more first-order logic. Despite the increase in error as the number of neurons increases, the ratio of MAElearn to RMSElearn is close to 1:1. This indicates the absence of outliers that potentially influence neuron fitness [27]. Based on Figure 3c, when NN ≥ 15, the value of MAPElearn is relatively stable, reflecting the proportion of the number of unsatisfied clauses to the total number of clauses [24]. It is reported that Cases I, II, III, and IV will stabilize at around 0.33, 0.210, 0.280, and 0.410, respectively. As reported in Figure 3a,b, most of the clauses that are not satisfied are a result of high values of β and γ. This also confirms that a lower MAPElearn can be achieved by GRAN3SAT if more third-order logic is generated in the formulation.

Figure 3: Error performance for (a) MAElearn, (b) RMSElearn, (c) MAPElearn, (d) MAEweight, and (e) RMSEweight.

Figure 3d,e demonstrate the MAEweight and RMSEweight for all GRAN3SAT models. Note that this perspective applies only to the restricted learning phase, where the number of learning trials is pre-determined. Based on the result obtained, the values of MAEweight and RMSEweight were the lowest when a higher α was generated in the formula. This shows that the correct synaptic weight can be obtained when more third-order clauses are considered in the GRAN3SAT logical rules. The lowest performance of the GRAN3SAT model occurs when γ is the highest, since the DHNN requires more learning iterations to minimize the cost function. When MAEweight is minimized, the DHNN is able to retrieve the optimal final neuron state that corresponds to the behavior of the GRAN3SAT model. We now consider the capability of GRAN3SAT during the retrieval phase. Figure 4 demonstrates the performance of different GRAN3SAT models in terms of MAEenergy, RMSEenergy, MAEtest, RMSEtest, and ZMtest during the retrieval phase of the DHNN. The GRAN3SAT with the highest proportion of α has the lowest values of MAEenergy and RMSEenergy. This shows that the difference between the final energy and the absolute minimum energy is minimized as more third-order clauses are generated in GRAN3SAT. This is a result of the lower values of MAEweight and RMSEweight that lead to optimal final neuron states. It is also reported that higher MAEenergy and RMSEenergy were obtained from lower-order clauses than from third-order clauses. These findings are in good agreement with the research of [28], where the closer the energy is to the absolute minimum energy, the more stable the final neuron state becomes. It can be seen from Figure 4 that as the NN increases, the final state that corresponds to the global solution is difficult to obtain. Most of the solutions are trapped as local solutions, which results in higher values of MAEenergy and RMSEenergy. Thus, the values of MAEtest and RMSEtest will also increase (refer to Figure 4c,d). In order to fully understand the value of MAEtest obtained by GRAN3SAT, Table 7 reports the specific values of MAEenergy and RMSEenergy for MAEtest = 0.994, when all final states of the DHNN reach local minima solutions.


Figure 4: Error performance for (a) MAEenergy, (b) RMSEenergy, (c) MAEtest, (d) RMSEtest, and (e) ZMtest.

Table 7: The values of MAEenergy and RMSEenergy for different NN when MAEtest=0.994

According to Table 7, the final neuron state failed to achieve the global minimum solution, with 2.471≤MAEenergy≤3.949 for Case II to Case IV, whereas MAEenergy=6.444 and RMSEenergy=7.416 for Case I. GRAN3SAT also reported higher MAEenergy and RMSEenergy at the critical value since there are no final neuron states that achieve the global minimum solution. This shows the importance of higher-order clauses for retrieving the best behavior of GRAN3SAT. The ratio of MAEenergy to RMSEenergy is close to 1:1, indicating that there is no outlier in the energy distribution. Using linear fitting in MATLAB [29], the relationship between the average number of k-order logic clauses and the energy errors can be obtained (refer to Table 8). The average energy error of a single higher-order logical clause is found to be less than that of a lower-order logical clause. This shows that the behavior of GRAN3SAT can be portrayed by assigning higher values of α.

Table 8: Average individual k-order logic clause and energy error linear fit coefficients
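The outlier argument based on the ratio of MAEenergy to RMSEenergy can be illustrated with a minimal Python sketch. The function names and the sample energy gaps below are hypothetical and are not taken from Table 7. The sketch only shows that the ratio equals 1 when all deviations have the same magnitude and drops below 1 when a single large deviation dominates.

```python
import math

def mae(errors):
    """Mean absolute error of a list of deviations."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error of a list of deviations."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae_rmse_ratio(errors):
    """MAE divided by RMSE; always at most 1, and 1 only for equal magnitudes."""
    r = rmse(errors)
    return mae(errors) / r if r > 0 else 1.0

# Hypothetical gaps between final energy and minimum energy (illustration only).
uniform_gaps = [3.0, 3.0, 3.0, 3.0]   # no outlier
outlier_gaps = [1.0, 1.0, 1.0, 9.0]   # one large deviation

print(mae_rmse_ratio(uniform_gaps))
print(mae_rmse_ratio(outlier_gaps))
```

Because RMSE squares each deviation before averaging, it weights large deviations more heavily than MAE does, which is why a ratio near 1:1 is read as the absence of outliers.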

According to Figure 4e, GRAN3SAT with higher α was reported to produce higher ZMtest than the other proportions. The higher value of ZMtest results from the lower values of MAElearn and RMSElearn in finding the consistent interpretation during the learning phase. Thus, the optimal synaptic weights drive the final neuron state to converge to the global minimum solution. In this case, the local field in Equation (14) has a higher chance of satisfying the condition in Equation (18). On the contrary, a higher value of γ reduces the probability that the DHNN converges to the optimal final neuron state. This is because only one synaptic weight value contributes to the update of the neuron state. Thus, the final neuron state is likely to be trapped in a local minimum solution.

While all the previous metrics focus on the number of GRAN3SAT solutions, it is equally important to evaluate the quality of the neuron state. We conduct a similarity analysis to measure the similarity of the global solutions for each GRAN3SAT model. In this section, the SJaccard with second-order fitting [30] is mainly applied to evaluate the performance of each model. Figure 5 shows the relationship between the SJaccard with second-order fitting and the NN under different models. Note that the bar chart reflects the fluctuation of the SJaccard for the final neuron state, whereas the curve represents the overall trend of the SJaccard with the NN. As shown in Figure 5, the result for SJaccard can be explained in three intervals. At interval 6≤NN≤27, the SJaccard for the GRAN3SAT model is Case IV>Case III>Case I>Case II, where α has a larger global solution space. Next, at interval 27≤NN≤36, Case III was reported to outperform Case IV due to the higher number of lower-order clauses (β,γ). This results from the greater probability for this case to obtain random Wij and Wi during the learning phase of the DHNN. Finally, the value of Case I declines steadily at NN>36 compared with the other cases. In this case, the global solution of GRAN3SAT at NN>36 has stable and more diversified final neuron states. This can be explained by referring to Equation (14), where more synaptic weights were responsible for the neuron updates during the retrieval phase. However, the ineffectiveness of the ES during the learning phase of the DHNN creates more local minimum solutions, which reduce the number of final neuron states that are different from each other. This phenomenon is more obvious as the NN increases. Consequently, the choice of GRAN3SAT that capitalizes on higher-order clauses will increase the number of global solutions, which eventually reduces the value of SJaccard. This feature makes GRAN3SAT with higher α more advantageous compared to other models.
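As a rough illustration of the similarity analysis, the sketch below computes a Jaccard index between a benchmark state and a retrieved final state. It assumes a definition commonly used for bipolar (1 or −1) neuron states, namely the number of positions where both states are 1 divided by the number of positions where at least one state is 1. The chapter's own SJaccard formula is not restated in this passage and may differ, and the function name and example states are hypothetical.

```python
def jaccard_bipolar(benchmark, state):
    """Jaccard index between two bipolar (+1/-1) neuron states.

    Assumed definition: l / (l + m + n), where
      l = positions where both states are +1,
      m = positions where only the benchmark is +1,
      n = positions where only the retrieved state is +1.
    """
    if len(benchmark) != len(state):
        raise ValueError("states must have the same length")
    pairs = list(zip(benchmark, state))
    both_pos = sum(1 for a, b in pairs if a == 1 and b == 1)
    only_first = sum(1 for a, b in pairs if a == 1 and b == -1)
    only_second = sum(1 for a, b in pairs if a == -1 and b == 1)
    denom = both_pos + only_first + only_second
    return both_pos / denom if denom else 0.0

# Hypothetical benchmark and retrieved final neuron states.
benchmark = [1, 1, 1, -1, -1, 1]
retrieved = [1, -1, 1, -1, 1, 1]
print(jaccard_bipolar(benchmark, retrieved))
```

Under this assumed definition, a lower value means the retrieved state differs more from the benchmark, which matches the reading above that more diversified final neuron states give lower SJaccard.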

Figure 5: SJaccard for different GRAN3SAT models.

The Effect of Different Proportions of Literals (Positive or Negative)

The purpose of this section is to analyze the influence of different proportions of literals (positive or negative) on the performance of the GRAN3SAT model. Note that although the initial neuron state is randomly generated (1 or −1), the effect of the number of positive and negative literals can provide insight into the behavior of our proposed GRAN3SAT model. Table 9 shows the proportion of negative literals, PN, of the neurons based on different GRAN3SAT models. Using the information in Table 9, we will investigate each GRAN3SAT model based on the effectiveness of the learning phase, retrieval phase, and similarity analysis.
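One way to picture the role of PN is a generator that negates each literal independently with probability PN. The sketch below is an assumption made for illustration, not the chapter's generation procedure from Equations (4)–(6). The function names, the clause representation as (variable, negated) pairs, and the parameter values are all hypothetical.

```python
import random

def random_clause(variables, order, p_negative, rng):
    """Pick `order` distinct variables; negate each with probability p_negative.

    Returns a list of (variable, negated) pairs. Illustrative only.
    """
    chosen = rng.sample(variables, order)
    return [(v, rng.random() < p_negative) for v in chosen]

def negative_fraction(clauses):
    """Observed proportion of negated literals across all clauses."""
    flags = [negated for clause in clauses for (_, negated) in clause]
    return sum(flags) / len(flags)

rng = random.Random(0)                 # fixed seed so the run is repeatable
variables = list(range(30))            # hypothetical pool of 30 variables
clauses = [
    random_clause(variables, rng.choice([1, 2, 3]), 0.9, rng)  # PN = 0.9
    for _ in range(200)
]
print(negative_fraction(clauses))      # close to 0.9 for a large sample
```

With PN near 1, almost every literal is negated, and with PN near 0, clauses consist almost entirely of positive literals.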


Table 9: Different literal proportions for GRAN3SAT models

Figure 6 shows the performance of different GRAN3SAT models in terms of MAElearn, RMSElearn, MAPElearn, MAEweight, and RMSEweight during the learning phase. Note that MAElearn, RMSElearn, and MAPElearn analyze the fitness of the neuron with different PN, whereas MAEweight and RMSEweight analyze the error resulting from incorrect synaptic weights with different PN. As shown in Figure 6a–c, there is no obvious difference in terms of MAElearn, RMSElearn, and MAPElearn for all cases. This indicates that the learning phase of the GRAN3SAT model is not affected by different PN. In order to support this statement, we illustrate the linear fittings of MAElearn and RMSElearn in Table 10. Based on the value of the error, the slope of the fitting remains the same, which indicates that different PN do not influence the learning capability of the GRAN3SAT model. We examined this finding more closely by extracting the value of MAPElearn for three different NN (refer to Table 11). The curve remains stable at about 0.31–0.36 when NN≥24, which indicates that the percentage of neuron fitness does not change as the NN increases. Interestingly, the mixture of random occurrence clauses for PGRAN3SAT as stated in Equations (4)–(6) with different PN does not increase or decrease the performance of the ES in finding the consistent interpretation. Similar observations were reported for MAEweight and RMSEweight in Figure 6d,e. Compared to MAElearn, the values of MAEweight and RMSEweight increase in different ways. At NN60, it can be observed that the final states of all cases are basically local solutions. Consequently, GRAN3SAT can increase the number of optimal final neuron states by increasing the number of positive literals.

Figure 7: Error performance for (a) MAEenergy, (b) RMSEenergy, (c) MAEtest, (d) RMSEtest, and (e) ZMtest.


Table 12: The average NC when NN=100.

Figure 8 shows the SJaccard of GRAN3SAT for different values of PN. Based on Figure 8, the final state of GRAN3SAT varies significantly and fluctuates within a certain range. In order to understand the actual performance of the final neuron state, the minimum and maximum values of the SJaccard for GRAN3SAT are shown in Table 13. Since Case I generates completely random literals, the final neuron state shows the largest differences in SJaccard magnitude. This shows that more random literals in GRAN3SAT will create different final neuron states. Another point to ponder is that high values of PN in GRAN3SAT will increase the diversity of the final neuron states that achieved the global minimum solution. According to Table 13, the value of SJaccard increased from 0.033 to 0.966 by just reducing the value of PN from 0.9 to 0.1. This is because the lower the value of PN, the greater the probability for GRAN3SAT to produce Ao∨Ap∨Aq and Ai∨Aj, which results in similar final neuron states. The main problem with producing the mentioned clauses is that the synaptic weight obtained during the learning phase tends to be monotonous in terms of vector and magnitude. As the proposed GRAN3SAT retrieves the final neuron state using Equations (14) and (15), the HTAF will classify the final state more towards a single type of neuron state; this will increase the value of SJaccard. Therefore, the proposed GRAN3SAT model will achieve an optimal final neuron state by reducing positive literals in the PGRAN3SAT formulation.

Figure 8: SJaccard for GRAN3SAT for different values of PN.


Table 13: The ranges of the SJaccard values

From the above discussion, it can be concluded that different PN will affect the energy error, test error, and especially the similarity index, but has no effect on clause satisfaction and synaptic weight error. During the retrieval phase, the larger the value of PN, the higher the energy error and the fewer the global solutions. In the similarity index, the larger the value of PN, the smaller the value of SJaccard. This means that a larger PN will yield fewer solutions that satisfy Equation (18). In this case, a suitable tradeoff in PN must be determined to ensure the optimality of our proposed GRAN3SAT.

The Effect of Different Learning Trials

The purpose of this section is to analyze the effect of different learning trials, Ntrial, on the performance of the GRAN3SAT models. During the learning phase, the ES will be given a set of trials to ensure the cost function of the DHNN is minimized. In this case, the fitness of the neurons will be based on the number of satisfied clauses obtained from each Ntrial. Note that a higher number of learning trials will help the ES explore a larger search space for the highest fitness that corresponds to a GRAN3SAT model. The results from this section will provide theoretical support for improving the efficiency of the learning algorithm of our proposed GRAN3SAT model. Figure 9a–c show the effect of the Ntrial on the errors MAElearn, RMSElearn, and MAPElearn for the GRAN3SAT model. As shown in Figure 9a–c, the MAElearn, RMSElearn, and MAPElearn are similar despite having different values of Ntrial. During the learning phase of GRAN3SAT, the neuron fitness is dependent on the number of clauses in PGRAN3SAT, although it has been reported in [15] that higher values of Ntrial will increase the probability that the ES minimizes the cost function. The random feature of PGRAN3SAT in Equations (4)–(6) makes high values of Ntrial appear insignificant. When the ES failed to achieve χPGRAN3SAT=0, the difference between the current neuron fitness and the desired neuron fitness will increase. This was supported by the increase in MAElearn, RMSElearn, and MAPElearn as the number of NN increased.
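The role of Ntrial can be sketched as a trial-limited random search in which fitness is the number of satisfied clauses. This is a simplified stand-in written under stated assumptions, not the chapter's ES implementation. The function names, the (variable, negated) clause encoding, and the example formula are hypothetical.

```python
import random

def fitness(assignment, clauses):
    """Count clauses that contain at least one true literal.

    `assignment` maps variable -> +1 or -1; a literal is (variable, negated).
    """
    def literal_true(var, negated):
        return assignment[var] == (-1 if negated else 1)
    return sum(any(literal_true(v, n) for v, n in clause) for clause in clauses)

def trial_search(clauses, variables, n_trial, rng):
    """Draw up to n_trial random bipolar assignments and keep the fittest."""
    best, best_fit = None, -1
    for _ in range(n_trial):
        candidate = {v: rng.choice((1, -1)) for v in variables}
        f = fitness(candidate, clauses)
        if f > best_fit:
            best, best_fit = candidate, f
        if best_fit == len(clauses):   # all clauses satisfied, cost is zero
            break
    return best, best_fit

# Hypothetical formula with one third-, one second-, and one first-order clause.
clauses = [
    [(0, False), (1, True), (2, False)],
    [(1, False), (3, False)],
    [(2, True)],
]
best, best_fit = trial_search(clauses, range(4), n_trial=1000, rng=random.Random(0))
print(best_fit, "of", len(clauses), "clauses satisfied")
```

In this sketch, a larger n_trial only raises the chance of reaching full fitness, and it does so at the cost of more evaluations, which mirrors the tradeoff between search coverage and computational time discussed in this section.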


Figure 9: Error performance for (a) MAElearn, (b) RMSElearn, (c) MAPElearn, (d) MAEweight, and (e) RMSEweight.

Figure 9d,e illustrate the effect of the Ntrial on the errors MAEweight and RMSEweight for the GRAN3SAT model. At 6≤NN≤63, the values of MAEweight and RMSEweight are the highest when Ntrial is the lowest. This is because although the cost function of the neuron in GRAN3SAT is χPGRAN3SAT≠0, some fragments of the clauses of PGRAN3SAT were satisfied and obtained the optimal synaptic weight. In this case, a larger Ntrial will provide more solution space for the ES to find near-optimal neuron fitness. Therefore, MAEweight and RMSEweight can be reduced. Based on Table 14, at NN>63, the values of MAEweight and RMSEweight gradually stabilize, and it is difficult to satisfy the clauses in Equations (4)–(6). This shows that the ES does not contribute to driving the neuron towards global maxima. It is worth noting that simply increasing Ntrial will increase both the computational time and the capacity of the GRAN3SAT model. On the contrary, the effect of high Ntrial seems obvious if GRAN3SAT became systematic SAT, except that the clause produced is only a first-order clause. In order to remedy the situation, the learning options of GRAN3SAT can be improved by combining other meta-heuristic algorithms, such as Grey Wolf Optimization (GWO) [31], the Genetic Algorithm (GA) [32], the Election Algorithm (EA) [33], and Particle Swarm Optimization (PSO) [34].


Table 14: The values of MAElearn and MAEweight for GRAN3SAT

Figure 10a,b illustrate the effect of the Ntrial on the MAEenergy and RMSEenergy for the GRAN3SAT model. Based on these figures, there is a small difference in the energy error between Lmin (minimum energy) and Lf (final energy) as a whole, showing a trend of decreasing energy error as the Ntrial value increases. This is closely related to the optimal synaptic weights obtained by GRAN3SAT during the learning phase. We confirm our findings by simulating the linear fitting of the MAEenergy with different values of Ntrial (refer to Table 15). Overall, the MAEenergy and RMSEenergy show a steady upward trend with the increase of NN. This is due to the increased numbers of third-order clauses and second-order clauses, resulting in slow network convergence.
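The slope of such a linear fitting can be obtained with ordinary least squares. The sketch below is a minimal pure-Python version and makes no claim about the fitting tool or settings the authors used for Table 15. The function name and the sample NN and MAEenergy values are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical MAEenergy readings at increasing NN (illustration only).
nn_values = [6, 12, 18, 24, 30]
mae_energy = [0.2, 0.5, 0.9, 1.1, 1.6]

slope, intercept = linear_fit(nn_values, mae_energy)
print(slope, intercept)
```

Comparing the fitted slopes for different Ntrial gives a single number per setting that summarizes how quickly the energy error grows with NN.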

Figure 10: Error performance for (a) MAEenergy, (b) RMSEenergy, (c) MAEtest, (d) RMSEtest, and (e) ZMtest.


Table 15: The slope of the linear fitting for MAEenergy

Figure 10c–e and Table 16 demonstrate the effect of Ntrial on the MAEtest, RMSEtest, and ZMtest for the GRAN3SAT model. As the Ntrial increased from Ntrial=102 to Ntrial=103, the sharp increase of MAEtest for 6