Theoretical Foundations + Automata in Mathematics and Selected Applications (Handbook of Automata Theory, 1-2) 3985470065, 9783985470068

Automata theory is a subject of study at the crossroads of mathematics, theoretical computer science, and applications.


English Pages 1608 [1612] Year 2021


Table of contents :
About the pagination of this eBook
Handbook of Automata Theory: Volume I
Preface
Contents
List of contributors
Part I. Foundations
1. Finite automata
1. Basic algebraic structures
2. Words, languages and automata
3. Operations on recognisable languages
4. Minimal automaton and syntactic monoid
5. Rational versus recognisable
6. Algebraic approach
References
2. Automata and rational expressions
1. A new look at Kleene's theorem
2. Rationality and recognisability
3. From automata to expressions: the AtEs-maps
4. From expressions to automata: the EtAs-maps
5. Changing the monoid
6. Introducing weights
7. Notes
References
3. Finite transducers and rational transductions
1. Introduction
2. Basic definitions
3. Morphic representations
4. Applications
5. Undecidability in transductions
6. Further reading
References
4. Weighted automata
1. Introduction
2. Weighted automata and their behaviour
3. Linear representations
4. The Kleene–Schützenberger theorem
5. Semimodules
6. Nivat's theorem
7. Weighted monadic second-order logic
8. Decidability of "r_1=r_2"?
9. Characteristic series and supports
10. Further results
References
5. Max-plus automata
1. Introduction
2. Preliminaries
3. One-letter max-plus automata
4. General max-plus automata
5. Bibliographic notes
References
6. ω-Automata
1. Introduction
2. Types of omega-automata
3. Basic properties of Büchi automata
4. Basic constructions
5. Run DAGs of Büchi automata
6. Run trees of Büchi automata
7. Congruence relations
8. Loop structure
9. Alternation
10. Applications in logic
11. More complex recurrence conditions
References
7. Automata on finite trees
1. Introduction
2. Fundamentals on tree automata
3. Ground-tree rewriting
4. Tree-walking automata
5. Automata on unranked trees
6. Classification of regular tree languages
7. Conclusion
References
8. Automata on infinite trees
1. Introduction
2. Automata on infinite trees
3. Constructions for complementation and simulation
4. Decision problems
5. Applications in logic
References
9. Two-dimensional models
1. Introduction
2. Basic concepts for picture definition
3. Tiling recognition
4. Grammars
5. Comparison of language families
6. Conclusion
References
Part II. Complexity issues
10. Minimisation of automata
1. Introduction
2. Definitions and notation
3. Brzozowski's algorithm
4. Moore's algorithm
5. Hopcroft's algorithm
6. Slow automata
7. Minimisation by fusion
8. Dynamic minimisation
9. Extensions and special cases
References
11. Learning algorithms
1. Introduction
2. Preliminaries
3. Classical results
4. Learning from given data
5. Learning non-deterministic finite automata
6. Learning regular tree languages
7. PAC learning
8. Applications and further material
References
12. Descriptional complexity of regular languages
1. Introduction
2. Descriptional complexity and lower bound techniques
3. Transformation between models for regular languages
4. Operations on regular languages
5. Some recent developments
References
13. Enumerating regular expressions and their languages
1. Introduction and overview
2. On measuring the size of a regular expression
3. A simple grammar for valid regular expressions
4. Unambiguous context-free grammars and the Chomsky–Schützenberger theorem
5. Solving algebraic equations using Gröbner bases
6. Asymptotic bounds via singularity analysis
7. Lower bounds on enumeration of regular languages by regular expressions
8. Upper bounds on enumeration of regular languages by regular expressions
9. Exact enumerations
10. Conclusion and open problems
References
14. Circuit complexity of regular languages
1. Introduction
2. Circuits
3. Syntactic monoid
4. Regular expressions
5. Circuit complexity of regular languages
6. Circuit size of regular languages
7. Final remarks
References
15. Černý's conjecture and the road colouring problem
1. Synchronising automata, their origins and importance
2. Algorithmic and complexity issues
3. Around the Černý's conjecture
4. The road colouring problem
References
Part III. Algebraic and topological theory of automata
16. Varieties
1. Motivation and examples
2. Equations, identities and families of languages
3. Connections with logic
4. Operations on classes of languages
5. Varieties in other algebraic frameworks
References
17. Profinite topologies
1. Introduction
2. Profinite topologies for general algebras
3. The case of semigroups
4. Relatively free profinite semigroups
References
18. The factorisation forest theorem
1. Introduction
2. Some definitions
3. The factorisation forest theorem
4. Algebraic applications
5. Variants of the factorisation forest theorem
6. Applications as an accelerating structure
References
19. Wadge–Wagner hierarchies
1. The Wadge hierarchy
2. The Wagner hierarchy
References
20. Equational theories for automata
1. Introduction
2. Conway semirings
3. Automata in Conway semirings
4. Iteration semirings
5. Complete semirings
6. Continuous semirings
7. Completeness
8. Inductive *-semirings and Kleene algebras
9. Residuation
10. Some extensions
References
21. Language equations
1. Introduction
2. General properties of operations
3. Equations with one-sided concatenation
4. Resolved systems of equations
5. Equations with constant sides
6. Equations of the general form
7. Equations with erasing operations
References
22. Algebra for trees
1. Introduction
2. Trees as ground terms
3. A recipe for designing an algebra
4. Preclones
5. Forest algebra
6. Seminearring
7. Nesting algebras
8. Recent developments
References
Index
Handbook of Automata Theory: Volume II
Dedicatory
Preface
Contents
List of contributors
Part IV. Automata in mathematics
23. Rational subsets of groups
1. Introduction
2. Finitely generated groups
3. Inverse automata and Stallings' construction
4. Rational and recognisable subsets
References
24. Groups defined by automata
1. Introduction
2. The geometry of the Cayley graph
3. Groups generated by automata
References
25. Automata in number theory
1. Introduction
2. Automatic sequences and automatic sets of integers
3. Prime numbers and finite automata
4. Expansions of algebraic numbers in integer bases
5. The Skolem–Mahler–Lech theorem in positive characteristic
6. The algebraic closure of F_p(t)
7. Update
References
26. On Cobham's theorem
1. Introduction
2. Numeration basis
3. Automatic sequences
4. Multidimensional extension and first-order logic
5. Numeration systems and substitutions
6. Cobham's theorem in various contexts
7. Decidability issues
References
27. Symbolic dynamics
1. Introduction
2. Shift spaces
3. Automata
4. Minimal automata
5. Symbolic conjugacy
6. Special families of automata
7. Syntactic invariants
References
28. Automatic structures
1. Introduction
2. Automatic Structures
3. The connection with MSOL
4. Operations on automatic structures
5. Proving a structure has no automatic presentation
6. Equivalent automatic presentations
7. Automatic-like structures
8. Outlook
References
29. Automata and finite model theory
1. Introduction
2. Definitions
3. Finite model theory of strings and trees
4. Automata and finite model theory of arbitrary structures
References
30. Finite automata, image manipulation, and automatic real functions
1. Introduction
2. Definitions and notation
3. Normal forms, minimality, and decidability
4. Examples
5. Image manipulation
6. A monster function
References
Part V. Selected applications
31. Communicating automata
1. Introduction
2. Communicating automata
3. Reachability problems
4. Specifications and model-checking
5. Realisability
References
32. Symbolic methods and automata
1. Introduction
2. Integer domain
3. Real domain
4. Conclusions and perspectives
References
33. Synthesis with finite automata
1. Introduction
2. Church problem
3. Control of discrete event systems
4. Distributed synthesis: synchronous architectures
5. Distributed synthesis: Zielonka automata
References
34. Timed automata
1. Introduction
2. Timed automata
3. The emptiness problem, why and how?
4. The region abstraction: a key for decidability
5. Applications of the region automaton construction
6. The language-theoretic perspective
7. Conclusion and current developments
References
35. Higher-order recursion schemes and their automata models
1. Introduction
2. Preliminaries
3. From CPDA to recursion schemes
4. From recursion schemes to collapsible pushdown automata
5. Safe higher-order recursion schemes
References
36. Analysis of probabilistic processes and automata theory
1. Introduction
2. Definitions and Background
3. Analysis of finite-state Markov chains
4. Analysis of finite-state MDPs
5. Adding recursion to MCs and MDPs
References
37. Natural language parsing
1. Introduction to natural language parsing
2. Preliminaries
3. Tabulation
4. LR recognition
5. Earley's algorithm
6. Cocke–Younger–Kasami algorithm
7. Bibliographic notes
References
38. Verification
1. Introduction
2. Linear-time logics
3. Applications
4. Branching-time logics
5. Applications
References
39. Automata and quantum computing
1. Introduction
2. Mathematical background
3. Preliminaries
4. One-way QFAs
5. Two-way QFAs
6. Other models and results
7. Concluding remarks
References
Index


Handbook of Automata Theory Volume I Theoretical Foundations Edited by Jean-Éric Pin

About the pagination of this eBook This eBook contains a multi-volume set. To navigate the front matter of this eBook by page number, you will need to use the volume number and the page number, separated by a hyphen. For example, to go to page v of volume 1, type “1-v” in the Go box at the bottom of the screen and click "Go." To go to page v of volume 2, type “2-v”… and so forth.


Editor: Jean-Éric Pin, Institut de Recherche en Informatique Fondamentale (IRIF), Université de Paris and CNRS, Bâtiment Sophie Germain, Case courrier 7014, 8 Place Aurélie Nemours, 75205 Paris Cedex 13. E-mail: [email protected]
Volume I: 2020 Mathematics Subject Classification: 68Q45; 03B50, 03D05, 08A70, 08B20, 15A80, 16Y60, 20E18, 20M05, 20M07, 20M35, 28A05, 68Q32, 68Q42, 68Q70, 68R10, 68T05
Keywords: finite automata, Hopcroft's algorithm, regular languages, regular expressions, finite transducers, weighted automata, automata on infinite words, automata on trees, picture languages, algorithmic learning, descriptional complexity, Boolean circuits, synchronizing automata, road colouring problem, varieties of languages, profinite topology, descriptive set theory, equational theories, language equations, forest algebras

ISBN Vol. I: 978-3-98547-002-0
ISBN Vol. II: 978-3-98547-003-7
ISBN Set: 978-3-98547-006-8 (set of both volumes)

Bibliographic information published by the Deutsche Nationalbibliothek. The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
Published by EMS Press, an imprint of the European Mathematical Society (EMS) Publishing House GmbH, Institut für Mathematik, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany. https://ems.press
© 2021 EMS Press. Cover drawing of Jacques de Vaucanson's digesting duck (canard digérant) published in Scientific American Vol. 80 (3), 1899. Fractal tree on the first page by Nicolas Janey.
Typeset using the authors' LaTeX sources: Marco Zunino, Savona, Italy. Printing and binding: Beltz Bad Langensalza GmbH, Bad Langensalza, Germany. Printed on acid-free paper.

Dedicated to the memory of Professor Zoltán Ésik (1951–2016)

Preface

The Handbook of Automata Theory has its origins in the research programme AutoMathA (Automata: from Mathematics to Applications, 2005–2010), a multidisciplinary programme of the European Science Foundation at the crossroads of mathematics, theoretical computer science, and applications. It is designed to provide a broad audience of researchers and students in mathematics and computer science with a comprehensive overview of research in automata theory.

Automata theory is one of the longest established areas in computer science. It was born over sixty years ago, with the seminal work of Kleene, who first formalised the early attempts of McCulloch and Pitts, and was originally motivated by the study of neural networks. For many years, its main applications have been computer design, compilation of programming languages, and pattern matching. But over the last twenty years, applications of automata theory have considerably diversified, and now include verification methods to cope with such emerging technical needs as network security, mobile intelligent devices, and high performance computing. At the same time, the mathematical foundations of automata theory rely on more and more advanced parts of mathematics. While only elementary graph theory and combinatorics were required in the early sixties, new tools from non-commutative algebra (semigroups, semirings and formal power series), logic, probability theory, and symbolic dynamics have been successively introduced, and the latest developments borrow ideas from topology and geometry.

It was time to gather these mathematical advances and their numerous applications in a reference book. The Handbook of Automata Theory is intended to serve this purpose. It comprises thirty-nine chapters, presented in two volumes:

Volume I: Theoretical foundations
Volume II: Automata in mathematics and selected applications

Together, the two volumes cover most of the topics related to automata. Volume I presents, in the first part, the basic models of the theory: finite automata working on finite words, infinite words, finite trees and infinite trees, transducers, weighted automata and max-plus automata, and two-dimensional models. In the second part, complexity and algorithmic issues are discussed extensively, including connections with circuit complexity and finite model theory. In the third part, the algebraic and topological aspects of automata theory are treated. Volume II first offers a wide range of connections between automata and mathematics, including group theory, number theory, symbolic dynamics, finite model theory, and fractal-type images. Secondly, selected applications are covered, including


message-passing systems, symbolic methods, synthesis, the timed automaton model, verification of higher-order programs, analysis of probabilistic processes, natural language processing, formal verification of programs, and quantum computing.

Much of this material had never been published in a book before, making the Handbook a unique reference in automata theory. Due to the length of the Handbook, the chapters are divided into two volumes. For the convenience of the reader, the front matter and the index appear in both volumes (paginated with roman numerals). As this project started over ten years ago, some recent developments may not have been addressed. Nevertheless, the reader will be able to find updates and possible corrections on https://ems.press/isbn/978-3-98547-006-8

Acknowledgements. I would like to thank the European Science Foundation, and in particular the Standing Committee for Physical & Engineering Sciences (PESC), for funding the research programme AutoMathA within their Research Networking Programme (2005–2010). The Handbook would not have been possible without their moral and financial support. The programme AutoMathA brought together a research community of wide scope; its joint work and efforts have been vital for composing the present handbook. The AutoMathA project was initially launched by Jorge Almeida (Lisboa), Stefano Crespi Reghizzi (Milano) and myself. Let me also thank the other members of the AutoMathA Steering Committee: Jacques Duparc, Jozef Gruska, Juhani Karhumäki, Mikołaj Bojańczyk, Søren Eilers, Stuart W. Margolis, Tatiana Jajcayova, Véronique Bruyère, Werner Kuich, Wolfgang Thomas, and Zoltán Ésik. Sadly and unexpectedly, Zoltán Ésik passed away during the final stages of the Handbook project. We dedicate the Handbook to the memory of this great scientist and friend. The constant support of the AutoMathA Steering Committee during the preparation of this handbook was an invaluable help. Narad Rampersad’s assistance during the early stage of the Handbook was also particularly appreciated. All the authors are particularly indebted to Jeffrey Shallit. As one of the few native English-speaking authors of the book, Jeffrey has accepted the daunting task of reviewing all the chapters in their entirety. He not only detected a considerable number of English mistakes, but he also greatly improved the style and mathematical content of the chapters. I would therefore like to express my deepest thanks to him. The advisory board consisting of Søren Eilers (Copenhagen) and Wolfgang Thomas (Aachen) was instrumental in defining the early version of the Handbook project. I am particularly indebted to Wolfgang Thomas for his advice and constant encouragement and help during the long gestation period of this handbook. Of course, this handbook would not have been possible without the authors of the thirty-nine chapters. I would like to thank them all for their high quality scientific contribution and their perseverance during the chapter review process.


For their patience and extreme care in the production of the Handbook, I would like to thank the typesetter Marco Zunino and all the people of EMS Press I have been working with: Apostolos Damialis, Sylvia Fellmann, Thomas Hintermann, Manfred Karbe, Vera Spillner, and Simon Winter. Special thanks to Nicolas Janey who kindly designed the fractal tree on the first page. Jean-Éric Pin Managing editor Paris, 2021

Contents

VOLUME ONE

Preface . . . vii
List of contributors . . . xvii

Part I. Foundations
Chapter 1. Finite automata (Jean-Éric Pin) . . . 3
Chapter 2. Automata and rational expressions (Jacques Sakarovitch) . . . 39
Chapter 3. Finite transducers and rational transductions (Tero Harju and Juhani Karhumäki) . . . 79
Chapter 4. Weighted automata (Manfred Droste and Dietrich Kuske) . . . 113
Chapter 5. Max-plus automata (Sylvain Lombardy and Jean Mairesse) . . . 151
Chapter 6. ω-Automata (Thomas Wilke, revised by Sven Schewe) . . . 189
Chapter 7. Automata on finite trees (Christof Löding and Wolfgang Thomas) . . . 235
Chapter 8. Automata on infinite trees (Christof Löding) . . . 265
Chapter 9. Two-dimensional models (Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati) . . . 303

Part II. Complexity issues
Chapter 10. Minimisation of automata (Jean Berstel, Luc Boasson, Olivier Carton, and Isabelle Fagnot) . . . 337
Chapter 11. Learning algorithms (Henrik Björklund, Johanna Björklund, and Wim Martens) . . . 375
Chapter 12. Descriptional complexity of regular languages (Hermann Gruber, Markus Holzer, and Martin Kutrib) . . . 411
Chapter 13. Enumerating regular expressions and their languages (Hermann Gruber, Jonathan Lee, and Jeffrey Shallit) . . . 459
Chapter 14. Circuit complexity of regular languages (Michal Koucký) . . . 493
Chapter 15. Černý's conjecture and the road colouring problem (Jarkko Kari and Mikhail Volkov) . . . 525

Part III. Algebraic and topological theory of automata
Chapter 16. Varieties (Howard Straubing and Pascal Weil) . . . 569
Chapter 17. Profinite topologies (Jorge Almeida and Alfredo Costa) . . . 615
Chapter 18. The factorisation forest theorem (Thomas Colcombet) . . . 653
Chapter 19. Wadge–Wagner hierarchies (Jacques Duparc) . . . 695
Chapter 20. Equational theories for automata (Zoltán Ésik) . . . 729
Chapter 21. Language equations (Michal Kunc and Alexander Okhotin) . . . 765
Chapter 22. Algebra for trees (Mikołaj Bojańczyk) . . . 801

Index . . . xxiii

VOLUME TWO

Preface . . . vii
List of contributors . . . xvii

Part IV. Automata in mathematics
Chapter 23. Rational subsets of groups (Laurent Bartholdi and Pedro V. Silva) . . . 841
Chapter 24. Groups defined by automata (Laurent Bartholdi and Pedro V. Silva) . . . 871
Chapter 25. Automata in number theory (Boris Adamczewski and Jason Bell) . . . 913
Chapter 26. On Cobham's theorem (Fabien Durand and Michel Rigo) . . . 947
Chapter 27. Symbolic dynamics (Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin) . . . 987
Chapter 28. Automatic structures (Sasha Rubin) . . . 1031
Chapter 29. Automata and finite model theory (Wouter Gelade and Thomas Schwentick) . . . 1071
Chapter 30. Finite automata, image manipulation, and automatic real functions (Juhani Karhumäki and Jarkko Kari) . . . 1105

Part V. Selected applications
Chapter 31. Communicating automata (Dietrich Kuske and Anca Muscholl) . . . 1147
Chapter 32. Symbolic methods and automata (Bernard Boigelot) . . . 1189
Chapter 33. Synthesis with finite automata (Igor Walukiewicz) . . . 1217
Chapter 34. Timed automata (Patricia Bouyer) . . . 1261
Chapter 35. Higher-order recursion schemes and their automata models (Arnaud Carayol and Olivier Serre) . . . 1295
Chapter 36. Analysis of probabilistic processes and automata theory (Kousha Etessami) . . . 1343
Chapter 37. Natural language parsing (Mark-Jan Nederhof and Giorgio Satta) . . . 1383
Chapter 38. Verification (Javier Esparza, Orna Kupferman, and Moshe Y. Vardi) . . . 1415
Chapter 39. Automata and quantum computing (Andris Ambainis and Abuzer Yakaryılmaz) . . . 1457

Index . . . xxiii

List of contributors

Boris Adamczewski (Chapter 25) CNRS, Université de Lyon Institut Camille Jordan 43 boulevard du 11 novembre 1918 69622 Villeurbanne Cedex France

Jason Bell (Chapter 25) Department of Pure Mathematics University of Waterloo Waterloo, ON N2L 3G1 Canada [email protected]

[email protected] Jorge Almeida (Chapter 17) CMUP, Departamento de Matemática Faculdade de Ciências Universidade do Porto Rua do Campo Alegre 687 4169-007 Porto Portugal

Jean Berstel (Chapters 10, 27) Laboratoire d’Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France [email protected]

[email protected] Andris Ambainis (Chapter 39) University of Latvia Faculty of Computing Raina bulv. 19 Rīga 1586 Latvia [email protected] Laurent Bartholdi (Chapters 23, 24) Mathematisches Institut Georg-August Universität zu Göttingen Bunsenstraße 3–5 37073 Göttingen Germany [email protected] Marie-Pierre Béal (Chapter 27) Laboratoire d’Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France [email protected]

Henrik Björklund (Chapter 11) Department of Computing Science Umeå University 90187 Umeå Sweden [email protected] Johanna Björklund (Chapter 11) Department of Computing Science Umeå University 90187 Umeå Sweden [email protected] Luc Boasson (Chapter 10) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France [email protected]

Bernard Boigelot (Chapter 32) Institut Montefiore, B28 Université de Liège 10, Allée de la découverte 4000 Liège Belgium [email protected]

Thomas Colcombet (Chapter 18) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France [email protected]

Mikołaj Bojańczyk (Chapter 22) MIMUW Banacha 2 02-097 Warszawa Poland [email protected] Patricia Bouyer (Chapter 34) Université Paris-Saclay CNRS, ENS Paris-Saclay Laboratoire Méthodes Formelles 91190 Gif-sur-Yvette France [email protected] Arnaud Carayol (Chapter 35) LIGM, Université Gustave Eiffel 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France Arnaud.Carayol@univ-eiffel.fr Olivier Carton (Chapter 10) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France [email protected]

Alfredo Costa (Chapter 17) CMUC, Department of Mathematics University of Coimbra Apartado 3008 EC Santa Cruz 3001-501 Coimbra Portugal [email protected] Manfred Droste (Chapter 4) Institut für Informatik Universität Leipzig Augustusplatz 10-11 04109 Leipzig Germany [email protected] Jacques Duparc (Chapter 19) Department of Operations Faculty of Business and Economics University of Lausanne 1015 Lausanne Switzerland [email protected] Fabien Durand (Chapter 26) Université de Picardie Jules Verne CNRS UMR 6140 33 rue Saint Leu 80039 Amiens Cedex 1 France [email protected]

Stefano Crespi Reghizzi (Chapter 9) Dipartimento di Elettronica Informazione e Bioingegneria Politecnico di Milano Piazza Leonardo da Vinci 32 20133 Milano Italy

Søren Eilers (Chapter 27) Institut for Matematiske Fag Københavns Universitet Universitetsparken 5 2100 København Ø Denmark

[email protected]

[email protected]

Zoltán Ésik (Chapter 20) Javier Esparza (Chapter 38) Institut für Informatik Technische Universität München Boltzmannstraße 3 85748 Garching bei München Germany [email protected] Kousha Etessami (Chapter 36) School of Informatics University of Edinburgh 10 Crichton Street Edinburgh EH8 9AB United Kingdom [email protected] Isabelle Fagnot (Chapter 10) Laboratoire d'Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France Isabelle.Fagnot@univ-eiffel.fr


Tero Harju (Chapter 3) Department of Mathematics and Statistics University of Turku FI-20014 Turku Finland harju@utu.fi Markus Holzer (Chapter 12) Institut für Informatik Universität Giessen Arndtstraße 2 35392 Giessen Germany [email protected] Juhani Karhumäki (Chapters 3, 30) Department of Mathematics and Statistics University of Turku FI-20014 Turku Finland karhumak@utu.fi Jarkko Kari (Chapters 15, 30) Department of Mathematics and Statistics University of Turku FI-20014 Turku Finland jkari@utu.fi

Wouter Gelade (Chapter 29) Centre of Research in the Economics of Development (CRED) University of Namur Rempart de la Vierge, 8 5000 Namur Belgium

Michal Koucký (Chapter 14) Computer Science Institute of Charles University Malostranské nám 25 118 00 Praha 1 Czech Republic

[email protected]

[email protected]ff.cuni.cz

Dora Giammarresi (Chapter 9) Dipartimento di Matematica Università di Roma “Tor Vergata” via della Ricerca Scientifica 1 00133 Roma Italy

Michal Kunc (Chapter 21) Department of Mathematics and Statistics Masaryk University Kotlářská 2 611 37 Brno Czech Republic

[email protected] Hermann Gruber (Chapters 12, 13) Knowledgepark GmbH Leonrodstr. 68 80636 München Germany [email protected]

[email protected] Orna Kupferman (Chapter 38) School of Computer Science and Engineering Hebrew University Jerusalem 91904 Israel [email protected]

Dietrich Kuske (Chapters 4, 31) Institut für Theoretische Informatik Fakultät Informatik und Automatisierung Technische Universität Ilmenau Postfach 100565 98693 Ilmenau Germany [email protected] Martin Kutrib (Chapter 12) Institut für Informatik Universität Giessen Arndtstraße 2 35392 Giessen Germany [email protected]

Jean Mairesse (Chapter 5) LIP6 – Laboratoire d'Informatique de Paris 6 UMR 7606, CNRS Université Pierre et Marie Curie Boîte courrier 169 Tour 26, Couloir 26-00, 2è étage 4 place Jussieu 75252 Paris Cedex 05 France [email protected] Wim Martens (Chapter 11) Institut für Informatik Universität Bayreuth 95440 Bayreuth Germany [email protected]

Jonathan Lee (Chapter 13) Department of Mathematics Stanford University Building 380, Sloan Hall Stanford, CA 94305 USA Christof Löding (Chapters 7, 8) Lehrstuhl Informatik 7 RWTH Aachen 52056 Aachen Germany [email protected] Sylvain Lombardy (Chapter 5) LaBRI, Université de Bordeaux et CNRS Institut Polytechnique de Bordeaux 351 cours de la Libération 33405 Talence Cedex France [email protected] Violetta Lonati (Chapter 9) Dipartimento di Informatica Università degli Studi di Milano via Celoria 18 20100 Milano Italy [email protected]

Anca Muscholl (Chapter 31) LaBRI, Université de Bordeaux et CNRS 351 cours de la Libération 33405 Talence Cedex France [email protected] Mark-Jan Nederhof (Chapter 37) School of Computer Science University of St Andrews North Haugh St Andrews KY16 9SX United Kingdom [email protected] Alexander Okhotin (Chapter 21) Department of Mathematics and Computer Science St. Petersburg State University 14th Line V.O., 29 199178 Saint Petersburg Russian [email protected] Dominique Perrin (Chapter 27) Laboratoire d’Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France [email protected]

Jean-Éric Pin (Chapter 1) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France

Thomas Schwentick (Chapter 29) Fakultät für Informatik Technische Universität Dortmund Otto-Hahn-Straße 12 44227 Dortmund Germany [email protected]

[email protected] Michel Rigo (Chapter 26) Université de Liège Institut de Mathématiques 12 Allée de la découverte (B37) 4000 Liège Belgium

Olivier Serre (Chapter 35) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France

[email protected]

[email protected]

Sasha Rubin (Chapter 28) School of Computer Science The University of Sydney Building J12/1, Cleveland St. Camperdown NSW 2006 Australia

Jeffrey Shallit (Chapter 13) School of Computer Science University of Waterloo Waterloo, ON N2L 3G1 Canada [email protected]

[email protected] Jacques Sakarovitch (Chapter 2) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France

Pedro V. Silva (Chapter 23, 24) Centro de Matemática Faculdade de Ciências Universidade do Porto R. Campo Alegre 687 4169-007 Porto Portugal [email protected]

[email protected] Giorgio Satta (Chapter 37) Department of Information Engineering University of Padua via Gradenigo 6/A 35131 Padova Italy [email protected] Sven Schewe (Chapter 6) Department of Computer Science University of Liverpool Ashton Building Ashton Street Liverpool L69 3BX United Kingdom [email protected]

Howard Straubing (Chapter 16) Computer Science Department Boston College Chestnut Hill, MA 02467 USA [email protected] Wolfgang Thomas (Chapter 7) Lehrstuhl Informatik 7 RWTH Aachen 52056 Aachen Germany [email protected]


Moshe Y. Vardi (Chapter 38) Department of Computer Science Mail Stop 132 Rice University 6100 S. Main Street Houston, TX 77005-1892 USA

Thomas Wilke (Chapter 6) Department of Computer Science Christian-Albrechts-Universität zu Kiel 24098 Kiel Germany

[email protected]

Abuzer Yakaryılmaz (Chapter 39) Faculty of Computing University of Latvia Raina bulv. 19 Rīga 1586 Latvia

Mikhail Volkov (Chapter 15) Institute of Natural Sciences and Mathematics 620000 Ural Federal University Ekaterinburg Russia [email protected] Igor Walukiewicz (Chapter 33) LaBRI, Université de Bordeaux et CNRS 351 cours de la Libération 33405 Talence Cedex France [email protected] Pascal Weil (Chapter 16) LaBRI, Université de Bordeaux et CNRS 351 cours de la Libération 33405 Talence Cedex France ReLaX, CNRS IRL 2000 and Chennai Mathematical Institute SIPCOT IT Park 603103 Siruseri, Chennai India [email protected]

[email protected]

[email protected]

Part I

Foundations

Chapter 1

Finite automata Jean-Éric Pin

Contents
1. Basic algebraic structures . . . 3
2. Words, languages and automata . . . 4
3. Operations on recognisable languages . . . 10
4. Minimal automaton and syntactic monoid . . . 17
5. Rational versus recognisable . . . 24
6. Algebraic approach . . . 35
References . . . 38

1. Basic algebraic structures

A semigroup is a pair consisting of a set S and an associative binary operation on S, usually denoted multiplicatively. In this case, xy is called the product of x and y. Sometimes, the additive notation is preferred, and x + y denotes the sum of x and y. A monoid is a triple consisting of a set M, an associative binary operation on M and an identity for this operation. This identity is denoted by 1 in the multiplicative notation and by 0 in the additive notation. A semiring consists of a set k, two binary operations on k, denoted additively and multiplicatively, and two elements 0 and 1, satisfying the following conditions:
1. k is a commutative monoid for addition with identity 0;
2. k is a monoid for multiplication with identity 1;
3. multiplication is distributive over addition: s(t1 + t2) = st1 + st2 and (t1 + t2)s = t1s + t2s, for all s, t1, t2 ∈ k;
4. 0s = s0 = 0 for all s ∈ k.

A ring is a semiring in which the monoid (k, +, 0) is a group. A semiring is commutative if its multiplication is commutative. The simplest example of a semiring which is not a ring is the Boolean semiring B = {0, 1}, whose operations are defined by the following tables:

  +  0  1        ·  0  1
  0  0  1        0  0  0
  1  1  1        1  0  1
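A quick way to internalise these axioms is to check them mechanically on the two-element carrier. The following Python sketch is not part of the chapter; the class and helper names are made up for illustration.

```python
from itertools import product

class BooleanSemiring:
    """The Boolean semiring B = {0, 1}: addition is 'or', multiplication is 'and'."""
    elements = (0, 1)
    zero, one = 0, 1

    @staticmethod
    def add(s, t):
        return s | t

    @staticmethod
    def mul(s, t):
        return s & t

def check_semiring(K):
    """Brute-force check of the semiring axioms over a finite carrier."""
    E, add, mul = K.elements, K.add, K.mul
    for s, t, u in product(E, repeat=3):
        assert add(add(s, t), u) == add(s, add(t, u))          # + is associative
        assert add(s, t) == add(t, s)                          # + is commutative
        assert add(s, K.zero) == s                             # 0 is the additive identity
        assert mul(mul(s, t), u) == mul(s, mul(t, u))          # * is associative
        assert mul(s, K.one) == s and mul(K.one, s) == s       # 1 is the multiplicative identity
        assert mul(s, add(t, u)) == add(mul(s, t), mul(s, u))  # left distributivity
        assert mul(add(t, u), s) == add(mul(t, s), mul(u, s))  # right distributivity
        assert mul(K.zero, s) == K.zero == mul(s, K.zero)      # 0 is absorbing

check_semiring(BooleanSemiring)
```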

4

Jean-Éric Pin

2. Words, languages and automata

2.1. Words and languages. Let A be a set called an alphabet, whose elements are called letters. A finite sequence of elements of A is called a finite word on A, or just a word. We denote by mere juxtaposition a1 ⋯ an the sequence (a1, …, an). The set of words is endowed with the operation of concatenation product, also called product, which associates the word xy = a1 ⋯ ap b1 ⋯ bq with two words x = a1 ⋯ ap and y = b1 ⋯ bq. This operation is associative. It has an identity, the empty word, denoted by 1 or ε, which is the empty sequence. We let A* denote the set of words on A and A⁺ the set of nonempty words. The set A* [A⁺], equipped with the concatenation product, is thus a monoid with identity 1 (a semigroup). The set A* is called the free monoid on A and A⁺ the free semigroup on A.

Let u = a1 ⋯ an be a word in A* and let a be a letter of A. A nonnegative integer i is said to be an occurrence of the letter a in u if ai = a. We denote by |u|_a the number of occurrences of a in u. Thus, if A = {a, b} and u = abaab, one has |u|_a = 3 and |u|_b = 2. The sum

  |u| = Σ_{a ∈ A} |u|_a

is the length of the word u. Thus |abaab| = 5.

A language is a set of words. The empty language is denoted by 0 and each singleton {u} is simply denoted u. Several operations can be defined on languages.
1. Boolean operations, which comprise union (which we often denote by +), intersection and complement (denoted by L ↦ Lᶜ).
2. Quotients. Given a language L and a word u of A*, u⁻¹L = {v | uv ∈ L} and Lu⁻¹ = {v | vu ∈ L}.
3. Star and Plus. If L is a language, L* [L⁺] is the submonoid [subsemigroup] of A* generated by L. Thus L* = {u1 u2 ⋯ un | n ≥ 0, u1, …, un ∈ L} and L* = L⁺ + 1.
4. Product. The product of two languages L1 and L2 is the language L1 L2 = {u1 u2 | u1 ∈ L1, u2 ∈ L2}.
5. Morphisms. Let A and B be two alphabets, and let φ be a function from A into B*. Then φ extends in a unique way to a monoid morphism from A* into B*. If L is a language of A*, then φ(L) = {φ(u) | u ∈ L} is a language of B*.
6. Inverses of morphisms. If φ: A* → B* is a monoid morphism and L is a language of B*, then φ⁻¹(L) = {u ∈ A* | φ(u) ∈ L} is a language of A*.

The set of rational (or regular) languages on A*, denoted by Rat(A*), forms the smallest set of languages containing the languages 0, 1 and a for each letter a ∈ A, and closed under finite union, product and star. That is, if L and L' are rational languages, then the languages L + L', LL', and L* are also rational.
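The language operations above are easy to experiment with on finite languages. The following Python sketch (illustrative only, not from the chapter) implements the product, the left quotient, and a length-bounded approximation of the star, since L* itself is usually infinite.

```python
from itertools import product as cartesian

def concat(L1, L2):
    """Product of two languages: L1 L2 = {u1 u2 | u1 in L1, u2 in L2}."""
    return {u1 + u2 for u1, u2 in cartesian(L1, L2)}

def left_quotient(u, L):
    """Left quotient u^{-1} L = {v | u v in L}."""
    return {w[len(u):] for w in L if w.startswith(u)}

def star_up_to(L, max_len):
    """Words of L* of length at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {u + v for u in frontier for v in L
                    if v and len(u + v) <= max_len} - result
        result |= frontier
    return result

L = {"a", "ab", "ba"}
print(sorted(concat(L, {"b"})))      # ['ab', 'abb', 'bab']
print(sorted(left_quotient("a", L))) # ['', 'b']
print(sorted(star_up_to(L, 3)))      # ['', 'a', 'aa', 'aaa', 'aab', 'ab', 'aba', 'ba', 'baa']
```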


For instance, if A = {a, b}, the language (a + ab + ba)* is a rational language. The set A*uA* of all words containing a given factor u is rational. The set of words of odd length is rational and can be written as (A²)*A. We conclude this section by a standard result: rational languages are closed under morphisms. An extension of this result will be given in Proposition 6.1.

Proposition 2.1. Let φ: A* → B* be a morphism. If L is a rational language of A*, then φ(L) is a rational language of B*.

2.2. Finite automata and recognisable languages. A finite automaton is a 5-tuple A = (Q, A, E, I, F), where Q is a finite set called the set of states, A is an alphabet, E is a subset of Q × A × Q, called the set of transitions, and I and F are subsets of Q, called respectively the set of initial states and the set of final states. It is convenient to represent an automaton by a labelled graph whose vertices are the states of the automaton and the edges represent the transitions. The initial [final] states are pictured by incoming [outgoing] arrows.

Example 2.1. Let A = (Q, A, E, I, F), where Q = {1, 2}, I = {1, 2}, F = {2}, A = {a, b}, and E = {(1, a, 1), (2, b, 1), (1, a, 2), (2, b, 2)}. This automaton is represented in Figure 1.

Figure 1. An automaton
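The 5-tuple definition translates almost verbatim into code. Below is a small Python sketch, not taken from the chapter, that encodes the automaton of Example 2.1 and tests whether a word labels at least one successful path; the variable names mirror the notation (Q, A, E, I, F).

```python
# The automaton of Example 2.1 as a 5-tuple (Q, A, E, I, F); E is a set of triples (p, a, q).
Q = {1, 2}
A = {"a", "b"}
E = {(1, "a", 1), (2, "b", 1), (1, "a", 2), (2, "b", 2)}
I = {1, 2}
F = {2}

def accepts(word, E, I, F):
    """A word is accepted if it labels at least one path from an initial to a final state."""
    current = set(I)                      # all states reachable by some path labelled by the prefix read so far
    for letter in word:
        current = {q for (p, a, q) in E if p in current and a == letter}
    return bool(current & F)

print(accepts("aabbab", E, I, F))   # True (compare Example 2.2)
print(accepts("", E, I, F))         # True: state 2 is both initial and final
print(accepts("b", E, {1}, F))      # False if state 1 were the only initial state
```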

Two transitions (p, a, q) and (p', a', q') are consecutive if q = p'. A path in the automaton A is a finite sequence of consecutive transitions

  c = (q0, a1, q1), (q1, a2, q2), …, (q_{n-1}, an, qn),

also denoted by

  c: q0 -a1-> q1 -a2-> ⋯ -an-> qn   or   q0 -a1 ⋯ an-> qn.

The state q0 is its origin, the state qn its end, the word a1 ⋯ an is its label and the integer n is its length. It is also convenient to consider that, for each state q ∈ Q, there is an empty path q -1-> q from q to q labelled by the empty word. A path in A is called initial if its origin is an initial state and final if its end is a final state. It is successful (or accepting) if it is initial and final. A state q is accessible if there is an initial path ending in q and it is coaccessible if there is a final path starting in q.


Example 2.2. Consider the automaton represented in Figure 1. The path

  c: 1 -a-> 1 -a-> 2 -b-> 2 -b-> 1 -a-> 2 -b-> 2

is successful, since its end is a final state. However the path

  c: 1 -a-> 1 -a-> 2 -b-> 2 -b-> 1 -a-> 2 -b-> 1

has the same label, but is not successful, since its end is 1, a nonfinal state.

A word is accepted by the automaton A if it is the label of at least one successful path (beware that it can be simultaneously the label of a nonsuccessful path). The language recognised (or accepted) by the automaton A is the set, denoted by L(A), of all the words accepted by A. Two automata are equivalent if they recognise the same language. A language L ⊆ A* is recognisable if it is recognised by a finite automaton, that is, if there is a finite automaton A such that L = L(A).

Example 2.3. Consider the automaton represented in Figure 2.

Figure 2. The automaton A

We let the reader verify that the language accepted by A is aA*, the set of all words whose first letter is a. Example 2.3 is elementary but it already raises some difficulties. In general, deciding whether a given word is accepted or not might be laborious, since a word might be the label of several paths. The notion of deterministic automaton introduced in § 2.3 permits one to avoid these problems.

A standard property of recognisable languages is known as the pumping lemma. Although it is formally true for any recognisable language, it is only interesting for the infinite ones.

Proposition 2.2 (pumping lemma). Let L be a recognisable language. Then there is an integer n > 0 such that every word u of L of length greater than or equal to n can be factorised as u = xyz with x, y, z ∈ A*, |xy| ≤ n, y ≠ 1 and, for all k ≥ 0, xyᵏz ∈ L.

Proof. Let A = (Q, A, E, I, F) be an n-state automaton recognising L and let u = a1 ⋯ ar be a word of L of length r ≥ n. Let q0 -a1-> q1 ⋯ q_{r-1} -ar-> qr be a successful path labelled by u. As r ≥ n, there are two integers i and j, with i < j ≤ n, such that qi = qj. Therefore, the word a_{i+1} ⋯ aj is the label of a loop around qi, represented in Figure 3.


Figure 3. Illustration of the pumping lemma

Let x = a1 ⋯ ai, y = a_{i+1} ⋯ aj and z = a_{j+1} ⋯ ar. Then |xy| ≤ n and, for all k ≥ 0, one gets xyᵏz ∈ L, since the word xyᵏz is the label of a successful path.

The pumping lemma permits one to show that a language like {aⁿbⁿ | n ≥ 0} is not recognisable. However, it does not characterise the recognisable languages. For instance, if A = {a, b, c}, the nonrecognisable language {(ab)ⁿcⁿ | n ≥ 0} ∪ A*bbA* ∪ A*aaA* satisfies the pumping lemma.
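The proof of Proposition 2.2 is effective: following the run of a long enough word and spotting the first repeated state yields the factorisation u = xyz. Here is a hedged Python sketch of that argument; for brevity it assumes a deterministic transition table, and the example automaton for (ab)* is made up for the demo.

```python
def pump_factorisation(delta, initial, word):
    """Mirror the proof of the pumping lemma on a deterministic automaton:
    follow the run on `word`, find the first repeated state, and return the
    factorisation u = x y z where y labels the detected loop.
    `delta` maps (state, letter) -> state."""
    run = [initial]
    for letter in word:
        run.append(delta[(run[-1], letter)])
    seen = {}
    for pos, state in enumerate(run):
        if state in seen:                  # states q_i = q_j with i < j
            i, j = seen[state], pos
            return word[:i], word[i:j], word[j:]
        seen[state] = pos
    return None                            # no repetition: the word is shorter than the state count

# A deterministic automaton for (ab)*: state 0 is initial and final, state 2 is a sink.
delta = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 2, (1, "b"): 0, (2, "a"): 2, (2, "b"): 2}
x, y, z = pump_factorisation(delta, 0, "ababab")
print(repr(x), repr(y), repr(z))   # '' 'ab' 'abab': every x y^k z = (ab)^k abab stays in (ab)*
```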

2.3. Deterministic automata. An automaton A = (Q, A, E, I, F) is deterministic if I contains exactly one initial state and if, for every state q ∈ Q and for every letter a ∈ A, there exists at most one state q' such that q -a-> q' is a transition of E. If q_- is the unique initial state, we adopt the notation (Q, A, E, q_-, F) instead of (Q, A, E, {q_-}, F).

Example 2.4. The automaton represented in Figure 4 is deterministic.

Figure 4. A deterministic automaton

The following result is one of the cornerstones of automata theory. Its proof is based on the so-called subset construction.

Proposition 2.3. Every finite automaton is equivalent to a deterministic one.

Proof. Let A = (Q, A, E, I, F) be an automaton. Consider the deterministic automaton D(A) = (P(Q), A, ·, I, ℱ), where ℱ = {P ⊆ Q | P ∩ F ≠ ∅} and, for each subset P of Q and for each letter a ∈ A,

  P · a = {q ∈ Q | there exists p ∈ P such that (p, a, q) ∈ E}.

We claim that D(A) is equivalent to A.


If u = a1 ⋯ an is accepted by A, there is a successful path

  c: q0 -a1-> q1 -> ⋯ -> q_{n-1} -an-> qn.

The word u also defines a path

  I = P0 -a1-> P1 -> ⋯ -> P_{n-1} -an-> Pn     (1)

in D(A). Let us show by induction on i that, for 0 ≤ i ≤ n, qi ∈ Pi. Since c is a successful path, one has q0 ∈ I = P0. Suppose that q_{i-1} ∈ P_{i-1}. Then since q_{i-1} -ai-> qi is a transition, one gets qi ∈ P_{i-1} · ai = Pi. For i = n, we get qn ∈ Pn and since c is a successful path, qn ∈ F. It follows that Pn meets F and hence Pn ∈ ℱ. Therefore u is accepted by D(A).

Conversely, let u = a1 ⋯ an be a word accepted by D(A) and let (1) be the successful path defined by u. Since Pn is a final state, one can choose an element qn in Pn ∩ F. We can now select, for i = n, n-1, …, 1, an element q_{i-1} of P_{i-1} such that q_{i-1} -ai-> qi is a transition of A. Since q0 ∈ I and qn ∈ F, the path q0 -a1-> q1 -> ⋯ -> q_{n-1} -an-> qn is successful, and thus u is accepted by A. This proves the claim and the proposition.

The subset construction converts a nondeterministic n-state automaton into a deterministic automaton with at most 2ⁿ states. One can show that this bound is tight.

2.4. Complete, accessible, coaccessible, and trim automata. An automaton A = (Q, A, ·, q_-, F) is complete if, for each state q ∈ Q and for each letter a ∈ A, there is at least one state q' such that q -a-> q' is a transition.

Example 2.5. The automaton represented in Figure 5 is neither complete, nor deterministic. It is not deterministic, since the transitions (1, a, 1) and (1, a, 2) have the same label and the same origin. It is not complete, since there is no transition of the form 2 -a-> q.

Figure 5. An incomplete, nondeterministic automaton
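The subset construction of Proposition 2.3 can be coded directly; as the chapter notes for Example 2.6, in practice one only generates the accessible subsets. The sketch below is illustrative rather than the author's code, and the sample transition set is our reading of the automaton of Figure 7.

```python
def determinise(E, alphabet, I, F):
    """Subset construction of Proposition 2.3, restricted to accessible subsets.
    States of D(A) are frozensets of states of A; a subset is final iff it meets F."""
    def step(P, a):
        return frozenset(q for (p, x, q) in E if p in P and x == a)

    start = frozenset(I)
    states, todo, delta = {start}, [start], {}
    while todo:
        P = todo.pop()
        for a in alphabet:
            R = step(P, a)
            delta[(P, a)] = R
            if R not in states:
                states.add(R)
                todo.append(R)
    final = {P for P in states if P & set(F)}
    return states, delta, start, final

# Our reading of the nondeterministic automaton of Example 2.6 (Figure 7):
# state 1 loops on a and b, 1 --a--> 2, 2 --a,b--> 3; initial 1, final 3.
E = {(1, "a", 1), (1, "b", 1), (1, "a", 2), (2, "a", 3), (2, "b", 3)}
states, delta, start, final = determinise(E, {"a", "b"}, {1}, {3})
print(len(states))   # 4 accessible subsets, matching the four states of Figure 9
```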

On the other hand, the automaton represented in Figure 6 is complete and deterministic. A finite automaton is accessible if all its states are accessible. Similarly, it is coaccessible if all its states are coaccessible. Finally, an automaton is trim if it is simultaneously accessible and coaccessible. It is not difficult to see that every deterministic automaton is equivalent to a trim one.


Figure 6. A complete and deterministic automaton

Example 2.6. Let A = {a, b}. Starting from the nondeterministic automaton A represented in Figure 7, we get the deterministic automaton D(A) drawn in Figure 8. In practice, it suffices to compute the accessible states of D(A), which gives the deterministic automaton shown in Figure 9.

Figure 7. A nondeterministic automaton

Figure 8. After determinisation . . .

Figure 9. . . . and trimming


2.5. Standard automata. The construction described in this section might look somewhat artificial, but it will be used in the study of the product and of the star operation. A deterministic automaton is standard if there is no transition ending in the initial state.

Proposition 2.4. Every deterministic automaton is equivalent to a deterministic standard automaton.

Proof. Let A = (Q, A, E, q_-, F) be a deterministic automaton. If A is not standard, let p be a new state and let A' = (Q ∪ {p}, A, E', p, F') be the standard automaton defined by E' = E ∪ {(p, a, q) | (q_-, a, q) ∈ E} and

  F' = F           if q_- ∉ F,
  F' = F ∪ {p}     if q_- ∈ F.

Then the path q_- -a0-> q1 -a1-> q2 -> ⋯ -> q_{n-1} -a_{n-1}-> qn is successful in A if and only if the path p -a0-> q1 -a1-> q2 -> ⋯ -> q_{n-1} -a_{n-1}-> qn is successful in A'. Consequently, A and A' are equivalent.

Example 2.7. Standardisation is illustrated in Figure 10.

Figure 10. An automaton and its standardised version
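Standardisation is a one-line transformation on the transition set. The following sketch is a plausible implementation of Proposition 2.4 with a made-up example; the fresh state is simply called "p".

```python
def standardise(Q, E, initial, F):
    """Proposition 2.4: add a fresh initial state p that copies the outgoing
    transitions of the old initial state, so that no transition enters it."""
    p = "p"                               # assumed fresh; any new object would do
    E2 = set(E) | {(p, a, q) for (s, a, q) in E if s == initial}
    F2 = set(F) | ({p} if initial in F else set())
    return set(Q) | {p}, E2, p, F2

# A made-up deterministic automaton with a transition back into its initial state 1.
Q, E, F = {1, 2}, {(1, "a", 2), (2, "a", 2), (2, "b", 1)}, {1}
print(standardise(Q, E, 1, F))   # adds state 'p' with transition ('p', 'a', 2); 'p' is final
```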

3. Operations on recognisable languages

We review in this section some classical results on finite automata. We give explicit constructions for the following operations: Boolean operations, product, star, quotients and inverses of morphisms.


3.1. Boolean operations. We give in this section the well known constructions for union, intersection and complement. Complementation is trivial, but requires a deterministic automaton.

Proposition 3.1. The union of two recognisable languages is recognisable.

Proof. Let L [L'] be a recognisable language of A* recognised by the automaton A = (Q, A, E, I, F) [A' = (Q', A, E', I', F')]. We suppose that Q and Q' are disjoint sets and thus one can identify E and E' with subsets of (Q ∪ Q') × A × (Q ∪ Q'). Then L + L' is recognised by the automaton (Q ∪ Q', A, E ∪ E', I ∪ I', F ∪ F').

Example 3.1. If L [L'] is recognised by the automaton A [A'] represented in Figure 11, then L + L' is recognised by the automaton represented in Figure 12.

Figure 11. The automata A and A'

Figure 12. An automaton recognising L + L'
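The union construction of Proposition 3.1 only requires making the two state sets disjoint and taking componentwise unions. A small illustrative Python sketch follows; the names and the two example automata are made up for the demo.

```python
def union_automaton(A1, A2):
    """Proposition 3.1: the disjoint union of two automata recognises L + L'.
    An automaton is a tuple (E, I, F); states are tagged to force disjointness."""
    def tag(aut, label):
        E, I, F = aut
        return ({((label, p), a, (label, q)) for (p, a, q) in E},
                {(label, q) for q in I}, {(label, q) for q in F})
    E1, I1, F1 = tag(A1, 1)
    E2, I2, F2 = tag(A2, 2)
    return E1 | E2, I1 | I2, F1 | F2

# Two toy automata: one recognising a a* and one recognising b b*.
A1 = ({(0, "a", 1), (1, "a", 1)}, {0}, {1})
A2 = ({(0, "b", 1), (1, "b", 1)}, {0}, {1})
E, I, F = union_automaton(A1, A2)
print(len(E), len(I), len(F))   # 4 transitions, 2 initial states, 2 final states
```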

Corollary 3.2. Every finite language is recognisable.

Proof. Since recognisable languages are closed under union, it suffices to verify that the singletons are recognisable. But it is clear that the language a1 a2 ⋯ an is recognised by the automaton represented in Figure 13.

Figure 13. An automaton recognising a1 ⋯ an


Proposition 3.3. The intersection of two recognisable languages is recognisable.

Proof. Let L [L'] be a recognisable language of A* recognised by the automaton A = (Q, A, E, I, F) [A' = (Q', A, E', I', F')]. Consider the automaton B = (Q × Q', A, T, I × I', F × F'), where

  T = {((q1, q1'), a, (q2, q2')) | (q1, a, q2) ∈ E and (q1', a, q2') ∈ E'}.

A word u = a1 a2 ⋯ an is the label of a successful path in B

  (q0, q0') -a1-> (q1, q1') -a2-> (q2, q2') -> ⋯ -> (q_{n-1}, q_{n-1}') -an-> (qn, qn')

if and only if the paths

  q0 -a1-> q1 -a2-> q2 -> ⋯ -> q_{n-1} -an-> qn   and   q0' -a1-> q1' -a2-> q2' -> ⋯ -> q_{n-1}' -an-> qn'

are successful paths in A and A' respectively. Therefore, B recognises L ∩ L'. In practice, one just computes the trim part of B.

Example 3.2. If L [L'] is recognised by the automaton A [A'] represented in Figure 11, then L ∩ L' is recognised by the trim automaton represented in Figure 14.

Figure 14. A trim automaton recognising L ∩ L'
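The product automaton of Proposition 3.3 is equally direct to code. The sketch below is illustrative; the two example automata (words containing an a, and words of even length) are made up for the demo.

```python
def intersection_automaton(A1, A2):
    """Proposition 3.3: the product automaton B recognises the intersection.
    An automaton is a tuple (E, I, F) with E a set of triples (p, a, q)."""
    E1, I1, F1 = A1
    E2, I2, F2 = A2
    T = {((p1, p2), a, (q1, q2))
         for (p1, a, q1) in E1 for (p2, b, q2) in E2 if a == b}
    I = {(i1, i2) for i1 in I1 for i2 in I2}
    F = {(f1, f2) for f1 in F1 for f2 in F2}
    return T, I, F

A1 = ({(0, "a", 1), (0, "b", 0), (1, "a", 1), (1, "b", 1)}, {0}, {1})  # contains an a
A2 = ({(0, "a", 1), (0, "b", 1), (1, "a", 0), (1, "b", 0)}, {0}, {0})  # even length
T, I, F = intersection_automaton(A1, A2)
print(len(T))   # 8 transitions over the 4 product states
```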

Proposition 3.4. The complement of a recognisable language is recognisable.

Proof. Let L be a recognisable language of A* and let A = (Q, A, ·, q_-, F) be a complete deterministic automaton recognising L. Then the automaton A' = (Q, A, ·, q_-, Q \ F) recognises Lᶜ. Indeed, since A and A' are both deterministic and complete, every word u of A* is the label of exactly one path starting in q_-. Let q be the end of this path. Then u belongs to L if and only if q belongs to F, and u belongs to Lᶜ if and only if q belongs to Q \ F.


Example 3.3. The language (ab)* is recognised by the complete deterministic automaton A, and its complement is recognised by the automaton A' represented in Figure 15.

Figure 15. Complementation of a deterministic automaton
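Complementation on a complete deterministic automaton is just a change of final states, as in Proposition 3.4. A minimal sketch follows, with the transition table for (ab)* as we read it from Figure 15 (states 1, 2 and a sink 0); this is an illustration, not the author's code.

```python
def complement(delta, Q, initial, F):
    """Proposition 3.4: on a complete deterministic automaton, swap final and
    non-final states. `delta` maps (state, letter) -> state."""
    return delta, Q, initial, set(Q) - set(F)

def dfa_accepts(delta, initial, F, word):
    state = initial
    for letter in word:
        state = delta[(state, letter)]
    return state in F

# A complete deterministic automaton for (ab)*; state 0 is the sink.
delta = {(1, "a"): 2, (1, "b"): 0, (2, "a"): 0, (2, "b"): 1,
         (0, "a"): 0, (0, "b"): 0}
Q, initial, F = {0, 1, 2}, 1, {1}
_, _, _, Fc = complement(delta, Q, initial, F)
print(dfa_accepts(delta, initial, F, "abab"))   # True:  abab is in (ab)*
print(dfa_accepts(delta, initial, Fc, "aba"))   # True:  aba is in the complement
```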

3.2. Product

Proposition 3.5. The product of two recognisable languages is recognisable.

Proof. Let L1 and L2 be two recognisable languages of A*, recognised by the automata A1 = (Q1, A, E1, I1, F1) and A2 = (Q2, A, E2, I2, F2), respectively. One may assume, by Propositions 2.3 and 2.4, that A2 is a standard deterministic automaton and thus in particular that I2 = {i}. One can also suppose that Q1 and Q2 are disjoint. Let now A = (Q, A, E, I, F), where

  Q = (Q1 ∪ Q2) \ {i},
  E = E1 ∪ {(q, a, q') ∈ E2 | q ≠ i} ∪ {(q1, a, q') | q1 ∈ F1 and (i, a, q') ∈ E2},
  I = I1,
  F = F2                  if i ∉ F2,
  F = F1 ∪ (F2 \ {i})     if i ∈ F2 (i.e., if 1 ∈ L2).

We claim that A recognises L1 L2. If u is a word of L1 L2, then u = u1 u2 for some u1 ∈ L1 and u2 ∈ L2. Therefore, there is a successful path c1: i1 -u1-> q1 in A1 (with i1 ∈ I1 and q1 ∈ F1) and a successful path c2: i -u2-> q2 in A2, with q2 ∈ F2. If u2 = 1, then L2 contains the empty word, the path c1 is a successful path in A and u is accepted by A. If u2 is not the empty word, let a be the first letter of u2 and let i -a-> q be the first transition of c2. Since q1 ∈ F1, q1 -a-> q is by definition a transition of E. Furthermore, if q' -b-> q'' is a transition of c2 different from the first transition, then q' is the end of a transition of A2. Since A2 is standard, this implies q' ≠ i and it follows from the definition of E that the transition q' -b-> q'' is also a transition of A. Let c2' be the path


of A obtained by replacing in c2 the first transition i -a-> q by q1 -a-> q. The resulting path c1 c2' is a successful path in A of label u and hence u is accepted by A.

Conversely, let u be a word accepted by A. Then u is the label of a successful path c: i1 -u-> f of A. Since the initial states of A are contained in Q1, and since there is no transition of A starting in Q2 and ending in Q1, c visits first some states of Q1 and then possibly some states of Q2. If all the states visited by c are in Q1, one has in particular f ∈ Q1. But this is only possible if 1 ∈ L2, and in this case, c is also a successful path of A1, and hence u ∈ L1 ⊆ L1 L2. If c visits some states of Q2, then c contains a unique transition of the form e = (q1, a, q2) with q1 ∈ F1 and q2 ∈ Q2. Therefore c = c1 e c2, where c1 is a path in A1 and c2 is a path in A2. Denoting by u1 [u2] the label of c1 [c2], we get u = u1 a u2. Since c1 is a successful path in A1, one has u1 ∈ L1. Moreover, by definition of E, e' = (i, a, q2) is a transition of A2. Therefore the path e' c2 is a successful path in A2 of label a u2. It follows that a u2 ∈ L2 and thus u ∈ L1 L2, proving the claim and the proposition.

Example 3.4. If L1 [L2] is recognised by the automaton A1 [A2] represented in Figure 16, then L1 L2 is recognised by the automaton represented in Figure 17.

Figure 16. The automata A1 and A2

Figure 17. An automaton recognising L1 L2


3.3. Star

Proposition 3.6. The star of a recognisable language is recognisable.

Proof. Let L be a recognisable language of A*, recognised by the deterministic standard automaton A = (Q, A, E, q_-, F). Let A' = (Q, A, E', {q_-}, F ∪ {q_-}) be the nondeterministic automaton defined by

  E' = E ∪ {(q, a, q') | q ∈ F and (q_-, a, q') ∈ E}.

Let us show that A' recognises L*. If u is a word of L*, then either u is the empty word, which is accepted by A' since q_- is a final state, or u = u1 u2 ⋯ un with u1, …, un ∈ L \ 1. Each ui is the label of a successful path in A, say ci: q_- -ui-> qi with qi ∈ F. Let ai be the first letter of ui and let q_- -ai-> pi be the first transition of ci. Let i ∈ {2, …, n}. As q_{i-1} ∈ F, the definition of E' shows that q_{i-1} -ai-> pi is a transition of A'. Denote by ci' the path obtained by replacing in ci the first transition q_- -ai-> pi by q_{i-1} -ai-> pi. This defines, for 2 ≤ i ≤ n, a path ci': q_{i-1} -ui-> qi in A'. Therefore, the path c1 c2' ⋯ cn' is a successful path of label u in A' and hence u is accepted by A'.

Conversely, let u be a word accepted by A'. If u = 1, one has u ∈ L*. Otherwise, u is the label of a nonempty successful path c of A'. This path can be factorised as

  c = q_- -u0-> q1 -a1-> q1' -u1-> q2 -a2-> q2' -> ⋯ -> qn -an-> qn' -un-> q_{n+1},

where the transitions e1 = q1 -a1-> q1', e2 = q2 -a2-> q2', …, en = qn -an-> qn' are exactly the transitions of E' \ E occurring in c. Thus by definition of E', one gets, for 1 ≤ i ≤ n, qi ∈ F and ei' = (q_-, ai, qi') ∈ E. Furthermore, q_{n+1} ∈ F ∪ {q_-} since c is a successful path. Consequently, the paths

  q_- -ai-> qi' -ui-> q_{i+1}

are paths of A. For 1 ≤ i ≤ n-1, these paths are successful, since their ends q_{i+1} belong to F. Moreover, since A is standard, q_{n+1} is different from q_- and hence q_{n+1} ∈ F. Consequently ai ui ∈ L for 1 ≤ i ≤ n. Since q_- -u0-> q1 is also a successful path of A, one also has u0 ∈ L, and hence u ∈ L*.

Example 3.5. If L is recognised by the standard deterministic automaton A2 represented in Figure 16, then L* is recognised by the nondeterministic automaton represented in Figure 18.

Figure 18. An automaton recognising L*
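Proposition 3.6 can be implemented by copying the transitions that leave the initial state onto every final state and making the initial state final. A short illustrative sketch, assuming the input automaton is standard and deterministic; the example automaton for the language ab is made up.

```python
def star_automaton(E, initial, F):
    """Proposition 3.6: from a standard deterministic automaton (no transition
    enters the initial state), add a copy of each initial transition out of
    every final state, and make the initial state final."""
    E2 = set(E) | {(q, a, r) for q in F for (s, a, r) in E if s == initial}
    return E2, {initial}, set(F) | {initial}

# A standard deterministic automaton for the single word ab.
E = {(0, "a", 1), (1, "b", 2)}
E2, I2, F2 = star_automaton(E, 0, {2})
print(sorted(E2), I2, F2)   # adds (2, 'a', 1); states 0 and 2 are now final, so (ab)* is recognised
```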


3.4. Quotients. We first treat the left quotient by a word and then the general case.

Proposition 3.7. Let A = (Q, A, ·, q_-, F) be a deterministic automaton recognising a language L of A*. Then, for each word u of A*, the language u⁻¹L is recognised by the automaton A_u = (Q, A, ·, q_- · u, F), obtained from A by changing the initial state. In particular, u⁻¹L is recognisable.

Proof. First the following formulas hold:

  u⁻¹L = {v ∈ A* | uv ∈ L} = {v ∈ A* | q_- · (uv) ∈ F} = {v ∈ A* | (q_- · u) · v ∈ F}.

Therefore u⁻¹L is accepted by A_u.

Proposition 3.8. Any quotient of a recognisable language is recognisable.

Proof. Let (Q, A, E, I, F) be an automaton recognising a language L of A* and let K be a language of A*. We do not assume that K is recognisable. Setting

  I' = {q ∈ Q | q is the end of an initial path whose label belongs to K},

it is easy to see that the automaton B = (Q, A, E, I', F) recognises K⁻¹L. For the language LK⁻¹, a similar proof works by considering the automaton (Q, A, E, I, F'), where

  F' = {q ∈ Q | q is the origin of a final path whose label belongs to K}.

3.5. Inverses of morphisms. We now show that recognisable languages are closed under inverses of morphisms.

Proposition 3.9. Let φ: A* → B* be a morphism. If L is a recognisable language of B*, then φ⁻¹(L) is a recognisable language of A*.

Proof. Let B = (Q, B, E, I, F) be an automaton recognising L and let A = (Q, A, T, I, F), where

  T = {(p, a, q) | there is a path labelled by φ(a) from p to q in B}.

We claim that A recognises φ⁻¹(L). First, if u is accepted by A, there is a successful path of A labelled by u. Consequently, there is a successful path of B labelled by φ(u). Thus φ(u) is accepted by B and u ∈ φ⁻¹(L). Let now u = a1 ⋯ an be a word of φ⁻¹(L). Since the word φ(u) belongs to L, there is a successful path in B labelled by φ(u). Let us factorise this path as

  q0 -φ(a1)-> q1 -> ⋯ -> q_{n-1} -φ(an)-> qn.

These paths define in turn a successful path in A labelled by u:

  q0 -a1-> q1 -> ⋯ -> q_{n-1} -an-> qn,

which shows that u is accepted by A.
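The construction in the proof of Proposition 3.9 replaces each letter a by a search for paths labelled φ(a). Here is an illustrative Python sketch (not the author's code); the morphism φ: c ↦ ab, d ↦ 1 and the automaton for (ab)* are made up for the demo.

```python
def inverse_morphism_automaton(E, Q, I, F, phi, alphabet_A):
    """Proposition 3.9: the automaton over alphabet A has a transition (p, a, q)
    whenever B has a path labelled phi(a) from p to q (including the empty path)."""
    def reachable_by(word, p):
        current = {p}
        for letter in word:
            current = {q for (s, x, q) in E if s in current and x == letter}
        return current
    T = {(p, a, q) for a in alphabet_A for p in Q for q in reachable_by(phi[a], p)}
    return T, I, F

# B recognises (ab)* over {a, b}; phi maps c to ab and d to the empty word.
E = {(0, "a", 1), (1, "b", 0)}
phi = {"c": "ab", "d": ""}
T, I, F = inverse_morphism_automaton(E, {0, 1}, {0}, {0}, phi, {"c", "d"})
print(sorted(T))   # [(0, 'c', 0), (0, 'd', 0), (1, 'd', 1)]: phi^{-1}((ab)*) = (c + d)*
```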


4. Minimal automaton and syntactic monoid

4.1. Minimal automaton. Let L be a language of A*. The Nerode automaton of L is the deterministic automaton A(L) = (Q, A, ·, L, F) where Q = {u⁻¹L | u ∈ A*}, F = {u⁻¹L | u ∈ L} and the transition function is defined, for each a ∈ A, by the formula

  (u⁻¹L) · a = a⁻¹(u⁻¹L) = (ua)⁻¹L.

Beware of this rather abstract definition. Each state of A(L) is a left quotient of L by a word, and hence is a language of A*. The initial state is the language L, and the set of final states is the set of all left quotients of L by a word of L.

Proposition 4.1. A language L is recognisable if and only if the set ¹u 1 L j u 2 A º is finite. In this case, L is recognised by its Nerode automaton. Proof. Let L be a recognisable language, accepted by the deterministic automaton A D .Q; A;  ; q ; F /. By Proposition 3.7, the language u 1 L is accepted by the automaton Au D .Q; A;  ; q  u; F /. If n is the number of states of A, there are at most n automata of the form Au and hence at most n distinct languages of the form u 1 L. Conversely, if the set ¹u 1 L j u 2 A º is finite, the Nerode automaton of L is finite and recognises L. Indeed, a word u is accepted by A.L/ if and only if L u D u 1 L is a final state, that is if u 2 L. It follows that L is recognisable. Let A D .Q; A; E; q ; F / and A0 D .Q0 ; A; E 0 ; q 0 ; F 0 / be two deterministic automata. A morphism of automata from A to A0 is a surjective map 'W Q ! Q0 such that '.q / D q 0 , ' 1 .F 0 / D F and, for every u 2 A and q 2 Q, '.q  u/ D '.q/ u. We write A 6 A0 if there is a morphism from A to A0 . Let L be a recognisable language. The next proposition shows that, amongst the accessible and complete deterministic automata recognising L, the Nerode automaton of L is minimal for 6. For this reason it is called the minimal complete automaton of L. Proposition 4.2. Let A D .Q; A;  ; q ; F / be an accessible and complete deterministic automaton accepting L. For each state q of Q, let Lq be the language recognised by .Q; A;  ; q; F /. Then A.L/ D .¹Lq j q 2 Qº; A,  ; Lq ; ¹Lq j q 2 F º/, where, for all a 2 A and for all q 2 Q, Lq  a D Lqa . Moreover, the map q 7! Lq defines a morphism from A onto A.L/. Proof. Let q be a state of Q. Since q is accessible, there is a word u of A such that q  u D q , and by Proposition 3.7, one has Lq D u 1 L. Conversely, if u is a word, one has u 1 L D Lq with q D q  u. Therefore ¹Lq j q 2 Qº D ¹u

1

L j u 2 A º

and ¹Lq j q 2 F º D ¹u 1 L j u 2 Lº;

which proves the first part of the statement. Furthermore, for all a 2 A, one has '.q  a/ D Lqa D Lq  a D '.q/ a which shows that the map 'W q 7! Lq is a morphism from A onto A.L/.


The direct computation of the Nerode automaton is probably the most efficient method for a computation by hand, because it gives directly the minimal automaton. In practice, one starts with the quotient $L = 1^{-1}L$ and one maintains a table of quotients of $L$. For each quotient $R$, it suffices to compute the quotients $a^{-1}R$ for each letter $a$. These quotients are compared to the existing list of quotients and possibly added to this list. But there is a hidden difficulty: the comparison of two rational expressions is not always easy, since a given language might be represented by two very different rational expressions.

Example 4.1. For $L = (a(ab)^*)^* \cup (ba)^*$, we get $1^{-1}L = L = L_1$ and
$$\begin{aligned}
a^{-1}L_1 &= (ab)^*(a(ab)^*)^* = L_2, & b^{-1}L_1 &= a(ba)^* = L_3,\\
a^{-1}L_2 &= bL_2 \cup L_2 = L_4, & b^{-1}L_2 &= \emptyset,\\
a^{-1}L_3 &= (ba)^* = L_5, & b^{-1}L_3 &= \emptyset,\\
a^{-1}L_4 &= a^{-1}(bL_2 \cup L_2) = L_4, & b^{-1}L_4 &= b^{-1}(bL_2 \cup L_2) = L_2,\\
a^{-1}L_5 &= \emptyset, & b^{-1}L_5 &= a(ba)^* = L_3,
\end{aligned}$$
which gives the minimal automaton represented in Figure 19.

Figure 19. The minimal automaton of L
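The quotient computation of Example 4.1 can be mechanised. The sketch below is not part of the chapter: it represents expressions as nested Python tuples, computes word quotients (derivatives) with a few obvious simplifications, and identifies quotients up to syntactic equality only. So, exactly as noted above, it may fail to recognise that two different expressions denote the same quotient, in which case the automaton it returns is correct but not necessarily minimal.

```python
# Expressions: the strings '0' and '1' (assumed not to be letters), a letter
# such as 'a', or tuples ('+', e, f), ('.', e, f), ('*', e).  The smart
# constructors apply the simplifications 0+e = e, e+e = e, 0e = e0 = 0, 1e = e1 = e.
def plus(e, f):
    if e == '0': return f
    if f == '0': return e
    return e if e == f else ('+', e, f)

def dot(e, f):
    if e == '0' or f == '0': return '0'
    if e == '1': return f
    if f == '1': return e
    return ('.', e, f)

def star(e):
    if e in ('0', '1'): return '1'
    return e if isinstance(e, tuple) and e[0] == '*' else ('*', e)

def nullable(e):
    """Does the language of e contain the empty word?"""
    if e == '1': return True
    if e == '0' or isinstance(e, str): return False
    op = e[0]
    if op == '+': return nullable(e[1]) or nullable(e[2])
    if op == '.': return nullable(e[1]) and nullable(e[2])
    return True                                        # star

def quotient(a, e):
    """An expression for a^{-1}|e|."""
    if isinstance(e, str):                             # '0', '1' or a letter
        return '1' if e == a else '0'
    op = e[0]
    if op == '+':
        return plus(quotient(a, e[1]), quotient(a, e[2]))
    if op == '.':
        d = dot(quotient(a, e[1]), e[2])
        return plus(d, quotient(a, e[2])) if nullable(e[1]) else d
    return dot(quotient(a, e[1]), e)                   # a^{-1}(e1*) = (a^{-1}e1) e1*

def quotient_automaton(e, alphabet):
    """Explore the quotients of e; states are expressions, the initial state
    is e, and a state is final iff it is nullable."""
    states, todo, delta = {e}, [e], {}
    while todo:
        q = todo.pop()
        for a in alphabet:
            r = quotient(a, q)
            delta[(q, a)] = r
            if r not in states:
                states.add(r)
                todo.append(r)
    return states, delta, e, {q for q in states if nullable(q)}

# L = (a(ab)*)* + (ba)*, the language of Example 4.1.
L = plus(star(dot('a', star(dot('a', 'b')))), star(dot('b', 'a')))
states, delta, init, final = quotient_automaton(L, 'ab')
```

On this input the exploration stops with six states: expressions for the five quotients $L_1, \ldots, L_5$ of Example 4.1 together with the empty quotient, which plays the role of the sink state omitted in Figure 19.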

There are standard algorithms for minimising a given accessible deterministic automaton [3], based on the computation of the Nerode equivalence. Let $\mathcal{A} = (Q, A, E, q_-, F)$ be an accessible deterministic automaton. The Nerode equivalence $\sim$ on $Q$ is defined by $p \sim q$ if and only if, for every word $u \in A^*$,
$$p\cdot u \in F \iff q\cdot u \in F.$$
One can show that $\sim$ is actually a congruence, in the sense that $F$ is saturated by $\sim$ and that $p \sim q$ implies $p\cdot x \sim q\cdot x$ for all $x \in A^*$. It follows that there is a well-defined quotient automaton $\mathcal{A}/{\sim} = (Q/{\sim}, A, E, \tilde q_-, F/{\sim})$, where $\tilde q_-$ is the equivalence class of $q_-$.

Proposition 4.3. Let $\mathcal{A}$ be an accessible and complete deterministic automaton. Then $\mathcal{A}/{\sim}$ is the minimal automaton of $\mathcal{A}$.

We shall in particular use the following consequence.

Corollary 4.4. An accessible and complete deterministic automaton is minimal if and only if its Nerode equivalence is the identity.
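The Nerode equivalence of a complete accessible automaton can be computed by the usual partition-refinement scheme (Moore's algorithm, one of the standard minimisation algorithms alluded to above). A minimal sketch, ours rather than the chapter's:

```python
def nerode_classes(states, alphabet, delta, finals):
    """Start from the partition {F, Q \\ F} and refine it until two states
    are in the same class iff all their successors are; the stable partition
    is the Nerode equivalence.  `delta` maps (state, letter) -> state."""
    cls = {q: (q in finals) for q in states}
    while True:
        # signature of q: its current class and the classes of its successors
        sig = {q: (cls[q],) + tuple(cls[delta[(q, a)]] for a in alphabet)
               for q in states}
        if len(set(sig.values())) == len(set(cls.values())):
            return sig            # no class was split: the partition is stable
        cls = sig

# Example: the complete automaton of (ab)* with sink 0 is already minimal,
# so every class is a singleton (Corollary 4.4).
delta = {(1, 'a'): 2, (1, 'b'): 0, (2, 'a'): 0, (2, 'b'): 1,
         (0, 'a'): 0, (0, 'b'): 0}
classes = nerode_classes({0, 1, 2}, 'ab', delta, {1})
assert len(set(classes.values())) == 3
```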


4.2. Automata and monoids. Let $\mathcal{A} = (Q, A, \cdot, q_-, F)$ be a complete deterministic automaton. For each $v \in A^*$, the mapping
$$q \mapsto q\cdot v$$
defines a function $f_v\colon Q \to Q$. If we compose these functions from left to right (in effect, letting them act on $Q$ on the right), we evidently have
$$f_vf_w = f_{vw}$$
for all $v, w \in A^*$. Thus the set
$$M(\mathcal{A}) = \{f_v \mid v \in A^*\}$$
of these functions is a monoid, with composition as the operation, and the map $v \mapsto f_v$ is a morphism from $A^*$ onto $M(\mathcal{A})$. We call $M(\mathcal{A})$ the transition monoid of the automaton $\mathcal{A}$. Since there are only finitely many maps from $Q$ into itself when $Q$ is a finite set, $M(\mathcal{A})$ is a finite monoid.

Example 4.2. Consider the automaton $\mathcal{A}$ in Example 3.3, which recognises the language $(ab)^*$. We represent the mapping $f_v$ by the table of its values $0\cdot v$, $1\cdot v$, $2\cdot v$, and tabulate the functions $f_v$ for short words $v$:

            0    1    2
    f_1     0    1    2
    f_a     0    2    0
    f_b     0    0    1
    f_ab    0    1    0
    f_ba    0    0    2
    f_aa    0    0    0

It is easy to check that for any of the words $v$ for which $f_v$ is tabulated above, the maps $f_{va} = f_vf_a$ and $f_{vb} = f_vf_b$ are already in the list. For example, $f_{aba} = f_a$. These six elements thus constitute the entire transition monoid $M(\mathcal{A})$. Since $f_1$ is the identity element of $M(\mathcal{A})$, we will denote it by 1. The element $f_{aa}$ satisfies
$$f_{aa}\,m = f_{aa} = m\,f_{aa}$$
for all $m \in M(\mathcal{A})$ (that is, it is a zero of the monoid) and so we will write $f_{aa} = 0$. We abbreviate the remaining elements by $a$, $b$, $ab$, $ba$, and obtain the following multiplication table for $M(\mathcal{A})$, in which the entry in row $m$ and column $n$ is the product $mn$:

     .     1    a    b    ab   ba   0
     1     1    a    b    ab   ba   0
     a     a    0    ab   0    a    0
     b     b    ba   0    b    0    0
     ab    ab   a    0    ab   0    0
     ba    ba   0    b    0    ba   0
     0     0    0    0    0    0    0

Again, let $\mathcal{A} = (Q, A, \cdot, q_-, F)$ be a complete deterministic automaton. Then
$$L(\mathcal{A}) = \{v \in A^* \mid f_v \in X\}, \quad\text{where } X = \{f_v \mid q_-f_v \in F\} \subseteq M(\mathcal{A}),$$
that is, $L(\mathcal{A})$ is the inverse image of $X$ under the morphism $v \mapsto f_v$. We say that a monoid $M$ recognises a language $L \subseteq A^*$ if there is a morphism $\varphi\colon A^* \to M$ and a set $X \subseteq M$ such that $L = \varphi^{-1}(X)$. We also say in this instance that the morphism $\varphi$ recognises $L$. Thus our observation above can be restated: for any complete deterministic automaton $\mathcal{A}$, $L(\mathcal{A})$ is recognised both by the monoid $M(\mathcal{A})$ and by the morphism $v \mapsto f_v$.

Example 4.3. In Example 4.2 we have $L(\mathcal{A}) = (ab)^*$, the inverse image of $\{1, f_{ab}\}$ under the morphism $v \mapsto f_v$.
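The transition monoid of Example 4.2 can be generated mechanically: start from the identity and compose with the letter functions until no new function appears. The following sketch is ours, not the chapter's (the representation of functions as tuples and all names are assumptions); it also extracts the accepting set $X = \{f_v \mid q_-f_v \in F\}$.

```python
def transition_monoid(states, alphabet, delta):
    """Generate M(A) = {f_v | v in A*} for a complete DFA, each f_v being a
    tuple of state indices over a fixed ordering of the states."""
    order = sorted(states)
    idx = {q: i for i, q in enumerate(order)}
    gen = {a: tuple(idx[delta[(q, a)]] for q in order) for a in alphabet}
    identity = tuple(range(len(order)))
    elements = {identity: ""}          # element -> a word realising it
    todo = [identity]
    while todo:
        f = todo.pop()
        for a, g in gen.items():
            fg = tuple(g[f[i]] for i in range(len(order)))   # f then g
            if fg not in elements:
                elements[fg] = elements[f] + a
                todo.append(fg)
    return order, elements

# Minimal automaton of (ab)*: states 0 (sink), 1 (initial and final), 2.
delta = {(1, 'a'): 2, (1, 'b'): 0, (2, 'a'): 0, (2, 'b'): 1,
         (0, 'a'): 0, (0, 'b'): 0}
order, elements = transition_monoid({0, 1, 2}, 'ab', delta)
assert len(elements) == 6              # the six elements 1, a, b, ab, ba, 0 of Example 4.2

# X = { f_v | 1 . f_v in F }, here F = {1}; it consists of f_1 and f_ab (Example 4.3).
X = {w for f, w in elements.items() if order[f[order.index(1)]] == 1}
assert X == {'', 'ab'}
```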

4.3. Syntactic monoid and syntactic morphism. There are two ways to introduce the syntactic monoid of a recognisable language. We first present an algorithm to compute it and then give an abstract definition. Let L  A be a recognisable language and let A.L/ be its minimal automaton. The monoid Synt.L/ D M.A.L// and the morphism L D A.L/ are called, respectively, the syntactic monoid and the syntactic morphism of L. In other words, the syntactic monoid of L is the transition monoid of the minimal automaton of L. Example 4.4. In Example 4.2 we have Synt..ab/ / D M.A/, because A is the minimal automaton of .ab/ . Example 4.5. Let A be a finite alphabet and let a 2 A. Consider the language L D A aA , which consists of all words that contain an occurrence of the letter a. The minimal automaton A.L/ is .¹0; 1º; A;  ; 1; ¹0º/; where 1 a D 0 and q  b D q whenever either q ¤ 1 or b ¤ a. The syntactic monoid Synt.L/ then contains two transitions: f1 , which is the identity of the monoid, and fa , which is a zero. We can thus write this monoid as ¹0; 1º; with the usual multiplication. Let M and N be monoids. We say that M divides N; and write M  N , if there is a submonoid N 0 of N and a morphism 'W N 0 ! M that maps onto M . Informally, M  N means that M is simpler than N . We would naturally consider M to be simpler than N if M is either a submonoid or a quotient of N . We would also expect “simpler than” to be a transitive relation. It is easy to prove (and left as an exercise for the reader) that division is the least transitive relation that includes both the submonoid and quotient relation. The next proposition says that the syntactic monoid of L is the simplest monoid recognising L.


Proposition 4.5. Let L be a recognisable language and M a monoid. Then M recognises L if and only if Synt.L/ divides M . Proof. First suppose that Synt.L/ divides M . Then there is a submonoid M 0 of M and a surjective morphism 'W M 0 ! Synt.L/. Since ' is surjective, there exists for each a 2 A at least one ma 2 M 0 such that '.ma / D L .a/. Set .a/ D ma . Then extends to a unique morphism (also denoted ) from A into M 0 . Observe that ' ı D L , because '. .a// D L .a/ for all a 2 A. Let P D .L/. We claim that for all words v 2 A , .v/ 2 P if and only if 1 v 2 L. This will show L D .P /, and hence M recognises L. By definition, v 2 L implies .v/ 2 P . On the other hand, if .v/ 2 P , then there is some v 0 2 L such that .v/ D .v 0 /. We thus have fv D L .v/ D ' ı

.v/ D ' ı

.v 0 / D L .v 0 / D fv0 :

Let A.L/ D .Q; A;  ; q ; F / be the minimal automaton of L. Since v 0 2 L, we have q  v 0 2 F and, since fv D fv0 , q  v D q  v 0 . So q  v 2 F and thus v 2 L, as claimed. Note that we have not used the minimality of this automaton. We have actually proved that if N recognises L and N divides M , then M recognises L as well. For the converse, suppose that M recognises L via a map W A ! M and a 1 subset P  M such that .P / D L. As before, let A.L/ D .Q; A;  ; q ; F / be the minimal automaton of L. We will show that if w; w 0 2 A with .w/ D .w 0 /, then L .w/ D L .w 0 /, in other words, that the syntactic morphism L “factors through” . This implies the existence of a morphism 'W Im. / ! Synt.L/ such that ' ı D L . Since L maps onto Synt.L/, we conclude that Synt.L/  M . Suppose, contrary to what we are trying to prove, that L .w/ ¤ L .w 0 /. Thus there is some state q of A.L/ such that q  w ¤ q  w 0 . In particular, by Corollary 4.4, there is some v 2 A such that q  wv 2 F and q  w 0 v … F , or vice-versa. Since every state of A.L/ is accessible from the initial state, there is accordingly some u 2 A such that q  u D q , and thus uwv 2 L, uw 0 v … L. Since uwv 2 L, we must have .uwv/ 2 P . But .uw 0 v/ D .uwv/ is then also in P , so that uw 0 v 2 L, a contradiction. We have defined the syntactic monoid of a recognisable language as the transition monoid of its minimal automaton. One often sees a different, although equivalent, definition in terms of congruences. A congruence on a monoid M (finite or infinite) is an equivalence relation  on M that is compatible with multiplication in M : in other words, if mi  m0i for i D 1; 2, then m1 m2  m01 m02 . In this case, there is a well-defined multiplication on the quotient set M= of equivalence classes of  that turns M= into a monoid. The map taking m 2 M to its equivalence class Œm is then a morphism from M onto this quotient monoid, called the projection morphism. Let L  A be a recognisable language. We define an equivalence relation L on  A as follows: if u; v 2 A , then u L v if and only if for every x; y 2 A , xuy and xvy are either both in L or both outside of L. We call L the syntactic congruence of L.


To see the connection to the syntactic monoid and syntactic morphism, as we have defined them, suppose that u L v . Let q be any state of the minimal automaton of L. If q  u ¤ q  v , then there is some word y 2 A such that q  uy is an accepting state and q  vy is not, or vice-versa. Without loss of generality, we can assume that q  uy is accepting. Since every state of the minimal automaton is accessible from the initial state q , we have q D q  x for some x 2 A . Thus xuy 2 L and xvy … L, contradicting u L v . Thus we must have q  u D q  v for every state q . This shows that L .u/ D L .v/. Conversely, suppose L .u/ D L .v/: Then fxuy D fx fu fy D fx fv fy : Thus q  xuy is accepting if and only if q  xvy is. That is, xuy 2 L if and only if xvy 2 L, i.e., u L v . We have shown that u L v if and only if L .u/ D L .v/, so that equivalence classes of L are in one-to-one correspondence with elements of the syntactic monoid. This implies immediately that L is a congruence, and that the correspondence L .u/ $ ŒuL is an isomorphism of monoids. We have thus proved Proposition 4.6. Let L  A be a recognisable language. Then L is a congruence, and the quotient monoid A =L is isomorphic to the syntactic monoid Synt.L/. Example 4.6. Let us use this alternative definition of the syntactic monoid to recompute Synt..ab/ /, the syntactic monoid we originally computed in Example 4.2. The syntactic congruence of L identifies two words u and v if the set of pairs of words .x; y/ such that xuy 2 L is the same as the set of pairs for which xvy 2 L. If u contains two consecutive occurrences of a or two consecutive occurrences of b , then this set of pairs is empty. Indeed, these words u are the only ones for which there is no pair .x; y/ with xuy 2 L, so one of the L -classes is A aaA [ A bbA : Nonempty words in which the letters a and b alternate lie in one of the sets .ab/C ;

.ba/C ;

b.ab/ ;

.ab/ a:

For all u in, say, b.ab/ , xuy 2 L if and only if x 2 .ab/ a and y 2 .ab/ . So all the words in b.ab/ belong to a single congruence class. Likewise, all the words in each of the other three sets belong to a single congruence class. The only word we have not considered is the empty word 1. This is not congruent to any nonempty word: for example if u starts with a we have a 1 b 2 L and aub … L. We have thus determined all six congruences classes: 1;

.ab/C ;

.ab/ a;

b.ab/ ;

.ba/C ;

A aaA [ A bbA :

We can compute the multiplication table of the syntactic monoid by choosing a representative word from each of the two classes whose product we seek, and finding which class the concatenation of the two words belongs to. This gives the same multiplication table we constructed earlier using the minimal automaton. 4.4. Ordered versions. A non-symmetrical variant of the definition of the syntactic congruence turns out to be very useful in studying recognisable languages. Let u;v 2 A : we write u 4L v if for every x; y 2 A such that xuy 2 L, we have also xvy 2 L. Then 4L is a preorder and it is immediately verified that u L v if and only if u 4L v and v 4L u. Moreover, if u1 4L v1 and u2 4L v2 , then we have u1 u2 4L v1 v2 . As


a result, 4L defines an order relation 6L on Synt.L/ D A =L , the syntactic monoid of L, which is stable under product. This order is called the syntactic order of L and the resulting ordered monoid .Synt.L/; 6L / is called the ordered syntactic monoid of L. Example 4.7. We have already seen that M D ¹1; a; b; ab; ba; 0º is the syntactic monoid of .ab/ . The order of its ordered syntactic monoid is given by the relations ab 6 1, ba 6 1 and 0 6 x for all x 2 M . The syntactic order can also be computed from the minimal automaton of L. Let A D .Q; A;  ; q ; F / be the minimal automaton of L. Define a relation 6 on Q by setting p 6 q if and only if, for all u 2 A , p u 2 F implies q  u 2 F . The relation 6 is clearly reflexive and transitive. Suppose that p 6 q and q 6 p . Then, for all u 2 A , p u 2 F if and only if q  u 2 F . Since A is minimal, this implies p D q . Thus 6 is an order. Furthermore, if p 6 q , then for all a 2 A, p a 6 q  a since, for all u 2 A , p au 2 F implies q  au 2 F . We know that the syntactic monoid of L is the transition monoid of its minimal automaton. The syntactic order of L can now be defined directly as follows: fu 6 fv if and only if, for every q 2 Q, q  u 6 q  v . Example 4.8. Consider the minimal complete automaton of .ab/ , represented in Figure 15. The order on the set of states is 0 < 1 and 0 < 2. Indeed, one has 0 u D 0 for all u 2 A and thus, the formal implication 0 u 2 F H) q  u 2 F

holds for any q 2 Q. One can verify that there is no other relations among the states of Q. For instance, 1 and 2 are incomparable since 1 ab D 1 2 F but 2 ab D 0 … F and 1 b D 0 … F but 2 b D 1 2 F . One also recovers the syntactic order described in Example 4.7. For instance, ab 6 1 since 0 ab D 0 1, 1 ab D 1 1 and 2 ab D 0 6 2 D 2 1. 4.5. Operations on recognisable languages. The direct product M1  M2 of two monoids M1 and M2 is just the ordinary Cartesian product with the operation .m1 ; m2 /.m01 ; m02 / D .m1 m01 ; m2 m02 /:

Proposition 4.7. Let $L, L_1, L_2 \subseteq A^*$ be recognisable languages. Then $\mathrm{Synt}(L) = \mathrm{Synt}(A^* \setminus L)$, and both $\mathrm{Synt}(L_1 \cup L_2)$ and $\mathrm{Synt}(L_1 \cap L_2)$ divide $\mathrm{Synt}(L_1) \times \mathrm{Synt}(L_2)$.

Proof. The minimal automaton of L is identical to that of A n L, except for the set of accepting states. In particular, these two automata have the same transition monoid, which gives the first part of the claim. For the second, it is enough to prove that if M1 recognises L1 and M2 recognises L2 , then M1  M2 recognises L1 [ L2 . The result will then follow from Proposition 4.5. Suppose then that for i D 1; 2, Mi recognises Li


through morphisms 'i and subsets Pi . Let 'W A ! M1  M2 be the morphism defined by '.w/ D .'1 .w/; '2 .w//, and let P D ¹.m1 ; m2 / j m1 2 P1 or m2 2 P2 º:

Then '.w/ 2 P if and only if either w 2 L1 or w 2 L2 . Thus M1  M2 recognises L1 [ L2 , as required. The last part of the Proposition follows from the first two parts by DeMorgan’s Laws. Example 4.9. The reason for studying the syntactic monoid is that it enables us to characterise properties of a language in terms of algebraic properties of the syntactic monoid. What follows is a very simple example of this approach, which is the subject of Chapter 16. Suppose that a recognisable language L  A is recognised by a monoid M that is both idempotent, that is, m m D m for every m 2 M , and commutative, that is, m n D n m for all m; n 2 M . Let 'W A ! M be a morphism that recognises L. If w; w 0 2 A contain the same set of letters, then '.w/ D '.w 0 /: because of idempotence and commutativity, we can duplicate letters of a word and rearrange them without changing the image of the word under ' . Thus w 2 L if and only if w 0 2 L; in other words, membership of a word in L is determined entirely by the set of letters of L. Conversely, if L has this property, then L can be written as a Boolean combination of languages of the form A aA , where a 2 A. Thus by Proposition 4.7 and Example 4.5, Synt.L/ divides a direct product of copies of the monoid ¹0; 1º. This monoid is idempotent and commutative, and it is straightforward to verify that both the idempotence and commutativity properties are preserved under direct products, quotients and taking submonoids. Thus Synt.L/ is idempotent and commutative. We have shown that membership in L is dependent only on the set of letters in the word if and only if the syntactic monoid of L is idempotent and commutative.
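The characterisation of Example 4.9 is effective: compute the transition monoid of the minimal automaton and test idempotence and commutativity. The self-contained sketch below is ours, not the chapter's; it regenerates the monoid from scratch rather than reusing the machinery above.

```python
def monoid_elements(states, alphabet, delta):
    """All maps f_v : Q -> Q of a complete DFA, as tuples over sorted(states)."""
    order = sorted(states)
    gen = [tuple(order.index(delta[(q, a)]) for q in order) for a in alphabet]
    seen = {tuple(range(len(order)))}
    todo = list(seen)
    while todo:
        f = todo.pop()
        for g in gen:
            fg = tuple(g[i] for i in f)      # compose f then g
            if fg not in seen:
                seen.add(fg); todo.append(fg)
    return seen

def idempotent_and_commutative(monoid):
    comp = lambda f, g: tuple(g[i] for i in f)
    return (all(comp(f, f) == f for f in monoid) and
            all(comp(f, g) == comp(g, f) for f in monoid for g in monoid))

# Minimal DFA of A*aA* over {a, b} (Example 4.5): membership depends only on
# whether the letter a occurs, so the test returns True.
delta = {(1, 'a'): 0, (1, 'b'): 1, (0, 'a'): 0, (0, 'b'): 0}
print(idempotent_and_commutative(monoid_elements({0, 1}, 'ab', delta)))   # True
```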

5. Rational versus recognisable

The aim of this section is to show that, if $A$ is a finite alphabet, a language of $A^*$ is recognisable if and only if it is rational.

5.1. Local languages. A language $L$ of $A^*$ is said to be local if there exist two subsets $P$ and $S$ of $A$ and a subset $N$ of $A^2$ such that¹
$$L \setminus 1 = (PA^* \cap A^*S) \setminus A^*NA^*.$$
For instance, if $A = \{a, b, c\}$, the language
$$(abc)^* = 1 \cup [(aA^* \cap A^*c) \setminus A^*\{aa, ac, ba, bb, cb, cc\}A^*]$$
is local. The terminology can be explained as follows: in order to check whether a nonempty word belongs to $L$, it suffices to verify that its first letter is in $P$, its last letter is in $S$ and its factors of length 2 are not in $N$: all these conditions are local. Conversely,

¹ $P$ stands for prefix, $S$ for suffix, and $N$ for non-factor.


if a language $L$ is local, it is easy to recover the parameters $P$, $S$ and $N$. Indeed, $P$ [$S$] is the set of first [last] letters of the words of $L$, and $N$ is the set of words of length 2 that are factors of no word of $L$. It is easy to compute a deterministic automaton recognising a local language, given the parameters $P$, $S$ and $N$.

Proposition 5.1. Let $L = (PA^* \cap A^*S) \setminus A^*NA^*$ be a local language. Then $L$ is recognised by the automaton $\mathcal{A}$ in which the set of states is $A \cup \{1\}$, the initial state is 1, the set of final states is $S$, and the transitions are given by the rules $1\cdot a = a$ if $a \in P$ and $a\cdot b = b$ if $ab \notin N$.

Proof. Let $u = a_1 \cdots a_n$ be a word accepted by $\mathcal{A}$ and let
$$1 \xrightarrow{a_1} a_1 \xrightarrow{a_2} a_2 \longrightarrow \cdots \longrightarrow a_{n-1} \xrightarrow{a_n} a_n$$
be a successful path of label $u$. Then the state $a_n$ is final and hence $a_n \in S$. Similarly, since $1 \xrightarrow{a_1} a_1$ is a transition, one has necessarily $a_1 \in P$. Finally, since for $1 \le i \le n-1$, $a_i \xrightarrow{a_{i+1}} a_{i+1}$ is a transition, the word $a_ia_{i+1}$ is not in $N$. Consequently, $u$ belongs to $L$.
Conversely, if $u = a_1 \cdots a_n \in L$, one has $a_1 \in P$, $a_n \in S$ and, for $1 \le i \le n-1$, $a_ia_{i+1} \notin N$. Thus $1 \xrightarrow{a_1} a_1 \xrightarrow{a_2} a_2 \longrightarrow \cdots \longrightarrow a_{n-1} \xrightarrow{a_n} a_n$ is a successful path of $\mathcal{A}$ and $\mathcal{A}$ accepts $u$. Therefore, the language accepted by $\mathcal{A}$ is $L$.
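Proposition 5.1's construction is immediate to program. The sketch below is ours (the representation of the automaton is an assumption), applied to the local language $(abc)^*$ of the example above.

```python
def local_automaton(alphabet, P, S, N, with_empty_word=False):
    """Deterministic (possibly incomplete) automaton for
    (P A* ∩ A* S) \\ A* N A*, following Proposition 5.1: states are the
    letters plus an initial state, and every a-transition enters state a."""
    INIT = '1'                          # assumed not to be a letter of the alphabet
    delta = {(INIT, a): a for a in alphabet if a in P}
    delta.update({(a, b): b for a in alphabet for b in alphabet if a + b not in N})
    finals = set(S) | ({INIT} if with_empty_word else set())
    return delta, INIT, finals

def accepts(delta, init, finals, w):
    q = init
    for a in w:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals

# (abc)* = 1 ∪ [(aA* ∩ A*c) \ A*{aa,ac,ba,bb,cb,cc}A*]
delta, init, finals = local_automaton('abc', P={'a'}, S={'c'},
                                      N={'aa', 'ac', 'ba', 'bb', 'cb', 'cc'},
                                      with_empty_word=True)
assert accepts(delta, init, finals, 'abcabc') and not accepts(delta, init, finals, 'abca')
```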

For a local language containing the empty word, the previous construction can be easily modified by taking S [ ¹1º as the set of final states. Note also that the automaton A described in Proposition 5.1 has a special property: all the transitions of label a have the same end, namely the state a. More generally, we shall say that a deterministic automaton (not necessarily complete) A D .Q; A;  / is local if, for each letter a, the set ¹q  a j q 2 Qº contains at most one element. Local languages have the following characterisation. Proposition 5.2. A rational language is local if and only if it is recognised by a local automaton. Example 5.1. Let A D ¹a; b; cº, P D ¹a; bº, S D ¹a; cº, and N D ¹ab; bc; caº. The automaton in Figure 20 recognises the language L D .PA \ A S / n A NA . Proof. One direction follows from Proposition 5.1. To prove the opposite direction, consider a local automaton A D .Q; A;  ; q0 ; F / recognising a language L and let P D ¹a 2 A j q0  a is definedº;

S D ¹a 2 A j there exists q 2 Q such that q  a 2 F º;

N D ¹x 2 A2 j x is the label of no path in Aº; K D .PA \ A S / n A NA :


Figure 20. An automaton recognising a local language

Let $u = a_1 \cdots a_n$ be a nonempty word of $L$ and let $q_0 \xrightarrow{a_1} q_1 \longrightarrow \cdots \longrightarrow q_{n-1} \xrightarrow{a_n} q_n$ be a successful path of label $u$. Necessarily, $a_1 \in P$, $a_n \in S$ and, for $1 \le i \le n-1$, $a_ia_{i+1} \notin N$. Consequently, $u \in K$, which shows that $L \setminus 1$ is contained in $K$.
Let now $u = a_1 \cdots a_n$ be a nonempty word of $K$. Then $a_1 \in P$, $a_n \in S$, and, for $1 \le i \le n-1$, $a_ia_{i+1} \notin N$. Since $a_1 \in P$, the state $q_1 = q_0\cdot a_1$ is well defined. Moreover, since $a_1a_2 \notin N$, $a_1a_2$ is the label of some path $p_0 \xrightarrow{a_1} p_1 \xrightarrow{a_2} p_2$ in $\mathcal{A}$. But since $\mathcal{A}$ is a local automaton, $q_0\cdot a_1 = p_0\cdot a_1$. It follows that the word $a_1a_2$ is also the label of the path $q_0 \xrightarrow{a_1} p_1 \xrightarrow{a_2} p_2$. One can show in the same way by induction that there exists a sequence of states $p_i$ ($0 \le i \le n$) such that $a_ia_{i+1}$ is the label of a path $p_{i-1} \xrightarrow{a_i} p_i \xrightarrow{a_{i+1}} p_{i+1}$ of $\mathcal{A}$. Finally, since $a_n \in S$, there is a state $q$ such that $q\cdot a_n \in F$. But since $\mathcal{A}$ is a local automaton, one has $q\cdot a_n = p_{n-1}\cdot a_n = p_n$, whence $p_n \in F$. Therefore $q_0 \xrightarrow{a_1} p_1 \longrightarrow \cdots \longrightarrow p_{n-1} \xrightarrow{a_n} p_n$ is a successful path in $\mathcal{A}$ and its label $u$ is accepted by $\mathcal{A}$. Thus $K = L \setminus 1$.

Local languages are stable under various operations. Proposition 5.3. Let A1 and A2 be two disjoint subsets of the alphabet A and let L1  A1 and L2  A2 be two local languages. Then the languages L1 C L2 and L1 L2 are local languages. Proof. Let A1 [A2 ] be a local automaton recognising L1 [L2 ]. The proofs of Propositions 3.1 and 3.5 give an automaton recognising L1 C L2 and L1 L2 . A simple verification shows that these constructions produce a local automaton when A1 and A2 are local. Proposition 5.4. Let L be a local language. Then the language L is a local language.


Proof. Let $\mathcal{A}$ be a local automaton recognising $L$. The proof of Proposition 3.6 gives an automaton recognising $L^*$. A simple verification shows that this construction produces a local automaton when $\mathcal{A}$ is local.

5.2. Glushkov's algorithm. Glushkov's algorithm [2] is an efficient way to convert a rational expression into a nondeterministic automaton. A rational expression is said to be linear if each letter has at most one occurrence in the expression. For instance, the expression
$$[a_1a_2(a_3a_4)^*]^* \cup (a_5a_6)^*a_7 \qquad (2)$$
is linear. One can linearise a rational expression by replacing each occurrence of a letter by a distinct symbol. For instance, the expression (2) is a linearisation of the expression $e = [ab(ba)^*]^* \cup (ac)^*b$. Now, given an automaton for $e'$, the linearisation of $e$, it is easy to obtain an automaton for $e$, simply by replacing the letters of $e'$ by the corresponding letters in $e$. For instance, starting from the automaton $\mathcal{A}'$ which recognises $[(a_1a_2)^*a_3]^*$, one gets a nondeterministic automaton $\mathcal{A}$ which recognises $[(ab)^*a]^*$ by replacing $a_1$ and $a_3$ by $a$ and $a_2$ by $b$, as shown in Figure 21.

Figure 21. Construction of an automaton recognising $[(ab)^*a]^*$

It remains to find an algorithm to compute the automaton of a linear expression. Proposition 5.5. Every linear expression represents a local language. Proof. The proof works by induction on the formation rules of a linear expression. First, the languages represented by 0, 1 and a, for a 2 A, are local languages. Next, by Proposition 5.4, if e represents a local language, then so does e  . Let now e and e 0 be two linear expressions and suppose that the expression .e [ e 0 / is still linear. Let B [B 0 ] be the set of letters occurring in e [e 0 ]. Since .e [e 0 / is linear, the letters of B [B 0 ] do not occur in e 0 [e ]. In other words, B and B 0 are disjoint and the local language represented by e [e 0 ] is contained in B  [B 0  ]. By Proposition 5.3, the language represented by .e [e 0 / is also a local language. A similar argument applies for the language represented by ee 0 .


Proposition 5.1 allows one to compute a deterministic automaton recognising a local language. It suffices to test whether the empty word belongs to $L$ and to compute the sets
$$P(L) = \{a \in A \mid aA^* \cap L \ne \emptyset\},\quad S(L) = \{a \in A \mid A^*a \cap L \ne \emptyset\},\quad F(L) = \{x \in A^2 \mid A^*xA^* \cap L \ne \emptyset\}.$$
This can be done by recursion, given a linear rational expression representing the language. We first compute the procedure

    EmptyWord(e: linear expression): boolean;

which tells whether the empty word belongs to the language represented by e:

    EmptyWord(0) = false;
    EmptyWord(1) = true;
    EmptyWord(a) = false for all a ∈ A;
    EmptyWord(e ∪ e') = EmptyWord(e) or EmptyWord(e');
    EmptyWord(e · e') = EmptyWord(e) and EmptyWord(e');
    EmptyWord(e*) = true.

Now P, S, and F are computed by the following recursive procedures:

    P(0) = ∅;   P(1) = ∅;   P(a) = {a} for all a ∈ A;
    P(e ∪ e') = P(e) ∪ P(e');
    if EmptyWord(e) then P(e · e') = P(e) ∪ P(e') else P(e · e') = P(e);
    P(e*) = P(e);

    S(0) = ∅;   S(1) = ∅;   S(a) = {a} for all a ∈ A;
    S(e ∪ e') = S(e) ∪ S(e');
    if EmptyWord(e') then S(e · e') = S(e) ∪ S(e') else S(e · e') = S(e');
    S(e*) = S(e);

    F(0) = ∅;   F(1) = ∅;   F(a) = ∅;
    F(e ∪ e') = F(e) ∪ F(e');
    F(e · e') = F(e) ∪ F(e') ∪ S(e)P(e');
    F(e*) = F(e) ∪ S(e)P(e).

In summary, Glushkov's algorithm to convert a rational expression e into a nondeterministic automaton works as follows:
1. linearise e into e' and memorise the coding of the letters;
2. compute recursively the sets P(e'), S(e') and F(e'); then compute a deterministic automaton A' recognising e';
3. convert A' into a nondeterministic automaton A recognising e.
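The procedures above translate directly into code. In the sketch below (ours), a linearised expression is a tree whose leaves are integer positions, `letter_of` is the memorised coding of step 1, and the extra initial state of Proposition 5.1 is called 0 instead of 1; everything else follows EmptyWord, P, S and F as defined above.

```python
# A linearised expression is '0', '1', an integer position, or a tuple
# ('+', e, f), ('.', e, f), ('*', e).
def empty_word(e):
    if e == '1': return True
    if e == '0' or not isinstance(e, tuple): return False
    op = e[0]
    if op == '+': return empty_word(e[1]) or empty_word(e[2])
    if op == '.': return empty_word(e[1]) and empty_word(e[2])
    return True                                            # star

def first(e):                                              # the set P
    if e in ('0', '1'): return set()
    if not isinstance(e, tuple): return {e}
    op = e[0]
    if op == '+': return first(e[1]) | first(e[2])
    if op == '.':
        return first(e[1]) | first(e[2]) if empty_word(e[1]) else first(e[1])
    return first(e[1])

def last(e):                                               # the set S
    if e in ('0', '1'): return set()
    if not isinstance(e, tuple): return {e}
    op = e[0]
    if op == '+': return last(e[1]) | last(e[2])
    if op == '.':
        return last(e[1]) | last(e[2]) if empty_word(e[2]) else last(e[2])
    return last(e[1])

def factors(e):                                            # the set F, as pairs
    if not isinstance(e, tuple): return set()
    op = e[0]
    if op == '+': return factors(e[1]) | factors(e[2])
    if op == '.':
        return (factors(e[1]) | factors(e[2])
                | {(x, y) for x in last(e[1]) for y in first(e[2])})
    return factors(e[1]) | {(x, y) for x in last(e[1]) for y in first(e[1])}

def glushkov(e, letter_of):
    """Nondeterministic automaton for the expression obtained by replacing
    each position p by letter_of[p]; state 0 is the initial state."""
    P, S, F = first(e), last(e), factors(e)
    trans = ({(0, letter_of[p], p) for p in P}
             | {(x, letter_of[y], y) for (x, y) in F})
    finals = S | ({0} if empty_word(e) else set())
    return trans, 0, finals

# e' = (a1(a2 a3)*)* + (a4 a5)*, the linearisation of (a(ab)*)* + (ba)*
e = ('+', ('*', ('.', 1, ('*', ('.', 2, 3)))), ('*', ('.', 4, 5)))
letters = {1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'a'}
trans, init, finals = glushkov(e, letters)
```

On this input the computed sets are P = {1, 4}, S = {1, 3, 5} and F = {(1,2), (1,1), (2,3), (3,1), (3,2), (4,5), (5,4)}, matching Example 5.2 below, and the returned automaton is the one of Figure 23.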


Example 5.2. Consider the rational expression $e = (a(ab)^*)^* \cup (ba)^*$. We first linearise $e$ into $e' = (a_1(a_2a_3)^*)^* \cup (a_4a_5)^*$. Let $L = L(e)$ and $L' = L(e')$. To compute the sets $P$, $S$ and $F$, one can either use the above-mentioned recursive procedures, or proceed to a direct computation (this method is usually preferred in a computation by hand...). Recall that $P$ [$S$] is the set of first [last] letters of the words of $L'$. We get $P = \{a_1, a_4\}$ and $S = \{a_1, a_3, a_5\}$. Note that $a_1$ belongs to $S$ since $a_1$ is a word of $L'$. Next we compute the set $F$ of all words of length 2 that are factors of some word of $L'$. We get $F = \{a_1a_2, a_1a_1, a_2a_3, a_3a_1, a_3a_2, a_4a_5, a_5a_4\}$. For instance, $a_3a_1$ is a factor of $a_1a_2a_3a_1$ and $a_3a_2$ is a factor of $a_1a_2a_3a_2a_3$. Since the empty word belongs to $L'$, the state 1 is final and we finally obtain the automaton represented in Figure 22. Since this automaton is local, there is actually no need to write the labels on the transitions. We now convert this automaton into a nondeterministic automaton recognising $L$, represented in Figure 23. To get a deterministic automaton, it remains to apply the algorithm described in § 2.3.

Figure 22. A local automaton recognising $L'$

Figure 23. A nondeterministic automaton recognising L

5.3. Linear equations. In this section, we give an algorithm to convert an automaton into a rational expression. The algorithm amounts to solving a system of linear equations on languages. We first consider an equation of the form
$$X = KX + L, \qquad (3)$$


where $K$ and $L$ are languages and $X$ is the unknown. When $K$ does not contain the empty word, the equation admits a unique solution.

Proposition 5.6. If $K$ does not contain the empty word, then $X = K^*L$ is the unique solution of the equation $X = KX + L$.

Proof. Replacing $X$ by $K^*L$ in the expression $KX + L$, one gets
$$K(K^*L) + L = K^+L + L = (K^+ + 1)L = K^*L,$$

and hence $X = K^*L$ is a solution of (3). To prove uniqueness, consider two solutions $X_1$ and $X_2$ of (3). By symmetry, it suffices to show that each word $u$ of $X_1$ also belongs to $X_2$. Let us prove this result by induction on the length of $u$. If $|u| = 0$, $u$ is the empty word and if $u \in X_1 = KX_1 + L$, then necessarily $u \in L$ since $1 \notin K$. But in this case, $u \in KX_2 + L = X_2$. For the induction step, consider a word $u$ of $X_1$ of length $n+1$. Since $X_1 = KX_1 + L$, $u$ belongs either to $L$ or to $KX_1$. If $u \in L$, then $u \in KX_2 + L = X_2$. If $u \in KX_1$ then $u = kx$ for some $k \in K$ and $x \in X_1$. Since $k$ is not the empty word, one has necessarily $|x| \le n$ and hence by induction $x \in X_2$. It follows that $u \in KX_2$ and finally $u \in X_2$. This concludes the induction and the proof of the proposition.

If $K$ contains the empty word, uniqueness is lost.

Proposition 5.7. If $K$ contains the empty word, the solutions of (3) are the languages of the form $K^*M$ with $L \subseteq M$.

Proof. Since $K$ contains the empty word, one has $K^+ = K^*$. If $L \subseteq M$, one has $L \subseteq M \subseteq K^*M$. It follows that the language $K^*M$ is a solution of (3) since
$$K(K^*M) + L = K^+M + L = K^*M + L = K^*M.$$
Conversely, let $X$ be a solution of (3). Then $L \subseteq X$ and $KX \subseteq X$. Consequently, $K^2X \subseteq KX \subseteq X$ and, by induction, $K^nX \subseteq X$ for all $n$. It follows that $K^*X = \sum_{n \ge 0} K^nX \subseteq X$. The language $X$ can thus be written as $K^*M$ with $L \subseteq M$: it suffices to take $M = X$.

In particular, if $K$ contains the empty word, then $A^*$ is the maximal solution of (3) and the minimal solution is $K^*L$. Consider now a system of the form

Conversely, let X be a solution of (3). Then L  X and KX  X . Consequently, 2 n  K P X  nKX  X and by induction, K X  X for all n. It follows that K X D n>0 X K  X . The language X can thus be written as K M with L  M : it suffices to take M D X . In particular, if K contains the empty word, then A is the maximal solution of (3) and the minimal solution is K  L. Consider now a system of the form X1 D K1;1 X1 C K1;2 X2 C    C K1;n Xn C L1 ; X2 D :: K2;1 X1 C K2;2 X2 C    C K2;n Xn C L2 ; : Xn D Kn;1 X1 C Kn;2 X2 C    C Kn;n Xn C Ln :

(4)

We shall only consider the case when the system admits a unique solution. Proposition 5.8. If, for 1 6 i; j 6 n, the languages Ki;j do not contain the empty word, the system (4) admits a unique solution. Moreover, if the Ki;j and the Li are rational languages, then the solutions Xi of (4) are rational languages.


Proof. The case $n = 1$ is handled by Proposition 5.6. Suppose that $n > 1$. Consider the last equation of the system (4), which can be written
$$X_n = K_{n,n}X_n + (K_{n,1}X_1 + \cdots + K_{n,n-1}X_{n-1} + L_n).$$
According to Proposition 5.6, the unique solution of this equation is
$$X_n = K_{n,n}^*(K_{n,1}X_1 + \cdots + K_{n,n-1}X_{n-1} + L_n).$$

Replacing $X_n$ by this expression in the $n-1$ first equations, we obtain a system of $n-1$ equations with $n-1$ unknowns and one can conclude by induction.

We shall now associate a system of linear equations with every finite automaton $\mathcal{A} = (Q, A, E, I, F)$. Let us set, for $p, q \in Q$,
$$K_{p,q} = \{a \in A \mid (p, a, q) \in E\},\qquad L_q = \begin{cases} 1 & \text{if } q \in F,\\ 0 & \text{if } q \notin F.\end{cases}$$
The solutions of the system defined by these parameters are the languages recognised by the automata $\mathcal{A}_q = (Q, A, E, \{q\}, F)$.

More precisely, we get the following result.

Proposition 5.9. The system (4) admits a unique solution $(R_q)_{q \in Q}$, given by the formula
$$R_q = \{u \in A^* \mid \text{there is a path of label } u \text{ from } q \text{ to } F\}.$$
Furthermore, the language recognised by $\mathcal{A}$ is $\sum_{q \in I} R_q$.

Proof. Since the languages $K_{p,q}$ do not contain the empty word, Proposition 5.8 shows that the system (4) admits a unique solution. It remains to verify that the family $(R_q)_{q \in Q}$ is a solution of the system, that is, satisfies for all $q \in Q$ the formula
$$R_q = K_{q,1}R_1 + K_{q,2}R_2 + \cdots + K_{q,n}R_n + L_q. \qquad (5)$$

Let us denote by Sq the right hand side of (5). If u 2 Rq , then u is by definition the label of a path from q to a final state f . If u is the empty word, one has necessarily q D f and hence Lq D 1. Thus u 2 Sq in this case. Otherwise, let .q; a; q 0 / be the first transition of the path. One has u D au0 , where u0 is the label of a path from q 0 to f . Then one has a 2 Kq;q 0 , u0 2 Rq 0 and finally u 2 Sq . Conversely, let u 2 Sq . If u D 1, one has necessarily u 2 Lq , whence q 2 F and u 2 Rq . Otherwise there is a state q 0 such that u 2 Kq;q 0 Rq 0 . Therefore, u D au0 for some a 2 Kq;q 0 and u0 2 Rq 0 . On the one hand, .q; a; q 0 / is a transition of A by definition of Kq;q 0 and on the other hand u0 is the label of a final path starting in q 0 . The composition of these paths gives a final path of label u starting in q . Therefore u 2 Rq and thus Rq D Sq .


For example, if $\mathcal{A}$ is the automaton represented in Figure 24, then the system can be written
$$X_1 = aX_2 + bX_3,\qquad X_2 = aX_1 + bX_3 + 1,\qquad X_3 = aX_2 + 1,$$
and has for solution
$$\begin{aligned}
X_1 &= (a + ba)(aa + aba + ba)^*(ab + b + 1) + b,\\
X_2 &= (aa + aba + ba)^*(ab + b + 1),\\
X_3 &= a(aa + aba + ba)^*(ab + b + 1) + 1.
\end{aligned}$$
Since 1 is the unique initial state, the language recognised by the automaton is $X_1$.

Figure 24. An automaton
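The elimination used in the proof of Proposition 5.8 can be carried out mechanically on expressions kept as strings. The sketch below is ours: it applies the rule $X = K^*L$ of Proposition 5.6 to remove each unknown in turn and then back-substitutes, so it returns a correct but unsimplified expression (for Figure 24, an expression equivalent to, though longer than, the solution displayed above).

```python
def solve_system(K, L):
    """Solve X_i = sum_j K[i][j] X_j + L[i] (Proposition 5.8), assuming no
    K[i][j] contains the empty word.  Coefficients are regular-expression
    strings; '0' denotes the empty language and '1' the empty word."""
    def plus(e, f): return f if e == '0' else e if f == '0' else f"({e}+{f})"
    def dot(e, f):  return '0' if '0' in (e, f) else f if e == '1' else e if f == '1' else f"{e}{f}"
    def star(e):    return '1' if e == '0' else f"({e})*"

    n = len(L)
    K = [row[:] for row in K]; L = L[:]
    for i in reversed(range(n)):              # eliminate X_n, X_{n-1}, ...
        s = star(K[i][i])                     # X_i = K_ii* (rest of the equation)
        K[i] = [dot(s, K[i][j]) if j != i else '0' for j in range(n)]
        L[i] = dot(s, L[i])
        for p in range(i):                    # substitute X_i into earlier equations
            c = K[p][i]
            if c != '0':
                K[p] = [plus(K[p][j], dot(c, K[i][j])) for j in range(n)]
                L[p] = plus(L[p], dot(c, L[i]))
                K[p][i] = '0'
    X = L[:]                                  # equation i now only involves X_j, j < i
    for i in range(n):
        for j in range(i):
            if K[i][j] != '0':
                X[i] = plus(X[i], dot(K[i][j], X[j]))
    return X

# The automaton of Figure 24: X1 = aX2 + bX3, X2 = aX1 + bX3 + 1, X3 = aX2 + 1.
K = [['0', 'a', 'b'],
     ['a', '0', 'b'],
     ['0', 'a', '0']]
L = ['0', '1', '1']
print(solve_system(K, L)[0])    # an (unsimplified) expression for the accepted language
```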

5.4. Extended automata. The use of equations is not limited to deterministic automata. The same technique applies to nondeterministic automata and to more powerful automata, in which the transition labels are not letters, but rational languages. An extended automaton is a quintuple $\mathcal{A} = (Q, A, E, I, F)$, where $Q$ is a set of states, $A$ is an alphabet, $E$ is a subset of $Q \times \mathrm{Rat}(A^*) \times Q$, called the set of transitions, and $I$ [$F$] is the set of initial [final] states. The label of a path
$$c = (q_0, L_1, q_1),\ (q_1, L_2, q_2),\ \ldots,\ (q_{n-1}, L_n, q_n)$$
is the rational language $L_1L_2\cdots L_n$. The definition of a successful path is unchanged. A word is accepted by $\mathcal{A}$ if it belongs to the label of a successful path. In the example represented in Figure 25, the set of transitions is
$$\{(1, a^*b + a, 2),\ (1, b^*, 3),\ (2, a + b, 1),\ (2, b, 3),\ (3, a, 1),\ (3, a, 2)\}.$$
Let $\mathcal{A} = (Q, A, E, I, F)$ be an extended automaton. For all $p, q \in Q$, we let $K_{p,q}$ denote the label of the transition from $p$ to $q$. Notice that $K_{p,q}$ might possibly be the empty language. We also put
$$L_q = \begin{cases} 1 & \text{if there is a path labelled by 1 from } q \text{ to } F,\\ 0 & \text{otherwise.}\end{cases}$$


Figure 25. An extended automaton

Yet the associated system does not necessarily fulfil the condition 1 … Ki;j and Proposition 5.9 needs to be modified as follows. Proposition 5.10. The system (4) has a minimal solution .Rq /q2Q , given by the formula Rq D ¹u 2 A j there is a path labelled by u from q to F º: P In particular the language recognised by A is q2I Rq . Proof. Let us first verify that the family .Rq /q2Q is indeed a solution of (4), i.e., satisfies, for all q 2 Q: Rq D Kq;1 R1 C Kq;2 R2 C    C Kq;n Rn C Lq :

(6)

Denote by Sq the right hand side of (6). If u 2 Rq , then u is by definition the label of a path from q to F . If u D 1, one has Lq D 1 and thus u 2 Sq . Otherwise, let .q; u1 ; q 0 / be the first transition of the path. One has u D u1 u0 , where u0 is the label of a path from q 0 to F . Therefore u1 2 Kq;q 0 , u0 2 Rq 0 and finally u 2 Sq . Conversely, let u 2 Sq . If u D 1, one has necessarily u 2 Lq , whence q 2 F and u 2 Rq . Otherwise, there is a state q 0 such that u 2 Kq;q 0 Rq 0 . Thus u D u1 u0 for some u1 2 Kq;q 0 and u0 2 Rq 0 . On the one hand, .q; u1 ; q 0 / is a transition of A by the definition of Kq;q 0 and on the other hand, u0 is the label of a path from q 0 to F . Therefore u D u1 u0 is the label of a path from q to F and u 2 Rq . Consequently Rq D Sq . It remains to verify that if .Xq /q2Q is a solution of the system, then Rq  Xq for all q 2 Q. If u 2 Rq , there exists a path labelled by u from q to F : .q0 ; u1 ; q1 /.q1 ; u2 ; q2 /    .qr

1 ; ur ; qr /

with q0 D q , qr 2 F , ui 2 Kqi 1 ;qi and u1 u2    ur D u. Let us show by induction on r i that ui C1    ur belongs to Xqi . By hypothesis, the Xq are solutions of Xq D Kq;1 X1 C Kq;2 X2 C    C Kq;n Xn C Lq :


In particular, since $q_r \in F$, one has $1 \in L_{q_r}$ and hence $1 \in X_{q_r}$, which gives the result for $r - i = 0$. Moreover, if $u_{i+1}\cdots u_r$ is an element of $X_{q_i}$, the inclusion $K_{q_{i-1},q_i}X_{q_i} \subseteq X_{q_{i-1}}$ shows that $u_iu_{i+1}\cdots u_r$ is an element of $X_{q_{i-1}}$, which concludes the induction. In particular, $u = u_1\cdots u_r \in X_q$.

Example 5.3. For the extended automaton represented in Figure 25, the system can be written
$$X_1 = (a^*b + a)X_2 + b^*X_3,\qquad X_2 = (a + b)X_1 + bX_3 + 1,\qquad X_3 = aX_1 + aX_2 + 1.$$
Replacing $X_3$ by $aX_1 + aX_2 + 1$, and observing that $a + b^*a = b^*a$, we obtain the equivalent system
$$\begin{aligned}
X_1 &= (a^*b + a)X_2 + b^*(aX_1 + aX_2 + 1) = b^*aX_1 + (a^*b + b^*a)X_2 + b^*,\\
X_2 &= (a + b)X_1 + b(aX_1 + aX_2 + 1) + 1 = (a + b + ba)X_1 + baX_2 + b + 1,\\
X_3 &= aX_1 + aX_2 + 1.
\end{aligned}$$
We deduce from the second equation
$$X_2 = (ba)^*\big((a + b + ba)X_1 + b + 1\big),$$
and replacing $X_2$ by its value in the first equation, we obtain
$$X_1 = b^*aX_1 + (a^*b + b^*a)(ba)^*\big((a + b + ba)X_1 + b + 1\big) + b^* = \big(b^*a + (a^*b + b^*a)(ba)^*(a + b + ba)\big)X_1 + (a^*b + b^*a)(ba)^*(b + 1) + b^*.$$
Finally, the language recognised by the automaton is
$$X_1 = \big(b^*a + (a^*b + b^*a)(ba)^*(a + b + ba)\big)^*\,\big[(a^*b + b^*a)(ba)^*(b + 1) + b^*\big],$$
since 1 is the unique initial state.

5.5. Kleene’s theorem. We are now ready to state the most important result of automata theory. Theorem 5.11 (Kleene [4]). A language is rational if and only if it is recognisable. Proof. It follows from Proposition 5.9 that every recognisable language is rational. Corollary 3.2 states that every finite language is recognisable. Furthermore, Propositions 3.1, 3.5, and 3.6 show that recognisable languages are closed under union, product and star. Thus every rational language is recognisable. The following corollary is now a consequence of Propositions 2.1, 3.1, 3.3, 3.4, 3.5, 3.6, 3.8, and 3.9.


Corollary 5.12. Recognisable (rational) languages are closed under Boolean operations, product, star, quotients, morphisms and inverses of morphisms. We conclude this section by proving some elementary decidability results on recognisable languages. Recall that a property is decidable if there is an algorithm to check whether this property holds or not. We shall also often use the expressions “given a recognisable language L” or “given a rational language L.” As long as only decidability is concerned, it makes no difference to give a language by a nondeterministic automaton, a deterministic automaton or a regular expression, since there are algorithms to convert one of the forms into the other. However, the chosen representation is important for complexity issues, which will not be discussed here. Theorem 5.13. Given a recognisable language L, the following properties are decidable: 1. whether a given word belongs to L, 2. whether L is empty, 3. whether L is finite, 4. whether L is infinite. Proof. We may assume that L is given by a trim deterministic automaton A D .Q; A;  ; q ; F /. 1. To test whether u 2 L, it suffices to compute q  u. If q  u 2 F , then u 2 L; if q  u … F , or if q  u is undefined, then u … L. 2. Let us show that L is empty if and only if F D ;. The condition F D ; is clearly sufficient. Since A is trim, every state of A is accessible. Now, if A has at least one final state q , there is a word u such that q  u D q . Therefore u 2 L and L is nonempty. 3, 4. Let us show that L is finite if and only if A does not contain any loop. If A u contains a loop q ! q , then L is infinite: indeed, since A is trim, there exist y x paths i ! q and q ! f , where f is a final state and thus L contains all the words xun y . Conversely, if L is infinite, the proof of the pumping lemma shows that A contains a loop. Now, checking whether an automaton contains a loop is easy. Consider the directed graph G obtained from A by removing all the labels. Then A is loop-free if and only if G is acyclic, a property that can be checked by standard algorithms. One can for instance compute the transitive closure G 0 of G and check whether G 0 contains an edge of the form .q; q/. We leave as an exercise to the reader to prove that the inclusion problem and the equality problem are decidable for two given recognisable languages.
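The decision procedures of Theorem 5.13 amount to reachability and cycle detection on the trim part of the automaton. A minimal sketch (ours), with an NFA given as a set of triples:

```python
def reachable(trans, sources, forward=True):
    """States reachable from `sources` (or co-reachable, if forward=False);
    `trans` is a set of transitions (p, a, q)."""
    edges = {}
    for p, _, q in trans:
        src, dst = (p, q) if forward else (q, p)
        edges.setdefault(src, set()).add(dst)
    seen, todo = set(sources), list(sources)
    while todo:
        q = todo.pop()
        for r in edges.get(q, ()):
            if r not in seen:
                seen.add(r); todo.append(r)
    return seen

def is_empty(trans, initials, finals):
    return not (reachable(trans, initials) & set(finals))

def is_finite(trans, initials, finals):
    """Finite iff the trim part has no loop (cf. the proof of Theorem 5.13)."""
    useful = reachable(trans, initials) & reachable(trans, finals, forward=False)
    edges = {q: set() for q in useful}
    for p, _, q in trans:
        if p in useful and q in useful:
            edges[p].add(q)
    WHITE, GREY, BLACK = 0, 1, 2          # DFS-based cycle detection
    colour = {q: WHITE for q in useful}
    def has_cycle(q):
        colour[q] = GREY
        for r in edges[q]:
            if colour[r] == GREY or (colour[r] == WHITE and has_cycle(r)):
                return True
        colour[q] = BLACK
        return False
    return not any(colour[q] == WHITE and has_cycle(q) for q in useful)

# (ab)* as an NFA: nonempty and infinite.
T = {(1, 'a', 2), (2, 'b', 1)}
print(is_empty(T, [1], [1]), is_finite(T, [1], [1]))    # False False
```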

6. Algebraic approach The notions of rational and recognisable sets can be defined in arbitrary monoids. However, Kleene’s theorem does not extend to arbitrary monoids since rational and recognisable sets form in general two incomparable classes.


6.1. Rational subsets of a monoid. Let M be a monoid. The set P.M / of subsets of M is a semiring with union as addition and product defined by the formula X Y D ¹xy j x 2 X and y 2 Y º:

For this reason, we shall adopt the notation we already introduced for languages. Union is denoted by C, the empty set by 0 and the singleton ¹mº, for m 2 M by m. This notation has the advantage that the identity of P.M / is denoted by 1. The powers of a subset X of M are defined by induction by setting X 0 D 1, 1 X D X and X n D X n 1 X for all n > 1. The star operation is defined by X X D Xn D 1 C X C X2 C X3 C    : n>0

In other words, X  is the submonoid of M generated by X . The set of rational subsets of a monoid M is the smallest set F of subsets of M satisfying the following conditions: 1. F contains 0 and the singletons of P.M /; 2. F is closed under union, product and star (in other words, if X; Y 2 F, then X C Y 2 F, X Y 2 F and X  2 F).

For instance, in a finite monoid, all subsets are rational. The rational subsets of Nk are the semilinear sets, which are finite unions of subsets of the form ¹v0 C n1 v1 C    C nr vr j n1 ; : : : ; nr 2 Nº;

where v0 ; v1 ; : : : ; vr are vectors of Nk . Rational subsets are also stable under morphisms. Proposition 6.1. Let 'W M ! N be a monoid morphism. If R is a rational subset of M , then '.R/ is a rational subset of N . Moreover, if ' is surjective, then for each rational subset S of N , there exists a rational subset R of M such that '.R/ D S . However, the rational subsets of a monoid are not necessarily closed under intersection, as shown by the following counterexample: Let M D a  ¹b; cº . Consider the rational subsets .a; b/ .1; c/ D ¹.an ; b n c m / j n; m > 0º;

.1; b/ .a; c/ D ¹.an ; b m c n / j n; m > 0º:

Their intersection is ¹.an ; b n c n / j n > 0º, a nonrational subset of M . It follows also that the complement of a rational subset is not necessarily rational. Otherwise, the rational subsets of a monoid would be closed under union and complement and hence under intersection. Proposition 6.2. Each rational subset of a monoid M is a rational subset of a finitely generated submonoid of M .


6.2. Recognisable subsets of a monoid. Let $\varphi\colon M \to N$ be a monoid morphism. A subset $L$ of $M$ is recognised by $\varphi$ if there exists a subset $P$ of $N$ such that
$$L = \varphi^{-1}(P).$$

If ' is surjective, we say that ' recognises L. Note that in this case, the condition L D ' 1 .P / implies P D '.L/. A subset of a monoid is recognisable if it is recognised by a finite monoid. We let Rec.M / denote the set of recognisable subsets of M . Proposition 6.3. For any monoid M , Rec.M / is closed under Boolean operations and left and right quotients. Moreover, if 'W N ! M is a morphism, L 2 Rec.M / implies ' 1 .L/ 2 Rec.N /. Although Kleene’s theorem does not extend to arbitrary monoids, a weaker property holds for finitely generated monoids. Theorem 6.4 (McKnight [5]). Let M be a monoid. The following conditions are equivalent: 1. M is finitely generated; 2. every recognisable subset of M is rational; 3. the set M is a rational subset of M . We have seen that the intersection of two rational subsets is not necessarily rational. What about the intersection of a rational subset and a recognisable subset? Proposition 6.5. The intersection of a rational subset and of a recognisable subset of a monoid is rational. The next theorem gives a description of the recognisable subsets of a finite product of monoids. Eilenberg [1] attributes it to Mezei. Note that this result does not extend to finite products of semigroups. Theorem 6.6. Let M1 ; : : : ; Mn be monoids and let M D M1      Mn . A subset of M is recognisable if and only if it is a finite union of subsets of the form R1      Rn , where each Ri is a recognisable subset of Mi . One of the most important applications of Theorem 6.6 is the fact that the product of two recognisable relations over finitely generated free monoids is recognisable. Let A1 ; : : : ; An be finite alphabets. Then the monoid A1      An is finitely generated, since it is generated by the finite set ¹.1; : : : ; 1; ai ; 1; : : : ; 1/ j ai 2 Ai ; 1 6 i 6 nº:

Proposition 6.7. Let A1 ; : : : ; An be finite alphabets. The product of two recognisable subsets of A1      An is recognisable.


References

[1] S. Eilenberg, Automata, languages and machines. Vol. A. Pure and Applied Mathematics, 58. Academic Press, New York, 1974. MR 0530382 Zbl 0317.94045 q.v. 37
[2] V. M. Glushkov, Abstract theory of automata. Uspehi Mat. Nauk 16 (1961), no. 5(101), 3–62. In Russian. English translation, Russ. Math. Surv. 16 (1961), no. 5, 1–53. MR 0138529 Zbl 0104.35404 q.v. 27
[3] J. E. Hopcroft and J. D. Ullman, Introduction to automata theory, languages, and computation. Addison-Wesley Series in Computer Science. Addison-Wesley, Reading, MA, 1979. MR 0645539 Zbl 0426.68001 q.v. 18
[4] S. C. Kleene, Representation of events in nerve nets and finite automata. In Automata studies (C. E. Shannon and J. McCarthy, eds.). Annals of Mathematics Studies, 34. Princeton University Press, Princeton, N.J., 1956, 3–42. MR 0077478 q.v. 34
[5] J. D. McKnight, Jr., Kleene quotient theorems. Pacific J. Math. 14 (1964), 1343–1352. MR 0180612 Zbl 0144.01201 q.v. 37

Chapter 2

Automata and rational expressions Jacques Sakarovitch

Contents

1. A new look at Kleene's theorem
2. Rationality and recognisability
3. From automata to expressions: the Γ-maps
4. From expressions to automata: the Δ-maps
5. Changing the monoid
6. Introducing weights
7. Notes
References

1. A new look at Kleene’s theorem Not very many results in computer science are recognised as being as basic and fundamental as Kleene’s theorem. It was originally stated as the equality of two sets of objects, and is still so, even if the names of the objects have changed, see, for instance, Theorem 5.11 in Chapter 1. This chapter proposes a new look at this statement, in two ways. First, we explain how Kleene’s theorem can be seen as the conjunction of two results with distinct hypotheses and scopes. Second, we express the first of these two results as the description of algorithms that relate the symbolic descriptions of the objects rather than as the equality of two sets. A two-step Kleene’s theorem. In Kleene’s theorem, we first distinguish a step that consists in proving that the set of regular (or rational) languages is equal to the set of languages accepted by finite automata, a set that we denote by Rat A . This seems already to be Kleene’s theorem itself and is indeed what S. C. Kleene established in [39]. But it is not, if one considers – as we shall do here – that this equality merely states the equality of the expressive power of rational expressions and that of finite labelled directed graphs. This is universally true. It holds independently of the structure in which the labels of the automata or the atoms of the expressions are taken, in any monoids or even in the algebra of polynomials under certain hypotheses. By the virtue of the numerous properties of finite automata over finitely generated (f.g., for short) free monoids: being apt to determinisation for instance, the family of languages accepted by such automata is endowed with many properties as well: being closed under complementation for instance. These properties are extraneous to the


definition of the languages by expressions, and then – by the former result – to the definition by automata. It is then justified, especially in view of the generalisation of expressions and automata to other monoids and even to other structures, to set up a definition of a new family of languages by new means, one that will extend, in the case of other structures, these properties of the languages over f.g. free monoids. It turns out that the adequate definition will be given in terms of representations by matrices of finite dimension; we shall call the languages defined in that way the recognisable languages and we shall denote their family by Rec A*. The second step of Kleene's theorem then consists in establishing that finite automata are equivalent to matrix representations of finite dimension, under the hypothesis that the labels of automata are taken in f.g. free monoids. These two steps correspond to two different concepts: rationality for the first one, and recognisability for the second one. This chapter focuses on rationality and on the first step, namely the equivalence of expressiveness of finite automata and rational expressions. For the sake of completeness, however, we sketch in § 2 how one gets from rational sets to recognisable sets in the case of free monoids and, in § 5, we see that the same construction fails in non-free monoids and explore what remains true.

The languages and their representation. Formal languages or, in the weighted variant, formal power series, are potentially infinite objects. We are only able to compute finite ones; here, expressions that denote, or automata that accept, languages or series. Hopefully, these expressions and automata are faithful descriptions of the languages or series they stand for, all the more effective as one can take advantage of this double view. In order to prove that the family of languages accepted by finite automata coincides with that of the languages denoted by rational expressions, we proceed by establishing a double inclusion. As sketched in Figure 1,¹ given an automaton A that accepts a language K, we describe algorithms which compute from A an expression F that denotes the same language K. We call such algorithms a Γ-map. Conversely, given an expression E that denotes a language L, we describe algorithms that compute from E an automaton B that accepts the same language L. We call such algorithms a Δ-map. Most of the works devoted to the conversion between automata and expressions address the problem of the complexity of the computation of these Γ- and Δ-maps. We have chosen to study here the maps for themselves, and how the results of different maps applied to a given argument are related, rather than to describe the way they are actually computed. The Γ-maps are considered in § 3, the Δ-maps in § 4.

The path to generalisation. The main benefit of splitting Kleene's theorem into two steps is to bring to light that the first one is a statement whose scope extends much beyond languages. It is first generalised to subsets of arbitrary monoids and then, with some precaution, to subsets with multiplicity, that is, to (formal power) series. This latter extension of the realm of Kleene's theorem is a matter for the same "splitting" and distinction between series on arbitrary monoids and series on f.g. free monoids.

¹ P(A*) denotes the power set of A*, that is, the set of all languages over A.

Figure 1. The Γ- and Δ-maps

It would thus be possible to first set up the convenient and most general structure and then state and prove Kleene’s theorem in that framework. My experience, however, is that many readers tend to be repelled and flee when confronted with statements outside the classical realm of words, languages, and free monoids. This is the subject of the first three sections of this chapter. The only difference with the classical exposition will be in the terminology and notation that will be carefully chosen or coined so that they will be ready for the generalisation to arbitrary monoids in § 5 and to series in § 6. Notation and definitions given in Chapter 1 are used in this chapter without comment when they are referred to under the same form and with the exact same meaning.

2. Rationality and recognisability We first introduce here a precise notion of rational expression, and revisit the definition of finite automata in order to fix our notation and to state, under the form that is studied here and eventually generalised later, what we have called above the “first step of Kleene’s theorem” and which we now refer to as the fundamental theorem of finite automata. Second, we state and prove “the second step” of Kleene’s theorem in order to make the scope and essence of the first step clearer by contrast and difference. 2.1. Rational expressions. The set of rational languages of A , denoted by Rat A , is defined as in Chapter 1: it is the smallest subset of P.A / which contains the finite sets (including the empty set) and is closed under union, product, and star. A precise structure-revealing specification for building elements of this family can be given by rational expressions. Definition 2.1. A rational expression over A is a well-formed formula built inductively from the constants 0 and 1 and the letters a in A as atomic formulas, using two binary operators C and  and one unary operator : if E and F are rational expressions,

Jacques Sakarovitch

42

so are .E C F/, .E  F/, and .E /. We denote by RatE A the set of rational expressions over A and often write expression for rational expression. (As in [62], “rational expression” is preferred to the more traditional regular expression for several reasons and in particular as it will be used in the weighted case as well, see § 6.) With every expression E in RatE A is associated a language of A , which is called the language denoted by E and we write 2 it as jEj. The language jEj is inductively defined by 3 j0j D ;, j1j D ¹1Aº, jaj D ¹aº for every a in A, j.E C F/j D jEj [ jFj, j.E  F/j D jEjjFj, and j.E /j D ¹jEjº . Two expressions are equivalent if they denote the same language. Proposition 2.1. A language is rational if and only if it is denoted by an expression. Like any formula, an expression E is canonically represented by a tree, which is called the syntactic tree of E. Let us denote by ` .E/ the literal length of the expression E (that is, the number of all occurrences of letters from A in E) and by d.E/ the depth of E which is defined as the depth – or height 4 – of the syntactic tree of the expression. The classical precedence relation between operators, where  is bound tighter than  which is bound tighter than C to their arguments: “ >  > C” allows us to save parentheses in the writing of expressions: for instance, E C F  G is an unambiguous writing for the expression .E C .F  .G ///. But one should be aware that, for instance, .E  .F  G// and ..E  F/  G/ are two equivalent but distinct expressions. In particular, the derivation that we define at § 4 yields different results on these two expressions. In the sequel, any operator defined on expressions is implicitly extended additively to sets of expressions. For instance, [ jX j D jEj for all X  RatE A : E2X

Definition 2.2. The constant term of an expression E over $A^*$, written c(E), is the Boolean value, inductively defined and computed using the following equations:
$$\begin{aligned}
&c(0) = 0, \qquad c(1) = 1, \qquad c(a) = 0 \ \text{ for all } a \in A,\\
&c(F + G) = c(F) + c(G), \qquad c(F \cdot G) = c(F)c(G), \qquad c(F^*) = 1.
\end{aligned}$$

The constant term of a language L of A is the Boolean value c.L/ that is equal to 1 if and only if 1A belongs to L. By induction on d.E/, the equality c.E/ D c.jEj/ holds. 2.2. Finite automata. We denote an automaton over A by A D h A; Q; I; E; T i where, as in Chapter 1, Q is the set of states, and is also called the dimension of A, I and T are subsets of Q, and E  Q  A  Q is the set of transitions labelled by letters of A. The automaton A is finite if E is finite; hence, if A is finite, if and only if 2 The notation L.E/ is more common, but jEj is simpler and more appropriate when dealing with expressions over an arbitrary monoid or with weighted expressions. 3 The empty word of A is denoted by 1A . 4 We rather not use height because of the possible confusion with the star height, cf. § 4.


(the useful part of) Q is finite. A computation in A from state p to state q with label w is denoted by p --w--> q (in A).

The language accepted by A, also called the behaviour of A and denoted by |A|, is the set of words accepted by A, that is, the set of labels of successful computations:

  |A| = { w ∈ A* | there exist i ∈ I and t ∈ T such that i --w--> t in A }.

(We prefer not to speak of the language "recognised" by an automaton and, in contrast with Chapter 1, we do not say that a language is "recognisable" when it is accepted by a finite automaton, in order to keep a consistent terminology when generalising automata to arbitrary monoids.)

The first step of Kleene's theorem, which we call the fundamental theorem of finite automata, then reads as follows.

Theorem 2.2. A language of A* is rational if and only if it is the behaviour of a finite automaton over A*.

Theorem 2.2 is proved by building effective connections between automata and expressions.

Proposition 2.3 (AtEs-maps). For every finite automaton A over A*, there exist rational expressions over A* which denote |A|.

Proposition 2.4 (EtAs-maps). For every rational expression E over A*, there exist finite automata over A* whose behaviour is equal to |E|.

In § 3 we describe how expressions are computed from automata, and in § 4 how automata are associated with expressions. Before going into this matter, which is the main subject of this chapter, let us establish the second step of Kleene's theorem.

2.3. The "second step" of Kleene's theorem. Let us first state the definition of recognisable languages, in the form that is given for recognisable subsets of arbitrary monoids (cf. § 6.2 in Chapter 1).

Definition 2.3. A language L of A* is recognised by a morphism α from A* into a monoid N if L = α^{-1}(α(L)). A language is recognisable if it is recognised by a morphism into a finite monoid. The set of recognisable languages of A* is denoted by Rec A*.

Theorem 2.5 (Kleene). If A is a finite alphabet, then Rat A* = Rec A*.

The proof of this statement paves the way to further developments in this chapter. Let A = ⟨ A, Q, I, E, T ⟩ be a finite automaton. The set E of transitions may be written as a Q × Q-matrix, called the transition matrix of A and also denoted by E, whose (p, q)-entry is the set (the Boolean sum) of letters that label the transitions from p to q in A. A fundamental (and well-known) lemma relates matrix multiplication and graph traversal.


Lemma 2.6. Let E be the transition matrix of the automaton A of finite dimension Q. Then, for every n in ℕ, E^n is the matrix of the labels of paths of length n in A:

  (E^n)_{p,q} = { w ∈ A^n | p --w--> q in A }.

The subsets I and T of Q may then be seen as Boolean vectors of dimension Q (I as a row vector and T as a column vector). With the notation E* = Σ_{n∈ℕ} E^n (as we have written |E| rather than L(E), we write E* = Σ_{n∈ℕ} E^n rather than E* = ⋃_{n∈ℕ} E^n), it follows that

  |A| = I · E* · T.                                        (1)

The next step in the preparation of the proof of Theorem 2.5 is to write the transition matrix E as a formal sum E = Σ_{a∈A} μ(a) a, where, for every a in A, μ(a) is a Boolean Q × Q-matrix. These matrices μ(a) define a morphism μ: A* → B^{Q×Q} (the Boolean semiring B has been defined in Chapter 1). The second lemma involves the freeness of A*.

Lemma 2.7. Let μ: A* → B^{Q×Q} be a morphism and let E = Σ_{a∈A} μ(a) a. For every n in ℕ, E^n = Σ_{w∈A^n} μ(w) w, and thus E* = Σ_{w∈A*} μ(w) w.
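The matrix point of view of Lemmas 2.6 and 2.7 translates directly into code. The following minimal Python sketch (the two-state automaton, its alphabet, and the test words are illustrative assumptions, not taken from the text) multiplies Boolean matrices with ∨ as addition and ∧ as product, extends μ from letters to words, and accepts a word w exactly when I · μ(w) · T = 1.

```python
# Minimal sketch: Boolean matrix representation of a finite automaton.
# mu maps each letter to a Boolean Q x Q matrix; acceptance of w is I . mu(w) . T.

def bool_mat_mult(X, Y):
    """Product of Boolean matrices (entries 0/1): + is 'or', . is 'and'."""
    n, m, p = len(X), len(Y), len(Y[0])
    return [[int(any(X[i][k] and Y[k][j] for k in range(m))) for j in range(p)]
            for i in range(n)]

def mu_of_word(mu, word):
    """Extend the letter matrices to a word by matrix product (Lemma 2.7)."""
    n = len(next(iter(mu.values())))
    M = [[int(i == j) for j in range(n)] for i in range(n)]   # identity = mu(empty word)
    for a in word:
        M = bool_mat_mult(M, mu[a])
    return M

def accepts(I, mu, T, word):
    """Boolean value I . mu(word) . T."""
    M = mu_of_word(mu, word)
    return any(I[p] and M[p][q] and T[q] for p in range(len(I)) for q in range(len(T)))

# Illustrative two-state automaton over {a, b} accepting the words ending with 'b'.
mu = {'a': [[1, 0], [1, 0]], 'b': [[0, 1], [0, 1]]}
I, T = [1, 0], [0, 1]
print(accepts(I, mu, T, "aab"))   # True
print(accepts(I, mu, T, "aba"))   # False
```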

Proof of Theorem 2.5. By Theorem 2.2, a rational language L of A* is the behaviour of a finite automaton A = ⟨ A, Q, I, E, T ⟩, that is, by (1), L = I · E* · T, and by Lemma 2.7,

  L = I · ( Σ_{w∈A*} μ(w) w ) · T = Σ_{w∈A*} (I · μ(w) · T) w,

that is,

  L = |A| = { w ∈ A* | I · μ(w) · T = 1 }.

Thus, if we write S = { m ∈ B^{Q×Q} | I · m · T = 1 }, then L = μ^{-1}(S) and L is recognisable.

Conversely, let L be a recognisable language of A*, recognised by the morphism α: A* → N, and let S = α(L). Consider the automaton A_α whose state set is the monoid N itself and whose transitions are defined by the multiplication in N, that is, A_α = ⟨ A, N, {1_N}, E, S ⟩ where E = { (n, a, n α(a)) | a ∈ A, n ∈ N }. It is immediate that

  |A_α| = { w ∈ A* | there exists p ∈ S such that 1_N --w--> p in A_α }
        = { w ∈ A* | α(w) ∈ S } = α^{-1}(S) = L,

and L is rational by Theorem 2.2.

We postpone to § 5 the example showing that recognisability and rationality are indeed two distinct concepts, as well as the description of the relationships that can be found between them. As mentioned in Chapter 1, the following holds.


Theorem 2.8. The equivalence of finite automata over A* is decidable.

Proposition 2.4 then implies the following result.

Corollary 2.9. The equivalence of rational expressions over A* is decidable.

3. From automata to expressions: the AtEs-maps

For the rest of this section, A = ⟨ A, Q, I, E, T ⟩ is a finite automaton over A*, and E is viewed, depending on the context, as the set of transitions or as the transition matrix of A. As in (1), the language accepted by A is conveniently written as

  |A| = I · E* · T = Σ_{i∈I, t∈T} (E*)_{i,t}.

In order to prove that |A| is rational, it is sufficient to establish the following.

Proposition 3.1. The entries of E* belong to the rational closure of the entries of E.

But we want to be more precise and describe procedures that produce, for every entry of E*, a rational expression whose atoms are the entries of E (and possibly 1). There are (at least) four classical methods for proving Proposition 3.1, which can easily be viewed as algorithms serving our purpose and which we present here:

1. direct computation of |A| by the state-elimination method, which looks the most elementary and is indeed the easiest for both hand computation and computer implementation;
2. computation of E* · T as the solution of a system of linear equations; based on Arden's lemma, it also allows to view E* · T as a fixed point;
3. iterative computation of E*, known as the McNaughton–Yamada algorithm and probably the most popular in textbooks on automata theory;
4. recursive computation of E*; based on Arden's lemma as well, this algorithm combines mathematical elegance and computational inefficiency.

The first three are based on an ordering of the states of the automaton. In order to compare the results of these different algorithms, and of a given one when the ordering of the states varies, we first introduce the notion of rational identities, together with Arden's lemma, the key tool for establishing the correctness of the algorithms as well as the identities. The section ends with a refinement of Theorem 2.2 which, by means of the notions of star height and loop complexity, relates even more closely an automaton and the rational expressions that are computed from it.

3.1. Preparation: rational identities and Arden's lemma. By definition, all expressions which denote the behaviour of a given automaton A are equivalent. We may then ask whether, and how, this equivalence may be established within the world of expressions itself. We consider "elementary equivalences" of more or less simple expressions, which we call rational identities, or identities for short, and which correspond to properties of (the semiring of) the languages denoted by the expressions. And we try to


determine which of these identities, considered as axioms, are necessary, or sufficient, to obtain by substitution one expression from another equivalent one. It is known – and out of the scope of this chapter – that no finite set of identities exists that allows to establish the equivalence of expressions in general (see Chapter 20). We shall see however that a basic set of identities is sufficient to deduce the equivalence between the expressions computed by the different AtEs-maps described here.

Trivial and natural identities. A first set of identities, which we call trivial identities, expresses the fact that 0 and 1 are interpreted as the zero and unit of a semiring:

  E + 0 ≡ E,   0 + E ≡ E,   E · 0 ≡ 0,   0 · E ≡ 0,   E · 1 ≡ E,   1 · E ≡ E,   0* ≡ 1.     (T)

An expression is said to be reduced if it contains no subexpression which is a left-hand side of one of the above identities; in particular, 0 does not appear in a non-zero reduced expression. Any expression H can be rewritten into an equivalent reduced expression H′; this H′ is unique and independent of the way the rewriting is conducted. From now on, all expressions are implicitly reduced, which means that all the computations on expressions defined below are performed modulo the trivial identities.

The next set of identities expresses the fact that the operators + and · are interpreted as the addition and product of a semiring:

  (E + F) + G ≡ E + (F + G)   and   (E · F) · G ≡ E · (F · G),     (A)
  E · (F + G) ≡ E · F + E · G   and   (E + F) · G ≡ E · G + F · G,     (D)
  E + F ≡ F + E.     (C)

The conjunction A ∧ D ∧ C is abbreviated as N and called the set of natural identities.

Aperiodic identities. The product in P(A*) is distributive over infinite sums; hence

  K* = 1_{A*} + K* · K = 1_{A*} + K · K*   for all K ∈ P(A*),     (2)

from which we deduce the identities

  E* ≡ 1 + E* · E   and   E* ≡ 1 + E · E*.     (U)

Arden’s lemma, whose usage is ubiquitous, now follows from (U ) and the gradation 7 of A . Lemma 3.2 (Arden). Let K and L be two subsets of A . Then K  L is a solution of the equation X D K X C L. If c.K/ D 0, then K  L is the unique solution of the equation. For computing expressions, we prefer to use Arden’s lemma in the following form. Corollary 3.3. Let K and L be two rational expressions over A with c.K / D 0. Then, K  L denotes the unique solution of X D jK jX C jLj. 7 That is, the elements of A have a length which is a morphism from A onto N (cf. § 6).


The next identities, called aperiodic identities, are a consequence of Lemma 3.2.

Proposition 3.4. For all rational expressions E and F over A*,

  (E + F)* ≡ E* · (F · E*)*   and   (E + F)* ≡ (E* · F)* · E*,     (S)
  (E · F)* ≡ 1 + E · (F · E)* · F.     (P)

There are many other (independent) identities (cf. § 7). The remarkable fact is that those listed above will be sufficient for our purpose.

Identities special to P(A*). Finally, the idempotence of the union in P(A*) yields two further identities:

  E + E ≡ E,     (I)
  (E*)* ≡ E*.     (J)

In contrast with the preceding ones, these two identities I and J do not hold for expressions over arbitrary semirings of formal power series (cf. § 6). Identity I follows from 1 + 1 ≡ 1 and J from 1* ≡ 1 (in presence of S and P).

3.2. The state-elimination method. The algorithm usually called the state-elimination method, originally due to Brzozowski and McCluskey [13], works directly on the automaton A = ⟨ A, Q, I, E, T ⟩. It consists in suppressing the states of A one after the other, while transforming the labels of the transitions so that the language accepted by the resulting automaton is unchanged (cf. [70] and [71]).

A current step of the algorithm is represented in Figure 2. The left diagram shows the state q to be suppressed, with a loop labelled L, a state p_i which is the origin of a transition with label K_i ending in q, and a state r_j which is the end of a transition with label H_j originating in q (it may be the case that p_i = r_j); G labels the direct transition from p_i to r_j. By induction, the labels are rational expressions. The right diagram shows the automaton after the suppression of q, where the new label of the transition from p_i to r_j is G + K_i L* H_j. The languages accepted by the automaton before and after the suppression of q are equal. A formal proof will follow in the next subsection.

Figure 2. One step in the state-elimination method

More precisely, the state-elimination method consists first in augmenting the set Q with two new states i and t, and in adding transitions labelled with 1 from i to every initial state of A and from every final state of A to t. Then all states in Q are suppressed, following the procedure described above, in a certain order ω. At the end, only the states i and t remain, together with a transition from i to t labelled with an expression which we denote by B_ω(A) and which is the result of the algorithm. Thus, we have

  |A| = |B_ω(A)|.
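The state-elimination method is short to implement. The sketch below is a minimal Python version (the automaton is illustrative data; expressions are plain strings, parenthesisation is naive, and no simplification by the identities I and J is attempted): the generalised automaton is a dictionary of expression-labelled edges, and the states are eliminated in a given order ω, mirroring the computation of B_ω(A).

```python
# Minimal sketch of the state-elimination method (Brzozowski-McCluskey).
# edges[(p, q)] is a rational expression (a string) labelling the transition p -> q.

def eliminate(edges, order, initial="i", final="t"):
    edges = dict(edges)
    for q in order:                                    # suppress the states in the order omega
        loop = edges.pop((q, q), None)
        ins = [(p, e) for (p, r), e in edges.items() if r == q and p != q]
        outs = [(r, e) for (p, r), e in edges.items() if p == q and r != q]
        for (p, _e) in ins:
            edges.pop((p, q))
        for (r, _e) in outs:
            edges.pop((q, r))
        for p, K in ins:                               # new label  G + K L* H  (cf. Figure 2)
            for r, H in outs:
                term = K + ("(" + loop + ")*" if loop else "") + H
                G = edges.get((p, r))
                edges[(p, r)] = G + "+" + term if G else term
    return edges.get((initial, final), "0")

# Illustrative automaton: states {1, 2}, with the extra states i and t already added.
edges = {("i", "1"): "1", ("1", "1"): "a", ("1", "2"): "b",
         ("2", "2"): "b", ("2", "1"): "a", ("2", "t"): "1"}
print(eliminate(edges, order=["1", "2"]))
```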


Figure 3 shows every step of the state-elimination method applied to the automaton D₃, drawn in its upper left corner, following the order ω₁ = r < p < q. It yields the result B_{ω₁}(D₃) = a* b (b a* b + a b* a)* b a* + a*. The computation of B_ω(A) may silently involve identities in N. A common and natural way of performing the computation is to use the identities I and J as well: it yields simpler results. It is then to be stressed that the use of I and J is not needed to establish the equivalence results to come, such as Theorem 3.5 and Corollaries 3.7 and 3.9.

Figure 3. The state-elimination method exemplified on the automaton D₃

The effect of the order. The result of the state-elimination method obviously depends on the order ω in which the states are suppressed. For instance, on the automaton D₃ of Figure 3, the order ω₂ = r < q < p yields B_{ω₂}(D₃) = (a + b (a b* a)* b)*, and ω₃ = p < q < r yields

  B_{ω₃}(D₃) = a* + a* b (b a* b)* b a* + a* b (b a* b)* a (b + a (b a* b)* a)* a (b a* b)* b a*.

Theorem 3.5 (Conway [20] and Krob [40]). Let ω and ω′ be two orders on the set of states of an automaton A. Then the assertion N ∧ S ∧ P ⊢ B_ω(A) ≡ B_{ω′}(A) holds, that is, the equivalence of B_ω(A) and B_{ω′}(A) can be established using the natural identities together with S and P.

The question of the length of these expressions is also of interest, from a theoretical as well as from a practical point of view. The above example D₃ is easily generalised so as to exhibit an exponential gap between the lengths of the expressions obtained for two distinct orders. The search for short expressions is performed by heuristics (see § 7).

3.3. The system-solution method. The computation of an expression that denotes the language accepted by a finite automaton as the solution of a system of linear equations is nothing else than the state-elimination method recast in a more mathematical setting.

Description of the algorithm. Given A = ⟨ A, Q, I, E, T ⟩, for every p in Q we write L_p for the set of words which are the label of a computation from p to a final state of A: L_p = { w ∈ A* | there exists t ∈ T such that p --w--> t in A }. For a subset R of Q, we


write δ_{p,R} for 1 if p is in R and 0 otherwise. The system of equations associated with A is then written as follows:

  |A| = Σ_{p∈I} L_p = Σ_{p∈Q} δ_{p,I} L_p,     (3)

  L_p = Σ_{q∈Q} |E_{p,q}| L_q + |δ_{p,T}|   for all p ∈ Q,     (4)

where the L_p are the "unknowns" and where the entries E_{p,q}, which represent subsets of A, are, as expressions, sums of letters labelling paths of length 1. The system (4) may be solved by successive elimination of the unknowns, by means of Arden's lemma, since c(E_{p,q}) = 0 for all p, q in Q. When all unknowns L_q have been eliminated following the ordering ω on Q, the computation yields an expression that we denote by E_ω(A), and |A| = |E_ω(A)| holds. As for the state-elimination method, the identities N (and I and J) are likely to have been used at any step of the computation of E_ω(A).

Comparison with the state-elimination method. The state-elimination method and the system-solution method are indeed one and the same algorithm for computing the language accepted by a finite automaton, as stated by the following.

Proposition 3.6 ([62]). The equality B_ω(A) = E_ω(A) holds for any order ω on the states of A.

The state-elimination method reproduces, in the automaton A, the computations corresponding to the solution of the system: the latter is a formal proof of the former. As another consequence of Proposition 3.6, the following corollary of Theorem 3.5 holds.

Corollary 3.7. Let ω and ω′ be two orders on the set of states of an automaton A. Then N ∧ S ∧ P ⊢ E_ω(A) ≡ E_{ω′}(A).

3.4. The McNaughton–Yamada algorithm. Given an automaton A = ⟨ A, Q, I, E, T ⟩, the McNaughton–Yamada algorithm ([49]) – called the MN-Y algorithm here for short – computes E*, whereas the two preceding methods rather compute |A| directly. Like the former methods, it relies on an ordering of Q, but it is based on a different grouping of the computations within A. (In order to avoid confusion between the computations of the expressions that denote the language accepted by A, whose variations are the subject of this chapter, and the computations within A, that is, the paths in the labelled directed graph A, we use the latter terminology in this section.)

Description of the algorithm. The set Q, ordered by ω, is identified with the set of integers from 1 to n = Card(Q). The key idea of the algorithm is to group the paths between any two states p and q of Q according to the highest rank of their intermediate states. We denote by M^{(k)}_{p,q} the set of labels of paths from p to q which do not pass


through intermediate states of rank greater than k. And we shall compute expressions M^{(k)}_{p,q} that denote these sets.

A path that does not pass through any intermediate state of rank greater than 0 reduces to a direct transition; thus M^{(0)}_{p,q} = E_{p,q}. A path which goes from p to q without visiting intermediate states of rank greater than k is
(a) either a path (from p to q) which does not visit intermediate states of rank greater than k − 1,
(b) or the concatenation of a path from p to k without passing through states of rank greater than k − 1, followed by an arbitrary number of paths which go from k to k without passing through intermediate states of rank greater than k − 1, followed finally by a path from k to q without passing through intermediate states of rank greater than k − 1.

This decomposition implies that, for all p and q in Q and all k ≤ n,

  M^{(k)}_{p,q} = M^{(k−1)}_{p,q} + M^{(k−1)}_{p,k} (M^{(k−1)}_{k,k})* M^{(k−1)}_{k,q}.

We write M_{p,q} for (E*)_{p,q} and M_{p,q} for an expression that denotes it. The algorithm ends with the last equation:

  M_{p,q} = M^{(n)}_{p,q} if p ≠ q,   and   M_{p,q} = M^{(n)}_{p,q} + 1 if p = q.

For consistency with the previous sections, we write

  M_ω(A) = Σ_{p∈I, q∈T} M_{p,q},

and the equality |A| = |M_ω(A)| holds.

Example 3.1. The MN-Y algorithm applied to the automaton R₁ of Figure 4 yields the following matrices (we group together, for each k, the four M^{(k)}_{p,q} into a matrix M^{(k)}; here the two rows of each matrix turn out to be equal):

  M^{(0)}: both rows equal (a,  b);
  M^{(1)}: both rows equal (a + a (a)* a,  b + a (a)* b);
  M^{(2)}: both rows equal
      (a + a (a)* a + (b + a (a)* b)(b + a (a)* b)*(a + a (a)* a),
       (b + a (a)* b) + (b + a (a)* b)(b + a (a)* b)*(b + a (a)* b)).

Figure 4. The automaton R₁ (states 1 and 2; a loop a on state 1, a loop b on state 2, a transition b from 1 to 2 and a transition a from 2 to 1)

As in the first two methods, identities in N (as well as I and J) are likely to be used at any step of the MN-Y algorithm. What is new is that the identities D and U are particularly well suited to the computations involved in the MN-Y algorithm. For instance, after using these identities, the above matrices become

  M^{(1)}: both rows equal (a a*,  a* b);
  M^{(2)}: both rows equal ((a* b)* a a*,  (a* b)* a* b).

Comparison with the state-elimination method. Comparing the MN-Y algorithm with the state-elimination method amounts to relating two objects whose form and mode of construction are rather different: on the one hand, a Q × Q-matrix obtained by successive transformations, and on the other hand, an expression obtained by repeated modifications of an automaton, hence of a matrix, but one whose size decreases at each step.

Proposition 3.8 ([62]). Let A = ⟨ A, Q, I, E, T ⟩ be an automaton and, for every p and q in Q, let A_{p,q} be the automaton defined by A_{p,q} = ⟨ A, Q, {p}, E, {q} ⟩. For every order ω on Q, the assertion N ∧ U ⊢ M_ω(A_{p,q}) ≡ B_ω(A_{p,q}) holds.

As a consequence of Proposition 3.8, we have the following corollary of Theorem 3.5.

Corollary 3.9. Let ω and ω′ be two orders on the states of an automaton A. Then N ∧ S ∧ P ⊢ M_ω(A) ≡ M_{ω′}(A).

3.5. The recursive method. This last method is due to Conway [20]. It is based on computations on matrices via block decomposition. Originally, it yielded a proof of Proposition 3.1. As above, we modify it so that it computes, from a matrix E of rational expressions which denotes E, a matrix E′ of rational expressions which denotes the matrix E*.

Description of the algorithm. Let us write a block decomposition of E and, correspondingly, of E*:

  E = ( F  G ; H  K ),   E* = ( U  V ; W  Z ),

where F and K (and thus U and Z) are square blocks; we write F, G, H, K both for the blocks of the expression matrix E and for the languages they denote. By (2) – applied here to matrices with entries in P(A*) rather than to elements of P(A*) – it follows that

  E* = ( U  V ; W  Z ) = ( 1  0 ; 0  1 ) + ( F  G ; H  K ) ( U  V ; W  Z ),


an equation which can be decomposed into a system of four matrix equations:

  U = 1 + |F| U + |G| W,   Z = 1 + |H| V + |K| Z,     (5)
  V = |F| V + |G| Z,       W = |H| U + |K| W.         (6)

Corollary 3.3 applies to (6) and then, after substitution, to (5). This procedure leads to the computation of E* by induction on its dimension. By the induction hypothesis, obviously fulfilled for matrices of dimension 1, |F|* and |K|* are denoted by matrices of rational expressions F′ and K′. Let us write

  E′ = ( (F + G K′ H)*            F′ G (K + H F′ G)*
         K′ H (F + G K′ H)*       (K + H F′ G)*        ),

and |E′| = E* holds. Another application of the induction hypothesis, to |F + G K′ H| and |K + H F′ G|, shows that the entries of E′, which we denote by C_τ(A), where τ is the recursive division of Q used in the computation, are all in RatE A*.

Example 3.2. The recursive method applied to the automaton R₁ of Example 3.1 (cf. Figure 4) directly gives (there is no choice for the recursive division):

  C_τ(R₁) = ( (a + b (b)* a)*            a* b (b + a (a)* b)*
              b* a (a + b (b)* a)*       (b + a (a)* b)*        ).

Comparison with the state-elimination method. Both the recursive method and the MN-Y algorithm yield a matrix of expressions. Example 3.2 shows that there is no hope for an easy inference of the equivalence of the two matrices. We state however the following conjecture.

Conjecture 3.10. Let A = ⟨ A, Q, I, E, T ⟩ be an automaton. For every recursive division τ of Q and for every pair (p, q) of states, there exists an ordering ω of Q such that N ∧ U ⊢ (C_τ(A))_{p,q} ≡ B_ω(A_{p,q}).

More generally, and as a conclusion of the description of these four methods, one would conjecture that the rational expressions computed from a same finite automaton are all equivalent modulo the natural identities and the aperiodic identities S and P. Even if "computed from" is not a formal enough notion, the above developments should make the general idea rather clear.

3.6. Star height and loop complexity. Among the three rational operators +, ·, and *, the operator * is the one that "gives access to the infinite," hence the idea of measuring the complexity of an expression by the degree of nestedness of this operator, a number called the star height. On the other hand, it is the circuits in a finite automaton that produce an infinite number of computations, all the more so as the circuits are more "entangled." The intuitive idea of entanglement of circuits is captured by the notion of loop complexity. A refinement of Theorem 2.2 relates the loop complexity of an automaton to the star height of the expressions that are computed from this automaton, a result originally due to Eggan [23].


Star height of an expression. Let E be an expression over A*. The star height of E, denoted by h[E], is inductively defined by h[0] = h[1] = 0 and h[a] = 0 for every a in A, h[E] = max(h[E′], h[E″]) if E = E′ + E″ or E = E′ · E″, and h[E] = 1 + h[F] if E = F*.

Example 3.3. (i) h[(a + b)*] = 1 and h[a* (b a*)*] = 2.
(ii) The star heights of the three expressions computed for the automaton D₃ in § 3.2 are h[B_{ω₁}(D₃)] = 2, h[B_{ω₂}(D₃)] = 3, and h[B_{ω₃}(D₃)] = 3.

Two equivalent expressions may thus have different star heights, and this gives rise to the star-height problem (see § 7).

Loop complexity of an automaton. We call loop complexity of an automaton A (called cycle rank in [23]) the integer lc(A) defined inductively by the following equations, where a ball is (an efficient abbreviation for) a non-trivial strongly connected component:
• lc(A) = 0 if A contains no ball (in particular, if A is empty);
• lc(A) = max{ lc(P) | P a ball in A } if A is not strongly connected;
• lc(A) = 1 + min{ lc(A ∖ {s}) | s a state of A } if A is strongly connected.

Theorem 3.11 (Eggan [23]). The loop complexity of a trim automaton A is the minimum of the star heights of the expressions computed from A by the state-elimination method.

This theorem may be proved by establishing a more precise statement which involves a refinement of the loop complexity that we call the loop index. If ω is an order on the state set of A, we write ω̄ for the greatest state according to ω. If R is a subautomaton of A, we also write ω for the trace of ω over R and, in such a context, ω̄ for the greatest state of R according to ω. Then the loop index of A relative to ω, written I_ω(A), is the integer inductively defined as follows:
• if A contains no ball, or is empty, then I_ω(A) = 0;
• if A is not itself a ball, then I_ω(A) = max{ I_ω(P) | P a ball in A };
• if A is a ball, then I_ω(A) = 1 + I_ω(A ∖ ω̄).

The difference with respect to the loop complexity is that the state removed from a strongly connected automaton (in the inductive process) is fixed by the order ω rather than being the result of a minimisation. This definition immediately implies that

  lc(A) = min{ I_ω(A) | ω an order on Q }

holds, and Theorem 3.11 is then a consequence of the following.

Proposition 3.12 ([43]). For any order ω on the states of A, I_ω(A) = h[B_ω(A)].

Proposition 3.13. For every rational expression E there exists an automaton A which accepts |E| and whose loop complexity is equal to the star height of E.
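Star height is a one-line recursion on the syntactic tree. A minimal Python sketch (expressions are represented as nested tuples, an illustrative encoding not taken from the text), checked against Example 3.3:

```python
# Minimal sketch: star height of an expression given as a syntax tree.
# An expression is 0, 1, a letter (a string), or a tuple ('+', e, f), ('.', e, f), ('*', e).

def star_height(e):
    if not isinstance(e, tuple):              # atoms: 0, 1, or a letter
        return 0
    if e[0] == '*':
        return 1 + star_height(e[1])
    return max(star_height(e[1]), star_height(e[2]))

# h[(a+b)*] = 1  and  h[a* (b a*)*] = 2, as in Example 3.3.
e1 = ('*', ('+', 'a', 'b'))
e2 = ('.', ('*', 'a'), ('*', ('.', 'b', ('*', 'a'))))
print(star_height(e1), star_height(e2))       # 1 2
```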


4. From expressions to automata: the EtAs-maps

The transformation of rational expressions into finite automata establishes Proposition 2.4. It is even more interesting than the transformation in the other direction, both from a theoretical point of view and for practical purposes, as there are many questions that cannot be answered directly on expressions but require their prior transformation into automata.

Every expression may be mapped to several automata, each of them being computed in different ways. We distinguish the objects themselves, that is, the computed automata, which we try to characterise as intrinsically as possible, from the algorithms that allow to compute them. We present two such automata: the Glushkov, or position, automaton – which we rather call the standard automaton of the expression – and the derived-term automaton, first defined by Antimirov. The standard automaton may be defined for expressions over any monoid, whereas the derived-term automaton is defined for expressions over a free monoid only. In this section, however, we restrict ourselves to expressions over a free monoid.

We begin with the presentation of two techniques for transforming an automaton into another one, which will help us in comparing the various automata associated with a given expression.

4.1. Preparation: closure and quotient

Closure of automata with spontaneous transitions. Automata have been defined (§ 2.2) as graphs labelled by letters of an alphabet. It is known that the family of languages accepted by finite automata is not enlarged if transitions labelled by the empty word, called spontaneous transitions (or ε-moves), are allowed as well. The backward closure of such an automaton A = ⟨ A, Q, I, E, T ⟩ is the equivalent automaton B = ⟨ A, Q, I, F, U ⟩ with no spontaneous transitions, defined by

  F = { (p, a, r) | there exists q ∈ Q such that p --1_{A*}--> q in A and (q, a, r) ∈ E },

and

  U = { p | there exists q ∈ T such that p --1_{A*}--> q in A }.

The automaton B is effectively computable, as the determination of F and U amounts to computing the transitive closure of a finite directed graph.

Morphisms and quotient. Automata are structures; a morphism is a map from an automaton into another one which is compatible with this structure.

Definition 4.1. Let A = ⟨ A, Q, I, E, T ⟩ and A′ = ⟨ A, Q′, I′, E′, T′ ⟩ be two automata. A map φ: Q → Q′ is a morphism (of automata) if
i. φ(I) ⊆ I′,
ii. φ(T) ⊆ T′,
iii. (φ(p), a, φ(q)) ∈ E′ for all (p, a, q) ∈ E.


The automaton A′ is a quotient of A if, moreover,
iv. φ(Q) = Q′, that is, φ is surjective,
v. φ(I) = I′,
vi. φ^{-1}(T′) = T,
vii. for all (r, a, s) ∈ E′ and all p ∈ φ^{-1}(r), there exists q ∈ φ^{-1}(s) such that (p, a, q) ∈ E.

Definition 4.1 generalises the classical notion of quotient of complete deterministic automata to arbitrary automata. Every automaton A admits a minimal quotient, which is a quotient of every quotient of A. In contrast with the case of deterministic automata, the minimal quotient of A is canonically associated with A, and not with the language accepted by A.

4.2. The standard automaton of an expression. The first automaton we associate with an expression E, which we write S_E and which plays a central role in our presentation, was first defined by Glushkov (in [32]). For the same purpose, McNaughton and Yamada computed the determinisation of S_E in their paper [49], already quoted. In order to give an intrinsic description of S_E, we define a restricted class of automata, and then show that the rational operations on sets can be lifted to the automata of that class.

4.2.1. Operations on standard automata. An automaton is standard if it has only one initial state, which is the end of no transition. Figure 5 shows a standard automaton, both as a sketch and in matrix form. The definition does not forbid the initial state i from also being final, and the scalar c, equal to 0 or 1, is the constant term of |A|.

In matrix form, a standard automaton A is written

  A = ⟨ (1  0), ( 0  J ; 0  F ), ( c ; U ) ⟩,

where J is the row vector of labels of the transitions leaving the initial state i, F is the transition matrix between the other states, U is the column vector of final states, and c is the constant term.

Figure 5. A standard automaton

Every automaton is equivalent to a standard one. The special form of standard automata allows to define operations on them that are parallel to the rational operations. Let A (as in Figure 5, with blocks J, F, U and constant term c) and B (with blocks K, G, V and constant term d, with obvious notation) be two standard automata. Then we define the following standard automata:

  A + B = ⟨ (1  0  0), ( 0  J  K ; 0  F  0 ; 0  0  G ), ( c + d ; U ; V ) ⟩,     (7)

  A · B = ⟨ (1  0  0), ( 0  J  c·K ; 0  F  U·K ; 0  0  G ), ( c·d ; U·d ; V ) ⟩,     (8)

  A* = ⟨ (1  0), ( 0  J ; 0  H ), ( 1 ; U ) ⟩,     (9)

where H = U·J + F. The use of the constants c and d allows a uniform treatment of the cases where the initial states of A and B are final or not. Straightforward computations show that |A + B| = |A| + |B|, |A · B| = |A| · |B|, and |A*| = |A|*.

With every rational expression E, by induction on its depth, we thus canonically associate a standard automaton, which we write S_E and which we call the standard automaton of E. The induction and the computations above show that the map E ↦ S_E is an EtAs-map.

Proposition 4.1. If E is a rational expression over A*, then |S_E| = |E|.

The inductive construction of S_E also implies the following result.

Property 4.2. If E is a rational expression, the dimension of S_E is ℓ(E) + 1.

Example 4.1. Figure 6 shows S_{E₁}, where E₁ = (a* b + b b* a)*.

Figure 6. The automaton S_{E₁}
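A common way to compute S_E in practice is the classical position presentation: linearise the expression, treat each letter occurrence as a position, and compute the First, Last and Follow sets. The following Python sketch works on syntax trees as in the earlier examples; it is an illustrative position-based implementation (equivalent to the inductive construction above for the operators 0, 1, +, ·, * only), not the text's construction verbatim.

```python
# Minimal sketch of the standard (Glushkov / position) automaton of an expression.
# Expressions: 0, 1, a letter, or ('+', e, f), ('.', e, f), ('*', e).

def glushkov(expr):
    positions, follow = [], {}

    def build(e):
        """Return (nullable, first, last), first/last being sets of positions."""
        if e == 0:
            return (False, set(), set())
        if e == 1:
            return (True, set(), set())
        if not isinstance(e, tuple):                     # a letter: create a new position
            p = len(positions); positions.append(e); follow[p] = set()
            return (False, {p}, {p})
        if e[0] == '+':
            n1, f1, l1 = build(e[1]); n2, f2, l2 = build(e[2])
            return (n1 or n2, f1 | f2, l1 | l2)
        if e[0] == '.':
            n1, f1, l1 = build(e[1]); n2, f2, l2 = build(e[2])
            for p in l1:
                follow[p] |= f2
            return (n1 and n2, f1 | (f2 if n1 else set()), l2 | (l1 if n2 else set()))
        n1, f1, l1 = build(e[1])                         # star
        for p in l1:
            follow[p] |= f1
        return (True, f1, l1)

    nullable, first, last = build(expr)
    transitions = {('i', positions[p], p) for p in first}          # from the initial state
    transitions |= {(p, positions[q], q) for p in follow for q in follow[p]}
    finals = set(last) | ({'i'} if nullable else set())
    return positions, transitions, finals

# E1 = (a* b + b b* a)*  as in Example 4.1; S_E1 has l(E1) + 1 = 6 states.
E1 = ('*', ('+', ('.', ('*', 'a'), 'b'), ('.', 'b', ('.', ('*', 'b'), 'a'))))
pos, trans, finals = glushkov(E1)
print(len(pos) + 1, finals)
```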

The example of E = (((a* + b*)* + c*)* + d*)* ⋯ shows that the direct computation of S_E by (7)–(9) leads to an algorithm whose complexity is cubic in ℓ(E). The quest for a better algorithm leads to a construction that is interesting per se.


4.2.2. The star-normal form of an expression. The star-normal form of an expression was defined by Brüggemann-Klein [11] in order to design a quadratic algorithm for the computation of the standard automaton of an expression. The interest of this notion certainly goes beyond that complexity improvement.

Definition 4.2 ([11]). A rational expression E is in star-normal form (SNF) if and only if c(F) = 0 for every subexpression F* of E. (The definition, as well as the construction below, has been slightly modified from the original, for simplification.)

Two operators on expressions, written • and ◦, are defined by a mutual recursion on the depth of the expression; they allow to compute the star-normal form of an expression:

  0• = 0◦ = 0,   1• = 1,   1◦ = 0,   a• = a◦ = a for all a ∈ A,

  (F + G)◦ = F◦ + G◦,   (F*)◦ = F◦,   (F · G)◦ = F◦ + G◦ if c(F) = c(G) = 1, and F• · G• otherwise,

  (F + G)• = F• + G•,   (F · G)• = F• · G•,   (F*)• = (F◦)*.

Example 4.2. Let E₂ = (a* b*)*. Then

  E₂• = ((a* b*)◦)* = ((a*)◦ + (b*)◦)* = (a◦ + b◦)* = (a + b)*.
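The mutual recursion computing the star-normal form is a direct transcription of these equations. A minimal Python sketch on the same tuple encoding as before (`snf` plays the role of • and `broken` the role of ◦; the function names are illustrative):

```python
# Minimal sketch of Brueggemann-Klein's star-normal form, following the equations above.
# Expressions: 0, 1, a letter, or ('+', e, f), ('.', e, f), ('*', e).

def const_term(e):
    if e == 1: return 1
    if e == 0 or not isinstance(e, tuple): return 0
    if e[0] == '+': return const_term(e[1]) | const_term(e[2])
    if e[0] == '.': return const_term(e[1]) & const_term(e[2])
    return 1                                             # a starred expression

def snf(e):                                              # the 'bullet' operator
    if not isinstance(e, tuple): return e
    if e[0] == '+': return ('+', snf(e[1]), snf(e[2]))
    if e[0] == '.': return ('.', snf(e[1]), snf(e[2]))
    return ('*', broken(e[1]))                           # (F*)^bullet = (F^circ)*

def broken(e):                                           # the 'circle' operator
    if e == 1: return 0
    if not isinstance(e, tuple): return e
    if e[0] == '+': return ('+', broken(e[1]), broken(e[2]))
    if e[0] == '*': return broken(e[1])
    if const_term(e[1]) == 1 and const_term(e[2]) == 1:
        return ('+', broken(e[1]), broken(e[2]))
    return ('.', snf(e[1]), snf(e[2]))

E2 = ('*', ('.', ('*', 'a'), ('*', 'b')))                # E2 = (a* b*)*
print(snf(E2))                                           # ('*', ('+', 'a', 'b')) = (a+b)*
```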









Theorem 4.3 ([11]). For any expression E, E• is in star-normal form and S_{E•} = S_E.

As the computation of E• is linear in ℓ(E), the goal is achieved by the following result.

Theorem 4.4 ([11]). The computation of S_E has a quadratic complexity in ℓ(E).

4.2.3. The Thompson automaton. A survey on EtAs-maps cannot miss out the method due to Thompson [67]. It was designed to be directly implementable as a program, primarily for searching texts with rational expressions. It is based on the use of spontaneous transitions. Figure 7 shows the basic steps of the construction, which, by induction, associates with an expression E a unique (and well-defined) automaton T_E. It is remarkable that this construction corresponds to yet another way of defining the standard automaton.

Proposition 4.5. The backward closure of T_E is equal to S_E.

Figure 7. Thompson's construction: (a) base cases, (b) product, (c) union, (d) star

4.3. The derived-term automaton of an expression. Let us first recall the (left) quotient operation on languages:

  u^{-1} L = { v ∈ A* | u v ∈ L }   for all L ∈ P(A*) and u ∈ A*.

The quotient is a (right) action of A* on P(A*): (u v)^{-1} L = v^{-1} (u^{-1} L).

A fundamental, and characteristic, property of rational languages – another way of expressing that they are recognisable – is that they have a finite number of quotients. The principle of the construction we present in this section, which we call derivation, is to lift the quotient on languages to an operation on expressions. First introduced by Brzozowski [12], the definition of the derivation of an expression E has been modified by Antimirov [4] (cf. § 7) and yields a non-deterministic automaton A_E, which we propose to call the derived-term automaton of E. This construction thus concerns expressions over free monoids only. In the sequel, E is a rational expression over A*.

Definition 4.3 (Brzozowski and Antimirov [4]). The derivation of E with respect to a letter a of A, denoted by ∂E/∂a, is a set of rational expressions over A*, inductively defined by

  ∂0/∂a = ∂1/∂a = ∅,   ∂b/∂a = {1} if b = a and ∅ otherwise, for all b ∈ A,     (10)

  ∂(F + G)/∂a = ∂F/∂a ∪ ∂G/∂a,     (11)

  ∂(F · G)/∂a = (∂F/∂a) · G ∪ c(F) (∂G/∂a),     (12)

  ∂(F*)/∂a = (∂F/∂a) · F*.     (13)


Equation (12) should be understood with the convention that the product xX of a set X by a Boolean value x is X if x = 1 and ∅ if x = 0. The induction involved in equations (11)–(13) should be interpreted by extending derivation additively (as derivation operators always are) and by distributing the operator ·, on the right, over sets as well. Finally, every operation on rational expressions is computed modulo the trivial identities T, but not modulo the natural identities N (nor the idempotency identities I and J).

Definition 4.4. The derivation of E with respect to a non-empty word v of A*, denoted by ∂E/∂v, is the set of rational expressions over A* defined by (11)–(13) for letters of A, and, by induction on the length of v, by

  ∂E/∂(u a) = ∂/∂a ( ∂E/∂u )   for all u ∈ A⁺ and a ∈ A.     (14)

The derivation of expressions is parallel to the quotient of languages and we have

  | ∂E/∂u | = u^{-1} |E|   for all E ∈ RatE A* and u ∈ A⁺.     (15)

Example 4.3. The derivation of E₁ = (a* b + b b* a)* (cf. Example 4.1) yields

  ∂E₁/∂a = ∂E₁/∂aa = { a* b E₁ },       ∂E₁/∂b = { E₁, b* a E₁ },
  ∂(a* b E₁)/∂b = { E₁ },    ∂(b* a E₁)/∂a = { E₁ },    ∂(b* a E₁)/∂b = { b* a E₁ }.
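Antimirov's derivation is easy to implement on the tuple encoding of expressions used earlier; only the trivial identities are applied, as prescribed above. The following Python sketch (illustrative helper names) returns the set of partial derivatives of an expression with respect to a letter, and reproduces Example 4.3.

```python
# Minimal sketch of Antimirov's derivation (equations (10)-(13)), modulo trivial identities.
# Expressions: 0, 1, a letter, or ('+', e, f), ('.', e, f), ('*', e).

def dot(e, f):
    """Product reduced by the trivial identities E.1 = 1.E = E and E.0 = 0.E = 0."""
    if e == 0 or f == 0: return 0
    if e == 1: return f
    if f == 1: return e
    return ('.', e, f)

def c(e):                                    # constant term, as in Definition 2.2
    if e == 1: return 1
    if e == 0 or not isinstance(e, tuple): return 0
    if e[0] == '+': return c(e[1]) | c(e[2])
    if e[0] == '.': return c(e[1]) & c(e[2])
    return 1

def deriv(e, a):
    """The set of partial derivatives of e with respect to the letter a."""
    if not isinstance(e, tuple):
        return {1} if e == a else set()
    if e[0] == '+':
        return deriv(e[1], a) | deriv(e[2], a)
    if e[0] == '.':
        left = {dot(x, e[2]) for x in deriv(e[1], a)}
        return left | (deriv(e[2], a) if c(e[1]) else set())
    return {dot(x, e) for x in deriv(e[1], a)}           # star: (dF/da) . F*

# E1 = (a* b + b b* a)*  as in Example 4.3.
E1 = ('*', ('+', ('.', ('*', 'a'), 'b'), ('.', 'b', ('.', ('*', 'b'), 'a'))))
print(deriv(E1, 'a'))     # { a* b . E1 }
print(deriv(E1, 'b'))     # { E1, b* a . E1 }
```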

4.3.1. The derived-term automaton. Derivation thus associates with a pair of an expression and a word a set of expressions. We now turn this map into an automaton.

Definition 4.5. We call true derived term of E every expression that belongs to ∂E/∂w for some word w of A⁺; we write TD(E) for the set of true derived terms of E:

  TD(E) = ⋃_{w ∈ A⁺} ∂E/∂w.     (16)

The set D(E) = TD(E) ∪ {E} is the set of derived terms of E.

Example 4.4 (continuation of Example 4.3). D(E₁) = { E₁, a* b E₁, b* a E₁ }.

The sets of derived terms and the rational operations are related by the following equations, from which most of the subsequent properties will be derived.

Proposition 4.6. Let F and G be two expressions. Then one has

  TD(F + G) = TD(F) ∪ TD(G),
  TD(F · G) = TD(F) · G ∪ TD(G),
  TD(F*) = TD(F) · F*.


Starting from TD(0) = TD(1) = ∅ and TD(a) = {1} for every a in A, TD(E) can be computed from Proposition 4.6 by recursion and without reference to the derivation operation (cf. the prebases in [50] and Definition 6.2 below). By induction on d(E), it follows in particular that Card(TD(E)) ≤ ℓ(E), and thus:

Corollary 4.7. Card(D(E)) ≤ ℓ(E) + 1.

The computation of the derived terms and of their derivations amounts to the construction of an EtAs-map, as expressed by the following.

Definition 4.6 (Antimirov [4]). The derived-term automaton of E is the automaton A_E whose set of states is D(E) and whose transitions are defined as follows:
i. if K and K′ are derived terms of E and a is a letter of A, then (K, a, K′) is a transition if and only if K′ belongs to ∂K/∂a;
ii. the initial state is E;
iii. a derived term K is final if and only if c(K) = 1.

Theorem 4.8 ([4]). For any rational expression E, |E| = |A_E|.
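Building on the `deriv` sketch above, the derived-term automaton is obtained by saturating the set of derived terms under derivation. A minimal Python sketch (the alphabet is passed explicitly; illustrative code, not the text's construction verbatim):

```python
# Minimal sketch of the derived-term automaton A_E (Definition 4.6),
# reusing deriv() and c() from the previous sketch.

def derived_term_automaton(E, alphabet):
    states, transitions = {E}, set()
    todo = [E]
    while todo:                                   # saturate D(E) under derivation
        K = todo.pop()
        for a in alphabet:
            for K2 in deriv(K, a):
                transitions.add((K, a, K2))
                if K2 not in states:
                    states.add(K2)
                    todo.append(K2)
    finals = {K for K in states if c(K) == 1}
    return states, transitions, E, finals

E1 = ('*', ('+', ('.', ('*', 'a'), 'b'), ('.', 'b', ('.', ('*', 'b'), 'a'))))
states, trans, init, finals = derived_term_automaton(E1, "ab")
print(len(states), len(trans))                    # 3 states, as in Example 4.4
```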

Example 4.5 (continuation of Example 4.4). The automaton A_{E₁} is shown in Figure 8.

Figure 8. The automaton A_{E₁}

4.3.2. Relationship with the standard automaton. The constructions of the standard and of the derived-term automaton of an expression are different in nature. But both arise from the same inner structure of the expression by two inductive processes, and the two automata have a structural likeness which yields the following (and, at the same time, another proof of Corollary 4.7).

Theorem 4.9 ([17]). For any rational expression E, the automaton A_E is a quotient of S_E.

4.3.3. Derivation and bracketing. The derivation operator is sensitive to the bracketing of expressions; on the other hand, it does commute with the associativity identity A. More precisely, we have the following.

Proposition 4.10 ([3]). Let E, F and G be three rational expressions. Then

  Card(D((E · F) · G)) ≤ Card(D(E · (F · G)))

and, modulo the identity A, D((E · F) · G) ≡ D(E · (F · G)).


Example 4.6. Let a b (c (a b))* be an expression which is not completely bracketed. The derivation of the two expressions obtained by the different bracketings yields

  D(a · (b · (c (a b))*)) = { a (b (c (a b))*), b (c (a b))*, (c (a b))*, (a b)(c (a b))* },
  D((a b) · (c (a b))*) = { (a b)(c (a b))*, b (c (a b))*, (c (a b))* }.

5. Changing the monoid

Most of what has been presented so far extends without problems from languages to subsets of arbitrary monoids, and from expressions over a free monoid to expressions over such monoids. We run over the definitions and statements in order to transform them accordingly. The main difference is that rational and recognisable sets no longer coincide, which makes the link between finite automata and rational expressions even tighter, and rules out quotients and derivation, which refer to the recognisable "side" of rational languages.

Non-free monoids of interest in computer science and automata theory are, among others, direct products of free monoids (for relations between words), free commutative monoids (for counting purposes), partially commutative, or trace, monoids (for modelling concurrent or parallel computations), free groups and polycyclic monoids (in relation with pushdown automata). In the sequel, M is a monoid and 1_M its identity element.

5.1. Rationality

Rational sets and expressions. Product and star are defined in P(M) as in P(A*), and the set of rational subsets of M, denoted by Rat M, is the smallest subset of P(M) which contains the finite sets (including the empty set) and which is closed under union, product, and star. Rational expressions over M are defined as those over A*, with the only difference that the atoms are the elements of M; their set is denoted by RatE M. We also write |E| for the subset denoted by an expression E. Two expressions are equivalent if they denote the same subset, and we have the same statement as Proposition 2.1.

Proposition 5.1. A subset of M is rational if and only if it is denoted by a rational expression over M.

A subset G of M is a generating set if M = G*. The direct part of Proposition 5.1 may be restated with more precision: every rational subset of M is denoted by a rational expression whose atoms are taken in any generating set. It follows from the converse part that a rational subset of M is contained in a finitely generated submonoid.

Finite automata. An automaton over M, denoted by A = ⟨ M, Q, I, E, T ⟩, is defined like an automaton over A*, with the only difference that the transitions are labelled by elements of M: E ⊆ Q × M × Q. Then A is finite when E is finite.


The subset accepted by A, called the behaviour of A and denoted by |A| as above, is the set of labels of successful computations:

  |A| = { m ∈ M | there exist i ∈ I and t ∈ T such that i --m--> t in A }.

The fundamental theorem of finite automata. In this setting, the statement appears more clearly different from Kleene's theorem. Its first appearance seems to be in Elgot and Mezei's paper on rational relations (hidden in a footnote!).

Theorem 5.2 ([25]). A subset of a monoid M is rational if and only if it is the behaviour of a finite automaton over M whose labels are taken in any generating set of M.

There is not much to change in Propositions 2.3 and 2.4 to establish Theorem 5.2.

Proposition 5.3 (AtEs-maps). For every finite automaton A over M, there exist rational expressions over M which denote |A|.

All four methods described in § 3 apply to an arbitrary M, even if their formal proofs may be slightly different (Arden's lemma does not hold anymore).

Proposition 5.4 (EtAs-maps). For every rational expression E over M, there exist finite automata over M whose behaviour is equal to |E|.

Here again, the algorithms and results described in § 4.2 for the construction of the standard automaton, the Thompson automaton, etc., carry over to expressions over M. On the contrary, quotients in M define recognisable subsets of M and not rational ones (see below), and the derivation of expressions over M no longer makes sense.

5.2. Recognisability. Definition 2.3 may be rephrased verbatim for arbitrary monoids. A subset P of M is said to be recognised by a morphism α: M → N if P = α^{-1}(α(P)). A subset of M is recognisable if it is recognised by a morphism from M into a finite monoid. The set of recognisable subsets of M is denoted by Rec M.

Recognisable and rational subsets. We can then reproduce almost verbatim the converse part of the proof of Theorem 2.5. Let P be in Rec M, recognised by a morphism α. We replace the alphabet A by a generating set G of M in the construction of the automaton A_α. If M is finitely generated, G may be chosen finite, hence A_α is finite and P is rational by Theorem 5.2.

Proposition 5.5 (McKnight [48]). If M is finitely generated, then Rec M ⊆ Rat M.

On the other hand, the first part of the quoted proof does not generalise to non-free monoids, and the inclusion of Proposition 5.5 is strict in general. For instance, the set (a, c)* = ((a, 1)(1, c))* is a rational subset of a* × c* (where the product is formed componentwise). It is accepted by a two-state automaton which induces a map μ from the generating set of a* × c* into B^{2×2}:

  μ((a, 1)) = ( 0  1 ; 0  0 )   and   μ((1, c)) = ( 0  0 ; 1  0 ).


But this map does not define a morphism from a* × c* into B^{2×2} since

  μ((a, 1)) μ((1, c)) ≠ μ((1, c)) μ((a, 1)),

whereas (a, 1)(1, c) = (1, c)(a, 1) = (a, c).

Decision problems for rational sets. In general, Rat M is not a Boolean algebra. This comes along with undecidability results. The undecidability of the Post correspondence problem, easily expressed in terms of monoid morphisms, implies for instance the following result.

Theorem 5.6 (Rabin and Scott [55]). It is undecidable whether the intersection of two rational sets of {a, b}* × {c, d}* is empty or not.

From this, one deduces the following.

Theorem 5.7 (Fischer and Rosenberg [27]). The equivalence of finite automata, and hence of rational expressions, over {a, b}* × {c, d}* is undecidable.

In contrast, the cases where Rat M is an effective Boolean algebra – such as when M is a (finitely generated) free commutative monoid [31] or a free group [29] – play a key role in model-checking issues involving counters or pushdown automata.

6. Introducing weights

Most of the statements about automata and expressions established in the previous sections extend again, without much difficulty, to the weighted case, as we have taken care to formulate them adequately. Two questions, though, should be settled first in order to set up the framework of this generalisation. First, the definition of the star operator requires some mathematical apparatus in order to be meaningful. Second, the definition of weighted expressions has to be tuned in such a way that the former computations, such as the derivation, remain valid. Of course, the matter of this section overlaps with that of Chapter 4.

6.1. Weighted languages, automata, and expressions

6.1.1. The series semiring. The weights with which we enrich languages, or subsets of monoids, are taken in a semiring, so as to give the set of series we build the desired structure. We are interested in weights as they actually appear in the modelisation of phenomena that we want to be able to describe (and not because they fulfill some axioms). These are the classical numerical semirings ℕ, ℤ, ℚ, etc., the less classical so-called tropical semirings such as ⟨ ℤ ∪ {+∞}, min, + ⟩, etc., and many other semirings indeed. None of them are Conway semirings (cf. Chapter 20); ℕ is a quasi-Conway semiring, but not the others. In the sequel, K is a semiring. The unweighted case corresponds to K = B and will be referred to as the Boolean case.

As in the Boolean case, free monoids give rise to results which do not hold in non-free ones (the Kleene–Schützenberger theorem). But not all non-free monoids allow to


easily define series with weights in arbitrary semirings. We restrict ourselves to graded monoids, that is, those that are equipped with a length function. They behave exactly like the free monoids as far as the construction of series is concerned, they cover many monoids that are considered in computer science, and they are sufficient to make clear the difference between the free and the non-free cases as far as rationality is concerned. In the sequel, M is a finitely generated graded monoid.

Series. Any map s from M to K is a formal power series (series for short) over M with coefficients in K. The image of an element m of M under s is written ⟨s, m⟩ and is called the coefficient of m in s. The set of these series, written K⟨⟨M⟩⟩, is equipped with the (left and right) "exterior" multiplications, the pointwise addition, and the (Cauchy) product: for every m in M, ⟨s t, m⟩ = Σ_{uv=m} ⟨s, u⟩⟨t, v⟩. As M is graded, the product is well defined, and these three operations make K⟨⟨M⟩⟩ a semiring (see Chapter 4). The support of a series s is the set of elements of M whose coefficient in s is not 0_K. A series with finite support is a polynomial. The set of polynomials over M with coefficients in K is written K⟨M⟩ and is a subsemiring of K⟨⟨M⟩⟩.

Topology. The definition of the star as an infinite sum calls for the definition of a topology on K⟨⟨M⟩⟩. The semirings K we consider are equipped with a topology defined by a distance, whether it is the discrete topology (as in the cases of ℕ, ℤ, ⟨ ℤ ∪ {+∞}, min, + ⟩, etc.) or a more classical one (as in the cases of ℚ, ℝ, or another L⟨⟨N⟩⟩ where N is a graded monoid and L a semiring, etc.). (Formally, we do not need the topology on K to be defined by a distance; it is just an intuitive way of defining a topology which easily allows to relate the topology on K and that on K⟨⟨M⟩⟩. On the other hand, we need the topology to be strong enough to make K a regular Hausdorff space, in order to set up correctly the notion of validity of automata; see Remark 6.12 below. In any case, all semirings used in weighted automata for modelisation, and that the author is aware of, fall into that class.) Since M is graded (and finitely generated), it is easy to derive a distance which defines on K⟨⟨M⟩⟩ the topology of simple convergence: s_n converges to s if and only if, for all m in M, ⟨s_n, m⟩ converges to ⟨s, m⟩.

Along the same lines, a family of series {s_i}_{i∈I} is summable if, for every m in M, the family {⟨s_i, m⟩}_{i∈I} is summable (in K). An obvious case of summability is when, for every m in M, there is only a finite number of indices i such that ⟨s_i, m⟩ is different from 0_K, in which case the family {s_i}_{i∈I} is said to be locally finite.

All the quoted semirings that we consider are topological semirings, that is, they are not only equipped with a topology, but their semiring operations are continuous. We also silently use in the sequel the following identification: if Q is a finite set, K⟨⟨M⟩⟩^{Q×Q}, the semiring of Q × Q-matrices with entries in K⟨⟨M⟩⟩, is isomorphic to K^{Q×Q}⟨⟨M⟩⟩, the semiring of series over M with coefficients in K^{Q×Q}.

Star. The star, denoted t*, of an element t of an arbitrary topological semiring T (not only of a semiring of series) is defined if the family {t^n}_{n∈ℕ} is summable, and in this case t* = Σ_{n∈ℕ} t^n and t is said to be starable. If t* is defined, then t* = 1_T + t t* = 1_T + t* t hold. If moreover T is a ring, this can be written (1 − t) t* = t* (1 − t) = 1, and t* is


the inverse of 1 − t. Hence the name rational given to objects that can be computed with the star. More generally, the star of an element of a semiring, if defined, may be viewed as a substitute for taking the inverse in a poor structure that has no inverses.

The constant term of a series s is the coefficient of the identity of M: c(s) = ⟨s, 1_M⟩. A series is proper if its constant term is zero. If s is proper, the family {s^n}_{n∈ℕ} is locally finite, since M is graded, and the star of a proper series of K⟨⟨M⟩⟩ is thus always defined.

Lemma 6.1. Let s and t be two series in K⟨⟨M⟩⟩. If s is proper, then s* t is the unique solution of the equation X = s X + t.

6.1.2. Rational series and expressions. The rational operations on K⟨⟨M⟩⟩ are: the two exterior multiplications by elements of K, the addition, the product, and the star, which is not defined everywhere. A subset E of K⟨⟨M⟩⟩ is closed under star if, for every s in E such that s* is defined, s* belongs to E. The rational closure of a set E, written KRat E, is the smallest subset of K⟨⟨M⟩⟩ closed under the rational operations and containing E. The set of (K-)rational series, written KRat M, is the rational closure of K⟨M⟩.

Weighted rational expressions. A rational expression over M with weights in K – a weighted expression for short – is defined by completing Definition 2.1 with two operations for every k in K: if E is an expression, then so are (k E) and (E k). The set of weighted rational expressions is written KRatE M. As for languages, we write |E| for the series denoted by E, with the supplementary equations |(k E)| = k|E| and |(E k)| = |E|k. The constant term c(E) is defined as in Definition 2.2, except for the last equation [c(F*) = 1], which is replaced by: "c(F*) = c(F)* if the latter is defined." An expression is valid if its constant term is defined. As M is graded, c(E) = ⟨|E|, 1_M⟩ holds for every valid weighted rational expression E. Finally, the following holds.

Proposition 6.2. A series of K⟨⟨M⟩⟩ is rational if and only if it is denoted by a valid rational K-expression over M.

In this framework, we reformulate Lemma 6.1 as follows.

Corollary 6.3. Let U and V be two expressions in KRatE M. If c(U) = 0_K, then U* V denotes the unique solution of the equation X = |U| X + |V|.

6.1.3. Weighted automata and the fundamental theorem. An automaton A over M with weights in K, a K-automaton for short, still written A = ⟨ M, Q, I, E, T ⟩, is an automaton in which the sets of initial and final states are replaced by maps from Q to K, that is, every state has an initial and a final weight, and in which the set E of transitions is contained in Q × K(M ∖ 1_M) × Q, that is, every transition is labelled by a monomial of K⟨M⟩ different from a constant term. The automaton A is finite if E is finite.

Alternatively, the same automaton is (more often) written A = ⟨ I, E, T ⟩, with the convention taken in § 2: E is the transition matrix of A, a Q × Q-matrix whose (p, q)-entry is the sum of the labels of all transitions from p to q, and I and T are vectors in K^Q. In this setting, A is finite if every entry of E is a polynomial of K⟨M⟩.

The label of a computation in A is, as above, the product of the labels of the transitions that form the computation, multiplied (on the left) by the initial weight of its origin and (on the right) by the final weight of its end. With the definition we have taken for automata (no transition labelled with a constant term), and because M is graded, the family of labels of the successful computations of A is summable, and the series accepted by A, also called the behaviour of A and written |A|, is its sum. The fundamental theorem of automata then reads as follows.

Theorem 6.4. Let M be a graded monoid. A series of K⟨⟨M⟩⟩ is rational if and only if it is the behaviour of a finite K-automaton over M.

6.1.4. Recognisable series. The distinction between rational and recognisable carries over from subsets of a monoid M to series over M. The equivalence between automata over free monoids and matrix representations (§ 2.3) paves the way to the definition of recognisability. A K-representation of M of dimension Q is a triple (I, μ, T) where I and T are two vectors of K^Q and μ: M → K^{Q×Q} is a morphism. The representation (I, μ, T) realises the series s = Σ_{m∈M} (I · μ(m) · T) m; a series of K⟨⟨M⟩⟩ is recognisable if it is realised by a representation, and the set of recognisable series is denoted by KRec M. The families of rational and of recognisable series are distinct in general. A proof which is very similar to the one given in § 2.3, and which is independent of K, yields the following (cf. Theorem 4.8 in Chapter 4).

Theorem 6.5 (Kleene–Schützenberger). If A is finite, then KRat A* = KRec A*.
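As in the Boolean case, a K-representation is evaluated by matrix products, now over the chosen semiring. The sketch below (Python, with illustrative data) computes the coefficient of a word for a representation over ℕ, where it counts accepting paths, and runs unchanged over the tropical semiring ⟨ℤ ∪ {+∞}, min, +⟩ once the two semiring operations and constants are swapped.

```python
# Minimal sketch: coefficient <|A|, w> = I . mu(w) . T of a word for a K-representation.
# The semiring is passed as (plus, times, zero, one); the same code runs over N or (min, +).

def coeff(I, mu, T, word, plus, times, zero, one):
    n = len(I)

    def mat_mul(X, Y):
        out = [[zero] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                acc = zero
                for k in range(n):
                    acc = plus(acc, times(X[i][k], Y[k][j]))
                out[i][j] = acc
        return out

    M = [[one if i == j else zero for j in range(n)] for i in range(n)]   # mu(empty word)
    for a in word:
        M = mat_mul(M, mu[a])
    acc = zero
    for p in range(n):
        for q in range(n):
            acc = plus(acc, times(I[p], times(M[p][q], T[q])))
    return acc

# Over N: this illustrative representation counts the occurrences of 'a' in the word.
I, T = [1, 0], [0, 1]
mu = {'a': [[1, 1], [0, 1]], 'b': [[1, 0], [0, 1]]}
print(coeff(I, mu, T, "abaa", lambda x, y: x + y, lambda x, y: x * y, 0, 1))      # 3

# Over the tropical semiring <Z u {+inf}, min, +>: swap the operations and the constants.
INF = float("inf")
mu_t = {'a': [[1, 0], [INF, 1]], 'b': [[1, INF], [INF, 1]]}
print(coeff([0, INF], mu_t, [INF, 0], "abaa", min, lambda x, y: x + y, INF, 0))
```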

6.2. From automata to expressions: the AtEs-maps. With the definition taken for a K-automaton A = ⟨ I, E, T ⟩, every entry of E is a proper polynomial of K⟨M⟩, so E is in K⟨M⟩^{Q×Q}, hence a proper polynomial of K^{Q×Q}⟨M⟩, and E* is well defined. Lemma 2.6 generalises to K-automata and |A| = I · E* · T holds. In every respect, the weighted case is similar to the Boolean one. The direct part of Theorem 6.4 follows from the generalised statement of Proposition 3.1.

Proposition 6.6. The entries of the star of a proper matrix E of K⟨⟨M⟩⟩^{Q×Q} belong to the rational closure of the entries of E.

The same algorithms as those presented in § 3 – the state-elimination and system-solution methods, the McNaughton–Yamada and recursive algorithms – establish the weighted version of Proposition 2.3.

Proposition 6.7. Let M be a graded monoid. For every finite K-automaton A over M, there exist rational expressions over M which denote |A|.

If the algorithms are the same, one has nevertheless to establish their correctness in this new and more complex framework. We develop the case of the system-solution method; the other ones could be treated in the same way. To begin with, we have to enrich the set of trivial identities in order to set up the definition of reduced weighted expressions, which in turn is necessary to define computations on expressions.


The set T as defined in § 3.1 is now denoted by T_u:

  E + 0 ≡ E,  0 + E ≡ E,  E · 0 ≡ 0,  0 · E ≡ 0,  E · 1 ≡ E,  1 · E ≡ E,  0* ≡ 1,     (T_u)

and augmented with three other sets of identities:

  0_K E ≡ 0,  E 0_K ≡ 0,  k 0 ≡ 0,  0 k ≡ 0,  1_K E ≡ E,  E 1_K ≡ E,     (T_K)

  k (h E) ≡ (k h) E,  (E k) h ≡ E (k h),  (k E) h ≡ k (E h),     (A_K)

  1 k ≡ k 1,  E · (k 1) ≡ E k,  (k 1) · E ≡ k E.     (U_K)

From now on, all computations on weighted expressions are performed modulo the trivial identities T = T_u ∧ T_K ∧ A_K ∧ U_K. Besides the trivial identities, the natural identities N = A ∧ D ∧ C hold on the expressions of KRatE M for any K and any (graded) M; in contrast, the identities I and J, which are special to P(M), do not hold anymore.

The system-solution method. The system-solution method starts from a proper automaton A = ⟨ I, E, T ⟩ of dimension Q whose behaviour is |A| = I · V where V = E* · T is a vector in K⟨⟨M⟩⟩^Q. Lemma 6.1 easily generalises and, as E is proper (in K^{Q×Q}⟨⟨M⟩⟩), V is the unique solution of the equation X = E X + T, which we rewrite as a system of Card(Q) equations:

  V_p = Σ_{q∈Q} |E_{p,q}| V_q + |T_p 1|   for all p ∈ Q,     (17)

where the Vp are the “unknowns,” where the entries Ep;q , which are linear combinations of elements of M , are considered as expressions and denoted as such and where jTp 1j is the series reduced to the monomial Tp 1M . The system (17) may be solved by successive elimination of the unknowns, by means of Corollary 6.3. When all unknowns Vq have been eliminated following an order ! on Q, the computation yields an expression that we denote by E! .A/, as in § 3.3, and jAj D jE! .A/j holds. The parallel with the Boolean case can be continued: given a K-automaton A of dimension Q, an ordering ! , and a recursive division  on Q, the expressions B! .A/, M! .A/, and C .A/ that all denote jAj are computed by the state-elimination method, the McNaughton–Yamada and recursive algorithms respectively. The results on the comparison between these expressions also extend to the weighted case. Proposition 6.8. The equality B! .A/ D E! .A/ holds for every order ! on Q.

Proposition 6.9. The assertion N ^ U order ! on Q.

M! .Ap;q /  B! .Ap;q / holds for every

Theorem 3.5 also extends to the weighted case (and it is now clear why it was important that identities I and J do not play a role in that result). Theorem 6.10. Let ! and ! 0 be two orders on the set of states of a K-automaton A. Then the assertion N ^ S ^ P B! .A/  B! 0 .A/ holds.

Jacques Sakarovitch

68

6.3. From expressions to automata: the -maps 6.3.1. The standard automaton of a weighted expression. The definition of a standard weighted automaton is the same as the one of a standard automaton for the Boolean case: a unique initial state on which the initial map takes the value 1K and which is not the end of any transition. Such an automaton may thus be represented as in Figure 5 and every weighted automaton is equivalent to, and may be turned into, a standard one. As in the Boolean case, operations are defined on standard weighted automata that are parallel to the rational weighted operators. With the notation of Figure 5, the operators A C B and A  B are given by (7) and (8), kA and Ak by

;

Uk

;

0

B B @

ck

;

B B @

0

;

U

;

0

0

Ak D

;

B B @

kA D

kc

B B @

kJ

and A , which is defined when c  is defined, by the following modification of (9):

;

Uc

;

(90 )

0

B B @

0

;

B B @

A D

c

where H D U  c  J C F . As in § 4.2, these operations allow us to associate with every weighted expression E, and by induction on its depth, a standard weighted automaton SE which we call the standard automaton of E. Straightforward computations show that j.kA/j D kjAj, j.Ak/j D jAjk , j.A C B/j D jAj C jBj, j.A  B/j D jAj  jBj, and j.A /j D jAj .

Not so fast. It is somewhat excessive to say that the last equality is obtained by a “straightforward computation” for it conceals a real problem. If s is a series, let us write s0 D c.s/ for the constant term of s and sp for the proper part of s , that is, s D s0 1M C sp . The equality j.A /j D jAj then follows from the properties that s  is defined if and only if s0 is defined and that in this case the equality s  D .s0 sp / s0 holds. The proof of this last equality follows a classical pattern but relies on the fact that the product of two summable families (of elements of K) is a summable family. If the latter property holds, we say that K is a strong semiring. It has been shown recently that one can build a semiring (a ring indeed) that is not strong [47], thus solving a problem left open in [44].

2. Automata and rational expressions

69

To tell the truth, we should have addressed this problem as soon as we have defined the weighted rational expressions and when we wrote: “c.F / D c.F/ if the latter is defined.” In one word, the equality s  D .s0 sp / s0 is the necessary basis on which one can build a consistent theory for weighted rational expressions. And so far we need the hypothesis that the semiring be strong in order to establish it. Nevertheless, all usual semirings are strong. In the sequel, we silently assume that K is a strong semiring and we can conclude that the construction of SE is a -map. Proposition 6.11 ([14] and [44]). If E is a valid weighted expression over A , then jSE j D jEj. The automaton SE has `.E/ C 1 states. Computing SE from (7), (8), and (90 ) is cubic in `.E/ and a star-normal form for weighted expressions is something that does not seem to exist in the general case. Figure 9 shows the standard Q-automaton SE3  1   1  and the standard Z-automaton SE4 associated with associated with E3 D 6 a C 3 b E4 D .1 a/a . 2 3b

4 3a

1 2

1 3a

2 3b

2

1 3a

5 3b

a

a 1

1 2

1

a

1

a

Figure 9. The Q-automaton SE3 and the Z-automaton SE4

Remark 6.12. The definition that we have taken implies that the behaviour of weighted automata is always a well-defined series. In contrast with the case of Boolean automata, this is no longer true if we want to enrich the model of weighted automata with spontaneous transitions, that is, with transitions that are labelled with the empty word together with a coefficient in K. A complete treatment of this question – which can be given indeed several distinct answers – is out of the scope of this chapter and we refer to Chapters 4 and 20 and to other works ([65], [10], [42], [62], [22], and [46]). Let us mention however a point that pertains to our topic. It is very easy to generalise the Thompson construction to weighted expressions and thus to associate with a weighted expression E its Thompson automaton TE . The problem is that, as Thompson construction implies the use of spontaneous transitions, it may happen that TE be not valid even in the case where E is a valid expression. An example is given by the Z-expression .a b  / (cf. [46]). Remark 6.13. The constant reader may have wondered why the equivalences km  mk , with m in M , were missing from the set of trivial identities. It is now understood that it is the necessary definition of kA and Ak that rules them out.

70

Jacques Sakarovitch

6.3.2. The derived-term automaton of a weighted expression. The (left) quotient operation also extends from languages to series: for every s in KhhA ii, and every u in A , u 1 s is defined by hu 1 s; vi D hs; uvi for every v in A . The quotient is a (right) action of A on KhhA ii: .uv/ 1 s D v 1 .u 1 s/. In contrast with the Boolean case, a series in KRat A may have an infinite number of distinct quotients. However, the quotient operation allows us to express a characteristic property of rational series. A subset U of KhhA ii that is closed under quotient, is called a stable subset. Then, a characterisation due to Jacob [37] reads as follows: a series of KhhA ii is rational if and only if it is contained in a finitely generated stable submodule of KhhA ii, cf. [10] and [64] and Theorem 5.1 in Chapter 4. Derivation. The derivation of weighted rational expressions implements the lifting of the quotient of series to the level of expressions. It yields an effective version of the characterisation quoted above. In the sequel, addition in K is written ˚ to distinguish it from the C operator in expressions. The set of (left) linear combinations of K-expressions with coefficients in K is denoted, by abuse, by KhK RatE A i. In the following, Œk E or k E is a monomial in KhK RatE A i whereas .k E/ is an expression in K RatE A . An external right multiplication on KhK RatE A i by an expression and by a scalar is needed in the sequel. It is first defined on monomials by .Œk E  F/  k .E  F/ and .Œk E k 0 /  k .E k 0 / and then extended to KhK RatE A i by linearity. Definition 6.1 ([44]). The derivation of E in K RatE A with respect to a in A, denoted by @@a E, is a linear combination of expressions in K RatE A defined by (10) for the base cases and inductively by the following formulas: @ @ .k E/ D k E; @a @a h @ i  @ .E k/ D E k ; @a @a @ @ @ .ECF/ D E˚ F; @a @a @a h i  @ @ @ .E  F / D E  F ˚ c.E/ F; @a @a @a h @ i  @  .E / D c.E/ E  .E / : @a @a

The last equation is defined only if E is a valid expression. The derivation of an expression with respect to a word u is defined by induction on the length of u: for  @ E D @@a @@u E and the definition of derivation every u in AC and every a in A, @ua ˇ ˇ is consistent with that of quotient of series since for every u in AC , ˇ @@u .E/ˇ D u 1 jEj holds. The derived-term automaton. At § 4.3.1, we have defined the derived terms of a (Boolean) expression as the expressions that occur in a derivation of that expression.

2. Automata and rational expressions

71

Proposition 4.6 then established properties that allow to compute these derived terms, without derivation. For the weighted case, we take the same properties as the definition. Definition 6.2 ([44]). The set TD.E/ of true derived terms of E in K RatE A is inductively defined by

TD.k E/ D TD.E/; TD.E k/ D .TD.E/ k/; TD.E C F/ D TD.E/ [ TD.F/; TD.E  F/ D .TD.E//  F [ TD.F/; TD.E / D .TD.E//  E ; starting from the base cases TD.0/ D TD.1/ D ;; and TD.a/ D ¹1º for every a in A.

TD.E/ is a set of unitary monomials of KhK RatE A i, with Card .TD.E// 6 `.E/. The set of derived terms of E is D.E/ D TD.E/ [ ¹Eº. Theorem 6.14 insures consistency between Definitions 6.1 and 6.2; the usefulness of the latter follows from Theorem 6.15. Theorem 6.14 ([59] and [44]). Let E be in K RatE A and D.E/ D ¹K 1 ; : : : ; K n º. For every a in A, there exist an nn-matrix .a/ with entries in K such that M @ Ki D .a/i;j Kj for all i 2 Œn: (18) @a j 2Œn

The derivation of an expression E in K RatE A with respect to every word in AC is thus a linear combination of derived terms of E. Hence the derived terms of an expression denote the generators of a stable submodule that contains the series denoted by the expression. Theorem 6.14 yields the derived-term automaton of E, AE D I; X; T i, of dimension D.E/, with I D 1K if K i D E and 0K otherwise, X D hL a2A .a/a, and Tj D c.Kj /. The K-derivation is another -map since jAE j D jEj holds. Morphisms and quotients of (Boolean) automata are generalised to Out-morphisms and quotients of K-automata (cf. [64] and [5]). Theorem 4.9 is then extended to the weighted case. Theorem 6.15 ([44]). Let E be in K RatE A . Then AE is a quotient of SE . Remark 6.16. This statement is a justification for Definition 6.2. The monomials that appear in the derivations of an expression E are in D.E/. The converse is not necessarily true when K is not a positive semiring: some derived terms may never occur in a derivation, as it can be observed for instance on the Z-expression E4 D .1 a/a (cf. Figure 10). With a definition of derived terms based on derivation only, Theorem 6.15 would not hold any longer.

Jacques Sakarovitch

72 a

a

1

a 1

1

a

1

a

E4

a

Figure 10. The Z-automaton SE4 and its Z-quotient AE4

7. Notes Most of the material presented in this chapter has appeared in previous work of the author [62], [63], and [64]. A detailed version of this chapter can be found at http://arxiv.org/abs/1502.03573

§ 1. New look at Kleene’s theorem. A detailed history of the development of ideas at the beginning of the theory of automata is given in [54]. Berstel [7] attributes the idea of distinguishing the family of recognisable from that of rational sets to Eilenberg. Besides the already quoted paper of Elgot and Mezei [25], other authors have certainly noticed the equality of expressiveness of automata and expressions beyond free monoids. It is part of Walljasper’s thesis [68]; it can be found in Eilenberg’s treatise [24]. The splitting of Kleene’s theorem was proposed in [61]. § 3. From automata to expressions. First note that this section is mostly of theoretical interest: there are very few practical examples where one would transform an automaton into an expression (cf. [51] and [34]). Identities. As mentioned, the axiomatisation of rational expressions, even hinting at bibliographic references, is out of the scope of this chapter. Conway showed that besides the identities S and P (that are at the basis of the definition of the so-called “Conway semirings,” see Chapter 20), each finite simple group gives rise to an identity that is independent from the others [20]. Krob, who showed that this set of identities is complete, called S and P the aperiodic identities [40]. State-elimination method. The example D3 of Figure 3 is easily generalised so as to find an exponential gap between the length of expressions for two distinct orders. The search for short expressions is performed by heuristics; as reported in [33], the naive one, modified or not as in [21], appears to be good (cf. Chapter 12 for more information on the subject). McNaughton–Yamada algorithm. The McNaughton–Yamada algorithm is the implementation in the semiring of languages of the contemporary Floyd–Roy–Warshall algorithms (in the Boolean or tropical semirings), see [30], [56], and [69]. Star height. The star height of a rational language L is the minimum of the star heights of the rational expressions that denote L. Whether the star height of a language is effectively computable has been a longstanding open problem until it was positively solved first by K. Hashiguchi [35] and then by D. Kirsten [38]. The subject is still the

2. Automata and rational expressions

73

object of research, in particular in connection with the theory of regular cost functions, see [19] and [26]. It is also mentioned in Chapters 5 and 12. § 4. From expressions to automata. The presentation of the standard automaton given here is not the classical one, and not only for the chosen name. The recursive definition, also used in [28] for instance, avoids the definition of First, Last, and Follow functions that are built in most papers on the subject. Based on these functions, other automata may be defined: e.g. in [49] they are used to compute directly the determinisation of SE , in [36] positions with the same image by Follow are merged, giving rise to a possibly smaller automaton, called follow automaton. Attributing derivation to Brzozowski and Antimirov together is an unusual but sensible foreshortening. Original Brzozowski’s derivatives [12] are obtained by replacing “[” by a “C” in (11) and (12). Derivatives are then expressions, and there is a finite number of them, modulo the A , C , and I identities. By replacing the “C” by a “[” in Brzozowski’s definition, Antimirov [4] changed the derivatives into a set of expressions, which he called partial derivatives, as they are “parts” of derivatives. As they are applied to union of sets, and not to expressions, the A , C , and I identities come for free, and are no longer necessary to insure the finiteness of the number of derived terms. A common technique for defining -maps has been the linearisation EN of the expression E, that is, making all letters in E distinct by indexing them by their position in E (e.g. [49] and [36]). Berry and Sethi [6] showed that the (Brzozowski) derivatives of EN coincide with the states of SE , whereas Berstel and Pin [8] observed that jEN j is a x and interpreted the result of Berry and Sethi as the construction of the local language L x. deterministic automaton canonically associated with L The similarity between Mirkin’s prebases [50] and Antimirov’s derived terms was noted by Champarnaud and Ziadi [16], who called equation automaton the derived-term automaton. The notion of derivation of expressions (which are strings of symbols) have recently been generalised to the one of derivation of terms (that represent graphs) [52]. Allauzen and Mohri generalised Proposition 4.5 and Theorem 4.9 and computed AE and the follow automaton of E from TE by quotient and elimination of marked spontaneous transitions [2]. In [15], corrected in [14], an algorithm is given which is a kind of converse of a -map: it recognises if an automaton is the standard automaton SE of an expression E and, in this case, computes such an E in star-normal form. The problem of inverting a € -map has been given a partial answer in [45]: it is possible to compute A from B! .A/ for certain A (and any ! ); this has lead to the definition of a variant of the derivation: the broken derivation, that has been further studied in [3]. § 5. Changing the monoid. Proposition 5.5 leads naturally to consider monoids M in which Rat M D Rec M holds, and which one could call Kleene monoids. In [60] was defined the family of rational monoids which contains all previously known examples of Kleene monoids; still the inclusion is strict [53]. Commutative Kleene monoids, as well as finitely generated submonoids of Rat a are rational monoids, see [57] and [1].

74

Jacques Sakarovitch

§ 6. Introducing weights. The definition of rational (and algebraic) series in noncommuting variables as generalisation of regular (and context-free) languages on one hand-side, as well as the formalisation of rational expressions on the other, date back to the beginning of automata theory in the early 60’s and are usually given origin in famous papers by Schützenberger (see [66] and [18]), Glushkov [32] or Brzozowski [12]. In contrast, the formalisation and usage of weighted rational expressions have appeared much later. Th. Wilke has found a trace of weighted rational expressions in the proceedings of a conference held in Princeton in 1971, in a one-page paper by Stanat, but real publication on the subject seem to begin in the first years of 21st century only, see [58], [14], [59], [44], and [28]. By replacing quotient and derivation by co-induction, Rutten formulated the equivalent of Theorem 6.14, see [59]. Krob [41] and Berstel and Reutenauer [9] have considered “weighted rational expressions” slightly different from those expression dealt with in this chapter. With their differentiation and derivation, they have tackled different problems than the construction of -maps. Acknowledgements. The author is grateful to Z. Ésik and J. Brzozowski who read a first draft of this chapter and made numerous and helpful remarks and to H. Gruber who twice saved him from stating erroneous assertions and hinted to references given in § 3 of this Notes Section. P. Gastin and A. Demaille sent corrections on the first version, R. Sinya found typos in the final one. The careful reading of the final version by A. Szilard and his numerous suggestions have been very encouraging and most helpful, and are heartily acknowledged. The last rigorous reading of J. Shallit is also gratefully acknowledged.

References [1] S. Afonin and E. Khazova, On the structure of finitely generated semigroups of unary regular languages. Internat. J. Found. Comput. Sci. 21 (2010), no. 5, 689–704. MR 2728319 Zbl 1207.68179 q.v. 73 [2] C. Allauzen and M. Mohri, A unified construction of the Glushkov, Follow, and Antimirov automata. In Mathematical foundations of computer science 2006 (R. Královič and P. Urzyczyn, eds.), Proceedings of the 31st International Symposium (MFCS2006) held in Stará Lesná, August 28–September 1, 2006. Lecture Notes in Computer Science, 4162. Springer, Berlin, 2006, 110–121. MR 2298170 Zbl 1132.68434 q.v. 73 [3] P. Y. Angrand, S. Lombardy, and J. Sakarovitch, On the number of broken derived terms of a rational expression. J. Autom. Lang. Comb. 15 (2010), no. 1–2, 27–51. MR 3801314 Zbl 1345.68196 q.v. 60, 73 [4] V. Antimirov, Partial derivatives of regular expressions and finite automaton constructions. Theoret. Comput. Sci. 155 (1996), no. 2, 291–319. MR 1379579 Zbl 0872.68120 q.v. 58, 60, 73 [5] M.-P. Béal, S. Lombardy, and J. Sakarovitch, Conjugacy and equivalence of weighted automata and functional transducers. In Computer science – theory and applications

2. Automata and rational expressions

[6] [7]

[8] [9] [10]

[11] [12] [13] [14] [15] [16] [17] [18] [19]

[20] [21]

75

(D. Grigoriev, J. Harrison, and E. A. Hirsch, eds.). Proceedings of the 1st International Symposium on Computer Science in Russia (CSR 2006) held in St. Petersburg, June 8–12, 2006. Lecture Notes in Computer Science, 3967. Springer, Berlin, 2006, 58–69. MR 2260982 Zbl 1185.68381 q.v. 71 G. Berry and R. Sethi, From regular expressions to deterministic automata. Theoret. Comput. Sci. 48 (1986), no. 1, 117–126. MR 0889664 Zbl 0626.68043 q.v. 73 J. Berstel, Transductions and context-free languages. Leitfäden der angewandten Mathematik und Mechanik, 38. B. G. Teubner, Stuttgart, 1979. MR 0549481 Zbl 0424.68040 q.v. 72 J. Berstel and J.-É. Pin, Local languages and the Berry–Sethi algorithm. Theoret. Comput. Sci. 155 (1996), no. 2, 439–446. MR 1379585 Zbl 0872.68116 q.v. 73 J. Berstel and C. Reutenauer, Extension of Brzozowski’s derivation calculus of rational expressions to series over the free partially commutative monoids. Theoret. Comput. Sci. 400 (2008), no. 1–3, 144–158. MR 2424348 Zbl 1145.68030 q.v. 74 J. Berstel and C. Reutenauer, Noncommutative rational series with applications. Encyclopedia of Mathematics and its Applications, 137. Cambridge University Press, Cambridge, 2011. (New version of Rational series and their languages. EATCS Monographs on Theoretical Computer Science, 12. Springer, Berlin, 1988.) MR 2760561 Zbl 1250.68007 q.v. 69, 70 A. Brüggemann-Klein, Regular expressions into finite automata. Theoret. Comput. Sci. 120 (1993), no. 2, 197–213. MR 1247207 Zbl 0811.68096 q.v. 57 J. A. Brzozowski, Derivatives of regular expressions. J. Assoc. Comput. Mach. 11 (1964), 481–494. MR 0174434 Zbl 0225.94044 q.v. 58, 73, 74 J. A. Brzozowski and E. J. McCluskey, Signal flow graph techniques for sequential circuit state diagrams. IEEE Trans. Electronic Computers 12 (1963), 67–76. Zbl 0119.12903 IEEEXplore 4037802 q.v. 47 P. Caron and M. Flouret, Glushkov construction for series: the non commutative case. Int. J. Comput. Math. 80 (2003), no. 4, 457–472. MR 1983304 Zbl 1033.68058 q.v. 69, 73, 74 P. Caron and D. Ziadi, Characterization of Glushkov automata. Theoret. Comput. Sci. 233 (2000), no. 1–2, 75–90. MR 1732178 Zbl 0952.68084 q.v. 73 J.-M. Champarnaud and D. Ziadi, From Mirkin’s prebases to Antimirov’s word partial derivatives. Fund. Inform. 45 (2001), no. 3, 195–205. MR 2036701 Zbl 0976.68098 q.v. 73 J.-M. Champarnaud and D. Ziadi, Canonical derivatives, partial derivatives and finite automaton constructions. Theoret. Comput. Sci. 289 (2002), no. 1, 137–163. MR 1932893 Zbl 1061.68109 q.v. 60 N. Chomsky and M.-P. Schützenberger. The algebraic theory of context-free languages. In Computer programming and formal systems (P. Brattfort and D. Hirschberg, eds.). NorthHolland Publishing Co., Amsterdam, 1963, 118–161. MR 0152391 Zbl 0148.00804 q.v. 74 T. Colcombet and C. Löding, The nesting-depth of disjunctive -calculus for tree languages and the limitedness problem. In Computer science logic (M. Kaminski and S. Martini, ed.). Proceedings of the 22nd International Workshop (CSL 2008), the 17th Annual Conference of the EACSL, held in Bertinoro, September 16–19, 2008. Lecture Notes in Computer Science, 5213. Springer, Berlin, 2008, 416–430. MR 2540259 Zbl 1156.68451 q.v. 73 J. H. Conway, Regular algebra and finite machines. Chapman and Hall, London, 1971. Zbl 0231.94041 q.v. 48, 51, 72 M. Delgado and J. Morais, Approximation to the smallest regular expression for a given regular language. In Implementation and application of automata (M. Domaratzki, A. Okhotin,

76

[22]

[23] [24] [25] [26]

[27] [28]

[29] [30] [31] [32]

[33]

[34]

[35] [36] [37]

Jacques Sakarovitch K. Salomaa, and S. Yu, eds.). Proceedings of the 9th International Conference, CIAA 2004, Kingston, Canada, July 22–24, 2004. Lecture Notes in Computer Science 3317. Springer, Berlin, 312–314. MR 2144483 Zbl 1115.68428 q.v. 72 M. Droste, W. Kuich, and H. Vogler (eds.), Handbook of weighted automata. Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 2009. MR 2777706 Zbl 1200.68001 q.v. 69 L. C. Eggan, Transition graphs and the star-height of regular events. Michigan Math. J. 10 (1963), 385–397. MR 0157840 Zbl 0173.01504 q.v. 52, 53 S. Eilenberg, Automata, languages and machines. Vol. A. Pure and Applied Mathematics, 58. Academic Press, New York, 1974. MR 0530382 Zbl 0317.94045 q.v. 72 C. C. Elgot and J. E. Mezei, On relations defined by generalized finite automata. IBM J. Res. Develop 9 (1965), 47–68. MR 0216903 Zbl 0135.00704 q.v. 62, 72 N. Fijalkow, H. Gimbert, E. Kelmendi, and D. Kuperberg, Stamina: stabilisation monoids in automata theory. In Implementation and application of automata (in A. Carayol and C. Nicaud, eds.) Proceedings of the 22nd International Conference (CIAA 2017) held in Marne-la-Vallée, June 27–30, 2017, 101–112. MR 3677608 Zbl 06763317 q.v. 73 P. C. Fischer and A. L. Rosenberg, Multitape one-way nonwriting automata. J. Comput. System Sci. 2 (1968) 88–101. MR 0246717 Zbl 0159.01504 q.v. 63 S. Fischer, F. Huch, and T. Wilke, A play on regular expressions: functional pearl. Proceedings of the 15 th ACM SIGPLAN international conference on functional programming (P. Hudak and S. Weirich, eds.). ICFP 2010, Baltimore, MD, September 27–29, 2010. ACM SIGPLAN Notices 45, no. 9. Association for Computing Machinery, 2010, 357–368. Zbl 1323.68111 q.v. 73, 74 M. Fliess, Deux applications de la représentation matricielle d’une série rationnelle non commutative. J. Algebra 19 (1971), 344–353. MR 0321361 Zbl 0222.16001 q.v. 63 R. W. Floyd, Algorithm 97: shortest path. Comm. Assoc. Comput. Mach. 5 (1962), 345. q.v. 72 S. Ginsburg and E. H. Spanier, Semigroups, Presburger formulas and languages. Pacific J. Math. 16 (1966), 285–296. MR 0191770 Zbl 0143.01602 q.v. 63 V. M. Glushkov, Abstract theory of automata. Uspehi Mat. Nauk 16 (1961), no. 5(101), 3–62. In Russian. English translation, Russ. Math. Surv. 16 (1961), no. 5, 1–53. MR 0138529 Zbl 0104.35404 q.v. 55, 74 H. Gruber, M. Holzer, and M. Tautschnig, Short regular expressions from finite automata: empirical results. In Implementation and application of automata (S. Maneth, ed.). Proceedings of the 14th International Conference (CIAA 2009) held at the University of New South Wales, Sydney, July 14–17, 2009. Lecture Notes in Computer Science, 5642. Springer, Berlin, 2009, 188–197. MR 2550023 Zbl 1248.68296 q.v. 72 T. Han, J. Katoen, and B. Damman, Counterexample generation in probabilistic model checking. IEEE Trans. Software Eng. 35 (2009), no. 2, 241–257. IEEEXplore 4770111 q.v. 72 K. Hashiguchi, Algorithms for determining relative star height and star height. Inform. and Comput. 78 (1988), no. 2, 124–169. MR 0955580 Zbl 0668.68081 q.v. 72 L. Ilie and S. Yu, Follow automata. Inform. and Comput. 186 (2003), no. 1, 140–162. MR 2001743 Zbl 1059.68063 q.v. 73 G. Jacob, Représentations et substitutions matricielles dans la théorie algébrique des transductions. Ph.D. thesis. Université Paris VII, Paris, 1975. q.v. 70

2. Automata and rational expressions

77

[38] D. Kirsten, Distance desert automata and the star height problem. Theor. Inform. Appl. 39 (2005), no. 3, 455–509. MR 2157045 Zbl 1082.20041 q.v. 72 [39] S. C. Kleene, Representation of events in nerve nets and finite automata. In Automata studies (C. E. Shannon and J. McCarthy, eds.). Annals of Mathematics Studies, 34. Princeton University Press, Princeton, N.J., 1956, 3–42. MR 0077478 q.v. 39 [40] D. Krob, Complete systems of B-rational identities. Theoret. Comput. Sci. 89 (1991), no. 2, 207–343. MR 1133622 Zbl 0737.68053 q.v. 48, 72 [41] D. Krob, Differentiation of K -rational expressions. Internat. J. Algebra Comput. 2 (1992), no. 1, 57–87. MR 1167528 Zbl 0785.68065 q.v. 74 [42] W. Kuich and A. Salomaa, Semirings, automata, languages. EATCS Monographs on Theoretical Computer Science, 5. Springer, Berlin, 1986. MR 0817983 Zbl 0582.68002 q.v. 69 [43] S. Lombardy and J. Sakarovitch, On the star height of rational languages. In Words, languages & combinatorics III (M. Ito and T. Imaoka, eds.). Proceedings of the International Conference held at Kyoto Sangyo University, Kyoto, March 14–18, 2000. World Scientific, River Edge, N.J., 2003, 266–285. MR 2028881 q.v. 53 [44] S. Lombardy and J. Sakarovitch, Derivation of rational expressions with multiplicity. Theoret. Computer Sci. 332 (2005), 141–177. q.v. 68, 69, 70, 71, 74 [45] S. Lombardy and J. Sakarovitch, How expressions can code for automata. Theor. Inform. Appl. 39 (2005), no. 1, 217–237. Corrigendum, RAIRO Theor. Inform. Appl. 44 (2010), no. 3, 339–361. MR 2132589 MR 2761523 (corrigendum) Zbl 1102.68070 Zbl 1216.68148 (corrigendum) q.v. 73 [46] S. Lombardy and J. Sakarovitch, The validity of weighted automata. Internat. J. Algebra Comput. 23 (2013), no. 4, 863–913. MR 3078061 Zbl 1290.68076 q.v. 69 [47] D. Madore and J. Sakarovitch, An example of a non strong Banach algebra. In preparation. q.v. 68 [48] J. D. McKnight, Jr., Kleene quotient theorems. Pacific J. Math. 14 (1964), 1343–1352. MR 0180612 Zbl 0144.01201 q.v. 62 [49] R. McNaughton and H. Yamada, Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9 (1960), 39–47. Zbl 0156.25501 q.v. 49, 55, 73 [50] B. G. Mirkin, An algorithm for constructing a base in a language of regular expressions. Engineering Cybernetics 5 (1966), 51–57. q.v. 60, 73 [51] P. H. Morris, R. A. Gray, and R. E. Filman, GOTO removal based on regular expressions. J. of Software Maintenance 9 (1997), 47–66. q.v. 72 [52] Y. Nakamura, Partial derivatives on graphs for Kleene allegories. In 2017 32 nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). 20–23 June 2017, Reykjavík, Iceland. Selected papers from the symposium held at Reykjavík University. IEEE Press, Los Alamitos, CA, 2017, 12 pp. MR 3776958 IEEEXplore 8005132 q.v. 73 [53] M. Pelletier and J. Sakarovitch, Easy multiplications II. Extensions of rational semigroups. Inform. and Comput. 88 (1990), no. 1, 18–59. MR 1068204 Zbl 0705.20057 q.v. 73 [54] D. Perrin, Les débuts de la théorie des automates. Tech. Sci. Inform. 14 (1995), 409–443. q.v. 72 [55] M. O. Rabin and D. Scott, Finite automata and their decision problems. IBM J. Res. Develop. 3 (1959) 114–125. MR 0103795 Zbl 0158.25404 q.v. 63 [56] B. Roy, Transitivité et connexité. C. R. Acad. Sci. Paris 249 (1959) 216–218. MR 0109792 Zbl 0109792 q.v. 72

78

Jacques Sakarovitch

[57] C. P. Rupert, On commutative Kleene monoids. Semigroup Forum 43 (1991), no. 2, 163–177. MR 1114688 Zbl 0762.68048 q.v. 73 [58] J. M. Rutten, Automata, power series, and coinduction: taking input derivatives seriously. In Automata, languages and programming (J. Wiedermann, P. van Emde Boas and M. Nielsen, eds.), Proceedings of the 26th International Colloquium (ICALP ’99) held in Prague, July 11–15, 1999. Lecture Notes in Computer Science, 1644. Springer, Berlin, 1999, 645–654. MR 1731524 Zbl 0941.68638 q.v. 74 [59] J. M. Rutten, Behavioural differential equations: a coinductive calculus of streams, automata, and power series. Theoret. Comput. Sci. 308 (2003), no. 1–3, 1–53. MR 2014573 Zbl 1071.68050 q.v. 71, 74 [60] J. Sakarovitch, Easy multiplications I. The realm of Kleene’s theorem. Inform. and Comput. 74 (1987), no. 3, 173–197. MR 0906959 Zbl 0642.20043 q.v. 73 [61] J. Sakarovitch, Kleene’s theorem revisited. In Trends, techniques, and problems in theoretical computer science (A. Kelemenová and J. Kelemen, eds.). Papers from the fourth international meeting of young computer scientists held in Smolenice, October 13–17, 1986. Lecture Notes in Computer Science, 281. Springer, Berlin, 1987, 39–50. MR 0921502 Zbl 0637.68096 q.v. 72 [62] J. Sakarovitch, Éléments de théorie des automates. Vuibert Informatique, Paris, 2003. English translation, Elements of automata theory. Cambridge University Press, 2009. Translated by R. Thomas. Cambridge University Press, Cambridge, 2009. MR 2567276 Zbl 1188.68177 (English ed.) Zbl 1178.68002 (French ed.) q.v. 42, 49, 51, 69, 72 [63] J. Sakarovitch, The language, the expression and the (small) automaton. In Implementation and application of automata. (J. Farré, I. Litovsky and S. Schmitz, eds.). Revised selected papers from the 10th International Conference (CIAA 2005) held in Sophia Antipolis, June 27–29, 2005. Lecture Notes in Computer Science, 3845. Springer, Berlin, 2006, 15–30. MR 2214022 Zbl 1172.68526 q.v. 72 [64] J. Sakarovitch, Rational and recognisable power series. In Handbook of weighted automata (M. Droste, W. Kuich, and H. Vogler, eds.). Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 2009, Chapter 4, 105–174. MR 2777730 q.v. 70, 71, 72 [65] A. Salomaa and M. Soittola, Automata-theoretic aspects of formal power series. Texts and Monographs in Computer Science. Springer, Berlin, 1978. MR 0483721 Zbl 0377.68039 q.v. 69 [66] M. P. Schützenberger, On the definition of a family of automata. Information and Control 4 (1961), 245–270. MR 0135680 Zbl 0104.00702 q.v. 74 [67] K. Thompson, Regular expression search algorithm. Comm. Assoc. Comput. Mach. 11 (1968), 419–422. Zbl 0164.46205 q.v. 57 [68] S. J. Walljasper, Non-deterministic automata and effective languages. Ph.D. thesis. The University of Iowa, Iowa city, 1970. MR 2619207 q.v. 72 [69] S. Warshall, A theorem on Boolean matrices. J. Assoc. Comput. Mach. 9 (1962), 11–12. MR 0149688 Zbl 0118.33104 q.v. 72 [70] D. Wood, Theory of computation. Harper & Row Computer Science and Technology Series. Harper & Row, New York, 1987. MR 1094567 Zbl 0734.68001 q.v. 47 [71] S. Yu, Regular languages. In Handbook of formal languages (G. Rozenberg and A. Salomaa, eds.). Vol. 1. Word, language, grammar. Springer, Berlin, 1997, Chapter 2, 41–110. MR 1469994 q.v. 47

Chapter 3

Finite transducers and rational transductions Tero Harju and Juhani Karhumäki

Contents 1. 2. 3. 4. 5. 6.

Introduction . . . . . . . . . . Basic definitions . . . . . . . . Morphic representations . . . . Applications . . . . . . . . . . Undecidability in transductions Further reading . . . . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

79 80 88 92 98 104

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

105

1. Introduction Rational transductions are relations computed by generalised finite automata with input and output. Their history goes back to the early days of automata theory, especially to the paper written by Rabin and Scott [71] in 1959. Later in 1965 Elgot and Mezei [21] published a systematic study on multitape automata, and they presented several fundamental results on rational transductions, including their closure under composition. The chapter discusses basic properties of finite transducers, focusing on applications and a few recent aspects of the topic. In particular, different kinds of characterisations, as well as, decision questions are analysed. We adopt a concrete approach to transductions through words. For a more general approach through monoids, we refer to the book by Sakarovitch [73] which gives a comprehensive study of automata theory including an extensive survey of transductions. Also, a topological approach to the topic is given by Pin and Silva [68]. 1.1. Words and morphisms. Compositions of morphisms and inverse morphisms have a visible role in this chapter. These compositions are applied to represent transductions, giving a nice duality between abstract machines and basic algebraic notions concerning free monoids. In this chapter a transduction is a multivalued mapping  W A ! 2B that maps words to subsets of words. In general, a transduction is a partial mapping in the sense that the image .u/ of a word can be the empty set. We fix now terminology for words and morphisms. For definitions of language and automata theory that are not explained here, we refer to the textbooks of Berstel [5], Eilenberg [20], Harrison [45], or Salomaa [74] and [76].

Tero Harju and Juhani Karhumäki

80

Let hW A ! B  be a morphism. It is called  nonerasing if h.a/ ¤ " for all a 2 A;

 uniform if jh.a/j D jh.b/j for all a; b 2 A.

In the theory of formal languages there are numerous representations of language families in terms of simple operations. A classical example of such a result is the representation of the regular languages in terms of rational operations: the regular sets constitute the smallest family of sets that include the empty set ; and the singleton sets ¹wº, and which is closed under the operations union (L1 [ L2 ), concatenation (L1 L2 ) and the Kleene closures (L and LC ). Let H D ¹h j hW A ! B  a morphism for some A and Bº

denote the family of all morphisms between finitely generated word monoids, and let H

1

D ¹h

1

j h 2 Hº

denote the family of inverse morphisms. For two morphisms gW B  ! A and hW B  ! C  with the same domain alphabet, we have the relational composition hg

1

.w/ D ¹v j there exists u 2 B  such that g.u/ D w and h.u/ D vº: 



Every morphism hW A ! B  induces the language operations hW 2A ! 2B   and h 1 W 2B ! 2A in a natural way; see Chapter 1. The compositions of these operations are called morphic compositions. We trivially have that H ı H D H and H 1 ı H 1 D H 1 , and therefore each morphic composition  can be written as an alternating composition of morphisms and inverse morphisms, i.e.,  D h"nn h"nn 11    h"11 ;

where "j C "j

1

D 0;

(1)

where "j 2 ¹ 1; C1º for each j D 1; 2; : : : ; n (and hC1 denotes h). We shall show in § 3 that the length n of such a composition can always be reduced to four or less.  A many-valued mapping W A ! 2B is a finite substitution if the image .a/  B  is a finite set for all a 2 A and  is a morphism between the corresponding monoids: .uv/ D .u/.v/ for all u; v 2 A , where the latter product is that of subsets of words.

2. Basic definitions In this section we define finite transducers and the relations, called rational transductions, defined by these transducers. We state the main closure properties of rational transductions, and we give examples of relations that are definable and also relations that are not definable by finite transducers.

3. Finite transducers and rational transductions

81

2.1. Finite transducers. A finite transducer is a generalisation of finite automaton in the sense that a transducer also has a set of output words for each input word. Formally, a transducer is a 6-tuple T D .Q; A; B; E; q; F / (2) composed of  a finite set Q of states containing the initial state q and the set F of the final states;  (finite) alphabets A and B that are the input and output alphabets, respectively;  and a finite relation E of transitions E  Q  A  B   Q:

We also write

u;v

for .q; u; v; p/ 2 E; where u is the input word and v the output word of the transition. With this presentation, a finite transducer can be represented as a directed graph with labelled edges. The labels are represented as ordered pairs .u; v/. The initial state will be marked by an incoming arrow, and each final state is presented as a double circle; see Example 2.1. q !p

Example 2.1. In Figure 1, we have A D ¹a; bº and B D ¹0; 1º. In this example the initial state of the transducer T is q0 , and the set of final states is F D ¹q3 º. b; 01 q1 a; 01

ab; 1 q0

bb; 11

a; 1

q3

a; "

q2

b; 1

ba; 0

Figure 1. A finite transducer T of four states. The initial state is marked by an incoming arrow, and the unique final state is presented by a double circle.

The transducer T D .Q; A; B; E; q; F / is said to be

 (input) deterministic, if for each state q and word u there exists at most one u;v state p such that q ! p for some word v (the output word v need not be unique);  simple if the initial state is the unique final state: F D ¹qº;  "-free if E  Q  A  B C  Q, i.e., it never writes an empty word in one step;  sequential (or a generalised sequential machine) if E  Q  A  B   Q, i.e., it always reads a single letter in one step.

Tero Harju and Juhani Karhumäki

82

Moreover, a sequential transducer is a deterministic sequential transducer if for all q 2 Q and a 2 A there is at most one transition .q; a; w; p/ 2 E , i.e., E is a partial function Q  A ! B   Q.

Example 2.2. Deterministic sequential transducers are known as Mealy machines if the transition function allows writing letters only, i.e., the partial function is from Q A to B  Q. In other words, the output depends only on the current state and the current a;u a;v input letter. That is, if q ! p1 and q ! p2 are transitions, then u D v and p1 D p2 . On the other hand, a Moore machine is a sequential transducer where the output depends only on the current state and does not depend on the input letter. Each finite transducer T can be regarded as a nondeterministic two-tape finite u;v automaton, where a transition q ! p is interpreted as follows: while in state q the automaton T reads simultaneously the word u from the first tape and the word v from the second tape. Both tapes are one way, and thus a step in a computation can be described as the nondeterministic action q  .ux; vy/ 7 ! p  .x; y/:

This definition extends naturally to the multitape automata already considered by Rabin and Scott [71]. Let T be a transducer given in (2), and consider the set E of transitions also as an alphabet. A sequence of transitions ˛ D t1 t2    tn

with ti D .pi ; ui ; vi ; pi C1 / 2 E for i D 1; 2; : : : ; n

(3)

is a computation of T from the state p1 2 Q to the state pnC1 2 Q. This can be described graphically as follows: ˛W p1

u1 ;v1

! p2

u2 ;v2

! 

un ;vn

! pnC1 :

The word u1 u2    un is called the input in ˛ , and v1 v2    vn is the output of ˛ according u;v u;v to T . We also write p ! q or simply p ! q , if there exists a computation ˛ from p to q with input u and output v . The computation ˛ in (3) is accepting if p1 is the initial state and pnC1 is a final state. 2.2. Relations realised by transducers. We say that the transducer T D .Q; A; B; E; q; F / realises the relation  D ¹.u; v/ j u is the input and v the output word of an accepting computation ˛ of Tº:

A relation   A  B  is called a rational transduction or a finite–state transduction if it is realised by a finite transducer. We may consider a rational transduction   A B   also as a function W A ! 2B , where .u/ D ¹v j .u; v/ 2 º. The domain of  is the set dom./ D ¹w j .w/ ¤ ;º. We let T D ¹ j  is a rational transductionº

denote the full family of rational transductions.

3. Finite transducers and rational transductions

83

Example 2.3. The transducer T0 of Figure 2 is simple and sequential. It maps deterministically an input binary word w0 to its cyclic shift 0w . The empty word is fixed by T0 . Now T0 realises the rational transduction 0 D ¹.w0; 0w/ j w 2 ¹0; 1ºº [ ¹."; "/º. We have dom.0 / D ¹0; 1º 0 [ ¹"º. 0; 0

1; 1 0; 1

q0

q1 1; 0

Figure 2. A finite transducer T0 of two states

Similarly, the relation 1 D ¹.w1; 1w/ j w 2 ¹0; 1ºº [ ¹."; "/º is rational, and so is their union  D 0 [ 1 , which can be shown by introducing a new initial state where transitions labelled by the pair ."; "/ lead to the initial states of the transducers T0 and T1 . Example 2.4. The transducer T of Figure 3, where all states are final, computes division by 3, i.e., it realises the function jnk : n7 ! 3 The state qk represents the remainder k for k D 0; 1; 2. 0; 0

1; 1 1; 1

0; 1

q0

q1

q2

1; 0

0; 0

Figure 3. A finite transducer realising division by 3

Each rational transduction can be realised by a normalised transducer in the following sense, where each long transition p p

a1 ;"

! p1 !    ! pn

an ;" 1

a1 an ;b1 bm

! pn

! q is split into short transitions

";b1

! q1 !    ! qm

";bm 1

! q;

where each intermediate pi and qi is a new state. Theorem 2.1. Let   A  B  be a rational transduction. Then  is realised by a finite transducer T with a state set Q such that the transitions satisfy E  Q  .A [ ¹"º/  .B [ ¹"º/  Q:

Tero Harju and Juhani Karhumäki

84

Let   A  B  be a rational transduction. Then  is finite-valued if j.u/j < 1 for all u 2 A . If here j.u/j 6 1, then  is said to be a rational function. Note that a rational function is a partial function, i.e., it need not map every input word. The following result due to Elgot and Mezei [21] is fundamental to rational transductions; for a proof see Berstel [5]. Also, the result can be proved using the morphic representation of rational transductions; see Theorem 3.7. There one only needs to show directly that morphic compositions are rational, see Theorem 2.4. Theorem 2.2. Rational transductions are closed under composition: if 1 and 2 are rational transductions, so is 2 ı 1 , where the composition is taken relationwise, i.e., v 2 2 ı 1 .u/ if and only if there exists a word w such that w 2 1 .u/ and v 2 2 .w/. The closure of rational transductions under taking inverses is more straightforward; see e.g., Aho and Ullman [1]. This follows also from the equivalence of rational transductions and rational relations; see Theorem 2.8. Theorem 2.3. Rational transductions are closed under inverse mappings: if  is a rational transduction, so is  1 (taken as the inverse relation). It is an easy exercise to show that morphisms are rational relations. Indeed, for a morphism hW A ! B  let T be the finite transducer with a single state q together with a;h.a/

the transitions q ! q . Then T realises h as a relation. It follows that the rational transductions are closed under taking morphic images: if  is a rational relation, so is ¹.u; h.v// j v 2 .u/º. By Theorem 2.3, the inverse morphism h 1 is also rational, and hence all morphic compositions are rational. We have the following closure properties of rational transductions. Theorem 2.4. The family T of rational transductions is closed under the operations finite union, concatenation, iteration, and morphic compositions. Example 2.5. Let X D ¹w1 ; : : : ; wn º  AC be a finite set of nonempty words and let V D ¹x1 ; : : : ; xn º be an alphabet of unknowns. Define a morphism ˛W V  ! X  by ˛.xi / D wi for each i D 1; 2; : : : ; n. Write

ker.˛/ D ¹.u; v/ 2 V   V  j ˛.u/ D ˛.v/º

(4)

for the kernel of the morphism ˛ , that is, ker.˛/ consists of the solutions of all equations satisfied by the set X . Since ker.˛/ D ˛ 1 ˛ , we have that ker.˛/ is a rational transduction. 2.3. Nonrational transductions. Let  be a rational transduction realised by a finite transducer T . It is immediate that if card ..w// D 1 for some word w , then T ";v necessarily has a computation p ! p for some state p and a nonempty output word v . Also, if  is finite valued, i.e., card ..w// < 1 for all words w , then there exists a ";v constant k such that in any accepting computation a subcomputation p ! q satisfies ";v jvj 6 k . In particular, T can be assumed not to contain any loops p ! p that read

3. Finite transducers and rational transductions

85

the empty word. From this observation, we obtain the following restrictive property of rational transductions. Lemma 2.5. If a rational transduction  is finite valued, then there exists a constant k such that jvj 6 kjuj for all v 2 .u/. Example 2.6. Let A D ¹a; bº and B D ¹bº. A word v D a1 a2    an is a (scattered) subword of a word w if w D w1 a1 w2 a2    an wnC1 for some factors wi 2 A of w . Consider the function W A ! B  defined by .w/ D b m if w has exactly m subwords ab . Then for w D an b n , we have card..w// D n2 , and hence, by Lemma 2.5,  is not a rational function. By designing an appropriate reconstruction of a realising transducer, we can easily deduce the following result. Lemma 2.6. Let   A  B  be a rational transduction and R  A a regular language. Then the image .R/  B  is a regular language. Example 2.7. Let bin.n/ D a0 a1    ak denote the binary representation of the nonnegative natural number n D a0 C a1 2 C    C ak 2k . A pair .bin.n/; bin.m// is represented over ¹.0; 0/; .1; 0/; .0; 1/; .1; 1/º as follows where we assume for convenience that bin.n/ and bin.m/ have the same length k . For bin.n/ D a0 a1    ak and bin.m/ D b1 b2    bk , let

bin.n; m/ D .a0 ; b0 /.a1 ; b1 /    .ak ; bk /:

Define a relation  by .bin.n; m// D bin.nm/. Now, one has both bin.2k 1/ D 1k and bin.2k 1 / D 0k 1 1 as well as bin..2k 1/2k 1 / D 0k 1 1k . The language k 1 R D ¹.1; 0/ .1; 1/ j k > 1º is regular, but its image .R/ D ¹0k 1 1k j k > 1º is nonregular. By Lemma 2.6 binary multiplication is not realisable by finite transducers. Example 2.8. Let A D ¹a; bº be a binary alphabet, and define a Parikh-type function W A ! A by .w/ D an b m ;

where jwja D n and jwjb D m:

Choose a regular subset R D .ab/  A . Then its image .R/ D ¹an b n j n > 0º is a nonregular language. Hence  is not a rational function even though both its domain A and image .A / D a b  are regular languages. Since finite transducers have the same general structure as finite automata, the pumping lemma also holds for rational transductions. The pumping in Theorem 2.7 comes from the loops in the realising finite transducer. Theorem 2.7. For each rational transduction   A  B  , there exists a positive integer m such that if v 2 .u/ and juj C jvj > m, then u D u1 w1 u2 and v D v1 w2 v2 with 0 < jw1 j C jw2 j 6 m and v1 w2n v2 2 .u1 w1n u2 / for all n > 0.

86

Tero Harju and Juhani Karhumäki

2.4. Rational relations and functions. Rational transductions are equivalent to rational relations, i.e., rational subsets of the direct product monoids A  B  , where the product is taken pointwise: .u1 ; u2 /  .u2 ; v2 / D .u1 u2 ; v1 v2 /; see Chapter 2 for a more general aspect of this topic. By definition the family of rational relations in A  B  is the smallest family Rat D Rat.A  B  / of subsets such that 1. Rat contains ; and the singletons ¹.u; v/º for all u 2 A and v 2 B  ; 2. if 1 ; 2 2 Rat, then also 1 [ 2 ; 1  2 2 Rat; S S1 n n  3. if  2 Rat, then also C D 1 nD1  2 Rat and  D nD0  2 Rat.

Although the definitions of finite transducers and rational relations are very different they produce the same family of relations. This was shown by Elgot and Mezei [21] in 1965. Theorem 2.8. A relation   A  B  is a rational transduction if and only if  is a rational relation in Rat.A  B  /. Example 2.9. Let A D ¹a; b; cº, and consider Rat.A  A /. Both of the relations R D ¹.an ; b m c n / j n; m > 0º

and S D ¹.am ; b m c n / j n; m > 0º

are easily seen to be rational relations. Indeed, one has the equalities R D ."; b/ .a; c/ and S D .a; b/ ."; c/ . However their intersection R \ S D ¹.an ; b n c n / j n > 0º

is not rational by the pumping lemma. Hence the family of rational relations, and thus rational transductions, are not closed under intersection. Also, Rat.A  A / is not closed under complementation. This follows from the above, since, letting X c denote the complement of a set X , then X \ Y D .X c [ Y c /c , and we have already seen that Rat.A  A / is closed under union.

Rational functions form an important subfamily of rational transductions, and they are in many respects easier than the general ones. Indeed, as we shall see in Theorem 5.2, the equivalence problem is undecidable for rational transductions in general, but according to Schützenberger [78] and Blattner and Head [6] the problem is decidable for rational functions; see also Berstel [5]. This result also can be deduced from the Equality Theorem of Eilenberg [20] or its generalisation [40]; see Sakarovitch [73]. More generally it was shown by Gurari and Ibarra [36] that it is decidable, even in polynomial time, whether or not a rational transduction is k -valued for a nonnegative k . Rational functions are not defined directly by finite transducers. However, a result due to Eilenberg [20] relates rational functions to a specific type of transducers. For this, let T be a sequential transducer realising a relation   A  B  such that T satisfies the following uniqueness condition .p; a; u; q/ 2 E and .p; a; v; q/ 2 E H) u D v:

Then T is said to be unambiguous if each input u 2 A has at most one accepting computation T .

3. Finite transducers and rational transductions

87

The following is due to Eilenberg [20]. Note that the converse of the statement is trivially true. Theorem 2.9. For each rational function  there exists an unambiguous finite transducer realising  . The proof of Theorem 2.9 uses the following cross-section theorem. Theorem 2.10. Let hW A ! B  be a morphism, and R  A a regular language. There exists a regular subset X  R such that h restricted to X is a bijection onto h.R/. 2.5. Matrix representations. Consider the family S of the regular languages contained in B  for an alphabet B . We allow the operations of concatenation and union of languages in S . Thus, in algebraic terms, S will form a semiring with an identity element ¹"º. Let Q be a finite set, and denote by S QQ the set of all matrices with entries in S . Hence each matrix M 2 S QQ can be regarded as a function Q  Q ! S . The identity matrix I is the one with I.p;p/ D ¹"º and I.p;q/ D ; if p ¤ q . Let A be an alphabet, and let W A ! S QQ be a monoid morphism. More specifically, ."/ D I , the identity matrix of S , and for all words u; v 2 A , .uv/ D .u/.v/. A matrix representation M D h ; Q; q; F i

consists of a finite set Q (called states), an initial state q , a set of final states F , and a semigroup morphism W A ! S QQ . We say that M realises a transduction   A  B  if [ .u/q;p : .u/ D p2F

Theorem 2.11. Let   A  B  be a relation. Then  is a rational transduction if and only if there exists a matrix representation M realising  . The inverse relation of a rational transduction is also a rational transduction. Pin and Sakarovitch [67] showed how to obtain the inverse in terms of matrix representations. Matrix representations are used by Reutenauer and Schützenberger [72] for the minimisation problem of rational functions. The problem area of minimisation still has many unanswered questions. Also, rational transductions can be characterised in terms of formal power series; see Eilenberg [20] and Salomaa and Soittola [77]. A transductionP  A  B  is a rational transduction if and only if the formal power series w2A .w/w is recognisable. Here the coefficients .w/ are taken in the semiring Rat.B  / of rational subsets of the monoid B  . Finally, a matrix representation of rational transductions gives a direct connection to weighted automata as treated in Chapter 4.

88

Tero Harju and Juhani Karhumäki

3. Morphic representations Two classical results in formal language theory, due to Chomsky and Schützenberger [12] and to Greibach [31], give fundamental representations for the context-free languages in terms of elementary operations. In this section we study representations of rational transductions by morphic compositions. 3.1. General results for language families. We denote by \R the language operation of intersection with regular languages: \ R.L/ D ¹L \ R j R regular languageº:

The following result is due to Chomsky and Schützenberger [12]. Recall that a Dyck language consists of well-formed sequences of parentheses (with a finite number of different pairs of parentheses). Theorem 3.1. Each context-free language L has a representation L D h.D \ R/;

where D is a Dyck language, R a regular language and h a morphism, that is, each context-free language belongs to H ı .\ R/.D/ for some Dyck language D . Greibach [31] on the other hand showed that each context-free language can be obtained from a single language using only inverse morphisms.

Theorem 3.2. There exists a context-free language U2 such that each context-free language belongs to H 1 .U2 /. Culik and Maurer [17] obtained a similar result for the family of recursively enumerable languages. Morphic representations of recursively enumerable languages were found in [75], [13], and [27]. The following result by Geffert [30] is especially interesting. Theorem 3.3. Let L  A be a recursively enumerable language. Then there are two morphisms h; gW B  ! C  with h nonerasing such that L D ¹h.w/

1

g.w/ j w 2 B  º \ A :

A characterisation of recursively enumerable languages in terms of morphisms was proved by Culik [13] in 1979. The proof of this result refines the basic simulation idea of the proof of the Post Correspondence Problem. For another formulation of the proof, see Turakainen [82]. A morphism hW A ! B  is called a projection if for all a 2 A either h.a/ D a or h.a/ D ".

Theorem 3.4. For each recursively enumerable language L  A , there effectively exists morphisms ; h; g such that L D .e.h; g//;

where h is nonerasing and  is a projection, and e.h; g/ denotes the set of the solutions w with h.w/ D g.w/ that cannot be factored into smaller solutions.

3. Finite transducers and rational transductions

89

3.2. Morphic representations of rational transductions. The first concrete representation of rational transductions by simple operations was given by Nivat [66] in 1968. 

Theorem 3.5. A mapping W A ! 2B is a rational transduction if and only if there exist a regular set R and two morphisms g and h such that  D ¹.g.w/; h.w// j w 2 Rº. In other words, the family of rational transductions equals H ı .\ R/ ı H 1 . A function m W A ! .A [ ¹mº/ is a marking (or an endmarking) if it adjoins a special letter m, called a marker, at the end of each word: m .w/ D wm for each w 2 A . We denote by M the family of all markings. The families H, H 1 , and M all consist of rational transductions, and since, by Theorem 2.2, the family of rational transductions is closed under composition, the compositions of morphisms, inverse morphisms and markings are again rational transductions. These compositions will be referred to as rational compositions. The operation \ R can be rewritten as a rational composition, as shown in [51] and [80]. In the following proof we follow the techniques of Latteux and Leguy [57]. The proof of Theorem 3.6 gives a constructive method which is shared by many other representation results. The idea is to simulate a computation, in the present case that of a finite automaton, by a composition of morphisms. Theorem 3.6. \ R  H

1

ıHıH

1

ı M.

Proof. Let A be a finite nondeterministic automaton accepting R  A with the state set Q D ¹0; 1; : : : ; N º, where 0 is the initial state and, without restriction, N is the unique final state. Also, without loss of generalisation, we can assume that there are no a transitions entering the initial state 0 of A. Each transition p ! q of A is represented by a letter Œp; a; q. Let a

‰ D ¹Œp; a; q j p ! q in Aº

be the corresponding alphabet, and let  D m be a marker, where m is a new symbol. Let d be a new letter for counting purposes. We define the morphisms h1 W .‰ [ ¹mº/ ! .A [ ¹mº/ ; h2 W .‰ [ ¹mº/ ! .A [ ¹d º/ ; h3 W A ! .A [ ¹d º/

as follows: let a 2 A, h1 .Œi; a; j / D a;

i

h2 .Œi; a; j / D d ad N

h3 .a/ D ad :

N j

;

h1 .m/ D m;

h2 .m/ D d N ;

Tero Harju and Juhani Karhumäki

90

Now each word w D a1 a2    an is mapped as follows:

(by m )

a1 a2    an 7 ! a1 a2    an m

(by h1 1 )

7 ! Œi1 ; a1 ; j1 Œi2 ; a2 ; j2     Œin ; an ; jn m

7 ! d i1 a1 d N

j1

D d i1 a1 d N

d i2 a2 d N j1 Ci2

j2

a2 d N

   d in an d N

j2 Ci3

jn

  dN

jn

dN

(by h2 )

1 Cin

an d 2N

jn

:

In order to be able to continue with the inverse of the uniform morphism h3 , we must have i1 D 0, j1 D i2 ; : : : ; jn 1 D in and jn D N , in which case a1 a2    an 2 R. We then obtain a1 a2    an by h3 1 as required. When the above representation of \ R is substituted in Nivat’s theorem, Theorem 3.5, we have that each rational transduction can be written in the form T D H ı .\ R/ ı H

where one easily shows that M ı H

1

DHıH

1

DH

1

1

ıHıH

1

ı M. Since H

ıMıH 1

Theorem 3.7. The family of rational transductions equals TDHıH

1

ıHıH

1

ıH

1

1

;

DH

1

, we have

ı M:

In particular, rational compositions of length five suffice to represent all rational transductions. Also, the elements of H 1 ı H ı H 1 ı H ı M are rational transductions, and hence, by Theorem 3.7, they can be represented by elements from H ı H 1 ı H ı H 1 ı M. In fact, as a lengthy proof due to Latteux and Turakainen [58] shows, these two classes are the same: Theorem 3.8. The family of rational transductions is equal to TDHıH

1

ıHıH

1

ıMDH

1

ıHıH

1

ı H ı M:

As shown by Turakainen [81] the compositions of morphisms and inverse morphisms, without the markings, are exactly the rational transductions realised by simple transducers. Moreover, the following result was proved in a sequence of papers [57], [81], and [58]. Theorem 3.9. For the family of morphic compositions, we have HıH

1

ıHıH

1

DH

1

ıHıH

1

ıH

and it equals the family of rational transductions realised by simple transducers. If we restrict ourselves, as in Turakainen [81], to "-free transducers, then the representations become even shorter. For this result, denote by H" the family of nonerasing morphisms and by Hu the family of uniform morphisms. Theorem 3.10. The family of rational transductions realised by "-free transducers is equal to Hu 1 ı H" ı H 1 ı M. Moreover, the family of rational transductions realised by simple "-free transducers is equal to Hu 1 ı H" ı H 1 .

3. Finite transducers and rational transductions

91

The above results are effective in the sense that given a finite transducer realising  , we can effectively construct the required morphisms in Theorem 3.8. Similarly, if  is realised by a simple transducer, then the representation in Theorem 3.9 is effective. Nevertheless, we have no effective way to decide whether for a rational transduction a representation without markers exists. This was proved by Harju and Kleijn [42] using a strong undecidability result due to Ibarra [47]. Theorem 3.11. It is undecidable whether or not a rational transduction has a representation without markers, i.e., it is undecidable whether or not a rational transduction is realised by a simple transducer. In fact, it was shown by Ibarra [47] that it is undecidable whether or not a given transducer realises the relation A .u/ D ¹d j j juj 6 j 6 3  jujº;

where A is an alphabet with at least two letters. As proven in [57] the number of morphisms in Theorem 3.8 cannot be reduced further. In Figure 4 from [57], we have drawn a diagram of inclusions for the morphic compositions. 1

HıH

H

1

ıHıH

H

1

ıHıH

1

H

ıH

1

1

DH

1

ıHıH

HıH

1

HıH

1

ıH

ıH

1

H

Figure 4. The hierarchy of morphic compositions, where a path upwards means proper inclusion and a horizontal disconnection denotes incomparability.

3.3. Morphic representations of rational functions. There are many interesting subfamilies of rational transductions for which a morphic representation is known; see e.g., Harju et al. [43] and [44] for the case of rational functions. Let F be the family of rational functions and let U be the family of rational transductions realised by unambiguous transducers, i.e., finite transducers with at most one accepting computation for each input word.

92

Tero Harju and Juhani Karhumäki

In general, if X is the family of transductions realised by transducers from TX , we let X denote the family of transductions realised by the simple transducers from TX . Clearly, each  2 U is a rational function. It was shown in Theorem 2.9 that these two classes are equal, F D U. However, by [43], the simple versions of these two families are different. In fact, we have a proper inclusion U  F . Theorem 3.12. For the rational transductions realised by simple unambiguous transducers, we have U D ¹ 2 F j the domain of  is a free monoid º:

Next we observe that if the inverse morphisms in the morphic compositions are taken from Hi , the family of injective morphisms, then the compositions represent rational functions. Also, the converse is true, as shown in the next theorem from [43]. Theorem 3.13. For the rational transductions realised by (simple) unambiguous transducers, U D F D F M D HHi 1 HM D .H [ Hi

U D HHi 1 H D .H [ Hi 1 / :

1

[ M/ ;

The family F of rational functions realised by simple transducers is still missing a characterisation. For this natural family of transductions we have found no (natural) representation in terms of morphisms. We dare to suspect that there exist no such representation. This suspicion is partly confirmed in [43].

4. Applications Finite transducers arise naturally in many applications where one needs to transform strings in a local manner. In this section we consider two applications of transductions that have a ‘classical nature.’ The first one gives a completely new solution to a classical problem of existence of small aperiodic sets of tiles covering the plane. The techniques in this problem are due to Kari [54]. The second application shows that the isomorphism problem is decidable for subsemigroups of free semigroups. This second application follows the treatment of Choffrut et al. [10]. 4.1. Nonperiodic tilings. In the basic theory of tilings, introduced by Hao Wang [84] in 1961, a Wang tile is described as a unit square in the plane with labelled or coloured edges. Each tile has four directions: it has North, East, South and West edges, and these directions correspond to the natural directions of the plane. The labels come from a finite alphabet of symbols which in this section will be selected to be a subset of the rational numbers. Given a finite set of Wang tiles T, in a tiling each tile is a copy of a tile from T and will be placed on a unit cell of the integer lattice Z  Z. The tiling covers the plane such that neighbouring tiles share an edge of the same label. The tiles are not allowed

3. Finite transducers and rational transductions

93

to be rotated. A tiling is periodic if it is invariant under some nontrivial translation of the plane; otherwise it is aperiodic. It was first shown by Berger [4] that there exist aperiodic sets of tiles. The set in his proof consisted of more than 20 000 tiles. The number of tiles needed in a tiling of the plane was gradually reduced until Kari [54] showed that 14 tiles suffice by using finite transducers. Using the same techniques the bound was reduced to 13 by Culik [14]. A tile set of 14 tiles is obtained from the finite transducer T2=3 of Figure 5 where no initial state or final states need to be determined. A transition .q; a; b; s/ corresponds to the equality s D 2a=3 C q b , where q is the current remainder (state) and s is the new remainder.

2=3 2; 2

2; 1 1; 0

1; 1 1; 1 1=3

0 2; 1 2; 2

2; 1

1; 0

1; 1 1=3

Figure 5. Transducer T2=3 for aperiodic tiling with respect to multiplication by 2=3

Similarly, Figure 6 illustrates the transducer T2 , where the multiplication is by 2. The transducer T2 is not complete in the sense that it does not read the letter 0 while in state 1. 1; 2

1; 2 0; 1

1

0 1; 1

Figure 6. Transducer T2 for aperiodic tiling with respect to multiplication by 2 a;b

Each transition q ! s of these transducers defines a tile where the West and East edges are labelled by q and s , respectively, and the South and North edges are labelled by a and b , respectively. However, the symbol 0 is represented in the tiles by different labels 0 and  in T2=3 and T2 . Let TK denote the set of tiles in Figure 7.

Tero Harju and Juhani Karhumäki

94 1 1

1 1

1

0 0

2

1

2

2

1

1

0 2 2

2 3

2

1 3

1

1 3 2

1 2 3

1 3

0

2

1 1 3

1

1 3

1

1

1

0 1

1 3

1 3

1

2

1 1 3

1 2 3

1 3

1 3

1

2 3

0

0

Figure 7. The set TK of tiles

Lemma 4.1. The tile set TK does not allow a periodic tiling. Proof. Suppose there is a translation, determined by .0; 0/ 7! .x; y/ that preserves a given tiling according to TK . Consider the tiling of a x  y rectangle R of the lattice. Since the copies of this rectangle tile the plane, the string of the labels of the bottom row equals that of the top row. Also, the strings on the left and right borders of R must be equal. Let ni be the sum of the labels on the i -th horizontal line of R. From the transducers T2 and T2=3 , we derive ni C1 D qi ni for i D 0; : : : ; y 1, for qi 2 ¹2; 2=3º, correspondingly. Now, by the equality of the bottom and top rows, we have that n0 D ny , and therefore n0 D q0    qy 1 n0 and hence q0    qy 1 D 1 (since n0 > 0). The latter, however, is not possible since qi 2 ¹2; 2=3º for each i . On the other hand we have the following result.

Theorem 4.2. The tile set TK allows an aperiodic tiling. The proof of Theorem 4.2 is based on properties of the Beatty sequences Ai .˛/ D bi ˛c .i 2 Z/ for irrational numbers ˛ . The balanced sequence B.˛/ of ˛ is defined by Bi .˛/ D Ai .˛/ Ai 1 .˛/. The transducers T2 and T2=3 are related to balanced representations of numbers: if this transducer has as input an infinite balanced representation B.˛/ of ˛ , it will output B.2˛/ and B 23 ˛ , correspondingly. We do not go into the details of the proof, but refer to Kari [54].

4.2. Isomorphism of F-semigroups. A semigroup is called an F-semigroup if it is a subsemigroup of some word semigroup AC . It is well known that for all F-semigroups F the set F n F 2 is the unique minimal generating set for F ; see, e.g., Lothaire [65]. Let S  AC be a set of words, and let X be an alphabet. Consider a pair .u; v/ 2 X C  X C as an equation u D v in the variables x 2 X so that a morphism ˛W X C ! S C is a solution of u D v if and only if ˛.u/ D ˛.v/. Two relations

3. Finite transducers and rational transductions

95

R1 ; R2  X C  X C are said to be equivalent if they have the same solutions as systems of equations. In the proof of Lemma 4.4 we use an effective special case of Ehrenfeucht’s Conjecture. The general conjecture was proved independently by Albert and Lawrence [2] and Guba [35].

Theorem 4.3. Let X be an alphabet. For each relation R  X C  X C , there exists a finite subrelation R0  R equivalent to R. The following lemma shows that each finitely generated F-semigroup has a rational presentation hX I Ri, where R .D ker.˛// is a rational relation, where ker.˛/ is the kernel of ˛ ; see (4). Lemma 4.4. Let S C be an F-semigroup for a finite S  AC , and let ˛W X ! S be a bijection. Then ker.˛/ is a rational transduction, and one can effectively find a finite equivalent subrelation of ker.˛/. Proof. We have ker.˛/ D ˛ 1 ˛ , and hence ker.˛/ is a rational transduction. Let T be a finite transducer realising ker.˛/: T accepts .u; v/ if and only if ˛.u/ D ˛.v/. Assume that T has s states and let R  ker.˛/ be obtained from the computations of T of length at most 2s . It can be shown using the pumping lemma and a small combinatorial lemma on words that the relation R is as required in the claim. For more details, see [11] and [41]. We say that a pair ŒX I R is an F-presentation of S C  AC , if

i. there is a bijection ˛W X ! S such that R  ker.˛/, and ii. R is equivalent to ker.˛/.

An F-presentation ŒX I R of S C uniquely determines the F-semigroup S C up to isomorphism, i.e., modulo the presenting bijection ˛W X ! S . Therefore, S C is determined by the F-presentation together with the mapping ˛W X ! S . The following result is now obvious. Lemma 4.5. Two F-semigroups have a common F-presentation ŒX I R if and only if they are isomorphic. By Lemma 4.4, all finitely generated F-semigroups do have finite F-presentations. Also, we have an effective solution to the synthesis problem: Theorem 4.6. Each finitely generated F-semigroup S C  AC has a finite F-presentation that can be effectively found. Example 4.1. Let S1 D ¹ab; a; baºC and S2 D ¹aaab; aa; abaaºC be two F-semigroups in ¹a; bºC . They have a common F-presentation ŒX I xy D yz, where X D ¹x; y; zº. The corresponding bijections are given by the natural orders of the given generators of S1 and S2 .

96

Tero Harju and Juhani Karhumäki

Example 4.2. There is a crucial difference between the semigroup presentations and the F-presentations of F-semigroups. Consider X D ¹x; yº and R D ¹.xy; yx/º. Now hX I Ri is a presentation of a 2-generator free commutative semigroup. However, ŒX I R is not an F-presentation of any F-semigroup. Indeed, assume that S C D ŒX I R for some F-semigroup S C  AC with a bijection ˛W X ! S . Since now ˛.x/˛.y/ D ˛.y/˛.x/, there exists a primitive word w 2 AC such that ˛.x/ D w k and ˛.y/ D w t for some k; t > 1. However, now .x t ; y k / is a relation of S C in its F-presentation, that is, ˛.x t / D ˛.y k /. But ker.˛/ and ¹.xy; yx/º are not equivalent. Theorem 4.7. It is decidable whether or not a finitely generated F-semigroup S C has a given presentation ŒX I R for finite X and R  X C  X C .

Proof. Since both S and X are finite, there are only finitely many bijections ˛W X ! S . By Lemma 4.4, the congruence ker.˛/ is a rational transduction and thus it suffices to show that the problem of determining whether a rational relation K is equivalent to R is decidable. This problem reduces, again by Lemma 4.4, to checking whether two finite relations K0  K and R are equivalent. The latter problem was shown to be decidable in [15] by using Makanin’s result stating that one can effectively test whether an equation has a solution in a free semigroup. We shall now turn to the isomorphism problem of F-semigroups. Theorem 4.8. It is decidable whether or not two finitely generated F-semigroups are isomorphic. Proof. Let S1 ; S2  AC be finite sets which, without loss of generality, are supposed to be the minimal generating sets for S1C and S2C , respectively. If card.S1 / ¤ card.S2 /, then obviously S1C and S2C are not isomorphic. Let then card.X / D card.Si /. By Lemma 4.5, S1C and S2C are isomorphic if and only if they have a common F-presentation which, in turn, holds if and only if there exist bijections ˛1 W X ! S1 and ˛2 W X ! S2 such that ker.˛1 / D ker.˛2 /. The latter condition is decidable, since we can construct a finite equivalent subrelation Ri  ker.˛i /, for both i D 1; 2, by Lemma 4.4, and then we only have to check that R1  ker.˛2 / and R2  ker.˛1 /. The following two problems remain open.

Problem 1. For given finite set X and a finite relation R  X C  X C ; is it decidable whether or not ŒX I R is an F-presentation of some subsemigroup of AC ? Problem 2. Is it decidable whether or not two F-semigroups S1C and S2C generated by regular languages S1 and S2 are isomorphic? Contrary to the isomorphism problem of finitely generated F-semigroups the problem of determining whether two F-semigroups satisfy a common relation (in their F-presentations) turns out to be undecidable; see Choffrut et al. [10]. 4.3. Ehrenfeucht’s conjecture for rational transductions. According to an equivalent formulation of Ehrenfeucht’s conjecture every system of word equations ui D vi

3. Finite transducers and rational transductions

97

(in several unknowns) has an equivalent finite subsystem. This does not generalise to systems of language equations. Indeed, by Theorem 5.19, the system of language equations X n Z D Y n Z for n D 0; 1; : : :

with three unknowns X; Y; Z , does not have an equivalent finite subsystem. We may generalise the problem in Ehrenfeucht’s Conjecture on test sets to mappings that are more general than morphisms. Let F be a family of mappings of words, and let L be a language. We say that a finite subset T of L is a test set of L with respect to F, if for all ˛1 ; ˛2 2 F, ˛1 jT D ˛2 jT H) ˛1 jL D ˛2 jL;

where ˛jT denotes the restriction of ˛ to the subset T of L. We now consider two examples of such generalisations from Karhumäki [49] and Lawrence [59]. In one of these cases the conjecture holds, while in the two others it fails. Example 4.3. Let for all n > 1, ˛n W a ! a be the function defined by ´ ak if k < n; k ˛n .a / D akC1 if k > n: Clearly, ˛n can be realised by a sequential transducer, and hence ˛n is a rational function. Denote a0 is not locally finite, so it cannot be summed. We therefore define the iteration r  only for r proper: a series r is proper if .r; "/ D 0. Then, for n > jwj, one has .r n ; w/ D 0, so the family .r n /n>0 is locally finite and we can set X X r D r n or equivalently .r  ; w/ D .r n ; w/: n>0

06n6jwj

C

For the Boolean semiring and L  † , we get

supp.r  / D .supp.r//

and .1L / D 1L :

Recall from Chapter 1 (§ 2.1) that a language is rational if it can be constructed from the finite languages by union, concatenation, and Kleene-iteration. Here, we give the analogous definition for series: Definition 4.1. A series from S hh† ii is rational if it can be constructed from the monomials sa for s 2 S and a 2 † [ ¹"º by addition, Cauchy-product, and iteration (applied to proper series, only). The set of all rational series is denoted by S rat hh† ii.

Observe that the class of rational series is closed under scalar multiplication since s" is a monomial, s  r D s"  r and r  s D r  s" for r 2 S hh† ii and s 2 S .

Example 4.1. Consider the Boolean semiring B and r 2 Bhh† ii. If r is a rational series, then the above formulas show that supp.r/ is a rational language since supp commutes with the rational operations C, , and  for series and [, , and  for languages. Now suppose that, conversely, supp.r/ is a rational language. To show that also r is a rational series, one needs that any rational language can be constructed in such a way that Kleene-iteration is only applied to languages in †C . Having ensured this, the remaining calculations are again straightforward. Thus, indeed, our notion of rational series is the counterpart of the notion of a rational language. Hence, rational languages are precisely the supports of series in Brat hh† ii and recognisable languages are the supports of series in Brec hh† ii (cf. Example 2.1). Now

4. Weighted automata

121

Kleene’s theorem (Theorem 5.11 in Chapter 1) implies Brec hh† ii D Brat hh† ii. It is the aim of this section to prove this equality for arbitrary semirings. This is achieved by first showing that every rational series is recognisable. The other inclusion will be shown in § 4.2. 4.1. Rational series are recognisable. For this implication, we generalise the techniques from §§ 3.1–4.1 in Chapter 1 from classical to weighted automata and prove that the set of recognisable series contains the monomials sa and s" and is closed under the necessary operations. To show this closure, we have two possibilities (a third one is sketched after the proof of Theorem 5.1): either the purely automata-theoretic approach that constructs weighted automata, or the more algebraic approach that handles linear representations. We chose to give the automata constructions for monomials and addition, and the linear representations for the Cauchy-product and the iteration. The reader might decide which approach she prefers and translate some of the constructions from one to the other. There is a weighted automaton with just one state q and behaviour the monomial s": just set in.q/ D s , out.q/ D 1 and wt.q; a; q/ D 0 for all a 2 †. For any a 2 †, there is a two-states weighted automaton with the monomial sa as behaviour. If A1 and A2 are two weighted automata, then the behaviour of their disjoint union equals kA1 k C kA2 k. We next show that also the Cauchy-product of two recognisable series is recognisable: Lemma 4.1. If r1 and r2 are recognisable series, then so is r1  r2 . Proof. By Theorem 3.2, the series ri has a linear representation .i ; i ; i / of dimension Qi with Q1 \ Q2 D ;. We define a row vector  and a column vector of dimension Q D Q1 [ Q2 as well as a matrix .w/ for w 2 † of dimension Q  Q: X 0 1  1 2 2 1 .w/ 1 .u/ 1 2 2 .v/ 

 B C 1 wDuv;  D  0 ; .w/ D @ : A; D v¤"

2 2 0  .w/ The reader is invited to check that  is actually a monoid homomorphism from .† ; ; "/ into .S QQ ; ; E/, i.e., that .; ; / is a linear representation. One then gets X   .w/  D 1 1 .w/ 1 2 2 C 1 1 .u/ 1 2 2 .v/ 2 wDuv v¤"

D .r1 ; w/  .r2 ; "/ C

X .r1 ; u/.r2 ; v/

wDuv v¤"

D .r1  r2 ; w/:

By Theorem 3.2, the series k.; ; /k D r1  r2 is recognisable.

Manfred Droste and Dietrich Kuske

122

Lemma 4.2. Let r be a proper and recognisable series. Then r  is recognisable. Proof. There exists a linear representation .; ; / of dimension Q such that r D k.; ; /k. Consider the homomorphism 0 W .† ; ; "/ ! .S QQ ; ; E/

defined, for a 2 †, by

0 .a/ D .a/ C  .a/:

Let w D a1 a2    an 2 †C . Using distributivity of matrix multiplication or, alternatively, induction on n, it follows Y 0 .w/ D ..ai / C  .ai // 16i 6n

D

 Y X  ..w1 / C .w1 // 

.wj / :

wDw1 wk wi 2†C

26j 6k

Note that  D  ."/ D .r; "/ D 0. Hence we obtain  Y X  ..w1 / C .w1 // 

.wj /  0 .w/ D wDw1 wk wi 2†C

D

X

26j 6k

Y

 .wj /

wDw1 wk 16j 6k wi 2†C

D .r  ; w/;

as well as 0 ."/ D 0. Hence r  D k.; 0 ; /k C 1" is recognisable. Recall that the Hadamard-product generalises the intersection of languages and that the intersection of regular languages is regular. The following result extends this latter fact to the weighted setting (since the Boolean semiring is commutative). We say that two subsets S1 ; S2  S commute, if s1  s2 D s2  s1 for all s1 2 S1 , s2 2 S2 . Lemma 4.3. Let S1 and S2 be two subsemirings of the semiring S such that S1 and S2 commute. If r1 2 S1rec hh† ii and r2 2 S2rec hh† ii, then r1 ˇ r2 2 S rec hh† ii. Proof. For i D 1; 2, let Ai D .Qi ; ini ; wti ; outi / be weighted automata over Si with kAi k D ri . We define the product automaton A with states Q1  Q2 as follows:

in.p1 ; p2 / D in1 .p1 /  in2 .p2 /; wt..p1 ; p2 /; a; .q1 ; q2 // D wt1 .p1 ; a; q1 /  wt2 .p2 ; a; q2 /; out.p1 ; p2 / D out1 .p1 /  out2 .p2 /:

4. Weighted automata

123

Then, .kAk; w/ D .kA1 k ˇ kA2 k; w/ follows for all words w . For example, for a letter a 2 † we calculate as follows using the commutativity assumption and distributivity: X .kAk; a/ D ..in1 .p1 /  in2 .p2 //  .wt1 .p1 ; a; q1 /  wt2 .p2 ; a; q2 //  .out1 .q1 /  out2 .q2 /// .p1 ;p2 /;.q1 ;q2 /2Q X D .in1 .p1 /  wt1 .p1 ; a; q1 /  out1 .q1 /  in2 .p2 /  wt2 .p2 ; a; q2 /  out2 .q2 // .p1 ;p2 /;.q1 ;q2 /2Q   X in1 .p1 /  wt1 .p1 ; a; q1 /  out1 .q1 / D p1 ;q1 2Q1



 X

in2 .p2 /  wt2 .p2 ; a; q2 /  out2 .q2 /

p2 ;q2 2Q2



D .kA1 k; a/  .kA2 k; a/ D .kA1 k ˇ kA2 k; a/:

We remark that the above lemma does not hold without the commutativity assumption: Example 4.2. Let † D ¹a; bº, S D .P.† /; [; ; ;; ¹"º/, and consider the recognisable series r given by .r; w/ D ¹wº for w 2 † . Then .r ˇ r; w/ D ¹wwº and pumping arguments show that r ˇ r is not recognisable.

Note that the Hadamard product r ˇ 1L can be understood as the “restriction” of rW † ! S to L  † . As a consequence of Lemma 4.3, we obtain that these “restrictions” of recognisable series to regular languages are again recognisable.

Corollary 4.4. Let r 2 S hh† ii be recognisable and let L  † be a regular language. Then r ˇ 1L is recognisable.

Proof. Let A be a deterministic automaton accepting L with set of states Q. Then weight by 1 those triples .p; a; q/ 2 Q  †  Q that are transitions, the initial (resp., final) states with initial (resp., final) weight 1, and set all other weights to 0. This gives a weighted automaton with behaviour 1L . Since S commutes with its subsemiring generated by 1, Lemma 4.3 implies the result. 4.2. Recognisable series are rational. For this implication, we will transform a weighted automaton into a system of equations and then show that any solution of such a system is rational. This generalises the techniques from § 4.3 in Chapter 1. The following lemma (that generalises Proposition 4.6 in Chapter 1) will be helpful and is also of independent interest (cf. § 5 in [35]). Lemma 4.5. Let r; r 0 ; s 2 S hh† ii with r proper and s D r  s C r 0 . Then s D r  r 0 . Proof. Let w 2 † . First observe that s D rs C r 0 D r.rs C r 0 / C r 0

Manfred Droste and Dietrich Kuske

124

D r 2 s C rr 0 C r 0 :: : X D r jwjC1 s C r i r 0: 06i 6jwj

Since r is proper, we have .r ; u/ D 0 for all u 2 † and i > juj. This implies X .r  r 0 ; w/ D .r  ; u/  .r 0 ; v/ i

wDuv

D

 X  X .r i ; u/  .r 0 ; v/

wDuv 06i 6jwj

D

X

.r i r 0 ; w/

06i 6jwj

D .s; w/:

Now let A D .Q; in; wt; out/ be a weighted automaton. For p 2 Q, define a new weighted automaton Ap D .Q; inp ; wt; out/ by inp .p 0 / D 1 for p D p 0 and inp .p 0 / D 0 otherwise. Since all the entry weights of these weighted automata are 0 or 1, we have X X kAk D in.p/wt.p; a; q/a  kAq k C in.p/out.p/" p2Q

.p;a;q/2Q†Q

and for all p 2 Q

kAp k D

X

wt.p; a; q/a  kAq k C out.p/":

.p;a;q/2Q†Q

This transformation proves Lemma 4.6. Let r be a recognisable series. Then there are rational series rij ; ri 2 S hh† ii with rij proper and a solution .s1 ; : : : ; sn / with s1 D r of a system of equations   X Xi D rij Xj C ri : (6) 16i 6n

16j 6n

A series s is rational over the series ¹s1 ; : : : ; sn º if it can be constructed from the monomials and the series s1 ; : : : ; sn by addition, Cauchy-product, and iteration (applied to proper series, only). We prove by induction on n that any solution of a system of the form (6) consists of rational series. For n D 1, the system is a single equation of the form X1 D r11 X1 C r1 with r11 ; r1 2 S rat hh† ii and r11 proper. Hence, by Lemma 4.5, the solution s1 equals  r11 r1 and is therefore rational. Now assume that any system with n 1 unknowns has only rational solutions and consider a solution .s1 ; : : : ; sn / of (6). Then we have X sn D rnn sn C rnj sj C rn 16j º,

I n eq[ı ] Input: An ordered pair of max-plus automata .A; B/ of the same support. Question: hjAj; wi ı hjBj; wi for all w in the support.

The status of I n eq[ı ].A; B/ may be different from the status of I n eq[ı ].B; A/, depending on the subclasses of automata to which A and B belong (see Table 1 below). Fact 1. If A and B are two max-plus automata,

i. E q u i va lenc e (A,B) () I n eq[6](A,B) and I n eq[>](A,B), ii. I n eq[6](A,B) () E q u i va lenc e (A [ B,B). Given a max-plus automaton A, let 0A be the associated automaton in which all the weights have been changed to 0: since the max-plus semiring is positive, 0A is the characteristic automaton of A, which accepts the same words as A, with weight 0. Define P o s i t i v i t y.A/ = I n eq[ 0 if w D wx 2 E; P .x/ 1 ¤ 0I ˆ : h jCj; w i D 0; h jEj; w i D 0 if w D wx 2 E; P .x/ 1 D 0:

Therefore C [ E satisfies L o ca l_ Z ero if and only if P 1 has a root in Nn . th Since the 10 Hilbert problem is undecidable, we conclude that L o ca l_ Z ero is also undecidable.

The problem is also undecidable for max-plus automata valued in Nmax . If A and B are two max-plus automata valued in Zmax , there exists an integer m, such that the

5. Max-plus automata

177

automata A0 and B0 obtained from A and B by adding m to each weight appearing in these automata, are valued in Nmax . Clearly A0 and B0 are equivalent if and only if A and B are. Decision of equivalence in smaller classes of max-plus automata. Recall Proposition 2.3: if S is in NA m b, then S also is. Proposition 4.6. Let A be in NA m b and let B be a max-plus automaton. Then we have that both I n eq[>].A; B/ and E q u i va lenc e.A; B/ are PSPACE-complete in Rmax , while I n eq[].A; B/ D N egat i v i t y.C/. This last problem is polynomial. Globally, I n eq[>].A; B/ is PSPACE-complete. In the same way, I n eq[](A,B) are decidable. According to Proposition 2.4, S A can be given as a finite union of unambiguous automata of the same support, A D i 2I Ai . Let Ci be an automaton realising jBj jAi j (Proposition 2.3). We have

I n eqŒ6.A; B/ () I n eqŒ6.Ai ; B/ D I n eqŒ6.0Ci ; Ci / for all i:

Therefore, I n eqŒ6.A; B/ is decidable and I n eqŒ>.A; B/= I n eqŒ6.B; A/ is also decidable. It concludes the proof. Table 1. The status of Inequality[ c , the equation er.v/ D mv still holds. For every other v 2 U , the values are “lifted”: er.v/ D mv C 1. If extinction ranks are used, then an overapproximation as described above can be avoided by adding an appropriate recurrence condition – more precisely, a generalised transition-Büchi condition. For every rank i , there is a transition-Büchi set Bi which includes all transitions in which i does not occur as a value on the upper level or i is less than the critical value (and thus lifted), see Remark 4.2(2) and Figure 2. Theorem 4.3 (lift construction [9]). The lift construction yields, for every k , a backward deterministic generalised transition-Büchi automaton with at most .k C1/n states outputting, for every leveled DAG of width at most k , the subgraph of its finitary [infinitary] vertices. Note that the above approach is very versatile. If, for instance, one wants to determine the vertices which have at least one descendant with no successors, which one could call weakly finitary vertices, then one can take the same approach, replacing maximisation by minimisation. The above description of the lift construction is somewhat technical because of the measure introduced; a more “automatic” description follows. A state is a sequence P0    Pm 1 of nonempty pairwise disjoint sets of vertices. Assume a letter a (a slice) is read backwards. Then the new state is determined in two steps. First, the sequence P 0 1 P00    Pm0 1 is determined where

202

Thomas Wilke (revised by Sven Schewe)

i. P 0 1 consists of all vertices v on the upper level of a without successors and ii. Pi0 , for i > 0, consists of all such vertices with some successor in Pi , but no successor in Pi 0 for any i 0 < i . Second, the new state is obtained from P 0 1 P00    Pm0 1 by removing all empty entries. The recurrence condition is, again, a generalised transition-Büchi condition: For every i , there are infinitely many transitions with i > m or Pi00 ¤ ; for i 0 < i . The vertices that occur in the states are exactly the finitary ones. The lift construction is used for different purposes in § 5.2. 4.5. Latest appearance records. Given an alphabet A and a special symbol $ not in A, the latest appearance automaton (LAA) is a forward deterministic automaton with states being words over A [ ¹$º, where every letter from A occurs at most once and $ occurs exactly once. One such word is called a latest appearance record (LAR) and the part to the right of “$” is its frame. The initial state of the LAA is the one-letter word $; the recurrence condition is trivial; the transition function ı is defined as follows. When u is a state of the form v $v 0 and a is a letter of the alphabet occurring in vv 0 , say vv 0 D waw 0 , then ı.u; a/ D w $w 0 a. When a does not occur in vv 0 , then ı.u; a/ D vv 0 $a. So the order in which the letters occur in the current state of the automaton is the order of their latest appearances in the prefix of the given word read so far, with all letters in the frame of the current state being the ones that have occurred since the previous occurrence of the letter just read. From this, the following can be derived. Remark 4.4 ([8] and [20]). Consider the frames of maximal length among all frames occurring infinitely often in the run of the LAA on a given word. Then all theses frames contain the same letters and these are exactly the ones occurring infinitely often in the given word. An interesting application of the latest appearance record is the transformation of a given Muller automaton into an equivalent parity automaton. First, the Muller condition is removed (and replaced by the trivial recurrence condition). Second, the automaton is augmented by the trivial output function, which simply outputs the current state. Third, the generated automaton is cascaded with the LAA over the state set of the Muller automaton. Finally, assuming the automaton has n states, a priority function is added that assigns each state hq; v $v 0 i the priority 2n 2jv 0 j if occ.v 0 / is a Muller set and 2n 2jv 0 j C 1 otherwise.

Theorem 4.5 (Muller to parity [32]). For every forward deterministic [nondeterministic] Muller automaton with n states there is an equivalent forward deterministic [nondeterministic] parity automaton with .n C 1/Š states and at most 2n priorities. Remark 4.6. This construction is wasteful: as there is exactly one recurring set occ.v 0 / of maximal length in a run, sets of equal length do not compete. Consequently, one can just as well assign the priority 2n 2jv 0 j 1, or 2n 2jv 0 j ˙ 1, when occ.v 0 / is not a Muller set. This can be used to remove every second odd number from the domain of the priority function. From there, it is easy to reduce the number of priorities

6. ! -Automata

203

to n. For example, when nlis even each state m and V a Muller set, then one can assign  jv0 j ˘ jv 0 j 0 0 1. hq; v $v i the priority n 2 2 if occ.v / is a Muller set and else n 2 2 A refined construction, which saves priorities if possible, is presented in § 8.2.

5. Run DAG’s of Büchi automata Büchi automata, in general, are nondeterministic automata; in other words, there may be several runs of a given Büchi automaton on a given word. These runs have to be considered at the same time if, for instance, one wants to turn a Büchi automaton into a Büchi automaton for the complement of the language recognised, because not to accept means all initial runs are not recurrent. There are essentially two global structures that have been investigated for arranging all runs of a Büchi automaton in a concise way: DAG’s and trees. The former are treated in this section, the latter in the next one. Applications are complementation, determinisation, and disambiguation (defined in § 6.1). Assume a Büchi automaton is given. The run DAG of a given ! -word u is the leveled graph with levels ¹Q¹i ººi 2! and edges hhq; i i; hq 0 ; i C1ii for hq; u.i /; q 0 i 2 . Its width is the number of states of the given automaton. Often, it is useful to think of a run DAG as a graph labelled with elements from Q; in this section, it is sufficient to think of it as providing only information about whether the state component of a vertex is an initial or a Büchi state. Technically, the DAG is labelled with elements from }.¹I; Bº/ and we say it is ¹I; Bº-tagged; if a vertex is labelled with a letter a and I 2 a, we say it is I-tagged, and, analogously, if B 2 a, we say it is B -tagged. A vertex of an ¹I; Bº-tagged DAG is called B -recurring if a path with an infinite number of B -tagged vertices starts in it; it is called B -free if none of its descendants (including itself) is B -tagged. The ultimate width of such a DAG is the limes inferior of the number of non-B -recurring infinitary vertices on a given level. Remark 5.1. An ! -word is accepted by a Büchi automaton if and only if there is an I -tagged B -recurring vertex on level 0 of the run DAG of the word. The main insight needed about ¹I;Bº-tagged DAG’s (or simply ¹Bº-tagged DAG’s) of finite width is that they can be decomposed in a simple manner. Consider the following operation, here called peeling. First, remove all finitary vertices; second, remove all B -free vertices. Peeling does not remove any B -recurring vertex, and if it does not change the DAG at all, then all vertices are B -recurring, because every vertex has a strict B -tagged descendant. Moreover, if there are non-B -recurring infinitary vertices, then peeling decreases the ultimate width by at least one, as explained in what follows. Consider a non-B -recurring infinitary vertex. By König’s lemma [24], there is an infinite path starting with it. Assume that every B -tagged strict descendant of the vertex is finitary. Then, after removing the finitary vertices, each successor of the vertex is B -free, but the infinite path is still there and all of its vertices (except, perhaps, the first one) are removed in the second step, decreasing the ultimate width by one. If there

204

Thomas Wilke (revised by Sven Schewe)

is a strict B -tagged infinitary descendant of the vertex, apply the same argument to it. This cannot continue indefinitely, because a path with an infinite number of B -tagged vertices would be constructed. This all implies the following lemma. Lemma 5.2 (peeling [25]). For every Büchi automaton with n states, peeling the run DAG of any ! -word n times yields the subgraph induced by the B -recurring vertices. This can be used in various ways; in particular, it can be used for complementing Büchi automata, determinising them backward, and showing that alternating Büchi automata can easily be converted into weak alternating automata. See § 5.1, § 5.2, and § 9.5, respectively. To describe these applications, it is useful to have some notation and terminology at hand. By the above, each vertex v in a ¹Bº-tagged DAG of finite width can be assigned a value in ! [ ¹1º according to when the vertex is removed by peeling the DAG successively. More precisely, when i is a natural number and all vertices with value < 2i are removed from the given DAG, the finitary vertices in the remaining DAG get assigned 2i ; when all vertices with value < 2i C 1 are removed, the B -free vertices in the remaining DAG get assigned 2i C 1. The B -recurring vertices get assigned 1. The number assigned to a vertex v is called its canonical rank and denoted c.v/. According to the above, the canonical rank is equal to 1 or smaller than 2n, when n is the width of the DAG. Corollary 5.3. For a Büchi automaton with n states, let c be the canonical rank function of the run DAG of some ! -word : 1. the word is accepted if and only if c.v/ D 1 for some I-tagged vertex v on level 0; 2. equivalently, the word is not accepted if and only if c.v/ < 2n for every I-tagged vertex v on level 0. 5.1. Complementation via canonical ranks. The idea of using ranks or “progress measures” for complementing ! -automata goes back to [23] and has been improved and refined over the years, especially in [25]. The basic idea is to implement Corollary 5.3(2). The starting point is a compilation of properties of the canonical rank function of a given ¹I; Bº-tagged leveled DAG. Property 5.4. Let v be any vertex. If v does not have any successor, let M D 0, else let M be the maximum of all values c.v 0 / for successors v 0 of v . If v is not B -tagged or if M is even, then c.v/ D M ; if v is B -tagged and M is odd, then c.v/ D M C 1.

Property 5.5. For any vertex with an even rank, the number of its descendants with the same rank is finite. In general, a rank function of a leveled DAG of width n with vertex set V is a function f W V ! Œ2n satisfying Properties 5.4 and 5.5 with f instead of c .

Remark 5.6. Any rank function is pointwise greater or equal to the canonical rank function.

6. ! -Automata

205

A complementation construction for Büchi automata can now be based on Corollary 5.3(2) and the following observations. First, there is a forward deterministic automaton with trivial recurrence condition that outputs the part of the ¹I; Bº-tagged run DAG of a given word which is reachable from the I -tagged vertices on level 0. Second, there exists a nondeterministic Büchi automaton that produces for every ¹I; Bº-tagged leveled graph of width at most n the same graph, but with any labelling with numbers from Œ2n such that Property 5.4 is satisfied. Third, using a variant of the breakpoint construction, see Theorem 4.1, a Büchi automaton can be constructed that checks Property 5.5 for a Œ2n-labelled DAG. In other words, a suitable cascade yields a Büchi automaton for the complement of the language recognised by a given Büchi automaton. Theorem 5.7 (complementation via ranks [25]). Complementation via canonical ranks yields, for every Büchi automaton with n states, a Büchi automaton with at most .6n/n states. In [18], the above approach is improved, resulting in an asymptotic upper bound of .0:96 n/n for the number of states, and in [43] a further improvement leads to a construction, which is optimal within a factor of O.n2 / (compared to the lower bound from [60]) and has an asymptotic upper bound of .0:76n/n . 5.2. Backward determinisation via canonical ranks. A second application of canonical ranks is the conversion of a given nondeterministic Büchi automaton into an equivalent backward deterministic generalised transition-Büchi automaton. The idea, which is due to [9], is to use Corollary 5.3(1) and to construct an automaton, which labels the run DAG in a backward deterministic fashion with the values of the canonical rank function. The key to designing such an automaton is the fact that the canonical rank function is the only function on an ¹I; Bº-tagged DAG satisfying Property 5.4 and the following one, Property 5.8. Property 5.8. 1. For every even number i , the set c 1 .¹i º/ is exactly the set of finitary vertices in the sub-DAG without the vertices in c 1 .Œi /. 2. For every vertex v 2 V , if c.v/ > 1 and c.v/ is odd, then v has a descendant v 0 with c.v 0 / D c.v/ 1. 3. In the sub-DAG consisting of the vertices in c 1 .¹1º/, every vertex has a strict B -tagged descendant. This means a backward deterministic generalised transition-Büchi automaton computing the rank function for a run DAG can be constructed as a cascade of two automata:

i. an automaton with a backward deterministic transition function and a trivial recurrence condition that outputs the run DAG of a given word and an assignment to the vertices satisfying Property 5.4; ii. a backward deterministic automaton checking Property 5.8 using adaptations of the lift construction, see Theorem 4.3 and also the subsequent remark on weakly finitary vertices.

206

Thomas Wilke (revised by Sven Schewe)

An automaton equivalent to the given Büchi automaton is obtained when the states, which assign 1 to an I -tagged vertex, are chosen to be initial.

Theorem 5.9 (backward determinisation via canonical ranks [9]). Backward determinisation via canonical ranks yields, for every Büchi automaton with n states, a generalised transition-Büchi automaton with at most .3n/n states. Similar to the savings in the complementation of Büchi automata in [43], this automaton can be cut down to O.n  .2n/n / states when the breakpoints are checked sequentially instead of in parallel.

6. Run trees of Büchi automata Run DAG’s are one way to represent the set of all runs of a Büchi automaton on an ! -word. A different approach, which can serve as a basis for complementation, disambiguation, and forward determinisation, is to use compressed run trees. In a first step towards the definition of the compressed run tree for a given word u with respect to a given Büchi automaton, a labelled binary tree t is defined, using a refined subset construction. Just as in the subset construction, all the states reachable from the initial states are tracked at the same time. The difference is that, in each step, the set of states reachable by reading the next letter is split into the Büchi states and the non-Büchi states: a binary tree emerges. To keep this tree compact, only one occurrence of each state – more precisely, its leftmost occurrence – is kept, that is, the tree is pruned in a straightforward fashion. In the following, when a vertex v is called “to the left” of another vertex v 0 , then this means that v and v 0 are on the same level, that is, jvj D jv 0 j, and there exists i < jvj such that v.j / D v 0 .j / for all j < i , v.i / D 0, and v 0 .i / D 1. The corresponding ordering is denoted by i , consider the rightmost vertex wj on level j such that vj 6lft wj and there is no infinitary vertex v 0 with vj jvj, the vertex wi with origin v . By König’s lemma, there is a rooted path hv0 ; v1 ; : : : i in the tree, which consists of all vertices wi and their ancestors. This path is left-recurring, because otherwise there would be some j with

210

Thomas Wilke (revised by Sven Schewe)

vk 2 vj 1 for all k > j , which is a contradiction to v moving left in infinitely many slices.

There are several ways for an automaton to check whether there is an origin, which moves to the left in infinitely many slices. One is explained in what follows and another one is sketched later. We use the notion of military ordering, denoted if the pattern has been identified, and q for the other parts of the tree. The transitions are given below, where e denotes an arbitrary symbol from †: .q  ; e; q/; .q  qd q  ; e; qd0 /;

.q  ; d; qd /;

.q  qd0 q  ; a; q> /;

.q  q> q  ; e; q> /:

The only final state is q> . 5.2. Relation to ranked tree automata. In this section we use the encodings of unranked trees as introduced above as a tool to study regular tree languages over unranked alphabets. The following equivalence theorem can be shown by simple manipulations of automata. We do not attribute it to specific papers but consider it as folklore. unr , the following conditions are equivalent: Theorem 5.1. Given a language T  T†

 T is recognisable by a hedge automaton;  fcns.T / is regular;  ext.T / is regular.

7. Automata on finite trees

257

Proof. The equivalence can be shown by direct automaton constructions. We illustrate this by showing how to transform a hedge automaton into an automaton for ext.T /. The other constructions follow a similar principle and are left to the reader. Let A D .Q; †; ; F / be an NFHA. We assume that for each a and q there is exactly one rule .La;q ; a; q/. This is no restriction because several rules can be merged by taking the union of the horizontal languages. Furthermore, assume that each language La;q is given by a deterministic finite (word) automaton Ba;q D .Pa;q ; Q; pa;q ; ıa;q ; Fa;q / over the input alphabet Q. We assume that all state sets are pairwise disjoint. For simulating a run of A on an unranked tree t by a run on ext.t/ we only use the states of the automata Ba;q . A subtree of ext.t/ that is rooted at a node u that is the right child of some other node corresponds to a subtree in t , say at node u0 . The simulation is implemented such that on ext.t/ at u there is a state from Fa;q if and only if in the corresponding run on t the state q is at node u0 , and u0 is labelled a. The NFTA A0 D .Q0 ; †0 ; 0 ; F 0 / for ext.T / is defined as follows. S  Q0 D a;q Pa;q .  0 contains the following transitions: – .a; pa;q / for each a 2 † and q 2 Q (These transitions are used to guess at each leaf the state q that is used in a run on the unranked tree at the node corresponding to this leaf.); – .p; p 0 ; @; p 00 / if p; p 00 are both in some Pb;q , p 0 2 Fa;q 0 for some q 0 , and 0 00 ıb;q S.p; q / D p . 0  F D a2†;q2F Fa;q . The construction is illustrated in Figure 8. On the left hand side a run on the unranked tree b.c; d / is shown. The states of the form p are the states of the automata Ba;q for the horizontal languages. In the picture it is shown how they are used to process the states of the form q . Note that the initial states pc;q1 and pd;q2 of Bc;q1 and Bd;q2 must also be final states because q1 and q2 are assigned to the two leaves and hence " 2 Lc;q1 and " 2 Ld;q2 . Furthermore p2 is a final state of Bb;q3 . If q3 2 F , then both runs are accepting because p2 is a final state of A0 .

pb;q3

q1 c pc;q1

q3 b p1

p2 @ q2 d pd;q2

p2

p1 @

pd;q2 d

pb;q3

pc;q1 c b Figure 8. Transferring runs from an unranked tree t to ext.t/.

Based on this theorem we call a language of unranked trees regular if it is the language of some NFHA.

258

Christof Löding and Wolfgang Thomas

Theorem 5.2. The class of regular unranked tree languages is closed under union, intersection, and complement. unr be two regular languages of unranked trees. According to Proof. Let T1 ; T2  T† Theorem 5.1 we obtain that ext.T1 / and ext.T2 / are regular. Since ext is a bijection between unranked trees and ranked trees we obtain ext.T1 / \ ext.T2 / D ext.T1 \ T2 /. By Theorem 2.2 ext.T1 / \ ext.T2 / is regular and therefore also ext.T1 \ T2 /. Another application of Theorem 5.1 yields that T1 \ T2 is regular. Similar arguments work for union and complement.

It is also possible to show Theorem 5.2 directly by giving automaton constructions that are similar to the ones for the ranked case. But because of the horizontal languages these constructions are more technical to write. Finally, we mention that we can also use the encodings to solve decision problems for hedge automata using the results from § 2.3. The translations forth and back from hedge automata to binary tree automata are polynomial, and thus we obtain the same complexity bounds as in § 2.3. Note however, that there is a variety of formalisms for representing the horizontal languages in hedge automata which depend on the application at hand. Of course, this has an influence on the complexity. The above statement on the transfer of complexity bounds from NFTA’s assumes that the horizontal languages are represented by regular expressions or nondeterministic finite automata. A more detailed analysis of algorithms for hedge automata can be found in [15]. The formalism for representing the horizontal languages is also relevant for the notion of deterministic hedge automata and their properties, see Chapter 10 of [30]. 5.3. XML and extended regular tree grammars. Document type definitions for XML documents can be seen on an abstract level as definitions of regular tree languages. In this section we develop this aspect. This requires to extend the concept of regular tree grammar, introduced in § 3 above, from ranked to unranked trees. In particular, we need a mechanism for horizontal recursion allowing to generate trees of unbounded width. We start from the normalised regular tree grammars as mentioned above (after Example 3), where a rule has the form X ! a.X1 ; : : : ; Xm / if a is of rank m. We now allow a regular set of words over N in place of the single word X1    Xm . An extended regular tree grammar is of the form G D .N; †; S; P /, where † is the unranked alphabet (the letters are also called terminal symbols), N is a finite set of nonterminals, S 2 N is the start symbol, and P is a finite set of rules of the form X ! a.r/, where a 2 † and r is a regular expression over N . The right-hand side of each rule defines a set of trees of height 1 (or height 0 in case " is in the language defined by the regular expression). A derivation of such a grammar starts with the nonterminal S and in each step replaces a nonterminal X with a tree a.w/ if there is a rule X ! a.r/, and w 2 N  is in the language defined by r . The language unr that can be generated in this way (in T .G/ defined by G is the set of all trees in T† finitely many steps).

7. Automata on finite trees

259

Example 5.2. The language from Example 5.1 is generated by the grammar with start symbol S> and the following rules: S> ! e.X  S> X  /; S> ! a.X  Yd X  /;

for each e 2 †.

Yd ! e.X  Xd X  /; Xd ! d.X  /;

X ! e.X  /;

The translation between regular tree grammars and hedge automata is rather simple. In particular, the definition of extended regular tree grammars that we have given here is very close to the definition of hedge automata: the nonterminals are in correspondence to the states of the automaton, and the production rules correspond to the transitions (compare Examples 5.1 and 5.2). Remark 5.3. The languages that can be generated by extended regular tree grammars are exactly the regular tree languages. By imposing restrictions on the grammars one can obtain common languages that are used for defining types of XML documents. Such an analysis is given in [33] (see also [15]). Document type definitions (DTD) correspond to local grammars, in which nonterminals can be identified with the symbols from †, and the rules are of the form a ! a.r/, i.e., each nonterminal generates the terminal symbol it corresponds to. In XML Schema the nonterminals are typed versions of the terminals, and the rules are of the form a.i / ! a.r/, where the superscript on the left-hand side of the rule indicates the type of the nonterminal. This itself is not yet a true restriction for regular tree grammars, only a naming convention for nonterminals. Regular grammars following this convention are also referred to as extended DTDs (EDTDs). XML Schema corresponds to so-called single type EDTDs: the regular expressions on the right-hand side of the rules are restricted such that they do not produce words that contain two different types of the same symbol. For example, the rule a.1/ ! a.b .1/ b .2/ / is not allowed because the word b .1/ b .2/ contains two different types of b . Based on these restrictions it is possible to obtain more efficient algorithms. A detailed analysis of XML Schema and single type EDTDs is given in [31] (where extended DTDs are called specialised DTDs).

6. Classification of regular tree languages As demonstrated in the first sections of this chapter, many concepts and facts on tree automata arise by a smooth generalisation of the theory of finite automata over finite words. This has served (and still serves) as a motivation to do this “lifting” also for the classification theory of regular languages. In this classification theory, a number of natural restrictions of the concept of regular language are studied, often starting from special formats of regular expressions. Let us briefly recall some prominent cases. (More details are given in Chapter 29 of this work.)

260

Christof Löding and Wolfgang Thomas

The calculus of regular expressions (over a given alphabet), involving symbols for union, concatenation, and Kleene iteration, may be extended by the Boolean operations intersection and complement without enlarging the expressive power. An interesting language class arises when the Kleene star is disallowed but complementation is included. One obtains the class of star-free languages. The fundamental nature of this class is confirmed by natural characterisations in logic, automata theory, and algebra: first, a language is star-free if and only if it is definable in first-order logic FO[ ia ; ka > ib (vertical adjacency is similarly defined); they are adjacent if they are horizontal or vertical adjacent. Since the pixels on the picture boundary play often a special role for recognition, it is convenient to surround them by a frame: for a picture p , the bordered picture pO of size jpjrow C 2, jpjcol C 2 is obtained by surrounding p with the special boundary symbol # 62 A. The domain of pO is ¹0; 1; : : : ; jpjrow C 1º  ¹0; 1; : : : ; jpjcol C 1º. We let Jp K denote the set of all size-.2; 2/ subpictures, referred to as tiles, of the bordered picture pO . Definition 2.2. A 2D or picture language over A is a subset of ACC .

9. Two-dimensional models

305

In the following, the term “language” always stands for picture language, and string or 1D languages are qualified as such; moreover, if D denotes some kind of picturedefining device, then L.D/ denotes the class of corresponding picture languages. Definition 2.3. Let p; q 2 ACC . The horizontal (or column) concatenation,denoted by p ȅ q , and the vertical (or row) concatenation, denoted by p q , are partial operations, defined only if jpjrow D jqjrow and if jpjcol D jqjcol , respectively, by pȅq D

p

q

;

p q D

p q

:

Let L1 ; L2  ACC . We extend the concatenations to languages as follows: and

L1 ȅ L2 D ¹x ȅ y j x 2 L1 and y 2 L2 º

L1 L2 D ¹x y j x 2 L1 and y 2 L2 º:

By iteration we also obtain the horizontal (or column) and vertical (or row) closures. Definition 2.4. Let L  ACC ; the horizontal closure of L (denoted by LCȅ )S and the vertical closure of L (denoted by LC ) are defined respectively as LCȅ D i Li ȅ S and LC D i Li , where L1 ȅ D L; Lnȅ D L.n 1/ȅ ȅ L and L1 D L; Ln D L.n 1/ L.

Other operations occasionally considered are picture rotation by an angle multiple of 90 degrees, and horizontal and vertical reversal. For every a 2 A, we still use a to denote both the picture of size .1; 1/ containing symbol a and the language containing only picture a. That is, we shall omit brackets and write am;n , aCC , aCȅ , aC , ACȅ , AC . 2.1. A small collection of examples. Here we collect some examples of languages to be used later to illustrate different devices for defining pictures, as well as to separate classes of languages.  The language of a-homogeneous squares: L(1) D ¹p 2 aCC j jpjrow D jpjcol > 1º:

(1)

 The language of squares with 1 in the main diagonal and 0 elsewhere: L(2) D ¹p 2 ¹0; 1ºCC j jpjrow D jpjcol and p.i;j / D 1 if and only if i D j º:

(2)

 The set of pictures with exactly one 1 pixel: L(3) D ¹p 2 ¹0; 1ºCC j there exists a unique couple.i; j / 2 dom.p/ such that p.i;j / D 1º:

(3)

306

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

– The languages of pictures over ¹a; bº with two or more columns, satisfying some equality between columns: L(4) D ¹p j the first column is equal to the last oneº (4) D ¹p j jpjcol > 2 and p.1;1Ijpjrow ;1/ D p.1;jpjcol Ijpjrow ;jpjcol / ºI L(5) D ¹p j the first column is equal to another oneºI

(5)

L(6) D ¹p j the last column is equal to another oneºI

(6)

L(7) D ¹p j there exist two equal columnsº

(7)

– The language of pictures such that each column is an even palindrome: L(8) D ¹p j for all i there exists u 2 ¹a; bº such that p.1;i Ijpjrow ;i / D u:uR º: – The language of 2  2 grids with distinct homogeneous meshes:

L(9) D ¹p 2 .ah;i ȅ b h;j / .c k;i ȅ d k;j / j h; i; j; k > 1º:

(8)

(9)

2.2. Languages defined by regular expressions. A simple language family can be defined by means of 2D regular expressions (RegExp) i.e., formulas that use set union, vertical and horizontal concatenations, and their closures. To illustrate, consider the expression: .aCC [ b C / ȅ c ȅC . It defines single-row pictures of the forms a ::: a c ::: c

or

b c ::: c :

The descriptive capacity of such expressions is rather limited: very simple languages that are not in L.RegExp/ are the sets of square pictures (for example L(2) or L(1) ), see [31]. In general, there is no way to impose mathematical relations between height and width by means of regular expressions. On the other hand, regular expressions are able to define surprisingly subtle constraints. Example 2.1. The regular expression 0CC .0Cȅ 1 0Cȅ / 0CC defines the set of pictures having exactly one 1 pixel, which moreover is not on the boundary. The set L(3) , containing pictures with exactly one 1 pixel, is defined by the union of the previous regular expression together with other regular expressions like .0Cȅ 1 0Cȅ/ 0CC :

Example 2.2. The set of pictures whose first column equals the last one, and the set of pictures containing two equal columns, are respectively defined by the following formulas: [ .a ȅ ACȅ ȅ a/C and L(7) D ACC ȅ L(4) ȅ ACC : L(4) D a2A

By definition, L.RegExp/ is closed under union. On the other hand, while string regular expressions are closed under complement and intersection, not even the second property holds for L.RegExp/, see [31].

9. Two-dimensional models

307

2.3. Local languages. The family of local picture languages is very basic and plays a role in the definition of more complex language families. Definition 2.5. A language L  ACC is local if there exists a finite set ‚ of allowed tiles over A [ ¹#º such that L D ¹p 2 ACC j Jp K  ‚º; we also write L D L.‚/. The family of local picture languages is denoted by LOC. Notice that LOC generalises to 2D the family of 1D local languages (see e.g., [18]), which are specified by the allowed length-2 substrings, and the initial and final letters.

Example 2.3. Language L(2) (squares with 1 in the main diagonal and 0 elsewhere) is local: L(2) D L.‚/ where ‚ is u } 1 0 0 0 ´ w  1 0 0 0 0 1 0 0 # 1 # 0 0 0 0 1 w 0 1 0 0  ; ; ; ; ; ; ; ; w D 0 1 1 0 0 0 0 0 # 0 # 0 # # # # v 0 0 1 0 ~ µ 0 0 0 1 0 # 0 # # # # # # # # # # 0 1 # : ; ; ; ; ; ; ; 1 # 0 # 0 0 1 0 # 1 0 # # # # # Notice that, without the presence of a distinct letter in the diagonal, it would be impossible to constrain the shape to be a square; therefore, L(1) , the set of a-homogeneous squares, is not a local language. Proposition 2.1. The families L.RegExp/ and LOC are incomparable. Proof. The set L(2) is local by the previous example, but is not in L.RegExp/ because regular expressions cannot enforce a numerical relation between picture width and height. Vice versa, L(3) , the set of pictures with exactly one 1 pixel, is in L.RegExp/ by Example 2.1, but is not a local language: clearly, the tiles needed to cover a picture containing just one 1 would enable also to cover the pictures with two or more far distant 1’s. Proposition 2.2. If L 2 LOC, then the complement of L is in L.RegExp/.

Proof. Any picture p in the complement of L contains a tile t not in the tile set ‚ representing L. Reasoning as in Example 2.1, it is straightforward to write the regular expression for the language of pictures containing t . Hence, the complement is the union of such regular expressions, for all tiles not in ‚. Concerning the closure properties, LOC is remarkably similar to the family of local string languages [27]. Proposition 2.3. The family LOC is closed under intersection. It is not closed under union, complement, vertical and horizontal concatenations and their closures. Proof. For intersection, it suffices to take as tile set the intersection of the original tile sets. For concatenation, consider the set of pictures with exactly one 1 pixel in the first row, but not in position .1; 1/, which is not local for the same reason as L(3) . It can be expressed as the horizontal concatenation La ȅ Lb , where La D 0CC and Lb is

308

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

the set of pictures having exactly one 1 pixel at the top-left corner, which clearly are both local languages. For union, consider the set Lc of pictures with non-adjacent 1’s and, similarly, the set of pictures with non-adjacent 2’s on a background of 0’s: they are both local languages but their union is not, for the same argument used to prove Proposition 2.1. For complement, consider the language La over alphabet ¹0; 1º; its complement contains in particular Lc , whose tiles would cover also La .

3. Tiling recognition This section presents the family REC of tiling recognisable languages, introduced by Giammarresi and Restivo in [19]. The aim was to define a class which extends into 2D the family of string languages recognised by finite automata and inherits many properties from the corresponding definition for strings. The definition of REC is based on a characterisation of 1D finite automata given in terms of local string languages and projection (cf. [18]). Actually, the graph of a finite automaton can be described as follows: the set of edges is described by a finite set € , the edge adjacency is given by a set ‚ of length-2 strings over € , a projection W € ! A gives the edge labels; initial and final edges can be marked in a special way in order to identify the accepting paths. A word of L.‚/ corresponds to an accepting path in the graph and its projection by  gives the actual word accepted by the automaton. This setting is naturally generalised to 2D by using local picture languages, with the merit that, for pictures of size .1; n/ (or size .n; 1/), the definition of recognisability reduces to the definition of recognisability for strings. A local picture language together with an alphabetic projection is called tiling system. REC is a robust class, which can be defined using also other approaches, summarised in § 3.2; its closure properties are collected in § 3.3; § 3.4 discusses some necessary conditions for recognisability; § 3.5 concludes our survey on tiling recognisability addressing the issue of 2D determinism. REC will be compared with picture grammars in § 5. 3.1. The family REC. Let € and A be finite alphabets. Given a mapping W € ! A, to be termed projection, we extend  to pictures, respectively over € and A, by p D .p 0 / such that p.i; j / D .p 0 .i; j // for all .i; j / 2 dom.p 0 /. p 0 is called the pre-image of p . Definition 3.1. A tiling system (TS) is a quadruple T D .A; €; ‚; / where A and € are finite alphabets, ‚ is a tile set over € [ ¹#º, and W € ! A is a projection. The local language over an auxiliary alphabet and the projection are used in the next definition. Definition 3.2. A language L  ACC is recognised by a tiling system T D .A; €; ‚; / if L D .L.‚//. We also write L D L.T /. The family L.TS/ of all tiling recognisable languages is usually denoted by REC.

9. Two-dimensional models

309

Example 3.1. L(1) , the language of squares over the one letter alphabet A D a, is recognisable by the TS where € D ¹0; 1º and ‚ is the tile set defined in Example 2.3. Notice that L.‚/ D L(2) . Example 3.2. L(4) , the set of pictures whose first column is equal to the last one, is in REC. Informally, we can define a local language where information about first column symbols of a picture p is horizontally propagated, by means of subscripts, to match the last column of p . More precisely, take € D ¹xy j x; y 2 Aº and a projection such that .xy / D x . Hence, tiles are defined with same subscripts within a row while, in the left-border and in the right-border tiles, subscripts and main symbols should match. So, the tiles on the left border, those on the right border, and the “middle tiles” must be of the form # zz zz # xz sz ; ; and ; # tt tt # yt rt respectively. Below it is an example of a picture p 2 L(4) together with its pre-image p 0 : b b a b b pD a a b a a ; b a a a b

bb bb ab bb bb p 0 D aa aa ba aa aa : bb ab ab ab bb

The natural subclass of unambiguous tiling recognisable 2D languages, introduced in [19], is defined next. Definition 3.3. A tiling system T D .A; €; ‚; / is unambiguous if for any picture p 2 L.T / there exists a unique local picture p 0 2 L.‚/ such that p D .p 0 /. A recognisable language L  ACC is unambiguous if it can be recognised by an unambiguous tiling system. UREC denotes the family of unambiguous recognisable languages. Example 3.3. It is not difficult to verify that languages L(1) and L(4) of Examples 3.1 and 3.2 are in UREC. Observe that, by recalling the correspondence between tiling system and finite automata in 1D discussed above, an unambiguous tiling system can be viewed as a generalisation in 2D of the definition of unambiguous automaton for string languages. But, in contrast to the string case, it is undecidable whether a given tiling system is unambiguous [3]. Proposition 3.1. LOC  UREC  REC. The proposition is easily proved by definition and Example 3.1. Actually, we shall see in Theorem 3.8 that the second inclusion is also proper. 3.2. Other models characterising REC. In this section we mention other models that define REC, thus substantiating the robustness of the definition of “two-dimensional finite state recognisability” via tiling systems.

310

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

We consider first a model based on a variant of Wang tiles [52] introduced in [16]. A Wang tile is a unitary square with colored edges. Color represents compatibility: two tiles may be adjacent only if the color of the touching edges is the same. Although Wang tiles are not considered as a device for specifying picture languages, one may use them to define a local language over the edge alphabet, with pixels arranged on a rectangular grid. To obtain a language-defining device in our sense, one has to move to labelled Wang tiles. A labelled Wang tile is a Wang tile bearing also a label; a set of such tiles is called Wang system. More formally, given a finite alphabet C of colors, containing the special symbol # representing borders, and a finite alphabet A of labels, a labelled Wang tile is a quintuple .n; s; e; w; x/ 2 C 4 A. Intuitively, n; s; e; w represent the colors respectively at the top, bottom, right, and left of the tile, whereas x is the label of the tile. For better readability, we represent the labelled Wang tile .n; s; e; w; x/ as n w

e:

x s

Given ˆ  C 4  A, a Wang-tiled picture over ˆ is any picture in ˆCC such that adjacent pixels are compatible, also considering borders, as in the following example, where A D ¹0; 1º and C D ¹c1 ; c2 ; c3 ; c4 ; #º: # #

#

0

# c4 c4

1

c1

c3

c1

c3

1

#

c2 c2

0

# 2 ˆ2;2 :

#

#

The label of a Wang-tiled picture P over ˆ is the picture over A having for pixels the labels of pixels of P . For instance, the label of the picture above is 0 1 : 1 0

A Wang system is a triple .C; A; ˆ/ and it generates the language over A of all labels of Wang-tiled pictures over ˆ. In [16] it is proved that the family of languages generated by Wang systems is exactly REC: the colors of the sides of the labelled Wang tiles play the same role as the symbols of the local alphabet of a corresponding tiling system. If p is a picture recognised by a given Wang system, then it corresponds to a Wang-tiled picture P , while it corresponds to a local picture p 0 when it is recognised by a tiling system. Observe that P and p 0 are conceptually similar since both are somehow enriched

9. Two-dimensional models

311

versions of p that satisfy local conditions. Roughly speaking, while P has the extra symbols on the edges of the cells, p 0 has its extra symbols inside the cells. We also mention another 2D model using Wang tiles, namely the quadrapolic automaton, introduced in [10]. Another interesting variant of tiling systems are the domino systems introduced in [26] (see also [20]). They are based on hv-local picture languages. The formal definition can be obtained in the spirit of Definition 2.5 by replacing “tiles” by “dominoes,” which are pictures of two kinds: horizontal dominoes of size .1; 2/ and vertical dominoes of size .2; 1/; the pre-image in this case must be covered by horizontal dominoes and by vertical dominoes. While it can be proved that the family of hv-local picture languages is strictly contained in LOC, when we put projection on top to define domino systems we obtain that the family of languages generated is exactly REC. Notice that, from a computational point of view, this corresponds to the fact that the horizontal and the vertical scanning of an input picture can be done separately. As a consequence, a recognisable language can be represented by two regular string languages, resp. associated with the rows and the columns, and a mapping. For instance the language L(1) of all a-squares can be represented by the horizontal and vertical string languages 0 10 and the projection 0 ! a and 1 ! a: A completely different characterisation of family REC is given via logic formulas (cf. [20]). The logic formalism to describe words is “translated” into 2D in a formalism to describe labelled grid graphs. A picture p of size .m; n/ over A can be represented by the signature p D .dom.p/; S1 ; S2 ; .Pa /a2A / where S1 and S2 are the successor reN components of a point in dom.p/, and Pa D ¹.i; j /j p.i;j / D aº, for lations for the two a 2 A, gives the set of points in dom.p/ that are labelled with a. Moreover, properties of pictures can be described by first-order and monadic second-order formulas, using first-order variables x; y; z; x1 ; x2 ; : : : for points of dom.p/, i.e., positions, and monadic second-order variables X; Y; Z; X1 ; X2 ; : : : for sets of positions. It can be shown, for example, that the set of square pictures cannot be described by a first-order sentence. In [22] it is shown that the family of languages recognised by tiling systems and the family of languages defined by existential monadic second-order formulas coincide. This characterisation shows that the computational model of tiling systems has also a descriptive model counterpart and it generalises to 2D Büchi’s theorem for recognisable string languages. Remark that in 1D the characterisation is in terms of monadic second-order formulas because regular string languages are closed under complement; on the other hand, in 2D this closure property does not hold (see Theorem 3.5), hence the term existential is mandatory, since an alternation of existential and universal monadic quantifiers would lead outside REC. If the alphabet has only one symbol, a characterisation of REC in terms of computational complexity is given in [7]: this class corresponds to a class of languages recognised by linearly space-bounded one-tape Turing machines with certain constraints on the number of head reversals; a similar correspondence holds between UREC and unambiguous Turing machines.

312

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

Regarding approaches in terms of automata, we mention the 2D on-line tessellation automaton (OTA) introduced by K. Inoue and A. Nakamura [24]. OTAs are defined as a restricted type of 2D cellular automaton in which cells do not make transitions at every time step. Rather a “transition wave” sweeps diagonally across the array. Each cell changes its state depending on the two neighbors to the top and to the left. A run of a OTA on a picture p assigns a state (from a finite set) to each position .i; j / of p . Such state depends on the states already associated with positions .i 1; j / and .i; j 1/ and on symbol p.i;j / . At time t D 0 an initial state q0 is associated with all the positions of the first row and of the first column of pO . The computation involves jpjrow C jpjcol 1 steps. It starts at time t D 1 by reading p.1; 1/; at time t D 2, states are simultaneously assigned to positions .1; 2/ and .2; 1/, and so on, to the next diagonals. Picture p is recognised if there exists a run such that the state assigned to position .jpjrow ; jpjcol / is final. Notice that OTAs reduce to standard string automata when we restrict them to operate on one-row pictures. In [25] (also see [20]) it is proved that the class of languages accepted by OTA and REC coincide and therefore tiling systems can simulate a machine device. 3.3. Closure properties. Family REC is next shown to be closed with respect to several operations, thus reinforcing the analogy with the 1D case. More detailed proofs of the following theorems are in [20]. Theorem 3.2. REC is closed under rotation, horizontal and vertical reversal, concatenations and their closures. Proof. For rotation and reversals, it suffices to rotate and mirror all tiles in the original tile set. For horizontal concatenation, let L1 ; L2 2 REC. We choose the corresponding TSs .A; €1 ; ‚1 ; 1 / and .A; €2 ; ‚2 ; 2 / with €1 and €2 disjoint. Then a TS .A; €; ‚; / for L1 ȅ L2 is defined with € D €1 [ €2 while set ‚ contains all tiles from ‚1 and ‚2 excluding right-border tiles of L1 and left-border tiles for L2 , plus a set of tiles that “concatenate” rightmost columns of L1 ’s pictures to leftmost columns of L2 ’s pictures. The projection  coincides with 1 and 2 when restricted to €1 and €2 , respectively. Remark that not all tiles will necessarily occur in pictures in L1 ȅ L2 (actually it may be L1 ȅ L2 D ;). A TS for the horizontal closure of a language L can be found with the same technique by considering two disjoint-alphabet TSs for L. Similar techniques apply for vertical concatenation and its closure. Theorem 3.3. REC is closed under union and intersection. Proof. Let L1 ; L2 2 REC. We choose the corresponding TSs with disjoint local alphabets €1 and €2 . Then the tile set for L1 [ L2 is simply the union of the tile sets for L1 and L2 . For L1 \ L2 the construction is more involved: the local alphabet is the

9. Two-dimensional models

313

set of pairs €1  €2 and the tile set contains all elements .a1 ; b1 / .a3 ; b3 /

.a2 ; b2 / .a4 ; b4 /

such that

a1 a2 a3 a4

and

b1 b2 b3 b4

are tiles for L1 and L2 , respectively. From the previous results, we obtain some other examples of languages in REC. Example 3.4. We consider language L(4) 2 REC (Example 3.2) and L(5) , the language of pictures over A D ¹a; bº such that the first column is equal to some other column. Since L(5) D L(4) ȅ ACC , L(5) is also in REC. Similar reasoning proves that the following languages are in REC: L(6) D ACC ȅ L(4) (pictures with the last column equal to some other one), and L(7) D ACC ȅ L(4) ȅ ACC (pictures with two occurrences of the same column). Theorem 3.4. L.RegExp/  REC and the inclusion is proper.

Proof. The inclusion follows from the fact that REC is closed under union, horizontal and vertical concatenations, and their closure. Strictness is witnessed by the set L(1) of square pictures. One feature that distinguishes recognisable languages in 1D and 2D is the complement operation. In a different setting, K. Inoue and I. Takanami (cf. [25] and [24]) proved that REC is not closed under complement. In [19] and [20] a combinatorial proof of this fact is given, which refers directly to TSs. Here we reproduce the proof using a different language, which will be needed later. Theorem 3.5. The family REC is not closed under complement. Proof. Let L D L(8) be the language of pictures whose columns are even-length palindrome strings (for brevity, palindromes). By contradiction, suppose that L 2 REC, that is L is a projection of a local language L0 over an alphabet € . A counting argument will show that this leads to a contradiction. Let k and be the sizes of the alphabets A and € , respectively. Let p R denote the vertical reversal of a picture p , and for n > 1 let Ln D ¹p 2 ACC j p D s s R where jsj D .n; n/º: 2

The number of pictures in Ln is k n . Let L0n be the set of pictures in L0 (over € ) whose projections are in Ln . For the stripes over € of size .2; n/ consisting of the n-th and .n C 1/-st rows in the rectangles of L0n there are at most 2n possibilities. Taking n 2 sufficiently large, it is k n > 2n and there exist two different pictures p D sp spR and q D sq sqR in Ln (with sp ¤ sq ) such that the pre-images p 0 D sp0 sp00 and q 0 D sq0 sq00 in L0n have the same stripes consisting of the n-th and .n C 1/-st rows. This implies that, by definition of local languages, also the picture sp0 sq00 belongs to L0n and therefore its image sp sqR belongs to Ln , a contradiction.

314

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

We now prove that the complement of L, CL, is in REC. Clearly CL D L1 [ L2 , where and

L1 D¹p 2 ACC j jpj D .2m C 1; n/; m > 0; n > 0º 2 REC; L2 D¹p 2 ACC j jpj D .2m; n/; m; n > 0 and some column is not a palindromeº:

L2 can be written as follows:

L2 D ACC ȅ .L4 .ACC ȅ L3 ȅ ACC / L4 / ȅ ACC ;

where L3 is the set of one-column pictures (with even height) such that the top symbol is different from the bottom one, L4 is the set of squares over A, and both are in REC (for L4 one can use the same technique used in Example 3.1 for the set of a-homogeneous squares). This shows that L2 2 REC and consequently CL is recognisable. The previous result prompts to introduce the following class of languages.

Definition 3.4. Co-REC is the class of languages whose complement is in REC. Clearly, REC  REC [ Co-REC and the inclusion is proper.

As far as UREC is concerned, it is known that it is closed under intersection [3] and disjoint union [33]. However, it is not closed under horizontal and vertical concatenations and their closures [3]. The closure of UREC under complement is still an open question that was also stated in [37] in a graph-theoretical framework. 3.4. Necessary condition for tiling recognisability. The classes UREC, REC, and REC [ Co-REC were introduced in the previous sections. Now we provide some necessary conditions for a language to be in these classes, obtained by combining an approach originally proposed by Matz [32] and a technique based on Hankel matrices [3] and [21]. As a consequence, we get examples of languages that separate UREC, REC and REC [ Co-REC. Matz’s technique reduces 2D languages to string languages over the alphabet of the columns. More precisely, let L  ACC and, for any m > 1, consider the subset L.m/  L containing all pictures with exactly m rows, viewed as a string language over the alphabet Am;1 of the columns (i.e., words in L.m/ have a “fixed height m”). Applying the transformation from the representations by local languages and projections to finite automata, in [32] the following lemma is proved. Lemma 3.6. Let L 2 REC. Then there is a k > 1 such that for all m > 1 there is a finite (string) automaton with k m states that accepts the string language L.m/. In the proof, k is the cardinality of the local alphabet for L, and the local alphabet for L.m/, i.e., the set of local columns of height m, has cardinality k m . If S is a string language, the Hankel matrix of S is the infinite Boolean matrix MS D ka˛ˇ k˛2A ;ˇ 2A , where a˛ˇ D 1 if and only if ˛ˇ 2 S . Given a picture language L, consider the following complexity functions defined in terms of the Hankel

9. Two-dimensional models

315

matrices of string languages L.m/: KL .m/ gives the rank of ML.m/ ; RL .m/ gives the number of distinct rows of ML.m/ ; PL .m/ gives the size of a maximal permutation matrix that is a submatrix of ML.m/ (a permutation matrix is a Boolean matrix that has exactly one 1 in each row and in each column). Notice that, since every regular language has a finite index (Myhill–Nerode theorem), the number of different rows of ML.m/ is finite, hence all these functions assign an integer value to any integer m. The following theorem summarises some necessary conditions concerning complexity functions. Theorem 3.7. Let L be a picture language. 1. If L 2 UREC, then there exists k such that KL .m/ 6 k m ; 2. if L 2 REC, then there exists k such that PL .m/ 6 k m ; m 3. if L 2 REC [ Co-REC, then there exists k such that RL .m/ 6 2k and the bound is tight. Proof. We present only the proof of the first item; the other proofs can be found in [21]. Let L 2 UREC and let .A; €; ‚; / be an unambiguous tiling system for L. Then the automata defined as in Lemma 3.6 for string languages L.m/ will result unambiguous too. Therefore there exists a k such that, for all m > 1 the string language L.m/ is accepted by an unambiguous (string) automaton with k m states. In [23] it is proved that, for every regular string language S  A , the number of states of an unambiguous automaton recognising a language S is at least the rank of matrix MS , and this concludes the proof. We remark that item (2) of the previous theorem was first proved in [32]. In [5] it is proved that the converse of item (2) and (3) does not hold, i.e., the conditions are necessary but not sufficient for tiling recognisability. Another necessary but not sufficient condition for REC is given in [6]. The following separation result can be proved by applying the previous criteria. Theorem 3.8. UREC  REC  REC [ Co-REC and all inclusions are proper.

Proof. Consider language L D L(7)  ACC , containing all pictures that have two occurrences of some column. L is in REC by Example 3.4. In [3] the rank of ML.m/ m is computed, obtaining KL .m/ D 2k C 1, where k is the cardinality of A. Hence, by Theorem 3.7 (1), L is not in UREC. The separation between REC and REC [ Co-REC has already been shown in Theorem 3.5. Another example, obtained in [3] by applying Theorem 3.7 (2), consists of the complement of the language of pictures where all columns occur at least twice. A relevant consequence of the previous argument is that UREC is not closed under horizontal concatenation (since L(7) D ACC ȅ L(4) ȅ ACC ). Also notice that the separation between classes UREC and REC still holds in the case of a one-letter alphabet [4]. To the best of our knowledge, all known examples of languages in REC n UREC are such that their complement is not in REC. Indeed, an open question, already stated

316

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

in [37], is whether L 2 REC and CL 62 REC imply L 62 UREC. Clearly such question is related to the question whether UREC is closed under complement or not, which is also open. 3.5. Determinism. TSs are implicitly nondeterministic: in the 1D case, a TS reduces in general to a non-deterministic string automaton, unless further constraints are imposed on the tile set. However, while in 1D deterministic and nondeterministic finite automata are equivalent, REC is clearly a nondeterministic class, as confirmed by the nonclosure of REC under complement, and the fact that parsing in REC is NP-complete, as proved in [27], where TS are called homomorphisms of local lattice languages. This fact motivates the search for reasonably “rich” deterministic subsets of REC, which would allow linear time parsing with respect to the picture size. Unfortunately, the concept of determinism for picture languages is still far from being well understood. Here we present some deterministic subclasses of REC, that are defined by considering prefixed strategy to visit the pictures. We then discuss some automata models that have been proposed recently to provide viable deterministic 2D recognition. The deterministic versions of OTA [24] (see § 3.2) defines one of the earliest deterministic subclasses of REC; in a DOTA, pixels are visited exactly once according to a pre-fixed scanning strategy directed from top-left corner to bottom-right corner. The DOTA model inspired the notion of determinism of [2], which relies on four diagonalbased scanning strategies, each starting from one of the four corners. We will call the corresponding deterministic class Diag-DREC.1 Consider a scanning strategy that respects the tl2br direction (from the top-left corner to the bottom-right corner): any position .i; j / is read only if all the positions that are above and to the left of .i; j / have already been read. An example of tl2br strategy is obtained by scanning pictures row by row, from left to right, and from top to bottom. Roughly speaking, tl2br determinism means that any accepted picture p 2 †CC admits exactly one pre-image, which can be build deterministically pixel by pixel when scanning p with any such strategy. The formal definition follows. Definition 3.5. A tiling system .A; €; ‚; / is tl2br-deterministic if for any a, b , c 2 € [ ¹#º and x 2 † there exists at most one tile a b c d

with .d / D x (similar definitions apply to the other three corner-to-corner cases). There exist languages that are recognised by TS that are tl2br-deterministic, but not d -deterministic, for d ¤ tl2br. A recognisable 2D language L is diagonaldeterministic, if it admits a d -deterministic tiling system for some corner-to-corner direction d . Diag-DREC is the corresponding language family [2]. 1 The original name is DREC.

9. Two-dimensional models

317

In the 1D case, there are two “natural” strategies to scan a string (from left to right, and from right to left) corresponding to the ways words are written in natural languages. Finite string automata that follow either strategy are equivalent, since the classes they define (regular and co-regular string languages) coincide. Quite differently, in 2D the choice of a scanning strategy is somehow arbitrary. Actually, the eye movement of an observer viewing a picture may follow rather irregular trajectories. This suggests defining larger deterministic subclasses of REC by considering scanning strategies more complex than the corner-to-corner ones, as recently done, for instance, in [1], [11], [28], [29], and [30]. For instance, Definition 3.5 is extended in [28] to consider boustrophedonic scanning strategies, like the one starting from the top-left corner and scanning the first row of a picture rightwards, then the second row leftwards, and so on. The corresponding class is named Snake-DREC. It is proved that this notion of determinism coincides with line unambiguity, an intermediate notion between diagonal determinism and unambiguity, embodied in classes Row-UREC or Col-UREC, and based on backtracking at most linearly in one dimension of the picture [2]. In a further attempt to explore determinism in REC, [29] defines polite scanning strategies, that visit each position in such a way that the next position to scan is always adjacent to the previous one, and depends only on two pieces of information: which neighboring positions have already been visited, and which direction we are moving from. Examples of such scanning strategies are those following the boustrophedonic order, spirals, and others. This leads to a proper subclass of REC, called Scan-DREC, which is closed under complement, rotation and reversal operations. The relations among the above mentioned classes is summarised as follows. Diag-DREC  Snake-DREC D Row-UREC [ Col-UREC  Scan-DREC  UREC:

The above definitions of determinism within REC are not fully convincing, since imposing a prefixed scanning strategy for tiling systems is somehow a stretch with respect to their intrinsic non-oriented nature. Indeed, the notion of determinism is more natural in the framework of automata. The oldest automaton model for pictures is the four-way automaton (4NFA) of Blum and Hewitt [8], that generalises the two-way 1D automaton by allowing the reading head to move left, right, up and down within the picture. Differently from the case of tiling systems, where the pixels in the picture can be examined in any order, a run of 4NFA walks along a path traversing picture pixels; such path scans the picture or parts of it, guided by the transition function. Although 4NFA are a very natural generalisation of finite automata to two dimensions, they do not preserve most of the important properties that finite automata have for strings: in particular, L.4NFA/ is not closed under row and column concatenation and under closure operations [8]. In the deterministic version, for every run the path is unique. The deterministic version of the model is less powerful than the non-deterministic one.

318

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

More recently, new models of automata have been proposed with the aim of defining deterministic classes of picture languages recognisable in polynomial time: tiling automata [1], Wang automata [29], restarting tiling automata [41], sgraffito automata [42], and ordered restarting automata [36]. All of them have in common the following properties: they all perform some kind of rewriting steps on pixels (similarly to one-tape Turing machines on strings); they recognise exactly regular languages when restricted to strings (i.e., pictures of one row); their membership problem is decidable in polynomial time; they extend Diag-DREC. Several features characterise and distinguish such models: the rewriting moves may or may not depend on the adjacent pixels, and the rewriting steps may or may not be followed by a restart; either a predetermined scanning strategy is used to visit the input picture, or a head is moved through the picture according to a finite state control; a position may be visited several times, or it may be required to be visited exactly once. A survey for restarting and sgraffito automata is provided by [35]. Another interesting but radically different notion of determinism is proposed in [9], where a TS is used to transform the input picture into a unique tiling through a process similar to the way in which a Sudoku puzzle is being solved. While it is decidable to check if a TS is diagonal, or snake deterministic, at present the status of the decidability problem for [9] is unknown.

4. Grammars We describe and compare early and recent types of picture grammars that extend into 2D the classical 1D context-free rewriting rules. Due to space restrictions, we limit the selection to a few grammar types that are representative of different approaches, yet are more comparable to each other and more deeply investigated; references to some other related types are given below. A grammatical approach not considered here is based on graph representations and transformations: a picture is described by a decomposition into subpictures, and by geometrical relations between them, such as “is right of,” “is connected with,” etc. Such recursive definition can be viewed as a graph, whose nodes represent subpictures, and the edges denote geometrical relations. A class of pictures is then defined by a suitable graph grammar ([46] is an introduction), i.e., by a graph rewriting system. All the considered grammar types have the same structure as 1D context-free grammars (1CFG). A grammar is a 4-tuple G D .N; A; R; S /, where N is the set of nonterminals, disjoint from the terminal alphabet A, their union is denoted V , R is the set of rules (productions), and S 2 N is the axiom. For all types we adhere to the following naming scheme: terminals are denoted by lower case Latin letters a; b; : : : and nonterminals by capital Latin letters X; Y; : : : Typically a rule has the form X ! ˛ , where the left-hand side X 2 N , and the right-hand side ˛ has a form that depends on the grammar type. A rule having S as left hand side is termed axiomatic.

9. Two-dimensional models

319

All grammar types considered admit a normal form, called nonterminal, such that the right hand side of a rule either contains just nonterminals or just terminals, resp., termed nonterminal and terminal rule. In order to see the motivation behind so many ingenious 2D grammar proposals, imagine a rule X ! q , where q is a picture of non-unitary size .h; k/ and naively apply it to rewrite an X -pixel occurring at some position in picture r . Since the right hand side q is larger than .1; 1/ the neighbors of X in r must be spread out, thus creating new rectangular regions contiguous to q , whose content is undetermined by the rule. This so-called shearing effect precludes performing rewriting as in 1CFGs, and has prompted two main alternative approaches, symbolic and isometric. § 4.1 and § 4.2 present grammar models that are more naturally defined by the symbolic approach; in particular, the models presented in § 4.2 use grids as righthand sides. § 4.3 presents isometric models based on tiles. Symbolic and isometric rewriting mechanisms seem disparate at first glance, but by deeper analysis the two approaches can be compared, and the latter can be proved to be more powerful, at least in the present range of models. Last, we mention an interesting idea, which is similarly inspired by a 2D extension of string grammars, but currently does not fit in our classification. Nivat et al. [34] proposed a model called puzzle grammar that is suitable for describing and generating connected patterns of pixels; the pattern is immersed in a neutral background, and it is not constrained to fit inside a rectangular array. 4.1. Context-free Kolam grammars and related symbolic grammars. This class of symbolic grammars has been invented by Siromoney et al. [48] and initially named “array grammar,” then renamed Kolam array grammar (to avoid confusion with the homonymous model due to Azriel Rosenfeld and with reference to traditional Indian decorative patterns) and much later reinvented by Matz [31], whose terser definition and analysis are adopted. In symbolic rewriting, the right hand side of a rule is interpreted as a formula describing a picture language (for instance in the formalism of 2D regular expression), and the nonterminals are viewed as variables. Starting from the axiom, each rewriting step replaces a variable by a formula, until all variables have been eliminated. The last formula denotes a picture (if any) generated by the grammar. Notice that symbolic rewriting operates on texts, not on pictures. Definition 4.1. A sentential form over an alphabet V is a non-empty 2D regular expression using only the two concatenation operators and ȅ, and symbols of V . A sentential form  defines either one picture over V , denoted by L M, or none. SF.V / denotes the set of all sentential forms over V . Definition 4.2. In a context-free Kolam grammar (KG) G D .N; A; R; S / the rule set is R  N  SF.V /, i.e., a rule has the form X !  .

320

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

Pictures are obtained by a derivation process similar to 1D context-free derivation. Starting from the initial string S , a sentential form is obtained from the preceding one by replacing a nonterminal with one of its right hand side The derivation ends when the sentential form contains no nonterminals; then the denoted picture (if any), is generated 1

by the grammar. As usual, H) G or H) G denotes a one-step derivation, while 

H) G denotes the reflexive and transitive closure; the grammar name will be dropped unless indispensable. The picture language generated by G is the set L.G/ D ¹L M j



2 SF.A/ ^ S H) 

º:

With a slight abuse of notation, we also write X H) p , with X 2 N; p 2 ACC , instead  of 9W X H) ; L M D p . Similar to Chomsky’s nonterminal normal form of a 1CFG, KG admits a normal form [31], to be named binary nonterminal, such that the right hand side’s take one of the types: a 2 A, X Y , X ȅ Y . Rewriting by means of the last two types induces a partition of a (sub)picture into two regions, one on top, or to the side, of the other.

Example 4.1. The following KG defines the set L(8) of pictures such that each column is an even-length palindrome: S ! X ȅ S j X;

X ! a X a j b X b j a a j b b:

For better readability, we have dropped redundant parentheses, e.g., in .a X / a D a .X a/. A successful derivation is S H) .X ȅ S /

H) ..a X a/ ȅ S / 2

H) ..a b b a/ ȅ X /

H) ..a b b a/ ȅ .b X b//

H) ..a b b a/ ȅ .b b b b// :

It is important to say that KG rules can be made more concise by allowing 2D regular expressions that use also union and ȅ and closure in the right hand side. For instance, rules S ! X ȅ S j X can be replaced by the more concise form S ! X Cȅ . Moreover, for brevity we sometimes write a single “rule” Y ! aCC instead of the rules Y ! X Cȅ , X ! aC . Such regular expressions may denote more than one picture. Notice that this extension does not extend the generated language family; this fact is remarkably similar to what happens to 1CFGs. Historical remark: context-free matrix grammars. The grammar presented in Example 4.1 has a very simple form that corresponds to an early model named contextfree matrix grammar (here simply MG) [47]. The axiomatic rules generate a horizontal string language (in this case X Cȅ ) and then each instance of nonterminal X generates a vertical string (in this case a palindrome). More precisely, a MG is characterised by the partition of the rule set into two blocks: a first “horizontal” block that includes the

9. Two-dimensional models

321

axiomatic rules, generates a horizontal string of so-called vertical nonterminals. Each vertical nonterminal can then generate a vertical string by means of the rules of the second “vertical” block. This idea inspired some later models. Subramanian et al. [51] have recently extended this approach by allowing the vertical and horizontal rule sets to be alternatively applied any number of times; the resulting grammars currently do not fit in our classification. KG model was originally proposed as an extension of MGs, to overcome their weakness, as illustrated in the following example. Example 4.2. Language L(1) of square homogeneous pictures is easily generated by the KG having rules: S ! .S ȅ aC / aCȅ

and S ! a:

Notice that, in order to generate some picture, the first rule must be applied with a suitable proportion of vertical and horizontal a symbols; namely, only rules like S ! .S ȅ an / a.nC1/ȅ with n > 1 generate some picture. For instance, a successful generation is S H) .S ȅ a2 / a3ȅ H) .Œ.S ȅ a1 / a2ȅ  ȅ a2 / a3ȅ H) a3;3 :

This language is out of reach for MGs, because, similarly to regular expressions, the separation of horizontal and vertical rules prevents the control of picture width and height equality [21]; the formal proof is based on a double use of 1D context-free pumping lemma, first using horizontal rules, then using vertical ones for each individual column. L.KG/ is obviously closed under union, under the operations of horizontal and vertical concatenation and their closures, and under rotation, but it is not closed under complement, nor intersection. This immediately follows from the non-closure of the class of 1D context-free languages under the same operations, and the fact that the latter family coincides with the L.KG/ languages having exactly one row. For MG grammars the language emptiness problem and the language finiteness problems are decidable. The proof in [50] is based on the density properties of languages in L.MG/, but another proof of emptiness decidability based on the pumping lemma is in [43]. Concerning KG, in [43] the authors prove that the undecidable halting problem for a 2-counter Minsky machine reduces to the emptiness problem for KG. This negative result affects also the more powerful grammar models to be presented in what follows.

4.2. Grammars with grids as righthand sides. To motivate the introduction of more powerful grammar models here and in § 4.3, we discuss a limitation of KGs coming from lack of coordination between the horizontal and vertical compositions induced by concatenation operators. An example of language exceeding the generative capacity of KGs (the proof given in [31] for a slightly different language also applies here) is the following.

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

322

Example 4.3. Consider the set L(9) of grids with four homogeneous meshes. The KG having rules S ! X1 X2 ;

X1 ! Ya ȅ Yb ;

X2 ! Yc ȅ Yd ;

Yz ! z CC ;

(10)

with z 2 ¹a; b; c; d º, does not generate L(9) , since it defines the larger language which includes, say,

¹.aCC ȅ b CC / .c CC ȅ d CC /º; a a b : c d d

But, allowing intersection in 2D regular expressions, L(9) can be written as L(9) D ..aCC ȅ b CC / .c CC ȅ d CC // \ ..aCC c CC / ȅ .b CC d CC //;

which is the intersection of two languages in L.KG/, the former generated by the grammar with rules (10), the latter by the grammar with rules: S ! Z1 ȅ Z2 ;

Z1 ! Ya Yc ;

W2 ! Y b Y d ;

Yz ! z CC :

Incidentally, this illustrates the already mentioned property that 2D regular expressions with intersection are more powerful than without. To generate the preceding language, we can resort to a rather moderate extension of KG that has been proposed by Průša [39] and [40]. A rule X ! ˛ has as right hand side a picture ˛ 2 N m;n , m; n > 1, which we may call a m  n “grid.” Moreover there are terminal rules of the form X ! a 2 A. The greater generative power comes from the following constraint on derivations: when a nonterminal occurring in ˛ is in turn rewritten and replaced by another grid, the result must preserve and refine the picture partition determined by the original m  n grid. A precise definition can be given by a symbolic approach that extends Definition 4.1. Definition 4.3. For m; n > 1, the partial function grid constructor gm;n ./ has m:n pictures as arguments and returns a picture. More precisely, the picture returned by gm;n .p11 ; : : : ; p1n ; p21 ; : : : ; p2n ; : : : ; pm1 ; : : : ; pmn /

is if and only if and

..p11 ȅ    ȅ p1n /    .pm1 ȅ    ȅ pmn //

jpi1 jrow D jpi 2 jrow D    D jpi n jrow

for all 1 6 i 6 m

jp1j jcol D jp2j jcol D    D jpmj jcol

for all 1 6 j 6 n:

A (grid) sentential form over an alphabet V is an expression using grid constructors and symbols of V . A sentential form  defines either one picture over V , denoted by L M, or none. SF.V / denotes the set of all sentential forms over V .

9. Two-dimensional models

323

Notice that g1;2 .p11 ; p12 / D p11 ȅ p12 and g1;2 .p11 ; p21 / D p11 p21 ; this implies that KG rules can also be specified by means of grid constructors. For better readability, the arguments of gm;n will be sometimes placed in a m by n array. Example 4.4. The sentential form  D g2;2 .g2;2 .b; b; b; b/; g2;1 .a; b/; g1;2 .g2;1 .c; d /; g2;1 .d; c//; g2;1 .a; a//

defines the picture b b L M D c d

b b d c

a b : a a

Definition 4.4. A Průša grid grammar (PGG) 2 has rules of two forms: X ! a;

X ! gm;n .Y11 ; : : : ; Y1n ; Y21 ; : : : ; Y2n ; : : : ; Ym1 ; : : : ; Ymn /;

where a 2 A and Yi;j 2 N . Derivation steps using PGG rules can be formalised via the symbolic approach, analogously as for KG. The obvious definitions are omitted. Example 4.5. Language L(9) is generated by the PGG having rules S ! g2;2 .Ya ; Yb ; Yc ; Yd /;

Ya ! aCC ;

Yc ! c CC ;

Yb ! b CC ;

Yd ! d CC ;

where the first rule may also be written as S !

Ya Yb Yc Yd

and the other rules are shortcuts for standard KG rules. From this example and the previous remark on grid constructors for KGs, it follows that L.KG/  L.PGG/. A similar kind of grid grammar was studied by Drewes from the perspective of quadtrees, a picture decomposition into four quadrants pioneered by Rosenfeld [45] and often applied in image processing algorithms. Actually Drewes developed a range of models for picture generation and transformation [17] around the idea of tree languages and tree transducers; different interpretations give rise to line drawings, collages, developmental languages, etc. In [38] Drewes’s basic language family is proved to be included in L.KG/. 2 The name is proposed here instead of the original generic name “context-free picture grammar.”

324

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

4.3. Grammars with isometric rules based on tiles. As the name says, isometric rewriting rules have the form p ! q , for some pictures p and q over V with jpj D jqj; but such rules would qualify as context-sensitive rather than context-free, because several nonterminals occurring in the left hand side would be possibly rewritten at once, unless further constraints are enforced. In particular, the constraint introduced by tile grammars is that p is a X -homogeneous picture, X 2 N . Although the historical definitions of most picture grammar models are idiosyncratic, and several of them have been reinvented in the course of time, a common ground is here provided by the framework of the isometric rules of tile grammars by Crespi Reghizzi and Pradella [14], that allows a direct comparison of their generative power. The left hand side of such rules is a X -homogeneous picture for some nonterminal X and the right hand side is an isometric picture over terminals and nonterminals, defined by means of a tile set. A special role is played by regional tiling grammars, which extend the grammar models presented in the previous section and admit polynomial parsing [38]. We need some definitions. A homogeneous partition of a picture p is a partition P D ¹d1 ; d2 ; : : : ; dn º of dom.p/ into homogeneous subdomains d1 ; d2 ; : : : ; dn . The unit partition of p , written Up , is the homogeneous partition of dom.p/ defined by single pixels. An homogeneous partition is strong if adjacent subdomains have distinct labels. A tile such as a a b a with a ¤ b (or one of its rotations) is called concave. If a picture p admits a strong partition, then it cannot contain any concave tile, moreover the strong partition is unique and will be denoted by ….p/. A strong partition is regional 3 if all distinct (not necessarily adjacent) subdomains have distinct labels. A picture is regional if it admits a regional partition, and a language is regional if all its pictures are so. Example 4.6. Here there are various kinds of homogeneous partitions; the third one is strong but not regional, and the last one is regional (and hence strong): a a ; a b

a a ; a b

a b b ; c a a

a b b : c d d

Definition 4.5. A tile grammar (TG) has two kinds of rules: fixed size rule: X ! a, where a 2 A; variable size rule: X ! ‚, where ‚ is a tile set over N [ ¹#º such that no tile in ‚ is concave. If, for every rule X ! ‚, the local language L.‚/ is regional, then the grammar is a regional tile grammar (RTG). 3 The adjective “regional” is a metaphor of geographic political maps, where different regions are filled with distinct colors.

9. Two-dimensional models

325

Any variable size rule X ! ‚ stands for unboundedly many rules X h1 ;k1 ! ˛1 , X ! ˛2 ; : : : ; where each ˛i is in L.‚/ and j˛i j D .hi ; ki /. The meaning of a variable sized rule X ! ‚ is that the left hand side is a X -homogeneous picture automatically adjusted to the size of some right hand side picture ˛ 2 L.‚/. A derivation starts from an S -homogeneous picture. At each step, a rule X !    is applied. Informally, if the rule has variable size then an X -homogeneous subpicture is replaced by a picture of the same size belonging to the local language defined by the right hand side of the rule; if the rule has fixed size, then an X -pixel is replaced by the symbol in A defined as the right hand side of the rule. The process terminates when all nonterminals have been eliminated from the current picture. To define picture derivation, different but equivalent formalisations have been proposed in [14] and [12], and we present the one based on picture partitions [38]. Let p 2 V h;k be a picture and P be a partition of domain dom.p/. Then a picture p 0 2 V h;k with partition P0 derives in one step from .p; P/, written .p; P/ H) G .p 0 ; P0 /, if one of the following situations occurs. h2 ;k2

 There is a fixed size rule X ! a with a 2 A, P contains an X -pixel, p 0 is obtained from p by replacing that X -pixel with a, and P0 D P.

 There is a variable size rule X ! ‚, P contain an X -homogeneous subdomain d (called application area), p 0 is obtained from p by replacing pd with some picture s 2 L.‚/ of the same size, and P0 is obtained from P by subdividing the application area d into the partition ….s/ (of course after shifting ….s/ to the application area). Notice that ….s/ is well defined since ‚ does not contain concave tiles, whereas P0 in general is not a strong partition.

The picture language defined by a grammar G is the set L.G/ of pictures p over A such that 

.S jpj ; ¹dom.p/º/ H) G .p; Up /: 

For short we also write S H) G p . We emphasise that, to generate a picture, one must start from a S -homogeneous picture of the same dimension. Notice that the definition of TGs ensure that, for every picture obtained by a rewriting step, the strong partition uniquely determines the X -homogeneous subpictures, X 2 N , to be taken as left hand side for further rewriting steps. In the following examples we will use again the notation Jp K to denote the set of tiles of pO . Notice that if p is a regional picture, and ‚ D Jp K, then the local language L.‚/ is a regional language. Example 4.7. The language of pictures such that every column is an even-length palindrome (see Example 4.1) is generated by a RTG with the following rules: u }ˇ | R S S ˇˇ t w ˇ R S ! vR S S ~ˇ ; ˇ R ˇ R S S

Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati

326

u

A w R w R ! w vR A0

}ˇ ˇ ˇ ˇ ˇ ˇ ~ˇ ˇ ˇ

u

B w wR w vR B0

ˇ }ˇ ˇ ˇ ˇ t | |ˇ t ˇ ˇ ˇ B ˇ A ; ˇ ˇ ~ ˇ A0 ˇ B 0 ˇ ˇ ˇ ˇ

A A0 B B0

! a; !a ! b; ! b:

An example of derivation is the following, where vertical and horizontal divisors are used to represent, at each step, the resulting partition of the application area: S S S S

S S S S

S S S S

R R H) R R

S S S S

S S S S

A R H) R A0

H)

a b b a

S S S S

S S H) S S

H)

a b b a

b a a b

S S H) S S

S S S S

S S S S

A B H) B0 A0

a b b a

R R R R

S S H) S S

a b b a

b a a b

a a : a a

a b b a

S S S S

S S S S

B R R B0

S S S S

a b H) b a

B A A0 B0

S S S S

It may help to imagine what a string syntax tree, obtained by a context-free derivation, becomes for a tile or regional grammar. As a syntax tree is a well-nested structure and requires a 2-dimension representation, a picture derivation requires a .2C1/-dimension representation, and consists of well-nested parallelepipeds [38]. The restriction of regional partitions has the effect to impose a bound on the number of parallelepipeds that can be immediately nested inside another one. In [14] it is proved that L.TG/ is closed with respect to union, horizontal and vertical concatenations and their closure, rotation, and projection. We also observe that all the grammar families presented so far are closed under union but are not closed under intersection (excluding REC) and complement. This is proved as for contextfree string languages, since all such 2D grammar models define exactly the context-free string languages if restricted to 1D, and it is well known that language ¹an b n c n j n > 0º is not context-free even though it is the intersection of two context-free string languages, ¹an b m c n j m; n > 0º and ¹an b n c m j m; n > 0º. Finally it is worth mentioning that all the grammar types considered here, except the most general tile grammar model, have polynomial-time picture recognition algorithms, which can be viewed as natural extensions of classical string parsing algorithms [15] and [38].


5. Comparison of language families

Here we present the relations between the language families defined by grammars in the previous section, show that they form a proper hierarchy, and discuss their relation with tiling systems. More details can be found in [14] and [38].

Proposition 5.1. The following inclusion relations are proper:

L(MG) ⊆ L(KG) ⊆ L(PGG) ⊆ L(RTG) ⊆ L(TG).

Proof. The inclusion L(MG) ⊆ L(KG) has already been argued for in Example 4.2. L(KG) ⊆ L(PGG) since a KG in binary nonterminal normal form is exactly the same as a PGG whose right-hand sides use only the grid constructors g_{1,2} and g_{2,1}. Strict inclusion is proved by language L(9) (Example 4.3). To prove that L(PGG) ⊆ L(RTG), a construction is exhibited in [38] that converts a PGG into an equivalent TG that is actually an RTG. We sketch the similar, less general but simpler, construction proposed in [14] that converts a KG into an equivalent TG that is actually an RTG: terminal rules X → a, with a ∈ A, are formally identical in the two models and generate single pixels; KG rules concatenating two nonterminals W and Z (vertically, respectively horizontally) are equivalent to RTG rules whose right-hand sides are the tile sets of the corresponding two-block pictures, a W-homogeneous block stacked above, respectively placed beside, a Z-homogeneous block (the tile-set diagrams are displayed in the original).

To prove that the inclusion is strict, consider the language L ⊆ {a, b, c}^{2,+} of "misaligned palindromes" (Example 4 of [38]), i.e., pictures that are a ribbon of two rows, divided into four fields: at the top-left and at the bottom-right are palindromes over the alphabet {a, b}; the other two fields are filled with c's and must not overlap. An example of picture in L is

a a b b a a c c c c
c c b a b a a b a b

In [38] it is proved that L cannot be generated by any PGG, by a reasoning similar to the pumping lemma for string languages. On the other hand, L is generated by an RTG whose rules are displayed in the original (the straightforward RTG rules generating palindromes are omitted): for 1 ≤ i ≤ 2, the axiomatic rule splits the two-row picture into the four fields X₁, Y₁ (top row) and Y₂, X₂ (bottom row), the rules for Yᵢ and Y generate the c-fields (Y → c, Yᵢ → c), and Xᵢ → H, where H derives the palindrome rows.

Finally, L(RTG) ⊆ L(TG) since RTG rules are obtained from TG rules by adding the regional constraint. A witness language separating these two classes is proposed in § 5 of [38]; its definition is quite long and is omitted here for reasons of space.


Proposition 5.2. REC ⊊ L(TG).

Proof. Given a tiling system T = (A, Γ, Θ, π), we intuitively explain and exemplify how to obtain an equivalent tile grammar G. The idea is to take the tile set Θ and add two markers, e.g., {b, w}, in a "chessboard-like" fashion to build up a tile set suitable for the right-hand side of the variable size axiomatic rule of G; other straightforward fixed size rules are used to encode the projection π. The detailed construction is in [14] and is illustrated by Example 5.1. The "chessboard-like" construction ensures that the strong partition of a right-hand side, obtained in applying a rule, is the unitary partition. In this way, the same effect as projection π is achieved by applying derivation steps based on terminal rules. Moreover, the language inclusion is proper: L(8) ∉ REC, as argued in the proof of Theorem 3.5, but it is generated by the KG defined in Example 4.1, hence L(8) ∈ L(TG) by Proposition 5.1.

Example 5.1. Consider the TS described in Example 3.1, which recognises the language L(1) of a-homogeneous squares. The equivalent tile grammar has the terminal rules 1_w → a, 1_b → a, 0_w → a, 0_b → a, and an axiomatic rule for S whose right-hand side is the union of two tile sets (displayed as diagrams in the original). Observe that this last rule matches the pictures, in the local language, that have 1 on the main diagonal and 0 elsewhere. The grammar has to consider two sets of tiles as right-hand side: the tiles arising from chessboard structures with a "black" in top-left position, and the others with a "white." Indeed, to fill with 0's the areas over and under the diagonal we need both the tile with rows 0_b 0_w / 0_w 0_b and the tile with rows 0_w 0_b / 0_b 0_w.

A question, prompted by the loose analogy between regular string languages and REC on one side, and between context-free string languages and TG on the other side, is the following: under what conditions does a TG generate a language that is in REC? Some sufficient conditions are presented in [12], which are reminiscent of the classical result saying that a (string) context-free grammar in which all recursive derivations are left (or right) recursive generates a regular language.

Proposition 5.3. LOC and REC are incomparable with L(RTG), L(PGG), L(KG), and L(MG).

Proof. The witness language that separates the classes L(TG) and L(RTG), presented in § 5 of [38] (and already mentioned in the proof of Proposition 5.1), belongs also to LOC and hence to REC. Hence we have that LOC and REC are neither contained in L(RTG) nor, a fortiori, in L(PGG), L(KG), and L(MG), by Proposition 5.2. The opposite relations descend from the fact that language L(8) is generated by the MG


defined in Example 4.1, but is not in REC as argued in the proof of Theorem 3.5, and hence is not in LOC either. As a consequence, L(MG) ⊈ REC, L(MG) ⊈ LOC, and the same property holds for families larger than L(MG). The inclusion relations proved in Propositions 5.1, 5.2, and 5.3 are summarised by the following hierarchy (a diagram in the original), where each family properly contains those listed below it, and the two branches below L(TG) are mutually incomparable:

tile grammars, L(TG)
  tiling systems, REC
    LOC
  regional tile grammars, L(RTG)
    Průša's grid grammars, L(PGG)
      context-free Kolam grammars, L(KG)
        context-free Matrix grammars, L(MG)

6. Conclusion This chapter is intended as a contribution to the presentation of the theory of 2D languages as a coherent theory. In the last two decades a considerable flurry of research activity has taken place on tiling systems and picture grammars, and we have attempted, as best as we could, to unify the basic definitions in order to highlight the relations, often neglected in the past, between several models originating from different authors. For that, two notions have played an essential unifying role: the use of tiles on an auxiliary alphabet, corresponding to the by now well established Tiling Systems; and the use of isometric rewriting rules, specified by tiles, and especially the simpler, polynomially parsable, Regional Tile grammars. The mathematical properties that we have considered are the classical ones for string languages: closure properties, ambiguity and determinism, and parsing complexity. Inclusion relations between language families have been illustrated by typical examples, and we have described, or referred to, reduction procedures for transforming one model into another. Whereas for string languages formal language theory and automata theory are coextensive terms, this is not the case for picture languages, where automata models are much less consolidated: our survey could just mention early attempts at formalising automata, and some recent, on-going research on deterministic picture scanning. Much remains to be done!

References [1] M. Anselmo, D. Giammarresi, and M. Madonia, A computational model for recognisable two-dimensional languages. Theoret. Comput. Sci. 410 (2009), no. 37, 3520–3529. MR 2553028 Zbl 1191.68371 q.v. 317, 318


[2] M. Anselmo, D. Giammarresi, and M. Madonia, Deterministic and unambiguous families within recognizable two-dimensional languages. Fund. Inform. 98 (2010), no. 2–3, 143–166. MR 2650720 Zbl 1196.68117 q.v. 316, 317 [3] M. Anselmo, D. Giammarresi, M. Madonia, and A. Restivo, Unambiguous recognizable two-dimensional languages. Theor. Inform. Appl. 40 (2006), no. 2, 277–293. MR 2252639 Zbl 1112.68085 q.v. 309, 314, 315 [4] M. Anselmo and M. Madonia, Deterministic and unambiguous two-dimensional languages over one-letter alphabet. Theoret. Comput. Sci. 410 (2009), no. 16, 1477–1485. MR 2502122 Zbl 1162.68020 q.v. 315 [5] M. Anselmo and M. Madonia, Classes of two-dimensional languages and recognizability conditions. RAIRO Theor. Inform. Appl. 44 (2010), no. 4, 471–488. MR 2775407 Zbl 1211.68230 q.v. 315 [6] M. Anselmo and M. Madonia, A stronger recognizability condition for two-dimensional languages. Discrete Math. Theor. Comput. Sci. 15 (2013), no. 2, 139–155. MR 3084555 Zbl 1281.68141 q.v. 315 [7] A. Bertoni, M. Goldwurm, and V. Lonati, On the complexity of unary tiling-recognizable picture languages. Fund. Inform. 91 (2009), no. 2, 231–249. q.v. 311 [8] M. Blum and C. Hewitt, Automata on a 2-dimensional tape. In Proceedings of the 8 th Annual Symposium on Switching and Automata Theory. SWAT 1967. Held in Austin, TX, October 8-20, 1967. IEEE Computer Society Press, Los Alamitos, CA, 1967, 155–160. IEEEXplore 5397208 q.v. 317 [9] B. Borchert and K. Reinhardt, Deterministically and sudoku-deterministically recognizable picture languages. In Proceedings of the 1st International Conference on Language and Automata Theory and Applications (R. Loos, S. Z. Fazekas, and C. Martín-Vide, eds.). Universitat Rovira i Virgili, Tarragona, 2007, 175–186. q.v. 318 [10] S. Bozapalidis and A. Grammatikopoulou, Recognizable picture series. J. Autom. Lang. Comb. 10 (2005), no. 2–3, 159–183. MR 2285327 Zbl 1161.68514 q.v. 311 [11] R. Brijder and H. J. Hoogeboom, Perfectly quilted rectangular snake tilings. Theoret. Comput. Sci. 410 (2009), no. 16, 1486–1494. MR 2502123 Zbl 1162.68021 q.v. 317 [12] A. Cherubini, S. Crespi Reghizzi, M. Pradella, and P. San Pietro, Picture languages: Tiling systems versus tile rewriting grammars. Theoret. Comput. Sci. 356 (2006), no. 1–2, 90–103. MR 2217829 Zbl 1160.68392 q.v. 325, 328 [13] A. Cherubini and M. Pradella, Picture languages: From Wang tiles to 2D grammars. In Prooceedings of the 3rd Algebraic Informatics (CAI). (S. Bozapalidis and G. Rahonis, eds.). Lecture Notes in Computer Science, 5725. Springer, Berlin, 2009, 13–46. MR 2683253 Zbl 1256.68098 q.v. 303 [14] S. Crespi Reghizzi and M. Pradella, Tile rewriting grammars and picture languages. Theoret. Comput. Sci. 340 (2005), no. 2, 257–272. MR 2150754 Zbl 1079.68047 q.v. 324, 325, 326, 327, 328 [15] S. Crespi Reghizzi and M. Pradella, A CKY parser for picture grammars. Inform. Process. Lett. 105 (2008), no. 6, 213–217. MR 2387849 Zbl 1184.68301 q.v. 326 [16] L. de Prophetis and S. Varricchio, Recognizability of rectangular pictures by Wang systems. J. Autom. Lang. Comb. 2 (1997), no. 4, 269–288. MR 1646448 Zbl 0908.68109 q.v. 310 [17] F. Drewes, Grammatical picture generation. A tree-based approach. Texts in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 2006. MR 2206187 Zbl 1085.68177 q.v. 323


[18] S. Eilenberg, Automata, languages, and machines. Vol. A. Pure and Applied Mathematics, 58. Academic Press, New York, 1974. MR 0530382 Zbl 0317.94045 q.v. 307, 308 [19] D. Giammarresi and A. Restivo, Recognizable picture languages. Int. J. Pattern Recognit. Artif. Intell. 6 (1992), no. 2–3, 241–256. q.v. 308, 309, 313 [20] D. Giammarresi and A. Restivo, Two-dimensional languages. In Handbook of formal languages (G. Rozenberg and A. Salomaa, eds.). Vol. 3. Beyond words. Springer, Berlin, 1997, 215–267. MR 1470021 q.v. 303, 304, 311, 312, 313 [21] D. Giammarresi and A. Restivo, Matrix based complexity functions and recognizable picture languages. In Logic and automata (E. Grader, J.Flum, and T. Wilke, eds.). History and perspectives. Texts in Logic and Games, 2. Amsterdam University Press, Amsterdam, 2008, 315–337. MR 2508747 Zbl 1217.68129 q.v. 314, 315, 321 [22] D. Giammarresi, A. Restivo, S. Seibert, and W. Thomas, Monadic second-order logic over rectangular pictures and recognizability by tiling systems. Inform. and Comput. 125 (1996), no. 1, 32–45. MR 1385806 Zbl 0853.68131 q.v. 311 [23] J. Hromkovic, S. Seibert, J. Karhumäki, H. Klauck, and G. Schnitger, Communication complexity method for measuring nondeterminism in finite automata. Inform. and Comput. 172 (2002), no. 2, 202–217. MR 1881189 Zbl 1009.68067 q.v. 315 [24] K. Inoue and A. Nakamura, Some properties of two-dimensional on-line tessellation acceptors. Inform. Sci. 13 (1977), no. 2, 95–121. MR 0537582 Zbl 0371.94067 q.v. 312, 313, 316 [25] K. Inoue and I. Takanami, A characterization of recognizable picture languages. Int. J. Pattern Recognit. Artif. Intell. 8 (1994), no. 2, 501–508. q.v. 312, 313 [26] M. Latteux and D. Simplot, Recognizable picture languages and domino tiling. Theoret. Comput. Sci. 178 (1997), no. 1–2, 275–283. MR 1453855 Zbl 0912.68106 q.v. 311 [27] K. Lindgren, C. Moore, and M. Nordahl, Complexity of two-dimensional patterns. J. Statist. Phys. 91 (1998), no. 5–6, 909–951. MR 1637266 Zbl 0917.68156 q.v. 307, 316 [28] V. Lonati and M. Pradella, Snake-deterministic tiling systems. In Mathematical foundations of computer science 2009 (R. Královič and D. Niwiński, eds.). Proceedings of the 34th International Symposium (MFCS 2009) held in Novy Smokovec, August 24–28, 2009. Lecture Notes in Computer Science, 5734. Springer, Berlin, 2009, 549–560. MR 2539521 Zbl 1250.68168 q.v. 317 [29] V. Lonati and M. Pradella, Deterministic recognizability of picture languages with Wang automata. Discrete Math. Theor. Comput. Sci. 12 (2010), no. 4, 73–94. MR 2760336 Zbl 1286.68291 q.v. 317, 318 [30] V. Lonati and M. Pradella, Strategies to scan pictures with automata based on Wang tiles. RAIRO Theor. Inform. Appl. 45 (2011), no. 1, 163–180. MR 2776859 Zbl 1219.68100 q.v. 317 [31] O. Matz, Regular expressions and context-free grammars for picture languages. In STACS ’97 (R. Reischuk and M. Morvan, eds.). Proceedings of the 14th Annual Symposium on Theoretical Aspects of Computer Science held in Lübeck, February 27–March 1, 1997. Lecture Notes in Computer Science, 1200. Springer, Berlin, 1997, 283–294. MR 1473781 q.v. 306, 319, 320, 321 [32] O. Matz, On piecewise testable, starfree, and recognizable picture languages. In Foundations of software science and computation structures (M. Nivat, ed.). Proceedings of the 1st International Conference (FoSSaCS ’98) held as part of the Joint European Conferences on Theory and Practice of Software (ETAPS ’98) in Lisbon, March 28–April 4, 1998. Lecture

Notes in Computer Science, 1378. Springer, Berlin, 1998, 203–210. MR 1641340 q.v. 314, 315

[33] I. Mäurer, Weighted picture automata and weighted logics. In STACS 2006 (B. Durand and W. Thomas, eds.). Proceedings of the 23rd Annual Symposium on Theoretical Aspects of Computer Science held in Marseille, February 23–25, 2006. Lecture Notes in Computer Science, 3884. Springer, Berlin, 2006, 313–324. MR 2249378 Zbl 1136.68421 q.v. 314
[34] M. Nivat, A. Saoudi, K. G. Subramanian, R. Siromoney, and V. R. Dare, Puzzle grammars and context-free array grammars. Int. J. Pattern Recognit. Artif. Intell. 5 (1991), 663–676. q.v. 319
[35] F. Otto, Restarting automata for picture languages: A survey on recent developments. In Implementation and application of automata (M. Holzer and M. Kutrib, eds.). Proceedings of the 19th International Conference (CIAA 2014) held at Universität Giessen, Giessen, July 30–August 2, 2014. Lecture Notes in Computer Science, 8587. Springer, Cham, 2014, 16–41. MR 3247080 Zbl 1302.68174 q.v. 303, 318
[36] F. Otto and F. Mráz, Deterministic ordered restarting automata for picture languages. Acta Inform. 52 (2015), no. 7–8, 593–623. MR 3404702 Zbl 1330.68174 q.v. 318
[37] A. Potthoff, S. Seibert, and W. Thomas, Nondeterminism versus determinism of finite automata over directed acyclic graphs. Bull. Belg. Math. Soc. Simon Stevin 1 (1994), no. 2, 285–298. (Journées Montoises, Mons, 1992.) MR 1318971 Zbl 0803.68032 q.v. 314, 316
[38] M. Pradella, A. Cherubini, and S. Crespi Reghizzi, A unifying approach to picture grammars. Inform. and Comput. 209 (2011), no. 9, 1246–1267. MR 2849281 Zbl 1235.68096 q.v. 323, 324, 325, 326, 327, 328
[39] D. Průša, Two-dimensional context-free grammars. In Proceedings of the Conference ITAT 2001: Information Technologies – Applications and Theory (G. Andrejková and S. Krajči, eds.). Zuberec, Slovakia, September 2001, 27–40. q.v. 322
[40] D. Průša, Two-dimensional languages. PhD thesis. Charles University, Faculty of Mathematics and Physics, Czech Republic, 2004. q.v. 322
[41] D. Průša and F. Mráz, Restarting tiling automata. Internat. J. Found. Comput. Sci. 24 (2013), no. 6, 863–878. MR 3158973 Zbl 1286.68294 q.v. 318
[42] D. Průša, F. Mráz, and F. Otto, Two-dimensional Sgraffito automata. RAIRO Theor. Inform. Appl. 48 (2014), no. 5, 505–539. MR 3343516 Zbl 1328.68117 q.v. 318
[43] D. Průša and K. Reinhardt, Undecidability of the emptiness problem for context-free picture languages. Theoret. Comput. Sci. 679 (2017), 118–125. MR 3653964 Zbl 1371.68164 q.v. 321
[44] A. Rosenfeld, Picture languages. Formal models for picture recognition. Computer Science and Applied Mathematics. Academic Press, New York and London, 1979. MR 0528637 Zbl 0471.68074 q.v. 303
[45] A. Rosenfeld, Quadtree grammars for picture languages. IEEE Transactions on Systems, Man, and Cybernetics 12 (1982), 401–405. IEEEXplore 4308831 q.v. 323
[46] G. Rozenberg (ed.), Handbook of graph grammars and computing by graph transformations. Vol. 1. Foundations. World Scientific Publishing Co., River Edge, N.J., 1997. MR 1480952 Zbl 0908.68095 q.v. 318
[47] G. Siromoney, R. Siromoney, and K. Krithivasan, Abstract families of matrices and picture languages. Comput. Graphics and Image Processing 1 (1972), no. 3, 284–307. MR 0381404 q.v. 320
[48] G. Siromoney, R. Siromoney, and K. Krithivasan, Picture languages with array rewriting rules. Information and Control 22 (1973), 447–470. MR 0339569 Zbl 0266.68037 q.v. 319


[49] R. Siromoney, Advances in array languages. Graph-grammars and their application to computer science (H. Ehrig, M. Nagl, G. Rozenberg, and A. Rosenfeld eds.). Papers from the 3rd International Workshop held in Warrenton, Virginia, December 2–6, 1986. Lecture Notes in Computer Science, 291. Springer, Berlin, 1987, 549–563. MR 0943181 Zbl 0643.68117 q.v. 303 [50] R. Stiebe, Slender Siromoney matrix languages. Inform. and Comput. 206 (2008), no. 9–10, 1248–1258. MR 2440666 Zbl 1328.68107 q.v. 321 [51] K. Subramanian, M. Geethalakshmi, A. Nagar, and S. Lee, Two-dimensional picture grammar models. In 2008 Second UKSIM European Symposium on Computer Modeling and Simulation (D. Al-Dabass, A. Nagar, H. Tawfik, A. Abraham, and R. N. Zobel, eds.). Held in Liverpool, September 8-10, 2008. IEEE Computer Society Press, Los Alamitos, CA, 2008, 263–267. IEEEXplore 4625283 q.v. 321 [52] H. Wang, Proving theorems by pattern recognition I. Comm. ACM 3 (1960), no. 4, 220–234. Zbl 0101.10504 q.v. 310

Part II

Complexity issues

Chapter 10

Minimisation of automata

Jean Berstel, Luc Boasson, Olivier Carton, and Isabelle Fagnot

Contents

1. Introduction ................................. 337
2. Definitions and notation ..................... 339
3. Brzozowski's algorithm ....................... 341
4. Moore's algorithm ............................ 343
5. Hopcroft's algorithm ......................... 345
6. Slow automata ................................ 351
7. Minimisation by fusion ....................... 354
8. Dynamic minimisation ......................... 365
9. Extensions and special cases ................. 367
References ...................................... 370


1. Introduction This chapter is concerned with the design and analysis of algorithms for minimising finite automata. Getting a minimal automaton is a fundamental issue in the use and implementation of finite automata tools in frameworks such as text processing, image analysis, linguistic computer science, and many other applications. There are two main families of minimisation algorithms. The first uses a sequence of refinements of a partition of the set of states, while the second uses a sequence of fusions or merges of states. Among the algorithms of the first family, we mention a simple algorithm described in the book [38]. It operates by a traversal of the product of the automaton with itself, and therefore is in time and space complexity O.n2 /. Other algorithms are Hopcroft’s and Moore’s algorithms, which will be considered in depth later. The linear-time minimisation of acyclic automata of Revuz belongs to the second family. Brzozowski’s algorithm stands quite isolated and fits in neither of these two classes. The algorithm for the minimisation of complete deterministic finite state automata given by Hopcroft [37] runs in worst-case time O.n log n/. It is, up to now, the most efficient algorithm known in the general case. It has recently been extended to incomplete deterministic finite automata [55] and [9].


Hopcroft’s algorithm is related to Moore’s partition refinement Algorithm [48], although it is different. One of the purposes of this chapter is the comparison of the nature of Moore’s and Hopcroft’s algorithms. This gives some new insight into both algorithms. As we shall see, these algorithms are quite different both in behaviour and in complexity. In particular, we show that it is not possible to simulate the computations of one algorithm by the other. Moore’s partition refinement algorithm is much simpler than Hopcroft’s algorithm. It has been shown [6] that, although its worst-case behaviour is quadratic, its average running time is O.n log n/. No evaluation of the average behaviour is known for Hopcroft’s algorithm. The family of algorithms based on fusion of states is important in practice for the construction of minimal automata representing finite sets, such as dictionaries in natural language processing. A linear-time implementation of such an algorithm for cycle-free automata was given by Revuz [51]. This algorithm has been extended to a more general class of automata by Almeida and Zeitoun [3], namely to automata where all strongly connected components are simple cycles. It has been demonstrated in [8] that minimisation by state fusion, which is not always possible, works well for local automata. There is another efficient incremental algorithm for finite sets, by Daciuk et al. [27]. The advantage of this algorithm is that it does not build the intermediate trie which is rather space consuming. We also consider updating a minimal automaton when a word is added or removed from the set it recognises. Finally, we discuss briefly the case of nondeterministic automata. It is well known that minimal nondeterministic automata are not unique. However, there are several subclasses where the minimal automaton is unique. We do not consider here the problem of constructing a minimal automaton starting from another description of the regular language, such as the synthesis of an automaton from a regular expression. We also do not consider devices that may be more space efficient, such as alternating automata or two-way automata. Other cases not considered here concern sets of infinite words and the minimisation of their accepting devices. The chapter is organised as follows. The first section just fixes notation and the next one describes briefly Brzozowski’s algorithm. In § 4, we give basic facts on Moore’s minimisation algorithm. § 5 is a detailed description of Hopcroft’s algorithm, with the proof of correctness and running time. It also contains the comparison of Moore’s and Hopcroft’s algorithms. The next section is devoted to so-called slow automata. Some material in these two sections is new. § 7 and § 8 are devoted to the family of algorithms working by fusion. We describe in particular Revuz’s algorithm and its generalisation by Almeida and Zeitoun, the incremental algorithm of Daciuk et al., and dynamic minimisation. The last section contains miscellaneous results on special cases and a short discussion of nondeterministic minimal automata.


2. Definitions and notation It appears to be useful, for a synthetic presentation of the minimisation algorithms of Moore and Hopcroft, to introduce some notation for partitions of the set of states. Partitions and equivalence relations. A partition of S a set E is a family P of nonempty, pairwise disjoint subsets of E such that E D P 2P P . The index of the partition is the number of its elements. A partition defines an equivalence relation P on E . Conversely, the set of all equivalence classes Œx, for x 2 E , of an equivalence relation on E defines a partition of E . This is the reason why all terms defined for partitions have the same meaning for equivalence relations and vice versa. A subset F of E is saturated by P if it is the union of classes of P. Let Q be another partition of E . Then Q is a refinement of P, or P is coarser than Q, if each class of Q is contained in some class of P. If this holds, we write Q 6 P. The index of Q is then larger than the index of P. Given two partitions P and Q of a set E , we let U D P ^ Q denote the coarsest partition which refines P and Q. The classes of U are the nonempty sets P \ Q, for P 2 P and Q 2 Q. The notation is extended to a set of partitions in the usual way: we write P D P1 ^    ^ Pn for the common refinement of P1 ; : : : ; Pn . If n D 0, then P is the universal partition of E composed of the single class E . This partition is the identity element for the ^-operation. Let F be a subset of E . A partition P of E induces a partition P0 of F by intersection: P0 is composed of the nonempty sets P \ F , for P 2 P. If P and Q are partitions of E and Q 6 P, then the restrictions P0 and Q0 to F still satisfy Q0 6 P0 . If P and P0 are partitions of disjoint sets E and E 0 , we denote by P_P0 the partition of E [ E 0 whose restriction to E and E 0 are P and P0 respectively. So, one may write _ ¹P º: PD P 2P

Minimal automaton. We consider a deterministic automaton A = (Q, i, F) over the alphabet A with set of states Q, initial state i, and set of final states F. To each state q corresponds a subautomaton of A obtained when q is chosen as the initial state. We call it the subautomaton rooted at q or simply the automaton at q. Usually, we consider only the trim part of this automaton, that is, the part which is accessible from q. To each state q corresponds a language L_q(A) which is the set of words recognised by the subautomaton rooted at q, that is

L_q(A) = {w ∈ A* | q·w ∈ F}.

This language is called the future of the state q, or also the right language of this state. Similarly one defines the past of q, also called the left language, as the set {w ∈ A* | i·w = q}. The automaton A is minimal if L_p(A) ≠ L_q(A) for each pair of distinct states p, q. The equivalence relation ∼ defined by

p ∼ q ⟺ L_p(A) = L_q(A)


is a congruence, that is, p ∼ q implies p·a ∼ q·a for all letters a. It is called the Nerode congruence. Note that the Nerode congruence saturates the set of final states. Thus an automaton is minimal if and only if its Nerode equivalence is the identity. Minimising an automaton is the problem of computing the Nerode equivalence. Indeed, the quotient automaton A/∼ is obtained by taking as set of states the set of equivalence classes of the Nerode equivalence, as initial state the class of the initial state i, as set of final states the set of equivalence classes of states in F, and by defining the transition function by [p]·a = [p·a]. It accepts the same language, and its Nerode equivalence is the identity. The minimal automaton recognising a given language is unique.

Partitions and automata. Again, we fix a deterministic automaton A = (Q, i, F) over the alphabet A. It is convenient to use the shorthand P^c for Q \ P when P is a subset of the set Q. Given a set P ⊆ Q of states and a letter a, we let a⁻¹P denote the set of states q such that q·a ∈ P. Given sets P, R ⊆ Q and a ∈ A, we let

(P, a)|R

denote the partition of R composed of the nonempty sets among the two sets

R ∩ a⁻¹P = {q ∈ R | q·a ∈ P}   and   R \ a⁻¹P = {q ∈ R | q·a ∉ P}.

Note that R \ a⁻¹P = R ∩ (a⁻¹P)^c = R ∩ a⁻¹(P^c), so the definition is symmetric in P and P^c. In particular

(P, a)|R = (P^c, a)|R.    (1)

The pair (P, a) is called a splitter. Observe that (P, a)|R = {R} if either R·a ⊆ P or R·a ∩ P = ∅, and (P, a)|R is composed of two classes if both R·a ∩ P ≠ ∅ and R·a ∩ P^c ≠ ∅, or equivalently if R·a ⊈ P^c and R·a ⊈ P. If (P, a)|R contains two classes, then we say that (P, a) splits R. Note that the pair S = (P, a) is called a splitter even if it does not split. It is useful to extend the notation above to words. Given a word w and sets P, R ⊆ Q of states, we let w⁻¹P denote the set of states q such that q·w ∈ P, and (P, w)|R denote the partition of R composed of the nonempty sets among

R ∩ w⁻¹P = {q ∈ R | q·w ∈ P}   and   R \ w⁻¹P = {q ∈ R | q·w ∉ P}.

As an example, the partition (F, w)|Q is the partition of Q into the set of those states from which w is accepted, and the set of the other states. A state q in one of the sets and a state q′ in the other are sometimes said to be separated by w. The Nerode equivalence is the coarsest equivalence relation on the set of states that is a (right) congruence saturating F. With the notation of splitters, this can be rephrased as follows.

Proposition 2.1. The partition corresponding to the Nerode equivalence is the coarsest partition P such that no splitter (P, a), with P ∈ P and a ∈ A, splits a class in P, that is, such that (P, a)|R = {R} for all P, R ∈ P and a ∈ A.

Later we will use the following lemma, which was already given in Hopcroft's paper [37]. It is the basic observation that ensures that Hopcroft's algorithm works correctly.

Lemma 2.2. Let P be a set of states, and let P = {P₁, P₂} be a partition of P. For any letter a and for any set of states R, one has

(P, a)|R ∧ (P₁, a)|R = (P, a)|R ∧ (P₂, a)|R = (P₁, a)|R ∧ (P₂, a)|R,

and consequently

(P, a)|R ≥ (P₁, a)|R ∧ (P₂, a)|R,    (2)
(P₁, a)|R ≥ (P, a)|R ∧ (P₂, a)|R.    (3)
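To make the splitter notation concrete, here is a minimal Python sketch (our own, not part of the chapter; the automaton and all identifiers are illustrative) that computes a⁻¹P and the partition (P, a)|R for a small complete deterministic automaton.

    def preimage(delta, a, P):
        """a^{-1}P: the states q with q.a in P."""
        return {q for (q, b), r in delta.items() if b == a and r in P}

    def split(P, a, R, delta):
        """(P, a)|R: the nonempty blocks R & a^{-1}P and R - a^{-1}P."""
        inv = preimage(delta, a, P)
        blocks = [R & inv, R - inv]
        return [b for b in blocks if b]

    # Example: states 0..3 over {a, b}, final states F = {3}.
    delta = {(0, 'a'): 1, (0, 'b'): 0,
             (1, 'a'): 2, (1, 'b'): 0,
             (2, 'a'): 3, (2, 'b'): 0,
             (3, 'a'): 3, (3, 'b'): 3}
    F, Q = {3}, {0, 1, 2, 3}

    print(split(F, 'a', Q - F, delta))   # (F, a) splits F^c into {2} and {0, 1}

Here the splitter (F, a) splits the class F^c because some of its states reach F by a and some do not, exactly as in the definition above.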

3. Brzozowski's algorithm

The minimisation algorithm given by Brzozowski [18] is quite different from the two families of iterative algorithms (by refinement and by fusion) that we consider in this chapter. Although its worst-case behaviour is exponential, it is conceptually simple, easy to implement, and it is quite efficient in many cases. Moreover, it does not require the automaton to be deterministic, contrary to the algorithms described later.

Given an automaton A = (Q, I, F, E) over an alphabet A, its reversal is the automaton denoted A^R obtained by exchanging the initial and the final states, and by inverting the orientation of the edges. Formally, A^R = (Q, F, I, E^R), where E^R = {(p, a, q) | (q, a, p) ∈ E}. The basis for Brzozowski's algorithm is the following result.

Proposition 3.1. Let A be a finite deterministic automaton, and let A° be the deterministic trim automaton obtained by determinising and trimming (that is, removing the non-accessible part) the reversal A^R. Then A° is minimal.

For a proof of this proposition, see, for instance, Sakarovitch's book [52]. The minimisation algorithm now just consists in two applications of this operation. Observe that the automaton one starts with need not be deterministic.

Corollary 3.2. Let A be a finite automaton. Then (A°)° is the minimal automaton of A.

Example 3.1. We consider the automata over the alphabet A = {a, b} given in Figure 1. Each automaton is the reversal of the other. However, determinisation of the automaton on the top requires exponential time and space.

Despite its worst-case exponential time complexity, Brzozowski's minimisation algorithm has recently been reconsidered. De Felice and Nicaud [32] study the number of states of the minimal automaton of the reversal of a rational language recognised by a random deterministic automaton with n states. They prove that, for any d > 0, the probability that this number of states is greater than n^d tends to 1 as n tends to infinity.


Figure 1. The automaton on the top recognises the language A*aAⁿ. It has n+1 states and the minimal deterministic automaton for this language has 2ⁿ states. The automaton on the bottom is its reversal. It is minimal and recognises AⁿaA*. (The two state diagrams are omitted here.)
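As an illustration of Proposition 3.1 and Corollary 3.2, the following Python sketch (our own, with made-up identifiers) minimises an automaton by determinising and trimming the reversal twice. It is deliberately naive and only mirrors the statements above; the input is a transition relation mapping (state, letter) to a set of successors.

    def reverse(trans, initial, final):
        rev = {}
        for (p, a), targets in trans.items():
            for q in targets:
                rev.setdefault((q, a), set()).add(p)
        return rev, set(final), set(initial)      # initial and final are exchanged

    def determinise_trim(trans, initial, final, alphabet):
        """Accessible (trim) subset automaton of the given automaton."""
        start = frozenset(initial)
        states, todo, dtrans = {start}, [start], {}
        while todo:
            S = todo.pop()
            for a in alphabet:
                T = frozenset(q for p in S for q in trans.get((p, a), ()))
                dtrans[(S, a)] = T
                if T not in states:
                    states.add(T)
                    todo.append(T)
        dfinal = {S for S in states if S & set(final)}
        return dtrans, start, dfinal

    def brzozowski(trans, initial, final, alphabet):
        for _ in range(2):                        # two reversal/determinisation passes
            rtrans, rinit, rfinal = reverse(trans, initial, final)
            dtrans, start, dfinal = determinise_trim(rtrans, rinit, rfinal, alphabet)
            trans = {(p, a): {q} for (p, a), q in dtrans.items()}
            initial, final = {start}, dfinal
        return trans, initial, final

    # Example: NFA for the words over {a, b} ending in a
    mtrans, minit, mfinal = brzozowski({(0, 'a'): {0, 1}, (0, 'b'): {0}}, {0}, {1}, 'ab')
    print(len({p for (p, _a) in mtrans}))         # 2 states: the minimal DFA

The second pass determinises the reversal of an already deterministic and trim automaton, so by Proposition 3.1 its result is minimal.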

As a consequence, both the generic and the average complexities of Brzozowski's minimisation algorithm are super-polynomial for the uniform distribution on deterministic automata.

Let us briefly recall the notion of generic complexity, introduced in [43] and which has been applied successfully since 2003. Informally, the generic complexity measures the worst-case complexity of an algorithm by neglecting a "small" set of inputs, where a "small set" is defined in terms of asymptotic density. To be more precise, let I be an infinite set of inputs for some algorithm, equipped with a size function. One considers a probability distribution μ_n on each of the balls B_n of data of size at most n, and one defines the asymptotic density of a set X ⊆ I of inputs to be the limit

μ(X) = lim_{n→∞} μ_n(X ∩ B_n),

provided it exists. Frequently, the probability distribution is the uniform one, so

μ_n(X ∩ B_n) = Card(X ∩ B_n) / Card(B_n).

A set X is generic if μ(X) = 1. The generic complexity of an algorithm is the worst-case complexity of a generic set of inputs of the algorithm. In [33], De Felice and Nicaud analyse the average complexity of Brzozowski's minimisation algorithm for distributions of deterministic automata with a small number of final states. They show that, as in the case of the uniform distribution, the average complexity is super-polynomial even for random deterministic automata with only one final state. This improves a previous study of the authors where the number of final states was linear in the number of states. The result holds for alphabets with at least 3 letters.


4. Moore’s algorithm The minimisation algorithm given by Moore [48] computes the Nerode equivalence by a stepwise refinement of some initial equivalence. All automata are assumed to be deterministic. 4.1. Description. Let A D .Q; i; F / be an automaton over an alphabet A. Define, for q 2 Q and h > 0, the set  L.h/ q .A/ D ¹w 2 A j jwj 6 h; q  w 2 F º:

The Moore equivalence of order h is the equivalence h on Q defined by p h q () Lp.h/ .A/ D L.h/ q .A/:

Using the notation of partitions introduced above, one can rephrase the definitions of the Nerode equivalence and of the Moore equivalence of order h. These are the equivalences defined by ^ ^ .F; w/jQ: .F; w/jQ and w2A

w2A jwj6h

Since the set of states is finite, there is a smallest h such that the Moore equivalence h equals the Nerode equivalence . We call this integer the depth of Moore’s algorithm on the finite automaton A, or the depth of A for short. The depth depends in fact only on the language recognised by the automaton, and not on the particular automaton under consideration. Indeed, each state of an automaton recognising a language L represents in fact a left quotient u 1 L for some word u. The depth is the smallest h such that h equals hC1 . This leads to the refinement algorithm that computes successively 0 , 1 , . . . , h , . . . , halting as soon as two consecutive equivalences are equal. The next property gives a method to compute the Moore equivalences efficiently. Proposition 4.1. For two states p; q 2 Q, and h > 0, one has

p hC1 q () .p h q and p  a h q  a for all a 2 A/:

(4)

We use this proposition in a slightly different formulation. Denote by Mh the partition corresponding to the Moore equivalence h . Then the following equations hold. Proposition 4.2. For h > 0, one has  _^ ^ ^ ^ .P; a/jR : .P; a/jQ D MhC1 D Mh ^ a2AP 2Mh

R2Mh a2A P 2Mh

The computation is described in Algorithm 1. It is realised by a loop that refines the current partition. The computation of the refinement of k partitions of a set with n elements can be done in time O.k n2 / by brute force. A radix sort improves the running time to O.k n/. With k D Card.A/, each tour in the loop is realised in time O.k n/,

344

Jean Berstel, Luc Boasson, Olivier Carton, and Isabelle Fagnot

so the total time is O.`k n/, where ` is the number of refinement steps in the computation of the Nerode equivalence , that is the depth of the automaton. The worst case behaviour is obtained for ` D n 2. We say that automata having maximal depth are slow automata and more precisely are slow automata for Moore. These automata are investigated later. We will show that they are equivalent to automata we call slow for Hopcroft. Algorithm 1 M o o r e.A/ P ¹F; F c º repeat P0 P for all a 2V A do Pa V P 2P .P; a/jQ P P ^ a2A Pa until P D P0

Moore’s minimisation algorithm F the initial partition F P0 is the old partition, P is the new one

4.2. Average complexity. The average case behaviour of Moore’s algorithm has recently been studied in several papers. We report here some results given in [6] and [30]. The authors make a detailed analysis of the distribution of the number ` of refinement steps in Moore’s algorithm, that is of the depth of automata, and they prove that there are only a few automata for which this depth is larger than log n. More precisely, fix some alphabet and consider deterministic automata over this alphabet. A semi-automaton K is an automaton whose set of final states is not specified. Thus, an automaton is a pair .K; F /, where K is a semi-automaton and F is the set of final states. The following theorem gives an upper bound on the average complexity of Moore’s algorithm for all automata derived from a given semi-automaton. Theorem 4.3 (Bassino, David, and Nicaud [6]). Let K be a semi-automaton with n states. The average complexity of Moore’s algorithm for the automata .K; F /, for the uniform probability distribution over the sets F of final states, is O.n log n/. The result also holds for Bernoulli distributions for final states. The result remains valid for subfamilies of automata such as strongly connected automata or group automata. When all semi-automata are considered to be equally likely, then the following bound is valid. Theorem 4.4 (David [30]). The average complexity of Moore’s algorithm, for the uniform probability over all complete automata with n states, is O.n log log n/. This result is remarkable in view of the lower bound which is given in the following statement [7]. Theorem 4.5. If the underlying alphabet has at least two letters, then Moore’s algorithm, applied on a minimal automaton with n states, requires at least .n log log n/ operations.


5. Hopcroft’s algorithm Hopcroft [37] has given an algorithm that computes the minimal automaton of a given deterministic automaton. The running time of the algorithm is O.k n log n/ where k is the cardinality of the alphabet and n is the number of states of the given automaton. The algorithm has been described and re-described several times, see [1], [10], [16], [35], and [44]. 5.1. Outline. The algorithm is outlined in the function Hopcroft given in Algorithm 2. We let min.P; P 0 / denote the set of smaller size of the two sets P and P 0 , and any one of them if they have the same size. Algorithm 2 H o p c ro ft.A/ P ¹F; F c º W ; for all a 2 A do A dd..min.F; F c /; a/; W/ while W ¤ ; do .W; a/ Ta k eS o me.W/ for all P 2 P which is split by .W; a/ do P 0 ; P 00 .W; a/jP R ep lac e P by P 0 and P 00 in P for all b 2 A do if .P; b/ 2 W then R ep lac e .P; b/ by .P 0 ; b/ and .P 00 ; b/ in W else A dd..min.P 0 ; P 00 /; b/; W/

Hopcroft’s minimisation algorithm F the initial position F the waiting set F initialisation of the waiting set F take and remove some splitter F compute the split F refine the partition F update the waiting set

Given a deterministic automaton A, Hopcroft’s algorithm computes the coarsest congruence that saturates the set F of final states. It starts from the partition ¹F; F c º, which obviously saturates F and refines it until it gets a congruence. These refinements of the partition are always obtained by splitting some class into two classes. The algorithm proceeds as follows. It maintains a current partition P D ¹P1 ; : : : ; Pn º and a current set W of splitters, that is, pairs .W; a/ that remain to be processed, where W is a class of P and a is a letter. The set W is called the waiting set. The algorithm stops when the waiting set W becomes empty. When it stops, the partition P is the coarsest congruence that saturates F . The starting partition is the partition ¹F; F c º and the starting set W contains all pairs .min.F; F c /; a/ for a 2 A. The main loop of the algorithm removes one splitter .W; a/ from the waiting set W and performs the following actions. Each class P of the current partition (including the class W ) is checked as to whether it is split by the pair .W; a/. If .W; a/ does not split P , then nothing is done. On the other hand, if .W; a/ splits P into say P 0 and P 00 , the class P is replaced in the partition P by P 0 and P 00 . Next, for each letter b , if the pair .P; b/ is in W, it is replaced in W by the two pairs .P 0 ; b/ and .P 00 ; b/, otherwise only the pair .min.P 0 ; P 00 /; b/ is added to W.
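For concreteness, here is a Python sketch (our own simplification, with made-up identifiers) of the loop just described, for a complete deterministic automaton. It keeps the waiting set as a set of pairs (class, letter) and favours readability over the O(kn log n) data structures discussed in § 5.3.

    def hopcroft(Q, alphabet, delta, F):
        Q, F = set(Q), set(F)
        partition = {frozenset(F), frozenset(Q - F)} - {frozenset()}
        waiting = {(min(frozenset(F), frozenset(Q - F), key=len), a)
                   for a in alphabet}
        # precompute a^{-1}{q} for every state q and letter a
        pre = {(q, a): set() for q in Q for a in alphabet}
        for q in Q:
            for a in alphabet:
                pre[(delta[(q, a)], a)].add(q)
        while waiting:
            W, a = waiting.pop()                      # take and remove some splitter
            inv = set().union(*(pre[(q, a)] for q in W)) if W else set()
            for P in list(partition):
                P1, P2 = P & inv, P - inv             # the split (W, a)|P
                if P1 and P2:
                    partition.remove(P)               # refine the partition
                    partition |= {P1, P2}
                    for b in alphabet:                # update the waiting set
                        if (P, b) in waiting:
                            waiting.remove((P, b))
                            waiting |= {(P1, b), (P2, b)}
                        else:
                            waiting.add((min(P1, P2, key=len), b))
        return partition

    # Example: the chain automaton used earlier; the result is four singleton classes
    Q = {0, 1, 2, 3}
    delta = {**{(q, 'a'): min(q + 1, 3) for q in Q}, **{(q, 'b'): 0 for q in Q}}
    print(hopcroft(Q, 'ab', delta, {3}))

The call waiting.pop() plays the role of TakeSome: it is deliberately unspecified which splitter is processed next, which is exactly the nondeterminism discussed in the next paragraph.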


It should be noted that the algorithm is not really deterministic because it has not been specified which pair (W, a) is taken from W to be processed at each iteration of the main loop. This means that, for a given automaton, there are many executions of the algorithm. It turns out that all of them produce the right partition of the states. However, different executions may give rise to different sequences of splittings and also to different running times. Hopcroft has proved that the running time of any execution is bounded by O(|A| n log n).

5.2. Behaviour. The pair (P, W) composed of the current partition and the current waiting set in some execution of Hopcroft's algorithm is called a configuration. The following proposition describes the evolution of the current partition in Hopcroft's algorithm. Formula (5) is the key inequality for the proofs of correctness and termination. We will use it in the special case where the set R is a class of the current partition.

Proposition 5.1. Let (P, W) be a configuration in some execution of Hopcroft's algorithm on an automaton A over A. For any P ∈ P, any subset R of a class of P, and a ∈ A, one has

(P, a)|R ≥ ⋀_{(W,a)∈W} (W, a)|R,    (5)

that is, the partition (P, a)|R is coarser than the partition ⋀_{(W,a)∈W} (W, a)|R.

Proof. The proof is by induction on the steps of an execution. The initial configuration (P, W) is composed of the initial partition P = {F, F^c} and of the initial waiting set W, which is either W = {(F, a) | a ∈ A} or W = {(F^c, a) | a ∈ A}. Since the partitions (F, a)|R and (F^c, a)|R are equal for any set R, the proposition is true.

Now assume that (P, W) is not the initial configuration. Let (P̂, W̃) be a configuration that immediately precedes (P, W). Thus (P, W) is obtained from (P̂, W̃) in one step of Hopcroft's algorithm, by choosing one splitter S in W̃, and by performing the required operations on P̂ and W̃.

First we observe that, by the algorithm, and by Lemma 2.2, one has for any set of states R,

⋀_{(W̃,a)∈W̃\{S}} (W̃, a)|R ≥ ⋀_{(W,a)∈W} (W, a)|R.    (6)

Indeed, the set W contains all of W̃\{S} with the exception of those pairs (P, a) for which P is split into two parts, and in this case the relation follows from (2). Next, consider a subset R of a set of P. Since R was not split by S, that is, since S|R = {R}, we have

⋀_{(W̃,a)∈W̃} (W̃, a)|R = ⋀_{(W̃,a)∈W̃\{S}} (W̃, a)|R.    (7)

Moreover, by the induction hypothesis, and since R is also a subset of a set of P̂, we have for any P̂ ∈ P̂,

(P̂, a)|R ≥ ⋀_{(W̃,a)∈W̃} (W̃, a)|R.    (8)

Consequently, for any subset R of a set of P and any P̂ ∈ P̂, in view of (6)–(8), we have

(P̂, a)|R ≥ ⋀_{(W,a)∈W} (W, a)|R.    (9)

Let now P ∈ P, a ∈ A, and let again R be a subset of a set of P. We consider the partition (P, a)|R. We distinguish two cases.

Case 1. Assume P ∈ P̂. The proof follows directly from (9).

Case 2. Assume P ∉ P̂. Then there exists P̂ ∈ P̂ such that S|P̂ = {P, P′}. If (P̂, a) ∈ W̃\{S}, then both (P, a) and (P′, a) are in W. Consequently

(P, a)|R ≥ ⋀_{(W,a)∈W} (W, a)|R.

If, on the contrary, (P̂, a) ∉ W̃\{S}, then by the algorithm, one of (P, a) or (P′, a) is in W.

Case 2a. Assume that (P, a) ∈ W. Then obviously

(P, a)|R ≥ ⋀_{(W,a)∈W} (W, a)|R.

Case 2b. Assume (P′, a) ∈ W. By Lemma 2.2, we have

(P, a)|R ≥ (P′, a)|R ∧ (P̂, a)|R;

as obviously we have

(P′, a)|R ≥ ⋀_{(W,a)∈W} (W, a)|R,

and by use of (9), we obtain

(P, a)|R ≥ ⋀_{(W,a)∈W} (W, a)|R.

This completes the proof. Corollary 5.2. The current partition at the end of an execution of Hopcroft’s algorithm on an automaton A is the Nerode partition of A. Proof. Let P be the partition obtained at the end of an execution of Hopcroft’s algorithm on an automaton A. By Proposition 2.1, it suffices to check that no splitter splits a class of P. Since the waiting set W is empty, the right-hand side of (5) evaluates to ¹Rº for each triple .P; a; R/. This means that .P; a/ indeed does not split R.


5.3. Complexity

Proposition 5.3. Hopcroft's algorithm can be implemented to have worst-case time complexity O(kn log n) for an automaton with n states over a k-letter alphabet.

To achieve the claimed bound, a partition P of a set Q should be implemented in a way that allows the following operations:
• accessing the class to which a state belongs in constant time;
• enumerating the elements of a class in time proportional to its size;
• adding and removing an element in a class in constant time.

The computation of all splittings P 0 ; P 00 of classes P by a given splitter .W; a/ is done in time O.Card.a 1 W // as follows. 1. One enumerates the states q in a 1 W . For each state q , the class P of q is marked as a candidate for splitting, the state q is added to a list of states to be removed from P , and a counter for the number of states in the list is incremented. 2. Each class that is marked is a candidate for splitting. It is split if the number of states to be removed differs from the size of the class. If this holds, the states in the list of P are removed to build a new class. The other states remain in P . The waiting set W is implemented such that membership can be tested in constant time, and splitters can be added and removed in constant time. This allows the replacement of a splitter .P; b/ by the two splitters .P 0 ; b/ and .P 00 ; b/ in constant time, since in fact P 0 is just the modified class P , and it suffices to add the splitter .P 00 ; b/. Several implementations of partitions that satisfy the time requirements exist. Hopcroft [37] describes such a data structure, reported in [14]. Knuutila in [44] gives a different implementation. Proof of Proposition 5.3. For a given state q , a splitter .W; a/ such that q 2 W is called a q -splitter. Consider some q -splitter. When it is removed from the waiting set W, it may be smaller than when it was added, because it may have been split during its stay in the waiting set. On the contrary, when a q -splitter is added to the waiting set, then its size is at most one half of the size it had when it was previously removed. Thus, for a fixed state q , the number of q -splitters .W; a/ which are removed from W is at most k log n, since at each removal, the number of states in W is at most one half of the previous addition. The total number of elements of the sets a 1 W for .W; a/ in W is O.k n log n/. Indeed, for a fixed state q 2 Q, a state p such that p  a D q is exactly in those sets a 1 W for which .W; a/ is a q -splitter in W. There are at most O.log n/ of such sets for each letter a, so at most .k log n/ sets for each fixed q . Since there are n states, the claim follows. This completes the proof since the running time is bounded by the size of the sets a 1 W for .W; a/ in W.
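As an illustration of these requirements, here is a sketch (our own layout, not Hopcroft's original data structure) of a refinable partition in Python: the states of a class occupy a contiguous slice of an array, a state can be marked in constant time, and the marked part of a class can be split off in time proportional to its size, which is what steps 1 and 2 above need.

    class RefinablePartition:
        def __init__(self, n):
            self.elems = list(range(n))          # states, grouped by class
            self.pos = list(range(n))            # pos[q] = index of q in elems
            self.cls = [0] * n                   # cls[q] = class of q
            self.start, self.end = [0], [n]      # class c is elems[start[c]:end[c]]
            self.marked = [0]                    # number of marked states in class c

        def size(self, c):
            return self.end[c] - self.start[c]

        def mark(self, q):
            """Move q into the marked (front) part of its class, in constant time."""
            c = self.cls[q]
            i, j = self.pos[q], self.start[c] + self.marked[c]
            if i >= j:                           # q was not yet marked
                other = self.elems[j]
                self.elems[i], self.elems[j] = other, q
                self.pos[q], self.pos[other] = j, i
                self.marked[c] += 1

        def split_marked(self, c):
            """Split off the marked part of class c as a new class, if it is proper."""
            m, self.marked[c] = self.marked[c], 0
            if m == 0 or m == self.size(c):
                return None                      # nothing to split off
            new = len(self.start)
            self.start.append(self.start[c])
            self.end.append(self.start[c] + m)
            self.marked.append(0)
            self.start[c] += m
            for i in range(self.start[new], self.end[new]):
                self.cls[self.elems[i]] = new
            return new

A typical use in step 1 above is to mark every state of a⁻¹W and record the touched classes; step 2 then calls split_marked on each touched class.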


5.4. Miscellaneous remarks. There are some degrees of freedom in Hopcroft’s algorithm. In particular, the way the waiting set is represented may influence the efficiency of the algorithm. This issue has been considered in [44]. In [5] some practical experiments are reported. In [49] it is shown that the worst-case reported in [13] in the case of de Bruijn words remains of this complexity when the waiting set is implemented as a queue (LIFO), whereas this complexity is never reached with an implementation as a stack (FIFO). See [49] for other discussions, in particular in relation with cover automata. Hopcroft’s algorithm, as reported here, requires the automaton to be complete. This may be a serious drawback in the case where the automaton has only a few transitions. For instance, if a dictionary is represented by a trie, then the average number of edges per state is about 1:5 for the French dictionary (personal communication of Dominique Revuz); see Table 2 below. Recently, two generalisations of Hopcroft’s algorithm to incomplete automata were presented in [9] and [55], with running time O.m log n/, where n is the number of states and m is the number of transitions. Since m 6 k n where k is the size of the alphabet, the algorithm achieves the same time complexity. Valmari [54] extends and simplifies his previous algorithm that uses Hopcroft’s algorithm for minimising incomplete deterministic automata. The algorithm is claimed to be easy to implement and with an easy formal correctness proof. The time complexity is O.n C m log m/, where n is the number of states and m is the number of transitions. A series of papers by Loris D’Antoni, Margus Veanes, and their coauthors (see, e.g., [28]) deal with symbolic automata and especially their minimisation. They carefully describe the adaptation of Hopcroft’s algorithm required to handle this extension efficiently. 5.5. Moore versus Hopcroft. We present an example that illustrates the fact that Hopcroft’s algorithm is not just a refinement of Moore’s algorithm. This is proved by checking that one of the partitions computed by Moore’s algorithm in the example does not appear as a partition in any of the executions of Hopcroft’s algorithm on this automaton. The automaton we consider is over the alphabet A D ¹a; bº. Its set of states is Q D ¹0; 1; 2; : : : ; 9º and the set of final states is F D ¹7; 8; 9º. The next-state function and the graph are given in Figure 2. The Moore partitions are easily computed. The partition M1 of order 1 is composed of the five classes: ¹0; 3; 4º;

¹1; 2º;

¹5; 6º;

¹7; 9º;

¹8º:

The Moore partition of order 2 is the identity. The initial partition for Hopcroft’s algorithm is ¹F; F c º, where F D 789 (we will represent a set of states by the sequence of its elements). The initial waiting set is composed of .789; a/ and .789; b/. There are two cases, according to the choice of the first splitter.


Figure 2. Next-state function of the example automaton (state diagram omitted)

Case 1. The first splitter is (789, a). Since a⁻¹789 = 568, each of the classes F and F^c is split. The new partition is 01234 | 56 | 79 | 8. The new waiting set is

(79, b), (8, b), (8, a), (56, a), (56, b).

The first three columns in Table 1 contain the sets c 1 P , for .P; c/ in this waiting set. By inspection, one sees that each entry in these columns cuts off at least one singleton class which is not in the Moore equivalence M1 . This implies that M1 cannot be obtained by Hopcroft’s algorithm in this case.

Case 2. The first splitter is (789, b). Since b⁻¹789 = 034568, the new partition is 12 | 03456 | 79 | 8. The new waiting set is

(79, a), (8, a), (8, b), (12, a), (12, b).

Again, each entry in the last three columns of Table 1 cuts off at least one singleton class which is not in the Moore equivalence M1. This implies that, also in this case, M1 cannot be obtained by Hopcroft's algorithm.

Table 1. The sets c⁻¹P, with (P, c) in a waiting set

          P = 56    P = 8    P = 79    P = 12
  a⁻¹P      49        6        58       017
  b⁻¹P      2         46       0357     17

Despite the difference illustrated by this example, there are similarities between Moore’s and Hopcroft’s algorithms that have been exploited by Julien David in his thesis [29] to give an upper bound on the average running time of Hopcroft’s algorithm for a particular strategy.


In this strategy, there are two waiting sets, the current set W and a future waiting set F. Initially, F is empty. Hopcroft’s algorithm works as usual, except for line 14: Here, the splitter .min.P 0 ; P 00 /; b/ is added to F and not to W. When W is empty, then the contents of F and W are swapped. The algorithm stops when both sets W and F are empty. Proposition 5.4 (David [29]). There is a strategy for Hopcroft’s algorithm such that its average complexity, for the uniform probability over all complete automata with n states, is O.n log log n/. Julien David shows in his thesis that at the end of each cycle, that is when W becomes empty, the current partition P of the set of states is in fact a refinement of the corresponding level in Moore’s algorithm. This shows that the number of cycles in Hopcroft’s algorithm, for this strategy, is bounded by the depth of the automaton. Thus Theorem 4.4 applies. A paper [30] that appeared in 2012 contains interesting improvements and extensions of the results reported in his thesis. In this paper, Julien David proves that for the uniform distribution on complete deterministic automata, the average and generic time complexity of Moore’s and Hopcroft’s state minimisation algorithm are both O.n log log n/, where n is the number of states in the input automata and the number of letters in the alphabet is fixed. The main result is the average complexity of Moore’s algorithm. His argument is to prove that the set of automata minimised in more than O.log log n/ partition refinements by Moore’s algorithm is negligible. David characterises a specific family of implementations of Hopcroft’s algorithm, for which the algorithm is shown to be always faster than Moore’s algorithm. Thus the result for Hopcroft’s algorithm follows.

6. Slow automata We are concerned in this section with automata that behave badly for Hopcroft’s and Moore’s minimisation algorithms. In other terms, we look for automata for which Moore’s algorithm performs the maximal number of steps, and similarly for Hopcroft’s algorithm. 6.1. Definition and equivalence. Recall that an automaton with n states is called slow for Moore if the number ` of steps in Moore’s algorithm is n 2. A slow automaton is minimal. It is equivalent to say that each Moore equivalence h has exactly h C 2 equivalence classes for h 6 n 2. This is due to the fact that, at each step, just one class of h is split, and that this class is split into exactly two classes of the equivalence hC1 . Proposition 4.2 takes the following special form for slow automata. Proposition 6.1. Let A be an automaton with n states which is slow for Moore. For all n 2 > h > 0, there is exactly one class R in Mh which is split, and moreover, if .P; a/ and .P 0 ; a0 / split R, with P; P 0 2 Mh , then .P; a/jR D .P 0 ; a0 /jR.


An automaton is slow for Hopcroft if, for all executions of Hopcroft's algorithm, the splitters in the current waiting set either do not split or split in the same way: there is a unique class that is split into two classes, and always into the same two classes. More formally, at each step (W, P) of an execution, there is at most one class R in the current partition P that is split, and for all splitters (P, a) and (P′, a′) in W that split R, one has (P, a)|_R = (P′, a′)|_R. The definition is close to the statement in Proposition 6.1 above, and indeed, one has the following property.

Theorem 6.2. An automaton is slow for Moore if and only if it is slow for Hopcroft.

Proof. Let A be a finite automaton. We first suppose that A is slow for Moore. We consider an execution of Hopcroft's algorithm, and we prove that each step of the execution that changes the partition produces a Moore partition. This holds for the initial configuration (W, P), since P = M_0. Assume that one has P = M_h for some configuration (W, P) and some h ≥ 0. Let R be the class of M_h split by Moore's algorithm. Let S ∈ W be the splitter chosen in Hopcroft's algorithm. Then either S splits no class, and the partition remains equal to M_h, or, by Proposition 6.1, it splits the class R. In the second case, this class is split by S into two new classes, say R′ and R″. The partition P′ = (P \ {R}) ∪ {R′, R″} is equal to M_{h+1}.

Conversely, suppose that A is slow for Hopcroft. We show that it is also slow for Moore by showing that the partition M_{h+1} has only one more class than M_h. For this, we use Proposition 4.2, which states that each class R in M_h is refined in M_{h+1} into the partition ℛ given by

  ℛ = ⋀_{a ∈ A} ⋀_{P ∈ M_h} (P, a)|_R.

We show by induction on the number of steps that, in any execution of Hopcroft's algorithm, P = M_h for some configuration (W, P). This holds for the initial configuration. Let (W, a) be some splitter in the waiting set W. It follows from Proposition 5.1 that

  ℛ ≥ ⋀_{a ∈ A} ⋀_{(W, a) ∈ W} (W, a)|_R.

Thus the partition ℛ is coarser than the partition of R obtained by Hopcroft's algorithm. Since the automaton is slow for Hopcroft, the partition on the right-hand side has at most two elements. More precisely, there is exactly one class R that is split by Hopcroft's algorithm into two classes. Since Moore's partition M_{h+1} is coarser, it contains precisely these classes. This proves the claim.

6.2. Examples

Example 6.1. The simplest example is perhaps the automaton given in Figure 3. For 0 ≤ h ≤ n − 1, the partition M_h is composed of the class {0, …, n − h − 1} and of the singleton classes {n − h}, {n − h + 1}, …, {n}. At each step, the last state is split off from the class {0, …, n − h − 1}.


Figure 3. An automaton over one letter recognising the set of words of length at least n

Example 6.2. The automaton of Figure 4 recognises the set D^(n) of Dyck words w over {a, b} such that 0 ≤ |u|_a − |u|_b ≤ n for all prefixes u of w. The partition M_h, for 0 ≤ h ≤ n − 1, is composed of {0}, …, {h}, and {h + 1, …, n}. At h = n − 1, the state n − 1 is separated from state n.


Figure 4. An automaton recognising the Dyck words of “height” at most n

Example 6.3. Let w = b_1 ⋯ b_n be a word of length n over the binary alphabet {0, 1}. We define an automaton A_w over the unary alphabet {a} as follows. The state set of A_w is {1, …, n} and the next-state function is defined by i · a = i + 1 for i < n and n · a = 1. Note that the underlying labelled graph of A_w is just a cycle of length n. The final states really depend on w: the set of final states of A_w is F = {1 ≤ i ≤ n | b_i = 1}. We call such an automaton a cyclic automaton. For a binary word u, we define Q_u to be the set of states of A_w which are the starting positions of circular occurrences of u in w. If u is the empty word, then Q_u is by convention the set Q of all states of A_w. By definition, the set F of final states of A_w is Q_1, while its complement F^c is Q_0. Consider the automaton A_w for w = 01001010 given in Figure 5. The sets Q_1, Q_01 and Q_11 of states are respectively {2, 5, 7}, {1, 4, 6} and ∅. If w is a Sturmian word, then the automaton A_w is slow.

Slow automata are closely related to binary Sturmian trees. Indeed, consider a finite automaton, for instance over a binary alphabet. To this automaton corresponds an infinite binary tree, composed of all paths in the automaton. The nodes of the tree are labelled with the states encountered on the path. For each integer h ≥ 0, the number of distinct subtrees of height h is equal to the number of classes in the Moore partition M_h. It follows that the automaton is slow if and only if there are h + 1 distinct subtrees of height h for all h: this is precisely the definition of Sturmian trees, as given in [12].


Figure 5. Cyclic automaton A_w for w = 01001010. Final states are circled.
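As an illustration of Example 6.3, here is a short Python sketch (our own encoding, not code from the chapter) that builds the cyclic automaton A_w of a binary word and computes the sets Q_u of starting positions of circular occurrences; on w = 01001010 it reproduces the sets Q_1 = {2, 5, 7}, Q_01 = {1, 4, 6} and Q_11 = ∅ quoted above.

    def cyclic_automaton(w):
        # States 1..n on a cycle labelled a; state i is final iff the i-th letter of w is 1.
        n = len(w)
        delta = {(i, "a"): (i + 1 if i < n else 1) for i in range(1, n + 1)}
        final = {i for i in range(1, n + 1) if w[i - 1] == "1"}
        return set(range(1, n + 1)), delta, final

    def occurrence_set(w, u):
        # Q_u: starting positions (1-based) of circular occurrences of u in w;
        # for the empty word this is the whole state set, as in the text.
        doubled = w + w
        return {i + 1 for i in range(len(w)) if doubled[i:i + len(u)] == u}

    # occurrence_set("01001010", "1")  == {2, 5, 7}
    # occurrence_set("01001010", "01") == {1, 4, 6}
    # occurrence_set("01001010", "11") == set()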

We now consider the problem of showing that the running time O(n log n) for Hopcroft's algorithm on n-state automata is tight. The algorithm has a degree of freedom because, in each step of its main loop, it allows one to choose the splitter to be processed. Berstel and Carton [13] introduced a family of finite automata based on de Bruijn words. These are exactly the cyclic automata A_w of Example 6.3 where w is a binary de Bruijn word. They showed that there exists some “unlucky” sequence of choices that slows down the computation to achieve the lower bound Ω(n log n).

In the papers [20] and [21], Castiglione, Restivo and Sciortino replace de Bruijn words by Fibonacci words. They observe that for these words, and more generally for all circular standard Sturmian words, there is no choice in Hopcroft's algorithm: the waiting set always contains only one element. The uniqueness of the execution of Hopcroft's algorithm implies by definition that the associated cyclic automata for Sturmian words are slow. They show that, for Fibonacci words, the unique execution of Hopcroft's algorithm runs in time Θ(n log n), so that the worst-case behaviour is achieved for the cyclic automata of Fibonacci words. The computation is carried out explicitly, using connections between Fibonacci numbers and Lucas numbers. In [22], they give a detailed analysis of the reduction process that is the basis of their computation, and they show that this process is isomorphic, for all standard Sturmian words, to the refinement process in Hopcroft's algorithm. In [11], the analysis of the running time of Hopcroft's algorithm is extended to cyclic automata of standard Sturmian words. It is shown that the directive sequences for which Hopcroft's algorithm has worst-case running time are those sequences (d_1, d_2, d_3, …) for which the sequence of geometric means ((p_n)^{1/n})_{n ≥ 1}, where p_n = d_1 d_2 ⋯ d_n, is bounded.

7. Minimisation by fusion

In this section, we consider the minimisation of automata by fusion of states. An important application of this method is the computation of the minimal automaton recognising a given finite set of words. This is widely used in computational linguistics for the space-efficient representation of dictionaries.


Let A be a deterministic automaton over the alphabet A, with set of states Q. The signature of a state p is the set of pairs (a, q) ∈ A × Q such that p · a = q, together with a Boolean value denoting whether p is final or not. Two states p and q are called mergeable if and only if they have the same signature. The fusion or merge of two mergeable states p and q consists in replacing p and q by a single state. The state obtained by fusion of two mergeable states has the same signature.

Minimisation of an automaton by a sequence of fusions of states with the same signature is not always possible. Consider the two-state automaton over the single letter a given in Figure 6, which recognises a*. It is not minimal. The signature of state 1 is +, (a, 2) and the signature of state 2 is +, (a, 1) (here “+” denotes an accepting state), so the states have different signatures and are not mergeable.


Figure 6. An automaton recognising the set a* which cannot be minimised by fusion of its states
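The signature test behind fusion is easy to phrase in code. The following Python fragment is a minimal sketch in our own encoding (a partial transition dictionary and a set of final states); it is not taken from the chapter.

    def signature(q, delta, final, alphabet):
        # The signature of q: a finality flag together with the outgoing pairs (a, q·a).
        edges = tuple((a, delta[(q, a)]) for a in alphabet if (q, a) in delta)
        return (q in final, edges)

    def mergeable(p, q, delta, final, alphabet):
        # Two states can be fused exactly when their signatures coincide.
        return signature(p, delta, final, alphabet) == signature(q, delta, final, alphabet)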

7.1. Local automata. M.-P. Béal and M. Crochemore [8] designed an algorithm for minimising a special class of deterministic automata by a sequence of state mergings. These automata are called irreducible local automata. They occur quite naturally in symbolic dynamics. The running time of the algorithm is O(min(m(n − r + 1), m log n)), where m is the number of edges, n is the number of states, and r is the number of states of the minimised automaton. In particular, the algorithm is linear when the automaton is already minimal. Hopcroft's algorithm has running time O(kn log n), where k is the size of the alphabet, and since kn ≥ m, it is worse than Béal and Crochemore's algorithm. Moreover, their algorithm does not require the automaton to be complete.

The automata considered here have several particular features. First, all states are both initial and final. Next, they are irreducible, that is, their underlying graph is strongly connected. Finally, the automata are local. By definition, this means that two distinct cycles carry different labels. This implies that the labels of cycles are primitive words, since otherwise there would exist different traversals of a cycle with the same label. In [8], the constructions and proofs are done for a more general family of automata called AFT (automata of almost finite type). We sketch here the easier case of local automata. Figure 7 gives an example.

Since all states are final, two states p and q of an automaton are mergeable if and only if, for all letters a ∈ A, p · a is defined if and only if q · a is defined and, if this is the case, then p · a = q · a. The basic proposition is the following. It shows that an irreducible local automaton can be minimised by a sequence of fusions of states.

Proposition 7.1. If an irreducible local automaton is not minimal, then at least two of its states are mergeable.
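Proposition 7.1 justifies minimising such an automaton by simply repeating fusions until no two states share a signature. The sketch below does this naively, in quadratic time, on our own encoding (all states are final, so the finality flag is omitted); the algorithm of Béal and Crochemore organises the same merges within the bound quoted above by maintaining a signature tree.

    def minimise_by_fusion(states, alphabet, delta):
        # states: a set of states (modified in place); delta: dict (state, letter) -> state.
        def sig(q):
            return tuple((a, delta[(q, a)]) for a in alphabet if (q, a) in delta)

        merged_something = True
        while merged_something:
            merged_something = False
            seen = {}
            for q in list(states):
                s = sig(q)
                if s in seen:                      # q and seen[s] are mergeable
                    p = seen[s]
                    for key, target in list(delta.items()):
                        if target == q:            # redirect edges entering q to p
                            delta[key] = p
                    for a in alphabet:             # drop q's outgoing edges (p has the same ones)
                        delta.pop((q, a), None)
                    states.remove(q)
                    merged_something = True
                    break                          # signatures changed: rescan
                seen[s] = q
        return states, delta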


Figure 7. The automaton on the left is local; the automaton on the right is not, because of the two loops labelled a, and because the label bb of the cycle through 1 and 2 is not a primitive word.

The minimisation algorithm assumes that the alphabet is totally ordered. It uses the notion of partial signature. First, with each state q is associated the signature σ(q) = a_1 p_1 a_2 p_2 ⋯ a_m p_m, where {(a_1, p_1), …, (a_m, p_m)} is the signature of q and the sequence is ordered by increasing value of the letters. Since all states are final, the Boolean indicator reporting this is omitted. A partial signature is any prefix a_1 p_1 a_2 p_2 ⋯ a_i p_i of a signature.

A first step consists in building a signature tree which represents the sets of states sharing a common partial signature. The root of the tree represents the set of all states, associated to the empty signature. A node representing the set of states with a partial signature a_1 p_1 a_2 p_2 ⋯ a_i p_i is the parent of the nodes representing the sets of states with a partial signature a_1 p_1 a_2 p_2 ⋯ a_i p_i a_{i+1} p_{i+1}. As a consequence, leaves represent full signatures. All states that correspond to the same leaf are mergeable. When mergeable states are detected in the signature tree, they can be merged. Then the signature tree has to be updated, and this is the difficult part of the algorithm.

7.2. Bottom-up minimisation. In this section, all automata are finite, acyclic, deterministic and trim. A state p is called confluent if there are at least two edges in A ending in p. A trie is an automaton whose underlying graph is a tree. Thus an automaton is a trie if and only if it has no confluent state.

Bottom-up minimisation is the process of minimising an acyclic automaton by a bottom-up traversal. In such a traversal, the children of a node are treated before the node itself. During the traversal, equivalent states are detected and merged. The basic property of bottom-up minimisation is that the check for (Nerode) equivalence reduces to equality of signatures. The critical point is to organise the candidate states so that this check can be done efficiently. The bottom-up traversal itself may be organised in several ways, for instance as a depth-first search with the order of traversal of the children determined by the order on the labels of the edges. Another traversal is by increasing height, as done in Revuz's algorithm given next.

One popular method for the construction of a minimal automaton for a given finite set of words consists in first building a trie for this set and then minimising it. Daciuk et al. [27] propose an incremental version which avoids this intermediate construction.


Recall that the signature of a state p is the set of pairs (a, q) such that p · a = q, together with a Boolean value indicating whether the state is final or not. It is tacitly understood that the alphabet of the automaton is ordered. The signature of a state is usually considered as the ordered sequence of pairs, where the order is determined by the letters. It is important to observe that the signature of a state evolves when states are merged. As an example, state 6 of the automaton on the top of Figure 8 has signature +, (a, 3), (b, 10), and the same state has signature +, (a, 3), (b, 7) in the automaton on the bottom of the figure.

As already mentioned, if the minimisation of the children of two states p and q has been done, then p and q are (Nerode) equivalent if and only if they have the same signature. So the problem to be considered is the bookkeeping of signatures, that is, the problem of detecting whether the signature of the currently considered state has already occurred before. In practical implementations, this is done by hash coding the signatures. This allows one to perform the test in constant average time. One remarkable exception is Revuz's algorithm to be presented now, and its extension by Almeida and Zeitoun that we describe later.

7.3. Revuz's algorithm. Revuz [51] was the first to give an explicit description of a linear-time implementation of the bottom-up minimisation algorithm. The principle of the algorithm was also described in [45]. Define the height of a state p in an acyclic automaton to be the length of the longest path starting at p. It is also the length of the longest word in the language of the subautomaton at p. Two equivalent states have the same height. Revuz's algorithm operates by increasing height. It is outlined in Algorithm 3. Heights may be computed in linear time by a bottom-up traversal. The lists of states of a given height are collected during this traversal. The signature of a state is easy to compute provided the edges starting in a state have been sorted (by a bucket sort, for instance, to remain within the linear-time constraint). Sorting states by their signature is again done by a lexicographic sort. As in Moore's algorithm, the last step can be done by a simple scan of the list of states, since states with equal signatures are consecutive in the sorted list. The whole algorithm can be implemented to run in time O(m) for an automaton with m edges.

Algorithm 3  Revuz(A)                                ▹ Revuz's minimisation algorithm
  for h = 0 to Height(A) do
    S ← GetStatesForHeight(h)                        ▹ compute states of height h
    SortSignatures(S)                                ▹ compute and sort signatures
    for s ∈ S do                                     ▹ merge mergeable states
      if s and s.next have the same signature then   ▹ mergeable states are consecutive
        Merge(s, s.next)
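The following Python sketch follows the same bottom-up schedule as Algorithm 3, but detects equal signatures with a dictionary instead of the lexicographic sort that keeps Revuz's original implementation strictly linear. The encoding (state set, partial transition dictionary, set of final states) and all names are ours, not taken from [51].

    from collections import defaultdict

    def revuz_minimise(states, alphabet, delta, final):
        # Height of a state: length of the longest path starting there (acyclic automaton).
        height = {}
        def h(q):
            if q not in height:
                succ = [delta[(q, a)] for a in alphabet if (q, a) in delta]
                height[q] = 1 + max((h(p) for p in succ), default=-1)
            return height[q]

        by_height = defaultdict(list)
        for q in states:
            by_height[h(q)].append(q)

        merged = {}                                # state -> the representative it was fused with
        rep = lambda q: merged.get(q, q)
        seen = {}                                  # signature -> representative
        for ht in sorted(by_height):               # process states by increasing height
            for q in by_height[ht]:
                sig = (q in final,
                       tuple((a, rep(delta[(q, a)])) for a in alphabet if (q, a) in delta))
                if sig in seen:
                    merged[q] = seen[sig]          # fuse with the equivalent state seen before
                else:
                    seen[sig] = q
        return merged

Equivalent states necessarily have equal heights, so comparing signatures whose targets have already been canonicalised is enough.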


Revuz's algorithm relies on a particular bottom-up traversal of the trie. This traversal is defined by increasing height of states, and it makes the check for equivalent states easier. With another method for checking signatures in mind, like hash coding, the algorithm may be organised in a different way. For instance, the traversal by heights can be replaced by a traversal in lexicographic order. The last item in the algorithm may then be replaced by another check: whenever a state has been found which must be in the minimal automaton, its hash code is registered; when the signature of a state is computed, one checks whether its hash code is registered; if not, it is added to the register, and otherwise the state is replaced by the registered one. Several implementations have been given in various packages. A comparison is given in [25]. See also Table 2 below for numerical data.

7.4. The algorithm of Almeida and Zeitoun. Almeida and Zeitoun [3] consider an extension of the bottom-up minimisation algorithm to automata that contain only simple cycles. They describe a linear-time algorithm for these automata. Let A be a finite trim automaton. We call it simple if every nontrivial strongly connected component is a simple cycle, that is, if every vertex of the component has exactly one successor vertex in this component. The automaton given on the top of Figure 8 is simple. Simple automata are interesting because they recognise exactly the bounded regular languages or, equivalently, the languages with polynomial growth. These are the simplest infinite regular languages.

The starting point of the investigation of [3] is the observation that minimisation can be split into two parts: minimisation of an acyclic automaton and minimisation of the set of strongly connected components. There are three subproblems, namely (1) minimisation of each strongly connected component, (2) identification and fusion of isomorphic minimised strongly connected components, and (3) wrapping, which consists in merging states which are equivalent to a state in a strongly connected component, but which are not in this component. The authors show that if these subproblems can be solved in linear time, then, by a bottom-up algorithm which is a sophistication of Revuz's algorithm, the whole automaton can be minimised in linear time. Almeida and Zeitoun show how this can be done for simple automata. The outline of the algorithm is given in Algorithm 4.

The algorithm works as Revuz's algorithm as long as no nontrivial strongly connected components occur. In our example automaton, the states 8, 11 and 12 are merged, and the states 10 and 7 are also merged. This gives the automaton on the bottom of Figure 8. Then a cycle which has all its descendants minimised is checked for possible minimisation. This is done as follows: the weak signature of a state p of a cycle is the signature obtained by replacing the name of its successor in the cycle by a dummy symbol, say ⋆. In our example, the weak signatures of the states 3, 5, 9, 6 are respectively

  −a8b⋆,  +a⋆b7,  −a8b⋆,  +a⋆b7.

Here we write “+” when the state is final, and “−” otherwise.


Figure 8. On the top, a simple automaton: its nontrivial strongly connected components are the cycles 2, 4 and 3, 5, 9, 6. The minimisation starts by merging 11, 12 and 8, and also 10 and 7. This gives the automaton on the bottom.

Algorithm 4  AlmeidaZeitoun(A)                       ▹ The algorithm of Almeida and Zeitoun
  S ← ZeroHeight(A)                                  ▹ states and cycles of height 0
  while S ≠ ∅ do
    MinimizeCycles(S)                                ▹ minimise each cycle
    MergeIsomorphicCycles(S)                         ▹ compute and sort signatures and merge
    Wrap(S)                                          ▹ search states to wrap
    S ← NextHeight(A, S)                             ▹ compute states for next height


It is easily seen that the cycle is minimal if and only if the word composed of the sequence of signatures is primitive. In our example, the word is not primitive since it is a square, and the cycle can be reduced by identifying states that are at corresponding positions in the word, that is, states 5 and 6 can be merged, and states 3 and 9. This gives the automaton on the top of Figure 9. Similarly, in order to check whether two (primitive) cycles can be merged, one checks whether the words of their weak signatures are conjugate. In our example, the cycles 2, 4 and 3, 5 have the signatures

  +a⋆b7, −a8b⋆   and   −a8b⋆, +a⋆b7.

These words are conjugate and the corresponding states can be merged. This gives the automaton on the bottom of Figure 9. This automaton is minimal.


Figure 9. The minimisation continues by merging the states 5 and 6, and the states 3 and 9. This gives the automaton on the top. The last step of minimisation merges the states 2 and 5, and the states 3 and 4.

A basic argument for preserving the linearity of the algorithm is the fact that the minimal conjugate of a word can be computed in linear time. This can be done, for instance, by Booth's algorithm (see [24]). Thus, testing whether a cycle is minimised takes time proportional to its length, and for each minimised cycle a canonical representative, namely the unique minimal conjugate, which is a Lyndon word, can be computed in time proportional to its length. The equality of two cycles then reduces to the equality of the two associated words. Finding isomorphic cycles is accomplished by a lexicographic ordering of the associated words, followed by a simple scan for equal words.

A few words on wrapping: it may happen that states of distinct heights in a simple automaton are equivalent. An example is given in Figure 10. Indeed, states 6 and 8 have the same signature and are therefore mergeable, but have height 1 and 0, respectively. This situation is typical: when states s and t are mergeable and have distinct heights, and t belongs to a minimised component of the current height, then s is a singleton component on a path to the cycle of t. Wrapping consists in detecting these states and in “winding” them around the cycle. In our example, both 6 and 5 are wrapped into the component of 7 and 8. In the algorithm given above, a wrapping step is performed at each iteration, after the minimisation of the states and the cycles and before computing the states and cycles of the next height. In our example, after the first iteration, states 3 and 4 are mergeable. A second wrapping step merges 3 and 4. These operations are reported in Figure 11. A careful implementation can realise all these operations in global linear time.
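The word-combinatorial tests used above (primitivity of the weak-signature word of a cycle, conjugacy of the words of two cycles, and a canonical representative of a conjugacy class) can be sketched in a few lines of Python. The doubling tricks below are standard; the canonical rotation is computed naively here, whereas the chapter relies on Booth's linear-time algorithm. The cycle's sequence of weak signatures is assumed to have been serialised into a string beforehand.

    def is_primitive(word):
        # A word is a proper power iff it occurs in word+word at a position
        # strictly between 0 and len(word).
        return (word + word).find(word, 1) == len(word)

    def are_conjugate(u, v):
        # u is a rotation of v iff they have the same length and u occurs in v+v.
        return len(u) == len(v) and u in v + v

    def canonical_rotation(word):
        # Lexicographically least rotation (quadratic for simplicity; Booth's algorithm is linear).
        return min(word[i:] + word[:i] for i in range(len(word)))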


Figure 10. The automaton on the top has one minimal cycle of height 1. By wrapping, states 6 and 8, and states 5 and 7 are merged, respectively, giving the automaton on the bottom.


Figure 11. The second wrapping step: states 3 and 4 are merged.

7.5. Incremental minimisation: the algorithm of Daciuk et al. The algorithm presented in [27] is an incremental algorithm for the construction of a minimal automaton for a given set of words that is lexicographically sorted. The algorithm is easy to implement and it is efficient: the construction of an automaton recognising a typical dictionary is done in a few seconds. Table 2 was kindly communicated by Sébastien Paumier. It contains the space saving and the computation time for dictionaries of various languages.

Table 2. Running time and space requirement for the computation of minimal automata (communication of Sébastien Paumier). States, Trans. and Size describe the minimal automaton; Revuz and Daciuk are the running times of the two algorithms.

  File       Lines      Text file   States    Trans.    Size      Revuz     Daciuk
  delaf-de   189878     12.5Mb      57165     103362    1.45Mb    4.22s     4.44s
  delaf-en   296637     13.2Mb      109965    224268    2.86Mb    5.94s     6.77s
  delaf-es   638785     35.4Mb      56717     117417    1.82Mb    10.61s    11.28s
  delaf-fi   256787     24.6Mb      124843    133288    4.14Mb    6.40s     7.02s
  delaf-fr   687645     38.7Mb      109466    240409    3.32Mb    13.03s    14.14s
  delaf-gr   1288218    83.8Mb      228405    442977    7.83Mb    28.33s    31.02s
  delaf-it   611987     35.9Mb      64581     161718    1.95Mb    10.43s    11.46s
  delaf-no   366367     23.3Mb      75104     166387    2.15Mb    6.86s     7.44s
  delaf-pl   59468      3.8Mb       14128     20726     502Kb     1.19s     1.30s
  delaf-pt   454241     24.8Mb      47440     115694    1.4Mb     7.87s     8.45s
  delaf-ru   152565     10.8Mb      23867     35966     926Kb     2.95s     3.17s
  delaf-th   33551      851Kb       36123     61357     925Kb     0.93s     1.14s


The algorithm described here is simple because the words are sorted. There exist other incremental algorithms for the case of unsorted sets. One of them will be described in the next section. Another algorithm, called semi-incremental because it requires a final minimisation step, is given in [56].

We start with some notation. Let A = (Q, i, F) be a finite, acyclic, deterministic and trim automaton. We say that a word x is in the automaton A if i · x is defined. In other words, x is in A if x is a prefix of some word recognised by A. Let w be a word to be added to the set recognised by an automaton A. The factorisation

  w = x · y,

where x is the longest prefix of w which is in A, is called the prefix-suffix decomposition of w. The word x (resp. y) is the common prefix (resp. corresponding suffix) of w. One has x = ε if either w = ε or i · a is undefined, where a is the initial letter of w. Similarly, y = ε if w itself is in A. If y ≠ ε and starts with the letter b, then i · xb is undefined.

The insertion of a word y at state p is an operation that is performed provided y = ε or p · b is undefined, where b is the initial letter of y. If y = ε, the insertion simply consists in adding state p to the set F of final states. If y ≠ ε, set y = b_1 b_2 ⋯ b_m. The insertion consists in adding new states p_1, …, p_m to Q, with the next-state function defined by p · b_1 = p_1 and p_{i−1} · b_i = p_i for i = 2, …, m. Furthermore, p_m is added to the set F of final states.

Assume that the language recognised by A is not empty, and that the word w is lexicographically greater than all words in A. Then w is not in A. So the common prefix x of w is strictly shorter than w and the corresponding suffix y is nonempty.

The incremental algorithm works as follows. At each step, a new word w that is lexicographically greater than all previous ones is inserted in the current automaton A. First, the prefix-suffix decomposition w = xy of w and the state q = i · x are computed. Then the segment starting at q of the path carrying the suffix y′ of the previously inserted word w′ is minimised by merging states with the same signature. Finally, the suffix y is inserted at state q. The algorithm is given in Algorithm 5.

Algorithm 5  DaciukEtAl(A)                           ▹ The incremental algorithm of Daciuk et al.
  for all w do                                       ▹ words are given in lexicographic order
    (x, y) ← PrefSuffDecomp(w)                       ▹ x is the longest prefix of w in A
    q ← i · x                                        ▹ q is the state reached by reading x
    MinimizeLastPath(q)                              ▹ minimise the states on this path
    AddPath(q, y)                                    ▹ add a path starting in q and carrying y

The second step deserves a more detailed description. We observe first that the word x of the prefix-suffix decomposition w = xy of w is in fact the greatest common prefix of w′ and w. Indeed, the word x is a prefix of some word recognised by A (here A is the automaton before adding w), and since w′ is the greatest word in A, the word x is a prefix of w′. Thus x is a common prefix of w′ and w. Next, if x′ is a common prefix of w′ and w, then x′ is in A because it is a prefix of w′, and consequently x′ is a prefix of x because x is the longest prefix of w in A. This shows the claim.

There are two cases for the merge. If w′ is a prefix of w, then w′ = x. In this case, there is no minimisation to be performed. If w′ is not a prefix of w, then the paths for w′ and for w share a common initial segment carrying the prefix x, from the initial state to the state q = i · x. The minimisation concerns the states on the path from q to t′ carrying the suffix y′ of the factorisation w′ = x y′ of w′. Each of the states on this path, except the state q, will never be visited again in any insertion that may follow, so they can be merged with previous states.
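A compact Python sketch of the whole sorted construction is given below. It is our own encoding of the idea (the class, function and variable names are ours, not those of [27]): states carry a finality flag and a transition map, a register holds one representative per signature, and after each insertion the dangling path of the previous word is minimised bottom-up.

    class State:
        def __init__(self):
            self.final = False
            self.trans = {}                          # letter -> State

        def key(self):
            # signature: finality plus the outgoing transitions (targets by identity)
            return (self.final,
                    tuple(sorted((a, id(q)) for a, q in self.trans.items())))

    def minimal_dfa(sorted_words):
        # Incremental construction for a lexicographically sorted list of distinct words.
        register = {}                                # signature -> representative state
        root = State()

        def replace_or_register(state):
            # minimise, bottom-up, the most recently added path below `state`
            letter = max(state.trans)                # the last letter added from `state`
            child = state.trans[letter]
            if child.trans:
                replace_or_register(child)
            k = child.key()
            if k in register:
                state.trans[letter] = register[k]    # merge with an equivalent state
            else:
                register[k] = child

        for w in sorted_words:
            # follow the longest prefix of w that is already in the automaton
            state, i = root, 0
            while i < len(w) and w[i] in state.trans:
                state = state.trans[w[i]]
                i += 1
            # the dangling path of the previously inserted word can now be minimised
            if state.trans:
                replace_or_register(state)
            # append the remaining suffix as a fresh path and mark its end as final
            for a in w[i:]:
                new_state = State()
                state.trans[a] = new_state
                state = new_state
            state.final = True

        if root.trans:
            replace_or_register(root)                # minimise the path of the last word
        return root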

Example 7.1. We consider the sequence of words (aa, aba, ba, bba, bc). The first two words give the automaton of Figure 12(a). Adding the word ba permits the merge of states 2 and 4. The resulting automaton is given in Figure 12(b). After inserting bba, there is a merge of states 6 and 2, see Figure 13(a).


Figure 12. (a) The automaton for aa, aba. (b) The automaton for aa, aba, ba. Here state 4 has been merged with state 2.


Figure 13. (a) The automaton for aa, aba, ba, bba. (b) The automaton for aa, aba, ba, bba, bc. After inserting bc, states 8 and 2 are merged, and then states 7 and 2.



8. Dynamic minimisation

Dynamic minimisation is the process of maintaining an automaton minimal when insertions or deletions are performed. A solution for adding and for removing a word was proposed by Carrasco and Forcada [19]. It consists in an adaptation of the usual textbook constructions for intersection and complement to the special case where one of the languages is a single word. It appears that the finiteness of the language L plays no special role, so we assume here that it is regular, not necessarily finite. The construction for adding a word has also been proposed in [53], and in [27] for acyclic automata. An extension to general automata, and several other issues, are discussed in [26]. Here we only consider (for lack of space) the deletion of a word from the set recognised by an automaton, and the minimisation of the new automaton.

Let A = (Q, i, T) be the minimal automaton recognising a language L over the alphabet A, and let w be a word in L. Let A_w denote the minimal automaton recognising the complement A* \ {w}. This automaton has n + 2 states, where n = |w|. Among them, there are n + 1 states that are identified with the set P of prefixes of w. The last state is a sink state denoted by ⊥. An example is given in Figure 14.


Figure 14. The minimal automaton recognising the complement of the word abab. Only the state abab is not final.

The language L \ {w} is equal to L ∩ (A* \ {w}), so it is recognised by the trimmed part B of the product automaton A × A_w. Its initial state is (i, ε), and its states are of three kinds.

• Intact states: these are states of the form (q, ⊥) with q ∈ Q. They are called so because the language recognised from (q, ⊥) in B is the same as the language recognised from q in A: L_B(q, ⊥) = L_A(q).

• Cloned states: these are accessible states (q, x) with x ∈ P, so x is a prefix of w. Since these states are assumed to be accessible, one has q = i · x in A, and there is one such state for each prefix. The next-state function on these states is defined by

  (q, x) · a = (q · a, xa)  if xa ∈ P,  and  (q, x) · a = (q · a, ⊥)  otherwise.

Observe that (i · w, ⊥) is an intact state because w is assumed to be recognised by A.

• Useless states: these are all states that are removed when trimming the product automaton.

Here trimming consists in removing the state (i, ⊥) if it is no longer accessible, together with the states reachable only from this state. For this, one follows the path defined by w and starting in (i, ⊥), and removes its states until one reaches a confluent state (one that has at least two incoming edges). The automaton obtained is minimal. The whole construction finally consists in keeping the initial automaton, renaming each state q as (q, ⊥), adding a cloned path, and removing the state (i, ⊥) if it is no longer accessible, together with the states reachable only from it. The algorithm is given below.

Algorithm 6  RemoveIncremental(w, A)       ▹ Removing the word w from the language recognised by A
  A′ ← AddClonedPath(w, A)                 ▹ add a new path for w in A
  Trim(A′)                                 ▹ return trimmed automaton

AddClonedPath(a_1 ⋯ a_n, A)
  p_0 ← Initial(A);  q_0 ← Clone(p_0)      ▹ add a new initial state q_0
  for i = 1 to n do
    p_i ← p_{i−1} · a_i;  q_i ← Clone(p_i) ▹ q_i inherits the transitions of p_i
    q_{i−1} · a_i ← q_i                    ▹ this edge is redirected
  SetFinal(q_n, false)
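For illustration, here is a Python sketch that achieves the same effect as Algorithm 6 by building the accessible part of the product with A_w directly, following the construction described above (the second component of a state is the prefix of w read so far, or ⊥ once the input has diverged from w). The encoding and names are ours; in degenerate cases a final trimming step, not shown, removes states from which no final state is reachable.

    def remove_word(initial, delta, final, alphabet, w):
        # delta: dict (state, letter) -> state of the original trim DFA; w is assumed accepted.
        SINK = None                                  # plays the role of the sink ⊥ of A_w
        prefixes = {w[:i] for i in range(len(w) + 1)}

        def advance(x, a):
            # next value of the prefix component
            return x + a if x is not SINK and x + a in prefixes else SINK

        start = (initial, "")
        new_delta, new_final = {}, set()
        todo, seen = [start], {start}
        while todo:
            q, x = todo.pop()
            if q in final and x != w:                # accept every word of L(A) except w itself
                new_final.add((q, x))
            for a in alphabet:
                if (q, a) in delta:
                    target = (delta[(q, a)], advance(x, a))
                    new_delta[((q, x), a)] = target
                    if target not in seen:
                        seen.add(target)
                        todo.append(target)
        return start, new_delta, new_final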

Of course, one may also use the textbook construction directly, that is, without taking advantage of the automaton computed previously. For this, one starts at the new initial state (i, ε) and builds only the accessible part of the product automaton. This method has complexity O(n + |w|), where n is the number of states of the initial automaton, whereas the previous method has complexity only O(|w|).

Example 8.1. The automaton given in Figure 15 recognises the language L = (ab)⁺ ∪ {abc, acb}. The direct product with the automaton of Figure 14 is shown in Figure 16. Observe that there are intact states that are not accessible from the new initial state (0, ε). The minimal automaton is shown in Figure 17.


Figure 15. The minimal automaton recognising the language L = (ab)⁺ ∪ {abc, acb}


Figure 16. The automaton recognising the language L = ((ab)⁺ ∪ {abc, acb}) \ {abab}. There are still unreachable states (shown in gray).


Figure 17. The minimal automaton for the language L = ((ab)⁺ ∪ {abc, acb}) \ {abab}

9. Extensions and special cases

In this section, we consider extensions of the minimisation problem to other classes of automata. The most important problem is to find a minimal nondeterministic automaton recognising a given regular language. Other problems, not considered here, concern sets of infinite words and the minimisation of their accepting devices, and the use of other kinds of automata known to be equivalent with respect to their accepting capability, such as two-way automata or alternating automata; see, for instance, [50].

9.1. Special automata. Here we briefly mention special cases where minimisation plays a role. It is well known that string matching is closely related to the construction of particular automata. If w is a nonempty word over an alphabet A, then searching for all occurrences of w as a factor in a text t is equivalent to computing all prefixes of t ending in w, and hence to determining all prefixes of t that are in the regular language A*w. The minimal automaton recognising A*w has n + 1 states, where n = |w|, and can


be constructed in linear time. The automaton has many interesting properties. For instance, there are at most 2n edges when one does not count edges ending in the initial state. This is due to Imre Simon; see also [36]. For a general exposition, see, e.g., [24].

The extension of string matching to a finite set X of patterns is due to Aho and Corasick. The associated automaton is called the pattern matching machine; it can be computed in time linear in the sum of the lengths of the words in X. Again, see [24]. However, this automaton is not minimal in general. Indeed, its number of states is the number of distinct prefixes of words in X, and this may be greater than the number of states of the minimal automaton (consider, for example, the set X = {ab, bb} over the alphabet A = {a, b}). There are some investigations on the complexity of minimising Aho–Corasick automata; see [2].

Another famous minimal automaton is the suffix automaton. This is the minimal automaton recognising the set of suffixes of a given word. The number of states of the suffix automaton of a word of length n is less than 2n, and the number of its edges is less than 3n. Algorithms for constructing suffix automata in linear time have been given in [23] and [17]; again, see [24] for details.

9.2. Nondeterministic automata. A nondeterministic automaton is minimal if it has the minimal number of states among all automata recognising the same language. Minimal nondeterministic automata are not unique. In Figure 18, we give two non-isomorphic nondeterministic automata which are both smaller than the minimal deterministic automaton recognising the same language. This language is a(b* ∪ c*)ab⁺. The example is derived from an automaton given in [4].


Figure 18. Two non-isomorphic nondeterministic automata recognising the set a(b* ∪ c*)ab⁺

One might ask whether there are simple conditions on the automata or on the language that ensure that the minimal nondeterministic automaton is unique. For instance, the automata of Figure 19 both recognise the same language, but the second has a particular property that we describe now. The uniqueness of the minimal automaton in the deterministic case is related to the fact that the futures of the states of such an automaton are pairwise distinct, and that each future is some left quotient of the language: for each state q, the language L_q(A) is equal to a set y⁻¹L, for some word y.


Figure 19. Two nondeterministic automata recognising the set of words ending with the letter a

This characterisation has been the starting point for investigating similar properties of nondeterministic automata. Let us call a (nondeterministic) automaton a residual automaton if the futures of its states are left quotients of the language. It has been shown in [31] that, among all residual automata recognising a given language, there is a unique residual automaton having a minimal number of states; moreover, this automaton is characterised by the fact that the set of its futures is the set of prime left quotients of the language, a left quotient being prime if it is not the union of other nonempty left quotients. For instance, the automaton on the right of Figure 19 has this property, since L_0 = {a, b}*a and L_1 = a⁻¹L_0 = ε ∪ {a, b}*a, and there are no other nonempty left quotients. The automaton on the left of Figure 19 is not residual since the future of state 1 is not a left quotient.

The problem of converting a given nondeterministic automaton into a minimal nondeterministic automaton is NP-hard, even over a unary alphabet [39]. The problem is even PSPACE-hard over larger alphabets because checking whether a nondeterministic automaton accepts all words is PSPACE-hard. NP-hardness also applies to unambiguous automata [47]. In [15], these results have been extended as follows. The authors define a class δNFA of automata that are unambiguous, have at most two computations for each string, and have at most one state with two outgoing transitions carrying the same letter. They show that minimisation is NP-hard for all classes of finite automata that include δNFA, and they show that these hardness results can also be adapted to the setting of unambiguous automata that can nondeterministically choose between two start states, but are deterministic everywhere else. Even approximating the minimisation of nondeterministic automata is intractable, see [34].

There is an algebraic framework that allows one to represent and to compute all automata recognising a given regular language. The state of this theory, which goes back to Kameda and Weiner [42], has been described by Lombardy and Sakarovitch in a recent survey paper [46].

There is a well-known exponential blow-up from nondeterministic automata to deterministic ones. The usual textbook example, already given in the first section (the automaton on the top in Figure 1), shows that this blow-up holds also for unambiguous automata, even if there is only one edge that causes the nondeterminism. It has been shown that any value of blow-up can be obtained, in the following sense [40]: for all integers n, N with n ≤ N ≤ 2ⁿ, there exists a minimal nondeterministic automaton with n states over a four-letter alphabet whose equivalent minimal


deterministic automaton has exactly N states. This was improved to ternary alphabets [41].

Acknowledgements. We had several helpful discussions with Marie-Pierre Béal, Julien David, Sylvain Lombardy, Wim Martens, Cyril Nicaud, Sébastien Paumier, Jean-Éric Pin and Jacques Sakarovitch. We thank Narad Rampersad and Jeffrey Shallit for their careful reading of the text.

References [1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of computer algorithms. Addison-Wesley Series in Computer Science and Information Processing. AddisonWesley, Reading, MA, 1974. MR 0413592 Zbl 0326.68005 q.v. 345 [2] O. AitMous, F. Bassino, and C. Nicaud, Building the minimal automaton of A X in linear time, when X is of bounded cardinality. In Combinatorial pattern matching (A. Amir and L. Parida, eds.). Lecture Notes in Computer Science, 6129. Springer, Berlin, 2010, 275–287. MR 2684962 Zbl 1286.68271 q.v. 368 [3] J. Almeida and M. Zeitoun, Description and analysis of a bottom-up DFA minimization algorithm. Inform. Process. Lett. 107 (2008), no. 2, 52–59. MR 2422199 Zbl 1186.68242 q.v. 338, 358 [4] A. Arnold, A. Dicky, and M. Nivat, A note about minimal non-deterministic automata. Bull. European Assoc. Theor. Comput. Sci. 47 (1992), 166–169. Zbl 0751.68038 q.v. 368 [5] M. Baclet and C. Pagetti, Around Hopcroft’s algorithm. In Implementation and application of automata. (O. H. Ibarra and H. Yen, eds.). Papers from the 11th International Conference (CIAA 2006) held at National Taiwan University, Taipei, August 21–23, 2006. Lecture Notes in Computer Science, 4094. Springer, Berlin, 2006, 114–125. MR 2296451 Zbl 1160.68399 q.v. 349 [6] F. Bassino, J. David, and C. Nicaud, On the average complexity of Moore’s state minimization algorithm. In STACS 2009: 26 th International Symposium on Theoretical Aspects of Computer Science (S. Albers and J.-Y. Marion, eds.). LIPIcs. Leibniz International Proceedings in Informatics, 3. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2009, 123–134. MR 2870646 Zbl 1236.68162 q.v. 338, 344 [7] F. Bassino, J. David, and C. Nicaud, Average case analysis of Moore’s state minimization algorithm. Algorithmica 63 (2012), no. 1–2, 509–531. MR 2886085 Zbl 1291.68176 q.v. 344 [8] M.-P. Béal and M. Crochemore, Minimizing local automata. In 2007 IEEE International Symposium on Information Theory (G. Caire and M. Fossorier, eds.). Held in Nice, June 24–29, 2007. IEEE Press, Los Alamitos, CA, 2007, 1376–1380. IEEEXplore 4557131 q.v. 338, 355 [9] M.-P. Béal and M. Crochemore, Minimizing incomplete automata. Preprint, 2008. hal-00620274 q.v. 337, 349 [10] D. Beauquier, J. Berstel, and P. Chrétienne, Éléments d’algorithmique. Masson, Paris, 1992. q.v. 345 [11] J. Berstel, L. Boasson, and O. Carton, Continuant polynomials and worst-case behavior of Hopcroft’s minimization algorithm. Theoret. Comput. Sci. 410 (2009), no. 30–32, 2811–2822. MR 2543335 Zbl 1173.68029 q.v. 354


[12] J. Berstel, L. Boasson, O. Carton, and I. Fagnot, Sturmian trees. Theory Comput. Syst. 46 (2010), no. 3, 443–478. MR 2592178 Zbl 1209.68394 q.v. 353 [13] J. Berstel and O. Carton, On the complexity of Hopcroft’s state minimization algorithm. In Implementation and application of automata (M. Domaratzki, A. Okhotin, K. Salomaa, and S. Yu, eds.). Papers from the 9th International Conference (CIAA 2004) held at Queen’s University, Kingston, ON, July 22–24, 2004. Lecture Notes in Computer Science, 3317. Springer, Berlin, 2005, 35–44. MR 2143392 Zbl 1115.68417 q.v. 349, 354 [14] J. Berstel and D. Perrin, Algorithms on words. In Algebraic combinatorics on words (M. Lothaire, ed.). Encyclopedia of Mathematics and its Applications, 90. Cambridge University Press, Cambridge, 2002, 1–100. MR 1905123 Zbl 1001.68093 q.v. 348 [15] H. Björklund and W. Martens, The tractability frontier for NFA minimization. J. Comput. System Sci. 78 (2012), no. 1, 198–210. MR 2896357 Zbl 1282.68119 q.v. 369 [16] N. Blum, A O.n log n/ implementation of the standard method for minimizing n-state finite automata. Inform. Process. Lett. 57 (1996), no. 2, 65–69. MR 1374702 Zbl 0875.68649 q.v. 345 [17] A. Blumer, J. A. Blumer, D. Haussler, A. Ehrenfeucht, M. T. Chen, and J. I. Seiferas, The smallest automaton recognizing the subwords of a text. Theoret. Comput. Sci. 40 (1985), no. 1, 31–55. Special issue: Eleventh international colloquium on automata, languages and programming (Antwerp, 1984). MR 0828515 Zbl 0574.68070 q.v. 368 [18] J. A. Brzozowski, Canonical regular expressions and minimal state graphs for definite events. In Proceedings of the Symposium on Mathematical Theory of Automata. New York, April 24–26, 1962. Microwave Research Institute Symposia Series, XII. Polytechnic Press of Polytechnic Institute of Brooklyn, Brooklyn, N.Y., 1963, 529–561. MR 0175719 Zbl 0116.33605 q.v. 341 [19] R. C. Carrasco and M. L. Forcada, Incremental construction and maintenance of minimal finite-state automata. Comput. Linguist. 28 (2002), no. 2, 207–216. MR 1915835 Zbl 1232.68080 q.v. 365 [20] G. Castiglione, A. Restivo, and M. Sciortino, Circular words and automata minimization. Words 2007 (P. Arnoux, N. Bédaride, and J. Cassaigne, eds.). 6th International Conference on Words. CIRM, Marseille, September 17–21, 2007. Institut de Mathématiques de Luminy, Marseille, 2007, 79–89. q.v. 354 [21] G. Castiglione, A. Restivo, and M. Sciortino, Hopcroft’s algorithm and cyclic automata. In Language and automata theory and applications. (C. Martín-Vide, F. Otto, and H. Fernau, eds.). Revised papers from the 2nd International Conference (LATA 2008) held in Tarragona, March 13–19, 2008. Lecture Notes in Computer Science, 5196. Springer, Berlin, 2008, 172–183. MR 2540322 Zbl 1163.68021 q.v. 354 [22] G. Castiglione, A. Restivo, and M. Sciortino, Circular Sturmian words and Hopcroft’s algorithm. Theoret. Comput. Sci. 410 (2009), no. 43, 4372–4381. MR 2553588 Zbl 1187.68360 q.v. 354 [23] M. Crochemore, Transducers and repetitions. Theoret. Comput. Sci. 45 (1986), no. 1, 63–86. MR 0865967 Zbl 0615.68053 q.v. 368 [24] M. Crochemore, C. Hancart, and T. Lecroq, Algorithms on strings. Translated from the 2001 French original. Cambridge University Press, Cambridge, 2007. MR 2355493 Zbl 1137.68060 q.v. 360, 368 [25] J. Daciuk, Comparison of construction algorithms for minimal, acyclic, deterministic finite-state automata from sets of strings. In Implementation and application of automata (J.-M. Champarnaud and D. Maurel, eds.). 
Papers from the 7th International Conference

372

[26]

[27]

[28]

[29] [30] [31] [32]

[33] [34] [35] [36] [37]

[38]

[39] [40]

Jean Berstel, Luc Boasson, Olivier Carton, and Isabelle Fagnot (CIAA 2002) held at the Université François Rabelais of Tours, Tours, July 3–5, 2002. Lecture Notes in Computer Science, 2608. Springer, Berlin, 2003, 255–261. MR 2047731 Zbl 1033.68575 q.v. 358 J. Daciuk, Comments on “Incremental construction and maintenance of minimal finite-state automata” by Rafael C. Carrasco and Mikel L. Forcada. Comput. Linguist. 30 (2004), no. 2, 227–235. MR 2091102 Zbl 1234.68207 q.v. 365 J. Daciuk, S. Mihov, B. W. Watson, and R. E. Watson, Incremental construction of minimal acyclic finite-state automata. Comput. Linguist. 26 (2000), no. 1, 3–16. MR 1783750 Zbl 1232.68081 q.v. 338, 356, 362, 365 L. D’Antoni and M. Veanes, Minimization of symbolic automata. In Proceedings of the 41 st ACM SIGPLAN-SIGACT symposium on principles of programming languages (S. Jagannathan and P. Sewell, eds.). POPL ’14, San Diego, CA, USA, January 22–24, 2014. Association for Computing Machinery, New York, N..Y, 541–554. Zbl 1284.68347 q.v. 349 J. David, Génération aléatoire d’automates et analyse d’algorithmes de minimisation. Thèse de doctorat. University Paris-Est, Paris, 2010. q.v. 350, 351 J. David, Average complexity of Moore’s and Hopcroft’s algorithms. Theoret. Comput. Sci. 417 (2012), 50–65. MR 2885889 Zbl 1235.68100 q.v. 344, 351 F. Denis, A. Lemay, and A. Terlutte, Residual finite state automata. Fund. Inform. 51 (2002), no. 4, 339–368. MR 1999650 Zbl 1011.68048 q.v. 369 S. De Felice and C. Nicaud, Brzozowski algorithm is generically super-polynomial for deterministic automata. In Developments in language theory (M.-P. Béal and O. Carton, eds.). Proceedings of the 17th International Conference (DLT 2013) held at Université Paris-Est, Marne-la-Vallée, June 18–21, 2013. Lecture Notes in Computer Science, 7907. Springer, Berlin, 2013, 179–190. MR 3097327 Zbl 1381.68113 q.v. 341 S. De Felice and C. Nicaud, Average case analysis of Brzozowski’s algorithm. Internat. J. Found. Comput. Sci. 27 (2016), no. 2, 109–126. MR 3493541 Zbl 1338.68143 q.v. 342 G. Gramlich and G. Schnitger, Minimizing NFA’s and regular expressions. J. Comput. System Sci. 73 (2007), no. 6, 908–923. MR 2332724 Zbl 1152.68459 q.v. 369 D. Gries, Describing an algorithm by Hopcroft. Acta Informat. 2 (1973), 97–109. MR 0341936 Zbl 0242.94042 q.v. 345 C. Hancart, On Simon’s string searching algorithm. Inform. Process. Lett. 47 (1993), no. 2, 95–99. MR 1234561 Zbl 0781.68068 q.v. 368 J. E. Hopcroft, An n log n algorithm for minimizing states in a finite automaton. In Theory of machines and computations (Z. Kohavi and A. Paz, eds.). Proceedings of an International Symposium on the Theory of Machines and Computations held at Technion in Haifa, Israel on August 16–19, 1971. Academic Press, New York and London, 1971, 189–196. MR 0403320 Zbl 0293.94022 q.v. 337, 341, 345, 348 J. E. Hopcroft and J. D. Ullman, Introduction to automata theory, languages, and computation. Addison-Wesley Series in Computer Science. Addison-Wesley, Reading, MA, 1979. MR 0645539 Zbl 0426.68001 q.v. 337 T. Jiang and B. Ravikumar, Minimal NFA problems are hard. SIAM J. Comput. 22 (1993), no. 6, 1117–1141. MR 124718 Zbl 0799.68079 q.v. 369 J. Jirásek, G. Jirásková, and A. Szabari, Deterministic blow-ups of minimal nondeterministic finite automata over a fixed alphabet. Internat. J. Found. Comput. Sci. 19 (2008), no. 3, 617–631. MR 2417959 Zbl 1155.68041 q.v. 369

10. Minimisation of automata

373

[41] G. Jirásková, Magic numbers and ternary alphabet. Internat. J. Found. Comput. Sci. 22 (2011), no. 2, 331–344. MR 2772813 Zbl 1222.68109 q.v. 370 [42] T. Kameda and P. Weiner, On the state minimization of nondeterministic finite automata. IEEE Trans. Computers C-19 (1970), no. 7, 617–627. MR 0398705 Zbl 0195.02701 IEEEXplore 1671587 q.v. 369 [43] I. Kapovich, A. Miasnikov, P. Schupp, and V. Shpilrain, Generic-case complexity, decision problems in group theory, and random walks. J. Algebra 264 (2003), no. 2, 665–694. MR 1981427 Zbl 1041.20021 q.v. 342 [44] T. Knuutila, Re-describing an algorithm by Hopcroft. Theoret. Comput. Sci. 250 (2001), no. 1–2, 333–363. MR 1795249 Zbl 0952.68077 q.v. 345, 348, 349 [45] S. L. Krivol, Algorithms for minimization of finite acyclic automata and pattern matching in terms. Kibernetika 3 (1991), 11–16. In Russian. English translation, Cybernetics 27 (1991), 324–331, 1991. Zbl 0800.68445 q.v. 357 [46] S. Lombardy and J. Sakarovitch, The universal automaton. In Logic and automata (J. Flum„ E. Grädel, and T. Wilke, eds.). History and perspectives. Texts in Logic and Games, 2. Amsterdam University Press, Amsterdam, 2008, 457–504. MR 2508751 Zbl 1217.68133 q.v. 369 [47] A. Malcher, Minimizing finite automata is computationally hard. Theoret. Comput. Sci. 327 (2004), no. 3, 375–390. MR 2098313 Zbl 1071.68047 q.v. 369 [48] E. F. Moore, Gedanken experiments on sequential machines. In Automata studies (C. E. Shannon and J. McCarthy, eds.). Annals of Mathematics Studies, 34. Princeton University Press, Princeton, N.Y., 1956, 129–153. MR 0078059 q.v. 338, 343 [49] A. Pa˘un, M. Pa˘un, and A. Rodríguez-Patón, On Hopcroft’s minimization technique for DFA and DFCA. Theoret. Comput. Sci. 410 (2009), no. 24–25, 2424–2430. MR 2522446 Zbl 1168.68028 q.v. 349 [50] D. Perrin and J.-É. Pin, Infinite words. Automata, semigroups, logic and games. Pure and Applied Mathematics (Amsterdam), 141. Elsevier/Academic Press, 2004. Zbl 1094.68052 q.v. 367 [51] D. Revuz, Minimisation of acyclic deterministic automata in linear time. Theoret. Comput. Sci. 92 (1992), no. 1, 181–189. Combinatorial Pattern Matching School (Paris, 1990). MR 1143138 Zbl 0759.68066 q.v. 338, 357 [52] J. Sakarovitch, Éléments de théorie des automates. Vuibert Informatique, Paris, 2003. English translation, Elements of automata theory. Cambridge University Press, 2009. Translated by R. Thomas. Cambridge University Press, Cambridge, 2009. MR 2567276 Zbl 1188.68177 (English ed.) Zbl 1178.68002 (French ed.) q.v. 341 [53] K. N. Sgarbas, N. D. Fakotakis, and G. K. Kokkinakis, Optimal insertion in deterministic DAWGs. Theoret. Comput. Sci. 301 (2003), no. 1–3, 103–117. MR 1975222 Zbl 1022.68070 q.v. 365 [54] A. Valmari, Fast brief practical DFA minimization. Inform. Process. Lett. 112 (2012), no. 6, 213–217. MR 2876204 Zbl 1242.68149 q.v. 349 [55] A. Valmari and P. Lehtinen, Efficient minimization of DFAs with partial transition functions. In STACS 2008: 25 th International Symposium on Theoretical Aspects of Computer Science (S. Albers and P. Weil, eds.). LIPIcs. Leibniz International Proceedings in Informatics, 1. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2008, 645–656. MR 2873773 Zbl 1259.68115 q.v. 337, 349 [56] B. W. Watson, A new algorithm for the construction of minimal acyclic DFAs. Sci. Comput. Programming 48 (2003), no. 2–3, 81–97. MR 2002350 Zbl 1059.68071 q.v. 363

Chapter 11

Learning algorithms Henrik Björklund, Johanna Björklund, and Wim Martens

Contents 1. 2. 3. 4. 5. 6. 7. 8.

Introduction . . . . . . . . . . . . . . . . Preliminaries . . . . . . . . . . . . . . . Classical results . . . . . . . . . . . . . . Learning from given data . . . . . . . . . Learning non-deterministic finite automata Learning regular tree languages . . . . . . PAC learning . . . . . . . . . . . . . . . Applications and further material . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

375 376 379 387 392 396 399 401

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

403

1. Introduction What is a learning algorithm? How can a computer automatically learn things and somehow become smarter as it receives more information? According to Valiant, “a program for performing a task has been acquired by learning if it has been acquired by any means other than explicit programming” [62]. This view of learning is very broad, as is the field of learning algorithms. Learning algorithms, or learners, can be said to specialise in generalising from instances to concepts. At the heart of learning lies the ability to derive from a number of instances the common concept that they exemplify. The concept can, in principle, be almost anything. If it is a logical formula, the instances may be structures that satisfy it. If it is a language, the instances may be words that belong to it. Grammatical inference is the subfield of algorithmic learning where the concept to be learned is a formal language. Since languages can be infinite, we are interested in learners that output finite representations of the languages, such as automata or grammars. Even grammatical inference is, however, a field too large to be covered here, and we therefore focus on what we believe to be the most fundamental aspect of it, namely the learning of regular languages, represented by finite automata. After some preliminary definitions, we present the most well-known classical results on the learning of deterministic finite automata in § 3. These results have since been extended to the learning of, e.g., nondeterministic finite automata, and regular tree automata. We discuss these extensions in §§ 4–6. The concept of probably

376

Henrik Björklund, Johanna Björklund, and Wim Martens

approximately correct (PAC) learning is important in general learning theory, and we discuss its implications for learning finite automata in § 7. Finally, in § 8, we present some examples of problems closely related to grammatical inference and more applied settings in which grammatical inference has been successfully used, such as natural language processing, XML databases, and formal verification.

2. Preliminaries We denote the set of Booleans by B D ¹0; 1º where 1 represents true and 0 represents false. We use N for the natural numbers and NC for the strictly positive natural numbers. For k 2 N, we denote the set ¹1; : : : ; kº by Œk, with Œ0 D ;. Given a partial function f W D ! D 0 , the domain of f is the subset of D on which f is defined. We write Rng.f / for the range of f , that is, the set ¹d 0 j .d 0 2 D 0 / and there exists d 2 D such that f .d / D d 0 º. In the following, A always denotes a finite alphabet. A set S of words over alphabet A is prefix-closed if w  x 2 S implies that w 2 S , for all w; x 2 A . The set is suffixclosed if w  x 2 S implies that x 2 S . We often abbreviate (nondeterministic) finite automaton and deterministic finite automaton by NFA and DFA, respectively. Given an NFA A D .Q; I; E; T / and a set S  Q, we write E.S / for the set of successors of states in S with respect to E , that is, E.S / D ¹p j there exist q 2 S and a 2 A such that .q; a; p/ 2 Eº. Similarly, for a 2 A, we write E.S; a/ for the set ¹p j there exists q 2 S with .p; a; q/ 2 Eº. For a word w , we denote by E  .w/ the set of states that can be reached by reading w . That is, E  ."/ D I and E  .wa/ D E.E  .w/; a/. The language of A is denoted L.A/. By A.w/ we denote the Boolean value that is true if and only if w 2 L.A/. Given a set S D ¹w1 ; : : : ; wn º of words, the prefix tree acceptor, or PTA of S is the DFA A D .Q; I; E; T / such that L.A/ D S and Q is the set of all prefixes of words from S , I D ", T D S , and E D .w; a; wa/ for each state wa in Q. 2.1. A warm-up to learning. At the heart of many learning algorithms for regular languages lies the Myhill–Nerode theorem. We therefore remind the reader of its statement. Definition 2.1. Let X be a language over alphabet A. Two words w1 ; w2 in A are equivalent with respect to X , written w1 X w2 , if, for every word x 2 A , w1  x 2 X if and only if w2  x 2 X . For a word w 2 A , we write ŒwX for the equivalence class of w in the equivalence relation X , that is, for the set of words ¹y 2 A j y X wº. We say that w is a representative of class ŒwX . Theorem 2.1 (Myhill–Nerode). A language X is regular if and only if X has finite index, that is, has a finite number of equivalence classes. Furthermore, if X is regular, then each state of the minimal DFA for X corresponds to an equivalence class of X and vice versa. In particular, the index of X is exactly the number of states of the minimal DFA for X .


We provide an example as a warm-up for the reader to the general philosophy behind many algorithms for learning regular languages.

Example 2.1. Assume that we are given the following information about an unknown word language Z over A = {a, b}.
• The equivalence relation ≡_Z has four classes: S₁, S₂, S₃, and S₄.
• The words ε, a, and baa belong to S₁.
• The words b and bab belong to S₂.
• The words ba and bba belong to S₃.
• The words bb and bbb belong to S₄.
• (S₃ ∪ S₄) = Z.

Due to Theorem 2.1, we know that we are looking for a regular word language. As we shall see, this information is enough to recreate the minimal DFA A_Z = (Q, I, E, T) for Z. Consider the set S = {ε, b, ba, bb} of the shortest representatives we know for each equivalence class of ≡_Z. Notice that, for every w ∈ S and every c ∈ A, we know to which equivalence class w·c belongs. Furthermore, since w ≡_Z x implies that w·c ≡_Z x·c holds for all w, x ∈ A* and all c ∈ A, we know everything we need to know about the relationships of the equivalence classes to each other. If we let Q = {S₁, S₂, S₃, S₄} as in the Myhill–Nerode theorem, we know, for example, that since ba ∈ S₃ and bab ∈ S₂, there should be a b-labelled edge from S₃ to S₂. Furthermore, since ε ∈ S₁ we have I = {S₁}, and since (S₃ ∪ S₄) = Z we have T = {S₃, S₄}. The full automaton A_Z is depicted in Figure 1. The language Z can also be characterised through the regular expression (a + b)* b (a + b).

b b

b "

a

b a

b

ba

bb a

Figure 1. The automaton AZ from Example 2.1. Each state is labelled by its shortest representative.
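The reconstruction in Example 2.1 is mechanical once the class of every word w·c is known. The following Python sketch is ours, not part of the chapter; the helper names and the dictionaries encoding the given information are hypothetical, chosen to mirror the example.

```python
# A minimal sketch (not from the chapter): rebuilding the DFA of Example 2.1
# from known equivalence-class representatives and class memberships.

CLASSES = {"": "S1", "b": "S2", "ba": "S3", "bb": "S4"}   # representative -> class
MEMBERS = {                                               # known words per class
    "S1": {"", "a", "baa"}, "S2": {"b", "bab"},
    "S3": {"ba", "bba"},    "S4": {"bb", "bbb"},
}
ACCEPTING = {"S3", "S4"}          # because S3 ∪ S4 = Z
ALPHABET = ["a", "b"]

def class_of(word):
    """Return the class name of a word we have been told about."""
    for name, words in MEMBERS.items():
        if word in words:
            return name
    raise ValueError(f"class of {word!r} unknown")

# Build the transitions: from each representative w, letter c leads to the
# class of w·c (well defined because ≡_Z is a right congruence).
edges = {}
for rep, state in CLASSES.items():
    for c in ALPHABET:
        edges[(state, c)] = class_of(rep + c)

print("initial:", CLASSES[""], "accepting:", ACCEPTING)
for (state, c), target in sorted(edges.items()):
    print(f"{state} --{c}--> {target}")
```

Running the sketch prints exactly the transitions of the automaton in Figure 1.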

2.2. Observation tables for word languages. Many learning algorithms use so-called observation tables to represent the information collected about an unknown language X. In this section, we recall the observation tables for words from Angluin [5] and show how to generalise them for trees in § 6. An observation table for words is a tuple (S_pre, S_suff, Obs), where
• S_pre is a nonempty, finite, prefix-closed set of words;
• S_suff is a nonempty, finite, suffix-closed set of words; and
• Obs is a finite function from (S_pre ∪ S_pre·A)·S_suff to {0, 1}.
Here, Obs records what is known about X, that is, Obs(w) = 1 if and only if w ∈ X.


An observation table can be organised in rows and columns. The rows are indexed by elements from S_pre ∪ S_pre·A and the columns are indexed by elements from S_suff. The entry for row x and column y is then equal to Obs(x·y). We sometimes denote this entry by Obs(x, y). If w ∈ S_pre ∪ S_pre·A, then we denote by row_w the finite function from S_suff to {0, 1} with row_w(x) = Obs(w·x). An observation table is closed when for each word w in S_pre·A there exists a word x in S_pre such that row_w = row_x. It is consistent if, whenever w and x are elements of S_pre with row_w = row_x, then, for all a ∈ A, we have that row_{wa} = row_{xa}.

It is useful to think of the elements of S_pre as potential equivalence classes of ≡_X or, equivalently, as potential states of a DFA for X. If we have w ∈ S_pre and w·a ∈ S_pre, we know that there should be an a-labelled edge from the state represented by w to the state represented by w·a. But what if w·a is not an element of S_pre? The elements of (S_pre·A) \ S_pre can then be thought of as representing the extra transitions needed, in the following sense. If w ∈ S_pre, w·a ∈ (S_pre·A) \ S_pre, and row_{wa} = row_x for some x ∈ S_pre, there should be an a-labelled edge from the state represented by w to the state represented by x. With these correspondences in mind, we are better prepared to understand the significance of closedness and consistency. If the observation table is not closed, then there are transitions leading to unknown states. If, on the other hand, the observation table is not consistent, then the corresponding automaton is not deterministic.

Formally, the finite automaton A_Obs = (Q, I, E, T) associated to a closed, consistent observation table (S_pre, S_suff, Obs) is defined as follows:

  Q = {row_w | w ∈ S_pre},
  I = {row_ε},
  E = {(row_w, a, row_{wa}) | w ∈ S_pre and a ∈ A},
  T = {row_w | w ∈ S_pre and Obs(w) = 1}.

We argue that A_Obs is well defined. It is clear that I is well defined since S_pre is a nonempty prefix-closed set and therefore contains ε. To prove that T is well defined, let w₁ and w₂ be two words in S_pre such that row_{w₁} = row_{w₂}. Since S_suff is nonempty and suffix-closed, S_suff always contains ε. Therefore, we have that Obs(w₁) = Obs(w₁·ε) = row_{w₁}(ε) = row_{w₂}(ε) = Obs(w₂·ε) = Obs(w₂), and therefore T is well defined. Finally, we show that E is well defined. To this end, assume that w₁ and w₂ are two words in S_pre such that row_{w₁} = row_{w₂}. Since the observation table (S_pre, S_suff, Obs) is consistent, we have that, for each a ∈ A, row_{w₁a} = row_{w₂a}. Furthermore, since (S_pre, S_suff, Obs) is closed, row_{w₁a} and row_{w₂a} are equal to row_w for some word w in S_pre. This proves that A_Obs is well defined.

The automaton A_Obs = (Q, I, E, T) associated to the table (S_pre, S_suff, Obs) is also consistent with Obs in the following sense: for every w₁ ∈ S_pre ∪ S_pre·A and w₂ ∈ S_suff, we have that E*(w₁·w₂) is in T if and only if Obs(w₁·w₂) = 1. Furthermore, we note that any other DFA consistent with Obs but inequivalent to A_Obs must have more states than A_Obs (Theorem 1 in [5]).
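To make these definitions concrete, here is a small Python sketch of an observation table with the closedness and consistency checks and the extraction of A_Obs. The class and helper names are ours, not Angluin's; membership information is supplied by an arbitrary callable standing in for the unknown language X.

```python
# A minimal sketch (our own naming, not from [5]) of an observation table
# (S_pre, S_suff, Obs) over a fixed alphabet, with membership given by a callable.

class ObservationTable:
    def __init__(self, alphabet, member):
        self.alphabet = list(alphabet)
        self.member = member              # member(w) -> bool, stands in for "w in X"
        self.pre = {""}                   # S_pre: prefix-closed
        self.suff = {""}                  # S_suff: suffix-closed
        self.obs = {}                     # Obs: word -> 0/1, filled lazily
        self.fill()

    def fill(self):
        for w in self.rows():
            for x in self.suff:
                self.obs.setdefault(w + x, int(self.member(w + x)))

    def rows(self):
        return self.pre | {w + a for w in self.pre for a in self.alphabet}

    def row(self, w):
        # row_w as a tuple over a fixed ordering of S_suff
        return tuple(self.obs[w + x] for x in sorted(self.suff))

    def unclosed_witness(self):
        upper = {self.row(w) for w in self.pre}
        for w in self.pre:
            for a in self.alphabet:
                if self.row(w + a) not in upper:
                    return w + a          # this word should be added to S_pre
        return None

    def inconsistency_witness(self):
        pre = sorted(self.pre)
        for w in pre:
            for x in pre:
                if w != x and self.row(w) == self.row(x):
                    for a in self.alphabet:
                        for s in sorted(self.suff):
                            if self.obs[w + a + s] != self.obs[x + a + s]:
                                return a + s   # this suffix should be added to S_suff
        return None

    def to_dfa(self):
        # Assumes the table is closed and consistent.
        states = {self.row(w) for w in self.pre}
        init = self.row("")
        accept = {self.row(w) for w in self.pre if self.obs[w] == 1}
        delta = {(self.row(w), a): self.row(w + a)
                 for w in self.pre for a in self.alphabet}
        return states, init, delta, accept
```

A target language can be plugged in as, e.g., member = lambda w: len(w) >= 2 and w[-2] == "b", matching the language Z used throughout this chapter.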


3. Classical results

3.1. Learning in the limit. Gold's learning paradigm learning in the limit formalises the view of human language acquisition as a process in which the internal representation of the unknown language is continuously refined by new evidence and converges towards an accurate model as time advances. In Gold's learning paradigm, an algorithm (henceforth, the learner) is to infer a formal representation of an unknown language X. It is known that X is a subset of some universe U, that it belongs to a class of languages C, and that the learner needs to infer a formal representation from a certain hypothesis space R. Suppose, for instance, that we are interested in learning regular word languages over the alphabet A. Then C would be the class of regular word languages, U would be the set of words in A*, and R could be the set of deterministic finite automata or the set of regular expressions over A.

In the presentation of Gold's paradigm, it will be convenient to identify a language X over the universe U of words with its characteristic mapping. That is, we also view X as a mapping X: U → B such that, for each word w ∈ U, X(w) = 1 if w ∈ X and X(w) = 0 otherwise. Similarly, we can view a sample of X on the subdomain D ⊆ U as the restriction X|_D of X to D. As such, X|_D represents the annotated set of examples {(w, X(w)) | w ∈ D}. The positive examples contained in X|_D are X|_D⁻¹(1), and the negative examples are X|_D⁻¹(0).

Additional information about X is provided to the learner at discrete time steps, according to a presentation, that is, a function g: ℕ⁺ → (U × B). At time i, the learner is told whether some element w of U is in X or not. Formally, this is done by the presentation g by mapping i to (w, 1) if w ∈ X or to (w, 0) otherwise. For every new piece of information, the learner must guess the identity of X by outputting an element of R. This means that at time i ∈ ℕ⁺, the learner may support its conjecture on membership information for the words in {g(1), …, g(i)}. It is commonly assumed that the learner does not care about the order in which the examples are presented. From here on, we denote by g[i] the first i examples provided by g, that is, the set Rng(g|_[i]).

Gold is primarily interested in two classes of presentations: texts and informants. A presentation g is a text for X if Rng(g) = X × {1}. Thus, a text is restricted to positive information, which reflects the early linguistic hypothesis that children learn their native language by listening to others. Notice that a text eventually outputs every element of X. A presentation g is an informant for X if Rng(g) is a subset of U × B such that, for every word w ∈ U, there is an i ∈ ℕ⁺ and b ∈ B such that g(i) = (w, b); furthermore, b = (w ∈ X). When aided by an informant, the learner has access to both positive and negative information. This more expressive type of presentation can be justified by the fact that children also receive negative examples, for example, when they try to speak but fail to be understood.

Definition 3.1. Let fin(U) be the set of all finite subsets of U. A class of languages C is learnable in the limit from an informant (respectively, from text) if there is a computable function

  L: {X|_D | X ∈ C and D ∈ fin(U)} → R
such that the following condition holds: for every language X ∈ C and every informant g: ℕ⁺ → U × B for X (respectively, every text g: ℕ⁺ → X × {1} for X), there is an index i ∈ ℕ⁺ and a representation A ∈ R of X such that L(X|_{g[j]}) = A whenever j > i.

Definition 3.1 does not require that the learner's initial conjectures are consistent with the given information – only that the conjectures eventually converge to the correct answer. A learner that only produces consistent conjectures is feasible. In the remainder of this section, we briefly discuss learning from text versus learning from an informant.

3.1.1. Learning from text. Early results regarding learning from text were discouraging. Angluin showed that no language class with infinite elasticity is learnable in the limit from text [4]. The elasticity of a class is the length of the longest chain of inclusions X₁ ⊊ X₂ ⊊ X₃ ⊊ ⋯, where Xᵢ ∈ C for every i ∈ ℕ. The intuition behind Angluin's argument is the following. Towards a contradiction, assume that there is a learner that can infer C from a text presentation. If the learner is given a presentation g that begins with examples taken from X₁, then there is an index i₁ ∈ ℕ such that after seeing the first i₁ examples, the learner will make the conjecture X₁. This is necessary because X₁ ∈ C and we assumed that the learner can identify every member of C in the limit. But assume that after index i₁ + 1, g contains a sequence of examples from X₂. Again, since X₂ ∈ C, there is an index i₂ such that after seeing i₂ examples, the learner will conjecture X₂. Now, starting at index i₂ + 1, g contains a sequence of examples from X₃, and so forth. Such an unfavourable presentation will always exist, and it will force the learner to renew its conjecture infinitely many times, making convergence impossible. This contradicts our assumption. It follows that no class of languages containing the finite languages and a single infinite language is learnable in the limit from text. There are, however, a few positive results; e.g., it is known that languages of fixed cardinality can be (trivially) inferred from text. Further positive results are briefly discussed in § 8.

3.1.2. Learning from an informant. Whereas most language classes cannot be learned from text, even the primitive recursive languages can be learned from an informant, using a technique called identification by enumeration [34]. This technique is applicable whenever the hypothesis space consists of a recursively enumerable family of representations R and the problem of deciding whether a particular representation is consistent with a restriction X|_D of some language X to some finite domain D is decidable. The identification rule at time i reads as follows: enumerate the representations in R in order of size (using the lexicographical order to resolve ties) and take as conjecture the first (and hence the smallest) representation that is consistent with X|_{g[i]}. Since the minimal representation A of X will eventually be reached, the learner only has to modify its hypothesis a finite number of times before finding A. Once the learner has made the conjecture A, the addition of new examples will not cause an inconsistency, so the learner will never be compelled to discard the correct answer.

3.1.3. Characteristic sets. A mapping char_L: C → fin(U) is a characteristic mapping for the class C and computable function L: {X|_D | X ∈ C and D ∈ fin(U)} → R if the following condition holds: for every language X ∈ C there is a representation A ∈ R
of X such that L(X|_D) = A whenever char_L(X) ⊆ D. If char_L is a characteristic mapping for C, then char_L(X) is a characteristic set for X ∈ C. A class C has a characteristic mapping with respect to a learner L if and only if L infers C in the limit from an informant [22]. For every X ∈ C and presentation g, there is an index i ∈ ℕ such that char_L(X) is contained in the domain of X|_{g[i]}, so from time i onwards, L will correctly identify X. Vice versa, if there is a language X ∈ C for which no characteristic set exists, then one can construct a presentation of X on which L will never converge.

3.1.4. Polynomial time inference. A class C is learnable with polynomial time and data if there is an algorithm L that infers C in the limit (here we blur the distinction between the computable function L and the algorithm that computes it), a characteristic mapping char_L, and polynomials p and q such that for every language X ∈ C and every presentation g the following conditions hold:
1. at each time i, the learner L uses p(|g_i|) computation steps to output a conjecture consistent with X|_{g[i]}, where |g_i| = Σ_{j ∈ {1,…,i}} |g(j)|, and
2. |char_L(X)| = q(|A|), where A is the smallest representation of X in R;
see [35].

Under this complexity model, identification by enumeration does not yield polynomial-time inference of regular languages when the representation space is deterministic finite automata (if P ≠ NP). Recall that identification by enumeration takes as conjecture the smallest representation that is consistent with the current information. Our observation now follows from the NP-completeness of the minimal consistent representation (MCR) problem for DFAs [35], a decision problem which can be stated as follows: given k ∈ ℕ and a finite sample X|_D of some unknown language X, is k the minimal integer such that there is a DFA of size k consistent with X|_D? The problem cannot even be efficiently approximated, since deciding whether a given DFA is only polynomially larger than a state-minimal DFA is also NP-hard [51]. Intuitively, the difficulty is that to synthesise A, missing data must be guessed, and the hypothesis space is exponential in the size of D. We note that any polynomial-time procedure for computing the next hypothesis in the identification by enumeration algorithm would also provide an efficient solution to the MCR problem, so no such procedure can exist.

An alternative attempt at in-the-limit learning could be to maintain a prefix tree acceptor that reflects the information in X|_D. However, this approach also misses the mark: although the time needed to update the conjecture is linear in the size of the input, the conjectured automaton changes with every new positive example received, so the algorithm does not converge if X is an infinite language.

Gold [35] was the first to present an algorithm that identifies the class of regular languages (over an alphabet A) in the limit with polynomial time and data. To compute a conjecture based on the information contained in X|_D, Gold's algorithm searches for a subset S of D from which a DFA A can be synthesised without having to guess missing data. If A agrees with X on all of D, then it becomes the learner's next conjecture. If no such A can be obtained, then the learner synthesises a consistent PTA instead, which
it uses as a dummy conjecture. In essence, the learner waits for a characteristic set for the target language, and disregards data which it cannot use easily. Gold's algorithm is outlined in Algorithm 1. In the try–catch block spanning lines 1–5, an attempt is made to construct a DFA from the sample X|_D by invoking the timid state characterisation algorithm, but the attempt is unsuccessful if X|_D does not contain sufficient information. When this happens, the algorithm resorts to building a prefix tree acceptor that is consistent with X|_D, something which is always feasible. On line 6, A(w) denotes the Boolean that is true if and only if w is accepted by A. The procedure PrefixTreeAcceptor is outlined in Algorithm 5.

Algorithm 1 Gold's algorithm to identify a DFA in the limit from an informant [35]

Require: the restriction X|_D: D → B of X: A* → B to some finite domain D ⊆ A*.
Ensure: the DFA A is consistent with X on D.
1. try
2.   A ← TimidStateCharacterization(X|_D)
3. catch insufficient information in X|_D
4.   return PrefixTreeAcceptor(X|_D)
5. end try
6. if ∃ w ∈ D: A(w) ≠ X|_D(w)
7.   return PrefixTreeAcceptor(X|_D)
8. return A

The timid state characterisation algorithm (TSCA) is the core of Gold's algorithm (see Algorithms 2 and 3). If TSCA is informed about X on any superset of a characteristic set for X with respect to Gold's algorithm, then it returns the minimal DFA recognising X in polynomial time. It does so by constructing an observation table for X|_D. However, if TSCA is not provided with a superset of a characteristic set, then it may fail to produce a DFA altogether. This is either because the observation table contains incomplete information, or because it contains inconsistencies that cannot be resolved without expanding the domain of X|_D. The latter problem arises in the subprocedure Synthesize (not listed explicitly), which computes the DFA associated to a closed and consistent observation table (which we explained how to do in § 2.2) or, if the table is not consistent, flags this and aborts. Gold's algorithm thus performs rather poorly as long as it does not receive the right data, but since the data set continues to grow, there will be some time i ∈ ℕ⁺ when the algorithm has received the shortest prefix of g that contains a characteristic set for X with respect to the learner. Although i is guaranteed to be finite, it may be arbitrarily large, and the majority of the algorithm's conjectures before time i are prefix tree acceptors.

3.2. Learning from a minimally adequate teacher. Learning from a minimally adequate teacher (MAT-learning) was introduced in a seminal paper by Angluin [5].


Algorithm 2 TimidStateCharacterization [35]

Require: the restriction X|_D: D → B of X: A* → B to some finite domain D ⊆ A* that contains a characteristic set for X.
Ensure: the DFA A is consistent with X on D and is the minimal DFA for L(A).
S ← {ε}
T ← suffixes(D)
Obs ← FillTable(X|_D, S, T)
while ∃ w·a ∈ S·A: row_{wa} ∉ {row_x | x ∈ S} do
  S ← S ∪ {w·a}
  Obs ← FillTable(X|_D, S, T)
return A ← Synthesize(S, T, Obs)

Algorithm 3 FillTable

Require: the restriction X|_D of X to some finite domain D, and a pair of sets S, T ⊆ A*.
Ensure: Obs is a (possibly inconsistent and incomplete) observation table.
for all w ∈ S do
  for all x ∈ T do
    if w·x ∈ D then
      Obs(w, x) ← X|_D(w·x)
    else
      throw insufficient information
return Obs

The MAT-learning scenario assumes a learner and a teacher. The learner wants to learn an unknown regular language X over alphabet A known by the teacher. In order to learn X, the learner asks questions, which the teacher must answer. In particular, the learner can ask two types of questions, namely
1. membership queries, each consisting of a word w, and
2. conjectures, each consisting of a description of a language Y.
The teacher will respond to membership queries by answering either yes (if w ∈ X) or no (otherwise), and to conjectures by either yes (if X = Y) or by a word x in the symmetric difference of X and Y (otherwise). The word x may be chosen arbitrarily and we refer to it as a counterexample. We say that a teacher that behaves as described is a minimally adequate teacher (MAT). Angluin's main result is that the learner can learn any regular language from such a teacher in time polynomial in the size of the alphabet A, the number of states of the minimal DFA for X, and the maximum length of any counterexample presented by the teacher.

If the teacher is able to answer both membership queries and conjectures correctly, it may seem that she should actually already have a representation of the language to be learned. If this is the case, we may ask ourselves why she doesn't simply give the
learner this representation, rather than playing the cat-and-mouse game of queries and answers. One could therefore think that the MAT learning paradigm is more of a mathematical abstraction than a realistic setting. Still, it has played an important role in the field of grammatical inference. The explanation for this has two parts. The first is that grammatical inference, as we have seen in § 3.1, is inherently hard. To be able to achieve positive theoretical results, we must either restrict the class of languages to be learned severely, or give the learner access to a powerful information source, such as a MAT. The second is that there are practical settings where we can assume something similar to a MAT. One such example comes from formal verification. One approach to so-called black box checking [49] and [37] uses a MAT algorithm where the equivalence queries to the teacher are replaced by conformance testing algorithms; see § 8.

3.2.1. The MAT learning algorithm. The crux of Angluin's learning algorithm is that the learner maintains an observation table and iteratively makes this table closed and consistent by querying the teacher. The MAT learning algorithm is presented as Algorithm 4. Initially, the learner sets S_pre = S_suff = {ε} and asks the teacher membership queries for ε and for each a ∈ A. The initial observation table is then constructed to reflect the learned information.

Algorithm 4 The MAT learning algorithm by Angluin [5]
1. initialise S_pre and S_suff to {ε}
2. ask membership queries for ε and each a ∈ A
3. construct the initial observation table (S_pre, S_suff, Obs)
4. repeat
5.   while (S_pre, S_suff, Obs) is not closed or not consistent do
6.     if (S_pre, S_suff, Obs) is not closed then
7.       find w₁ ∈ S_pre and a ∈ A such that
8.         row_{w₁a} is different from row_w for all w ∈ S_pre
9.       add w₁·a to S_pre
10.      extend Obs to (S_pre ∪ S_pre·A)·S_suff using membership queries
11.    if (S_pre, S_suff, Obs) is not consistent then
12.      find w₁ and w₂ in S_pre, a ∈ A, and w ∈ S_suff such that
13.        row_{w₁} = row_{w₂} and Obs(w₁·a·w) ≠ Obs(w₂·a·w)
14.      add a·w to S_suff
15.      extend Obs to (S_pre ∪ S_pre·A)·S_suff using membership queries
16.  once (S_pre, S_suff, Obs) is closed and consistent,
17.    let A_Obs be its associated automaton and make the conjecture A_Obs
18.  if the teacher replies with a counterexample w then
19.    add w and all its prefixes to S_pre
20.    extend Obs to (S_pre ∪ S_pre·A)·S_suff using membership queries
21. until the teacher replies yes to the conjecture A_Obs
22. return A_Obs
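Building on the ObservationTable sketch from § 2.2, the main loop of Algorithm 4 can be sketched as follows. The Teacher interface (membership and equivalence queries) is our own abstraction, not part of Angluin's formulation; a real application would back it with whatever oracle is available.

```python
# A hedged sketch of the loop in Algorithm 4, reusing the ObservationTable
# class sketched in Section 2.2 (names are ours).

class Teacher:
    """Abstract teacher: override both methods for a concrete target language."""
    def member(self, w):                  # membership query
        raise NotImplementedError
    def equivalent(self, dfa):            # conjecture; return None or a counterexample
        raise NotImplementedError

def mat_learn(alphabet, teacher, max_rounds=100):
    table = ObservationTable(alphabet, teacher.member)
    for _ in range(max_rounds):
        # Repair the table until it is closed and consistent.
        while True:
            w = table.unclosed_witness()
            if w is not None:
                table.pre.add(w)           # lines 6-10 of Algorithm 4
                table.fill()
                continue
            s = table.inconsistency_witness()
            if s is not None:
                table.suff.add(s)          # lines 11-15 of Algorithm 4
                table.fill()
                continue
            break
        dfa = table.to_dfa()
        counterexample = teacher.equivalent(dfa)
        if counterexample is None:
            return dfa
        # Add the counterexample and all its prefixes to S_pre (lines 18-20).
        for i in range(len(counterexample) + 1):
            table.pre.add(counterexample[:i])
        table.fill()
    raise RuntimeError("no convergence within max_rounds")
```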


The purpose of the algorithm's main loop is to make the observation table closed and consistent, and to present the teacher with a conjecture. If the observation table is not consistent, then the learner finds words w₁, w₂ in S_pre, an a in A, and a w in S_suff that witness this fact. The witnessing suffix a·w is then added to S_suff and the table is completed to incorporate this new entry. Similarly, if the observation table is not closed, the learner finds a word w₁ in S_pre and an a in A witnessing this fact. The witnessing word w₁·a is then added to S_pre and the table is completed to incorporate this new entry. If the observation table is closed and consistent, a conjecture A_Obs is generated. If the conjecture is correct, the learner is done and can output A_Obs. Otherwise, the counterexample w from the teacher is added, along with its prefixes, to S_pre. The table is then completed to incorporate this new information.

Angluin proves that the total running time of Algorithm 4 is O(km²n² + kmn³), where k = |A|, n is the number of states of the minimal DFA for X, and m is the maximum length of any counterexample presented by the teacher [5]. It follows that, if the teacher always presents counterexamples of minimal length, then they will be at most O(n) in length and therefore the learning algorithm will run in time O(kn⁴).

Example 3.1. We consider a run of Algorithm 4 with the language Z of the regular expression

  r = (a + b)* b (a + b)

as the target. This language consists of all words over A = {a, b} that have length at least two and whose second-to-last letter is b. The first thing the algorithm does is to set S_pre = S_suff = {ε} and to use membership queries to find the Obs-values for ε, a, and b. This results in the following table.

        ε
  ε     0
  ===========
  a     0
  b     0

In the table, the columns represent elements of S_suff and the rows elements of S_pre ∪ (S_pre·A). Furthermore, we have the elements of S_pre above the double horizontal line and the elements of (S_pre·A) \ S_pre below it. The value of a cell in row w and column x is Obs(w·x). The table is clearly consistent, since S_suff is a singleton. It is also closed, since every row that appears below the double line also appears above it. This means that the algorithm will reach line 16 and construct an automaton from the table. Since S_pre is a singleton and the table contains only zeroes, the automaton will have only one state and this state will not be accepting. Thus it will accept the empty language. When the algorithm makes this conjecture, the teacher is obliged to provide a counterexample. Since the conjecture was an automaton that accepts nothing, the counterexample will have to be a word in Z. We will assume that the teacher responds with ba ∈ Z. The algorithm now adds ba, b, and ε to S_pre (the latter was of course already included) and uses membership queries to compute a new table:

        ε
  ε     0
  b     0
  ba    1
  ===========
  a     0
  bb    1
  baa   0
  bab   0

The new table is obviously closed, since every row below the double line is also present above the line. However, it is not consistent. For example, row_ε = row_b, but Obs(ε·a·ε) = Obs(a) = 0 while Obs(b·a·ε) = Obs(ba) = 1. This means that the algorithm will execute the code on lines 11–15, for example with the instantiations w₁ = ε, w₂ = b, a = a, and w = ε. In this case, a·ε = a will be added to S_suff and the following table will be computed:

        ε   a
  ε     0   0
  b     0   1
  ba    1   0
  ===========
  a     0   0
  bb    1   1
  baa   0   0
  bab   0   1

This third table is consistent, since no two rows above the double line are the same. But, this time, it is not closed, since row_bb does not appear above the double line. This means that the code on lines 6–10 will be executed with w₁ = b and a = b. Finally, bb will be added to S_pre, with the following table as a result:

        ε   a
  ε     0   0
  b     0   1
  ba    1   0
  bb    1   1
  ===========
  a     0   0
  baa   0   0
  bab   0   1
  bba   1   0
  bbb   1   1
This table is consistent, since no two rows above the double line have the same values, and closed, since all possible rows are represented above the double line. Thus the algorithm will reach line 16 again and construct an automaton A = (Q, I, E, T) with

  Q = {ε, b, ba, bb},
  I = {ε},
  E = {(ε, a, ε), (ε, b, b), (b, a, ba), (b, b, bb), (ba, a, ε), (ba, b, b), (bb, a, ba), (bb, b, bb)},
  T = {ba, bb}.

This automaton is shown in Figure 1 and is the minimal DFA for Z . Thus the teacher will approve the conjecture and the algorithm will terminate. Since its introduction, MAT learning has been applied to a range of object types. These include multilinear logic programs [39], two-tape automata [67], c -deterministic context-free grammars [58], non-deterministic finite automata [66], sequential transducers [65], pattern languages [1], transducers with origin information [10], and directed acyclic graph languages [9]. The topic of MAT learning is revisited in § 6, which outlines a MAT learner for regular tree languages. Weighted tree languages are also learnable within this setting [29]; a survey is provided in [27].

4. Learning from given data

Recall that Gold's algorithm (which was covered in § 3.1) is informed about a regular language X through an infinite sequence g of positive and negative examples and converges in the limit to the minimal DFA for X. The convergence, however, may take an unbounded amount of time, and the majority of the algorithm's conjectures before that will be prefix tree acceptors (PTAs), which, in a practical setting, is not convenient. PTAs do not provide any compression or generalisation of the data. In other words, they do not represent the data more succinctly than the data itself and always describe exactly the positive examples seen so far, and nothing more. In brief, PTAs are not much more useful than the positive data itself.

Since Gold's first paper on learning in the limit, much effort has been invested in the search for polynomial-time learning algorithms that produce small automata and that generalise the given examples even when conditions are less than perfect, that is, when the examples do not contain a characteristic set. In this work, the emphasis has shifted from in-the-limit behaviour to the problem of learning from given data. This problem, which is also known as the consistency problem, reads as follows. Given a restriction X|_D of a target language X to a finite domain D, find a representation A in the hypothesis space R that agrees with X on all of D, that is, A(w) = X(w) for every w ∈ D. Ideally, we want such representations to be small and we want to find them efficiently.


An algorithm that learns from given data may also identify X in the limit, but this is not always true. For example, the naive algorithm that outputs a PTA for the input sample solves the consistency problem, but it does not have the in-the-limit behaviour. However, if L is a learning algorithm that solves the consistency problem and every language X ∈ C has a characteristic set with respect to L, then L identifies C in the limit. In our presentation of algorithms that learn from given data, we assume that the input is a restriction of the target language X to a finite domain D. As in § 3.1, we say that the function X|_D is a sample of X. The algorithm's objective is to produce an automaton that is consistent with X|_D, that is, a representation A such that X|_D⁻¹(1) ⊆ L(A) and X|_D⁻¹(0) ∩ L(A) = ∅.
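As a concrete reading of the consistency requirement, the following small sketch (our own representation, not from the chapter) checks whether a DFA agrees with a labelled finite sample.

```python
# A small sketch of the consistency check: does a DFA agree with X on D?
# The DFA is given as (initial, delta, accepting) with delta a dict mapping
# (state, letter) -> state; sample maps each word in D to 0 or 1.

def agrees(initial, delta, accepting, sample):
    for word, label in sample.items():
        state = initial
        for a in word:
            state = delta.get((state, a))
            if state is None:            # missing transition: the word is rejected
                break
        accepted = state is not None and state in accepting
        if accepted != bool(label):
            return False
    return True
```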

State merging algorithms. A common approach to learning from given data is to start by constructing a PTA for the positive sample X|_D⁻¹(1) (see Algorithm 5) and then merge as many states as possible in this PTA. Since merging states may result in an automaton that recognises a larger language, each merge has to be preceded by a consistency check, to make sure that none of the negative examples from X|_D⁻¹(0) are included in the language of the automaton. The method, which is due to Trakhtenbrot and Barzdin [61], has given rise to a family of state merging algorithms that mainly differ in the strategies they use to merge states and the order in which the merges are done. This order may influence the eventual outcome, because performing one state merge may preclude another one.

Algorithm 5 PrefixTreeAcceptor
Require: a restriction X|_D of X to the finite domain D ⊆ A*.
Ensure: A is a DFA consistent with X on D.
Q ← prefixes(X|_D⁻¹(1))
I ← {ε}
E ← {(x, a, x·a) | x·a ∈ prefixes(X|_D⁻¹(1))}
T ← X|_D⁻¹(1)
return A = (Q, I, E, T)
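A direct rendering of Algorithm 5 is short. The sketch below is ours; it identifies states with prefixes of the positive words, exactly as in the definition of a PTA in § 2.

```python
# A small sketch of Algorithm 5: the prefix tree acceptor of a positive sample.
# States are identified with prefixes of the accepted words.

def prefix_tree_acceptor(positive):
    """positive: an iterable of words known to be in the target language."""
    prefixes = {""}
    for w in positive:
        for i in range(1, len(w) + 1):
            prefixes.add(w[:i])
    states = prefixes
    initial = ""
    edges = {(p[:-1], p[-1]): p for p in prefixes if p}   # (state, letter) -> state
    accepting = set(positive)
    return states, initial, edges, accepting

# Example: a few words from the language Z used earlier.
states, init, edges, acc = prefix_tree_acceptor({"ba", "bb", "aba"})
print(len(states), "states,", len(edges), "edges")
```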

When the hypothesis space is deterministic finite automata, merges are typically done as in Algorithm 6. The algorithm takes as input a DFA A = (Q, I, E, T) and a pair of states q₁, q₂ in Q, and returns an automaton A′ in which q₁ and q₂ have been merged into a new state q. Every edge that once led to q₁ or q₂ now leads to q, and every edge that left q₁ or q₂ now leaves q. If at least one of q₁ and q₂ is an initial state, then so is q, and the same holds for the property of being an accepting state. It is easy to see that the resulting automaton recognises a superset of L(A), but it is not necessarily deterministic. This is the case when both q₁ and q₂ have outgoing edges that are labelled by the same symbol, but which lead to different states p₁ and p₂. To remove the nondeterminism, the merge algorithm calls itself recursively to merge p₁ and p₂. Since every application of the algorithm produces a smaller automaton, the algorithm will eventually terminate and return a DFA.


Algorithm 6 Merge
Require: a DFA or NFA A = (Q, I, E, T) and a pair of states q₁, q₂ ∈ Q.
Ensure: in the automaton A′, the states q₁ and q₂ have been merged into one.
Q′ ← {{q} | q ∈ Q \ {q₁, q₂}} ∪ {{q₁, q₂}}
I′ ← {q′ ∈ Q′ | q′ ∩ I ≠ ∅}
E′ ← {(q′, a, p′) | ∃ q ∈ q′, p ∈ p′: (q, a, p) ∈ E}
T′ ← {q′ ∈ Q′ | q′ ∩ T ≠ ∅}
A′ ← (Q′, I′, E′, T′)
while ∃ q′, p₁′, p₂′ ∈ Q′ with p₁′ ≠ p₂′ and a ∈ A: (q′, a, p₁′) ∈ E′ ∧ (q′, a, p₂′) ∈ E′ do
  A′ ← Merge(A′, p₁′, p₂′)
return A′
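The recursive determinising merge can be sketched in a few lines. The representation below (states as hashable values, edges as a set of triples, initial and accepting states as sets) and the function name are our own.

```python
# A sketch (our formulation of Algorithm 6): merge q2 into q1 and recursively
# merge any pair of targets that makes the result nondeterministic.

def merge(states, initial, edges, accepting, q1, q2):
    """states, initial, accepting: sets; edges: set of (source, letter, target)."""
    if q1 == q2:
        return states, initial, edges, accepting
    rename = lambda q: q1 if q == q2 else q
    states = {rename(q) for q in states}
    initial = {rename(q) for q in initial}
    accepting = {rename(q) for q in accepting}
    edges = {(rename(s), a, rename(t)) for (s, a, t) in edges}
    # Look for a nondeterministic conflict introduced by the merge.
    succ = {}
    for (s, a, t) in edges:
        if (s, a) in succ and succ[(s, a)] != t:
            # Two distinct a-successors of s: merge them and continue recursively.
            return merge(states, initial, edges, accepting, succ[(s, a)], t)
        succ[(s, a)] = t
    return states, initial, edges, accepting
```

Each recursive call shrinks the state set by one, so the procedure terminates with a deterministic automaton, mirroring the argument given above.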

The RPNI algorithm. Perhaps the most well-known state merging algorithm is the regular positive-negative inference (RPNI) algorithm [47]. The basic idea of the algorithm is to control the order in which merges of states are performed. In particular, the algorithm avoids merging two states that are both involved in loops. This is done as follows. First, the root of the PTA is coloured red and all successors of the root are coloured blue (all other states are white). The algorithm then always tries to merge a blue state with a red state. When a merge is performed, the new combined state is red and all its white successors are coloured blue. If a blue state cannot be merged with any red state, it is coloured red and its white successors become blue. The basic RPNI scheme is shown in Algorithm 7.

Example 4.1. Consider running the RPNI algorithm with input Z|_D, where
• Z|_D⁻¹(1) = {ba, bb, aba, bba, bbb, babb, baaba} and
• Z|_D⁻¹(0) = {ε, a, b, aa, bab}.

Constructing a prefix tree acceptor from Z|_D⁻¹(1), we get the automaton shown in Figure 2(a). State p₁ is the only red state and its successors p₂ and p₃ are blue.

Algorithm 7 RPNI
Require: a restriction X|_D of X to the finite domain D ⊆ A*.
Ensure: A is a DFA consistent with X on D.
1. A = (Q, I, E, T) ← PrefixTreeAcceptor(X|_D⁻¹(1))
2. Red ← {ε}
3. Blue ← E(Red)
4. while Blue ≠ ∅ do
5.   choose q_b ∈ Blue
6.   Blue ← Blue \ {q_b}
7.   if ∃ q_r ∈ Red: Merge(A, q_r, q_b) is consistent with X on D then
8.     A ← Merge(A, q_r, q_b)
9.   else
10.    Red ← Red ∪ {q_b}
11.  Blue ← E(Red) \ Red
12. return A = (Q, I, E, T)


States p₁ and p₂ can be merged without violating Z|_D. To avoid nondeterminism, the successors of p₂ are "folded" into the successors of p₁. This means that p₄ is merged with p₃ and p₇ is merged with p₅. The resulting automaton is depicted in Figure 2(b), where we have also introduced a number of new red states, which is justified as follows. If we were to merge p₃ with p₁, the resulting automaton would accept a and b, both of which belong to Z|_D⁻¹(0). Once p₃ becomes red, p₅ and p₆ become blue. Neither of them can be merged with p₁, since they are accepting, while p₁ is initial and ε belongs to Z|_D⁻¹(0). Also, neither of them can be merged with p₃, since the resulting automaton would accept b ∈ Z|_D⁻¹(0). Thus we can colour both of them red and their successors blue. We can now merge state p₈ with p₁. The resulting automaton is shown in Figure 2(c). Next, we merge p₉ with p₃, resulting in the automaton in Figure 2(d). Finally, state p₁₀ can be merged with p₅ and state p₁₁ can be merged with p₆. This gives us the minimal DFA A_Z for the language Z from Example 2.1, shown in Figure 1.

The running time of the RPNI algorithm is polynomial in the size of D, that is, in the sum of the lengths of the words for which X|_D is defined. This can be seen from the fact that every iteration of the main while loop colours at least one white state blue and no state that has once become blue can ever become white again. The algorithm also identifies the regular languages in the limit. In fact, for every implementation of RPNI, there is a mapping char_RPNI: Reg → pow(U) from the regular languages to the powerset of the universe such that, for every language X ∈ Reg,
• the size of char_RPNI(X) is polynomial in the size of the minimal DFA for X, and
• if char_RPNI(X) ⊆ D, then the algorithm, when run on input X|_D, returns the minimal DFA for X.
For a complete proof of this statement, we refer to de la Higuera [23]. Notice that above, we talk about a characteristic sample for every implementation of the RPNI algorithm. This is because Algorithm 7 does not completely specify in which order merges should be performed. To specify this completely, we must define how the choice of blue state on line 5 is made and in which order red states are considered on line 7. These choices lead to different functions for the characteristic samples and may also significantly influence the performance of the algorithm. It is also in these choices that many variants of the RPNI scheme differ. We discuss one such variant next.

Evidence driven state merge. The evidence driven state merge (EDSM) algorithm was entered by Price into the Abbadingo One DFA Learning Competition [40] and was one of two co-winners. It is a good example of an RPNI variant that uses a heuristic to find the merges that are in some sense most promising. These merges are then performed first. In each iteration, the EDSM algorithm first checks whether there is some blue state that can be promoted. Only when this is not the case does it try to find a merge. At this stage, the algorithm evaluates every possible merge, that is, every pair of one red state and one blue state. Each possible merge is given a score, and the merge with the highest score is performed before the algorithm continues with the next iteration.

[Figure 2 appears here, with four panels: (a) the original prefix tree acceptor, (b) after merging p₁ and p₂, (c) after merging p₈ with p₁, and (d) after merging p₉ with p₃.]

Figure 2. The automata from Example 4.1. The accepting states are shown as double circles. Red states are coloured dark grey, while blue states are light grey.

The score for a merge of a red state p_r and a blue state p_b is computed as follows. The automaton A′ = Merge(A, p_r, p_b) is constructed. If it violates X|_D, the score is set to −1. Otherwise, let S be the subset of the states of A′ such that, for every p ∈ S, there is some word in D that takes A′ from the initial state to p. The score for the merge is then set to |D| − |S|. Intuitively, the score gets better the more words from D lead to the same state.
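The scoring step can be sketched on top of the merge() sketch above. The function and variable names are ours; the sample is assumed to map each word of D to its label.

```python
# A sketch of the EDSM score for merging red state qr with blue state qb.
# The automaton representation follows the merge() sketch above; `sample`
# maps each word in D to its label (1 for positive, 0 for negative).

def edsm_score(automaton, qr, qb, sample):
    states, initial, edges, accepting = merge(*automaton, qr, qb)

    def run(word):                        # set of states reached after reading `word`
        current = set(initial)
        for a in word:
            current = {t for (s, b, t) in edges if s in current and b == a}
        return current

    reached = set()
    for word, label in sample.items():
        final = run(word)
        if bool(final & accepting) != bool(label):
            return -1                     # the merge violates X|_D
        reached.update(final)
    return len(sample) - len(reached)     # |D| - |S|
```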


5. Learning non-deterministic finite automata

While DFAs and NFAs both define the class of regular languages, there are many situations where one is preferable to the other. Much research on learning regular languages has focused on DFAs, largely because of the direct correspondence between the Myhill–Nerode equivalence classes and the states of a minimal DFA. In many applications, however, the use of NFAs as language representations is more desirable, primarily due to the compactness of representation they offer. There are also language classes for which the DFA-based methods perform poorly, and for which NFA-based methods can be expected to do better [32]. Since there is not always a unique minimal NFA for a language, most research into NFA learning deals with subclasses where the languages have representations that are in some sense unique. An important instance is the class of residual finite state automata.

Residual languages. Let X ⊆ A* be a language and let w ∈ A* be a word. Then the set of all suffixes that can be added to w to form a word in X is a residual language of X. In other words, Y is a residual language of X if there is a word w ∈ A* such that Y = {x | w·x ∈ X}. We call Y the residual language of X with respect to w (also known as the Brzozowski derivative [12] or left quotient [68]). If w is not a prefix of any word in X, then the residual language of X with respect to w is ∅. For the set of all residual languages of X, we write Res(X).

Recall that two words w₁ and w₂ are equivalent with respect to X in the sense of Myhill and Nerode, written w₁ ≡_X w₂, if and only if for every word x we have w₁·x ∈ X if and only if w₂·x ∈ X (Definition 2.1). This means that w₁ ≡_X w₂ if and only if the residual languages of X with respect to w₁ and w₂ are the same. We can thus conclude that every language X has exactly as many residual languages as ≡_X has classes. In particular, a regular language has finitely many residual languages, one corresponding to each state of the minimal DFA for the language.

We say that a residual language Y of X is prime if it is not the union of some members of Res(X) \ {Y}. Equivalently, the residual language Y of X is composite, or non-prime, if and only if it is the union of all members of Res(X) that it properly contains.

Example 5.1. Consider the language Z represented by the regular expression (a + b)* b (a + b).

This language has the four equivalence classes [ε]_Z, [b]_Z, [ba]_Z, and [bb]_Z. In the following, we denote by Z_w the residual language of Z with respect to w. In this notation, these classes correspond to residual languages as follows:

  [ε]_Z  corresponds to Z_ε  = Z;
  [b]_Z  corresponds to Z_b  = Z ∪ {a, b};
  [ba]_Z corresponds to Z_ba = Z ∪ {ε};
  [bb]_Z corresponds to Z_bb = Z ∪ {ε, a, b}.


Notice that we have Z_bb = Z_b ∪ Z_ba, while none of the other residual languages can be formed as a non-trivial union of some of the others. Therefore, Z_ε, Z_b, and Z_ba are prime residual languages, while Z_bb is a non-prime residual language.

Residual finite state automata. The above terminology was introduced by Denis et al., who also used it to define an important subclass of the NFAs, the residual finite state automata (RFSA) [25] and [26]. As noted above, each state in the minimal DFA for a language corresponds to a unique residual language. This is also true for RFSA, but we do not require every residual language to be represented by a state.

Definition 5.1. An NFA A is a residual finite state automaton if, for every state p of A, the language accepted by A when started from p is a residual language of the language accepted by A.

In particular, every DFA without unreachable states is an RFSA. Indeed, if p is a state of A and w is a word that takes A from the initial state to p, then the language accepted by A when it is started from p is the residual language of L(A) with respect to w. Denis et al. prove the following theorem [25].

Theorem 5.1. Every regular language X is accepted by an RFSA with a number of states equal to the number of prime residual languages of X.

The automaton of Theorem 5.1 can be constructed as follows. Let PrimeRes(X) be the set of prime residual languages of the regular language X over alphabet A. We can construct an RFSA A_X for X from PrimeRes(X) as follows:
• the set of states of A_X is PrimeRes(X);
• p ∈ PrimeRes(X) is an initial state if p ⊆ X;
• p ∈ PrimeRes(X) is an accepting state if ε ∈ p; and
• (p₁, a, p₂) is an edge of A_X if p₂ is a subset of the residual language of p₁ with respect to a.

The automaton A_X is saturated in the sense that no additional state can be made initial and no additional edge can be added without changing the language of the automaton. It is also reduced in the sense that no state represents a language that is the union of the languages of some other states. In fact, A_X is the unique reduced and saturated RFSA for X and is called the canonical RFSA for X; see [25] and [26].

Example 5.2. Consider our example language Z again, defined by the regular expression (a + b)* b (a + b). Since the language has four equivalence classes, we know that the minimal DFA that recognises it has four states. Theorem 5.1 tells us that there is an RFSA for Z that has only three states, one each for the prime residual languages Z_ε, Z_b, and Z_ba. Using the above construction, we obtain the canonical RFSA for Z, shown in Figure 3. The state for Z_ε is an initial state, since Z_ε = Z is a subset of Z. The state Z_ba is accepting, since ε ∈ Z_ba. For the transitions, consider, for example, the b-labelled edge from Z_ba to Z_b. This edge is included because Z_b = Z ∪ {a, b} is a subset of the residual language of Z_ba = Z ∪ {ε} with respect to b, which is in fact Z_b.


Figure 3. The canonical RFSA for Z from Example 5.2.

The uniqueness of the canonical RFSA for regular languages opens up the possibility of reusing many of the learning techniques used for DFAs.

A MAT learning algorithm for RFSA. The DFA-like properties of RFSA come to the fore in the context of MAT learning. Bollig et al. [11] developed such an algorithm. With the right definitions, the changes needed to the original algorithm of Angluin (Algorithm 4) are rather minor.

Given an observation table (S_pre, S_suff, Obs), we need a notion of inclusion between rows. Given w, x ∈ S_pre ∪ S_pre·A, we say that the row of w is covered by the row of x, written row_w ⊑ row_x, if Obs(w·y) ≤ Obs(x·y) for every y ∈ S_suff. Let Rows = {row_w | w ∈ S_pre ∪ S_pre·A} be the set of rows of the table. For a subset R of Rows, let Join_R be the function from S_suff to {0, 1} defined by Join_R(y) = max_{r ∈ R} r(y). We can now define prime rows as follows. The row row_w of w ∈ S_pre ∪ S_pre·A is composite if there is an R ⊆ Rows \ {row_w} such that row_w = Join_R. In fact, row_w is composite if and only if it is the join of all rows that it strictly covers. Otherwise, row_w is prime. We write Primes for the set of all prime rows and Primes_pre for Primes ∩ {row_w | w ∈ S_pre}.

The notion of closedness of an observation table now translates into a notion of RFSA-closedness, which requires that every row corresponding to a word in S_pre·A \ S_pre must be the join of the rows in Primes_pre that it covers. Formally, (S_pre, S_suff, Obs) is RFSA-closed if, for every word w ∈ S_pre·A \ S_pre, there is an R ⊆ Primes_pre such that row_w = Join_R. Similarly, (S_pre, S_suff, Obs) is RFSA-consistent if, for all w₁, w₂ ∈ S_pre and every a ∈ A, if row_{w₁} ⊑ row_{w₂}, then row_{w₁a} ⊑ row_{w₂a}.

Finally, we need to modify the way an automaton is constructed from an observation table. Given (S_pre, S_suff, Obs), we define the corresponding automaton A_Obs = (Q, I, E, T) with
• Q = Primes_pre,
• I = {p ∈ Q | p ⊑ row_ε},
• T = {row_w ∈ Q | Obs(w) = 1}, and
• E = {(row_w, a, row_x) ∈ Q × A × Q | row_x ⊑ row_{wa}}.

The well-definedness of A_Obs is proved similarly as in § 2.2. The MAT learner for RFSA is presented in Algorithm 8.
It requires O(n²) equivalence queries and O(mn³) membership queries, where n is the size of the minimal DFA for the language and m is the length of the longest counterexample received from the teacher [11].

Algorithm 8 The MAT-learning algorithm for RFSA by Bollig et al. [11]
initialise S_pre and S_suff to {ε}
ask membership queries for ε and each a ∈ A
construct the initial observation table (S_pre, S_suff, Obs)
repeat
  while (S_pre, S_suff, Obs) is not RFSA-closed or not RFSA-consistent do
    if (S_pre, S_suff, Obs) is not RFSA-closed then
      find w₁ ∈ S_pre and a ∈ A such that row_{w₁a} ∈ Primes \ Primes_pre
      add w₁·a to S_pre
      extend Obs to (S_pre ∪ S_pre·A)·S_suff using membership queries
    if (S_pre, S_suff, Obs) is not RFSA-consistent then
      find w₁ and w₂ in S_pre, a ∈ A, and w ∈ S_suff such that
        Obs(w₁·a·w) = 1, Obs(w₂·a·w) = 0, and row_{w₁} ⊑ row_{w₂}
      add a·w to S_suff
      extend Obs to (S_pre ∪ S_pre·A)·S_suff using membership queries
  once (S_pre, S_suff, Obs) is RFSA-closed and RFSA-consistent,
    let A_Obs be its associated automaton and make the conjecture A_Obs
  if the teacher replies with a counterexample w then
    add w and all its suffixes to S_suff
    extend Obs to (S_pre ∪ S_pre·A)·S_suff using membership queries
until the teacher replies yes to the conjecture A_Obs
return A_Obs
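The covering and join operations on rows are simple Boolean-vector computations. The sketch below (our naming) shows how RFSA-closedness could be checked on top of the ObservationTable sketch from § 2.2, where rows are 0/1 tuples.

```python
# A sketch (our naming) of row covering, joins, primality, and the
# RFSA-closedness test, on top of the ObservationTable sketch from Section 2.2.

def covered(r1, r2):
    """row r1 is covered by row r2: componentwise r1 <= r2."""
    return all(a <= b for a, b in zip(r1, r2))

def join(rows, width):
    """componentwise maximum of a collection of rows."""
    out = (0,) * width
    for r in rows:
        out = tuple(max(a, b) for a, b in zip(out, r))
    return out

def is_prime(r, all_rows):
    """r is composite iff it equals the join of the rows it strictly covers."""
    strictly_covered = [x for x in all_rows if x != r and covered(x, r)]
    return join(strictly_covered, len(r)) != r

def rfsa_closed(table):
    all_rows = {table.row(w) for w in table.rows()}
    primes_pre = [table.row(w) for w in table.pre
                  if is_prime(table.row(w), all_rows)]
    for w in table.pre:
        for a in table.alphabet:
            r = table.row(w + a)
            if join([p for p in primes_pre if covered(p, r)], len(r)) != r:
                return False
    return True
```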

Other learning algorithms for NFAs. During the first decade of the 21st century, several new algorithms for learning NFAs were presented. Here, we survey some of them. Denis et al. present a learning algorithm for RFSA, called DeLeTe2, which is based on prefix tree acceptors and state merging [26]. The algorithm does not necessarily learn the canonical RFSA for the target language, but rather an RFSA whose size lies between that of the canonical RFSA and the minimal DFA. The unambiguous finite automata (UFA) form another class of restricted nondeterministic finite automata, with the central property that no word in the language of a UFA has more than one accepting run [57]. Coste and Fredouille give an algorithm for learning a UFA from given data, again by merging states from a prefix tree acceptor [17]. It has the property that, given a specific sample, the same automaton will be constructed, irrespective of the order in which merges are performed. The consistency problem for unrestricted (that is, fully nondeterministic) finite automata is treated by Vazquez et al. [64]. They present a family of in-the-limit algorithms for nondeterministic finite automata that use maximal automata and state merging to infer a regular language. This work was later continued with the definition of the OIL algorithm, based on so-called universal automata [32] and [31].


6. Learning regular tree languages

In this section we discuss the extension of MAT-learning (§ 3.2) to (ranked) regular tree languages. This topic was pioneered by Sakakibara, who studied the derivation trees of context-free grammars [55]. The connection between regular tree languages and derivation trees of context-free grammars can be seen as follows. The yield of a tree is the sequence of labels of its leaves, from left to right. The set of derivation trees of a context-free grammar is always a regular tree language, and each language of yields of a regular tree language is a context-free word language. Sakakibara generalised MAT-learning to skeletal tree languages, which are tree languages in which all internal nodes are unlabelled [42]. We note that MAT-learning can also be extended to weighted tree series – for a survey on this topic, we suggest [27]. The extension to full regular tree languages, which we present here, is by Drewes and Högberg [28].

We assume that the alphabet A is partitioned into A⁽⁰⁾, …, A⁽ᴿ⁾ for some constant natural number R. Here A⁽ⁱ⁾ contains the symbols of rank i. We denote trees by the letters t and s. We write (ranked) trees as a(t₁, …, tₙ), where the root is labelled by a ∈ A⁽ⁿ⁾, under which the (ranked) trees t₁, …, tₙ are attached. A context c is a tree in which exactly one leaf is labelled with a hole marker □. The depth of a context c is the length of the path from the root of c to the hole marker, not including the hole marker. We denote the depth of context c by depth(c). For example, a context where the hole marker is a child of the root has depth one. Given a context c and a tree t, we denote by c[t] the tree obtained by replacing the unique □-labelled node in c by the tree t. If S is a set of trees, we denote by A(S) the set {a(t₁, …, tₙ) | a ∈ A⁽ⁿ⁾ and t₁, …, tₙ ∈ S}. As before, we denote the unknown regular (tree) language by X.

6.1. Observation tables for trees. An observation table for trees (T_pre, C_suff, Obs) is defined analogously to § 3.2, with the differences that
• T_pre is a subtree-closed set of trees, that is, for every tree a(t₁, …, tₙ) ∈ T_pre we also have that t₁, …, tₙ are in T_pre;
• C_suff is a generalisation-closed set of contexts, that is, for every context of the form c[a(t₁, …, t_{i−1}, □, t_{i+1}, …, tₙ)] ∈ C_suff we have that the context c is in C_suff as well and t₁, …, t_{i−1}, t_{i+1}, …, tₙ are in T_pre;
• Obs is a function Obs: (T_pre ∪ A(T_pre)) × C_suff → {0, 1} such that, for a context c and a tree t, Obs(t, c) = 1 if and only if c[t] ∈ X.

By row_t we denote the function row_t: C_suff → {0, 1} such that, for each context c, row_t(c) = 1 if and only if c[t] ∈ X. In this section, we are interested in learning bottom-up deterministic tree automata. For more details about tree automata, we refer to Chapter 7. With bottom-up automata in mind, the intuition behind observation tables for trees is similar to that behind observation tables for words, i.e., T_pre can be seen as a set of prefixes and C_suff as a set of suffixes.

An observation table for trees is closed if, for each a ∈ A⁽ⁿ⁾ and trees t₁, …, tₙ ∈ T_pre, there is a t ∈ T_pre such that row_{a(t₁,…,tₙ)} = row_t. It is consistent if the following
condition is satisfied: for all a ∈ A⁽ⁿ⁾ and all t₁, …, tₙ, t₁′, …, tₙ′ ∈ T_pre, if row_{tᵢ} = row_{tᵢ′} for all i ∈ [n], then row_{a(t₁,…,tₙ)} = row_{a(t₁′,…,tₙ′)}.

With a closed and consistent observation table for trees (T_pre, C_suff, Obs), we can associate a finite bottom-up tree automaton A_Obs = (Q, A, δ, T) that represents the information we currently have about the unknown language X. Here, the transition relation δ is a subset of ⋃ₙ (Qⁿ × A⁽ⁿ⁾ × Q). This tree automaton is defined as follows:
• Q = {row_t | t ∈ T_pre};
• for every n ∈ ℕ, a ∈ A⁽ⁿ⁾, and t₁, …, tₙ ∈ T_pre, (row_{t₁}, …, row_{tₙ}, a, row_{a(t₁,…,tₙ)}) ∈ δ; in particular, (a, row_a) ∈ δ for every a ∈ A⁽⁰⁾; and
• T = {row_t | t ∈ T_pre ∩ X}.

The well-definedness of A_Obs is proved analogously as in § 3.2. Furthermore, given an observation table, A_Obs can be constructed in time linear in the size of its transition table, i.e., in time |T_pre|^R · |C_suff|, where R is the maximum rank in A [28]. The following observation is easily proved by induction.

Observation 6.1. Let (T_pre, C_suff, Obs) be a closed and consistent observation table for trees. For all trees t ∈ T_pre ∪ A(T_pre) and all contexts c ∈ C_suff, we have that c[t] ∈ L(A_Obs) if and only if Obs(t, c) = 1. Moreover, A_Obs is the unique minimal bottom-up finite tree automaton with this property (up to isomorphism).

6.2. The MAT-learning algorithm for trees. The MAT-learning algorithm for regular tree languages is structured similarly to the standard MAT learning algorithm for regular word languages. However, as we will see, there are some subtleties that need to be taken into account when dealing with trees. The algorithm is summarised in Algorithm 9. In the main loop, if the observation table is not closed, we repair the table much like in Algorithm 4. The manner in which non-consistent tables are addressed, however, is quite different. We provide some intuition. If the table (T_pre, C_suff, Obs) is not consistent, then there exist trees a(s₁, …, sₙ) and a(s₁′, …, sₙ′) in A(T_pre) such that, for all i ∈ [n], row_{sᵢ} = row_{sᵢ′}, but row_{a(s₁,…,sₙ)} ≠ row_{a(s₁′,…,sₙ′)}. It follows that there also must be an i ∈ [n] such that

  row_{a(s₁,…,s_{i−1},sᵢ′,…,sₙ′)} ≠ row_{a(s₁,…,sᵢ,s_{i+1}′,…,sₙ′)},

since otherwise we would have that

  row_{a(s₁,…,sₙ)} = row_{a(s₁,…,s_{n−1},sₙ′)} = ⋯ = row_{a(s₁′,…,sₙ′)}.

This means that we can take c = a(s₁, …, s_{i−1}, □, s_{i+1}′, …, sₙ′), t₀ = sᵢ, and t₁ = sᵢ′ and obtain trees c[t₀], c[t₁] ∈ A(T_pre) with t₀, t₁ ∈ T_pre such that row_{t₀} = row_{t₁} but row_{c[t₀]} ≠ row_{c[t₁]}. Furthermore, we have that depth(c) = 1. This is exactly what is tested on lines 10–18 of Algorithm 9.

Algorithm 9 MAT-learning for regular tree languages [28]
1. initialise T_pre := {a} for some arbitrary a ∈ A⁽⁰⁾
2. initialise C_suff to {□}
3. construct the initial observation table (T_pre, C_suff, Obs)
4. repeat
5.   while (T_pre, C_suff, Obs) is not closed or not consistent do
6.     if (T_pre, C_suff, Obs) is not closed then
7.       find t ∈ A(T_pre) such that row_t is different from row_s for all s ∈ T_pre
8.       add t to T_pre
9.       extend Obs to (T_pre ∪ A(T_pre)) × C_suff using membership queries
10.    if (T_pre, C_suff, Obs) is not consistent then
11.      find c[t₀], c[t₁] ∈ A(T_pre) with t₀, t₁ ∈ T_pre and depth(c) = 1 such that
12.        row_{c[t₀]} ≠ row_{c[t₁]} and row_{t₀} = row_{t₁}
13.      find s₀, s₁ ∈ T_pre such that
14.        row_{s₀} = row_{c[t₀]} and row_{s₁} = row_{c[t₁]}
15.      find c′ ∈ C_suff such that
16.        row_{s₀}(c′) ≠ row_{s₁}(c′)
17.      add c′[c] to C_suff
18.      extend Obs to (T_pre ∪ A(T_pre)) × C_suff using membership queries
19.  once (T_pre, C_suff, Obs) is closed and consistent,
20.    let A_Obs be its associated automaton and make the conjecture A_Obs
21.  if the teacher replies with a counterexample tree t then
22.    Extract(T_pre, t)
23.    extend Obs to (T_pre ∪ A(T_pre)) × C_suff using membership queries
24. until the teacher replies yes to the conjecture A_Obs
25. return A_Obs

Once the observation table (T_pre, C_suff, Obs) is closed and consistent, we produce a conjecture A_Obs. If the teacher replies with a counterexample tree t, we could, in principle, simply add t to T_pre and extend Obs. However, since extending Obs would require adding all subtrees of t to T_pre, the observation table could become extremely large. Therefore, a more refined approach is presented here, in which we extract from t another counterexample s for which T_pre ∪ {s} is subtree-closed. As such, only one tree needs to be added to T_pre. The counterexample tree s is constructed through repeated substitutions of subtrees, and its construction is detailed in the procedure Extract (Algorithm 10).

The Extract procedure first locates a subtree t_0 of t which is in A(T_pre) \ T_pre. Since t ∉ T_pre (Observation 6.1), such a subtree must exist. Then the procedure searches for a tree t' ∈ T_pre such that c[t'] is also a counterexample (line 3). We test whether c[t'] is also a counterexample by testing whether row_{t'} = row_{t_0} and whether t ∈ X ⟺ c[t'] ∈ X. This test is correct for the following reason. By construction of A_Obs and since row_{t'} = row_{t_0}, the state of A_Obs reached at the root of t' after reading t' in a bottom-up fashion is the same as the state reached at the root of t_0. Therefore, by the Myhill–Nerode theorem for trees, we have that c[t'] ∈ L(A_Obs) if and only if c[t_0] ∈ L(A_Obs). But, since t = c[t_0] is a counterexample, we have t ∈ L(A_Obs) if and only if t ∉ X. This implies that c[t'] ∈ L(A_Obs) if and only if c[t'] ∉ X, and therefore c[t'] is a counterexample as well. Finally, if it is not possible to find a replacement tree t' on line 3, we add t to T_pre and return. In particular, if the teacher provides a counterexample, Extract makes sure that at most one tree is added to T_pre.

Algorithm 10 Procedure Extract(T_pre, t) for Algorithm 9
1.  choose context c and a subtree t_0 of t such that
2.      t_0 ∈ A(T_pre) \ T_pre and t = c[t_0]
3.  if there is a t' ∈ T_pre such that row_{t'} = row_{t_0} and (t ∈ X ⟺ c[t'] ∈ X) then
4.      Extract(T_pre, c[t'])
5.  else
6.      add t to T_pre
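In the hypothetical tuple encoding used in the earlier sketches, Extract can be transcribed directly. The fragment below is illustrative only; member is a stand-in for the membership oracle "is this tree in X?", and the helper names are ours. A subtree is addressed by the sequence of child indices leading to it, so that plugging a replacement into the surrounding context amounts to rebuilding the tuple along that path.

def subtree_at(t, path):
    for i in path:
        t = t[i + 1]              # child i sits at tuple position i + 1
    return t

def replace_at(t, path, s):
    # The tree c[s], where c is the context of t with a hole at the given path.
    if not path:
        return s
    i = path[0]
    return t[:i + 1] + (replace_at(t[i + 1], path[1:], s),) + t[i + 2:]

def extension_path(t, t_pre, path=()):
    # Path to a subtree of t lying in A(T_pre) \ T_pre; exists whenever t is not in T_pre.
    for i, child in enumerate(t[1:]):
        if child not in t_pre:
            return extension_path(child, t_pre, path + (i,))
    return path

def extract(t_pre, c_suff, obs, member, t):
    path = extension_path(t, t_pre)                          # lines 1-2 of Algorithm 10
    t0 = subtree_at(t, path)
    for t_prime in t_pre:                                    # line 3
        candidate = replace_at(t, path, t_prime)             # the tree c[t']
        if (row(t_prime, c_suff, obs) == row(t0, c_suff, obs)
                and member(t) == member(candidate)):
            extract(t_pre, c_suff, obs, member, candidate)   # line 4
            return
    t_pre.add(t)                                             # line 6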

6.3. Complexity. The learner runs in time polynomial in the size I of the minimal deterministic tree automaton for X and in the size of the largest counterexample t_max returned by the teacher. Just as for Angluin's algorithm, we always have that the tree automaton A_Obs associated with (T_pre, C_suff, Obs) is the minimal bottom-up deterministic tree automaton consistent with Obs (Observation 6.1). By induction on the number of iterations of the repeat-until loop, one can prove that
1. for all trees t_1, t_2 ∈ T_pre with t_1 ≠ t_2, we always have that t_1 and t_2 are inequivalent with respect to the Myhill–Nerode relation of X; and
2. we always have that |C_suff| ≤ |T_pre|.
This proves that the number of rows and columns in the observation table is always polynomial in I. In particular, it also implies that we execute the bodies of the if-tests at lines 6 and 10 of Algorithm 9, and the call to Extract at line 22 of Algorithm 9, at most polynomially often. Furthermore, the if-tests on lines 6 and 10 can clearly be executed in time polynomial in the size of the observation table. Finally, one call to Extract also takes time polynomial in |t_max| and I.

Theorem 6.2. If t_max is the largest counterexample returned by the teacher, the running time of the learner is polynomial in I^R and |t_max|, where R denotes the maximum rank in A and I the number of equivalence classes in X.

7. PAC learning

In an effort to initiate the study of complexity issues for machine learning tasks, Valiant introduced a formal setting for studying such problems, which was later dubbed probably approximately correct (PAC) learning [62]. In Valiant's terminology, the setting was originally intended for studying the learning of concepts. A concept can, for example, be a Boolean function over a set of variables. Positive and negative examples can be given as variable assignments, annotated with information as to whether they belong to the concept or not. In such a setting, exact learning is very hard, and Valiant therefore made his setting probabilistic.


The main idea is that the set of examples should be drawn at random from some distribution D that is fixed, but unknown to the learning algorithm. The algorithm is then supposed to, with high probability, come up with a concept that gives an approximation of the concept to be learnt that is close with respect to D. In other words, for an element of the target domain, drawn at random according to D, the probability of a correct classification should be high.

The following formal definition of PAC learning is due to Angluin [6]. Let U be a finite or countable universe and let X, the concept to be learned, be a subset of U. Let D be a probability distribution over U. Let {X_1, X_2, …} be a countable set of subsets of U. This set is the hypothesis space. As a knowledge source, the learner has access to an oracle. When called, the oracle returns an element x ∈ U, drawn according to D. It also indicates whether x belongs to X or not. The PAC learning problem is parameterised by two positive probabilities:
1. the accuracy parameter ε and
2. the confidence parameter δ.
The learner is set to find, with probability at least 1 − δ, an index i such that the probability that X_i disagrees with X on an element x ∈ U drawn according to D is at most ε.

PAC learning for regular languages. Given an alphabet A and a probability distribution D over A*, we can define a PAC learning setting for regular languages as follows. The universe is A* and the concept to be learned is some regular language X ⊆ A*. As hypothesis space, we could take the set of all DFAs over A. A PAC learning algorithm gets example words drawn from A* according to D and annotated with the information whether they belong to X or not. To probably approximately correctly learn X with parameters ε and δ, the DFA A that the algorithm outputs should, with probability at least 1 − δ, be such that a word drawn according to D has probability at most ε of being classified incorrectly by A. In other words, with probability at least 1 − δ, a random word drawn according to D should lie in the symmetric difference of X and the language of A with probability at most ε.

The above still does not say anything about the complexity of PAC learning. After considering a number of possibilities, Pitt gave the following definition of what it would mean for the regular languages to be PAC learnable, assuming a DFA representation [50].

Definition 7.1. DFAs are PAC identifiable if and only if there exists a (possibly randomised) algorithm L such that for any input parameters 0 < ε < 1 and 0 < δ < 1, for any DFA A of size n, for any number m, and for any probability distribution D on strings of A* of length at most m, if L obtains words generated according to distribution D and labelled according to membership in L(A), then L produces a DFA B such that, with probability at least 1 − δ, the probability (with respect to D) of the set {w | w ∈ L(A) ⊕ L(B)} is at most ε (where ⊕ denotes the symmetric difference). The running time of L is required to be polynomial in n, m, 1/ε, 1/δ, and |A|.
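The quantity bounded in Definition 7.1, namely the probability under D of the symmetric difference L(A) ⊕ L(B), is easy to approximate empirically. The following fragment is purely illustrative and not part of the definition; sample_word, in_target, and in_hypothesis are hypothetical stand-ins for a sampler for D and for membership tests in L(A) and L(B), respectively.

def estimated_error(sample_word, in_target, in_hypothesis, num_samples=10000):
    # Monte-Carlo estimate of Pr_{w ~ D}[ w lies in the symmetric difference of L(A) and L(B) ].
    disagreements = 0
    for _ in range(num_samples):
        w = sample_word()
        if in_target(w) != in_hypothesis(w):
            disagreements += 1
    return disagreements / num_samples

A PAC learner is then judged by whether, with probability at least 1 − δ over the draw of its training words, the true (not merely estimated) value of this error is at most ε.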


Notice that the above definition requires D to be a probability distribution over A^{≤m}, rather than A*, that is, over a finite subset of A*. This is to avoid technical difficulties in dealing with words of arbitrary length, and is justified as follows. For any probability distribution D' over the countable set A* and for any arbitrarily small γ > 0, there is a number m such that the probability of a word drawn from A* according to D' having length larger than m is smaller than γ. Thus there is a probability distribution D over A* that assigns zero probability to all words longer than m and approximates D' within γ. Though PAC is a well-known and widely used framework within machine learning, the results for grammatical inference have mostly been negative. In particular, a number of authors have shown the problem to be hard under various complexity-theoretic assumptions; an overview is given by Pitt [50]. Simple PAC. Since grammatical inference in the PAC model has not yielded any significant positive results, researchers have studied reasonable restrictions of the model. In particular, the PAC requirement that learning algorithms should work equally well for any distribution, and that the distribution is completely unknown to the algorithm, has been questioned. It can be argued that there are a number of practical settings in which the distribution is more predictable. For instance, it is not unreasonable to assume that the distribution will have a higher probability for words that are in some sense simple. The question is how to define this simplicity in a formal and uniform way. The method of choice has mostly been to use Kolmogorov complexity. A formal definition, known as PACS, for this setting was introduced by Denis et al. [24]. It assumes the distribution over the words to be the universal distribution of Solomonoff and Levin. The authors also prove some polynomiality results for restricted classes of DFAs. Later, Parekh and Honavar showed that the full class of DFAs is PACS learnable in polynomial time [48].

8. Applications and further material To round off this chapter, we discuss a number of research areas closely related to grammatical inference. We can by no means claim to give a complete view of such fields, but rather aim at giving some examples and pointers. Natural language processing. Several well-known inference algorithms stem from the field of natural language processing. The algorithm Adios (for Automatic DIstillation Of Structure) derives a context-free grammar from a positive sample of unannotated sentences by searching for sequential patterns in the data [30]. A different approach is suggested by Sakakibara and Muramatsu who present a genetic algorithm (GA) for inferring context-free grammars from (partially structured) positive and negative samples. The algorithm organises a maximal set of nonterminals in a table similar to that used in chart parsing, and then uses a GA-based technique to merge nonterminals, so as to obtain a small output grammar [56].


A polynomial-time in-the-limit algorithm that infers substitutable context-free languages from positive examples was presented by Clark and Eyraud [15]. A CFL L is substitutable if it adheres to Harris' principle: if a pair of word sequences constitute the same grammatical category, then they can be interchanged in any sentence without altering the grammatical correctness of the sentence [38]. Harris' principle is also the foundation for alignment-based learning (ABL), an inference framework for unsupervised learning. ABL algorithms typically operate in two phases. In the first phase, sample sentences are aligned to generate a set of hypotheses, each suggesting that a pair of sub-sentences constitute the same grammatical category. In the second phase, the most probable combination of hypotheses is selected through an expectation-maximisation search [63] and [33]. The algorithm Emile learns a shallow context-free language in the limit from a set of positive examples. A context-free language is shallow if it is generated by a grammar G such that every production in G is used to generate at least one sentence of a length that is logarithmic in the size of G. Emile works by dividing and recombining the sample sentences, so as to discover syntactical categories. For efficiency reasons, some presentations of the algorithm include a membership oracle that can answer whether a given combination of sentence fragments is grammatically correct [2] and [3]. Recently, there has been an increasing interest in so-called semantic graphs, such as abstract meaning representations [7], that model the meaning of natural language sentences. This has also led to work on MAT learning for restricted graph languages [9]. XML schemas and web applications. The eXtensible Markup Language (XML) is a popular data format for exchanging data on the Web. In order to automatically process XML data, it is often beneficial to have an XML schema associated with the data (e.g., for automatic error detection in the data). However, such a schema is not always present. In terms of formal languages, one can abstract XML data as unranked trees (that is, trees in which each node can have arbitrarily many children) and XML schemas as tree automata. Therefore, Bex et al. investigated automatically learning XML schemas [8]. In this setting, a big challenge is that there is no negative information available, and therefore one encounters the problems discussed in § 3.1.1. In this particular case, a solution was the observation that regular expressions in practical XML schemas are often of a very restricted form [44] and that regular expressions of this form can be learned from positive information alone. Another interesting application of learning is due to Carme et al. [13]. In particular, they investigate automatically learning web information extraction algorithms from examples annotated by the user. More concretely, they learn node-selecting tree transducers, which are based on stepwise tree automata, which allow a Myhill–Nerode characterisation for unranked trees [45]. Staworko and Wieczorek [59] and [60] studied learning algorithms for twig queries, also known as tree patterns [46] and [20]. Twig queries represent a fundamental core of XPath and therefore occur in many query languages for tree-structured data. Even though originally designed for querying tree-structured data, twig queries can also be used for querying graph databases [43] and [20]. Staworko and Wieczorek study


which twig queries are learnable with polynomial time and data [59] and which are characterisable by finite sets of positive and negative examples [60]. Learning and separability. A question closely related to learning problems is separability. We say that two languages L_+ and L_− are separable by a class of languages S if there exists a language S ∈ S such that L_+ ⊆ S and S ∩ L_− = ∅. In other words, if we take L_+ and L_− to be sets of positive and negative examples, respectively, then separability asks if there is a language in S that is consistent with the given information. The corresponding decision problem is the separability problem for a class of languages C by S, which asks, for two input languages L_+ and L_− from C, if they are separable by S. Separability is a classical problem that recently attracted much new interest. For instance, it was shown that separability of regular word languages by piecewise testable languages is in polynomial time [52] and [19]. Separability of regular languages by more and more expressive languages (notably, locally testable, locally threshold testable, and first-order definable languages) was studied by Place et al. [53] and [54]. Beyond regular languages, Czerwiński et al. [21] provided a characterisation for when a class C has decidable separability by piecewise testable languages. Separability of expressive language classes such as one-counter automata and Parikh automata by regular languages has recently been considered as well [16] and [18]. In the tree case, Goubault-Larrecq and Schmitz have shown that it is decidable whether regular ranked tree languages, ordered by homomorphic embedding, are separable by piecewise testable languages [36]. Verification and testing. We finally give a few examples where automaton learning techniques have been applied to the verification of software systems. For further applications, see the survey by Leucker [41]. In so-called black box checking [49] and [37], the goal is to model-check an implemented system whose internal structure is unknown. One approach to this problem uses a MAT algorithm to obtain a model for the relevant aspects of the system's structure. Here, the equivalence queries to the teacher are replaced by conformance testing algorithms, which compare an automaton to a "black box," that is, to the implemented system. Another verification area where learning techniques have been used is compositional verification. Chen et al. [14] give a learning algorithm that learns the minimal separating DFA for a pair of regular languages L_1 and L_2 by querying an extended MAT oracle. This learning algorithm is then used to infer the contextual assumption needed for the compositional verification. For references to other uses of learning in compositional verification, see the introduction of Chen et al. [14].

References

[1] P. J. Abisha, D. G. Thomas, and S. J. Kumaar, Learning subclasses of pure pattern languages. In Grammatical inference: algorithms and applications (A. Clark, F. Coste, and L. Miclet, eds.). Proceedings of the 9th international colloquium, ICGI 2008, Saint-Malo, France, September 22–24, 2008. Lecture Notes in Computer Science, 5278. Springer, Berlin, 280–282. Zbl 1177.68104 q.v. 387
[2] P. Adriaans, Learning shallow context-free languages under simple distributions. Article id. PP-1999-13, Institute for Logic, Language, and Computation, Amsterdam, 1999. q.v. 402
[3] P. W. Adriaans, M. Trautwein, and M. Vervoort, Towards high speed grammar induction on large text corpora. In SOFSEM 2000: Theory and practice of informatics. Proceedings of the 27th conference on current trends in theory and practice of informatics, Milovy, Czech Republic, November 25–December 2, 2000. Lecture Notes in Computer Science, 1963. Springer, 173–186. Zbl 1043.68771 q.v. 402
[4] D. Angluin, Inductive inference of formal languages from positive data. Inform. and Control 45 (1980), no. 2, 117–135. MR 0584828 Zbl 0459.68051 q.v. 380
[5] D. Angluin, Learning regular sets from queries and counterexamples. Inform. and Comput. 75 (1987), no. 2, 87–106. MR 0916360 Zbl 0636.68112 q.v. 377, 378, 382, 384, 385
[6] D. Angluin, Queries and concept learning. Mach. Learn. 2 (1988), no. 4, 319–342. MR 3363446 q.v. 400
[7] L. Banarescu, C. Bonial, S. Cai, M. Georgescu, K. Griffitt, U. Hermjakob, K. Knight, P. Koehn, M. Palmer, and N. Schneider, Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (A. Pareja-Lora, M. Liakata, and S. Dipper, eds.). ACL, Stroudsburg, PA, USA, 2013, 178–186. q.v. 402
[8] G. J. Bex, F. Neven, T. Schwentick, and S. Vansummeren, Inference of concise regular expressions and DTDs. ACM Trans. Database Syst. 35 (2010), no. 2, 1–47. q.v. 402
[9] H. Björklund, J. Björklund, and P. Ericson, On the regularity and learnability of ordered DAG languages. In Implementation and application of automata (A. Carayol and C. Nicaud, eds.). Proceedings of the 22nd International Conference (CIAA 2017) held in Marne-la-Vallée, June 27–30, 2017. Lecture Notes in Computer Science, 10329. Springer, Cham, 2017, 27–39. MR 3677602 Zbl 06763311 q.v. 387, 402
[10] M. Bojańczyk, Transducers with origin information. In Automata, languages, and programming (J. Esparza, P. Fraigniaud, T. Husfeldt, and E. Koutsoupias, eds.). Part II. Proceedings of the 41st International Colloquium (ICALP 2014) held at the IT University of Copenhagen, Copenhagen, July 8–11, 2014. Lecture Notes in Computer Science, 8573. Springer, Heidelberg, 2014, 26–37. MR 3238358 Zbl 1409.68152 q.v. 387
[11] B. Bollig, P. Habermehl, C. Kern, and M. Leucker, Angluin-style learning of NFA. In 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena. Morgan Kaufmann Publishers, San Francisco, CA, 2009, 1004–1009. q.v. 394, 395
[12] J. Brzozowski, Derivatives of regular expressions. J. Assoc. Comput. Mach. 11 (1964), 481–494. MR 0174434 Zbl 0225.94044 q.v. 392
[13] J. Carme, R. Gilleron, A. Lemay, and J. Niehren, Interactive learning of node selecting tree transducers. Mach. Learn. 66 (2007), no. 1, 33–67. q.v. 402
[14] Y. Chen, A. Farzan, E. Clarke, Y. Tsay, and B. Wang, Learning minimal separating DFA's for compositional verification. In Tools and algorithms for the construction and analysis of systems (S. Kowalewski and A. Philippou, eds.). Proceedings of the 15th international conference, TACAS 2009, held as part of the joint European conferences on theory and practice of software, ETAPS 2009, York, U.K., March 22–29, 2009. Lecture Notes in Computer Science, 5505. Springer, Berlin, 2009, 31–45. Zbl 1234.68166 q.v. 403


[15] A. Clark and R. Eyraud, Polynomial identification in the limit of substitutable context-free languages. J. Mach. Learn. Res. 8 (2007), 1725–1745. MR 2332446 Zbl 1222.68093 q.v. 402 [16] L. Clemente, W. Czerwiński, S. Lasota, and C. Paperman, Regular separability of parikh automata. In 44 th International Colloquium on Automata, Languages, and Programming. Proceedings of the colloquium (ICALP 2017) held in Warsaw, July 10–14, 2017. LIPIcs. Leibniz International Proceedings in Informatics, 80. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2017, Art. No. 117, 13 pp. MR 3685857 q.v. 403 [17] F. Coste and D. Fredouille, Unambiguous automata inference by means of state-merging methods. In Machine learning: ECML 2003 (N. Lavrač, D. Gamberger, H. Blockeel, and L. Todorovski, eds.), Proceedings of the 14th European Conference held in Cavtat, September 22–26, 2003. Lecture Notes in Computer Science, 2837. Lecture Notes in Artificial Intelligence. Springer, Berlin, 2003, 60–71. MR 2075641 Zbl 1257.68087 q.v. 395 [18] W. Czerwiński and S. Lasota, Regular separability of one counter automata. In 2017 32 nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). June 20–23, 2017, Reykjavík, Iceland. IEEE Press, Los Alamitos, CA, 2017, 12 pp. MR 3776905 Zbl 07080196 IEEEXplore 8005079 q.v. 403 [19] W. Czerwiński, W. Martens, and T. Masopust, Efficient separability of regular languages by subsequences and suffixes. In Automata, languages, and programming (F. V. Fomin, R. Freivalds, M. Kwiatkowska, and D. Peleg, eds.). Part II. Proceedings of the 40 th International Colloquium (ICALP 2013) held at the University of Latvia, Riga, July 8–12, 2013. Notes in Computer Science, 7966. Springer, Berlin, 2013, 150–161. MR 3109143 Zbl 1334.68115 q.v. 403 [20] W. Czerwiński, W. Martens, M. Niewerth, and P. Parys, Optimizing tree patterns for querying graph- and tree-structured data. SIGMOD Record 46 (2017), no. 1, 15–22. q.v. 402 [21] W. Czerwiński, W. Martens, L. van Rooijen, M. Zeitoun, and G. Zetzsche, A characterization for decidable separability by piecewise testable languages. Discrete Math. Theor. Comput. Sci. 19 (2017), no. 4, Paper No. 1, 28 pp. MR 3738510 Zbl 1400.68101 q.v. 403 [22] C. de la Higuera, Characteristic sets for polynomial grammatical inference. Mach. Learn. 27 (1997), 125–138. q.v. 381 [23] C. de la Higuera, Grammatical inference. Learning automata and grammars. Cambridge University Press, Cambridge, 2010. MR 2654165 Zbl 1227.68112 q.v. 390 [24] F. Denis, C. D’Halluin, and R. Gilleron, PAC learning with simple examples. In STACS ’96 (C. P. C and R. Reischuk, eds.). Proceedings of the 13th Annual Symposium on Theoretical Aspects of Computer Science held in Grenoble, February 22–24, 1996. Lecture Notes in Computer Science, 1046. Springer, Berlin, 1996, 231–242. MR 1462100 Zbl 1379.68190 q.v. 401 [25] F. Denis, A. Lemay, and A. Terlutte, Residual finite state automata. Fund. Inform. 51 (2002), no. 4, 339–368. MR 1999650 Zbl 1011.68048 q.v. 393 [26] F. Denis, A. Lemay, and A. Terlutte, Learning regular languages using RFSAs. Algorithmic learning theory. Theoret. Comput. Sci. 313 (2004), no. 2, 267–294. MR 2051789 Zbl 1059.68058 q.v. 393, 395 [27] F. Drewes, MAT learners for recognizable tree languages and tree series. Acta Cybernet. 19 (2009), no. 2, 249–274. MR 2584150 Zbl 1224.68039 q.v. 387, 396 [28] F. Drewes and J. Högberg, Learning a regular tree language from a teacher. In Developments in language theory (Z. Ésik and Z. Fölöp, eds.). 
Papers from the 7th International Conference (DLT 2003) held at the University of Szeged, Szeged, July 7–11, 2003. Lecture

Notes in Computer Science, 2710. Springer, Berlin, 2003, 279–291. MR 2054371 Zbl 1037.68082 q.v. 396, 397, 398
[29] F. Drewes and H. Vogler, Learning deterministically recognizable tree series. J. Autom. Lang. Comb. 12 (2007), no. 3, 333–354. MR 2436367 Zbl 1149.68384 q.v. 387
[30] S. Edelman, Z. Solan, D. Horn, and E. Ruppin, Learning syntactic constructions from raw corpora. In Boston University conference on language development (BUCLD) (A. Brugos, M. R. Clark-Cotton, and S. Ha, eds.). Cascadilla Press, Somerville, MA, 2005, 180–191. q.v. 401
[31] P. García, M. Vázquez de Parga, G. Álvarez, and J. Ruiz, Universal automata and NFA learning. Theoret. Comput. Sci. 407 (2008), no. 1–3, 192–202. MR 2463006 Zbl 1153.68030 q.v. 395
[32] P. García, M. Vázquez de Parga, G. Álvarez, and J. Ruiz, Learning regular languages using nondeterministic finite automata. In Implementation and application of automata (O. H. Ibarra and B. Ravikumar, eds.). Proceedings of the 13th International Conference (CIAA 2008) held at San Francisco State University, San Francisco, CA, July 21–24, 2008. Lecture Notes in Computer Science, 5148. Springer, Berlin, 2008, 92–101. MR 2504701 Zbl 1172.68508 q.v. 392, 395
[33] J. Geertzen and M. van Zaanen, Grammatical inference using suffix trees. In Grammatical inference: algorithms and applications (G. Paliouras and Y. Sakakibara, eds.). Proceedings of the 7th international colloquium, ICGI 2004, Athens, Greece, October 11–13, 2004. Lecture Notes in Computer Science, 3264. Lecture Notes in Artificial Intelligence. Springer, Berlin, Heidelberg, 2004, 163–174. Zbl 1111.68472 q.v. 402
[34] E. Gold, Language identification in the limit. Inform. and Control 10 (1967), no. 5, 447–474. MR 3155391 Zbl 0259.68032 q.v. 380
[35] E. Gold, Complexity of automaton identification from given data. Inform. and Control 37 (1978), no. 3, 302–320. MR 0495194 Zbl 0376.68041 q.v. 381, 382, 383
[36] J. Goubault-Larrecq and S. Schmitz, Deciding piecewise testable separability for regular tree languages. In 43rd International Colloquium on Automata, Languages, and Programming (I. Chatzigiannakis, M. Mitzenmacher, Y. Rabani, and D. Sangiorgi, eds.). LIPIcs. Leibniz International Proceedings in Informatics, 55. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2016, Art. No. 97, 15 pp. MR 3577158 Zbl 1388.68172 q.v. 403
[37] A. Groce, D. Peled, and M. Yannakakis, Adaptive model checking. Log. J. IGPL 14 (2006), no. 5, 729–744. MR 2299997 Zbl 1108.68073 q.v. 384, 403
[38] Z. S. Harris, Methods in structural linguistics. University of Chicago Press, Chicago, IL, USA, 1951. q.v. 402
[39] K. Ito and A. Yamamoto, Polynomial-time MAT learning of multilinear logic programs. In Algorithmic learning theory (S. Doshita, K. Furukawa, K. P. Jantke, and T. Nishida, eds.). Proceedings of the Third Workshop (ALT '92) held in Tokyo, October 20–22, 1992. Lecture Notes in Computer Science, 743. Lecture Notes in Artificial Intelligence. Springer, Berlin, 1993, 63–74. MR 1291400 Zbl 0925.68371 q.v. 387
[40] K. Lang, B. Pearlmutter, and R. Price, Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In Grammatical inference (V. Honavar and G. Slutzki, eds.). Proceedings of the 4th International Colloquium, ICGI-98, Ames, Iowa, USA, July 12–14, 1998. Lecture Notes in Computer Science, 1433. Springer, Berlin, 1998, 1–12. q.v. 390
[41] M. Leucker, Learning meets verification. In Formal methods for components and objects (F. S. de Boer, M. M. Bonsangue, S. Graf, and W.-P. de Roever, eds.). Proceedings of the 5th international symposium, FMCO 2006, Amsterdam, November 7–10, 2006. Lecture Notes in Computer Science, 4709. Springer, Berlin, 2006, 127–151. Zbl 1147.68541 q.v. 403
[42] L. Levy and A. Joshi, Skeletal structural descriptions. Inform. and Control 39 (1978), no. 2, 192–211. MR 0516825 Zbl 0387.68067 q.v. 396
[43] L. Libkin, W. Martens, and D. Vrgoč, Querying graph databases with XPath. In Database theory—ICDT 2013 (W.-C. Tan, G. Guerrini, B. Catania, and A. Gounaris, eds.). Proceedings of the 16th International Conference held in Genoa, March 18–22, 2013. Association for Computing Machinery, New York, 2013, 129–140. MR 3480138 q.v. 402
[44] W. Martens, F. Neven, and T. Schwentick, Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39 (2009/10), no. 4, 1486–1530. MR 2580537 Zbl 1211.68162 q.v. 402
[45] W. Martens and J. Niehren, On the minimization of XML Schemas and tree automata for unranked trees. J. Comput. System Sci. 73 (2007), no. 4, 550–583. MR 2320185 Zbl 1115.68099 q.v. 402
[46] G. Miklau and D. Suciu, Containment and equivalence for a fragment of XPath. J. ACM 51 (2004), no. 1, 2–45. MR 2146078 Zbl 1316.68047 q.v. 402
[47] J. Oncina and P. García, Identifying regular languages in polynomial time. In Advances in structural and syntactic pattern recognition (H. Bunke, ed.). Proceedings of the International Workshop held in Bern, August 26–28, 1992. Series in Machine Perception and Artificial Intelligence, 5. World Scientific, Singapore, 1993, 99–108. q.v. 389
[48] R. Parekh and V. Honavar, Learning DFA from simple examples. Mach. Learn. 44 (2001), 9–35. Zbl 0992.68116 q.v. 401
[49] D. Peled, M. Vardi, and M. Yannakakis, Black box checking. J. Autom. Lang. Comb. 7 (2002), no. 2, 225–246. MR 1995096 Zbl 1046.68072 q.v. 384, 403
[50] L. Pitt, Inductive inference, DFAs, and computational complexity. In Analogical and inductive inference (K. P. Jantke, ed.). Proceedings of the Second International Workshop (AII '89) held at Reinhardsbrunn Castle, October 1–6, 1989. Lecture Notes in Computer Science, 397. Lecture Notes in Artificial Intelligence. Springer, Berlin, 1989, 18–44. MR 1035261 q.v. 400, 401
[51] L. Pitt and M. K. Warmuth, The minimum consistent DFA problem cannot be approximated within any polynomial. In Proceedings of the 21st Symposium on the Theory of Computing, STOC 1989, held in Seattle, May 14–17, 1989. Association for Computing Machinery, New York, NY, USA, 1989, 421–432. q.v. 381
[52] T. Place, L. van Rooijen, and M. Zeitoun, Separating regular languages by piecewise testable and unambiguous languages. In Mathematical foundations of computer science 2013 (K. Chatterjee and J. Sgall, eds.). Proceedings of the 38th International Symposium (MFCS 2013) held in Klosterneuburg, August 26–30, 2013. Lecture Notes in Computer Science, 8087. Springer, Berlin, 2013, 729–740. MR 3126252 Zbl 1400.68113 q.v. 403
[53] T. Place, L. van Rooijen, and M. Zeitoun, Separating regular languages by locally testable and locally threshold testable languages. In 33rd International Conference on Foundations of Software Technology and Theoretical Computer Science (A. Seth and N. K. Vishnoi, eds.). Proceedings of the conference (FST & TCS 2013) held in Guwahati, December 12–14, 2013. LIPIcs. Leibniz International Proceedings in Informatics, 24. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2013, 363–375. MR 3166026 Zbl 1359.68178 q.v. 403


[54] T. Place and M. Zeitoun, Separating regular languages with first-order logic. In Proceedings of the Joint Meeting of the Twenty-Third EACSL Annual Conference on Computer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). Held in Vienna, July 14–18, 2014. Association for Computing Machinery, New York, 2014, Article No. 75, 10 pp. MR 3397696 Zbl 1401.68165 q.v. 403 [55] Y. Sakakibara, Learning context-free grammars from structural data in polynomial time. Theoret. Comput. Sci. 76 (1990), no. 2–3, 223–242. MR 1079527 Zbl 0704.68067 q.v. 396 [56] Y. Sakakibara and H. Muramatsu, Learning context-free grammars from partially structured examples. In Grammatical inference: algorithms and applications. (A. L. Oliveira, ed.). Proceedings of the conference held Lisbon, September 11–13, 2000. Lecture Notes in Computer Science, 1891. Lecture Notes in Artificial Intelligence. Springer, Berlin, 229–240. Zbl 0974.68530 q.v. 401 [57] E. Schmidt, Succinctness of description of context-free, regular and unambiguous languages. Ph.D. thesis. Cornell University, Ithaca, N.Y., 1978. q.v. 395 [58] H. Shirakawa and T. Yokomori, Polynomial-time MAT learning of c-deterministic contextfree grammars. Transaction of Information Processing Society of Japan 34 (1993), 380–390. q.v. 387 [59] S. Staworko and P. Wieczorek, Learning twig and path queries. In Database theory—ICDT 2012 (A. Deutsch, ed.). Proceedings of the 15th International Conference held in Berlin, March 26–29, 2012. Association for Computing Machinery, New York, 2012, 140–154. MR 3476909 q.v. 402, 403 [60] S. Staworko and P. Wieczorek, Characterizing XML twig queries with examples. In 18 th International Conference on Database Theory (M. Arenas and M. Ugarte, eds.). Proceedings of the conference (ICDT ’15) held in Brussels, March 23–27, 2015. LIPIcs. Leibniz International Proceedings in Informatics, 31. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2015, 144–160. MR 3368705 Zbl 1365.68223 q.v. 402, 403 [61] B. A. Trakhtenbrot and Y. M. Barzdin, Finite automata. Behavior and synthesis. Translated from the Russian by D. Louvish. English translation edited by E. Shamir and L. H. Landweber. Fundamental Studies in Computer Science, 1. North-Holland Publishing Co., Amsterdam and London, and American Elsevier Publishing Co., New York, 1973. MR 0351686 Zbl 0271.94032 q.v. 388 [62] L. Valiant, A theory of the learnable. Comm. ACM 27 (1984), 1134–1142. q.v. 375, 399 [63] M. van Zaanen, Bootstrapping structure into language: alignment-based learning. Ph.D. thesis. School of Computing, University of Leeds. Leeds, 2001. q.v. 402 [64] M. Vázquez de Parga, P. García, and J. Ruiz, A family of algorithms for non deterministic regular languages inference. In Implementation and application of automata (O. H. Ibarra and H.-C. Yen, eds.). Papers from the 11th International Conference (CIAA 2006) held at National Taiwan University, Taipei, August 21–23, 2006. Lecture Notes in Computer Science, 4094. Springer, Berlin, 2006, 265–274. MR 2296464 q.v. 395 [65] J. M. Vilar, Query learning of subsequential transducers. In Grammatical inference: learning syntax from sentences (L. Miclet and C. de la Higuera, eds.). Lecture Notes in Computer Science, 1147. Lecture Notes in Artificial Intelligence. Springer, Berlin, 1996, 2–83. MR 1483651 q.v. 387 [66] T. Yokomori, Learning non-deterministic finite automata from queries and counterexamples. In Machine intelligence and inductive learning (K. Furukawa, D. Michie, and S. 
Muggleton, eds.). Machine Intelligence, 13. The Clarendon Press, Oxford University Press, New York, N.Y., 1994, 169–189. q.v. 387


[67] T. Yokomori, Learning two-tape automata from queries and counterexamples. Math. Systems Theory 29 (1996), no. 3, 259–270. MR 1374497 q.v. 387 [68] S. Yu, Regular languages. In Handbook of formal languages (G. Rozenberg and A. Salomaa, eds.). Vol. 1. Word, language, grammar. Springer, Berlin, 1997, Chapter 2, 41–110. MR 1469994 q.v. 392

Chapter 12

Descriptional complexity of regular languages
Hermann Gruber, Markus Holzer, and Martin Kutrib

Contents

1. Introduction
2. Descriptional complexity and lower bound techniques
3. Transformation between models for regular languages
4. Operations on regular languages
5. Some recent developments
References

1. Introduction

Regular languages can be represented in many ways. Here we focus on the two major description mechanisms, namely finite automata, to be more precise, deterministic and nondeterministic finite automata, and regular expressions. The equivalence of these devices by the classical constructions of Rabin and Scott [158] and Kleene [124] can nowadays be found in almost all monographs on automata theory and formal languages such as, e.g., [96]. Here we view these devices from a descriptional complexity perspective. That is, we compare their capabilities to represent languages succinctly by measuring their “sizes” relative to each other. We give a brief overview of the descriptional complexity of determinisation, conversion between finite automata and regular expressions and vice versa, and of performing basic language operations on finite automata, as well as on regular expressions. To this end we introduce the basics on descriptional complexity of finite automata and regular expressions and moreover briefly describe simple applicable lower bound techniques.

2. Descriptional complexity and lower bound techniques

In this section we summarise the basics of the descriptional complexity of finite automata and regular languages. In particular we give a brief overview of the most common descriptional complexity measures for these devices and concentrate on lower bound techniques that are simple to apply. These techniques are specific to finite automata and regular expressions. Proving lower bounds sometimes also relies on ad hoc or counting methods.


2.1. Finite automata and lower bound techniques. Our nomenclature on finite automata is as follows: throughout this chapter, if not otherwise stated, we assume that every nondeterministic finite automaton (NFA) A over the alphabet A has one initial state only. For convenience we sometimes interpret the edge relation Q × A × Q of an NFA with states Q and alphabet A as a function from Q × A into the subsets of Q. As in the case of DFAs we call this function the transition function of the automaton. Recall that by definition every deterministic finite automaton has only one initial state. Moreover it is assumed that for each state p and each letter a there is exactly one edge starting in p and carrying the label a. So, any DFA is complete, i.e., the transition function is total, whereas it may be a partial function for NFAs in the sense that the transition function of nondeterministic machines may map to the empty set. Thus, a sink state is counted for DFAs, whereas it is not counted for NFAs. Finally, a finite automaton is said to be minimal if its number of states is minimal with respect to the accepted language. There are many ways to measure the size of a finite automaton: the number of states, the number of transitions, or both the number of states and the number of transitions. The most popular descriptional measure for finite automata in the literature is the number of states.

2.1.1. State complexity of finite automata. Obviously, a regular language is accepted by infinitely many finite automata. Thus, the deterministic (nondeterministic) state complexity of a regular language L, denoted by sc(L) (nsc(L)), is the minimal number of states needed by a deterministic (nondeterministic) finite automaton to accept L. How about the relationship between these measures? Since the beginning of automata theory deterministic and nondeterministic state complexity were the subject of intensive research. The most basic relations between these measures are

    nsc(L) ≤ sc(L) ≤ 2^{nsc(L)}.

The former inequality is due to the fact that every DFA is an NFA, and the latter inequality comes from the classical result which concerns the cost of determinising finite automata by the so-called powerset construction [158]. The latter bound is known to be sharp, i.e., there is an infinite family of languages reaching the exponential bound. We discuss it in the forthcoming subsection on the determinisation of NFAs in more detail. To this end, we have to introduce the necessary tools that allow us to prove upper and lower bounds for DFAs and NFAs. In most cases, upper bounds are provided by explicit construction of a finite automaton. Now we turn to some lower bound techniques. Determining the minimal number of states for a DFA A needed to accept a regular language L ⊆ A* is the problem of computing the index of the Myhill–Nerode equivalence relation of L, i.e., the cardinality of the set of equivalence classes. The minimal DFA accepting the language under consideration is isomorphic to the quotient automaton induced by the equivalence classes of the Myhill–Nerode relation. Thus DFAs admit a canonical minimal description. For further details we refer to the chapter on finite automata (Chapter 10). Yet another method to determine the minimal number of states of a DFA needed to accept a language was proposed in [25], which was based


on the old method of derivatives of regular expressions. This led to the definition of the quotient complexity of a regular language L as the number of distinct (left) quotients w^{−1}L = { u | wu ∈ L }

of L. It should be clear that the state complexity of a regular language is equal to its quotient complexity, due to the relation of quotients with the Myhill–Nerode relation and thus the quotient automaton. This change in terminology has some advantages and turned out to be very fruitful (cf. [23], [24], and [32]), even for the operation problem on regular languages, which is discussed later in detail. It is worth mentioning that the congruence classes of the Myhill–Nerode relation of the reversal of L, that is, the language L^R, can be described by non-empty intersections of complemented and uncomplemented quotients of the language L. These non-empty intersections are called atoms and were first introduced and investigated in [39]; also see [105]. Further studies on the state or quotient complexity of atoms can be found in [27] and [38]. An important contribution of the study of atoms of regular languages is the átomaton of a regular language, which is a unique NFA whose states are built from the atoms of the language. Atomic NFAs give a better understanding of why Brzozowski's double-reversal method for minimising DFAs produces a minimal DFA, since it was shown that the result of applying the subset construction to an NFA is a minimal DFA if and only if the reverse of the NFA is atomic. Next, we discuss what is known on the state complexity of NFAs. For NFAs the situation is more involved and even simple problems on state complexity issues are still open. One reason why determining the nondeterministic state complexity or even deducing a lower bound is hard is the fact that a minimal NFA for a regular language is not necessarily unique (see Figure 1, which shows two minimal non-isomorphic NFAs [9]). A remarkably simple lower bound technique for the nondeterministic state complexity of regular languages is a method put forward first in [16] which is commonly called the fooling set technique; see also [63] for a similar account that appeared later.

Figure 1. Two non-isomorphic minimal 5-state NFAs accepting the finite language L = {ab, ac, ba, bc, ca, cb}.


Theorem 2.1 (fooling set and extended fooling set technique). Let L ⊆ A* be a regular language and suppose there exists a set of pairs S = { (x_i, y_i) | 1 ≤ i ≤ n } with the following properties: x_i y_i ∈ L for 1 ≤ i ≤ n, and x_i y_j ∉ L for 1 ≤ i, j ≤ n with i ≠ j. Then any nondeterministic finite automaton accepting L has at least n states, i.e., nsc(L) ≥ n. Here S is called a fooling set for L. The statement remains valid if the latter condition is replaced by: i ≠ j implies x_i y_j ∉ L or x_j y_i ∉ L, for 1 ≤ i, j ≤ n. In this case, S is called an extended fooling set for L.

With this technique one can easily verify that the language L from Figure 1 satisfies nsc(L) = 5. For instance, S = {(ε, ab), (ab, ε)} ∪ {(a, b), (b, c)} is a fooling set and S' = {(ε, ab), (ba, ε)} ∪ {(a, b), (b, c), (c, a)} is an extended fooling set for L. The size of S' exactly matches the nondeterministic state complexity of L, while S is one element off the optimum, but is best possible with respect to the fooling set condition. Thus, sometimes the fooling set techniques are not powerful enough to establish tight lower bounds. Nevertheless both techniques are widely used in the literature. When comparing them one can prove the following interesting result:

Theorem 2.2. There exist languages L_n, for n ≥ 1, such that the nondeterministic state complexity of L_n is at least n, i.e., nsc(L_n) ≥ n, but any fooling set for L_n has size at most c, for some constant c. An analogous statement holds for nondeterministic state complexity and extended fooling sets as well as for extended fooling sets and fooling sets.

Recently it was shown in [170] that the lower bound provided by the fooling set technique is tight for and only for biseparable automata and, moreover, that the lower bound provided by the extended fooling set technique is tight for any language accepted by a biseparable residual finite automaton [135]. A slightly more general lower bound technique, which is still simple, is the so-called biclique edge cover technique introduced in [71]. It was shown that the bound provided by it is at least logarithmic in the nondeterministic state complexity. This technique is a less bulky way of formulating the nondeterministic message complexity method [99]. In principle, the idea can be traced back at least to 1970, when a work of Kameda and Weiner [118] on state minimal NFAs appeared. Yet another different viewpoint of the biclique edge cover technique, which, however, ultimately leads to the same concept, is given in [46] and [99]. Generalisations were investigated recently in [101], where it is proved that all these generalisations in principle still share the shortcoming of the biclique edge cover technique, as indicated by the aforementioned relation to nondeterministic state complexity.

2.1.2. Transition complexity of finite automata. Here the number of transitions is equal to the number of edges of a finite automaton A, where multiple alphabet symbols give rise to multiple edges. Next we define the deterministic (nondeterministic) transition complexity of L to be the minimal number of transitions needed by a deterministic (nondeterministic) finite automaton to accept L. We denote the deterministic (nondeterministic) transition complexity of L by tc(L) (ntc(L)). Clearly, an n-state finite


automaton in which all states are useful has at least n − 1 transitions. On the other hand, an NFA has a priori at most |A| · n² transitions, and if the automaton is deterministic, the number of transitions is exactly |A| · n. These basic observations are summarised as follows: Let L be a regular language, then

    tc(L) = |A| · sc(L)   and   nsc(L) − 1 ≤ ntc(L) ≤ |A| · (nsc(L))².

Observe that nondeterministic transition complexity already appeared, under the name size, in the landmark paper of Meyer and Fischer [150]. But, to our knowledge, a more systematic study of nondeterministic transition complexity in its own right started only a few years ago. An explanation for the recent rise in interest is an unexpected result from [100] regarding the size of regular expressions and the number of transitions in equivalent NFAs. When comparing nondeterministic state and transition complexity it has recently been shown in [70], and independently in an unpublished work by Kari (cf. [49]), that the nondeterministic transition complexity can be almost quadratic in terms of the nondeterministic state complexity, namely there are finite languages L_n, for n ≥ 1, such that any NFA accepting L_n needs Ω((nsc(L_n))² / log nsc(L_n)) transitions, even for binary alphabets. This lower bound was obtained by probabilistic combinatorial methods, i.e., counting arguments, and thus is highly non-constructive. On the other hand, in [102] languages L_n are constructed over alphabets of growing size such that nsc(L_n) = O(n · 2^n) and any NFA accepting L_n needs Ω(2^{2n}) transitions. In the same paper a slightly weaker lower bound for the transition complexity of languages over a fixed alphabet is given. Nevertheless, the current understanding of nondeterministic transition complexity is still rather limited, in particular concerning lower bound techniques for nondeterministic transition complexity. Lower bound techniques similar to those for nondeterministic state complexity are not known yet. This makes it hard to prove results. Thus, most results are obtained by applying ad hoc methods. Recently, in [49] a nontrivial combinatorial property of finite projective planes was used to present a result on nondeterministic transition complexity. There explicit languages were constructed over a constant-size alphabet that need Ω(n · √n) transitions while having nondeterministic state complexity n. This improved a result from [102] for growing alphabets to the constant case. The difference between nondeterministic transition complexity and nondeterministic state complexity is impressively documented by the following phenomenon: there are languages for which all n-state NFAs require Ω(n²) transitions, but for which allowing a single additional state results in a drop in transition complexity to O(n) – the result even holds for finite languages. Thus state and transition minimisation cannot be carried out simultaneously in general. This was first discovered in [68] and is a special case of a more general result shown in [49]. The increasing interest in nondeterministic transition complexity during the last few years is documented by a growing body of research, see, for example, [65], [69], [100], [164], and [170].


(including ;, , alphabetic symbols from alphabet A, all operation symbols, and parentheses) of a completely bracketed regular expression (for example, used in [2], where it is called length). Another measure related to the reverse polish notation of a regular expression is rpn, which gives the number of nodes in the syntax tree of the expressions (parentheses are not counted). This measure is equal to the length of a (parenthesisfree) expression in reverse Polish notation [2]. The alphabetic width awidth is the total number of alphabetic symbols from A (counted with multiplicity) [51] and [147]. Since ; is only needed to denote the empty set, and the need for  can be substituted by the operator L‹ D L [ ¹º, an alternative is to introduce also the ‹ -operator and instead forbid the use of ; and  inside non-atomic expressions. This is sometimes more convenient, since we avoid unnecessary redundancy already at the syntactic level [67]. As usual for a regular language L define size.L/ to be minimum size among all regular expressions denoting L. The notions rpn.L/ and awidth.L/ are analogously defined. One can easily show that the measures rpn and awidth are linearly related to size. Relations between these measures have been studied, e.g., in [51], [53], [104], and [67]. Theorem 2.3 (relation on basic regular expression measures). Let L be a regular language. Then 1. size.L/ 6 3  rpn.L/ and size.L/ 6 8  awidth.L/ 3; 2. awidth.L/ 6 21  .size.L/ C 1/ and awidth.L/ 6 21  .rpn.L/ C 1/; 3. rpn.L/ 6 21  .size.L/ C 1/ and rpn.L/ 6 4  awidth.L/ 1.

Further not so well known measures for the complexity of regular expressions are the ordinary length [53], the width [51], the length (dual to width) [51], the total number of strings [15], and total number of sums (dual to total number of strings) [79]. To our knowledge, these latter measures have received far less attention to date. How about lower bound techniques for regular expressions? Properties related to regular expression size were first subject to systematic study in [51]. There a highly specialised (pumping) method for proving regular expression lower bounds, which seemingly requires a largely growing alphabet was developed. The existence of languages Ln , for n > 1, that admit n-state finite automata over an alphabet of size O.n2 /, but require regular expression size at least 2.n/ was shown. Based on this work encodings of the witness languages were proposed in [61] to reduce the alphabet size, retaining all necessary features required to mimic the original proof. A technique based on communication complexity that applies only for finite languages is proposed in [79]. The most general technique up to now, introduced in [72], comes from an entirely different approach relating descriptional complexity of regular languages to a structural complexity measure. This technique reads as follows – for a regular expression over the alphabet A, the star height is inductively defined by height.;/ D height./ D height.a/ D 0, height.s C t/ D height.s  t/ D max .height.s/; height.t//, and height.s  / D 1 C height.s/. The star height of a regular language L, denoted by height.L/, is then defined as the minimum star height among all regular expressions describing language L:


Theorem 2.4 (star height lemma). Let L ⊆ A* be a regular language. Then

    awidth(L) ≥ 2^{(height(L) − 1)/3} − 1.
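The inductive definition of star height used in the star height lemma translates directly into a few lines of code. The sketch below is illustrative only; it assumes a hypothetical syntax-tree encoding in which an expression is either an atom (the strings '0' for ∅, 'eps' for ε, or a single letter) or a tuple ('+', s, t), ('.', s, t), or ('*', s).

def star_height(e):
    # Star height of a regular expression given as a nested tuple.
    if isinstance(e, str):            # atoms: '0', 'eps', or a letter
        return 0
    if e[0] in ('+', '.'):
        return max(star_height(e[1]), star_height(e[2]))
    if e[0] == '*':
        return 1 + star_height(e[1])
    raise ValueError("unknown operator")

# For instance, (a + b)* a has star height 1:
# star_height(('.', ('*', ('+', 'a', 'b')), 'a')) == 1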

An apparent problem is that the star height of a language is defined as a minimum over all regular expressions describing the language, without bounding the expression size explicitly. In fact, the question whether the star height of a given regular language is computable became a famous problem known as the (restricted) star height problem. It was open for 25 years before a positive answer was given in [86]. In light of these considerations, proving lower bounds on alphabetic width via lower bounds on star height appears to be trading a hard problem for an even harder one. But early research done in [145] and [146] on the star height problem established a subclass of regular languages for which the star height is determined more easily, namely the family of bideterministic regular languages. Here a regular language L is bideterministic if there exists a DFA A with a single final state such that a DFA accepting the reversed language L^R is obtained from A by reverting the direction of all edges (transitions) and exchanging the roles of the initial and final state. By McNaughton's theorem [145], which relies on earlier work which was published later in [146], the star height of a bideterministic language L is computable and is equal to the cycle rank of the minimal trim, i.e., without dead state, DFA accepting L. Cycle rank is yet another structural measure for regular languages proposed in [50], which measures the degree of connectivity of directed graphs. For a definition of this measure we refer to [50]. In fact, Eggan's theorem shows that the star height of a regular language L equals the minimum cycle rank among all nondeterministic finite automata accepting the language L. The advantage of the cycle rank measure is that it can be characterised in terms of some "game against the graph," as already suggested in [146]. In [72] a modern formulation in terms of a cops and robber game was given – many other digraph connectivity measures such as, e.g., entanglement [14], admit characterisations in terms of some cops and robber game. The cops and visible robber game is given as follows: Let G = (V, E) be a digraph. Initially, the cops occupy some set X ⊆ V of vertices, with |X| ≤ k, and the robber is placed on some vertex v ∈ V \ X. At any time, some of the cops can reside outside the graph, say, in a helicopter. In each round, the cop player chooses the next location X' ⊆ V for the cops. The stationary cops in X ∩ X' remain in their positions, while the others go to the helicopter and fly to their new position. During this, the robber player, knowing the cops' next position X' from wire-tapping the police radio, can run at great speed to any new position v', provided that there is a (possibly empty) directed path from v to v' in the remaining graph induced by the non-blocked vertices, i.e., he has to avoid running into a stationary cop. Afterwards, the helicopter lands the cops at their new positions, and the next round starts, with X' and v' taking over the roles of X and v, respectively. The cop player wins the game if the robber cannot move any more, and the robber player wins if the robber can escape indefinitely.

418

Hermann Gruber, Markus Holzer, and Martin Kutrib

The immutable cops variant of the above game restricts the movements of the cops in the following way: once a cop has been placed on some vertex of the graph, he has to stay there forever. The hot-plate variant of the game restricts the movements of the robber in that he has to move along a nontrivial path in each move – even if the path consists only of a self-loop. The strong variant of the above game restricts the robbers moves in requiring that he stays in the same strongly connected of the graph V n.X \X 0 / induced by the non-blocked vertices. Theorem 2.5. Let G be a directed graph and k > 0. Then k cops have a winning strategy for the immutable cops and hot-plate strong visible robber game if and only if the cycle rank of G is at most k . To complete the regular expression lower bound toolkit based on Theorem 2.4, which relates the alphabetic width to star height, it appears to be tempting to code languages down to smaller alphabets while maintaining their star height. In fact, this can be done as demonstrated in [74] by star height preserving morphisms. Here a morphism  preserves star height, if the star height of each regular language L equals the star height of the homomorphic image .L/. In [87] a full characterisation of star height preserving morphisms was established. A very simple star height preserving morphism which is due to [146] is W ¹a1 ; a2 ; : : : ; ad º ! ¹a; bº , for d > 1, with .ai / D ai b d i C1 , for 1 6 i 6 d . It is useful for coding down to binary alphabets.

3. Transformation between models for regular languages This section is devoted to the descriptional complexity of simulation problems for some devices characterising the regular languages. We start with probably the most famous simulation problem in the world of finite automata, that is, the simulation of NFAs by DFAs known as the determinisation problem. Then we consider the relation between finite automata and regular expressions and vice versa. 3.1. Determinisation of nondeterministic finite automata. It is well known that for any NFA one can construct an equivalent DFA [158]. This so-called powerset construction, where each state of the DFA is associated with a subset of NFA states, turned out to be optimal in general for binary and larger alphabets. That is, the bound on the number of states necessary for the construction is tight in the sense that for an arbitrary n there is always some n-state NFA which cannot be simulated by any DFA with strictly less than 2n states [142] – see Figure 2. To this end one has to show that every state in the powerset automaton is reachable and that every two states are pairwise inequivalent. This result was (re)discovered several times in the literature, see, e.g., [150] and [151]. So, NFAs can offer exponential savings in the number of states compared with DFAs.

12. Descriptional complexity of regular languages

419

b a 1

b

2

a

3

a; b

:::

a; b

n- 1

a; b

n

a a

Figure 2. Lupanov’s NFA with n states, for n > 3, accepting a language over a two letter alphabet for which any DFA needs at least 2n states

Theorem 3.1 (NFA to DFA). Let n > 1 and A be an n-state NFA. Then 2n states are sufficient and necessary in the worst-case for a DFA to accept the language L.A/. For the particular cases of finite or unary regular languages the situation is significantly different. The conversion for finite languages over a binary alphabet was solved in [143] with a tight bound in the exact number of states. The general case of finite languages over a k -letter alphabet was shown in [163] with an asymptotically tight bound. Theorem 3.2 (NFA to DFA, finite languages). Let n > 1 and A be an n-state NFA n accepting a finite language over a binary alphabet. Then 2  2 2 1 if n is even, and n 1 1 if n is odd, states are sufficient and necessary in the worst-case for a DFA 32 2 to accept the language L.A/. If A accepts a finite language over a k -letter alphabet, n for k > 3, then ‚.k 1Clog2 .k/ / states are sufficient and necessary in the worst-case. n

Thus, for finite languages over a two-letter alphabet the cost is only ‚.2 2 /. The situation is similar when we turn to the second important special case, the unary languages. The general problem of evaluating the costs of unary automata simulation was raised in [167], and has led to emphasise some relevant differences with the general case. For state complexity issues of unary finite automata Landau’s function F .n/ D max¹ lcm.x1 ; : : : ; xk / j x1 ; : : : ; xk > 1 and x1 C    C xk D n º;

which gives the maximal order of the cyclic subgroups of the symmetric group on n elements, plays a crucial role. Here, lcm denotes the least common multiple. Since F depends on the irregular distribution of the prime numbers, we cannot expect to express F .n/ as a simple function of n. In [133] and [134] the asymptotic growth rate p lim .ln F .n/= n  ln n/ D 1 n!1

waspdetermined, which implies the (for our purposes sufficient) rough estimate F .n/ 2 2‚. nlog n/ . The following asymptotic tight bound on the unary NFA by DFA simulation was presented in [44] and [45]; see also [172].

420

Hermann Gruber, Markus Holzer, and Martin Kutrib

Theorem 3.3 (NFA to DFA, unary languages). Let n > 1 and A be an n-state NFA p ‚. nlog n/ accepting a unary language. Then 2 states are sufficient and necessary in the worst-case for a DFA to accept L.A/. Its proof is based on a normal form for unary NFAs introduced in [44]. Theorem 3.4 (Chrobak normal form for NFAs). Each n-state NFA over a unary alphabet can be replaced by an equivalent O.n2 /-state NFA consisting of an initial deterministic tail and some disjoint deterministic loops, where the automaton makes only a single nondeterministic decision after passing through the initial tail, which chooses one of the loops. For languages that are unary and finite it has been shown in [143] that nondeterminism does not help at all. Finite unary DFAs are up to one additional state as large as equivalent minimal NFAs. Moreover, it is well known that nondeterminism cannot help for all languages. For example, any NFA accepting Ln D ¹ w 2 ¹a; bº j jwj D i  n; for i > 0 º, for n > 1, has at least n states, and Ln is accepted by an n-state DFA as well. Recently the determinisation problem was studied also for some other subregular language families than finite and/or unary languages such as definite languages and variants thereof, star-free languages, locally testable languages, etc. in [21]. In most cases tight exponential bounds were shown. We have seen that for certain languages unlimited nondeterminism cannot help. On the other hand, for unary languages accepted by NFAs in Chrobak normal form, i.e., by NFAs that make at most one nondeterministic step in every computation, one can achieve a trade-off which is strictly less than 2n but still exponential. This immediately brings us to the question determining in which cases nondeterminism can help to represent a regular language succinctly. A model with very weak nondeterminism is a deterministic finite automaton with multiple initial states (MDFA) [62] and [173]. Here the sole guess appears at the beginning of the computation, i.e., by choosing one out of k initial states. So, the nondeterminism is not only limited in its amount but also in the situation at which it may appear. Converting an MDFA with k initial states into a DFA by the powerset construction shows immediately that any reachable state contains at most k states of the MDFA. This gives an upper bound for the conversion. In [95] it has been shown that this upper bound is tight. Theorem 3.5 (MDFA to DFA). Let n > 1 and A be an n-state MDFA with k initial P states, for 1 6 k 6 n. Then kiD1 ni states are sufficient and necessary in the worstcase for a DFA to accept L.A/. So, for k D 1 we obtain DFAs while for k D n we are concerned with the special case that needs 2n 1 states. Interestingly, NFAs can be exponentially concise over MDFAs. The following lower bound has been derived in [121]. Theorem 3.6 (NFA to MDFA). Let n > 1 and A be an n-state NFA. Then for an MDFA .2n / states are necessary in the worst-case to accept the language L.A/.

12. Descriptional complexity of regular languages

421

So far, we have only discussed DFAs with multiple initial states. For NFAs with multiple initial states (MNFA) it is known that they can be simulated by equivalent NFAs having one more state. The additional state is used as new sole initial state which is appropriately connected to the successors of the old initial states. On the other hand, in general this state is needed. For example, consider the language a [ b  , which is accepted by a 2-state MNFA but takes at least three states to be accepted by an NFA. The concept of limited nondeterminism in finite automata is more generally studied in [123]. There, a bound on the number of nondeterministic steps allowed during a computation as well as on the maximal number of choices for every nondeterministic step is imposed. While the maximal number of choices is three, the bound on the number of steps is given by a function that depends on the number of states. This implies that in any computation the NFAs can make a finite number of nondeterministic steps only. But the situations at which nondeterminism appear are not restricted a priori. The order of magnitude of the functions considered is strictly less than the logarithm, i.e., for a bounding function f we have f 2 o.log/. The upper bound for the costs of the conversion into a DFA follows by the powerset construction. Due to the restrictions any reachable state contains at most 3f .n/ states of the NFA. The next theorem summarises this observation and the lower bound shown in [123]. Theorem 3.7 (limited NFA to DFA). Let n > 1, A be an n-state NFA, andpf W N ! N P f .n/  P f . n/ O.pn/ be a function of order o.log/. Then 3i D0 ni states are sufficient and i2D0 i is a lower bound for the worst-case state complexity for a DFA to accept L.A/. P f .n/  Note that the upper bound 3i D0 ni is of order o.2n /, if f .n/ 2 o.log n/. The precise bound for the conversion is an open problem. Finally we come back to the roots of the subset construction and a very interesting recent development. In [106] the question was raised whether there always exists a minimal n-state NFA whose equivalent minimal DFA has ˛ states, for all n and ˛ satisfying n 6 ˛ 6 2n . A number ˛ not satisfying this condition is called a magic number. For NFAs over a two-letter alphabet some non-magic numbers were identified in [106] and [107]. Recently, in [113] it was shown that there are no magic numbers for languages over a three-letter alphabet. This improved a result from [57] for small growing alphabets and in turn for a four-letter alphabet [110]. Previously it was known that for exponential growing alphabets there are no magic numbers at all [111]. Magic numbers for unary NFAs were recently studied by revising the Chrobak normal form [58]. In the same paper also a brief historical summary of the magic number problem can be found. 3.2. From regular expressions to finite automata. The problem of converting regular expressions into small finite automata has been intensively studied for more than 50 years. The classical way is to construct the position automaton, or Glushkov automaton [64]. Intuitively, the states of this automaton correspond to the alphabetic symbols or, more precisely, to positions between subsequent alphabetic symbols in the regular expression. An accessible account on this construction is given, e.g., in [3] and [13].

422

Hermann Gruber, Markus Holzer, and Martin Kutrib

Given a regular expression of alphabetic width n > 0, the position automaton always has precisely nC1 states. Simple examples, such as the singleton set ¹an º, show that this bound is tight. Nevertheless, several optimisations give NFAs having often a smaller number of states, while the underlying constructions are mathematically sound refinements of the basic construction. A structural comparison of the position automaton with its refined versions, namely the so-called follow automaton [104] and the equation automaton, or Antimirov automaton [8], is given in [4] and [42]. Despite the mentioned optimisations, all of these constructions share the same problem with respect to the number of transitions. An easy upper bound on the number of transitions in the position automaton is O.n2 /, independent of alphabet size. It is not hard to prove that the position automaton for the regular expression rn D .a1 C /  .a2 C /    .an C /, has .n2 / transitions. It appears to be difficult to avoid such a quadratic blow-up in actual size if we stick to the NFA model. We will return to this point in a moment. But if we allow -transitions, the classical Thompson construction already yields a -NFA with a linear number of transitions [171]. Also Thompson’s classical construction went through several stages of optimisation [104] and [166]. After a preliminary bound of 32 n on the sum of the number of states and number of transitions [104] in terms of reverse polish length, an optimal construction 22 .nC1/C1 was presented in [80]. In [104] it is also asked to determine with a bound of 15 the optimal bound if the regular expression size is measured in terms of alphabetic width. Notice that the alphabetic width of a regular expression may be much smaller than its reverse polish length in general. A tight bound in terms of alphabetic width was found in [67] with the aid of a certain normal form for regular expressions. Definition 3.1. The operators ı and  are defined on regular expressions over alphabet A. The first operator is given by: aı D a, for a 2 A; .r C s/ı D r ı C s ı ; r ‹ı D r ı ; r ı D r ı ; finally, .rs/ı D rs , if  … L.rs/ and r ı C s ı otherwise. The second operator is given by: a D a, for a 2 A; .r C s/ D r  C s  ; .rs/ D r  s  ; r  D r ı ; finally, r ‹ D r  , if  2 L.r/ and r ‹ D r ‹ otherwise. The strong star normal form of an expression r is then defined as r  . An easy induction shows that the transformation into strong star normal form preserves the described language, and that it is weakly monotone with respect to all usual size measures. This definition was presented in [67] as a refinement of star normal form from [22]. Unfortunately, not every regular language has a unique strong star normal form. We will meet this useful notion again in the chapter on combinatorial enumeration of regular expressions (Chapter 13). Theorem 3.8 (RE to -NFA). Let n > 1, and r be a regular expression of alphabetic width n. Then size 22 5 n is sufficient for an equivalent -NFA accepting L.r/. In terms of .rpn.r/C1/C1. Furthermore, there are infinitely reverse polish length, the bound is 22 15 many languages for which both bounds are tight.

12. Descriptional complexity of regular languages

423

Returning to ordinary NFAs, recall that the position automaton for the regular expression rn has .n2 / transitions. Also if we transform the expression first into a -NFA and perform the standard algorithm for removing -transitions, see, e.g., [96], we obtain no better result. This naturally raises the question of comparing the descriptional complexity of NFAs over regular expressions. For about 40 years, it appears to have been considered as an unproven factoid that a quadratic number of transitions will be inherently necessary in the worst-case (cf. [100]). In the late 1990s, a barely superlinear lower bound of .n log n/ on the size of any NFA accepting the language of the expression rn was proved [100]. More interestingly, the main result of that paper is an algorithm transforming a regular expression of size n into an equivalent NFA with at most O.n  .log n/2 / transitions. In fact, this upper bound made their lower bound look reasonable at once! Later work [164] established that any NFA accepting L.rn / indeed must have at least .n.log n/2 / transitions. So the upper bound of O.n.log n/2 / from [100] is asymptotically tight. Theorem 3.9 (RE to NFA, large alphabet). Let n > 1 and r be a regular expression of alphabetic width n. Then size n  .log n/2 is sufficient for an equivalent NFA to accept L.r/. Furthermore, there are infinitely many languages for which this bound is tight. Notice that the example witnessing the lower bound is over an alphabet of growing  size. For alphabets of size two, the upper bound was improved to n  2O.log n/ , where log denotes the iterated binary logarithm [164]. Thus the question whether a conversion from regular expressions over a binary alphabet into NFAs of linear size is possible, is almost settled by now. Theorem 3.10 (RE to NFA, binary alphabet). Let n > 1 and r be a regular  expression of alphabetic width n over the binary alphabet. Then size n  2O.log n/ is sufficient for an equivalent NFA to accept L.r/. To conclude this subsection, we briefly discuss the problem of converting regular expressions into equivalent deterministic finite automata. Again, this problem has been studied by many authors. A taxonomy comparing many different conversion algorithms is given in [174]. Regarding the descriptional complexity, a tight bound of 2n C 1 in terms of alphabetic width is already given in [139]. The mentioned work also establishes a matching lower bound, but for a rather nonstandard definition of size. In terms of alphabetic width, the best lower bound known to date is from [53]. Together, we have the following result: Theorem 3.11 (RE to DFA). Let n > 1 and r be a regular expression of alphabetic width n over a binary alphabet. Then size 2n C 1 is sufficient for a DFA to accept L.r/. In contrast, for infinitely many n there are regular expressions rn of alphabetic width n n over a binary alphabet, such that the minimal DFA accepting L.rn / has at least 45 2 2 states.

424

Hermann Gruber, Markus Holzer, and Martin Kutrib

3.3. From finite automata to regular expressions. By analyzing the classical state elimination algorithm [147], one can easily derive that every finite automaton with n states can be converted into an equivalent regular expression of alphabetic width at most jAj  4n . In [51] a family of languages was exhibited, showing that finite automata, even deterministic ones, can be in fact exponentially more succinct than regular expressions. But it can be argued that these examples are rather artificial, since they make use of a largely growing alphabet (cf. [53] and [72]). The question whether a comparable size blow-up can also occur for constant alphabet size [53 ] was  settled q 

n

log n recently by two different groups of researchers. A lower bound of 2 for the succinctness gap between DFAs and regular expressions over binary alphabets was reported in [61], while a parallel effort [72] resulted in a tight lower bound of 2.n/ . We thus have the following result.

Theorem 3.12 (FA to RE). Let n > 1 and A be an n-state DFA or NFA over an alphabet of size polynomial in n. Then size 2‚.n/ is sufficient and necessary in the worst-case for a regular expression describing L.A/. This already holds for constant alphabets with at least two letters. We remark that the hidden constant in the lower bound obtained for binary alphabets is much smaller compared to the lower bound of .2n / previously obtained in [51] for large alphabets. This is no coincidence. Perhaps surprisingly, one can prove an upper bound of o.2n / when given a DFA over a binary alphabet [75]. Theorem 3.13 (DFA to RE, binary alphabet). Let n > 1 and A be an n-state DFA over a binary alphabet. Then size O .1:742n/ is sufficient for a regular expression describing L.A/. Such an expression can be constructed by state elimination in output polynomial time. Similar bounds, but with somewhat larger constants in place of 1:742, can be derived for larger alphabets. Moreover, the same holds for nondeterministic finite automata having a comparably low density of transitions. This result is derived using the classical state elimination algorithm. One simply has to choose a good elimination ordering, which is the sequence in which the states are eliminated. Finite automata having a simple structure in the underlying undirected graph tend to allow better elimination orderings. Having a relatively low number of transitions, DFAs over binary alphabets are nothing but one specific utterance of this more general phenomenon. The general theorem reads as follows. Theorem 3.14 (FA to RE, parameterised). Let n > 1 and A be an n-state NFA over alphabet A, whose underlying directed graph has cycle rank at most r . Then size jAj  4r  n is sufficient for a regular expression describing L.A/. Such an expression can be constructed by state elimination. This theorem is difficult to apply directly, since the decision problem associated with computing the cycle rank is NP-complete (cf. [20]), and might therefore be considered computationally infeasible. At the very least, often constructive upper bounds

12. Descriptional complexity of regular languages

425

on the cycle rank can be used to find good elimination orderings. Various heuristics for computing elimination orderings have been compared empirically on a large set of random DFAs as input in [78]. There it turned out that a simple greedy heuristic for choosing an elimination ordering [47] is most effective. At this point, we also mention that the claims made in another empirical study [1] deserve careful consideration. Two of the present authors scrutinised these claims again, but the reproduced data were not in accordance with the results claimed in [1]. For the particular cases of finite or unary regular languages, the situation is significantly different. Indeed, the case of finite languages was already addressed in the very first paper on the descriptional complexity of regular expressions [51]. They give a specialised conversion algorithm for finite languages, which is different from the state elimination algorithm. Their results imply that every n-state DFA accepting a finite language can be converted into an equivalent regular expression of size nO.log n/ . They also provide a lower bound of n.log log n/ when using an alphabet of size O.n2 /. The challenge of tightening this gap was settled in [79], where a lower bound technique from communication complexity is adapted, which originated in the study of monotone circuit complexity. Theorem 3.15 (FA to RE, finite languages). Let n > 1 and A be an n-state DFA or NFA over an alphabet A of size nO.1/ . Then size n‚.log n/ is sufficient and necessary in the worst-case for a regular expression describing L.A/. This still holds for constant alphabets with at least two letters. The case of unary languages was discussed in [53]. Here the main idea is that we can exploit the simple cycle structure of unary DFAs and of unary NFAs in Chrobak normal form. The main results are summarised in the following theorem. Theorem 3.16 (FA to RE, unary languages). Let n > 1 and A be an n-state DFA accepting a unary language. Then size ‚.n/ is sufficient and necessary in the worstcase for a regular expression describing L.A/. When considering NFAs, the upper bound changes to O.n2 /. 3.4. Other simulation results for models accepting regular languages. There are many other models for regular languages. For instance, two-way finite automata. One of the first simulation results of two-way devices (2DFA and 2NFA) by DFAs and NFAs dates back to [165] and [96]. Recent results and improvements can be found in [18] and [120]. We summarise these mostly tight bounds in Table 1. The question of how many states are sufficient or necessary to simulate (two-way) NFAs by 2DFAs is unanswered for decades. The problem was raised by Sakoda and Sipser in [161]. They conjectured that the upper bound is exponential. The best lower bound currently known is .n2 = log n/. It was proved in [12], where also an interesting connection with the open problem whether L equals NL is given. Furthermore, the Sakoda–Sipser problem for NFAs has been solved for the unary case. The tight bound in the order of magnitude is ‚.n2 /.

Hermann Gruber, Markus Holzer, and Martin Kutrib

426

The picture was complemented by the sophisticated studies in [149] which revealed tight bounds in the order of magnitude also for the 2NFA simulations by DFAs and NFAs. Table 1 also summarises the bounds known for the simulations between unary finite automata. Table 1. State complexities for simulations on general and unary regular languages. Depicted are the bounds for simulating the device of the first column by a device of the second line. A question mark indicates that only the trivial upper and lower bounds are known. General regular languages NFA 2DFA 2NFA

n.nn n X1 nX1  i D0 j D0

DFA

NFA

2DFA

2n

–  2n  nC1  2n  nC1

?

1/n /

.n n  n  i

NFA

2‚.

2DFA

2‚.

2NFA

2‚.

j

.2i

1/j

–  n2   6 log n

Unary regular languages DFA p

nlog n/

p p

nlog n/ nlog n/

2‚. 2‚.

NFA

2DFA

– p

‚.n2 /

p

nlog n/



nlog n/

?

Now we turn to a model that, in general, can accept all regular as well as nonregular languages. However, its unary variant captures the unary regular languages. So, the descriptional power of its resources are a natural field for investigations. Let k > 1 be a natural number. A deterministic one-way k -head finite automaton 1DFA.k/ is a deterministic finite automaton having a single read-only input tape whose inscription is the input word in between two endmarkers (we provide two endmarkers in order to have a definition consistent with two-way finite automata). The k heads of the automaton can move to the right or stay on the current tape square but not beyond the endmarkers. In [103] and [169] it is shown that every unary language accepted by a one-way multi-head finite automaton is semilinear and, thus, regular. The main results obtained in [128] are infinite proper unary hierarchies with respect to the number of states as well as to the number of heads (when the number of states is fixed). The simulation costs have been investigated in [129], where the complexity is measured again by the number of states. Below some of the results are summarised. For further results on the special case of finite and, in particular, singleton unary languages see [129], where it turned out that the simulation costs in this special case draw a different picture than in the general case. The costs for the simulation of a 1DFA.k/ by a DFA are bounded from above by O.n  F .t  n/k 1 / and from below by n  F .n/k 1 , where t is a constant depending only

12. Descriptional complexity of regular languages

427

p

on k . Since both bounds are of order e ‚. nln n/ , the trade-off for the simulation is tight in the order of magnitude. So, for the order of magnitude, the costs for the simulation of 1DFA.k/ by DFA are the same as for the simulation of NFA by DFA. From this point of view the two resources heads and nondeterminism have the same descriptional capacity. This raises immediately the question for the costs of the mutual simulations of 1DFA.k/ and NFA. Trading k heads for nondeterminism is known to yield polynomially larger state sets, where the degree of the polynomial depends on k . For constants k; n > 2 any unary n-state 1DFA.k/ can be simulated by some NFA with O.n2k / states. The lower bound is .nk / states. For the converse question, that is, the costs for the NFA by 1DFA.k/ simulation, the upperpbound naturally depends on the number k of heads  ˘ available. If k is at least t D 3C 28nC1 , then the upper bound is quadratic, otherwise p  ˘ superpolynomial. Let k > 1, n > 2 be constants, t D 3C 28nC1 , and M be a unary n-state NFA. Then 8 2 n 2 C F .n/ if k D 1; ˆ ˆ ˆ < t  t 2 C t d k e n0 6 n2 2 C n if 1 < k < t=2; ˆ 2 ˆ ˆ : 2 2n if k > t=2: states are sufficient for any equivalent 1DFA.k/. The lower bound reads as follows. Let k > 1 be a constant. For any integer m > 1 there is an integer n > m and a unary n-state NFA, such that r p k p 2np c2  e c1 ln. 2n/

states are necessary for any equivalent 1DFA.k/, where c1 ; c2 > 0 are two constants. Next, we discuss some results on a fairly new automaton model, the restarting automata, introduced in [108], in order to model the so-called “analysis by reduction,” which is a technique used in linguistics to analyze sentences of natural languages that have a high degree of free word order. The technique consists of stepwise simplification of an extended sentence such that the (in)correctness of the sentence is not affected. A restarting automaton is a finite state device that works on a flexible tape. Attached to the automaton is a read-write (or lookahead) window of fixed size. The automaton works in several cycles. In one cycle it moves the window from left to right along the tape. Depending on the current state and the current content of the window the automaton can continue to scan the input, can rewrite the window content by some shorter string, can accept the input, can halt without accepting, or can restart, i.e., start a new cycle by placing the window back on the left end of the tape and resetting to the initial state. The variants of classical restarting automata that accept exactly the regular languages are (non)deterministic R(1)- and deterministic RR(1)-automata, that is, devices with window size one that have to restart immediately after a delete step (R(1)) and deterministic devices that may continue to read after a delete step (det-RR(1)). Deterministic R(1)-automata are nothing else than DFAs and, so, are not studied separately. These variants are proved and their simulations are studied in [130] and [131]. Table 2 summarises the results. All bounds are tight for constant alphabets.

Hermann Gruber, Markus Holzer, and Martin Kutrib

428

Table 2. Depicted are the trade-offs when changing from an automaton of the leftmost column to an automaton of the first row. The number of states reachable after a delete step is denoted by k . General regular languages DFA

NFA

R(1)

NFA

2n



n

2n

1

R(1)

2n C 1

2n



2n

1

det-RR(1) O..n

1/Š/ .2n

k C 2/  2k

1

.2n

k C 2/  2k

det-RR(1)

1



Finally, we discuss the so-called limited automata introduced in [89]. A k -limited automaton is a linear bounded automaton that may rewrite each tape cell only in the first k visits, where k > 0 is a fixed constant. It is shown in [89] that the nondeterministic variant characterises the context-free languages provided k > 2, while there is a tight and strict hierarchy of language classes depending on k for the deterministic variant. Recently, the study of limited automata from the descriptional complexity point of view has been initiated in [154] and [155], where it was also shown that the deterministic 2-limited automata characterise the deterministic context-free languages, which complements the result on nondeterministic machines. Already in [89] is was proved that 1-limited automata, deterministic and nondeterministic, characterise the regular languages. From these results it follows that any unary k -limited automaton accepts regular languages only. The results concerning the simulations of nondeterministic 1-limited automata obtained in [154] rely on the following witness languages. For n > 1, let Ln be the language of all words over alphabet ¹0; 1º consisting of the concatenation of blocks of length n, such that at least n blocks coincide: Ln D ¹ x1 x2    xm j m > 0; x1 ; x2 ; : : : ; xm 2 ¹0; 1ºn;

there exist i1 ; i2 ; : : : ; in 2 ¹1; 2; : : : ; mº such that i1 < i2 <    < in ; xi1 D xi2 D    D xin º: 2

n2

First, the general upper bounds n  2n (2n2 ) for the nondeterministic 1-LA by NFA (by DFA) simulation has been derived. Furthermore, if the 1-LA is deterministic then an equivalent DFA with no more than n  .n C 1/n states can be obtained. It is worth mentioning that these upper bounds do not depend on the size of the tape alphabet of the given 1-LA, but only on its number of states. For the tightness of these bounds it has been proved that  Ln is accepted by a nondeterministic 1-LA with O.n/ states and a fixed tape alphabet, n  Ln is accepted by a DFA with .2n 1/  n2 C n states, and each DFA   n 2 accepting Ln needs at least .2n 2/  n n 1 C 1  n2 C 1 states,

12. Descriptional complexity of regular languages

429

 Ln is accepted by an NFA, a two-way DFA, and a two-way NFA with at most .2n 1/  n  2n C 1 states, and each NFA, two-way DFA, two-way NFA, and deterministic 1-LA accepting Ln needs a number of states exponential in n. In particular, each NFA accepting Ln needs n2  2n many states.

These results also imply that the simulation of nondeterministic 1-LA by deterministic 1-LA requires exponentially many states. The investigations in [132] are devoted to simulations of unary k -limited automata. Since all the unary languages accepted by these automata are necessarily regular, also the simulation costs are studied when a k -limited automaton is simulated by finite automata. In particular, if n > 2 is a prime number, then there is a unary 4n-state and n C 1 tape symbol deterministic 1-LA, such that n  F .n/ states are necessary for any equivalent 2NFA, where F denotes Landau’s function. Moreover, the witness language for this result is a singleton language. Therefore, the lower bound holds for 2DFA, DFA, and NFA as well. So, even the ability deterministically to rewrite any cell only once gives an enormous descriptional power. For the simulation cost for removing the ability to rewrite each cell k > 1 times, that is, the cost for the simulation of deterministic k -limited automata by deterministic finite automata, a lower bound of n  F .n/k has been obtained. Interestingly, this lower bound is greater than the lower bound n  F .n/k 1 known for the simulation of unary one-way k -head finite automata. Moreover, it also holds for sweeping limited automata, where sweeping means that the direction of the head movement changes only on the endmarkers. In the sweeping case, an upper bound for the simulation is only polynomial. More precisely, let M be a k 2 C3kC2 / states are sufficient for deterministic unary n-state sweeping k -LA. Then O.n 2 a 2DFA to accept the language L.M /. From the resulting unary 2DFA an upper bound for one-way devices can be derived by the known bounds for removing the two-way head movement. A lower bound is nkC1 for the simulation of a deterministic unary sweeping .n C 2/-state, .2k C 1/-tape-symbol k -LA by some 2NFA, 2DFA, NFA, or DFA. A rotating k -LA is a sweeping k -LA whose head is reset to the left endmarker every time the right endmarker is reached. So, the computation of a rotating machine can be seen as on a circular input with a marker between the last and first symbol. For deterministic unary rotating k -LA the upper bound reduces to O.nkC1 /. This upper bound is tight in the order of magnitude for simulations by 2NFA, 2DFA, NFA, and DFA.

4. Operations on regular languages The operation problem of DFAs for a regularity preserving binary operation ı is defined as follows:  given two DFAs A and B of sizes n and m;  which size is sufficient and necessary in the worst-case (in terms of n and m) for a DFA to accept the language L.A/ ı L.B/?

430

Hermann Gruber, Markus Holzer, and Martin Kutrib

Obviously, this problem generalises as well to unary language operations like, for example, complementation. Moreover, it also generalises to other devices such as, e.g., NFAs, 2DFAs, 2NFAs , or regular expressions, etc. As implied by the definition here we deal with the language operation problem in terms of worst-case complexity. Estimations of the average complexity are considered in [70] and [152]. 4.1. The language operation problem for finite automata. We start to treat the language operation problems for NFAs and DFAs, where the notion state complexity is used to express that the size of the finite automata is measured by their number of states. These problems are closely related to complexity issues discussed so far. For example, converting a given NFA to an equivalent DFA gives an upper bound for the NFA state complexity of complementation. First observations concerning basic operation problems of DFAs can be found in [144], where tight bounds for some operations are stated without proof. In [140] the tight bound of 2n states for the DFA reversal was obtained in connection with boolean automata. After the dawn the research direction was revitalised in [177]. Recent surveys of results with regard to DFAs are [175] and [176], where also operations on unary regular languages are discussed. In [25] an automaton-independent approach, called quotient complexity, that is based on derivatives of languages is presented, which turned out to be a very useful technique for proving upper bounds for DFA operations (cf. [23], [24], and [32]). A systematic study of language operations in connection with NFAs is [93]. The operation problem for 2DFAs has been investigated recently in [115]. An overview on the state complexity of individual operations on regular languages is given in [55]. 4.1.1. NFAs and DFAs accepting infinite regular languages. The bounds for some basic operations on DFAs and NFAs accepting infinite general and unary regular languages are summarised and compared in Table 3. While the upper bound for the union and intersection of DFAs is shown by the well-known cross-product construction [177], the upper bound for the NFA union is based on the idea to construct an NFA that starts with a new initial state and guesses which of the given automata is to simulate. For the intersection both given automata have to be simulated in parallel. In the unary cases the lower bound for the DFA union and intersection [177] as well as for the NFA intersection additionally requires that m and n are relatively prime. The unary bound for NFA union is tight if neither m is a multiple of n nor n is a multiple of m [93]. In [156] unary languages are studied whose deterministic state complexities are not relatively prime. In order to give an impression of the details of the results we exemplarily state the tight bound for NFA union [93]. Theorem 4.1. For any integers m; n > 1 let A be an m-state and B be an n-state NFA. Then m C n C 1 states are sufficient and necessary in the worst-case for an NFA to accept the language L.A/ [ L.B/.

12. Descriptional complexity of regular languages

431

In connection with nondeterminism the complementation often plays a crucial role. In fact, compared with DFAs the complementation of NFAs is expensive. Since the complementation operation on DFAs neither increases nor decreases the number of states (simply exchange accepting and rejecting states), we obtain the upper bounds for the NFA complementation by determinisation, i.e, 2n states [142], [150], and [151]. For the lower bound, in [161] an example of languages over a growing alphabet size is given which reaches the upper bound 2n . In [17] the result for a three-letter alphabet was claimed. Later in [19] this was corrected to a four-letter alphabet. Moreover, O.n/-state binary witness languages were found in [52]. In [93] the lower bound 2n 2 is achieved for a two-letter alphabet and finally by a fooling set technique the bound 2n on the complementation of NFAs was proven to be tight for a two-letter alphabet [112]. The [45] it has been shown that for any unary case is, again, different for NFAs. In [44] and p unary n-state NFA there exists an equivalent .2‚. nlog n/ /-state deterministic finite automaton, and in [93] it is shown that this is a tight bound in the order of magnitude for the unary NFA complementation. More detailed results on the relation between the sizes of unary NFAs and their complements are obtained in [148]. In particular, if a unary language L has a succinct NFA, then nondeterminism cannot help to recognise its complement, namely, the smallest NFA accepting the complement of L has as many states as the minimal DFA accepting it. The same property does not hold in the case of automata and languages defined over larger alphabets. Table 3. NFA and DFA state complexities for operations on infinite languages, where t is the number of accepting states of the “left” automaton, n denotes the left and = the right quotient by an arbitrary language. The tight lower bounds for union, intersection, and concatenation of unary DFAs require m and n to be relatively prime. Infinite languages NFA

DFA

general

unary

general

unary

mCnC1

mn



mCnC1 p 2‚. nlog n/

mn

2n

n

n

\

mn

mn

mn

mn

R

nC1

n

2n

n



mCn



nC1

nC1

C

n

n

[

mCn

1 6  6 m C n m2n

n

n

=

n

t2n

3  2n

3  2n

2

2

2n

mn .n

1 1

n

1

1/2 C 1 1/2

.n n n

Hermann Gruber, Markus Holzer, and Martin Kutrib

432

Now we turn to the concatenation, iteration, and -free iteration in more detail. In the early paper [160] the lower bound 2n 1 was obtained for the concatenation of a twostate DFA and an n-state DFA as well as for the iteration of an n-state DFA. The tight bounds for the DFA operations are shown in [177] and [109], where the concatenation in the unary case requires m and n to be relatively prime, and the lower bound in the general case is reached for all 0 < t < m and a two-letter alphabet [109] and [112]. For the NFA concatenation, a tight bound of m C n is achieved for a two-letter alphabet in the general case [93]. In the unary case, for any integers m; n > 1, the lower bound mCn 1 misses the upper bound mCn by one state. It is currently an open question how to close the gap by more sophisticated constructions or witness languages. The bound 2n 1 for the DFA iteration found in [160] is improved to the tight bound 2n 1 C 2n 2 in [177], where the unary case is also covered. The trivial difference between iteration and -free iteration concerns the empty word only. Moreover, the difference does not appear for languages containing the empty word. Nevertheless, in the worst-case the difference costs one state for NFAs. In particular, the tight bounds n C 1 and n for the iteration and -free iteration for general and unary languages were shown in [93]. So, roughly speaking, concatenation operations are efficient for NFAs. Again, this is essentially different from DFAs. Now consider the reversal operation. The bounds for unary languages are trivial. For general DFAs a tight bound of 2n states for the reversal has been shown in [140]. The operation is studied in more detail in [162]. The efficient bound n C 1 for NFAs shows once more that nondeterminism is a powerful concept. The bound is tight [93] even for a two-letter alphabet [112]. It is worth mentioning that the tight bounds for DFAs on the operation problem for union, intersection, concatenation, Kleene star, and reversal is met by the languages Un over a three-letter alphabet, which was shown in [26]. Therefore the languages Un are called universal witness languages. The automaton accepting Un is depicted in Figure 3. c

b; c

c

b; c

b

a; b 1

b

2

a

3

a

:::

a

n- 1

a

n

a; c

Figure 3. Brzozowski’s DFA with n states, for n > 3, accepting the “universal witness” language Un over a three-letter alphabet which accepts the most complex regular language

Although the results on the universal witness languages are sometimes not optimal with respect to the number of alphabet symbols, they are without doubt quite remarkable. In fact, the universal witness languages Un have even more remarkable and nice

12. Descriptional complexity of regular languages

433

properties from a descriptional complexity point of view. Maximal bounds are met for the number of atoms, the quotient complexity of atoms, the size of the syntactic semigroup, and about two dozen combined operations, where only a few require slightly modified versions of the universal witness languages. Thus, the languages Un can also be seen as the the most complex regular languages. For more applications of the universal witness languages we refer to [33] and [34]. Here the syntactic complexity of a regular language is defined as the cardinality of the syntactic semigroup of the language, which is induced by the equivalence classes of the syntactic congruence. This semigroup is isomorphic to the semigroup of transformations of the set of states of the minimal deterministic automaton recognising the language, where these transformations are performed by non-empty words. It is well known that the size of the syntactic semigroup of a regular language accepted by an n-state DFA can be at most nn and that this bound is reachable by DFA over a three-letter alphabet. To our knowledge the first study on the syntactic complexity of regular languages was undertaken p in [92] and [127], where the two-letter case was settled. It was shown that nn .1 2= n/ is a lower bound for the size of the syntactic semigroup of a language accepted by an nstate DFA with binary input alphabet. Thus, this bound is asymptotically close to nn as n tends to infinity. Moreover, a precise characterisation of the maximal size semigroup among all semigroups generated by two letters is given. Since then, there is a vast amount of literature dealing with the syntactic complexity of growing regular subfamilies such as, e.g., the reversible regular languages [11], trivial languages [29], star-free regular languages [30] and [36], variants of ideal languages [37], and prefix-, suffix-, bifix-, and factor-free regular languages [31] and [35], to mention only a few. For the definitions of some of these subregular language families we refer to subsections to come. After this slight detour on the syntactic complexity of regular languages, we come back to the operations problem on finite automata for some further regularity preserving operations. We continue with the quotient operation. The left quotient of a language R by a set L is LnR D ¹ w j vw 2 R, v 2 L º, and the right quotient R=L is defined analogously. The state complexities of both operations are studied in [177]. It turned out that 2n 1 is a tight bound for the left quotient of an n-state DFA R by an arbitrary language L. For the right quotient of an n-state DFA R by an arbitrary language L the tight bound n is obtained, where the upper bound is reached for a unary alphabet. Since for unary languages there is no difference between left and right quotient, the bound n is also tight for the unary left quotient. Recently, several other regularity preserving operations were investigated. Let us mention some of them in more detail. The results are summarised in Table 4. We start with the cyclic shift of a language L which is defined as ˚.L/ D ¹ vu j uv 2 L º. Its state complexity is studied in [116], where for DFAs the upper bound .n2n 2n 1 /n and the lower bound .n 1/Š  2.n 1/.n 2/ are derived for alphabets with 2 at least four letters. This implies an asymptotically tight bound of 2n Cn log.n/ O.n/ . For 2 two and three letters the number of states evaluates to 2‚.n / . For NFAs 2n2 C 1 states is tight in the exact number of states. 
For unary languages the cyclic shift boils down to the language itself.

Hermann Gruber, Markus Holzer, and Martin Kutrib

434

Table 4. NFA and DFA state complexities for further operations on infinite languages. The lower bounds for dow n and u p are for constant three-letter alphabets. The upper bounds for dow n and u p are tight for a growing alphabet of size n 2. Infinite languages NFA ˚

2n2

C 1 .n

Lk

nk

ш

mn

up

n

dow n

n

p r ef

n

inf

nC1

su ff

nC1

DFA 1/Š  2.n 1/.n 2/ k/2.k

.n

1/.n k/

2.m 1/.n pn 1 2n 54 p

2

6  6 .n2n

1/ 3 4

2nC30 6

2n

6  6 n2.k

6  6 2mn

66 66

2n 2

2n 2

1 /n

1/n

1 C1 C1

n 2n 2n

1

1

The second operation concerns k -powers of languages [48]. Let A be an n-state NFA and let k > 2 be a constant. Then n  k states are sufficient to accept the language L.A/k by an NFA. The bound is tight for a two-letter alphabet. The situation for DFAs is more sophisticated. It is shown that n2.k 1/n states are sufficient and .n k/2.k 1/.n k/ states is a lower bound on the worst-case complexity for an alphabet with at least six letters. This implies an asymptotically tight bound of ‚.n2.k 1/n /. In particular, for k D 3 and a four-letter alphabet the bound 6n8 3 4n .n 1/2n n is derived. For the special case k D 2 a tight bound of n2n 2n 1 was found in [159], which coincides with the bound for concatenation. Next we turn to the shuffle. The shuffle x ш y of two words x and y over an alphabet A is the set ¹ x1 y1 x2 y2    xn yn j x D x1    xn ; y D y1    yn ; xi ; yi 2 A ; 1 6 i 6 n; n > 1 º: The shuffle of L1 ; L2  A is ¹ w j w 2 x ш y for x 2 L1 and y 2 L2 º. The shuffle of an m-state and an n-state DFA language is clearly accepted by an .mn/-state NFA. The tightness of the NFA bound follows from a result in [41]. In the same paper the lower bound 2.m 1/.n 1/ and the upper bound 2mn 1 are shown for DFAs over a five-letter alphabet. It is currently an open question how to close the gap by more sophisticated constructions or witness languages. In the unary case the shuffle boils down to concatenation.

12. Descriptional complexity of regular languages

435

Two further operations are based on results of Higman [90] and Haines [81], where it is shown that the set of all scattered subwords, i.e., the Higman–Haines set

d ow n.L/ D ¹ v 2 A j there exists w 2 L such that v 6 w º;

and the set of all words that contain some word of a given language, i.e., the Higman– Haines set

u p.L/ D ¹ v 2 A j there exists w 2 L such that w 6 v º;

are both regular for any language L  A . Here, 6 refers to the scattered subword relation. Clearly, d ow n.L/ and u p.L/ cannot be obtained constructively in general. This is obvious, because L is empty if and only if d ow n.L/ and u p.L/ are empty, but the question whether or not a language is empty is undecidable for recursively enumerable languages. But it is not hard to see that for regular languages the construction becomes effective. The tight bound of n for applying d ow n or u p to an NFA is shown in [76]. Therefore, the trivial upper bound of 2n for DFAs follows. The lower bounds for DFAs p are studied in [24], [77], and [153]. Basically, inp[77] the lower bound 2‚. n log2 .n// for d ow n and u p is shown using an alphabet with n letters. The lower bound for d ow n n has been increased to 2 2 2 at the cost of an exponentially growing alphabet in [153]. In the same paper the upper bound for u p has been improved to 2n 2 C1, which is tight for alphabets with at least n 2 letters. For d ow n the same upper bound is shown in [24]. Again, it is tight for alphabets with at least n 2 letters. These results immediately raise the question for the bounds in the case of a constant alphabet. For three-letter alphabets p 2nC30 6 the lower boundpfor DFA d ow and the lower bound for DFA qn is  at least 2 n 3 n -th Catalan number, see [153]. u p is at least 51 4 2 n 4 , the 2 We conclude the subsection with the three operations p r ef, i n f and s u f f, where p r ef.L/ D ¹u j uv 2 L; u; v 2 A º, i.e., the language of all prefixes of words in L and, similarly, i n f.L/ D ¹v j uvw 2 L; u; v; w 2 A º the language of all infixes (factors) and s u f f.L/ D ¹w j vw 2 L; v; w 2 A º the language of all suffixes. The state complexity of these operations has been studied in [119]. The bounds are tight for a two-letter alphabet. 4.1.2. NFAs and DFAs accepting finite languages. This subsection is devoted to the important special case of finite languages. It turns out that the exact state complexity of several operations depends on structural properties of the given automata, for example, the number of accepting states or the number of input symbols. So, in order to obtain matching lower bounds, more parameters than the number of states of the given automata have to be considered. Occasionally, tight bound are known only for standard parameters. In this cases the complexities summarised in Table 5 show the situations for which the bounds are tight. More general and sophisticated results are referred to in the text. The tight bounds for union, intersection, and concatenation for finite unary languages are found as follows: in [143] it was shown that unary DFAs up to one additional state are as large as equivalent minimal NFAs and that they obey a chain structure.

Hermann Gruber, Markus Holzer, and Martin Kutrib

436

Table 5. NFA and DFA state complexities for operations on finite languages, where k is the size of the input alphabet. The lower bound for the union and intersection of DFAs require min¹m; nº to be unbounded. They are for constant two-letter alphabets and are tight for unbounded alphabets. The tight lower bounds for union, intersection, and concatenation of unary DFAs require m and n to be relatively prime. The bounds for DFA reversal and concatenation are for an alphabet fixed to two letters. In addition, for concatenation m C 1 > n > 2 is required. More detailed bounds for the reversal, concatenation and iteration are discussed in the text. Finite languages NFA general [

mCn

DFA unary

2

general .min¹m; nº2 /

max¹m; nº

6  6 mn

n



‚.k 1Clog2 k /

nC1

\

O.mn/

min¹m; nº

R

n

n



mCn



n

C

1 1

n

mCn n n

.m C n/

max¹m; nº

n

1 1

unary

n

.min¹m; nº2 / ´

6  6 mn

3  2p 1 2p 1 .m

3.m C n/ C 12

if n D 2p; if n D 2p 1

1

n C 3/2n 2n

3

min¹m; nº

C 2n n

2 4

1

n mCn n2

2

7n C 13 n

An immediate consequence is that we have only to consider the longest words in the languages in order to obtain the state complexity of operations that preserve finiteness. The results for NFAs are basically from [93]. The situation for the NFA complementation of finite languages boils down to the determinisation problem, and the tight bound n ‚.k 1Clog2 k / for languages over a k letter alphabet proven in [163]. Since the complementation applied to finite languages yields infinite languages, for the lower bounds of unary languages we cannot argue with the simple chain structure as before, but obtain a tight bound of n C 1 [93]. The bound for the reversal of finite NFA languages is in some sense strong. It is sufficient and reached for all finite languages. For the more or less immediate results concerning DFAs accepting finite unary languages we refer to [40], and turn to the remaining bounds for DFAs accepting general finite languages. The results for union and intersection are from [82]. In particular, the upper bounds are derived by structural properties of the accepting automata. These

12. Descriptional complexity of regular languages

437

upper bounds are tight for alphabets whose sizes depend on m and n. The lower bound ..min¹m; nº/2 / for both operations is for a constant alphabet with two letters. The remaining operations reversal, concatenation and iteration have been dealt with in [40], where the following detailed bounds are shown. The precise state complexity for reversal depends also on the size k of the alphabet. Let r be the smallest number P so that 2n 1 r 6 k r . Then the upper bound for the reversal is irD01 k i C 2n 1 r . For k D 2 we conclude 3  2p 1 1 if n D 2p; or 2p 1 if n D 2p 1, which is tight. For the concatenation, a careful elaboration revealed the bulky upper bound ² X  t 1 ³ ² ³ m i  t  X2 X n 2 X n 2 n 2 min k i ; ; C min k m 1 ; ; j j j i D0

j D0

j D0

j D0

where k is again the size of the alphabet and t is the number of accepting states of the “left” automaton. If t > 0 is a constant, this bound is of order O.mnt 1 C nt /, and for k D 2, m C 1 > n > 2 we simply obtain .m n C 3/2n 2 1. In fact, the latter bound is known to be tight. Concerning the iteration of a DFA having a single accepting state only, one immediately obtains a tight bound of n 1. So, we assume that the number of accepting states t is at least two. If the initial state is not accepting, then an upper bound for the DFA iteration is 2n 3 C 2n t 2 . The maximum for this formula is reached for t D 2, it is 2n 3 C 2n 4 . If the initial state is accepting, then 2n 3 C 2n t 1 is an upper bound whose maximum is reached for t D 3, it is again 2n 3 C 2n 4 . Furthermore this bound is tight for a three-letter alphabet.

4.1.3. Further types of finite automata and subregular language families. So far, we considered two important subfamilies of the regular languages as special cases, namely, finite and unary languages. In connection with coding theory the properties prefix-freeness and suffix-freeness are fundamental. Since codes are closely related to formal languages, prefix-free and suffix-free regular languages are worth studying. A language L is said to be prefix-free (suffix-free) if any u 2 L is not a proper prefix (suffix) of any other word in L. If L is a prefix-free regular language then its reversal LR is suffix-free by definition. Moreover, in this case the initial state of the DFA accepting LR is non-returning (has no in-transitions). However, this is a necessary but not a sufficient condition for suffix-freeness. So, prefix-free and suffix-free languages are not symmetric. The results on their operational state complexity are summarised in Table 6. The deterministic state complexities on prefix-free regular languages have been examined in [84], the nondeterministic case is investigated in [85]. The results on suffix-free regular languages are from [83] and [117]; all bounds are tight for a constant alphabet. Closed languages are somehow the counterpart of free languages. A language L is prefix-closed if L D p r ef.L/, infix-closed if L D i n f.L/, suffix-closed if L D s u f f.L/, and subword-closed if L D d ow n.L/. In the literature infix-closed languages are sometimes called factorial. In [24] the deterministic state complexity of operations on closed languages is investigated in an automaton-independent way. To this end, the number of different derivatives of a language is counted. Since each


Table 6. NFA and DFA state complexities for operations on prefix-free and suffix-free regular languages.

Since each derivative determines a left quotient uniquely, the number of different derivatives is an upper bound on the number of left quotients. In turn, the number of different left quotients gives the number of states of the minimal DFA. Since different derivatives can determine the same quotient, this method cannot be used to derive lower bounds. However, it is often an elegant way to obtain upper bounds, and all of the bounds summarised in Table 7 are shown to be tight in [24]. The natural generalisation of infix-closed languages to subword-closed languages, and the close relation of the latter to the Higman–Haines set $\mathrm{down}$, yields an interesting perspective. Moreover, closed languages are related to ideal languages as follows [6]: every non-empty regular language $L$ is a right (left, two-sided, all-sided) ideal if and only if the complement of $L$ is a prefix- (suffix-, infix-, subword-) closed language. Right (left, two-sided) ideal languages often appeared in the literature as reverse ultimate definite (ultimate definite, central definite) languages, where $L \subseteq A^*$ is a right (left, two-sided, all-sided) ideal language if it satisfies $L = LA^*$ ($L = A^*L$, $L = A^*LA^*$, $L = A^* \mathbin{ш} L$). All the bounds summarised in Table 8 are shown to be tight in [23].
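For finite languages the derivative-counting argument can be carried out explicitly, since every derivative is again a finite set of words. The following sketch (Python; illustrative only) enumerates the distinct derivatives reachable from a finite language; for this explicit representation, derivatives and left quotients coincide, so the count equals the size of the minimal complete DFA.

```python
def derivative(language, a):
    """Left derivative a^{-1} L = { w : a w in L }."""
    return frozenset(w[1:] for w in language if w and w[0] == a)

def count_derivatives(language, alphabet):
    """Number of distinct derivatives u^{-1} L over all words u, found by
    closing the language under single-letter derivatives."""
    start = frozenset(language)
    seen, stack = {start}, [start]
    while stack:
        L = stack.pop()
        for a in alphabet:
            D = derivative(L, a)
            if D not in seen:
                seen.add(D)
                stack.append(D)
    return len(seen)

print(count_derivatives({'ab', 'aa', 'b'}, 'ab'))  # 4: the minimal complete DFA has 4 states
```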


Table 7. NFA and DFA state complexities for operations on closed regular languages, where t is the number of accepting states of the "left" automaton. In the unary case all classes coincide.

Table 8. NFA and DFA state complexities for operations on ideal languages. In the unary case all classes coincide.

Concerning ideal languages, the generators of a language are interesting, i.e., the regular languages $G$ such that, for example, a right ideal language $L \subseteq A^*$ can be written as $L = GA^*$; similarly for the other types of ideal languages. The state complexity from the viewpoint of generators has been investigated in [23]. That is, given an arbitrary $n$-state DFA language $G$, how many states are sufficient and necessary in the worst case for, say, the right ideal language $GA^*$ generated by $G$? The same question can be asked for the situation in which the state complexity of the generator is minimal. In almost all cases tight exponential bounds are found in [23].
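On the automaton level the generator viewpoint is easy to illustrate: given an NFA for $G$, adding a loop on every input letter at every accepting state yields an NFA for the right ideal $GA^*$. A small sketch (Python; the NFA encoding and the example are our own illustrative assumptions) follows.

```python
def right_ideal_nfa(alphabet, delta, finals):
    """Given NFA transitions for G (delta: (state, letter) -> set of states),
    return transitions for G A*: every accepting state gets a loop on every letter."""
    new_delta = {k: set(v) for k, v in delta.items()}
    for f in finals:
        for a in alphabet:
            new_delta.setdefault((f, a), set()).add(f)
    return new_delta

def accepts(delta, initial, finals, word):
    current = {initial}
    for a in word:
        current = {q for s in current for q in delta.get((s, a), ())}
    return bool(current & finals)

# G = {ab}: 0 -a-> 1 -b-> 2 (accepting)
delta = {(0, 'a'): {1}, (1, 'b'): {2}}
ideal_delta = right_ideal_nfa('ab', delta, {2})
print(accepts(ideal_delta, 0, {2}, 'abba'))  # True: the prefix 'ab' lies in G
print(accepts(delta, 0, {2}, 'abba'))        # False in the original NFA for G
```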


4.2. The language operation problem for regular expressions. We can of course ask similar questions for regular expressions. However, determining the descriptional complexity of various basic language operations remained largely open until recently. At least, some preliminary results were presented in [53], but there remained a stunningly large gap between the lower and upper bounds on the effect of complementation and intersection. With the advent of new lower bound techniques [61] and [72], the picture has changed. The main tool for proving lower bounds in this area turns out to be the star height lemma (Theorem 2.4). Concerning the complementation problem for regular expressions, the naive algorithm, which first converts the given expression into an NFA, determinises, complements the resulting DFA, and finally converts back to a regular expression, gives a doubly exponential upper bound of $2^{2^{O(n)}}$. This bound turns out to be asymptotically tight [61], already for binary alphabets [59] and [74].

Theorem 4.2 (¬RE to RE). Let $n \ge 1$ and let $r$ be a regular expression of size $n$ over an alphabet $A$ with at least two letters. Then size $2^{2^{\Theta(n)}}$ is sufficient and necessary in the worst case for a regular expression describing the complement $A^* \setminus L(r)$. For unary alphabets, the tight bound reads as $2^{\Theta(\sqrt{n\log n})}$.

The complementation problem for regular expressions over unary alphabets was already settled in [53]. How to prove such lower bounds? For the intersection operation, we know a particularly simple example, which we briefly illustrate. Consider the intersection of the languages $K_m = \{\, w \in \{a,b\}^* : |w|_a \equiv 0 \bmod m \,\}$ and $L_n = \{\, w \in \{a,b\}^* : |w|_b \equiv 0 \bmod n \,\}$. By the star height lemma, this boils down to proving a lower bound of $m = \min\{m,n\}$ on the star height of this language. The minimal DFA accepting $K_m \cap L_n$ is bideterministic, and thus by McNaughton's theorem the star height of $K_m \cap L_n$ is equal to the cycle rank of the underlying directed graph, which is isomorphic to the directed $(m \times n)$-torus. Proving that the cycle rank of this digraph is at least $m$ is a graph-theoretic problem of medium difficulty, see, e.g., [66].

Theorem 4.3 (RE ∩ RE to RE). Let $n \ge m \ge 1$ and let $r$ and $s$ be regular expressions of size $n$ and $m$, respectively, over an alphabet with at least two letters. Then size $2^{\Omega(m)}$ is necessary in the worst case for a regular expression describing $L(r) \cap L(s)$.
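The cycle rank used in this lower-bound argument can be computed directly from Eggan's recursive definition, at least for small digraphs. The following sketch (Python; our own illustration with exponential running time, not an algorithm from the cited works) evaluates it on the directed torus from the example above for m = n = 2.

```python
def reachable(adj, src):
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def sccs(nodes, adj):
    """Strongly connected components via mutual reachability (fine for tiny graphs)."""
    reach = {u: reachable(adj, u) for u in nodes}
    comps, assigned = [], set()
    for u in nodes:
        if u not in assigned:
            comp = {v for v in nodes if v in reach[u] and u in reach[v]}
            comps.append(comp)
            assigned |= comp
    return comps

def cycle_rank(nodes, edges):
    """Eggan's cycle rank: 0 for acyclic digraphs; 1 + min over vertex deletions
    for strongly connected digraphs; otherwise the maximum over the SCCs."""
    nodes = frozenset(nodes)
    edges = {(u, v) for (u, v) in edges if u in nodes and v in nodes}
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    comps = sccs(nodes, adj)
    if all(len(c) == 1 and not any((u, u) in edges for u in c) for c in comps):
        return 0
    if len(comps) == 1:
        return 1 + min(cycle_rank(nodes - {v}, edges) for v in nodes)
    return max(cycle_rank(frozenset(c), edges) for c in comps)

m = n = 2
nodes = {(i, j) for i in range(m) for j in range(n)}
edges = {((i, j), ((i + 1) % m, j)) for (i, j) in nodes} | \
        {((i, j), (i, (j + 1) % n)) for (i, j) in nodes}
print(cycle_rank(nodes, edges))  # 2 = min{m, n} for the directed 2 x 2 torus
```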

In the case of the intersection operation, providing tight bounds appears to be more difficult. The trivial algorithm for constructing an RE describing the intersection of two REs of sizes $m$ and $n$ gives an upper bound of size $2^{O(mn)}$: similarly to before, we convert into finite automata, perform the usual product construction on automata, and convert back to regular expressions. By analysing the behaviour of the undirected cycle rank under the so-called categorical product on graphs, via Theorem 3.14 one can arrive at a refined upper bound of $2^{O(m(1+\log(n/m)))}$, for $m \le n$ [74]. In particular, observe that for $n = \Theta(m)$ we get $O(1+\log(n/m)) = O(1)$, thus matching the lower bound. The effect of a few other language operations on regular expression size has also been studied, such as shuffle [72], half-removal [75], and circular shift and quotients [73]. Half-removal is defined as $\frac{1}{2}(L) = \{\, u \mid \text{there is a } v \text{ such that } uv \in L \text{ and } |u| = |v| \,\}$.
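The automaton-level step of this trivial algorithm is the product construction. A minimal sketch (Python; illustrative DFA encoding, with the conversion back to a regular expression omitted) builds the product of the two witness DFAs from the previous example for m = 2 and n = 3.

```python
def product_dfa(dfa1, dfa2):
    """Product construction for the intersection of two complete DFAs over the
    same alphabet; each DFA is (states, alphabet, delta, initial, finals)."""
    s1, alphabet, d1, i1, f1 = dfa1
    s2, _, d2, i2, f2 = dfa2
    states = {(p, q) for p in s1 for q in s2}
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in states for a in alphabet}
    finals = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return states, alphabet, delta, (i1, i2), finals

# K_2: number of a's even; L_3: number of b's divisible by 3
K2 = ({0, 1}, 'ab',
      {(p, c): (p + (c == 'a')) % 2 for p in (0, 1) for c in 'ab'}, 0, {0})
L3 = ({0, 1, 2}, 'ab',
      {(q, c): (q + (c == 'b')) % 3 for q in (0, 1, 2) for c in 'ab'}, 0, {0})
states, _, _, _, _ = product_dfa(K2, L3)
print(len(states))  # 6 = m * n states; its state graph is the directed torus from the text
```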


Regarding language operations in subregular language families, only a few scattered results for regular expressions are known to date [53], [61], and [74]. Because regular languages are closed under various basic language operations, it is often convenient to extend the syntax of regular expressions by a ∘-operator, where ∘ is some regularity-preserving operation on languages. The succinctness gained by this step depends of course on the operator. It has long been known that the gain is non-elementary if we add complementation as a built-in operator [168]. When adding intersection and/or interleaving, a tight doubly exponential gap is known [60], [66], and [74]. By extending regular expressions with counting operators, we still have an exponential gain in succinctness in the extreme case [122]. Known bounds are summarised in Table 9.

Table 9. Lower and upper bounds on the alphabetic width of language operations on languages of alphabetic width m and n, denoted by RE ∘ RE to RE, and on the required size for transforming expressions extended with the operator ∘ into ordinary regular expressions, denoted by RE(∘) to RE. The bounds are for an alphabet fixed to two letters. All binary operations being symmetric, we assume m ≤ n, and # stands for counting operators.

To conclude, let us stress that all of the "tight" upper and lower bounds presented in this section match only in an asymptotic sense. We feel that our current ability to perform efficient manipulations on regular expressions, and our understanding of how to do so, is still little developed compared to what can be done with DFAs and NFAs.

5. Some recent developments

The presented topics cover results published until early 2015. For newer results on the subject, in particular on the descriptional complexity of the operation problem, we refer to the survey given in [55], which is the last publication co-authored by the late Sheng Yu, who unexpectedly passed away in 2012. In the remainder of this section


we want to stress two recent developments on the transformation between models of regular languages, a topic that dates back decades. The first one is the model of reversible finite automata and the second one that of alternating finite automata.

5.1. Reversible deterministic finite automata. Reversibility is a fundamental principle in physics. The reversibility of a computation means in essence that every configuration has a unique successor configuration and a unique predecessor configuration. Originally, reversible DFAs were introduced and studied in the context of algorithmic learning theory in [7]; also see [125]. Later this concept was generalised in [5], [141], and [157]. Almost all of these definitions agree on the fact that the transition function induces a partial injective mapping for every letter. Nevertheless, there are subtle differences. In principle, the following situations appear in the literature:

1. one initial and one final state (also called bideterminism) [7] and [125],
2. one initial and multiple final states [5] and [91], and
3. multiple initial and multiple final states [141] and [157].

Obviously, the third model is the most general one, but it cannot accept all regular languages [157]. For instance, the language $a^*b^*$ is not reversible. It is worth mentioning that finite automata in the sense of [157] may have limited nondeterminism plugged in from the outside world at the outset of the computation, since one of the multiple initial states is guessed. A further generalisation of reversibility to quasi-reversibility, which even allows nondeterministic transitions, was introduced in [141]; also see [56]. Reversibility and nondeterminism in general were studied in [94]. In view of these results, natural questions concern the uniqueness and the size of a minimal reversible DFA in terms of the size of the equivalent minimal DFA. For the latter question, a lower bound of $\Omega(1.001^n)$ states has been obtained in [88], which, in turn, raises the question of how to construct a minimal reversible DFA from a given (minimal) DFA. The construction problem has partially been solved in [56] and [141], where so-called quasi-reversible automata are constructed. However, these quasi-reversible DFAs may themselves be exponentially more succinct than the minimal reversible finite state devices. In fact, the witness automata in [88] are already quasi-reversible. Yet another lower bound of $\Omega(1.259^n)$ was given in [5] for the conversion of a minimal DFA over a two-letter alphabet to an equivalent reversible DFA, that is, a partial DFA with a unique initial state and potentially multiple accepting states. This result was improved in [91] to a lower bound of $\Omega(\varphi^n)$, where $\varphi$ is the golden ratio $\frac{1+\sqrt{5}}{2}$, that is, approximately 1.618. This bound can be increased for larger alphabets; it has a limit of $2^{n-1}$ as the alphabet size tends to infinity. Finally, the constructions in [91] allow one to determine an upper bound of $2^{n-1}$ states for the conversion of DFAs to minimal reversible DFAs, even for arbitrary alphabet sizes.

Theorem 5.1 (DFA to reversible DFA). Let $n \ge 1$ and let $A$ be an $n$-state DFA accepting a reversible language. Then $2^{n-1}$ states are sufficient for a reversible DFA to accept the language $L(A)$, and moreover $\Omega(\varphi^n)$ states are necessary in the worst case for a reversible DFA accepting $L(A)$ in the binary case.


It is worth mentioning that minimal reversible DFAs are not unique. In [91] a structural characterisation of the regular languages that can be accepted by reversible DFAs is given, which is based on the absence of a forbidden pattern in the (minimal) deterministic state graph; see Figure 4.

Theorem 5.2 (reversible language characterisation). Let $A$ be a minimal DFA with input alphabet $\Sigma$ and transition function $\delta$. The language $L(A)$ can be accepted by a reversible DFA if and only if there do not exist states $p, q$ in $A$, a letter $a \in \Sigma$, and a word $w \in \Sigma^*$ such that $p \ne q$, $\delta(p,a) = \delta(q,a)$, and $\delta(q,aw) = q$.


Figure 4. The "forbidden pattern" of Theorem 5.2: the language accepted by a minimal DFA $A$ can be accepted by a reversible DFA if and only if $A$ does not contain the depicted structure, in which the distinct states $p$ and $q$ have a common $a$-successor $r$ from which $q$ is reachable again by some word $w$. State $r$ could be equal to state $p$ or to state $q$; the situations where $r = q$ or $r = p$ are not shown, and in both cases state $r$ has an $a$-loop.
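Theorem 5.2 translates directly into a reversibility test on the minimal DFA: the forbidden pattern is present exactly when two distinct states p and q share an a-successor r from which q is reachable again. A small sketch (Python; illustrative DFA encoding, not from the cited papers) checks this condition on the minimal DFA of $a^*b^*$.

```python
def reachable_from(delta, start, alphabet):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a in alphabet:
            v = delta.get((u, a))
            if v is not None and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_reversible_language(states, alphabet, delta):
    """The language of a minimal DFA is reversible iff no two distinct states p, q
    share an a-successor r from which q can be reached again (Theorem 5.2)."""
    for q in states:
        for a in alphabet:
            r = delta.get((q, a))
            if r is None:
                continue
            if q in reachable_from(delta, r, alphabet) and \
               any(delta.get((p, a)) == r for p in states if p != q):
                return False
    return True

# Minimal DFA for a*b*: state 0 (start), 1 (after the first b), 2 (sink)
delta = {(0, 'a'): 0, (0, 'b'): 1, (1, 'b'): 1, (1, 'a'): 2, (2, 'a'): 2, (2, 'b'): 2}
print(is_reversible_language({0, 1, 2}, 'ab', delta))  # False, as stated in the text
```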

The number of occurrences of the forbidden pattern in a DFA (see Figure 4) gives rise to the descriptional complexity measure of the degree of irreversibility, which was first considered in [10]. It was shown that this measure induces a strict infinite hierarchy of languages, whose base level consists exactly of the reversible languages. Moreover, the behaviour of this measure with respect to the operation problem for standard formal language operations was investigated. Contrary to ordinary DFAs, where minimality and irreducibility coincide, for reversible DFAs this is not the case. Further studies on this subject were done in a series of papers [136], [137], and [138] and led to a better understanding of these devices.

5.2. Alternating and Boolean finite automata. Finally, we report on recent developments concerning alternating and Boolean finite automata. Alternating finite automata (AFAs) have been developed in [43] as a generalisation of nondeterministic finite automata. The transition function $\delta: Q \times \Sigma \to \{0,1\}^{\{0,1\}^Q}$ maps pairs of states and input symbols to Boolean formulas (functions) over the state set $Q$. As the input is read (from left to right), the automaton "builds" a propositional formula, starting with the initial formula $q_0$ (the initial state); on reading an input symbol $a$, it replaces every state $q$ in the current formula by $\delta(q,a)$. The input is accepted if and only if the formula constructed on reading the whole input evaluates to 1 when 1 is substituted for every accepting state $q$ and 0 for every non-accepting state.
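The substitution process just described can be simulated literally. In the sketch below (Python; illustrative only, with Boolean formulas encoded as nested tuples and an ad-hoc example automaton), the current formula is rewritten letter by letter and finally evaluated under the assignment that sets accepting states to 1.

```python
def substitute(formula, delta, a):
    """Replace every state occurring in the formula by delta[(state, a)]."""
    if isinstance(formula, tuple):
        op, *args = formula
        return (op, *(substitute(f, delta, a) for f in args))
    if isinstance(formula, bool):
        return formula
    return delta.get((formula, a), False)   # undefined transitions behave like 0

def evaluate(formula, finals):
    """Evaluate a formula under: accepting state -> 1, non-accepting state -> 0."""
    if isinstance(formula, tuple):
        op, *args = formula
        vals = [evaluate(f, finals) for f in args]
        return all(vals) if op == 'and' else (any(vals) if op == 'or' else not vals[0])
    if isinstance(formula, bool):
        return formula
    return formula in finals

def afa_accepts(delta, initial, finals, word):
    formula = initial                        # the initial formula is the initial state
    for a in word:
        formula = substitute(formula, delta, a)
    return evaluate(formula, finals)

# Small two-state example (states 'p' and 'q', accepting state 'q')
delta = {('p', 'a'): ('and', 'p', 'q'), ('p', 'b'): 'q',
         ('q', 'a'): 'q',                ('q', 'b'): ('or', 'p', 'q')}
print(afa_accepts(delta, 'p', {'q'}, 'ab'))  # True
print(afa_accepts(delta, 'p', {'q'}, 'a'))   # False: the conjunct p is not accepting
```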


In the same period, the so-called Boolean finite automata (BFAs) were introduced in [28]. Note that several authors use the notation "alternating finite automata" but rely on the definition of BFAs. Though it turned out that both types are almost identical, there are differences with respect to the initial configuration: while for AFAs the computation starts with the fixed propositional formula $q_0$, a BFA starts with an arbitrary propositional formula. Clearly, this does not increase their computational capacity: every $n$-state BFA can be simulated by an $(n+1)$-state AFA, and both models accept only regular languages. Concerning the deterministic simulation of AFAs and BFAs, the situation is as follows: the tight bound of $2^{2^n}$ states for the deterministic simulation of $n$-state AFAs has already been shown in the famous fundamental papers [43] for AFAs and [28] and [140] for BFAs.

Theorem 5.3 (BFA/AFA to DFA). Let $n \ge 1$ and let $A$ be an $n$-state BFA or AFA. Then $2^{2^n}$ states are sufficient and necessary in the worst case for a DFA to accept the language $L(A)$.

The simulation of AFAs and BFAs by NFAs was known to require at most $2^n + 1$ states and at least $2^n$ states [54]. Recently this long-standing open problem was solved in [114], proving a tight bound of $2^n + 1$ states for the nondeterministic simulation of AFAs and BFAs.

Theorem 5.4 (BFA/AFA to NFA). Let $n \ge 1$ and let $A$ be an $n$-state BFA or AFA. Then $2^n + 1$ states are sufficient and necessary in the worst case for an NFA to accept the language $L(A)$.

If the simulation of the previous theorem is done by an NFA with multiple entry states, then $2^n$ states are sufficient and necessary in the worst case. Moreover, it is a very old observation that the construction yields a backward deterministic NFA with multiple entry states. So, if $L$ is a language accepted by a complete $n$-state BFA or AFA, then the language $L^R$, the reversal of $L$, is accepted by a DFA with $2^n$ states [43]. Here, similarly to nondeterministic finite automata, the transition function of AFAs and BFAs may be partial; an AFA or BFA is said to be complete if its transition function is total. If the BFA or AFA is incomplete, the simulating DFA may need a rejecting sink state, which results in $2^n + 1$ states in total.

Theorem 5.5 (reversed BFA/AFA to DFA). Let $n \ge 1$ and let $A$ be an $n$-state BFA or AFA. Then $2^n + 1$ states are sufficient and necessary in the worst case for a DFA to accept the language $L(A)^R$. If $A$ is complete, then $2^n$ states are sufficient and necessary in the worst case.

In case one started with an AFA, half of the states of the simulating DFA are accepting [114]. The converse relation of this simulation is also true. In particular, the reversal of every $n$-state DFA language is accepted by a BFA with $\lceil \log_2 n \rceil$ states [140]. In the particular case where the DFA has some $2^m$ states of which $2^{m-1}$ are accepting, the simulation is possible also by an AFA with $\lceil \log_2(2^m) \rceil = m$ states [98]. In general, it remains open whether the reversal of every $n$-state DFA language is also accepted by some AFA with $\lceil \log_2 n \rceil$ states. However, we know that $\lceil \log_2 n \rceil + 1$ states are sufficient for this purpose.


The efficient reverse simulation led to a series of papers [97], [98], and [126] on the operational state complexity for BFAs and AFAs. In most cases, tight bounds are obtained for basic operations on languages represented by BFAs and AFAs, such as union, intersection, complementation, concatenation, square, Kleene star, difference, symmetric difference, reversal, and quotients. Although it may look as if the descriptional complexity results for BFAs and AFAs are always the same, this is in fact not true in general: the tight bounds obtained for the operations union, intersection, difference, and quotients differ by exactly one state between BFAs and AFAs, while for the other operations listed above the bounds for BFAs and AFAs coincide.

References [1] J.-H. Ahn and Y.-S. Han, Implementation of state elimination using heuristics. In Implementation and application of automata (S. Maneth, ed.). Proceedings of the 14th International Conference (CIAA 2009) held at the University of New South Wales, Sydney, July 14–17, 2009. Lecture Notes in Computer Science, 5642. Springer, Berlin, 2009, 178–187. MR 2550022 Zbl 1248.68281 q.v. 425 [2] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of computer algorithms. Addison-Wesley Series in Computer Science and Information Processing. Addison-Wesley, Reading, MA, 1974. MR 0413592 Zbl 0326.68005 q.v. 416 [3] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers. Principles, techniques, and tools. Addison-Wesley, Reading, MA, 1986. q.v. 421 [4] C. Allauzen and M. Mohri, A unified construction of the Glushkov, follow, and Antimirov automata. In Mathematical foundations of computer science 2006. (R. Královič and P. Urzyczyn, eds.). Proceedings of the 31st International Symposium (MFCS 2006) held in Stará Lesná, August 28–September 1, 2006. Lecture Notes in Computer Science, 4162. Springer, Berlin, 2006, 110–121. MR 2298170 Zbl 1132.68434 q.v. 422 [5] A. Ambainis and R. Freivalds, 1-way quantum finite automata: Strengths, weaknesses and generalizations. In Proceedings 39 th Annual Symposium on Foundations of Computer Science. The Institute of Electrical and Electronics Engineers, Long Beach, CA, 1998, 332–341. q.v. 442 [6] T. Ang and J. A. Brzozowski, Languages convex with respect to binary relations, and their closure properties. Acta Cybernet. 19 (2009), no. 2, 445–464. MR 2584158 Zbl 1199.68168 q.v. 438 [7] D. Angluin, Inference of reversible languages. J. Assoc. Comput. Mach. 29 (1982), no. 3, 741–765. MR 0666776 Zbl 0485.68066 q.v. 442 [8] V. M. Antimirov, Partial derivatives of regular expressions and finite automaton constructions. Theoret. Comput. Sci. 155 (1996), no. 2, 291–319. MR 1379579 Zbl 0872.68120 q.v. 422 [9] A. Arnold, A. Dicky, and M. Nivat, A note about minimal non-deterministic automata. Bull. European Assoc. Theor. Comput. Sci. 47 (1992), 166–169. Zbl 0751.68038 q.v. 413 [10] H. B. Axelsen, M. Holzer, and M. Kutrib, The degree of irreversibility in deterministic finite automata. Internat. J. Found. Comput. Sci. 28 (2017), no. 5, 503–522. MR 3737499 Zbl 1380.68244 q.v. 443


[11] M. Beaudry and M. Holzer, On the size of inverse semigroups given by generators. Theoret. Comput. Sci. 412 (2011), no. 8–10, 765–772. MR 2796923 Zbl 1206.68168 q.v. 433 [12] P. Berman and A. Lingas, On the complexity of regular languages in terms of finite automata. Article id. 304. Polish Academy of Sciences, 1977. q.v. 425 [13] G. Berry and R. Sethi, From regular expressions to deterministic automata. Theoret. Comput. Sci. 48 (1986), no. 1, 117–126. MR 0889664 Zbl 0626.68043 q.v. 421 [14] D. Berwanger and E. Grädel, Entanglement – a measure for the complexity of directed graphs with applications to logic and games. In Logic for programming, artificial intelligence, and reasoning (F. Baader and A. Voronkov, eds.). Proceedings of the 11th International Conference (LPAR 2004) held in Montevideo, March 14–18, 2005. Lecture Notes in Computer Science, 3452. Lecture Notes in Artificial Intelligence. Springer, Berlin, 2005, 209–223. MR 2169865 Zbl 1109.68080 q.v. 417 [15] P. Bille and M. Thorup, Regular expression matching with multi-strings and intervals. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (M. Charikar, ed.). Proceedings of SODA 2010 held in Austin, TX, January 17–19, 2010. Society for Industrial and Applied Mathematics, Philadelphia, PA, and Association for Computing Machinery, New York, 2010, 1297–1308. MR 2809745 Zbl 1288.68299 q.v. 416 [16] J.-C. Birget, Intersection and union of regular languages and state complexity. Inform. Process. Lett. 43 (1992), no. 4, 185–190. MR 1187185 Zbl 0763.68048 q.v. 413 [17] J.-C. Birget, Partial orders on words, minimal elements of regular languages and state complexity. Theoret. Comput. Sci. 119 (1993), no. 2, 267–291. MR 1244294 Zbl 0786.68071 q.v. 431 [18] J.-C. Birget, State-complexity of finite-state devices, state compressibility and incompressibility. Math. Systems Theory 26 (1993), no. 3, 237–269. MR 1209997 Zbl 0779.68061 q.v. 425 [19] J.-C. Birget, Erratum: Partial orders on words, minimal elements of regular languages and state complexity. Preprint, 2002. q.v. 431 [20] H. L. Bodlaender, J. S. Deogun, K. Jansen, T. Kloks, D. Kratsch, H. Müller, and Zs. Tuza, Rankings of graphs. SIAM J. Discrete Math. 11 (1998), no. 1, 168–181. MR 1612885 Zbl 0907.68137 q.v. 424 [21] H. Bordihn, M. Holzer, and M. Kutrib, Determination of finite automata accepting subregular languages. Theoret. Comput. Sci. 410 (2009), no. 35, 3209–3222. MR 2546877 Zbl 1173.68030 q.v. 420 [22] A. Brüggemann-Klein, Regular expressions into finite automata. Theoret. Comput. Sci. 120 (1993), no. 2, 197–213. MR 1247207 Zbl 0811.68096 q.v. 422 [23] J. Brzozowski, G. Jirásková, and B. Li, Quotient complexity of ideal languages. Theoret. Comput. Sci. 470 (2013), 36–52. MR 3004493 Zbl 1283.68190 q.v. 413, 430, 438, 439 [24] J. Brzozowski, G. Jirásková, and C. Zou, Quotient complexity of closed languages. Theory Comput. Syst. 54 (2014), no. 2, 277–292. MR 3159995 Zbl 1380.68249 q.v. 413, 430, 435, 437, 438 [25] J. A. Brzozowski, Quotient complexity of regular languages. J. Autom. Lang. Comb. 15 (2010), no. 1–2, 71–89. MR 3801316 Zbl 1345.68200 q.v. 412, 430 [26] J. A. Brzozowski, In search of the most complex regular language. Internat. J. Found. Comput. Sci. 24 (2013), no. 6, 691–708. MR 3158963 Zbl 1410.68199 q.v. 432 [27] J. A. Brzozowski and G. Davies, Maximally atomic languages. In Proceedings 14 th International Conference on Automata and Formal Languages (Z. Ésik and Z. Fülöp, eds.).


[28] [29]

[30] [31]

[32] [33]

[34]

[35]

[36]

[37]

[38]

[39]


Electronic Proceedings in Theoretical Computer Science (EPTCS), 151. EPTCS, 2014, 151–161. MR 3682549 q.v. 413 J. A. Brzozowski and E. L. Leiss, On equations for regular languages, finite automata, and sequential networks. Theoret. Comput. Sci. 10 (1980), no. 1, 19–35. MR 0549752 Zbl 0415.68023 q.v. 443, 444 J. A. Brzozowski and B. Li, Syntactic complexity of R- and J-trivial regular languages. Internat. J. Found. Comput. Sci. 25 (2014), no. 7, 807–821. MR 3299921 Zbl 1320.68108 q.v. 433 J. A. Brzozowski, B. Li, and D. Liu, Syntactic complexities of six classes of star-free languages. J. Autom. Lang. Comb. 17 (2012), no. 2–4, 83–105. MR 3307502 Zbl 1322.68118 q.v. 433 J. A. Brzozowski, B. Li, and Y. Ye, Syntactic complexity of prefix-, suffix-, bifix-, and factor-free regular languages. Theoret. Comput. Sci. 449 (2012), 37–53. MR 2944381 Zbl 1280.68108 q.v. 433 J. A. Brzozowski and B. Liu, Quotient complexity of star-free languages. Internat. J. Found. Comput. Sci. 23 (2012), no. 6, 1261–1276. MR 2999823 Zbl 1272.68206 q.v. 413, 430 J. A. Brzozowski and D. Liu, Universal witnesses for state complexity of basic operations combined with reversal. In Implementation and application of automata (S. Konstantinidis, ed.). Proceedings of the 18th International Conference (CIAA 2013) held at Saint Mary’s University, Halifax, N.S., July 16–19, 2013. Lecture Notes in Computer Science, 7982. Springer, Berlin, 2013, 72–83. MR 3111193 Zbl 1298.68124 q.v. 433 J. A. Brzozowski and D. Liu, Universal witnesses for state complexity of boolean operations and concatenation combined with star. In Descriptional complexity of formal systems (H. Jürgensen and R. Reis, eds.). Proceedings of the 15th Annual International Workshop (DCFS 2013) held at Western University, London, ON, July 22–25, 2013. Springer, Berlin, 2013, 30–41. MR 3111316 Zbl 1388.68159 q.v. 433 J. A. Brzozowski and M. Szykula, Upper bound on syntactic complexity of suffix-free languages. In Descriptional complexity of formal systems (J. Shallit and A. Okhotin, eds.). Proceedings of the 17th International Workshop (DCFS 2015) held in Waterloo, ON, June 25–27, 2015. Lecture Notes in Computer Science, 9118. Springer, Cham, 2015, 33–45. MR 3375018 Zbl 1390.68381 q.v. 433 J. A. Brzozowski and M. Szykula, Large aperiodic semigroups. In Implementation and application of automata (M. Holzer and M. Kutrib, eds.). Proceedings of the 19th International Conference (CIAA 2014) held at Universität Giessen, Giessen, July 30–August 2, 2014, 124–135. MR 3247087 Zbl 1302.68154 q.v. 433 J. A. Brzozowski and M. Szykula, Upper bounds on syntactic complexity of left and twosided ideals. In Developments in language theory (A. M. Shur and M. V. Volkov, eds.). Proceedings of the 18th International Conference (DLT 2014) held at the Ural Federal University, Ekaterinburg, August 26–29, 2014. Lecture Notes in Computer Science, 8633. Springer, Cham, 2014, 13–24. MR 3253086 Zbl 06355206 q.v. 433 J. A. Brzozowski and H. Tamm, Complexity of atoms of regular languages. Internat. J. Found. Comput. Sci. 24 (2013), no. 7, 1009–1027. MR 3189707 Zbl 1360.68542 q.v. 413 J. A. Brzozowski and H. Tamm, Theory of átomata. Theoret. Comput. Sci. 539 (2014), 13–27. MR 3214840 Zbl 1359.68160 q.v. 413


[40] C. Câmpeanu, K. Čulik, K. Salomaa, and S. Yu, State complexity of basic operations on finite languages. In Automata implementation. (O. Boldt and H. Jürgensen, eds.). Proceedings of the 4th International Workshop, WIA ’99, Potsdam, Germany, July 17–19, 1999. Lecture Notes in Computer Science. 2214. Springer, Berlin, 2001, 60–70. Zbl 1050.68091 q.v. 436, 437 [41] C. Câmpeanu, K. Salomaa, and S. Yu, Tight lower bound for the state complexity of shuffle of regular languages. J. Autom. Lang. Comb. 7 (2002), no. 3, 303–310. MR 1957693 Zbl 1033.68057 q.v. 434 [42] J.-M. Champarnaud, F. Ouardi, and D. Ziadi, Normalized expressions and finite automata. Internat. J. Algebra Comput. 17 (2007), no. 1, 141–154. MR 2300409 Zbl 1117.68042 q.v. 422 [43] A. Chandra, D. Kozen, and L. J. Stockmeyer, Alternation. J. Assoc. Comput. Mach. 28 (1981), no. 1, 114–133. MR 0603186 Zbl 0473.68043 q.v. 443, 444 [44] M. Chrobak, Finite automata and unary languages. Theoret. Comput. Sci. 47 (1986), no. 2, 149–158. MR 0881208 Zbl 0638.68096 q.v. 419, 420, 431 [45] M. Chrobak, Errata to “finite automata and unary languages.” Theoret. Comput. Sci. 302 (2003), no. 1–3, 497–498. MR 1981965 q.v. 419, 431 [46] B. Courcelle, D. Niwiński, and A. Podelski, A geometrical view of the determinization and minimization of finite-state automata. Math. Systems Theory 24 (1991), no. 2, 117–146. MR 1096695 Zbl 0722.68080 q.v. 414 [47] M. Delgado and J. Morais, Approximation to the smallest regular expression for a given regular language. In Implementation and application of automata (M. Domaratzki, A. Okhotin, K. Salomaa, and S. Yu, eds.). Proceedings of the 9th International Conference, CIAA 2004, Kingston, Canada, July 22–24, 2004. Lecture Notes in Computer Science 3317. Springer, Berlin, 312–314. MR 2144483 Zbl 1115.68428 q.v. 425 [48] M. Domaratzki and A. Okhotin, State complexity of power. Theoret. Comput. Sci. 410 (2009), no. 24–25, 2377–2392. MR 2522442 Zbl 1168.68024 q.v. 434 [49] M. Domaratzki and K. Salomaa, Lower bounds for the transition complexity of NFAs. J. Comput. System Sci. 74 (2008), no. 7, 1116–1130. MR 2454057 Zbl 1152.68028 q.v. 415 [50] L. C. Eggan, Transition graphs and the star height of regular events. Michigan Math. J. 10 (1963), 385–397. MR 0157840 Zbl 0173.01504 q.v. 417 [51] A. Ehrenfeucht and H. P. Zeiger, Complexity measures for regular expressions. J. Comput. System Sci. 12 (1976), no. 2, 134–146. MR 0418509 Zbl 0329.94024 q.v. 416, 424, 425 [52] K. Ellul, Descriptional complexity measures of regular languages. Master’s thesis, University of Waterloo, Ontario, 2004. q.v. 431 [53] K. Ellul, B. Krawetz, J. Shallit, and M.-W. Wang, Regular expressions: new results and open problems. J. Autom. Lang. Comb. 10 (2005), no. 4, 407–437. MR 2376649 Zbl 1143.68434 q.v. 416, 423, 424, 425, 440, 441 [54] A. Fellah, H. Jürgensen, and S. Yu, Constructions for alternating finite automata. Internat. J. Comput. Math. 35 (1990), 117–132. Zbl 0699.68081 q.v. 444 [55] Y. Gao, N. Moreira, R. Reis, and S. Yu, A survey on operational state complexity. J. Autom. Lang. Comb. 21 (2016), no. 4, 251–310. MR 3699788 Zbl 1380.68253 q.v. 430, 441


[56] P. García, M. Vázquez de Parga, and D. López, On the efficient construction of quasireversible automata for reversible languages. Inform. Process. Lett. 107 (2008), no. 1, 13–17. MR 2420072 Zbl 1186.68255 q.v. 442 [57] V. Geffert, (Non)determinism and the size of one-way finite automata. In Proceedings of the 7 th International Workshop on Descriptional Complexity of Formal Systems. (DCFS 2005) (C. Mereghetti, B. Palano, G. Pighizzini, and D. Wotschke, eds.). Università degli Studi di Milano, Milano, 2005. Rapporto Tecnico 06-05, 23–37. q.v. 421 [58] V. Geffert, Magic numbers in the state hierarchy of finite automata. Inform. and Comput. 205 (2007), no. 11, 1652–1670. MR 2368645 Zbl 1130.68069 q.v. 421 [59] W. Gelade, Foundations of XML: regular expressions revisited. Ph.D. thesis. School voor Informatietechnologie, University of Hasselt, Hasselt, and University of Maastricht, Maastricht, 2009. q.v. 440 [60] W. Gelade, Succinctness of regular expressions with interleaving, intersection and counting. Theoret. Comput. Sci. 411 (2010), no. 31–33, 2987–2998. MR 2667956 Zbl 1192.68120 q.v. 441 [61] W. Gelade and F. Neven, Succinctness of the complement and intersection of regular expressions. ACM Trans. Comput. Log. 13 (2012), no. 1, Art. 4, 19 pp. MR 2893019 Zbl 1351.68139 q.v. 416, 424, 440, 441 [62] A. Gill and L. T. Kou, Multiple-entry finite automata. J. Comput. System Sci. 9 (1974), 1–19. MR 0351666 Zbl 0285.94030 q.v. 420 [63] I. Glaister and J. Shallit, A lower bound technique for the size of nondeterministic finite automata. Inform. Process. Lett. 59 (1996), no. 2, 75–77. MR 1409955 Zbl 0900.68313 q.v. 413 [64] V. M. Glushkov, Abstract theory of automata. Uspehi Mat. Nauk 16 (1961), no. 5(101), 3–62. In Russian. English translation, Russ. Math. Surv. 16 (1961), no. 5, 1–53. MR 0138529 Zbl 0104.35404 q.v. 421 [65] G. Gramlich and G. Schnitger, Minimizing NFA’s and regular expressions. J. Comput. System Sci. 73 (2007), no. 6, 908–923. MR 2332724 Zbl 1152.68459 q.v. 415 [66] H. Gruber, On the descriptional and algorithmic complexity of regular languages. Harland Media, Jena, 2010. q.v. 440, 441 [67] H. Gruber and S. Gulan, Simplifying regular expressions. A quantitative perspective. In Language and automata theory and applications (A.-H. Dediu, H. Fernau, and C. Martín-Vide, eds.). Proceedings of the 4th International Conference on Language and Automata Theory and Applications. Lecture Notes in Computer Science, 6031. Springer, Berlin, 2010, 285–296. MR 2753917 Zbl 1284.68351 q.v. 416, 422 [68] H. Gruber and M. Holzer, A note on the number of transitions of nondeterministic finite automata. In Theorietag – Automaten und Formale Sprachen. Wilhelm-Schickard-Institut für Informatik. Universität Tübingen, Tübingen, Germany, 2005, 24–25. q.v. 415 [69] H. Gruber and M. Holzer, Inapproximability of nondeterministic state and transition complexity assuming P¤NP. In Developments in language theory (T. Harju, J. Karhumäki, and A. Lepistö, eds.). Proceedings of the 11th International Conference (DLT 2007) held at the University of Turku, Turku, July 3–6, 2007. Lecture Notes in Computer Science, 4588. Springer, Berlin, 2007, 205–216. MR 2380432 Zbl 1202.68226 q.v. 415 [70] H. Gruber and M. Holzer, On the average state and transition complexity of finite languages. Theoret. Comput. Sci. 387 (2007), no. 2, 155–166. MR 2362187 Zbl 1148.68030 q.v. 415, 430


[71] H. Gruber and M. Holzer, Finding lower bounds for nondeterministic state complexity is hard (extended abstract). In Developments in language theory (O. H. Ibarra and B. Ravikumar, eds.). Proceedings of the 10th International Conference (DLT 2006) held at the University of California, Santa Barbara, CA, June 26–29, 2006. Lecture Notes in Computer Science, 4036. Springer, Berlin, 2006, 363–374. MR 2334484 Zbl 1227.68056 q.v. 414 [72] H. Gruber and M. Holzer, Finite automata, digraph connectivity, and regular expression size. In Automata, languages and programming (L. Aceto, I. Damgaard, L. A. Goldberg, M. M. Halldórsson, A. Ingólfsdóttir, and I. Walkuwiewicz, eds.). Part II. Proceedings of the 35th International Colloquium (ICALP 2008) held in Reykjavik, July 7–11, 2008. Lecture Notes in Computer Science, 5126. Springer, Berlin, 2008, 39–50. MR 2503575 Zbl 1155.68418 q.v. 416, 417, 424, 440 [73] H. Gruber and M. Holzer, Language operations with regular expressions of polynomial size. Theoret. Comput. Sci. 410 (2009), no. 35, 3281–3289. MR 2546883 Zbl 1176.68105 q.v. 440 [74] H. Gruber and M. Holzer, Tight bounds on the descriptional complexity of regular expressions. In Developments in language theory (V. Diekert and D. Nowotka, eds.). Proceedings of the 13th International Conference (DLT 2009) held in Stuttgart, June 30–July 3, 2009. Lecture Notes in Computer Science, 5583. Springer, Berlin, 2009, 276–287. MR 2544708 Zbl 1247.68141 q.v. 418, 440, 441 [75] H. Gruber and M. Holzer, Provably shorter regular expressions from finite automata. Internat. J. Found. Comput. Sci. 24 (2013), no. 8, 1255–1279. MR 3189722 Zbl 1291.68230 q.v. 424, 440 [76] H. Gruber, M. Holzer, and M. Kutrib, The size of Higman–Haines sets. Theoret. Comput. Sci. 387 (2007), no. 2, 167–176. MR 2362188 Zbl 1143.68035 q.v. 435 [77] H. Gruber, M. Holzer, and M. Kutrib, More on the size of Higman–Haines sets: effective constructions. Fund. Inform. 91 (2009), no. 1, 105–121. MR 2508116 Zbl 1192.68410 q.v. 435 [78] H. Gruber, M. Holzer, and M. Tautschnig, Short regular expressions from finite automata: empirical results. In Implementation and application of automata (S. Maneth, ed.). Proceedings of the 14th International Conference (CIAA 2009) held at the University of New South Wales, Sydney, July 14–17, 2009. Lecture Notes in Computer Science, 5642. Springer, Berlin, 2009, 188–197. MR 2550023 Zbl 1248.68296 q.v. 425 [79] H. Gruber and J. Johannsen, Optimal lower bounds on regular expression size using communication complexity. In Foundations of software science and computational structures (R. M. Amadio, ed.). Proceedings of the 11th International Conference (FOSSACS 2008) held as part of the Joint European Conferences on Theory and Practice of Software (ETAPS 2008) in Budapest, March 26–April 6, 2008. Lecture Notes in Computer Science, 4962. Springer, Berlin, 2008, 273–286. MR 2477200 Zbl 1139.68033 q.v. 416, 425 [80] S. Gulan and H. Fernau, An optimal construction of finite automata from regular expressions. In FST & TCS 2008: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (R. Hariharan, M. Mukund, and V. Vinay, eds.). LIPIcs. Leibniz International Proceedings in Informatics, 2. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2008, 211–222. MR 2874063 Zbl 1248.68297 q.v. 422 [81] L. H. Haines, On free monoids partially ordered by embedding. J. Combinatorial Theory 6 (1969), 94–98. MR 0240016 Zbl 0224.20065 q.v. 435


[82] Y.-S. Han and K. Salomaa, State complexity of union and intersection of finite languages. Internat. J. Found. Comput. Sci. 19 (2008), no. 3, 581–595. MR 2417957 Zbl 1155.68037 q.v. 436 [83] Y.-S. Han and K. Salomaa, State complexity of basic operations on suffix-free regular languages. Theoret. Comput. Sci. 410 (2009), no. 27–29, 2537–2548. MR 2531098 Zbl 1172.68033 q.v. 437 [84] Y.-S. Han, K. Salomaa, and D. Wood, State complexity of prefix-free regular languages. In Proceedings of the 8 th Workshop on Descriptional Complexity of Formal Systems (H. Leung and G. Pighizzini, eds.). New Mexico State University, Las Cruces, New Mexico, 2006, 165–176. q.v. 437 [85] Y.-S. Han, K. Salomaa, and D. Wood, Nondeterministic state complexity of basic operations for prefix-free regular languages. Fund. Inform. 90 (2009), no. 1–2, 93–106. MR 2494605 Zbl 1161.68534 q.v. 437 [86] K. Hashiguchi, Algorithms for determining relative star height and star height. Inform. and Comput. 78 (1988), no. 2, 124–169. MR 0955580 Zbl 0668.68081 q.v. 417 [87] K. Hashiguchi and N. Honda, Homomorphisms that preserve star height. Information and Control 30 (1976), no. 3, 247–266. MR 0406733 Zbl 0325.94039 q.v. 418 [88] P.-C. Héam, A lower bound for reversible automata. Theor. Inform. Appl. 34 (2000), no. 5, 331–341. MR 1829230 Zbl 0987.68043 q.v. 442 [89] T. N. Hibbard, A generalization of context-free determinism. Information and Control 11 (1967), 196–238. MR 0230574 Zbl 0168.25801 q.v. 428 [90] G. Higman, Ordering by divisibility in abstract algebras. Proc. London Math. Soc. (3) 2 (1952), 326–336. MR 0049867 Zbl 0047.03402 q.v. 435 [91] M. Holzer, S. Jakobi, and M. Kutrib, Minimal reversible deterministic finite automata. InDevelopments in language theory (I. Potapov, ed.). Proceedings of the 19th International Conference (DLT 2015) held in Liverpool, July 27–30, 2015. Lecture Notes in Computer Science, 9168. Springer, Cham, 2015, 276–287. MR 3440678 Zbl 1386.68091 q.v. 442, 443 [92] M. Holzer and B. König, On deterministic finite automata and syntactic monoid size. Theoret. Comput. Sci. 327 (2004), no. 3, 319–347. MR 2098311 Zbl 1161.68536 q.v. 433 [93] M. Holzer and M. Kutrib, Nondeterministic descriptional complexity of regular languages. Internat. J. Found. Comput. Sci. 14 (2003), no. 6, 1087–1102. MR 2031104 Zbl 1101.68657 q.v. 430, 431, 432, 436 [94] M. Holzer and M. Kutrib, Reversible nondeterministic finite automata. In Reversible computation (I. Phillips and H. Rahaman, eds.). Proceedings of the 9th International Conference (RC 2017) held in Kolkata, July 6–7, 2017. Lecture Notes in Computer Science, 10301. Springer, Cham, 2017, 35–51. MR 3678942 Zbl 06850911 q.v. 442 [95] M. Holzer, K. Salomaa, and S. Yu, On the state complexity of k -entry deterministic finite automata. J. Autom. Lang. Comb. 6 (2001), no. 4, 453–466. 2nd Workshop on Descriptional Complexity of Automata, Grammars and Related Structures (London, ON, 2000). MR 1897054 Zbl 1050.68093 q.v. 420 [96] J. E. Hopcroft and J. D. Ullman, Introduction to automata theory, languages, and computation. Addison-Wesley Series in Computer Science. Addison-Wesley, Reading, MA, 1979. MR 0645539 Zbl 0426.68001 q.v. 411, 423, 425 [97] M. Hospodár and G. Jirásková, Concatenation on deterministic and alternating automata. In Proc. 8 th Workshop on Non-Classical Models for Automata and Applications.


[98]

[99] [100]

[101] [102] [103] [104] [105] [106] [107]

[108]

[109] [110] [111]

Hermann Gruber, Markus Holzer, and Martin Kutrib (H. Bordihn, R. Freund, B. Nagy, and G. Vaszil, eds.). [email protected], 321. Austrian Computer Society, Vienna, 2016, 179–194. q.v. 445 M. Hospodár, G. Jirásková, and I. Krajnáková, Operations on boolean and alternating finite automata. In Computer science – theory and applications (F. V. Fomin and V. V. Podolskii, eds.). Proceedings of the 13th International Computer Science Symposium in Russia, CSR 2018, Moscow, Russia, June 6–10, 2018. Lecture Notes in Computer Science, 10846. Springer, Cham, 2018, 181–193. MR 3816839 Zbl 06986117 q.v. 444, 445 J. Hromkovič, Communication complexity and parallel computing. Texts in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 1997. MR 1442518 Zbl 0873.68098 q.v. 414 J. Hromkovič, S. Seibert, and T. Wilke, Translating regular expressions into small  -free nondeterministic finite automata. J. Comput. System Sci. 62 (2001), no. 4, 565–588. MR 1837505 Zbl 1014.68093 q.v. 415, 423 J. Hromkovič, H. Petersen, and G. Schnitger, On the limits of the communication complexity technique for proving lower bounds on the size of minimal NFA’s. Theoret. Comput. Sci. 410 (2009), no. 30–32, 2972–2981. MR 2543350 Zbl 1173.68035 q.v. 414 J. Hromkovič and G. Schnitger, Comparing the size of NFAs with and without  -transitions. Theoret. Comput. Sci. 380 (2007), no. 1–2, 100–114. MR R2330644 Zbl 1115.68098 q.v. 415 O. H. Ibarra, A note on semilinear sets and bounded-reversal multihead pushdown automata. Information Processing Lett. 3 (1974), 25–28. MR 0347142 Zbl 0294.68019 q.v. 426 L. Ilie and S. Yu, Follow automata. Inform. and Comput. 186 (2003), no. 1, 140–162. MR 2001743 Zbl 1059.68063 q.v. 416, 422 Sz. Iván, Complexity of atoms, combinatorially. Inform. Process. Lett. 116 (2016), no. 5, 356–360. MR 3458446 Zbl 1352.68130 q.v. 413 K. Iwama, Y. Kambayashi, and K. Takaki, Tight bounds on the number of states of DFAs that are equivalent to n-state NFAs. Theoret. Comput. Sci. 237 (2000), no. 1–2, 485–494. MR 1756226 Zbl 0939.68068 q.v. 421 K. Iwama, A. Matsuura, and M. Paterson, A family of NFAs which need 2n ˛ deterministic states. Theoret. Comput. Sci. 301 (2003), no. 1–3, 451–462. MR 1975240 Zbl 1022.68067 q.v. 421 P. Jančar, F. Mráz, M. Plátek, and J. Vogel, Restarting automata. In Fundamentals of computation theory (H. Reichel, ed.). Proceedings of the 10 th International Conference (FCT ’95) held in Dresden, August 22–25, 1995. Springer, Berlin, 1995, 283–292. MR 1459184 q.v. 427 J. Jirásek, G. Jirásková, and A. Szabari, State complexity of concatenation and complementation. Internat. J. Found. Comput. Sci. 16 (2005), no. 3, 511-529. MR 2139620 q.v. 432 J. Jirásek, G. Jirásková, and A. Szabari, Deterministic blow-ups of minimal nondeterministic finite automata over a fixed alphabet. Internat. J. Found. Comput. Sci. 19 (2008), no. 3, 617–631. MR 2417959 Zbl 1155.68041 q.v. 421 G. Jirásková, Note on minimal finite automata. In Mathematical foundations of computer science 2001 (J. Sgall, A. Pultr, and P. Kolman, eds.). Proceedings of the 26th International Symposium (MFCS 2001) held in Mariánské Lázn˘e, August 27–31, 2001. Lecture Notes in Computer Science, 2136. Springer, Berlin, 2001, 421–431. MR 1907031 Zbl 0999.68104 q.v. 421


[112] G. Jirásková, State complexity of some operations on binary regular languages. Theoret. Comput. Sci. 330 (2005), no. 2, 287–298. MR 2114874 Zbl 1078.68088 q.v. 431, 432 [113] G. Jirásková, Magic numbers and ternary alphabet. Internat. J. Found. Comput. Sci. 22 (2011), no. 2, 331–344. MR 2772813 Zbl 1222.68109 q.v. 421 [114] G. Jirásková, Descriptional complexity of operations on alternating and Boolean automata. In Computer science – theory and applications (E. A. Hirsch, J. Karhumäki, A. Lepistö, and M. Prilutskii, eds.). Proceedings of the 7th International Computer Science Symposium in Russia (CSR 2012) held in Nizhny Novgorod, July 3–7, 2012. Lecture Notes in Computer Science, 7353. Springer, Berlin, 2012, 196–204. MR 2988601 Zbl 1360.68562 q.v. 444 [115] G. Jirásková and A. Okhotin, On the state complexity of operations on two-way finite automata. In Developments in language theory (M. Ito and M. Toyama, eds.) Proceedings of the 12th International Conference (DLT 2008) held in Kyoto, September 16–19, 2008. Lecture Notes in Computer Science, 5257. Springer, Berlin, 2008, 443–454. MR 2490976 Zbl 1161.68540 q.v. 430 [116] G. Jirásková and A. Okhotin, State complexity of cyclic shift. Theor. Inform. Appl. 42 (2008), no. 2, 335–360. MR 2401266 Zbl 1144.68033 q.v. 433 [117] G. Jirásková and P. Olejár, State complexity of intersection and union of suffix-free languages and descriptional complexity. In Proceedings of the 1st Workshop on Non-Classical Models for Automata and Applications (H. Bordihn, R. Freund, M. Holzer, M. Kutrib, and F. Otto, eds.). [email protected], 256. Austrian Computer Society, Vienna, 2009, 151–166. q.v. 437 [118] T. Kameda and P. Weiner, On the state minimization of nondeterministic finite automata. IEEE Trans. Comput. C-19 (1970), no. 7, 617–627. MR 0398705 Zbl 0195.02701 IEEEXplore 1671587 q.v. 414 [119] J.-Y. Kao, N. Rampersad, and J. Shallit, On NFAs where all states are final, initial, or both. Theoret. Comput. Sci. 410 (2009), no. 47–49, 5010–5021. MR 2583695 Zbl 1194.68140 q.v. 435 [120] C. A. Kapoutsis, Removing bidirectionality from nondeterministic finite automata. In Mathematical foundations of computer science 2005. (J. Jedrzejowicz and A. Szepietowski, eds.). Proceedings of the 30 th International Symposium (MFCS 2005) held in Gdansk, August 29–September 2, 2005. Lecture Notes in Computer Science, 3618. Springer, Berlin, 2005, 544–555. MR 2237397 Zbl 1156.68456 q.v. 425 [121] M. Kappes, Descriptional complexity of deterministic finite automata with multiple initial states. J. Autom. Lang. Comb. 5 (2000), no. 3, 269–278. Descriptional complexity of automata, grammars and related structures (Magdeburg, 1999). MR 1778476 Zbl 0965.68041 q.v. 420 [122] P. Kilpeläinen and R. Tuhkanen, Regular expressions with numerical occurrence indicators – Preliminary results. Proceedings of the 8 th Symposium on Programming Languages and Software Tools (SPLST) (P. Kilpeläinen and N. Päivinen, eds.). Department of Computer Science, University of Kuopio, Finland, 2003, 163–173. q.v. 441 [123] C. M. Kintala and D. Wotschke, Amounts of nondeterminism in finite automata. Acta Inform. 13 (1980), no. 2, 199–204. MR 0564465 Zbl 0423.68016 q.v. 421 [124] S. C. Kleene, Representation of events in nerve nets and finite automata. In Automata Studies (C. E. Shannon, and J. McCarthy, John eds.). Annals of Mathematics Studies 34. University Press, Princeton, N.Y, 1956, 3–41. MR 0077478 q.v. 411


[125] S. Kobayashi and T. Yokomori, Learning approximately regular languages with reversible languages. Theoret. Comput. Sci. 174 (1997), no. 1–2, 251–257. MR 1439240 Zbl 0908.68146 q.v. 442 [126] I. Krajnáková and G. Jirásková, Square on deterministic, alternating, and boolean finite automata. In Descriptional complexity of formal systems (G. Pighizzini and C. Câmpeanu, eds.). Proceedings of the 19th IFIP WG 1.02 International Conference (DCFS 2017) held in Milan, July 3–5, 2017. Lecture Notes in Computer Science, 10316. Springer, Cham, 2017, 214–225. MR 3666375 Zbl 06767299 q.v. 445 [127] B. Krawetz, J. Lawrence, and J. Shallit, State complexity and the monoid of transformations of a finite set. Internat. J. Found. Comput. Sci. 16 (2005), no. 3, 547–563. MR 213962 Zbl 1097.68065 q.v. 433 [128] M. Kutrib, A. Malcher, and M. Wendlandt, Head and state hierarchies for unary multihead finite automata. Acta Inform. 51 (2014), no. 8, 553–569. MR 3273655 Zbl 1304.68110 q.v. 426 [129] M. Kutrib, A. Malcher, and M. Wendlandt, Simulations of unary one-way multi-head finite automata. Internat. J. Found. Comput. Sci. 25 (2014), no. 7, 877–896. MR 3299925 Zbl 1320.68112 q.v. 426 [130] M. Kutrib and J. Reimann, Optimal simulations of weak restarting automata. Internat. J. Found. Comput. Sci. 19 (2008), no. 4, 795–811. MR 2437815 Zbl 1156.68031 q.v. 427 [131] M. Kutrib and J. Reimann, Succinct description of regular languages by weak restarting automata. Inform. and Comput. 206 (2008), no. 9–10, 1152–1160. MR 2440658 Zbl 1154.68072 q.v. 427 [132] M. Kutrib and M. Wendlandt, On simulation costs of unary limited automata. In Descriptional complexity of formal systems (J. Shallit and A. Okhotin, eds.). Proceedings of the 17th International Workshop (DCFS 2015) held in Waterloo, ON, June 25–27, 2015. Lecture Notes in Computer Science, 9118. Springer, Cham, 2015, 53–164. MR 3375028 q.v. 429 [133] E. Landau, Über die Maximalordnung der Permutationen gegebenen Grades. Arch. der Math. u. Phys. (3) 3 (1903), 92–103. JFM 34.0233.02 q.v. 419 [134] E. Landau, Handbuch der Lehre von der Verteilung der Primzahlen. Teubner, Leipzig, 1909. JFM 40.0232.09 q.v. 419 [135] M. Latteux, Y. Roos, and A. Terlutte, Minimal NFA and biRFSA languages. Theor. Inform. Appl. 43 (2009), no. 2, 221–237. MR 2512256 Zbl 1166.68025 q.v. 414 [136] G. J. Lavado, G. Pighizzini, and L. Prigioniero, Minimal and reduced reversible automata. J. Autom. Lang. Comb. 22 (2017), no. 1–3, 145–168. MR 3733077 Zbl 1392.68218 q.v. 443 [137] G. J. Lavado, G. Pighizzini, and L. Prigioniero, Weakly and strongly irreversible regular languages. In Proceedings 15 th International Conference on Automata and Formal Languages (E. Csuhaj-Varjú, P. Dömösi, and G. Vaszil, eds.). EPTCS, 252. Electronic Proceedings in Theoretical Computer Science, 2017, 143–156. q.v. 443 [138] G. J. Lavado and L. Prigioniero, Concise representations of reversible automata. In Descriptional complexity of formal systems (G. Pighizzini and C. Câmpeanu, eds.). Proceedings of the 19th IFIP WG 1.02 International Conference (DCFS 2017) held in Milan, July 3–5, 2017. Lecture Notes in Computer Science, 10316. Springer, Cham, 2017, 238–249. MR 3666377 Zbl 06767301 q.v. 443


[139] E. L. Leiss, The complexity of restricted regular expressions and the synthesis problem for finite automata. J. Comput. System Sci. 23 (1981), no. 3, 348–354. MR 0644733 Zbl 0516.68051 q.v. 423 [140] E. L. Leiss, Succinct representation of regular languages by Boolean automata. Theoret. Comput. Sci. 13 (1981), no. 3, 323–330. MR 0603263 Zbl 0458.68017 q.v. 430, 432, 444 [141] S. Lombardy, On the construction of reversible automata for reversible languages. In Automata, languages and programming (P. Widmayer, F. T. Ruiz, R. M. Bueno, M. Hennessy, S. Eidenbenz, and R. Conejo, eds.). Proceedings of the 29th International Colloquium (ICALP 2002) held in Málaga, July 8–13, 2002. Lecture Notes in Computer Science, 2380. Springer, Berlin, 2002, 170–182. MR 2062456 Zbl 1056.68097 q.v. 442 [142] O. B. Lupanov, A comparison of two types of finite sources. Problemy Kybernetiki 9 (1963), 321–326. In Russian. German translation, Über den Vergleich zweier Typen endlicher Quellen. Probleme der Kybernetik 6 (1966), 328–335. q.v. 418, 431 [143] R. Mandl, Precise bounds associated with the subset construction on various classes of nondeterministic finite automata. In Princeton Conference on Information Sciences and Systems (CISS 1973), 1973, 263–267. q.v. 419, 420, 435 [144] A. N. Maslov, Estimates of the number of states of finite automata. Dokl. Akad. Nauk SSSR 194 (1970), 1266–1268. In Russian. English translation, Soviet Math. Dokl. 11 (1970), 1373–1375. MR 0274221 Zbl 0222.94064 q.v. 430 [145] R. McNaughton, The loop complexity of pure-group events. Information and Control 11 (1967), 167–176. MR 0249218 Zbl 0166.26905 q.v. 417 [146] R. McNaughton, The loop complexity of regular events. Information Sci. 1 (1968/1969) 305–328. MR 0249219 q.v. 417, 418 [147] R. McNaughton and H. Yamada, Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9 (1960), 39–47. Zbl 0156.25501 q.v. 416, 424 [148] F. Mera and G. Pighizzini, Complementing unary nondeterministic automata. Theoret. Comput. Sci. 330 (2005), no. 2, 349–360. MR 2114879 Zbl 1078.68091 q.v. 431 [149] C. Mereghetti and G. Pighizzini, Optimal simulations between unary automata. SIAM J. Comput. 30 (2001), no. 6, 1976–1992. MR 1856565 Zbl 0980.68048 q.v. 426 [150] A. R. Meyer and M. J. Fischer, Economy of description by automata, grammars, and formal systems. In Proceedings of the 12 th Annual Symposium on Switching and Automata Theory. SWAT ’71. Held in East Lansing, MI, USA, October 13–15, 1971. Institute of Electrical and Electronics Engineers, 1971, 188–191. q.v. 415, 418, 431 [151] F. R. Moore, On the bounds for state-set size in the proofs of equivalence between deterministic, nondeterministic, and two-way finite automata. IEEE Trans. Comput. 20 (1971), no. 10, 1211–1214. Zbl 0229.94033 IEEEXplore 1671701 q.v. 418, 431 [152] C. Nicaud, Average state complexity of operations on unary automata. In Mathematical foundations of computer science 1999 (M. Kutylowski, L. Pacholski, and T. Wierzbicki, eds.). Proceedings of the 24th International Symposium (MFCS ’99) held in Szklarska Poręba, September 6–10, 1999. Lecture Notes in Computer Science, 1672. Springer, Berlin, 1999, 231–240. MR 1731238 Zbl 0955.68068 q.v. 430 [153] A. Okhotin, On the state complexity of scattered substrings and superstrings. Fund. Inform. 99 (2010), no. 3, 325–338. MR 2663427 Zbl 1208.68139 q.v. 435 [154] G. Pighizzini and A. Pisoni, Limited automata and regular languages. Internat. J. Found. Comput. Sci. 25 (2014), no. 7, 897–916. 
MR 3299926 Zbl 1320.68114 q.v. 428

456

Hermann Gruber, Markus Holzer, and Martin Kutrib

[155] G. Pighizzini and A. Pisoni, Limited automata and context-free languages. Fund. Inform. 136 (2015), no. 1–2, 157–176. MR 3320057 Zbl 1335.68128 q.v. 428 [156] G. Pighizzini and J. Shallit, Unary language operations, state complexity and Jacobsthal’s function. Internat. J. Found. Comput. Sci. 13 (2002), no. 1, 145–159. MR 1884644 Zbl 1066.68072 q.v. 430 [157] J.-É. Pin, On reversible automata. In LATIN ’92 (I. Simon, ed.). Proceedings of the First Latin American Symposium on Theoretical Informatics held in São Paulo, April 6–10, 1992. Lecture Notes in Computer Science, 583. Springer, Berlin, 1992, 401–416. MR 1253368 q.v. 442 [158] M. O. Rabin and D. Scott, Finite automata and their decision problems. IBM J. Res. Develop. 3 (1959), 114–125. MR 0103795 Zbl 0158.25404 q.v. 411, 412, 418 [159] N. Rampersad, The state complexity of L2 and Lk . Inform. Process. Lett. 98 (2006), no. 6, 231–234. MR 2221535 Zbl 1187.68298 q.v. 434 [160] B. Ravikumar and O. H. Ibarra, Relating the type of ambiguity of finite automata to the succinctness of their representation. SIAM J. Comput. 18 (1989), no. 6, 1263–1282. MR 1025473 Zbl 0692.68049 q.v. 432 [161] W. J. Sakoda and M. Sipser, Nondeterminism and the size of two way finite automata. In Conference Record of the Tenth Annual ACM Symposium on Theory of Computing. (R. J. Lipton, W. A. Burkhard, W. J. Savitch, E. P. Friedman, and A. V. Aho, eds). STOC 1978. Held in San Diego, CA, May 1–3, 1978. Association for Computing Machinery, New York, 1978. 275–286. MR 0521062 Zbl 1282.68160 q.v. 425, 431 [162] A. Salomaa, D. Wood, and S. Yu, On the state complexity of reversals of regular languages. Theoret. Comput. Sci. 320 (2004), no. 2–3, 315–329. MR 2064305 Zbl 1068.68078 q.v. 432 [163] K. Salomaa and S. Yu, NFA to DFA transformation for finite languages over arbitrary alphabets. J. Autom. Lang. Comb. 2 (1997), no. 3, 177–186. MR 1611176 Zbl 0897.68060 q.v. 419, 436 [164] G. Schnitger, Regular expressions and NFAs without  -transitions (extended abstract). In STACS 2006 (B. Durand and W. Thomas, eds.). Proceedings of the 23rd Annual Symposium on Theoretical Aspects of Computer Science held in Marseille, February 23–25, 2006. Lecture Notes in Computer Science, 3884. Springer, Berlin, 2006, 432–443. MR 2249387 Zbl 1136.68422 q.v. 415, 423 [165] J. C. Shepherdson, The reduction of two-way automata to one-way automata. IBM J. Res. Develop. 3 (1959), 198–200. MR 0103796 Zbl 0158.25601 q.v. 425 [166] S. Sippu and E. Soisalon-Soininen, Parsing theory. Vol. I. Languages and parsing. EATCS Monographs on Theoretical Computer Science, 15. Springer, Berlin, 1988. MR 0960693 Zbl 0651.68007 q.v. 422 [167] M. Sipser, Lower bounds on the size of sweeping automata. J. Comput. System Sci. 21 (1980), no. 2, 195–202. MR 0597793 Zbl 0445.68064 q.v. 419 [168] L. J. Stockmeyer and A. R. Meyer, Word problems requiring exponential time: preliminary report. In 5 th Annual ACM Symposium on Theory of Computing. Papers presented at the Symposium, Austin, Tex., April 30–May 2, 1973. Association for Computing Machinery, New York, 1973, 1–9. MR 0418518 Zbl 0359.68050 q.v. 441 [169] I. H. Sudborough, Bounded-reversal multihead finite automata languages. Information and Control 25 (1974), 317–328. MR 0400818 Zbl 0282.68033 q.v. 426 [170] H. Tamm, Some minimality results on biresidual and biseparable automata. In Language and automata theory and applications (A.-H. Dediu, H. Fernau, and C. Martín-Vide,

12. Descriptional complexity of regular languages

[171] [172] [173] [174] [175] [176] [177]

457

eds.). Proceedings of the 4th international conference, LATA 2010, Trier, Germany, May 24–28, 2010. Lecture Notes in Computer Science, 6031. Springer, Berlin, 2010, 573–584. MR 2753941 Zbl 1284.68367 q.v. 414, 415 K. Thompson, Regular expression search algorithm. J. Assoc. Comput. Mach. 11 (1968), 419–422. Zbl 0164.46205 q.v. 422 A. W. To, Unary finite automata vs. arithmetic progressions. Inform. Process. Lett. 109 (2009), no. 17, 1010–1014. MR 2547574 Zbl 1202.68241 q.v. 419 P. A. S. Veloso and A. Gill, Some remarks on multiple-entry finite automata. J. Comput. System Sci. 18 (1979), no. 3, 304–306. MR 0536404 Zbl 0402.68046 q.v. 420 B. Watson, A taxonomy of finite automata construction algorithms. Department of Mathematics and Computing Science. Eindhoven University of Technology, Eindhoven, 1995, article id. 93/43. q.v. 423 S. Yu, State complexity of regular languages. J. Autom. Lang. Comb. 6 (2001), no. 2, 221–234. MR 1828860 Zbl 0978.68087 q.v. 430 S. Yu, State complexity of finite and infinite regular languages. Bull. Eur. Assoc. Theor. Comput. Sci. 76 (2002), 142–152. MR 1901177 Zbl 1024.68543 q.v. 430 S. Yu, Q. Zhuang, and K. Salomaa, The state complexities of some basic operations on regular languages. Theoret. Comput. Sci. 125 (1994), no. 2, 315–328. MR 1264137 Zbl 0795.68112 q.v. 430, 432, 433

Chapter 13

Enumerating regular expressions and their languages Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

Contents 1. 2. 3. 4.

Introduction and overview . . . . . . . . . . . . . . . . . . . . . . . . . On measuring the size of a regular expression . . . . . . . . . . . . . . . A simple grammar for valid regular expressions . . . . . . . . . . . . . . Unambiguous context-free grammars and the Chomsky–Schützenberger theorem . . . . . . . . . . . . . . . . . . . . 5. Solving algebraic equations using Gröbner bases . . . . . . . . . . . . . 6. Asymptotic bounds via singularity analysis . . . . . . . . . . . . . . . . 7. Lower bounds on enumeration of regular languages by regular expressions 8. Upper bounds on enumeration of regular languages by regular expressions 9. Exact enumerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10. Conclusion and open problems . . . . . . . . . . . . . . . . . . . . . .

459 461 462

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

488

463 466 469 472 480 485 486

1. Introduction and overview Regular expressions have been studied for almost fifty years, yet many interesting and challenging problems about them remain unsolved. By a regular expression, we mean a string over the alphabet † [ ¹ C; ; . ; / ; "; ; º that represents a regular language. For example, .0 C 10/ .1 C "/ represents the language of all strings over ¹0,1º that do not contain two consecutive 1’s. (There are other versions of regular expressions, which go by various names, such as “extended regular expressions”, “regex”, “practical regular expressions”, and so forth, but we do not study them in this chapter.) We would like to enumerate both (i) valid regular expressions and (ii) the distinct languages they represent. Observe that these are two different enumeration tasks: on the one hand, every regular expression represents exactly one regular language. On the other hand, simple examples, such as the expressions .a C b/ and .b a / , show that the same language can have many different regular expressions that specify it. We are in a similar situation if we use descriptors other than regular expressions, such as deterministic or nondeterministic finite automata. Although enumeration of automata has a long history, until recently little attention was paid to enumerating the distinct languages accepted. Instead authors concentrated on enumerating the automata

460

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

themselves according to various criteria (e.g., acyclic, nonisomorphic, strongly connected, initially connected, . . . ). Here is a brief survey of known results on automata. Vyssotsky [54] raised the question of enumerating strongly connected finite automata in an obscure technical report (but we have not been able to obtain a copy). Harary [19] enumerated the number of “functional digraphs” (which are essentially unary deterministic automata with no distinguished initial or final states) according to their cycle structure; also see Read [49] and Livshits [40]. Harary also mentioned the problem of enumerating deterministic finite automata over a binary alphabet as an open problem in a 1960 survey of open problems in enumeration [20], p. 75 and p. 87, and later in a similar 1964 survey [21]. Ginsburg [16], p. 18, asked for the number of nonisomorphic automata with output on n states with given input and output alphabet size. Harrison [23] and [24] developed exact formulas for the number of automata with specified size of the input alphabet, output alphabet, and number of states. Similar results were found by Korshunov [29]. However, in their model, the automata do not have a distinguished initial state or set of final states. Using the same model, Radke [47] enumerated the number of strongly connected automata, but his solution was very complicated and not particularly useful. Harary and Palmer [22] found very complicated formulas in the same model, but including an initial state and any number of final states. Harrison [23] and [24] gave asymptotic estimates for the number of automata in his model, but his formulas contained some errors that were later corrected by Korshunov [30]. For example, the number of nonisomorphic unary automata with n 1 states (and no distinguished initial or final states) is asymptotically c. n/ 2  n where : : c D 0:80 and  D 0:34. Much work on enumeration of automata was done in the former Soviet Union. For example, Liskovets [38] studied the number of initially connected automata and gave both a recurrence formula and an asymptotic formula for them; also see Robinson [50]. Korshunov [31] counted the number of minimal automata, and [32] gave asymptotic estimates for the number of initially connected automata. The 78-page survey by Korshunov [33], which unfortunately seems to never have been translated into English, gives these and many other results. More recently, Bassino and Nicaud [5] found that the number of nonisomorphic initially connected deterministic automata with n states is closely related to the Stirling numbers of the second kind. Their estimate was later improved by Lebensztayn [35]. Shallit and Breitbart observed that the number of finite automata can be applied to give bounds on the “automaticity” of languages and functions [52]. Pomerance, Robson, and Shallit [46] gave an upper bound on the number of distinct unary languages accepted by unary NFA’s with n states. Domaratzki, Kisman, and Shallit considered the number of distinct languages accepted by finite automata with n states [12]. They showed, for example, that the number of distinct languages accepted by unary finite : automata with n states is 2n .n ˛ C O.n2 n=2 //, where ˛ D 1:3827. (A weaker result was previously obtained by Nicaud [43].) Domaratzki [10] and [9] gave bounds on the number of minimal DFA’s accepting finite languages, which were improved by

13. Enumerating regular expressions and their languages

461

Liskovets [39]. Also see [6]. Almeida, Moreira, and Reis [2] showed how to encode complete initially connected DFA’s, and thereby count them. Bassino, David, and Nicaud [3] enumerated incomplete DFA’s. Bassino, David, and Sportiello [4] gave an asymptotic formula for the number of minimal DFA’s. For more details about enumeration of automata and languages, see the survey of Domaratzki [11].

2. On measuring the size of a regular expression Although, as we have seen, there has been much work for over 50 years on enumerating automata and the languages they represent, the analogous problem for regular expressions does not seem to have been studied before 2004 [36]. We define Rk .n/ to be the number of distinct languages specified by regular expressions of size n over a k -letter alphabet. The “size” of a regular expression can be defined in several different ways [14].  Ordinary length: total number of symbols, including parentheses, ;, ", etc., counted with multiplicity, – .0 C 10/ .1 C "/ has ordinary length 12, – mentioned, for example, in [1], p. 396, and [27].  Reverse polish length: number of symbols in a reverse polish equivalent, including a symbol  for concatenation. Equivalently, number of nodes in a syntax tree for the expression, – .0 C 10/ .1 C "/ in reverse polish would be 010  C  " C , – this has reverse polish length 10, – mentioned in [56].  Alphabetic width: number of symbols from †, counted with multiplicity, not including ", ;, parentheses, operators, – .0 C 10/ .1 C "/ has alphabetic width 4, – Mentioned in [42], [13], and [37].

Each size measure seems to have its own advantages and disadvantages. The ordinary length appears to be the most direct way to measure the size of a regular expression. Here we can employ the usual priority rules, borrowed from arithmetic, for saving parentheses and omitting the  operator. This favors the catenation operator  over the union operator C. For instance, the expression .a  b/ C .c  d/ can be written more briefly as abCcd, which has ordinary length 5, whereas there is no corresponding way to simplify the expression .a C b/.c C d/, which is twice as long. The other two measures are more robust in this respect. In particular, reverse polish length is a faithful measure for the amount of memory required to store the parse tree of a regular expression, and alphabetic width is often used in proofs of upper and lower bounds – compare Chapter 12. A drawback of alphabetic width is that it may be far from the “real” size of a given regular expression. As an example, the expression .." C ;/ ; C "/ has alphabetic width 0.

462

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

However, these three measures are all essentially identical, up to a constant multiplicative factor. We say “essentially” because one can always artificially inflate the ordinary length of a regular expression by adding arbitrarily many multiplicative factors of ", additive factors of ;, etc. In order to avoid such trivialities, we define what it means for a regular expression to be collapsible, as follows. Definition 2.1. Let E be a regular expression over the alphabet †, and let L.E/ be the language specified by E . We say E is collapsible if at least one of the following conditions holds: 1. E contains the symbol ; and jEj > 1; 2. E contains a subexpression of the form F G or GF where L.F / D ¹"ºI

3. E contains a subexpression of the form F C G or G C F where L.F / D ¹"º

and " 2 L.G/:

Otherwise, if none of the conditions hold, E is said to be uncollapsible. Definition 2.2. If E is an uncollapsible regular expression such that 1. E contains no superfluous parentheses and 2. E contains no subexpression of the form F  , then we say E is irreducible. Note that a minimal regular expression for E is uncollapsible and irreducible, but the converse does not necessarily hold. In [14] the following theorem is proved (cf. [27]). Theorem 2.1. Let E be a regular expression over †. Let jEj denote its ordinary length, let jrpn.E/j denote its reverse polish length, and let j alph.E/j denote the number of alphabetic symbols contained in E . Then (a) j alph.E/j 6 jEj; (b) if E is irreducible and j alph.E/j > 1, then jEj 6 11  j alph.E/j 4; (c) jrpn.E/j 6 2  jEj 1; (d) jEj 6 2  jrpn.E/j 1; (e) j alph.E/j 6 12 .jrpn.E/j C 1/; (f) if E is irreducible and j alph.E/j > 1, then jrpn.E/j 6 7  j alph.E/j 2.

3. A simple grammar for valid regular expressions As we have seen, if we want to enumerate regular expressions by size, we first have to agree upon a notion of expression size. But even then there still remains some ambiguity about the definition of a valid regular expression. For example, does the empty expression, that is, a string of length zero, constitute a valid regular expression? How about () or a ? The first two, for example, generate errors in the software

13. Enumerating regular expressions and their languages

463

package Grail version 2.5 [48]. Surprisingly, very few textbooks, if any, define valid regular expressions properly or formally. For example, using the definition given in Martin [41], p. 86, the expression 00 is not valid, since it is not fully parenthesised. (To be fair, after the definition it is implied that parentheses can be omitted in some cases, but no formal definition of when this can be done is given.) Probably the best way to define valid regular expressions is to use a context-free grammar. We now present an unambiguous context-free grammar for all valid regular expressions: S ! EC j E j G;

EC ! EC C F j F C F; F

! E j G;

E ! E G j GG; G ! E j C j P; C !;j"ja 

E ! G ;

.a 2 †/;

P ! .S /:

This grammar can be proved unambiguous by induction on the size of the regular expression generated. The meaning of the non-terminals is as follows: S generates all regular expressions; EC generates all unparenthesised expressions where the last operator was C; E generates all unparenthesised expressions where the last operator was  (implicit concatenation); E generates all unparenthesised expressions where the last operator was  (Kleene closure); C generates all unparenthesised expressions where there was no last operator (i.e., the constants); P generates all parenthesised expressions.

Here by “parenthesised” we mean there is at least one pair of enclosing parentheses. Note this grammar allows a , but disallows . /. Something similar was also done by [44], who devised a grammar for describing all reasonable regular expressions in order to perform an average-case analysis. Once we have an unambiguous grammar, we can use a powerful tool – the Chomsky–Schützenberger theorem – to enumerate the number of expressions of size n. We do this in the next section.

4. Unambiguous context-free grammars and the Chomsky–Schützenberger theorem Our principal tool for enumerating the number of strings of length n generated by an unambiguous context-free grammar is the Chomsky–Schützenberger theorem [7].

464

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

To state the theorem, we first recall some basic notions about grammars; these can be found in any introductory textbook on formal language theory, such as [26]. A context-free grammaris a quadruple of the form G D .V; †; P; S /, where V is a nonempty finite set of non-terminals, † is a nonempty finite set called the alphabet, P is a finite subset of V  .V [ †/ called the productions, and S 2 V is a distinguished non-terminal called the start variable. The elements of † are often called terminals. A production .A; / is typically written A ! . A sentential form is an element of .V [ †/ . Given a sentential form ˛Aˇ , where A 2 V and ˛; ˇ 2 .V [ †/ , we can apply the production A ! to get a new sentential form ˛ ˇ . In this case we write ˛Aˇ H) ˛ ˇ . We write H) for the reflexive, transitive closure of H) ; that is, we write ˛ H) ˇ if we can get from ˛ to ˇ by 0 or more applications of H) . The language generated by a context-free grammar is the set of all strings of terminals obtained in 0 or more derivation steps from S , the start variable. Formally, L.G/ D ¹x 2 † W S H) xº. A language is said to be context-free if it is generated by some context-free grammar. Given a sentential form ˛ derivable from a non-terminal A, we can form a parse tree for ˛ as follows: the root is labelled A. Every node labelled with a non-terminal B has subtrees with roots labelled, from left to right, with the elements of , where B ! is a production. A grammar is said to be unambiguous if for each x 2 L.G/ there is exactly one parse tree for x ; otherwise it is said to be ambiguous. It is known that not every context-free language has an unambiguous grammar. Now we turn to formal power series; for more information, see, for example [55]. A formal power series over a commutative ring R in an indeterminate x is an infinite sequence of coefficients .a0 ; a1 ; a2 ; : : :/ chosen from R, and usually written a0 C a1 x C a2 x 2 C    . The set of all such formal power series is denoted RŒŒx. The set of all formal power series is itself a commutative ring, with addition defined term-byterm, and multiplication defined by the usual Cauchy product as follows: if f D 2 2 2 a0 Ca1 x Ca2 x P C   and g D b0 Cb1 x Cb2 x C   , then fg D c0 Cc1 x Cc2 x C   , where cn D i Cj Dn ai bj . Exponentiation of formal series is defined, as usual, by iterated multiplication, so that f 2 D ff , for example. A formal power series f is said to be algebraic (over R.x/) if there exist a finite number of polynomials with coefficients in R, r0 .x/; r1 .x/; : : : ; rn .x/ such that r0 .x/ C r1 .x/f C    C rn .x/f n D 0:

The simplest nontrivial examples of algebraic formal series are the rational functions, which are quotients of polynomials p.x/=q.x/. Here is a less trivial example. The generating function of the Catalan numbers  X 2n n f .x/ D x nC1 D x C x 2 C 2x 3 C 5x 4 C 14x 5 C 42x 6 C 132x 7 C    n C 1 n>0 p 1 4x/, and hence we have f 2 f Cx D 0. is well known [53] to satisfy f .x/ D 12 .1 Thus f .x/ is an algebraic (even quadratic!) formal series.

13. Enumerating regular expressions and their languages

465

Now that we have the preliminaries, we can state the Chomsky–Schützenberger theorem: Theorem 4.1. If L is aPcontext-free language having an unambiguous grammar, and an WD jL \ †n j, then n>0 an x n is a formal power series in ZŒŒx that is algebraic over Q.x/. Furthermore, the equation of which the formal power series is a root can be deduced as follows: first, we carry out the following replacements:    

every terminal is replaced by an indeterminate x ; every occurrence of " is replaced by the integer 1; every occurrence of ! is replaced by D; every occurrence of j is replaced by C.

By doing so, we get a system of algebraic equations, called the “commutative image” of the grammar, which can then be solved to find a defining equation for the power series. Oddly enough, Chomsky and Schützenberger did not actually provide a proof of their theorem. A proof was given by Kuich and Salomaa [34] and, more recently, by Panholzer [45]. Let’s look at a simple example. Consider the unambiguous grammar S ! M j U;

M ! 0M1M j "; U

! 0S j 0M1U;

which represents strings of “if-then-else” clauses. Then this grammar has the following commutative image: (1)

S D M C U; 2

2

(2)

M D x M C 1; 2

(3)

U D S x C x M U:

This system of equations has the following power series solutions: M D 1 C x 2 C 2x 4 C 5x 6 C 14x 8 C 42x 10 C    ;

U D x C x 2 C 3x 3 C 4x 4 C 10x 5 C 15x 6 C 35x 7 C 56x 8 C    ; S D 1 C x C 2x 2 C 3x 3 C 6x 4 C 10x 5 C 20x 6 C 35x 7 C    :

By the Chomsky–Schützenberger theorem, each of M; U; S satisfies an algebraic equation over Q.x/. We can solve the system above to find the equation for S , as follows: first, we solve (3) to get U D 1 Sx , and substitute back in (1) to get x2 M 2 . Multiplying through by 1 x M gives S x 2 M S D M x 2 M 2 CS x , S D M C 1 Sx x2 M which, by (2), is equivalent to S x 2 M S D 1 C S x . Solving for S , we get S D 1 . Now (whatever M and x are) we have 1 x2 M x .1

x2 M

x/2 D x 2 .1

M C x2 M 2/

x.2x

1/

.2x

1/.1

x2 M

x/;

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

466

so we get S

2

D

x.2x

1/ x.2x

.2x

1/S

1

1/S 2 C .2x

and hence 1/S C 1 D 0:

This is an equation for S .

5. Solving algebraic equations using Gröbner bases Before introducing the notion of Gröbner bases, we describe some of the relevant mathematical notions from the field of commutative algebra. The exposition here is impressionistic; readers familiar with algebraic geometry will have no difficulty reformulating it in more formalised terms. For readers seeking for a more thorough introduction into the topic, there are accessible textbooks at the undergraduate level, such as [8]; a standard graduate level textbook is [25]. We recall that a field k is a commutative ring with the additional property that multiplicative inverses exist. That is, for any non-zero a 2 k , there exists an element b such that ab D ba D 1; more informally, one can “divide by a.” Familiar examples of fields are the rational numbers Q, the real numbers R, and the complex numbers C. On the other hand, the commutative ring Z of integers is not a field, and the smallest field containing it is Q. For our application to the asymptotic enumeration of regular languages, we are interested in the commutative ring of formal power series ZŒŒx. This is not a field, but rather only a ring – note, for example, that the element 2x does not have a multiplicative inverse. For the purposes of our algebraic framework it is convenient to work with the field k D Q..x// of formal Laurent series over Q. A formal Laurent series is defined similarly to a formal power series, with the difference that finitely many negative exponents are allowed; an example is 1 1 1 x x2 ex D C C C C C : x2 x2 x 2 6 24

The following discussion holds for any field k , but for intuition, the reader may prefer to think of k D R. Given any field k and indeterminates X1 ; X2 ; : : : ; Xn , there are two important objects:  the n-dimensional vector space W D k n over k , with coordinates Xi (1 6 i 6 n); and  the ring kŒX1 ; X2 ; : : : ; Xn  of (multivariate) polynomials over k in n indeterminates.

For instance, taking k D Q..x//, the polynomial S xCx 2 M U U , which we used in the previous section in equation (3), is member of the ring kŒS; M; U . The corresponding vector space W has coordinates S , M , and U . Notice that x is not a coordinate of W , but an artifact originating from the way the members of the underlying field k are defined.

13. Enumerating regular expressions and their languages

467

Given any collection of polynomials F in R, we can define their vanishing set V .F/ to be the set of common solutions in W ; that is, all points .x1 ; x2 ; : : : ; xn / 2 W such that f .x1 ; x2 ; : : : ; xN / D 0 for all f 2 F :

As an example, let W D R3 , with coordinates X; Y; Z . Then the vanishing set of the set of polynomials F D ¹X; Y C 3; Z C Y 2º is the single point given by .X; Y; Z/ D .0; 3; 5/; the vanishing set of the single polynomial Z X 2 Y 2 is an upward-opening paraboloid. The ideal hFi generated by a collection F of polynomials is the set of all R-linear combinations of F; that is, all polynomials of the form p1  f1 C p2  f2 C    C p`  f`

where pi 2 R; fi 2 F for all i :

Observe that the vanishing sets of a collection of polynomials and their generated ideal are equal: V .F/ D V .hFi/. A term ordering on R is a total order  on the set of monomials (disregarding coefficients) of R satisfying  multiplicativity: if u; v; w are any monomials in R, then u  v implies wu  wv ;

 well-ordering: if F is a collection of monomials, then F has a smallest element under .

Once a term ordering has been defined, one can then define the notion of the leading term of a polynomial, similar to the univariate case. For example, one defines the pure lexicographic order on kŒX; Y; Z given by Z  Y  X to be the ordering where X a Y b Z c  X d Y e Z f if and only if .a; b; c/ < .d; e; f / lexicographically. With this ordering, an example of a polynomial with its monomials in decreasing order is X 3 C X 2Y C X 2Z7 C Y 9 C 1 I

its leading term is X 3 D X 3 Y 0 Z 0 , and its trailing terms are X 2 Y , X 2 Z 7 , Y 9 , and 1. Given an ideal I , a Gröbner basis B for I is a set of polynomials g1 ; g2 ; : : : ; gk such that the ideal generated by the leading terms of the gi is precisely the initial ideal of I , defined to be the set of leading terms of polynomials in I . It can be shown that B generates I . Furthermore, we say that B is a reduced Gröbner basis if  the coefficient of each leading term in B is 1;

 the leading terms of B are a minimal set of generators for the initial ideal of B ; and  no trailing terms of B appear in the initial ideal of I .

Once a term order has been chosen, reduced Gröbner bases are unique. Note that in general, there are many term orderings for a polynomial ring R; the computational difficulty of a computation involving Gröbner bases is often highly sensitive to the choice of term ordering used.

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

468

Having established these preliminaries, we turn our attention to solving a system of equations given by the commutative image of a context-free grammar. Suppose we have a context-free grammar in the non-terminals S; N1 ; N2 ; : : : ; Nn . For each non-terminal N , let fN also denote the generating function enumerating the language generated by N . Taking k to be the field of formal Laurent series Q..x//, the Chomsky– Schützenberger theorem implies that fN 2 k for every non-terminal N . Furthermore, by taking the commutative image of the context-free grammar, we obtain a sequence of polynomials pS ; pN1 ; : : : ; pNn , where for every non-terminal N , the polynomial relation pN is the commutative image of the derivation rule for N . Note that every such polynomial is in the polynomial ring .ZŒx/ ŒS; N1 ; N2 ; : : : ; Nn . It follows from the definitions that for every non-terminal N , pN .fS ; fN1 ; fN2 ; : : : ; fNn / D 0 I

that is, the .n C 1/-tuple .fS ; fN1 ; fN2 ; : : : ; fNn / is a zero of the polynomial pN . Since this holds for every non-terminal N , we can equivalently say that the .n C 1/-tuple .fS ; fN1 ; fN2 ; : : : ; fNn / is in the vanishing set V .I /, where I is generated by the polynomials pS ; pN1 ; pN2 ; : : : ; pNn . Our aim is to determine an algebraic equation satisfied by the power series fS . To do this, we find a Gröbner basis B for I , using an elimination ordering on the indeterminate S . The defining property of any such term ordering is that the monomials involving only the indeterminate S are strictly smaller than the other monomials; namely, those involving at least one of N1 ; N2 ; : : : ; Nn . By the Chomsky–Schützenberger theorem and the properties of Gröbner bases, the smallest polynomial p in B will be a univariate polynomial in the indeterminate S . Since p 2 I , and .fS ; fN1 ; fN2 ; : : : ; fNn / is in the vanishing set V .I /, we see that p.fS / D 0; that is, p D 0 is an algebraic equation satisfied by fS . (Note that in previous sections, we simply use S to denote fS .) As an example, we use the software package Maple 2017 to compute such an algebraic equation for the example grammar in the previous section. We give the commands, followed by the produced output. The commutative image of the grammar is entered as a list of polynomials, given by > eqs := [ -S + M + U, -M + x^2*M^2 + 1, -U + S*x + x^2*M*U ];

eqs WD ΠS C M C U; M 2x 2

M C 1; M Ux 2 C S x

U :

Maple provides an elimination ordering called lexdeg; to compute a reduced Gröbner basis using this ordering, we enter the command > Groebner[Basis](eqs, lexdeg([M, U], [S])); Œ.2x 2

x/S 2 C .2x

1/S C 1; .x

1/S C Ux C 1; M x C . 2x C 1/S

The algebraic equation satisfied by S is the first polynomial in this set: > algeq := %[1];

algeq WD .2x 2

x/S 2 C .2x

1/S C 1:

1:

13. Enumerating regular expressions and their languages

469

To compute the Laurent series zeros of S using this polynomial, we solve for S and expand the solutions as Laurent series in the indeterminate x : > map(series, [solve(algeq, S)], x); Πx

1

1

x

2x 2

3x 3

6x 4 C O.x 5 /; 1 C x C 2x 2 C 3x 3 C 6x 4 C O.x 5 /:

Our desired power series solution is the second entry in the above returned list. It is easy to obtain more terms.

6. Asymptotic bounds via singularity analysis P If L is a context-free language having an unambiguous grammar and f .x/ D an x n is the formal power series enumerating it, then f .x/ is algebraic over Q.x/ by Theorem 4.1. The previous section gave a procedure for computing an algebraic equation satisfied by f ; that is, we are able to determine a non-trivial polynomial P .x; S / 2 ZŒx; S  such that P .x; f .x// D 0. This section describes how singularity analysis can be used to determine the asymptotic growth rate of the coefficients an . We sketch some of the requisite notions from complex analysis and provide a glimpse of the underlying theory; more details can be found in Flajolet and Sedgewick [15]. The use of considering complex analysis is that the formal power series f .x/, defined purely combinatorially, can be viewed as a function defined on an appropriate open subset of the complex plane C. Such a function is called holomorphic or (complex) analytic; this reinterpretation of f .x/ allows us to apply theorems from complex analysis in order to derive bounds on the asymptotic growth rate of the an far tighter than what we could do with purely combinatorial reasoning. Indeed, assume that L is an infinite context-free language. Then there exists a real number 0 < R 6 1 called the radius of convergence for f .x/. The defining properties of R are as follows:  if z is a complex number with jzj < R, then the infinite sum a0 C a1 z C a2 z 2 C a3 z 3 C    converges;

 if z is a complex number with jzj > R, then the infinite sum a0 C a1 z C a2 z 2 C a3 z 3 C    diverges. P We note that the definition says nothing about the convergence of ai z i when jzj D R. Thus, defining U to be the open ball of complex numbers z satisfying jzj < R, we can reinterpret f as an analytic function on U . The connection between the asymptotic growth of the coefficients an and the number R is given by two theorems.

Theorem 6.1 (Hadamard). Given a power series, R is given by the explicit formula RD

1 : lim sup jan j1=n n!1

470

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

The defining properties of lim sup state that  for all " > 0, the relation jan j1=n
0, the relation jan j1=n >

1 R

" holds for infinitely many n.

For our situation in particular, this implies that up to a sub-exponential factor, the n sequence .an / growsasymptotically like 1=R n  . (This implies that for all " > 0, we n 1 1 and an … O R " . have an 2 O R C " We note that Hadamard’s formula applies to all power series, not just to generating functions of context-free languages. An elementary argument shows that our assumption that L is infinite implies R 6 1; similarly, our assumption that L is context-free (and thus algebraic) implies R > 0. (The argument for showing R > 0 is harder, and is sketched here for those familiar with complex analysis. The algebraic curve given P by P .z; y/ D 0 determines d branches around z D 0 and the power series f .x/ D n an x n must be associated with one such branch. Since the exponents of f .x/ are non-negative integers, this must be an analytic branch at 0; hence, f .x/ determines an analytic function at 0 and must have positive radius of convergence.) The second theorem describes the convergence of the power series f .x/ on the circle given byPjzj D R. A dominant singularity for f .x/ is a point z0 on this circle such that the sum an z0n diverges; the following result says that a positive (real-valued) dominant singularity always exists. P Theorem 6.2 (Pringsheim). Let f .x/ D n an x n be a power series with radius of convergence R > 0. If the coefficients an are all non-negative, then R is a dominant singularity for f .x/. The benefit of Pringsheim’s theorem is that, for the sake of determining R, it suffices to examine the positive real line for the singularities of f .x/ considered as a function, not just as a power series. We make this more precise now, by introducing the concept of a multi-valued function. Suppose that the power series f .x/ is algebraic of degree d over Q.x/. Under the assumption that P is irreducible, this means that the degree of the polynomial P .x; S / 2 ZŒx; S  in the indeterminate S is d , and we may write P D qn S n C qn

1S

n 1

C qn

2S

n 2

C    C q0 ;

where each qi is a polynomial in ZŒx and qn is non-zero. (If P is reducible, factor it and replace it by an appropriate irreducible factor.) S If we work in the algebraically closed Puiseux series field n>1 C..x 1=n //, we obtain d roots of P .x; S / D 0, say, g1 .x/; g2 .x/; : : : ; gd .x/, one of which coincides with f .x/. In general, these roots will not be power series with non-negative integer coefficients, but instead will be more generalised power series with complex coefficients and (possibly negative) fractional exponents.

13. Enumerating regular expressions and their languages

471

Let D.x/ 2 ZŒx be the discriminant of P with respect to the indeterminate S ; this is readily computed via the formula   . 1/n.n 1/=2 @ DD  Res P; P; S : qn @S Here, Res denotes the resultant of two polynomials, defined to be the determinant of a matrix whose entries are given by the coefficients of the polynomials in a certain way. The theoretical importance of D is that it satisfies the identity Y D.x/ D qn2.n 1/ .gi .x/ gj .x//: i ¤j

Define the exceptional set „ of P to be the complex zeros of D ; note that this is a finite set. For every point z in the complement C n „, where D does not vanish, there exist d distinct solutions y to the equation P .z; y/ D 0. Furthermore, the d distinct solutions vary continuously with z , and a locally continuous choice of solutions locally determines a branch (which is locally an analytic function) of the algebraic curve cut out by P .z; y/ D 0; this is how a multi-valued function arises. On the open set U , which we have defined to be the set of points z satisfying jzj < R, one such branch is given by our initial power series f .x/. By Pringsheim’s theorem, f .x/ diverges at R; this shows that f .x/, considered as an analytic function on U , has no analytic continuation to a function on an open set containing U [ ¹Rº. According to the discussion above, this shows that R must be in the exceptional set „. We have given a method to calculate an upper bound for the growth rate of the an ; in particular, we have shown parts .1/ and .2/ of the following: P Theorem 6.3. Let f .x/ D n an x n be a formal power series where an > 0 for each n. Suppose P .x; S / D 0 is a non-trivial algebraic equation satisfied by f .x/, and let D be the discriminant of P with respect to S . Then exactly one of the positive real roots R of D satisfies the following properties: n  1. for all " > 0, an 2 O R1 C " ; n  2. for all " > 0, an … O R1 " ; and 3. if D has no zero n z0 ¤ R such that jz0 j D R, then, for all " > 0, one gets an 2  R1 " .

We remark that part .3/ is much more difficult to show; it is implied by the stronger result that if D has no zeroz0 ¤ R such that jz0 j D R, then there exists a polynomial 1 n p such that an  p.n/  R . Given the list 1 < 2 <    < k of positive real-valued elements of „, there remains the task of selecting which j to use to provide an upper or lower bound. The bigger j is, the better our upper bound will be; however, for this bound to be valid, we must ensure that j 6 R. For our purposes, we simply employ a bootstrapping method – if it is known beforehand that an 2 O.ns / for some s , then we simply choose the minimal j such that 1=j 6 s ; equivalently, j > 1=s . If this is not possible, we simply pick j D 1. Similarly, for a lower bound, we choose the maximal j such that

472

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

j 6 1=t if it is known that an 2 .nt /. (With much more work, one can precisely identify R; Flajolet and Sedgewick [15] describe an algorithm “Algebraic Coefficient Asymptotics” that does this.) As an illustration, we continue the Maple example in the previous section to derive an asymptotic upper bound for the example grammar. We first recall the algebraic equation satisfied by S : > algeq; .2x 2

x/S 2 C .2x

We compute the discriminant D :

1/S C 1:

> d := discrim(algeq,S); d WD

The real roots of D are given by

.2 x C 1/.2 x

1/:

> realroots := [fsolve(%)];

realroots WD Œ 0:5000000000; 0:5000000000:

Finally, an upper bound is given by taking the inverse of the smallest positive real root: > 1/min(op(select(type, realroots, positive))); 2:000000000:

Hence, an 2 O..2 C "/ / for all " > 0. n

7. Lower bounds on enumeration of regular languages by regular expressions We now turn to lower bounds on Rk .n/. In the unary case (k D 1), we can argue as follows: consider any subset of ¹"; a; a2 ; : : : ; at 1 º. Such a subset can be denoted by a regular expression of (ordinary) length at mostpt.t C 1/=2. Since there are 2t distinct subsets, this gives a lower bound of R1 .n/ > 2 2n 1 . Similarly, when k > 2, there are k n distinct strings of length n, so Rk .n/ > k n . These naive bounds can be improved somewhat using a grammar-based approach. Consider a regular expression of the form w1 ." C w2 ." C w3 ." C    ///

where the wi denote nonempty words. Every distinct choice of the wi specifies a distinct language. Such expressions can be generated by the grammar S ! Y j Y ." C S /;

Y

! aY j a ;

a 2 †;

13. Enumerating regular expressions and their languages

473

which has the commutative image S D Y C Y S x4;

Y D kxY C kx:

The solution to this system is

kx : 1 kx kx 5 Once again, the asymptotic behaviour of the coefficients of the power series for S depend on the zeros of 1 kx kx 5 . The smallest (indeed, the only) real root is, asymptotically as k ! 1, given by  X . 1/i 5i 1 1 5 35 i k .4i C1/ D C 9 C : 5 4i C 1 k k k k 13 SD

i >0

The reciprocal of this series is C5 X 4 5ii C1 1 k 1 4i D k C 3 5.5i C 4/ k i >0

4 26 C 11 7 k k

204 1771 C 19 15 k k

 :

For k D 1 the only real root of 1 kx kx 5 is approximately 0:754877666 and for k D 2 it is about 0:4756527435. Thus we have the following result. Theorem 7.1. R1 .n/ D .1:3247n/ and R2 .n/ D .2:102374n/.

7.1. Trie representations for finite languages. We will now improve these lower bounds. To this end, we begin with the simpler problem of counting the number of finite languages that may be specified by regular expressions without Kleene star of size n. Non-empty finite languages not containing " admit a standard representation via a trie structure; an example is given Figure 1(a).

2

0

6

0

6

1

7

1*

7*

3

5

"

8

4

(a) Representing the finite language 01(2+34+5)+67("+8) as a trie

2

3

5

"

8

4*

(b) Representing the infinite language 01*(2+34*+5)+67*("+8) as a starred trie

Figure 1. Example of a trie representation for a finite language (see § 7.1) and of a starred trie representation for an infinite language (se § 7.2)

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

474

The words in such a language L correspond to the leaf nodes of the trie for L; moreover, the concatenation of labels from the root to a leaf node gives an expression for the word associated with that leaf node. For regular languages L and M , we write M 1 L to denote the left quotient of L by M ; formally M

1

L D ¹vW there exists u 2 M such that uv 2 Lº:

If M consists of a single word w , we also write w 1 L instead of ¹wº 1 L, and w n L instead of .w n / 1 L. For notational convenience, we take our alphabet to be † D ¹a0 ; a1 ; : : : ; ak 1 º, where k > 1 denotes our alphabet size. A trie encodes the simple fact that each nonempty finite language L not containing " can be uniquely decomposed as L D S a L , where Li D ai 1 L, and the index i runs over all symbols ai 2 † such that Li i i i is nonempty. This factoring out of common prefixes resembles Horner’s rule (see, e.g., [28], p. 486) for evaluating polynomials. We develop lower bounds by specifying a context-free grammar that generates regular expressions with common prefixes factored out. In fact, the grammar is designed so that if r is a regular expression generated by the grammar, then the structure of r mimics that of the trie for L.r/; nodes with a single child correspond to concatenations, while nodes with multiple children correspond to concatenations with a union, see Table 1. Table 1. A grammar for mimicking tries with regular expressions S !Y jZ E ! Y j .Z/ j ." C S/ Y

! Pi

for 0 6 i < k

Z ! P n0 C P n1 C    C P n t where 0 6 n0 < n1 <    < n t < k for t > 0 Pi ! ai j ai E

for 0 6 i < k

The set of regular languages represented corresponds to all non-empty finite languages over † not containing the empty string ". We briefly describe the non-terminals, as follows: S generates all non-empty finite languages not containing "; E generates all non-empty finite languages containing at least one word other than "; Y generates all non-empty finite languages (not containing ") whose words all begin with the same letter; The for loop is executed only once; Z generates all non-empty finite languages (not containing ") whose words do not all begin with the same letter; Pi generates all non-empty finite languages (not containing ") whose words all begin with ai .

13. Enumerating regular expressions and their languages

475

We remark that this grammar is unambiguous and that no regular language is represented more than once; this should be clear from the relationship between regular expressions generated by the grammar and their respective tries. (Note that it is possible to slightly optimise this grammar in the case of ordinary length to generate expressions such as 0 C 00 in lieu of 0." C 0/, but as it results in marginal improvements to the lower bound at the cost of greatly complicating the grammar, we do not do so here.) Table 2 lists the lower bounds obtained through this grammar. In this table (and only this table), each .k n / in the column corresponding to reverse polish notation should be interpreted as “not O.k n /” – observe, for instance, that all strings produced by our grammar for a unary alphabet have odd reverse polish length. Table 2. Lower bounds for Rk .n/ with respect to size measure and alphabet size

1

ordinary

reverse polish alphabetic

.1:3247n /

.1:2720n /

.2n /

2 .2:5676n / .2:1532n /

.6:8284n /

3 .3:6130n / .2:7176n /

.11:1961n /

4 .4:6260n / .3:1806n /

.15:5307n /

5 .5:6264n / .3:5834n /

.19:8548n /

6 .6:6215n / .3:9451n /

.24:1740n /

Remark 7.2. Using the singularity analysis method explained in § 6, these lower bounds were obtained by bootstrapping off the trivial bounds of .k n /, .k n=2 / and .k n / for the ordinary, reverse polish length and alphabetic width cases, respectively. Before we generalise our approach to cover also infinite languages, we derive a formula showing how our lower bound on alphabetic width will increase along with the alphabet size k . To this end, we first state a version of the Lagrange implicit function theorem as a simplification of [17], Theorem 1.2.4. If f .x/ is a power series in x , we write Œx n f .x/ to denote the coefficient of x n in f .x/; recall that the characteristic of a ring R with additive identity 0 and multiplicative identity 1 is defined to be the smallest integer k P such that kiD1 1 D 0, or zero if there is no such k .

Lemma 7.3. Let R be a commutative ring of characteristic zero and take ./ 2 RŒŒ such that Œ0  is invertible. Then there exists a unique formal power series w.x/ 2 RŒŒx such that Œx 0 w D 0 and w D x.w/. For n > 1, Œx n w.x/ D

1 n Πn

1

 n ./ :

476

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

Due to the simplicity of alphabetic width, the problem of enumerating regular languages in this case may be interpreted as doing so for rooted k -ary trees, where each internal node is marked with one of two possible colours. We thus investigate how our lower bound varies with k . More specifically, consider a regular expression r generated by the grammar from the previous section and its associated trie. Colour each node with a child labelled " black and all other nodes white. After deleting all nodes marked ", call the resultant tree T .r/. This operation is reversible, and shows that we may put the expressions of alphabetic width n in correspondence with the k -ary rooted trees with n C 1 vertices where every non-root internal node may assume one of two colours. In order to estimate the latter, we first prove a basic result. The first half of the following lemma is also found in [15], p. 68.  Lemma 7.4. There are n1 nkn1 k -ary trees of n nodes. Moreover, the expected number of leaf nodes among k -ary trees of n nodes is asymptotic to .1 1=k/k n as n ! 1. Proof. Fix k > 1. For n > 1, let an denote the number of k -ary rooted trees with n vertices and consider the generating series X f .x/ D an x n : n>1

By the recursive structure of k -ary trees, we have the recurrence f .x/ D t.1 C f .x//k :

Thus, by the Lagrange implicit function theorem, we have an D Œx n f .x/ D

1 n Πn

1

.1 C /kn D

  1 kn : n n 1

We now calculate the number of leaf nodes among all k -ary rooted trees with n vertices. Let bn;m denote the number of k -ary rooted trees with n vertices and m leaf nodes and cn the number of leaf nodes among all k -ary rooted trees with n vertices. Consider the bivariate generating series X g.x; y/ D bn;m x m y n : n;m>1

By the recursive structure of k -ary trees, we have the recurrence g.x; y/ D y.x

1 C .1 C g.x; y//k / :

The Lagrange implicit function theorem once again yields ˇ @ n ˇ cn D Œy g.x; y/ˇ xD1 @x ˇ @ 1 n 1 ˇ Œ .x 1 C .1 C g.x; y//k /n ˇ D xD1 @x n ˇ 1 n 1 @ ˇ D Œ  .x 1 C .1 C /k /n ˇ xD1 n @x

13. Enumerating regular expressions and their languages

D Œn 1 .1 C /k.n   k.n 1/ D : n 1

477

1/

Thus, the expected number of leaf nodes among n-node trees, for k fixed and as n ! 1, is 1/  k 1 k n k.n cn n 1  n D :  kn an k n 1

We wish to find a bound on the expected number of subsets of non-root internal nodes among all k -ary rooted trees with n nodes, where a subset corresponds to those nodes marked black. Fix k > 2. Since the map x 7! 2x is convex, for every " > 0 and sufficiently large n, Jensen’s inequality (e.g., [51], Theorem 3.3) applied to the lemma above implies the following lower bound on the number of subsets: 2.1

Since .1

.1 1=k/k "/n

:

1=k/ > 1=e for k > 1, we may choose " > 0 such that k

.1

1=k/k

">

1=e:

This yields a lower bound of 2.1 1=e/n .  Assuming k > 2 fixed, we now estimate nkn1 . By Stirling’s formula, we have, as n ! 1,    n  kk kn D‚ : n 1 .k 1/k 1

Putting our two bounds together, we have the following lower bound on the number of star-free regular expressions of alphabetic width n, for fixed k and as n ! 1:  2.1 1=e/ k k n  :  .k 1/k 1

7.2. Trie representations for some infinite regular languages. We now turn our attention to enumerating regular languages in general; that is, we allow for regular expressions with Kleene stars. Our grammars for this section are based on the those for the star-free cases. Due to the difficulty of avoiding specifying duplicate regular languages, we settle for a “small” subset of regular languages. For simplicity, we only consider taking the Kleene star closure of singleton alphabet symbols, and we impose some further restrictions. Recall the trie representation of a star-free regular expression written in our common prefix notation. With this representation, we may mark nodes with stars while satisfying the following conditions:  each starred symbol must have a non-starred parent other than the root;  a starred symbol may not have a sibling or an identically-labelled parent (disregarding the lack of star) with its own sibling; and

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

478

 a starred symbol may not have an identically-labelled child (disregarding the lack of star).

The first condition eliminates duplicates such as 0 11 0 1 0

the second eliminates those such as 01

! 0." C 11 /

and the third eliminates those such as

0 0

! 0 1 0 11 0 I

and 0.1 C 2 1/

! 02 1

! 00 :

In this manner, we end up with starred tries such as in Figure 1(b). Algorithm 1 illustrates how to recreate such a starred trie from the language it specifies. Algorithm 1 Require: " 62 L, L ¤ ; 1. 2. 3. 4.

Sta r-Tr i e (L)

create a tree T with unlabelled root for all b 2 † such that b 1 L ¤ ; do append Sta r-Tr i e-H elp (a 1 L, a) below the root of T return T

Let T be a starred trie satisfying the conditions above. Then T represents a regular expression, which in turn specifies a certain language. We now show that when the algorithm is run with that language as input, it returns the trie T by arguing that at each step of the algorithm when a particular node (matched with language L if the root and aL otherwise) is being processed, the children are correctly reconstructed. We first consider children of the root. By the original trie construction (for finite languages without "), no such children may be labelled ". Thus, by the first star condition, the only children may be unstarred alphabet symbols. Thus, line 2 of Algorithm 1 suffices to find all children of the root correctly. Now consider a non-root internal node, say labelled a. By the third star condition, a starred node may not have a child labelled with the same alphabet symbol, so if a has a child labelled b  , then .b n /

1

L \ ." C .† n ¹bº/† / is non-empty for all n > 0:

(4)

Conversely, by the second condition, a starred node may not have an identically-labelled parent that has " as a sibling, so if (4) holds, then a must have a child labelled b  . By the second star condition, a starred node may not have siblings, so the algorithm need not check for other children once a starred child is found. This shows that line 3 of Algorithm 2 correctly identifies all starred children of a. Assuming a has a starred child b  , then by the third condition, line 6 of Algorithm 2 correctly recovers all children of b  . All remaining children of a have no stars, and line 7 of Algorithm 2 suffices to find all children labelled with a 2 †; the special case of an "-child below a is covered by line 12.

13. Enumerating regular expressions and their languages

479

Algorithm 2 Sta r-Tr i e-H elp (L; a) 1. create a tree T with root labelled a 1 L ¤ ; do 2. for all b 2 † such that b 3. if .b n L/ \ ." C .† n ¹bº/† / ¤ ; then ¹need a child labelled b  º 4. append a new b  -node below the root of T 5. if L ¤ b  then ¹b  will be an internal nodeº 6. for all c 2 † n ¹bº such that c 1 L ¤ ; do ¹determine children of b  º 7. append Sta r-Tr i e-H elp (c 1 L, c ) below the b  -node 8. if b 2 L then 9. append a new "-node below the b  -node 10. else ¹need a child labelled bº 11. append Sta r-Tr i e-H elp(b 1 L, b ) below the root of T 12. if " 2 L and the root of T has at least one unstarred child then 13. append a new "-node below the root of T 14. return T

We give a grammar that generates expressions meeting these conditions in Table 3. As before, we take our alphabet to be † D ¹a0 ; a1 ; : : : ; ak 1 º. We describe the roles of the non-terminals of the grammar in Table 3. Table 3. A grammar generating all regular expressions meeting all three star conditions

S !Y jZ E ! Y j .Z/ j ." C Y 0 / j ." C Z/

Ei ! Yi j .Zi / j ." C Yi0 / j ." C Zi / Y Y0

! Pi !

Pi0

Yi ! Pj

Yi0

!

Pj0

for 0 6 i < k

for 0 6 i < k for 0 6 i < k for 0 6 i; j < k and i ¤ j

for 0 6 i; j < k and i ¤ j

Z ! Pn0 0 C Pn0 1 C    C Pn0 t where 0 6 n0 < n1 <    < n t < k for t > 0

Zi ! Pn0 0 C Pn0 1 C    C Pn0 t as above, but with nj ¤ i for all 0 6 j 6 t Pi ! ai j ai E j ai aj j ai aj Ej

Pi0

! ai j ai E j

ai aj

j

ai aj Ej

for 0 6 i; j < k for 0 6 i; j < k and i ¤ j

480

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit

 S generates all expressions; this corresponds to Algorithm 1.  E; Ei generate expressions that may be concatenated to non-starred and starred alphabet symbols, respectively. The non-terminal E corresponds to lines 2 and 13 while Ei corresponds to line 5 of Algorithm 2. These act the same as S except for the introduction of parentheses to take precedence into account and restriction that no prefixes of the form "Caa are generated, used to implement the second condition. Additionally, Ei has the restriction that its first alphabet symbol produced may not be ai ; this is used to implement the third condition.  Y; Y 0 ; Yi ; Yi0 generate expressions whose prefix is an alphabet symbol. As a whole, these non-terminals correspond to Algorithm 2, and may be considered degenerate cases of Z and Zi ; that is, trivial unions. The tick-mark signifies that expressions of the form aa for a 2 † are disallowed, used to implement the second condition. The subscripted i signifies that the initial alphabet symbol may not be ai , used to implement the third condition.  Z; Zi generate non-trivial unions of expressions beginning with distinct alphabet symbols; Z corresponds to line 2 of Algorithm 1 and line 10 of Algorithm 2, while Zi corresponds to line 5 of Algorithm 2. The subscripted i signifies that none of initial alphabet symbols may be ai , used to implement the third condition.  Pi ; Pi0 generate expressions beginning with the specified alphabet symbol ai . They correspond to line 1 of Algorithm 2. The tick-mark signifies that expressions may not have the prefix ai ai , used to implement the second condition.

Since the algorithm correctly returns a trie when run on the language represented by the trie, the correspondence between the algorithm and the grammar gives us the following result. Theorem 7.5. The grammar above is unambiguous and the generated regular expressions represent distinct regular languages. Table 4 lists the improved lower bounds for Rk .n/. These lower bounds were obtained via singularity analysis, as explained in § 6, bootstrapping off the bounds in Table 2.

8. Upper bounds on enumeration of regular languages by regular expressions Turning our attention back to upper bounds for Rk .n/, we develop grammars for regular expressions such that every regular language is represented by at least one shortest regular expression generated by the grammar, where a regular expression r of size n is said to be shortest if there is no expression r 0 of size less than n with L.r/ D L.r 0 /.

13. Enumerating regular expressions and their languages

481

Table 4. Improved lower bounds for Rk .n/ with respect to size measure and alphabet size ordinary

reverse polish alphabetic

1 .1:3247n / .1:2720n /

.2n /

2 .2:7799n / .2:2140n /

.7:4140n /

3 .3:9582n / .2:8065n /

.12:5367n /

4 .5:0629n / .3:2860n /

.17:6695n /

5 .6:1319n / .3:6998n /

.22:8082n /

6 .7:1804n / .4:0693n /

.27:9500n /

To this end, we consider certain “normal forms” for regular expressions, with the property that transforming a regular expression into normal form never increases its size. Again, size may refer to one of the various measures introduced before. With such a normal form, it suffices to enumerate all regular expressions in normal form to obtain improved upper bounds on Rk .n/ for various measures. 8.1. A grammar based on normalised regular expressions. We begin with a simple approach, which will be further refined later on. As concatenation and sum are associative, we consider them to be variadic operators taking at least 2 arguments and impose the condition that in a parse tree, neither of them are permitted to have themselves as children. Also, by the commutativity of the sum operator, we impose the condition that the summands of each sum appear in the following order: First come all summands which are terminal symbols, then all summands which are concatenations, and finally all starred summands. Also, we can safely omit all subexpressions of the form s  , s  C ", .s C "/ , s C " C ": occurrences of these can be replaced with occurrences of s  , s  , s  , and s C ", respectively. Here the latter subexpressions have size no larger than the former ones, and this holds for all size measures considered. These observations immediately lend themselves for a simple unambiguous grammar, such as the one listed in Table 5. The meaning of the non-terminals is as follows: S generates all regular expressions obeying the above-mentioned format. Among them, Q generates those expressions of the form r C ", A generates those of the form r C s , i.e., “additions,” T generates those which are terminal symbols, C generates those of the form rs , i.e., concatenations, C0 generates the “factors” appearing inside concatenations (which are themselves not concatenations), and K generates those of the form r  , i.e., Kleene stars;

482

Hermann Gruber, Jonathan Lee, and Jeffrey Shallit Table 5. A simple unambiguous grammar for generating at least one shortest regular expression for each regular language S !QjAjT jC jK Q !AC"jT C"j C C" A ! T C AT j C C AC j K C AK AT

! T j T C AT j AC

AC

! C j C C AC j AK

AK ! K j K C AK T

! a1 j a2 j    j ak

C ! C0 C0 j C0 C C0 ! .Q/ j .A/ j T j K K ! .A/ j T  j .C /

In the special case of unary alphabets, not only union but also concatenation (again viewed as a variadic operator) is commutative. In this case, we may impose a similar ordering of factors as done for summands, and thus we can replace the rule with C as left-hand side by the rules given in Table 6.

Table 6. Rules for concatenation over unary alphabets, which in that case is commutative

  C    → (Q) C_Q | (A) C_A | T C_T | K C_K
  C_Q  → (Q) | (Q) C_Q | C_A
  C_A  → (A) | (A) C_A | C_T
  C_T  → T | T C_T | C_K
  C_K  → K | K C_K

8.2. A grammar based on strong star normal form. We now refine the above approach by considering only regular expressions in strong star normal form [18]; compare Definition 3.1 in the chapter on descriptional complexity of regular languages (Chapter 12). Since ∅ is only needed to denote the empty set, and the need for ε can be substituted by the operator L? = L ∪ {ε}, an alternative syntax also introduces the ?-operator and instead forbids the use of ∅ and ε inside non-atomic expressions.


The definition of strong star normal form is most conveniently given for this alternative syntax.

Definition 8.1. The operators ∘ and • are defined on regular expressions. The first operator is given by a∘ = a for a ∈ Σ; (r + s)∘ = r∘ + s∘; (r?)∘ = r∘; (r*)∘ = r∘; finally, (rs)∘ = rs if ε ∉ L(rs), and r∘ + s∘ otherwise. The second operator is given by a• = a for a ∈ Σ; (r + s)• = r• + s•; (rs)• = r• s•; (r*)• = (r•∘)*; finally, (r?)• = r• if ε ∈ L(r), and (r?)• = (r•)? otherwise. The strong star normal form of an expression r is then defined as r•.

An easy induction shows that the transformation into strong star normal form preserves the described language, and that it is weakly monotone with respect to all the usual size measures. We sketch a proof for the case of ordinary length.

Lemma 8.1. Let r be a regular expression without occurrences of the symbol ∅, and let r• be its strong star normal form. Then ord(r•) ≤ ord(r).

Proof sketch. First of all, we may safely assume that r does not contain a subexpression ruled out by the grammar of the previous section, such as ε + ε; the transformation into strong star normal form subsumes these reductions anyway. Recall the definition of the auxiliary operator ∘ in the definition of strong star normal form (Chapter 12, Definition 3.1). The proof relies on the following claim: if ε ∈ L(r) and L(r) ≠ {ε}, then ord(r∘) ≤ ord(r) − 1; otherwise, ord(r∘) ≤ ord(r). This claim can be proved by induction while excluding the cases L(r) = ∅ and L(r) = {ε}. The base cases are easy; the induction step is most interesting in the case r = st. If ε ∉ L(st), then r∘ = st and the claim holds; otherwise r∘ = s∘ + t∘ with ε ∈ L(s) and ε ∈ L(t). We can apply the induction hypothesis twice to deduce ord(s∘) + ord(t∘) ≤ ord(s) + ord(t) − 2, and thus ord(s∘ + t∘) ≤ ord(st) − 1, as desired. Notice that, as union has lower precedence than concatenation, this step never introduces new parentheses. The induction step in the other cases is even easier.

Since every regular language is represented by at least one shortest regular expression in strong star normal form (with respect to all three considered size measures), it suffices to enumerate those expressions in normal form. Our improved grammar will be based on the following simple observation on expressions in strong star normal form.

Lemma 8.2. If s* or s + ε appears as a subexpression of an expression in star normal form, then ε ∉ L(s).
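The two operators can be read off Definition 8.1 directly. The following is a minimal sketch on a small expression-tree type; the node classes, helper names, and the printed example are illustrative assumptions, not taken from the chapter.

```python
from dataclasses import dataclass

class RE: pass

@dataclass
class Sym(RE):  a: str          # a single alphabet symbol

@dataclass
class Plus(RE): l: RE; r: RE    # r + s

@dataclass
class Cat(RE):  l: RE; r: RE    # r s

@dataclass
class Star(RE): r: RE           # r*

@dataclass
class Opt(RE):  r: RE           # r?, i.e. r + epsilon

def nullable(e):
    """Does L(e) contain the empty word?"""
    if isinstance(e, Sym):  return False
    if isinstance(e, Plus): return nullable(e.l) or nullable(e.r)
    if isinstance(e, Cat):  return nullable(e.l) and nullable(e.r)
    return True                 # Star and Opt always denote epsilon

def circ(e):
    """The first operator of Definition 8.1."""
    if isinstance(e, Sym):  return e
    if isinstance(e, Plus): return Plus(circ(e.l), circ(e.r))
    if isinstance(e, (Opt, Star)): return circ(e.r)
    # concatenation: left unchanged if it cannot produce epsilon, split otherwise
    return e if not nullable(e) else Plus(circ(e.l), circ(e.r))

def bullet(e):
    """The second operator; bullet(e) is the strong star normal form of e."""
    if isinstance(e, Sym):  return e
    if isinstance(e, Plus): return Plus(bullet(e.l), bullet(e.r))
    if isinstance(e, Cat):  return Cat(bullet(e.l), bullet(e.r))
    if isinstance(e, Star): return Star(circ(bullet(e.r)))
    b = bullet(e.r)             # e = r?: the '?' is dropped if r already contains epsilon
    return b if nullable(b) else Opt(b)

print(bullet(Star(Opt(Sym("a")))))   # (a?)* normalises to Star(Sym('a')), i.e. a*
```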

To exploit this fact, for each subexpression we need to keep track of whether it denotes the empty word. This can of course be done with dynamic programming, using rules such as: ε ∈ L(rs) if and only if ε ∈ L(r) and ε ∈ L(s). Since in addition every subexpression either denotes the empty word or not, it is easy to extend the above grammar to incorporate these rules while retaining the property of being unambiguous. Notice that most non-terminals now come in an ε-flavor (for example, the non-terminal A_C) and in an ε-free flavor (for example, the non-terminal A). Moreover, the summands inside sums appear in the following order, which is a refinement of the


summand ordering devised previously: first come all summands which are terminal symbols, then all summands which are ε-free concatenations, then all concatenations with ε in the denoted language, and finally all starred summands. To illustrate this ordering, we give the most important steps of the unique derivation for the expression a1 + a2a3 + (a4 + ε)(a5 + ε) + a6*:

  S ⟹ A + A_C^+
    ⟹ T + A_T + A_C^+
    ⟹ a1 + A_T + A_C^+
    ⟹ a1 + A_C + A_C^+
    ⟹ a1 + C + A_C^+
    ⟹ a1 + a2a3 + A_C^+
    ⟹ a1 + a2a3 + C^+ + A_K^+
    ⟹ a1 + a2a3 + (a4 + ε)(a5 + ε) + A_K^+
    ⟹ a1 + a2a3 + (a4 + ε)(a5 + ε) + K^+
    ⟹ a1 + a2a3 + (a4 + ε)(a5 + ε) + a6*.

The following proposition, giving the correctness of the improved grammar, can be proved by induction on the minimum required regular expression size. Table 7 lists the upper bounds obtained through this grammar. For the case of finite languages, we can simply omit all rules generating starred subexpressions from the improved grammar. Table 9 lists the corresponding upper bounds for finite languages.

Table 7. Summary of upper bounds on R_k(n) for k = 1, 2, ..., 6 and various size measures. For ordinary length, we used the simple grammar in Table 5, because the computation for the improved grammar ran out of computational resources. For reverse polish length, we used the simple grammar for bootstrapping the bounds.

  k   ordinary       reverse polish
  1   O(2.5946^n)    O(2.7422^n)
  2   O(4.2877^n)    O(3.9870^n)
  3   O(5.4659^n)    O(4.7229^n)
  4   O(6.5918^n)    O(5.3384^n)
  5   O(7.6870^n)    O(5.8780^n)
  6   O(8.7624^n)    O(6.3643^n)
  alphabetic (all k = 1, ..., 6):  O(k^n · 21.5908^n)

Proposition 8.3. The grammar in Table 8 is unambiguous and, for each regular language, generates at least one regular expression of minimal ordinary length (respectively: reverse polish length, alphabetic width) representing it.


Table 8. A better unambiguous grammar generating at least one shortest regular expression (in strong star normal form) for each regular language. Its productions refine those of Table 5 as described above: each non-terminal is split into an ε-flavor and an ε-free flavor, and the summands are required to appear in the refined order (terminal symbols, ε-free concatenations, concatenations containing ε, starred subexpressions).

Table 9. Summary of upper bounds for k = 1, 2, ..., 6 and various size measures in the case of finite languages. For reverse polish length, we bootstrapped from the values in Table 7; for ordinary length, we bootstrapped the case k = 2 from the upper bound obtained for k = 3.

  k   ordinary       reverse polish
  1   O(2.1793^n)    O(2.0795^n)
  2   O(3.8145^n)    O(3.3494^n)
  3   O(4.9019^n)    O(4.0315^n)
  4   O(5.8234^n)    O(4.6121^n)
  5   O(6.8933^n)    O(5.1268^n)
  6   O(7.9492^n)    O(5.5939^n)
  alphabetic:  k = 1: O(10.9822^n);  k = 2, ..., 6: O(k^n · 12.2253^n)

9. Exact enumerations

Tables 10–15 give the exact numbers of regular languages representable by a regular expression of size n but not by any regular expression of size less than n.


We explain how these numbers were obtained. Using the upper bound grammars described previously, a dynamic programming approach was taken to produce (in order of increasing regular expression size) the regular expressions generated by each nonterminal. To account for duplicates, each regular expression was transformed into a DFA, minimised and relabelled via a breadth-first search to produce a canonical representation. Using these representations as hashes, any regular expression matching a previous one generated by the same non-terminal was simply ignored.
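For concreteness, the duplicate check can be sketched as follows. The function below assumes a complete minimal DFA has already been obtained from the regular expression (for instance by the subset construction followed by minimisation) and shows only the breadth-first relabelling that yields a hashable canonical key; the interface is an illustrative assumption, not the authors' implementation.

```python
from collections import deque

def canonical_key(start, delta, accepting, alphabet):
    """Relabel a (complete, minimal) DFA by breadth-first search from its
    start state and return a hashable canonical description.  `delta` maps
    (state, symbol) -> state; state names may be arbitrary objects."""
    order = {start: 0}                     # state -> BFS number
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:                 # a fixed symbol order makes the result canonical
            r = delta[(q, a)]
            if r not in order:
                order[r] = len(order)
                queue.append(r)
    table = tuple(order[delta[(q, a)]]
                  for q in sorted(order, key=order.get)
                  for a in alphabet)
    finals = tuple(sorted(order[q] for q in accepting))
    return (len(order), table, finals)

# Two regular expressions denote the same language exactly when their minimal
# DFAs receive the same key, so duplicates can be filtered with a set of keys.
```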

10. Conclusion and open problems

In this chapter, we discussed various approaches to enumerating regular expressions and the languages they represent, and we used algebraic and analytic tools to compute upper and lower bounds for these enumerations. Our upper and lower bounds are not always very close, so an obvious open problem (or class of open problems) is to improve these bounds. Other problems we did not examine here involve enumerating interesting subclasses of regular expressions; for example, in linear expressions, every alphabet symbol occurs exactly once. In addition to the intrinsic interest, enumerating subclasses may provide a strategy for improving the lower bounds for the general case.

Table 10. Ordinary length, finite languages

  n \ k     1      2       3        4
  1         3      4       5        6
  2         1      4       9        16
  3         2      11      33       74
  4         3      28      117      336
  5         3      63      391      1474
  6         5      156     1350     6560
  7         5      358     4546     28861
  8         8      888     15753    128720
  9         9      2194    55053    578033
  10        14     5665    196185   2624460

Table 11. Ordinary length, general case

  n \ k     1      2       3        4
  1         3      4       5        6
  2         2      6       12       20
  3         3      17      48       102
  4         4      48      192      520
  5         5      134     760      2628
  6         9      397     3090     13482
  7         12     1151    12442    68747
  8         17     3442    51044    354500
  9         25     10527   211812   1840433
  10        33     32731   891228

Table 12. Reverse polish length, finite languages

  n \ k     1      2        3        4
  1         3      4        5        6
  3         2      7        15       26
  5         3      25       85       202
  7         5      109      589      1917
  9         9      514      4512     20251
  11        14     2641     37477    231152
  13        24     14354    328718   2780936
  15        41     81325    2998039
  17        71     475936
  19        118    2854145

Table 13. Reverse polish length, general case

  n \ k     1      2       3        4
  1         3      4       5        6
  2         1      2       3        4
  3         2      7       15       26
  4         2      13      33       62
  5         3      32      106      244
  6         4      90      361      920
  7         6      189     1012     3133
  8         7      580     3859     13529
  9         11     1347    11655    48388
  10        15     3978    43431    208634

Table 14. Alphabetic width, finite languages

  n \ k     1      2        3        4
  0         2      2        2        2
  1         2      4        6        8
  2         4      24       60       112
  3         8      182      806      2164
  4         16     1652     13182    51008
  5         32     16854    242070   1346924
  6         64     186114   4785115

Table 15. Alphabetic width, general case

  n \ k     1      2        3        4
  0         2      2        2        2
  1         3      6        9        12
  2         6      56       150      288
  3         14     612      3232     9312
  4         30     7923     82614    357911
  5         72     114554   2332374
  6         155    1768133


References [1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of computer algorithms. Addison-Wesley Series in Computer Science and Information Processing. AddisonWesley, Reading, MA, 1974. MR 0413592 Zbl 0326.68005 q.v. 461 [2] M. Almeida, N. Moreira, and R. Reis, Enumeration and generation with a string automata representation. Theoret. Comput. Sci. 387 (2007), no. 2, 93–102. MR 2362181 Zbl 1143.68031 q.v. 461 [3] F. Bassino, J. David, and C. Nicaud, Enumeration and random generation of possibly incomplete deterministic automata. Pure Math. Appl. (PU.M.A.) 19 (2008), no. 2–3, 1–15. MR 2566168 Zbl 1224.68043 q.v. 461 [4] F. Bassino, J. David, and A. Sportiello, Asymptotic enumeration of minimal automata. In 29th International Symposium on Theoretical Aspects of Computer Science (C. Dürr and T. Wilke, eds.). Proceedings of the symposium (STACS ’12) held in Paris, February 29–March 3, 2012. LIPIcs. Leibniz International Proceedings in Informatics, 14. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2012, 88–99. MR 3005521 Zbl 1245.68118 q.v. 461 [5] F. Bassino and C. Nicaud, Enumeration and random generation of accessible automata. Theoret. Comput. Sci. 381 (2007), no. 1–3, 86–104. MR 2347395 Zbl 1188.68168 q.v. 460 [6] D. Callan, A determinant of Stirling cycle numbers counts unlabeled acyclic single-source automata. Discrete Math. Theor. Comput. Sci. 10 (2008), no. 2, 77–86. MR 2419277 Zbl 1196.05008 q.v. 461 [7] N. Chomsky and M.-P. Schützenberger. The algebraic theory of context-free languages. In Computer programming and formal systems (P. Brattfort and D. Hirschberg, eds.). NorthHolland Publishing Co., Amsterdam, 1963, 118–161. MR 0152391 Zbl 0148.00804 q.v. 463 [8] D. A. Cox, J. Little, and D. O’Shea, Ideals, varieties, and algorithms. An introduction to computational algebraic geometry and commutative algebra. Third edition. Undergraduate Texts in Mathematics. Springer, New York, 2007. MR 2290010 Zbl 1118.13001 q.v. 466 [9] M. Domaratzki, Combinatorial interpretations of a generalization of the Genocchi numbers. J. Integer Seq. 7 (2004), no. 3, Article 04.3.6, 11 pp. https://www.cs.uwaterloo.ca/journals/JIS/VOL7/Domaratzki/doma23.html MR 2110777 Zbl 1092.11010 q.v. 460 [10] M. Domaratzki, Improved bounds on the number of automata accepting finite languages. Internat. J. Found. Comput. Sci. 15 (2004), no. 1, 143–161. Computing and Combinatorics Conference – COCOON ’02. MR 2056728 Zbl 1101.68650 q.v. 460 [11] M. Domaratzki, Enumeration of formal languages. Bull. Eur. Assoc. Theor. Comput. Sci. 89 (2006), 117–133. MR 2267837 Zbl 1169.68466 q.v. 461 [12] M. Domaratzki, D. Kisman, and J. Shallit, On the number of distinct languages accepted by finite automata with n states. J. Autom. Lang. Comb. 7 (2002), no. 4, 469–486. Descriptional complexity of automata, grammars and related structures (Vienna, 2001). MR 1990452 Zbl 1137.68421 q.v. 460 [13] A. Ehrenfeucht and P. Zeiger, Complexity measures for regular expressions. J. Comput. System Sci. 12 (1976), no. 2, 134–146. MR 0418509 Zbl 0329.94024 q.v. 461 [14] K. Ellul, B. Krawetz, J. Shallit, and M.-W. Wang, Regular expressions: new results and open problems. J. Autom. Lang. Comb. 10 (2005), no. 4, 407–437. MR 2376649 Zbl 1143.68434 q.v. 461, 462


[15] P. Flajolet and R. Sedgewick, Analytic combinatorics. Cambridge University Press, Cambridge, 2009. MR 2483235 Zbl 1165.05001 q.v. 469, 472, 476 [16] S. Ginsburg, An introduction to mathematical machine theory. Addison-Wesley, Reading, MA, 1962. MR 0145693 Zbl 0102.33804 q.v. 460 [17] I. P. Goulden and D. M. Jackson, Combinatorial enumeration. With a foreword by G.-C. Rota. Wiley–Interscience Series in Discrete Mathematics. John Wiley & Sons, New York, 1983. MR 0702512 Zbl 0519.05001 q.v. 475 [18] H. Gruber and S. Gulan, Simplifying regular expressions. A quantitative perspective. In Language and automata theory and applications (A.-H. Dediu, H. Fernau, and C. Martín-Vide, eds.). Proceedings of the 4th International Conference on Language and Automata Theory and Applications. Lecture Notes in Computer Science, 6031. Springer, Berlin, 2010, 285–296. MR 2753917 Zbl 1284.68351 q.v. 482 [19] F. Harary, The number of functional digraphs. Math. Ann. 138 (1959), 203–210. MR 0109130 Zbl 0087.38703 q.v. 460 [20] F. Harary, Unsolved problems in the enumeration of graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960), 63–95. MR 0146820 Zbl 0095.16902 q.v. 460 [21] F. Harary, Combinatorial problems in graphical enumeration. In Applied combinatorial mathematics (E. Beckenbach, ed.). University of California Engineering and Physical Sciences Extension Series. John Wiley & Sons, New York etc., 1964, 185–217. Zbl 0158.20801 q.v. 460 [22] F. Harary and E. Palmer, Enumeration of finite automata. Information and Control 10 (1967), 499–508. MR 0215652 Zbl 0168.25903 q.v. 460 [23] M. A. Harrison, A census of finite automata. In Proceedings of the 5 th Annual Symposium on Switching Circuit Theory and Logical Design. Institute of Electrical and Electronics Engineers, 1964, 44–46. q.v. 460 [24] M. A. Harrison, A census of finite automata. Canadian J. Math. 17 (1965), 100–113. MR 0170772 Zbl 0156.01703 q.v. 460 [25] R. Hartshorne, Algebraic geometry. Graduate Texts in Mathematics, 52. Springer, Berlin, 1977. MR 0463157 Zbl 0367.14001 q.v. 466 [26] J. E. Hopcroft and J. D. Ullman, Introduction to automata theory, languages, and computation. Addison-Wesley Series in Computer Science. Addison-Wesley, Reading, MA, 1979. MR 0645539 Zbl 0426.68001 q.v. 464 [27] L. Ilie and S. Yu, Follow automata. Inform. and Comput. 186 (2003), no. 1, 140–162. MR 2001743 Zbl 1059.68063 q.v. 461, 462 [28] D. E. Knuth, The art of computer programming. Vol. 2. Seminumerical algorithms. Third edition. Addison-Wesley, Reading, MA, 1998. MR 3077153 Zbl 0895.65001 q.v. 474 [29] A. D. Korshunov, Об асимптотических оценках числа приведенных автоматов (On asymptotic estimates of the number of finite automata). Diskret. Analiz 6 (1966), 35–50. In Russian. MR 0197225 Zbl 0171.27502 q.v. 460 [30] A. D. Korshunov, Asymptotic estimates of the number of finite automata. Kibernetika (Kiev) 1967, no. 2, 12–19. In Russian. English translation, Cybernetics 3 (1967), no. 2, 9–14. MR 0267983 Zbl 0226.94045 q.v. 460 [31] A. D. Korshunov, Обзор некоторых направлений теории автоматов (A survey of certain trends in automata theory). Diskret. Analiz 25 Sintez Shem i Avtomaty (1974), 19–55, 62. In Russian. MR 0368971 Zbl 0306.94034 q.v. 460 [32] A. D. Korshunov, The number of automata and boundedly determined functions. Hereditary properties of automata. Dokl. Akad. Nauk SSSR 221 (1975), no. 6, 1264–1267. In Russian.

English translation, Soviet Math. Dokl. 16 (1975), 515–518. MR 0386893 Zbl 0325.68024 q.v. 460
[33] A. D. Korshunov, О перечислении конечных автоматов (Enumeration of finite automata). Problemy Kibernet. 34 (1978), 5–82, 272. In Russian. MR 0517814 Zbl 0415.03027 q.v. 460
[34] W. Kuich and A. Salomaa, Semirings, automata, languages. EATCS Monographs on Theoretical Computer Science, 5. Springer, Berlin, 1986. MR 0817983 Zbl 0582.68002 q.v. 465
[35] E. Lebensztayn, On the asymptotic enumeration of accessible automata. Discrete Math. Theor. Comput. Sci. 12 (2010), no. 3, 75–79. MR 2786470 Zbl 1280.68117 q.v. 460
[36] J. Lee and J. Shallit, Enumerating regular expressions and their languages. In Implementation and application of automata (M. Domaratzki, A. Okhotin, K. Salomaa, and S. Yu, eds.). Papers from the 9th International Conference (CIAA 2004) held at Queen's University, Kingston, ON, July 22–24, 2004. Lecture Notes in Computer Science, 3317. Springer, Berlin, 2005, 2–22. MR 2143391 Zbl 2144483 q.v. 461
[37] E. L. Leiss, Constructing a finite automaton for a given regular expression. SIGACT News 12 (1980), no. 3, 81–87. Zbl 0453.68025 q.v. 461
[38] V. A. Liskovets, The number of connected initial automata. Kibernetika (Kiev) 1969, no. 3, 16–19. In Russian. English translation, Cybernetics 5 (1969), 259–262. MR 0302356 Zbl 0204.32102 q.v. 460
[39] V. A. Liskovets, Exact enumeration of acyclic deterministic automata. Discrete Appl. Math. 154 (2006), no. 3, 537–551. MR 2203203 Zbl 1090.68060 q.v. 461
[40] E. M. Livshits, Асимптотическая формула для числа классов изоморфных автономных автоматов с n состояниями (Asymptotic formula for the number of classes of isomorphic autonomous automata with n states). Ukrain. Mat. Ž. 16 (1964), 245–246. In Russian. MR 0164848 Zbl 0124.25005 q.v. 460
[41] J. C. Martin, Introduction to languages and the theory of computation. Third edition. McGraw-Hill, New York, N.Y., 2003. q.v. 463
[42] R. McNaughton and H. Yamada, Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9 (1960), 39–47. Zbl 0156.25501 q.v. 461
[43] C. Nicaud, Average state complexity of operations on unary automata. In Mathematical foundations of computer science 1999 (M. Kutylowski, L. Pacholski, and T. Wierzbicki, eds.). Proceedings of the 24th International Symposium (MFCS '99) held in Szklarska Poręba, September 6–10, 1999. Lecture Notes in Computer Science, 1672. Springer, Berlin, 1999, 231–240. MR 1731238 Zbl 0955.68068 q.v. 460
[44] C. Nicaud, On the average size of Glushkov's automata. In Language and automata theory and applications (A. H. Dediu, A. M. Ionescu, and C. Martín-Vide, eds.). Third International Conference on Language and Automata Theory and Applications, LATA 2009, held in Tarragona, Spain, April 2–8, 2009. Lecture Notes in Computer Science, 5457. Springer, Berlin, 2009, 626–637. MR 2544451 Zbl 1234.68232 q.v. 463
[45] A. Panholzer, Gröbner bases and the defining polynomial of a context-free grammar generating function. J. Autom. Lang. Comb. 10 (2005), no. 1, 79–97. MR 2192586 Zbl 1087.68046 q.v. 465
[46] C. Pomerance, J. M. Robson, and J. Shallit, Automaticity. II. Descriptional complexity in the unary case. Theoret. Comput. Sci. 180 (1997), no. 1–2, 181–201. MR 1453865 Zbl 0959.11015 q.v. 460


[47] C. E. Radke, Enumeration of strongly connected sequential machines. Information and Control 8 (1965), 377–389. MR 0180461 Zbl 0127.01102 q.v. 460 [48] D. Raymond and D. Wood, Grail: a CCC library for automata and expressions. J. Symbolic Comput. 17 (1994), 341–350. Zbl 0942.68803 q.v. 463 [49] R. C. Read, A note on the number of functional digraphs. Math. Ann. 143 (1961), 109–110. MR 0120162 Zbl 0096.38201 q.v. 460 [50] R. W. Robinson, Counting strongly connected finite automata. In Graph theory with applications to algorithms and computer science (Y. Alavi, G. Chartrand, L. Lesniak, D. R. Lick, and C. E. Wall, eds.). Proceedings of the 5th international conference. Held at Western Michigan University, Kalamazoo, MI, June 4–8, 1984. John Wiley & Sons, New York, 1985, 671–685. MR 0812700 Zbl 0572.68042 q.v. 460 [51] W. Rudin, Real and complex analysis. McGraw-Hill Series in Higher Mathematics. McGraw-Hill, New York etc., 1966. MR 0210528 Zbl 0142.01701 q.v. 477 [52] J. Shallit and Y. Breitbart, Automaticity. I. Properties of a measure of descriptional complexity. J. Comput. System Sci. 53 (1996), no. 1, 10–25. MR 1409007 Zbl 0859.68059 q.v. 460 [53] R. P. Stanley, Enumerative combinatorics. Vol. 2. With a foreword by G.-C. Rota and Appendix 1 by S. Fomin. Cambridge Studies in Advanced Mathematics, 62. Cambridge University Press, Cambridge, 1999. MR 1676282 Zbl 0928.05001 q.v. 464 [54] V. A. Vyssotsky, A counting problem for finite automata. Technical report. Bell Telephone Laboratories, 1959. q.v. 460 [55] H. Wilf, Generatingfunctionology. Third edition. A. K. Peters, Wellesley, MA, 2006. MR 2172781 Zbl 1092.05001 q.v. 464 [56] D. Ziadi, Regular expression for a language without empty word. Theoret. Comput. Sci. 163 (1996), no. 1–2, 309–315. MR 1407031 Zbl 0878.68080 q.v. 461

Chapter 14

Circuit complexity of regular languages

Michal Koucký

Contents

1. Introduction
2. Circuits
3. Syntactic monoid
4. Regular expressions
5. Circuit complexity of regular languages
6. Circuit size of regular languages
7. Final remarks
References

1. Introduction

Finite automata and Boolean circuits are two of the most intuitive models that have been formalised to analyze computations. Traditionally an automaton A over the alphabet A consists of a finite set of states Q that are interconnected by transitions (edges) labelled by symbols from the alphabet (see Figure 1). The automaton works by reading its input symbol-by-symbol and following the transitions that are labelled by the symbols it reads. The computation starts in a designated initial state and ends when the automaton processes the last symbol. If the computation finishes in one of the designated final states F then we say that the automaton accepts this input; otherwise it rejects the input. The language recognised by the automaton is the set of accepted inputs. Each finite automaton can be associated with its syntactic monoid, which is the set of those functions Q → Q that one obtains by taking the closure (under composition) of the set of transition functions f_a, for the symbols a of the alphabet A. One can classify regular languages by the algebraic properties of the corresponding syntactic monoids, which leads to various regular language classes that are relevant for this chapter. However, we will do our best to minimise the amount of algebra that is needed by the reader.
Boolean circuits are a model of computation that captures the essence of the operation of actual hardware circuits. A Boolean circuit can also be represented by a directed graph; the graph must be acyclic and the nodes are referred to as gates. Gates with no incoming edges, that is, gates with in-degree zero, are called input gates and each of them is associated with one input position. Each gate with non-zero in-degree is labelled


by some Boolean function such as A n d, O r, No t, and their incoming edges are given a specific order. The computation of the circuit on a given input starts by assigning the value of the appropriate input symbol to each input gate. The value of each input gate is then propagated along each of its out-going edges. Once a gate receives some value along each of its incoming edges it determines its value by applying the associated function on the incoming values. Then it propagates its value along all its out-going edges. Since the graph is acyclic this process eventually stops and the output is given by the value of a designated output gate.

Figure 1. A finite automaton for A*ac*aA*, and a Boolean circuit testing x1 ≤ x2
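To make the evaluation order just described concrete, here is a minimal sketch that evaluates a circuit given as an acyclic graph. The gate encoding, the gate names, and the small example (one way to realise the test x1 ≤ x2 as (not x1) or x2) are illustrative assumptions, not taken from the chapter.

```python
def eval_circuit(gates, inputs, output="out"):
    """Evaluate an acyclic circuit.  `gates` maps a gate name to ('input', i),
    ('const', b), or (op, [predecessor names]) with op in {'and', 'or', 'not'}."""
    memo = {}
    def value(g):
        if g in memo:
            return memo[g]
        spec = gates[g]
        if spec[0] == 'input':
            v = inputs[spec[1]]
        elif spec[0] == 'const':
            v = spec[1]
        else:
            op, preds = spec
            vals = [value(p) for p in preds]     # recurse along the DAG edges
            if op == 'and':
                v = int(all(vals))
            elif op == 'or':
                v = int(any(vals))
            else:                                # 'not' has a single predecessor
                v = 1 - vals[0]
        memo[g] = v
        return v
    return value(output)

# One way to test x1 <= x2 on bits: out = (not x1) or x2.
gates = {'x1': ('input', 0), 'x2': ('input', 1),
         'n1': ('not', ['x1']), 'out': ('or', ['n1', 'x2'])}
print([eval_circuit(gates, [a, b]) for a in (0, 1) for b in (0, 1)])  # [1, 1, 0, 1]
```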

There are several major differences between the two models. A rather technical difference is that finite automata operate over arbitrary alphabets whereas Boolean circuits assume the input alphabet to be ¹0; 1º. This is a rather insignificant difference as any alphabet can be encoded in binary or alternatively one can allow the circuit to use generalised input gates that test whether a particular input position contains a prescribed symbol. The really substantial difference between the two models is that a finite automaton can process inputs of any length whereas a Boolean circuit is designed for a particular input length. Thus in the case of circuits one needs to consider a family of circuits ¹Cn ºn>1 where each Cn processes inputs of length n. The language recognised by the family consists of all the words on which the appropriate circuit outputs one. A priori there is no restriction on the complexity of the family so there are uncountably many circuit families and indeed every language over the binary alphabet can be recognised by at least one circuit family. To limit the computational power of circuit families one has to a) limit the size and the structure of each circuit Cn or b) require that the circuit family itself has an algorithmic description: in other words, that there is an algorithm that on input n encoded in unary outputs a description of the circuit Cn . The structural and size restrictions of circuit families lead to various circuit classes and in this chapter we will


relate some of the well known circuit families to various classical classes of regular languages. The restriction on the efficient computability of the circuit family leads to so called uniform circuit families. (A computational model is considered to be uniform if each of its devices (or device families) that operates on inputs of every size has a finite description.) Another aspect of automata and circuits is worth mentioning. Finite automata seems to be an inherently sequential computational device. The input is processed symbol by symbol and the time needed to complete the computation is proportional to the length of the input. On the other hand Boolean circuits represent inherently parallel computational devices. Multiple gates can be evaluated independently at the same time and the computation finishes in time that is proportional to the length of the longest path in the circuit. This length might be only weakly related to the input length. Indeed, Boolean circuits serve as a standard model for parallel computation. Hence relating regular languages to languages recognised by Boolean circuits sheds light on the possibility of parallelisation of regular language recognition.

2. Circuits

In this section we review basic notation and facts about Boolean circuits. One can view any language as a function f: A* → {0,1} that for each word x ∈ A* determines whether it is in the language (f(x) = 1) or it is not in the language (f(x) = 0). We will study circuits, which are machines that work with a fixed number of inputs. It will thus be convenient and necessary to view each function f: A* → {0,1} as a family of functions {f_n}, n = 0, 1, ..., where each function f_n: A^n → {0,1} is the restriction of f to the inputs of length n. Similar views can be applied to any function f from A* to an arbitrary range.
Circuits. Let B be a collection of functions from {0,1}* to {0,1}. When dealing with circuits one usually considers inputs over the binary alphabet A = {0,1}, so we consider this case first. A circuit over base B with input length n is a directed acyclic multi-graph in which each node of in-degree zero is labelled by one of the input bits x_1, ..., x_n, and each node v of non-zero in-degree is labelled by a function g_v ∈ B. One may also allow some nodes of in-degree zero that are labelled by the constants 0 or 1. (The in-degree of a node is the number of its incoming edges and the out-degree is the number of its out-going edges.) There might be multiple edges between the same nodes, all the edges incoming to a node v must be given some fixed ordering, and the number of these edges must correspond to the arity of g_v. One node of the circuit is designated as the output node. Nodes of a circuit are often called gates. On input x ∈ {0,1}^n the circuit is evaluated by assigning values to gates and edges as follows: each edge of the circuit receives the value of its starting node, each input gate labelled by an input bit x_i is given the value of x_i, nodes labelled by constants take the value of their constants, and each node v labelled by a basis function g_v is given the value of g_v applied on the


word formed by the values of edges incoming to v concatenated in the given order. The value of the circuit is the value of the output gate. Beside considering circuits that compute a mapping ¹0; 1ºn ! ¹0; 1º one can also consider circuits computing a mapping ¹0; 1ºn ! ¹0; 1ºm by designating multiple gates of the circuits as the output gates, and fixing their order in some way. There are several ways how to extend the definition of circuits to a non-binary input alphabet A. Our preference is to allow input gates of the form Œxi 2 D, for i 2 ¹1; : : : ; nº and D  A, where the gate will evaluate to 1 if the input symbol xi belongs to D , and it will evaluate to 0 otherwise. We call such gates generalised input gates. Another possibility is to fix an injective mapping from A to ¹0; 1ºk , for some k > 1, and encode each symbol of the input from An by this mapping for the purpose of processing it by a circuit. Both these extensions are essentially equivalent, in different settings one might be advantageous over the other one. Uniformity. As seen from the description of the circuit, each circuit computes a function of a fixed input length. To compute a function f W A ! A one has to consider a family of circuits ¹Cn º1 nD0 where circuit Cn computes fn . A priori we impose no limit on how hard it is to construct a particular circuit Cn . This model is thus not limited to so-called computable functions. To recover the classical notion of decidability, one can put a restriction on the constructibility of each circuit Cn . Such circuit families are called uniform. For example when there is a polynomial time algorithm that on input 0n outputs the natural description of Cn we say that the circuit family ¹Cn º1 nD0 is polynomial-time uniform. As we will see later polynomialtime uniform circuit families over the standard basis B2 D ¹No t; A n d2 ; O r2 º compute exactly the functions computable in polynomial time. There are other more stringent notions of uniformity such a DLOGTIME-uniformity which requires that on input n and the indices of two gates from Cn given in binary, one outputs in time O.log n/ the labels of the two gates and whether they are connected. More information on uniformity of circuits can be found in [18], [4], and [30]. Measure. There are several measures that capture various aspects of circuit computation. Circuit size means either the total number of nodes or the total number of edges in the circuit. Depending on context one may be a more sensible measure than the other one but typically they are closely related. In § 6.1 we will see the difference between the two measures. The size of the circuit is related to the time it takes to evaluate a circuit by usual sequential algorithms as one has to evaluate each gate one by one. Another important parameter is the circuit depth, the maximum length of any path in the circuit. The depth of the circuit corresponds to the time it takes to evaluate a circuit in parallel models of computation. Here one assumes that the value of each gate can be computed in constant number of steps once the values of all its input nodes are known. This assumption may or may not be reasonable depending on the choice of basis B and the model of parallel computation. In the particular case of bases that contain only fixed arity functions such as B2 this is a perfectly legitimate assumption.


Another measure people sometimes consider is the circuit width. In order to determine the width of the circuit one needs to layer the circuit so that each layer consists of nodes that are connected only to nodes in adjacent layers. The width of the circuit is then the size of the largest layer. This parameter upper bounds the space needed to evaluate a circuit. For a circuit family C = {C_n}, n = 0, 1, ..., we say that C has polynomial size if for some polynomial p(n) and all n = 0, 1, ..., the size of C_n is bounded by p(n). Similar expressions are used in connection with other circuit measures and growth functions, and they are interpreted similarly. We will have in this chapter special interest in circuits of polynomial size and constant depth.
Basis functions. The basis functions that are usually considered when talking about circuits are the functions Not, And and Or. Not(x) = 1 − x is a unary function, whereas And, Or: {0,1}* → {0,1} have arbitrary arity and are defined by And(x) = 1 if and only if x ∈ {1}*, and Or(x) = 0 if and only if x ∈ {0}*. The restrictions of And and Or to binary arity will be written And2 and Or2. The functions Mod_m and Maj: {0,1}* → {0,1} also often appear in the context of constant depth circuits. For an integer m ≥ 2, we define the Mod_m function by Mod_m(x) = 1 if and only if the number of ones in x is divisible by m. The function Maj is defined by Maj(x) = 1 if and only if the number of ones in x is larger than the number of zeros. We say that a collection B of functions from {0,1}* to {0,1} is a complete basis if there are circuits over B that compute And2 and Not. The standard basis B2 = {Not, And2, Or2} is a complete basis.
Circuit classes. Given the above definitions we define the following circuit classes that are of interest to us. First, we define several classes of functions that are computable by circuits of constant depth. For these classes to be interesting one has to allow basis functions of arbitrary arity.
• AC0 is the class of functions computable by circuit families of polynomial size and constant depth over the standard basis {Not, And, Or}.
• CC0[m] is the class of functions computable by circuit families of polynomial size and constant depth over the basis {Mod_m}.
• CC0 is the union ⋃_{m≥2} CC0[m].
• AC0[m] is the class of functions computable by circuit families of polynomial size and constant depth over the basis {Not, And, Or, Mod_m}.
• ACC0 is the union ⋃_{m≥2} AC0[m].
• TC0 is the class of functions computable by circuit families of polynomial size and constant depth over the basis {Not, Maj}.

Once the depth of circuits in a circuit family grows at least logarithmically it is meaningful to consider circuit families over bases with bounded arity functions. The following are the most interesting classes in that context.


• NC^k, k ≥ 1, is the class of functions computable by circuit families of polynomial size and depth O((log n)^k) over the standard binary basis {Not, And2, Or2}.
• NC is the union ⋃_{k≥1} NC^k.

The known relationships among the various circuit classes that are not immediate from their definitions are CC0[m] ⊈ AC0, see [10], [1], and [11], and ACC0 ⊆ TC0 ⊆ NC1, see for example [26] and [30]. The circuit classes that we have defined correspond to functions that can be efficiently computed in parallel. The class NC captures parallel computation running in poly-logarithmic time, whereas classes like AC0 capture parallel computation running in constant time on appropriately defined parallel machines, see for example [30]. The class NC contains a large collection of natural problems, including finding a perfect matching in a graph and computing the determinant of a matrix. The class AC0 is much more restricted but contains functions like adding or comparing two n-bit integers given in binary. The intermediate class TC0 captures various types of neural networks and contains for example division and multiplication of integers given in binary. Since we did not put any uniformity restriction on the circuit families in the above definitions, these circuit classes contain non-computable functions. One can add, for example, a polynomial-time uniformity requirement on the circuit families in the definition of AC0 to obtain the class polynomial-time uniform AC0, and similarly for the other classes. The polynomial-time uniform versions of all the above circuit classes are included in P, the class of functions computable in polynomial time on Turing machines. The class P can also be characterised as the class of functions computable by polynomial-time uniform circuit families of polynomial size over the standard basis {Not, And2, Or2}. It is a major open problem whether P actually coincides with some of the above classes. This represents a formalisation of the question whether efficient computation can be parallelised. As we will see later, recognition of all regular languages can be efficiently parallelised.

3. Syntactic monoid

A concept that turns out to be critical for understanding the circuit complexity of regular languages is their syntactic monoid. The syntactic monoid can be defined purely combinatorially, but its algebraic properties predetermine the complexity of the language. An in-depth discussion of syntactic monoids is presented in Chapter 1; here we will recall only key facts that are relevant to this chapter. A monoid M is a set together with a binary operation · that is associative and for which we have an identity element 1_M ∈ M satisfying a · 1_M = 1_M · a = a, for any a ∈ M. An example of a monoid is the set A*, for an alphabet A, together with the operation of concatenation of words. Such a monoid is called a free monoid. Except for free monoids we will consider only finite monoids in this chapter, and we will usually subscript their identity element with the monoid to distinguish it from the binary 0 and 1.


For monoids M, N, a function φ: N → M is a morphism if φ(uv) = φ(u)φ(v) for all u, v ∈ N.
The syntactic monoid of a regular language has several equivalent definitions. We define the syntactic monoid by looking at automata recognising the language. For a deterministic automaton A = (Q, A, I, E, F) over an alphabet A we let f_a: Q → Q be the mapping of its states induced by reading a symbol a ∈ A, that is, f_a(p) = q if (p, a, q) ∈ E. Inductively one can extend the definition to all words in A*: f_ε(p) = p and f_{wa}(p) = f_a(f_w(p)), for all p ∈ Q, w ∈ A* and a ∈ A. Thus, f_{wa} = f_a ∘ f_w. The transformation monoid of the automaton A is the set M(A) = {f_w | w ∈ A*} together with the operation of function composition. Thus the transformation monoid contains all mappings of states of A induced by some finite word over its alphabet. For a regular language L, its syntactic monoid Synt(L) is the transformation monoid of the minimal automaton that recognises L. The syntactic morphism of L is the morphism φ_L: A* → Synt(L).
One can also define the syntactic monoid by looking at which languages are recognised by a monoid. We say that L ⊆ A* can be recognised by M if there exist a morphism φ: A* → M and a subset F ⊆ M such that L = φ^{-1}(F). For every language L there is a minimal monoid Synt(L) that recognises L, which is the syntactic monoid of L, and the associated morphism φ_L: A* → Synt(L) is the syntactic morphism of L. The syntactic monoid of L is unique up to isomorphism. It can be shown that the syntactic monoid of L divides the transformation monoid of any finite automaton A recognising L; that is, Synt(L) is the image under a morphism of some submonoid of M(A). (See Chapter 1 for more details.)
Basic classification of monoids. Recall that a group G is a monoid in which every element a ∈ G has an inverse element a^{-1} ∈ G satisfying a · a^{-1} = 1_G = a^{-1} · a. The commutator subgroup of G is the subgroup generated by the elements a · b · a^{-1} · b^{-1}, where a, b ∈ G. A group is solvable if repeatedly taking the commutator subgroup eventually gives the trivial group {1_G}; it is non-solvable if it is not solvable. A non-solvable group necessarily contains a group whose commutator subgroup is the group itself. We say that a monoid is aperiodic if no subset of it forms a non-trivial group, that it is solvable if each of its subsets that forms a group forms a solvable group, and that it is non-solvable if it contains a non-solvable group. For a prime p, a finite group is a p-group if its size is a power of p.
Word problems over monoids. Given a finite monoid M one can view it as an alphabet and consider languages over M. For a ∈ M we define the a-word problem to be the set of words a_1 a_2 ⋯ a_n over M that multiply out to a, that is, a = a_1 · a_2 ⋯ a_n. The empty word multiplies out to the identity element 1_M. Clearly, any a-word problem is a regular language, as one can easily calculate the product during one pass over the word using a fixed amount of memory. The product over M is the function Π: M* → M defined by Π(a_1 a_2 ⋯ a_n) = a_1 · a_2 ⋯ a_n. The prefix-product over M is the function Π_p: M* → M* defined by Π_p(a_1 a_2 ⋯ a_n) = p_1 p_2 ⋯ p_n, where p_i = a_1 · a_2 ⋯ a_i, for i = 1, ..., n. Thus, the prefix-product gives all the partial products of initial segments of a word.
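The transformation monoid is obtained by closing the generator maps f_a under composition. The following is a minimal sketch; the interface and the toy two-state automaton are illustrative assumptions, not taken from the chapter.

```python
def transformation_monoid(states, alphabet, delta):
    """Close the transition maps f_a under composition.  Returns the set of
    state maps induced by all words, each map represented as a tuple indexed
    by state number.  `delta[(q, a)]` is the successor state."""
    states = list(states)
    idx = {q: i for i, q in enumerate(states)}
    gens = [tuple(idx[delta[(q, a)]] for q in states) for a in alphabet]
    identity = tuple(range(len(states)))        # map induced by the empty word
    monoid, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for f in gens:                          # word g followed by letter f, i.e. f o g
            h = tuple(f[g[i]] for i in range(len(states)))
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid

# Toy automaton over {a, b}: a sends both states to 0, b swaps the two states.
delta = {(0, 'a'): 0, (1, 'a'): 0, (0, 'b'): 1, (1, 'b'): 0}
print(len(transformation_monoid([0, 1], "ab", delta)))   # 4 state maps
```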


Similarly, one can define the suffix-product Π_s: M* → M* by Π_s(a_1 a_2 ⋯ a_n) = s_1 s_2 ⋯ s_n, where s_i = a_i · a_{i+1} ⋯ a_n, for i = 1, ..., n.
Computing the product over the syntactic monoid of a regular language L plays an important role in understanding the circuit complexity of recognising L. We will now show that efficient circuits for deciding membership in L can be turned into efficient circuits for computing the syntactic morphism φ_L, that is, a circuit mapping w ∈ A* to φ_L(w). Under certain circumstances one is even able to construct circuits for computing the product over the syntactic monoid Synt(L). In the other direction one can trivially convert a circuit computing the product over Synt(L) into a circuit computing membership in L. The following proposition captures all these conversions.
Proposition 3.1. Let B be a complete basis. Let M be a monoid.
1. If a regular language L is computable by a circuit family over B of size s(n) and depth d(n), and M ⊆ φ_L(A^k) for some k ≥ 0, then the product over M is computable by a circuit family of size s(O(n)) + O(n) and depth d(O(n)) + O(1).
2. If a regular language L is computable by a circuit family over B of size s(n) and depth d(n), then the syntactic morphism φ_L, that is, the mapping of w ∈ A* to φ_L(w), is computable by a circuit family of size s(O(n)) + O(n) and depth d(O(n)) + O(1).
3. If the product over M is computable by a circuit family over B of size s(n) and depth d(n), then any regular language with syntactic monoid M is computable by a circuit family of size s(n) + O(n) and depth d(n) + O(1).
Note that the first two parts are closely related and have almost identical proofs, but there is a subtle difference between them. The latter part essentially says that we can evaluate the product φ_L(w_1) · φ_L(w_2) ⋯ φ_L(w_n) for any w_1 ⋯ w_n ∈ A^n; however, this does not imply that we could compute the product of an arbitrary sequence of elements from Synt(L). Indeed, we will see later that there are regular languages in AC0 such that the product over their syntactic monoids is not in AC0. The reader may want to skip the proof of Proposition 3.1 on a first reading.
Proof. Consider the first claim. For every g ∈ M, fix a word u_g ∈ A^k such that φ_L(u_g) = g. We know that M ⊆ Synt(L). By the properties of syntactic monoids we know that for any g, g′ ∈ Synt(L) with g ≠ g′ there are v, w ∈ A* such that either vu_g w ∈ L and vu_{g′} w ∉ L, or vu_g w ∉ L and vu_{g′} w ∈ L. Thus, for ℓ = |M|², pick v_1, v_2, ..., v_ℓ, w_1, ..., w_ℓ ∈ A* such that for any u, u′ ∈ A*, φ_L(u) = φ_L(u′) if and only if for each i = 1, ..., ℓ, either both v_i u w_i and v_i u′ w_i are in L or neither of them is. Hence, for any g_1, g_2, ..., g_n, g ∈ M, we have g_1 · g_2 ⋯ g_n = g if and only if φ_L(u_{g_1} u_{g_2} ⋯ u_{g_n}) = φ_L(u_g), which holds if and only if for each i = 1, ..., ℓ, either both v_i u_{g_1} u_{g_2} ⋯ u_{g_n} w_i and v_i u_g w_i are in L or neither of them is. Hence, we can determine whether g_1 · g_2 ⋯ g_n = g by comparing the membership of v_i u_{g_1} u_{g_2} ⋯ u_{g_n} w_i in L


with the membership of v_i u_g w_i in L for each i = 1, ..., ℓ. There are at most |M| possibilities for the product g_1 · g_2 ⋯ g_n, so we can try them all in parallel. The actual circuit computing the product of n elements from M works as follows. For simplicity we assume that it takes its input encoded in binary. It first transforms the binary encoding of the i-th input element g_i into the binary encoding of u_{g_i} by a circuit of constant size depending only on Synt(L), for i = 1, ..., n. (This can be done using the fact that B can compute And2 and Not, and any function can be expressed using such gates.) Then for i = 1, ..., ℓ, it forms the words v_i u_{g_1} u_{g_2} ⋯ u_{g_n} w_i. (The binary encoding of the words v_i and w_i is hard-wired into the circuit.) Each of the words is fed in parallel into a copy of a circuit deciding L on inputs of length |v_i| + nk + |w_i|. From the answers of these circuits we determine the unique answer by a constant size circuit. We need n constant size circuits to process the input, then ℓ circuits of size s(O(n)) to decide membership in L, and a constant size circuit to process their outputs. Since ℓ depends only on the size of M, which is fixed, the circuit is of the appropriate size and depth.
The second claim is proven in an almost identical way. We just omit the circuits transforming g_i into u_{g_i}, as we can directly view the input word w = w_1 w_2 ⋯ w_n as a sequence of n words w_1, w_2, ..., w_n representing corresponding elements from M.
The proof of the third claim is similar. We build a circuit deciding L on words of length n as follows. Convert the binary encoding of each i-th input symbol a_i into the binary encoding of φ_L(a_i) by a constant size circuit depending only on Synt(L). Then feed these encodings into a circuit computing the product of n elements from Synt(L). The circuit outputs the binary encoding of the product of the elements. Using a constant size circuit, decide whether the product belongs to the accepting set for L.
Non-uniform automaton. We have seen that a circuit decides a language on inputs of a certain fixed length and that there might be little or no relationship among circuits deciding the language on various input lengths. On the other hand, finite automata are very uniform: they decide a language on all input lengths. Motivated by circuits, Barrington and Thérien [2] and [5] define non-uniform finite automata (NUFA). Let A be an alphabet. A non-uniform finite automaton on a monoid M for inputs of length n is a sequence of instructions (i_1, f_1), (i_2, f_2), ..., (i_m, f_m) where each instruction is from {1, ..., n} × (A → M) and specifies an input position and a mapping from the alphabet to the monoid. The sequence of instructions is called a program. A subset F ⊆ M is designated as the accepting set of the automaton. On input a_1 a_2 ⋯ a_n ∈ A* the program is interpreted by taking the instructions one by one, where the instruction (i_j, f_j) causes the automaton to output f_j(a_{i_j}). The automaton accepts if the product of the elements f_1(a_{i_1}) · f_2(a_{i_2}) ⋯ f_m(a_{i_m}) is in F. Similarly to the case of circuits, one considers infinite families of non-uniform finite automata {P_n}_{n≥1}, one for each input length. We say that the non-uniform finite automata are of polynomial size if the number of instructions in P_n is bounded by some fixed polynomial p(n). The notion of non-uniform finite automata can easily be seen to be equivalent to the notion of branching programs and binary decision diagrams [2] and [31].
However, as we shall see non-uniform automata allow for an elegant characterisation of circuit classes.
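Running a program over a monoid on a fixed-length input is straightforward. The sketch below uses an illustrative interface and a toy program over Z/2 that recognises SUM(2); none of it is taken from the chapter.

```python
def run_program(program, accept, identity, mul, word):
    """Evaluate a non-uniform finite automaton (a program over a monoid) on
    one input word.  Each instruction (i, f) reads position i (0-based) and
    contributes the monoid element f(word[i])."""
    value = identity
    for i, f in program:
        value = mul(value, f(word[i]))
    return value in accept

# Toy program over the monoid (Z/2, +): one instruction per position, so it
# accepts exactly the binary words with an even number of ones, i.e. SUM(2).
def parity_program(n):
    return [(i, lambda symbol: int(symbol)) for i in range(n)]

word = "10110"            # three ones, so the program rejects
print(run_program(parity_program(len(word)), accept={0}, identity=0,
                  mul=lambda x, y: (x + y) % 2, word=word))   # False
```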


4. Regular expressions

It is a well known fact that all regular languages can be described by regular expressions using certain elementary language operations. This fact is known as Kleene's theorem (see Chapter 1). It is quite remarkable that the regular languages corresponding to various circuit classes can also be characterised by certain types of regular expressions. We now define the regular operations that we will need for our purposes. For a word w ∈ A*, symbols a_1, a_2, ..., a_k ∈ A and languages L_0, L_1, ..., L_k ⊆ A*, we let (w / L_0 a_1 L_1 ⋯ a_k L_k) denote the number of ways w can be written as u_0 a_1 u_1 ⋯ a_k u_k, where u_i ∈ L_i for i = 0, ..., k.

Definition 4.1.
1. Boolean operations. We consider the operations
   L_1 → A* \ L_1,   L_1, L_2 → L_1 ∪ L_2,   L_1, L_2 → L_1 ∩ L_2,
where the first one gives the complement of the language and the latter two give the union and the intersection of the languages, respectively.
2. Star operation. The star operation L_1 → L_1* gives all the finite concatenations of words from L_1.
3. Concatenation operation. We consider the following variant of concatenation: L_1, L_2 → L_1 a L_2, where a ∈ A. This operation gives all the possible concatenations of words from L_1 and L_2 joined by a; notice that it is the language {w ∈ A* | (w / L_1 a L_2) > 0}. One could use the more familiar concatenation operation L_1 L_2, but L_1 a L_2 allows a crisper formulation of various results.
4. mod_p-concatenation operation. This is a less known operation, defined by Straubing [22] and denoted by L_1, L_2 → (L_1, a, L_2, r, p), where a ∈ A and 0 ≤ r < p are integers. It gives the set of words {w ∈ A* | (w / L_1 a L_2) ≡ r (mod p)}.
It is an easy exercise to verify that all the operations above preserve regularity. When building a regular expression we will use the empty language ∅ as an atomic expression, as well as the following languages, defined for integers q ≥ 2 and t ≥ 0:
   LENGTH(q) = {w ∈ A* | |w| ≡ 0 (mod q)},   THRESH(t) = {w ∈ A* | |w| ≥ t}.
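The counting quantity (w / L_1 a L_2) and the two concatenation operations can be made concrete as follows; the membership-test interface is an illustrative assumption, not the chapter's.

```python
def count_decompositions(w, in_L1, a, in_L2):
    """The quantity (w / L1 a L2): the number of ways to write w = u a v with
    u in L1 and v in L2.  `in_L1` and `in_L2` are membership tests supplied
    by the caller."""
    return sum(1 for j in range(len(w))
               if w[j] == a and in_L1(w[:j]) and in_L2(w[j + 1:]))

def in_concatenation(w, in_L1, a, in_L2):
    return count_decompositions(w, in_L1, a, in_L2) > 0           # L1 a L2

def in_modp_concatenation(w, in_L1, a, in_L2, r, p):
    return count_decompositions(w, in_L1, a, in_L2) % p == r      # (L1, a, L2, r, p)

# Decompositions of "aba" as A* a A*: positions 0 and 2 work, so the count is 2.
print(count_decompositions("aba", lambda u: True, "a", lambda v: True))   # 2
```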


A companion language to LENGTH(q) is the following language, which will be of concern to us as well:
   SUM(q) = { w ∈ {0,1}* | Σ_{i=1}^{|w|} w_i ≡ 0 (mod q) }.

A regular expression is a well-formed expression consisting of atomic languages and the above regular operations. Since A* is the complement of the empty set, we may write A* instead of A* \ ∅ in regular expressions. An example of a regular expression which describes the language consisting of the empty word ε is A* \ ⋃_{a∈A} A* a A*. An example of a class of regular languages is the class of star-free languages, which is the class of languages that can be obtained using Boolean operations and concatenation with the empty set as the atomic expression. For a class of languages L over the same alphabet A we denote by Bool(L) the class of languages where each language is either from L or can be obtained from languages in L by applying the Boolean operations complement, ∪, and ∩ a finite number of times. We state a proposition that relates the description of languages by regular expressions to their circuit complexity.

Proposition 4.1. Let B be a collection of functions from {0,1}* to {0,1}. Let s(n), d(n): N → N be two non-decreasing functions. Let {C_{1,n}}_{n≥0} be a family of circuits over B of size s(n) and depth d(n) that decides a language L_1 ⊆ A*. Similarly, let {C_{2,n}}_{n≥0} be a family of circuits over B of size s(n) and depth d(n) that decides a language L_2 ⊆ A*.

1. Assume B contains And2, Or2 and Not. If a language L equals L_1 ∪ L_2, L_1 ∩ L_2, or A* \ L_1, then L is computable by a family of circuits over B of size 2s(n) + 1 and depth d(n) + 1.
2. Assume B contains And, Or and Not. If a language L = L_1 a L_2, for some a ∈ A, then L is computable by a family of circuits over B of size O(n · s(n)) and depth d(n) + O(1).
3. Let p > 1, and assume that B contains Mod_p. If a language L = (L_1, a, L_2, r, p), for some a ∈ A and 0 ≤ r < p, then L is computable by a family of circuits over B of size O(n · s(n)) and depth d(n) + O(1).

Proof. The first claim is essentially trivial: we either apply And2 or Or2 to the outputs of C_{1,n} and C_{2,n}, or we apply Not to the output of C_{1,n}, to obtain a circuit C_n deciding L on inputs of length n.
We sketch the proof of the second claim. For clarity let us assume that circuits use generalised input gates to access their input. We build a circuit deciding L on inputs of length n as follows (see Figure 2). Let w_1, w_2, ..., w_n be the n input symbols to the circuit we want to build. For j = 0, ..., n − 1, feed in parallel w_1 ⋯ w_j into a copy of C_{1,j}. Similarly, for j = 2, ..., n + 1, feed w_j w_{j+1} ⋯ w_n into C_{2,n−j+1}. For j = 1, ..., n, test in parallel by generalised input gates whether w_j represents a. For j = 1, ..., n, feed the output of C_{1,j−1}, C_{2,n−j} and the gate testing w_j = a into an independent And gate. Feed the output of all the And gates into a


single Or gate. This gives a circuit for L. Clearly, it is of size at most 2n · s(n) + O(n) and depth d(n) + O(1). The circuit for (L_1, a, L_2, r, p) looks very much the same as the circuit for L_1 a L_2, but the top Or gate is replaced by a Mod_p gate which additionally receives p − r inputs from constant-1 gates. If the basis B does not contain And3, we can simulate it using a constant size circuit built from Mod_p gates.

Figure 2. A circuit for L_1 a L_2
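The layered structure of the construction (copies of C_{1,j} on prefixes and C_{2,n−j} on suffixes, generalised input gates testing w_j = a, an And layer, and an Or or Mod_p output gate) can be mirrored by a straightforward sequential sketch. The interface below is an illustrative assumption, with plain membership tests standing in for the sub-circuits.

```python
def decide_concatenation_layered(w, C1, C2, a, top="or", r=0, p=2):
    """Mirror of the construction behind Figure 2.  With top='or' this decides
    L1 a L2, with top='modp' it decides (L1, a, L2, r, p)."""
    n = len(w)
    prefix_ok = [C1(w[:j]) for j in range(n)]        # copies of C_{1,j}
    suffix_ok = [C2(w[j + 1:]) for j in range(n)]    # copies of C_{2,n-j-1}
    letter_ok = [w[j] == a for j in range(n)]        # generalised input gates
    and_layer = [prefix_ok[j] and letter_ok[j] and suffix_ok[j] for j in range(n)]
    if top == "or":
        return any(and_layer)                        # Or output gate
    return sum(and_layer) % p == r                   # Mod_p output gate

# A* a A* over {a, b}:
print(decide_concatenation_layered("bab", lambda u: True, lambda v: True, "a"))  # True
```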

Observe that the empty language and the languages LENGTH(q), q ≥ 1, are all recognised by constant-size families of circuits. Thus the above proposition immediately implies the following statement, which provides circuits for certain classes of regular languages. It can be proved by induction on the structure of regular expressions.
Corollary 4.2. Let L be a regular language and p > 1 an integer.
1. If L can be described by a regular expression built using Boolean operations and concatenation from the atomic languages ∅ and LENGTH(q), q ≥ 1, then L ∈ AC0.
2. If L can be described by a regular expression built using Boolean operations, concatenation and mod_p-concatenation from the atomic languages ∅ and LENGTH(q), q ≥ 1, then L ∈ AC0[p].

4.1. Neutral letter. It will be useful to consider languages with a neutral letter, as computing the product over the syntactic monoid of such a language has precisely the same circuit complexity as deciding the language itself. In particular, one can apply the first part of Proposition 3.1 to it. We say that a language L has a neutral letter e ∈ A if for every word w ∈ A*, w ∈ L if and only if the word obtained from w by removing all occurrences of e is in L. For example, the language SUM(q), q ≥ 1, has a neutral


letter 0 whereas the language LENGTH .q/, q > 1, does not have a neutral letter. We can state the following proposition. Proposition 4.3. If a regular language L  A has a neutral letter then there exists k > 0 such that the image of Ak under the syntactic morphism L .Ak / equals the syntactic monoid Synt.L/. The proposition is immediate as every element of Synt.L/ is the image of some word w from A under L . For each element of Synt.L/ fix one such a word. Inserting the neutral letter into any word does not change the image of the word under L so we can pad all of the chosen words to the same length by the neutral letter. The proposition follows. We can extend any regular language by a neutral letter. For a language L  A and e 62 A we define the extension Le of L by the neutral letter e to be the set of all the words w 2 .A [ ¹eº/ such that if we remove all occurrences of e from w we get a word from L. Here are two simple examples. For any alphabet A D ¹a1 ; a2 ; : : : ; ak º and e 62 A, ;e D ;

and for any integer q > 1

LENGTH.q/e D

[

06r1 ;r2 ;:::;rk

r1 CCrk 0 .mod p/

k \

..A [ ¹eº/ ; ai ; .A [ ¹eº/ ; ri ; q/:

i D1
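As a quick sanity check of this description (a small sketch, not from the chapter), membership in L_e amounts to deleting the occurrences of e and testing L, and for L = LENGTH(q), i.e. words whose length is divisible by q, this is the same as requiring the letter counts over A to sum to 0 modulo q.

```python
from itertools import product

def in_extension(w, e, in_L):
    # Delete every occurrence of the neutral letter e, then test the original L.
    return in_L(w.replace(e, ""))

q, A, e = 3, "ab", "e"
in_length_q = lambda u: len(u) % q == 0             # LENGTH(q) over A
lhs = lambda w: in_extension(w, e, in_length_q)     # membership in LENGTH(q)_e
rhs = lambda w: sum(w.count(x) for x in A) % q == 0 # per-letter counts mod q

assert all(lhs("".join(w)) == rhs("".join(w))
           for n in range(7) for w in product(A + e, repeat=n))
```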

Note that any regular expression over an alphabet A ⊆ A′ can be interpreted over the larger alphabet A′; the only care that has to be taken is to replace the operation A* ∖ L_1 by A′* ∖ L_1. The next proposition shows that such an interpretation extends the language by neutral letters.

Proposition 4.4. Let A be an alphabet, e ∉ A, and let L be a regular language described by a regular expression F. If the only atomic language used in F is the empty language ∅, then F describes L_e when interpreted over the alphabet A ∪ {e}.

Proof. The proof is by induction on the structure of the regular expression. If the expression F is the atomic language ∅ then the claim is clearly true. Otherwise F is obtained from smaller regular expressions by one of the regular operations. If F = A* ∖ F_1 then F interpreted over A ∪ {e} equals (A ∪ {e})* ∖ F_1. Let L_1 be the language described by F_1 when interpreted over A. By the induction assumption, F_1 interpreted over A ∪ {e} describes (L_1)_e. That is, for any word w ∈ (A ∪ {e})*, w ∈ (L_1)_e if and only if the word obtained from w by removing all occurrences of e belongs to L_1. F describes the complement of L_1 with respect to A*. When F_1 is interpreted over A ∪ {e}, (A ∪ {e})* ∖ F_1 describes the complement of (L_1)_e, which is clearly the extension of the complement of L_1 over A* by the neutral letter e.

The case when F is the union or intersection of two other regular expressions follows trivially from the induction assumption on the sub-expressions. Let us consider the case when F is (F_1, a, F_2) for some a ∈ A and two regular expressions F_1 and F_2. Let L_1 and L_2 be the languages described by F_1 and F_2, respectively, when interpreted over A. By the induction hypothesis, F_1 interpreted over A ∪ {e} describes (L_1)_e, and similarly for F_2. Let L be the language described by F interpreted over A. Consider any word w ∈ (A ∪ {e})*. We have w ∈ L_e if and only if the word w′ obtained from w by removing all occurrences of e belongs to L, which happens if and only if w′ = u′av′ for some words u′ ∈ L_1 and v′ ∈ L_2. Clearly, w = uav for some words u, v ∈ (A ∪ {e})*, where u gives u′ after removing all occurrences of e and similarly v gives v′. Thus, w ∈ L_e if and only if w = uav for some u ∈ (L_1)_e and v ∈ (L_2)_e. The claim follows. The case of mod_p-concatenation is similar.

5. Circuit complexity of regular languages

In this section we dive into the actual circuit complexity of regular languages. The computation of finite automata appears inherently sequential, so it may come as a surprise that one can efficiently parallelise the process. The following theorem in essence asserts that any regular language has a parallel recognition algorithm running in logarithmic time.

Theorem 5.1. If L is a regular language then L ∈ NC1.

The claim is less mysterious once one realises that the evaluation of a product of n elements over the syntactic monoid of L can be parallelised by taking advantage of the associativity of the monoid operation. Hence, the above theorem is a consequence of the next lemma and Proposition 3.1.

Lemma 5.2. Let M be a finite monoid and a ∈ M. Then the a-word problem over M is in NC1.

Proof. We provide a brief sketch of the construction for readers not accustomed to working with circuits. The main idea is to build a fixed size circuit that takes two elements from M and multiplies them. Forming a binary tree of depth ⌈log₂ n⌉ from n−1 copies of the circuit gives a circuit that computes the product of n elements from M. Using a fixed size circuit we can then test whether the product is a or not. We will consider circuits that take inputs over the alphabet {0,1} (see § 2), so we have to fix some unique encoding of the elements of M by strings from {0,1}^k, for say k = |M|. One can easily construct a fixed size circuit MULT(x_1 x_2 ⋯ x_k y_1 y_2 ⋯ y_k) that takes the representations of two elements from M and outputs the representation of their product. Such a circuit can be constructed using at most 8k^4 And2, Or2 and Not gates from k DNF formulas computing each bit of the product representation. Using n−1 copies of the MULT(x_1 x_2 ⋯ x_k y_1 y_2 ⋯ y_k) circuit organised into a binary tree of depth ⌈log₂ n⌉, one constructs a circuit that computes the binary representation of the product of n elements from M represented in binary.


On top of that circuit one adds a circuit that evaluates to 1 if it gets as its input the representation of a. Such a circuit can be constructed using 2k additional And2 and Not gates. Overall we obtain a circuit of depth at most 8k^4(1 + log₂ n) + 2k deciding the a-word problem on inputs of length n. The depth of this circuit grows logarithmically with the input length. Thus, these circuits form an NC1 circuit family computing the a-word problem over M.

We have defined many more circuit classes contained in NC1. It is a natural question whether Theorem 5.1 can be strengthened to show that all regular languages are in, say, AC0 or in ACC0. As we will see further on, we know the answers to some of these questions but not to all of them. The only obstacle to our full understanding of these questions is the lack of our understanding of the precise relationships among the different circuit classes.

The celebrated result of Barrington, extended by Barrington et al. [2] and [3], implies that regular languages whose syntactic monoids contain a non-solvable group are complete for the class NC1 under a restricted form of reductions called projections. A polynomial size projection of L to L′ is a simple reduction that takes a word w from a language L and maps it to a word w′ from a language L′ so that each symbol of w′ depends on at most one symbol of w, and the length of w′ depends only on the length of w and is polynomial in the length of w. Words outside of L are mapped to words outside of L′. For example, programs of non-uniform finite automata define projections.

Theorem 5.3 (Barrington et al. [3]). If L′ is a regular language with a non-solvable syntactic monoid then any language L ∈ NC1 is reducible by a polynomial size projection to L′.

An example of a non-solvable group is the permutation group S5 on five elements. Hence, any word problem over S5 is hard for NC1 under polynomial size reductions. An immediate consequence of the above theorem is that if L′ were, say, in ACC0 then NC1 ⊆ ACC0. Hence, understanding the circuit complexity of regular languages sheds light on the circuit complexity of arbitrary languages. The above theorem is a consequence of Barrington's Theorem.

Theorem 5.4 (Barrington's Theorem [2]). Let G be a non-solvable group, and let L ⊆ {0,1}* be an arbitrary language in NC1. Then L can be recognised by non-uniform finite automata on G of polynomial size.

Proof. This proof is based on the original proof of Barrington [2]. Let k = |G|. Let G′ ⊆ G be a subgroup of G such that each element of G′ can be represented as a product of commutators of G′: that is, as a product of elements a·b·a⁻¹·b⁻¹, for a, b ∈ G′. If G is non-solvable such a subgroup G′ must exist. By the Pigeonhole Principle, each element of G′ can be represented as a product of at most |G′| ≤ k commutators of G′. We will construct non-uniform finite automata over G′.

Consider any circuit of depth d ≥ 1 consisting of And2, Or2, and Not gates. Using deMorgan's rules we can replace the Or2 gates by And2 and Not gates to obtain an equivalent circuit of depth at most 2d + 1 that uses only And2 and Not gates. Hence, we need to prove the theorem only for such circuits. For a ∈ G′ we say that a non-uniform finite automaton on G′ a-represents a circuit C if on inputs on which C evaluates to 1 the output of the automaton multiplies out to a, and on inputs on which C evaluates to 0 the output multiplies out to 1_{G′}. The following claim implies the theorem.

Claim. Let d ≥ 0 be an integer, and let C be a depth-d circuit on n inputs that uses only And2 and Not gates. Let a ∈ G′. Then there exists a non-uniform finite automaton on G′ of length (4k)^d that a-represents C.

We prove the claim by induction on d. If d = 0 then the circuit output gate is either a constant 0 or 1 gate or it is an input gate x_i for some i ∈ {1, ..., n}. In the former case either (1, {0 → 1_{G′}, 1 → 1_{G′}}) or (1, {0 → a, 1 → a}) a-represents C. In the latter case (i, {0 → 1_{G′}, 1 → a}) a-represents C.

If d ≥ 1 then the output gate of C is either an And2 gate or a Not gate. First assume that it is a Not gate. Then the input to this Not gate is a circuit C_1 of depth at most d−1. By the induction hypothesis the circuit C_1 can be a⁻¹-represented by a non-uniform finite automaton of length at most (4k)^{d−1}. Appending the additional instruction (1, {0 → a, 1 → a}) to its program turns it into a non-uniform finite automaton that a-represents C.

Consider now the case where the output gate of C is an And2 gate taking as its input the values of two circuits C_1 and C_2 of depth at most d−1. Let a = a_1·b_1·a_1⁻¹·b_1⁻¹ · a_2·b_2·a_2⁻¹·b_2⁻¹ ⋯ a_k·b_k·a_k⁻¹·b_k⁻¹, for some a_1, ..., b_1, ... ∈ G′. By the induction hypothesis, for each i = 1, ..., k, C_1 can be a_i-represented as well as a_i⁻¹-represented by a non-uniform finite automaton of length at most (4k)^{d−1}. Similarly, C_2 can be b_i-represented and b_i⁻¹-represented. From these representations of C_1 and C_2 we build an a-representation of C by concatenating all the programs together in the order in which a_1, b_1, ... represent a, that is, the a_1-representation of C_1 is followed by the b_1-representation of C_2, which is followed by the a_1⁻¹-representation of C_1, etc. The total length of this non-uniform finite automaton will be at most (4k)^d.

We claim that this program indeed a-represents C. Consider some input on which C_1 evaluates to 0. On this input all the representations of C_1 evaluate to 1_{G′}, so the output of the total program evaluates to 1_{G′}, since 1_{G′}·b_i·1_{G′}·b_i⁻¹ = 1_{G′} for each i = 1, ..., k. Similarly on inputs on which C_2 evaluates to 0. On inputs where both C_1 and C_2 evaluate to 1, each representation of C_1 and C_2 evaluates to the particular element in the representation of a as the product of commutators, so the total program evaluates to a. This proves the theorem.
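To make the commutator construction concrete, here is a small Python sketch (not from the chapter, and simplified): it works over the alternating group A5 ⊆ S5 and exploits the fact that a single commutator suffices for every element of A5 (found here by brute force), rather than the product of at most k commutators used in the proof. The program encoding and the circuit format are ad hoc.

```python
from itertools import permutations

ID = (0, 1, 2, 3, 4)

def mul(p, q):                       # group multiplication: apply q first, then p
    return tuple(p[q[i]] for i in range(5))

def inv(p):
    r = [0] * 5
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

def sign(p):                         # permutation parity
    s = 1
    for i in range(5):
        for j in range(i + 1, 5):
            if p[i] > p[j]:
                s = -s
    return s

A5 = [p for p in permutations(range(5)) if sign(p) == 1]

def as_commutator(a):
    """Brute-force x, y in A5 with a = x*y*x^-1*y^-1."""
    for x in A5:
        for y in A5:
            if mul(mul(mul(x, y), inv(x)), inv(y)) == a:
                return x, y
    raise ValueError("not a commutator")

def represent(circuit, a):
    """Program that a-represents `circuit`; gates: ('var', i), ('not', C), ('and', C1, C2)."""
    if a == ID:
        return [(None, ID, ID)]
    op = circuit[0]
    if op == 'var':                  # depth 0: read one input bit
        return [(circuit[1], ID, a)]
    if op == 'not':                  # a^-1-represent the argument, then append a constant a
        return represent(circuit[1], inv(a)) + [(None, a, a)]
    if op == 'and':                  # the commutator trick from the proof
        x, y = as_commutator(a)
        return (represent(circuit[1], x) + represent(circuit[2], y)
                + represent(circuit[1], inv(x)) + represent(circuit[2], inv(y)))
    raise ValueError(op)

def run(program, bits):
    g = ID
    for i, g0, g1 in program:        # each instruction reads at most one input bit
        g = mul(g, g0 if (i is None or bits[i] == 0) else g1)
    return g

a = (1, 2, 3, 4, 0)                                  # a 5-cycle in A5
C = ('and', ('var', 0), ('not', ('var', 1)))         # the circuit x0 AND (NOT x1)
P = represent(C, a)
for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert run(P, bits) == (a if bits == (1, 0) else ID)
```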

Theorem 5.3 follows from Barrington's Theorem using Proposition 3.1 and the nice fact, observed by Barrington et al. [3], that if a regular language L ⊆ A* has a non-solvable syntactic monoid then for some k > 0 the image of A^k under the syntactic morphism of L contains a non-solvable group. The following theorem of Barrington and Thérien [5] gives a characterisation of NC1 in terms of non-uniform finite automata on monoids. It can be proven using techniques that we have already seen.

Theorem 5.5 ([5]). Let L ⊆ {0,1}* be arbitrary and let M be a non-solvable monoid. L is in NC1 if and only if L can be recognised by non-uniform finite automata on M of polynomial size.

5.1. Regular languages in AC0. In this and in the subsequent section we take a closer look at the regular languages that are computable by circuit subclasses of NC1. In this section we consider the regular languages in AC0, one of the smallest subclasses of NC1. Recall that AC0 is the class of languages recognised by families of polynomial-size constant-depth circuits consisting of unbounded fan-in And and Or gates and unary Not gates.

Possibly the first result concerning regular languages in AC0 was given by Chandra, Fortune, and Lipton [8]. They show that star-free languages are in AC0. There are several equivalent characterisations of star-free languages. A language is star-free if it can be described by a regular expression consisting of Boolean operations, concatenation and the atomic language ∅. Star-free languages are also the languages with aperiodic syntactic monoids, i.e. the monoids that contain no non-trivial group. They are also the non-counting languages, the languages L that satisfy: there is an integer n > 0 so that for all words x, y, z and any integer m > n, x y^m z ∈ L if and only if x y^{m+1} z ∈ L. The proof of Chandra et al. that star-free languages are computable by AC0 circuits uses the characterisation of counter-free regular languages in terms of the flip-flop automata of McNaughton and Papert [16].

Using the result of Furst, Saxe and Sipser [10] that for any q > 1, SUM(q) is not in AC0, Chandra et al. also show that no word problem over a non-trivial group can be computed in AC0, as such a word problem could be used to decide SUM(q) for some q > 1. This almost settles the case of regular languages in AC0, except for the fact that languages like LENGTH(q), q > 1, are in AC0. The language LENGTH(q), q > 1, is not star-free, as its syntactic monoid contains a group. Thus, the precise characterisation of regular languages in AC0 was given only by Barrington et al. [3].

Theorem 5.6 ([3]). Let L ⊆ A* be a regular language. The following are equivalent:
1. L is in AC0;
2. for every k > 0, the image of A^k under the syntactic morphism of L does not contain a non-trivial group;
3. L can be described by a regular expression built using Boolean operations and concatenation from the atomic languages ∅ and LENGTH(q), q > 1.

If a language L has a neutral letter then, for some k > 1, the image of A^k under the syntactic morphism of L is equal to Synt(L). Hence, the presence of a neutral letter simplifies the above characterisation, as shown in the next theorem. The languages SUM(q) and LENGTH(q) provide an interesting example, as they have isomorphic syntactic monoids. However, SUM(q) has a neutral letter, so it is not in AC0, whereas LENGTH(q) does not have any neutral letter and it is in AC0.


Theorem 5.7. Let L ⊆ A* be a regular language with a neutral letter. The following are equivalent:
1. L is in AC0;
2. the syntactic monoid Synt(L) does not contain a non-trivial group;
3. L can be described by a regular expression built using Boolean operations and concatenation from the atomic language ∅; that is, L is star-free.
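Condition 2 can be checked mechanically. The following Python sketch, not from the chapter, computes the transition monoid of a given DFA (which equals Synt(L) when the DFA is minimal) and tests aperiodicity, i.e. the absence of non-trivial groups; the example transition table is made up for illustration.

```python
def transition_monoid(delta, states, alphabet):
    # Each element is a tuple m with m[q] = image of state q.
    gens = {a: tuple(delta[q][a] for q in states) for a in alphabet}
    monoid = set()
    frontier = {tuple(states)} | set(gens.values())   # identity and the generators
    while frontier:
        monoid |= frontier
        frontier = {tuple(g[m[q]] for q in states)    # right-multiply by a generator
                    for m in frontier for g in gens.values()} - monoid
    return monoid

def is_aperiodic(monoid):
    n = len(monoid)
    def power(m, k):
        r = tuple(range(len(m)))
        for _ in range(k):
            r = tuple(m[r[i]] for i in range(len(m)))
        return r
    # Aperiodic iff m^n = m^(n+1) for every element m.
    return all(power(m, n) == power(m, n + 1) for m in monoid)

# (aa)* over {a}: its transition monoid is the group Z_2, hence not aperiodic.
states = [0, 1]
delta = {0: {'a': 1}, 1: {'a': 0}}
print(is_aperiodic(transition_monoid(delta, states, 'a')))   # False
```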

5.2. AC0 and the dot-depth hierarchy. In the previous section we have seen the class of regular languages that are in AC0. In this section we will refine the classification of languages in AC0. We will show a connection between the dot-depth hierarchy of star-free languages and the depth of AC0 circuits. The dot-depth hierarchy was originally defined by Brzozowski [6]; later Straubing [23] and [24] and Thérien [29] gave a slightly different definition. The two variants of the hierarchy coincide except for the first two levels.

In order to make the connection as precise as possible we need a particular definition of depth-k AC0, given by Barrington and Thérien [5]. We will consider circuits that use generalised input gates of the form [x_i ∈ D] which test whether the i-th input symbol belongs to some D ⊆ A. For k ≥ 1, a Σ_k circuit is a circuit consisting of k alternating layers of unbounded fan-in And and Or gates with the top output gate being an Or gate; the bottom layer is formed by the generalised input gates. Π_k is defined analogously but the top gate is an And gate. It follows from deMorgan's rules that Π_k circuits recognise the complements of languages recognised by Σ_k circuits. Finally, a language L ⊆ A* is in depth-k AC0 if it is computed by a family of circuits of polynomial size and constant depth where each circuit in the family consists of a constant size circuit built from And2 and Or2 gates whose inputs are provided by Σ_k and Π_k circuits of polynomial size. Hence, except for the few top layers, the main portion of each circuit consists of k alternating layers of unbounded fan-in And and Or gates.

It is well known that there is a strict depth hierarchy of languages in AC0. Sipser [20] and [11] proved that for any k > 1, there is a language in AC0 computable by depth-k circuits of polynomial size consisting of unbounded fan-in And, Or, and Not gates that cannot be computed by depth-(k−1) circuits of polynomial size consisting of unbounded fan-in And, Or and Not gates. In a similar fashion, regular languages in AC0 form a hierarchy, and there are regular languages in depth-k AC0 that are not in depth-(k−1) AC0. These are languages from the k-th level of the dot-depth hierarchy. Barrington and Thérien [5] use the Straubing–Thérien definition of dot-depth:

  D_0 = Bool(A*),
  D_1 = Bool({A* a A* | a ∈ A}),
  D_k = Bool({L_0 a_1 L_1 ⋯ a_r L_r | r ≥ 1, a_1, ..., a_r ∈ A, L_0, ..., L_r ∈ D_{k−1}}),

where k ≥ 2. One can easily verify that D_{k−1} ⊆ D_k. Clearly, the union ⋃_{k≥0} D_k equals the star-free languages. The classes D_k parameterise star-free languages by the number of alternations between concatenation and Boolean operations. We say that a language has dot-depth k if it is in D_k. It was shown by Brzozowski and Knast [7] that the dot-depth hierarchy is proper. An example of a language that lies in the difference D_k ∖ D_{k−1} is given by the automaton in Figure 3.

(Diagram omitted.)
Figure 3. An automaton for a language in D_k ∖ D_{k−1}

Barrington and Thérien prove the following theorem.

Theorem 5.8 ([5]). For any k ≥ 1 and any regular language L ∈ D_k, L is in depth-k AC0.

Proof. We provide a sketch of the proof, which goes by induction on k. A language of the form A* a A* can be recognised on inputs of length n by a circuit consisting of a single Or gate and n generalised input gates that test whether the i-th input symbol is equal to a. The complement of that language can be recognised by an analogous circuit consisting of a single And gate and generalised input gates. All languages in D_1 can be expressed as a fixed Boolean combination of such languages, so they are in depth-1 AC0.

A language of the form L_0 a_1 L_1 ⋯ a_r L_r, where L_0, ..., L_r are in depth-(k−1) AC0, can be recognised by a circuit consisting of a single Or gate at the top that tests, for all possible partitions of the input into 2r + 1 subwords, whether they are from the particular languages or are the required letters. For each partition the tests are carried out by circuits corresponding to depth-(k−1) AC0 computations. The resulting circuit is not in the prescribed form but can be converted into such: each constant size circuit between the top Or and the Σ_{k−1} and Π_{k−1} circuits can be replaced by an equivalent DNF formula. Each constant size conjunction of Σ_{k−1} and Π_{k−1} circuits can be transformed into a Σ_k circuit using deMorgan's rules. The Σ_k circuit will be of size at most quadratic in the total size of the original Σ_{k−1} and Π_{k−1} circuits. All these transformations result in an overall circuit whose top three layers are formed by Or gates. These Or gates can be replaced by a single Or gate to give a Σ_k circuit for L_0 a_1 L_1 ⋯ a_r L_r. The size of the circuit will be polynomial in the input size. It follows that the complement of L_0 a_1 L_1 ⋯ a_r L_r can be recognised by a polynomial size Π_k circuit. All other languages in D_k are Boolean combinations of languages of the form L_0 a_1 L_1 ⋯ a_r L_r, hence they are in depth-k AC0.


It turns out that the preceding theorem almost characterises depth-k AC0 languages. We say that a monoid has dot-depth k if all regular languages recognised by it have dot-depth k. (This definition differs slightly from the standard semigroup one [7] and [9].) Barrington and Thérien prove that word problems over monoids of dot-depth k are complete for depth-k AC0 under projections. Namely, they establish the following theorem.

Theorem 5.9 ([5]). Let k ≥ 1. If a language L is in depth-k AC0 then L is recognised by polynomial size non-uniform finite automata on some dot-depth k monoid.

This theorem is indeed an equivalence. By Sipser's hierarchy result, it implies that there are regular languages that are of dot-depth k, and hence in depth-k AC0, but are not in depth-(k−1) AC0.

5.3. Regular languages in AC0[q] for prime power q. Furst et al. [10] show that SUM(q) is not in AC0 for any q > 1. It is a trivial fact that SUM(q) ∈ AC0[q], the class of languages recognised by families of polynomial-size constant-depth circuits consisting of unbounded fan-in And, Or, Mod_q gates and unary Not gates. Hence, AC0 ⊊ AC0[q]. One may wonder what other languages are in AC0[q] besides the languages SUM(q) and the languages from AC0. Could it be that all regular languages are in there? We know the answer to this question in the case when q is a prime power. Key to the answer are the results of Razborov and Smolensky [17] and [21] showing that if q is a power of a prime p and m > 1 has a prime factor different from p then SUM(m) ∉ AC0[q]. So in particular AC0[q] ⊊ NC1. We do not have any similar result for any q that is divisible by two distinct primes, so as far as we know it could be that NC1 ⊆ AC0[6].

It is well known that for q, k > 1, one can simulate a Mod_{q^k} gate by a circuit consisting of Mod_q and And2 gates, and vice versa (see for example [26]). Hence, AC0[q] = AC0[q^k] for q, k > 1. We can rephrase Theorem 5 of Barrington et al. [3] using our binary mod_p-concatenation.

Theorem 5.10 ([3]). Let L ⊆ A* be a regular language and let p be a prime. The following are equivalent:
1. L is in AC0[p];
2. L is in AC0[p^k], for some k > 1;
3. for every k > 0, every group contained in the image of A^k under the syntactic morphism of L is a p-group;
4. L can be described by a regular expression built using Boolean operations, concatenation and mod_p-concatenation from the atomic languages ∅ and LENGTH(q), q > 1.

Again, the characterisation simplifies in the case of languages with a neutral letter.

Theorem 5.11. Let L ⊆ A* be a regular language with a neutral letter, and let p be a prime. The following are equivalent:
1. L is in AC0[p];
2. L is in AC0[p^k], for some k > 1;
3. every group contained in the syntactic monoid Synt(L) is a p-group;
4. L can be described by a regular expression built using Boolean operations, concatenation and mod_p-concatenation from the atomic language ∅.

5.4. Regular languages in ACC0. The class ACC0 is the union of the classes AC0[q], q > 1. We have seen a precise characterisation of regular languages in AC0[q] for the case when q is a prime power. For AC0[q] where q is composite we do not know a precise characterisation, as such a class could possibly contain all regular languages. A clue to understanding regular languages in ACC0 is the following theorem, which can be extracted from results of Thérien [29] that are stated in terms of congruences. An almost identical result was independently proven by Straubing [22], who instead of the binary mod_p-concatenation L_1, L_2 → (L_1, a, L_2, r, p) used the unary L_1 → (L_1, a, A*, r, p).

Theorem 5.12 ([3]). Let L ⊆ A* be a regular language. The following are equivalent:
1. every group contained in the syntactic monoid Synt(L) is solvable;
2. L can be described by a regular expression built using Boolean operations, concatenation and mod_q-concatenation, q > 1, from the atomic language ∅.

In § 4 we have seen how to convert regular expressions into circuits, so an immediate consequence of the above theorem is that regular languages with solvable syntactic monoids are in ACC0. By Theorem 5.3 we know that regular languages with non-solvable syntactic monoids are complete for NC1. Thus either ACC0 contains all regular languages, and hence ACC0 = NC1, or ACC0 contains only languages with solvable syntactic monoids. This fact can be stated as the next theorem.

Theorem 5.13. Let L ⊆ A* be a regular language. If ACC0 ≠ NC1 then the following are equivalent:
1. L is in ACC0;
2. every group contained in the syntactic monoid Synt(L) is solvable;
3. L can be described by a regular expression built using Boolean operations, concatenation and mod_q-concatenation, q > 1, from the atomic language ∅.

Note that in the case of languages in ACC0 the presence of a neutral letter does not change the characterisation. This is because if Synt(L) contains a non-solvable group then for some k > 1, the image of A^k under the syntactic morphism of L contains a non-solvable group. This is a key observation of Barrington and Thérien [5] that allowed them to extend Barrington's Theorem to any regular language with a non-solvable syntactic monoid. Barrington and Thérien [5] provide the following characterisation of ACC0 in terms of non-uniform finite automata on monoids.


Theorem 5.14 ([5]). Let L ⊆ {0,1}* be arbitrary. L is in ACC0 if and only if L can be recognised by non-uniform finite automata of polynomial size on some solvable monoid M.

6. Circuit size of regular languages

We have seen that there are large classes of regular languages computable by various circuit types. Can we say something more about the circuits computing regular languages? For example, can we say that those circuits are small or that they have small depth? In § 5.2 we have already seen that the dot-depth hierarchy of star-free languages corresponds to the depth hierarchy of AC0 circuits. Thus we cannot say that all regular languages in AC0 could be computed by circuits of depth, say, 10. Contrary to that, in this section we will show that regular languages from various circuit classes are computable by almost linear size circuits of the appropriate type.

Similarly to the space and time hierarchies of Turing machines, one can establish a size hierarchy for various circuit classes. A simple counting argument shows that if a class of functions B from {0,1}* to {0,1} contains arbitrary fan-in And and Not then for any k > 1, there is a language computable by circuits over B using O(n^{2k}) wires but not computable by circuits over B consisting of O(n^k) wires (see [14]). Hence, there is a strict (wire) size hierarchy of circuits over B. This applies to the circuit classes AC0, ACC0 and TC0. A similar counting argument implies a size hierarchy of languages in NC1. Hence, one may want to pin-point the precise location of regular languages in their associated circuit classes.

It is immediate from the proof of Theorem 5.1 that regular languages are computable by linear size NC1 circuits: that is, by circuits of linear size and logarithmic depth consisting of And2, Or2 and Not gates. Hence, regular languages lie very low in the size hierarchy of NC1 languages. A similar situation occurs with regular languages in the other circuit classes that we consider. The following statement was proven in [13].

Theorem 6.1 ([13]). Let B be a complete basis. Let s(n), d(n): N → N be two non-decreasing functions bounded by a polynomial from above. Let L be a regular language. If the product over Synt(L) is computable by a circuit family over B of size s(n) and depth d(n) then for every ε > 0, there is a family of circuits over B of size O(n^{1+ε}) and depth O(d(n)) computing L.

Proof. It suffices to show that for any fixed k > 1, if the product over Synt(L) is computable by circuits of size O(n^k) and depth O(d(n)) then it is computable also by circuits of size O(n^{(k+1)/2}) and depth O(d(n)). The theorem follows by repeated use of this claim a constant number of times. We prove the claim. The product of n elements from Synt(L) can be computed by first breaking the sequence of n elements into ⌈√n⌉ blocks, each consisting of at most ⌈√n⌉ consecutive elements, computing the product of each block of elements, and then computing the total product of the block products. The products of all the blocks can be computed in parallel by ⌈√n⌉ copies of a circuit for computing the product of ⌈√n⌉ elements, where each of the copies is of size O((⌈√n⌉)^k) = O(n^{k/2}). The total product can be computed by another copy of the circuit, of size O(n^{k/2}). Hence, in total we have a circuit of size O(n^{(k+1)/2}). Notice that the depth of the new circuit is at most twice the original depth. The claim follows.
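A sequential Python sketch (not from the chapter, and an illustration rather than the circuit itself) of the block decomposition used in this proof: the product of n monoid elements is obtained by multiplying within roughly √n blocks of roughly √n elements each and then multiplying the block products.

```python
from math import isqrt

def block_product(xs, mul, unit):
    n = len(xs)
    b = isqrt(n - 1) + 1 if n else 1                 # block length ~ sqrt(n)
    blocks = [xs[i:i + b] for i in range(0, n, b)]
    partials = []
    for blk in blocks:                               # in the circuit: in parallel
        p = unit
        for x in blk:
            p = mul(p, x)
        partials.append(p)
    total = unit
    for p in partials:                               # one more product circuit
        total = mul(total, p)
    return total

# Example: products in the syntactic monoid of LENGTH(3), i.e. Z_3.
xs = [1, 2, 2, 1, 0, 2, 1]
assert block_product(xs, lambda a, b: (a + b) % 3, 0) == sum(xs) % 3
```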

The previous theorem provides a generic upper bound for a large class of languages: for example, all regular languages in AC0 or ACC0 containing a neutral letter are computable by their respective circuits of size O(n^{1+ε}) and constant depth. This is a significant observation, as it indicates that regular languages lie low in their respective circuit classes. It turns out that for regular languages in AC0 and ACC0 we can provide a significantly better upper bound on their circuit size. To present these upper bounds we need to define a sequence of slowly growing functions λ_0, λ_1, .... For a function g: N → N, we define g^(i)(n) = g(g(⋯(g(n))⋯)) to be the i-th iteration of g. If g(n) < n for all n > 1 then we define the function g*(n) = min{i | g^(i)(n) ≤ 1}. Now, we define λ_0 and, inductively for i = 1, 2, ..., λ_i by

  λ_0(n) = ⌊n^{1/4}⌋,   λ_i(n) = (λ_{i−1})*(n).
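The following Python sketch (not from the chapter; λ is the name just introduced for these functions) computes g*, λ_0 and λ_i and illustrates how slowly the functions grow.

```python
def lam0(n):                          # floor(n ** (1/4)), computed exactly
    r = int(n ** 0.25) + 1
    while r ** 4 > n:
        r -= 1
    return r

def g_star(g, n):                     # min{ i : g^(i)(n) <= 1 }
    i = 0
    while n > 1:
        n, i = g(n), i + 1
    return i

def lam(i, n):
    return lam0(n) if i == 0 else g_star(lambda m: lam(i - 1, m), n)

for i in range(3):
    print([lam(i, n) for n in (16, 2 ** 16, 2 ** 64)])
# lambda_0 grows like n^(1/4), lambda_1 like log log n, lambda_2 like log* n
```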

Hence, λ_0 grows like n^{1/4}, λ_1 grows like log log(n), λ_2 grows like log*(n), λ_3 grows like log**(n), etc. The following theorem was proven by Chandra, Fortune and Lipton (hereafter: CFL) [8] for star-free languages and extended to all regular languages in AC0 by Koucký [13].

Theorem 6.2 ([8] and [13]). If a regular language is in AC0 then for any fixed d > 0, it is computable by circuits of size O(n·λ_d(n)) and constant depth consisting of unbounded fan-in And and Or gates and unary Not gates.

We will present a proof of this theorem. We will largely follow the proof of Koucký [13]. It turns out that for the construction one needs not only circuits computing the language L but also circuits for the product, prefix-product and suffix-product over Synt(L). (For non-star-free regular languages in AC0, the product over their syntactic monoids is not in AC0, so we will have to deal with them separately.) We will use a procedure introduced by Chandra, Fortune and Lipton in [8] that allows one to construct efficient circuits for the prefix- and suffix-product over a monoid from efficient circuits for the product over the monoid and less efficient circuits for the prefix- and suffix-product over the monoid. Let g: N → N be a non-decreasing function and M be a (finite) monoid.

CFL procedure for computing the prefix-product of n elements from M

Step 0. We split the input x ∈ M^n iteratively into sub-words. We start with x as the only sub-word of length n and we divide it into at most ⌈n/g(n)⌉ sub-words x′_1, ..., x′_{⌈n/g(n)⌉} of size at most g(n) so that x = x′_1 x′_2 ⋯ x′_{⌈n/g(n)⌉}. We iterate and divide each sub-word of length ℓ > 1 obtained so far into at most ⌈ℓ/g(ℓ)⌉ sub-words of length at most g(ℓ). Hence, ignoring the rounding issues, for i = 0, ..., g*(n) we obtain n/g^(i)(n) sub-words of length g^(i)(n).

Step 1. For every sub-word obtained in Step 0 we compute its product over M.

Step 2. For each sub-word w of length ℓ > 1 from Step 0 that was divided into parts w′_1, w′_2, ..., w′_{⌈ℓ/g(ℓ)⌉}, and each j = 1, ..., ⌈ℓ/g(ℓ)⌉, we compute the product w′_1 ⋯ w′_j, using the individual products of w′_1, ..., w′_{⌈ℓ/g(ℓ)⌉} computed in Step 1 and the existing circuits for the prefix-product of ⌈ℓ/g(ℓ)⌉ elements.

Step 3. For each i = 1, ..., n, the product x_1 ⋯ x_i can be obtained as the product of at most g*(n) values of the appropriate prefixes obtained in Step 2. Namely, x_1 x_2 ⋯ x_i = x′_1 x′_2 ⋯ x′_j r, for some j ≤ ⌈n/g(n)⌉ and some r of length less than g(n). The product of x′_1 x′_2 ⋯ x′_j was obtained in Step 2, and similarly the product of r can be written as a product of at most g*(n) − 1 elements obtained in Step 2. Hence, for each i = 1, ..., n, we compute the product of x_1 ⋯ x_i by multiplying together at most g*(n) elements obtained in Step 2.

Let us analyse the circuit obtained from the above procedure. Assume that for some constant c > 0 we have existing circuits of size c·n·s(n) and depth d_s for computing the product of n elements from M, and that we have circuits of size c·n·p(n) and depth d_p for computing the prefix-product of n elements from M. Assume that s(n) and p(n) are non-decreasing functions and that there is a constant c′ > 0 such that for all ℓ > 1, ⌈ℓ/g(ℓ)⌉ · p(⌈ℓ/g(ℓ)⌉) ≤ c′ℓ. Then the CFL procedure gives a circuit for computing the prefix-product of n elements from M of depth 2d_s + d_p and size at most

  Σ_{i=0}^{g*(n)} c·n·s(g^(i)(n)) + c·c′·g*(n)·n + c·n·g*(n)·s(g*(n)).     (1)

This is because Σ_{i=1}^{k} a_i·s(a_i) ≤ s(max{a_1, ..., a_k}) · Σ_{i=1}^{k} a_i for any sequence of positive integers a_1, a_2, ..., a_k, so the first sum upper-bounds the size of a circuit implementing Step 1 of the procedure, c·c′·g*(n)·n upper-bounds Step 2, and the last term upper-bounds Step 3.

We demonstrate the use of the above procedure. Let M be a monoid such that the product over M is computable by a polynomial-size constant-depth circuit family. Choose ε > 0. From Theorem 6.1 we have constant-depth circuits of size n·s(n) = n·O(n^ε) for computing the product over M. By choosing g(n) = ⌈n/2⌉ and computing the prefix-product of two elements by a trivial circuit we obtain a circuit for the prefix-product over M of constant depth and size O(n^{1+ε} log n). Clearly, one can use a similar kind of strategy to compute the suffix-product. We can state the following lemma.

Lemma 6.3. Let B be a complete basis. Let s(n): N → N be a non-decreasing function bounded by a polynomial from above. Let L be a regular language. If the product over Synt(L) is computable by a constant-depth circuit family over B of size s(n) then for every ε > 0, there is a family of constant-depth circuits over B of size O(n^{1+ε}) computing the prefix-product and the suffix-product over Synt(L).
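The following Python sketch, not from the chapter, simulates the CFL procedure sequentially: the recursion plays the role of the iterated splitting of Step 0 and of the prefix-products of Step 2, and the final loop reassembles the answers as in Step 3. It computes prefix products over an arbitrary monoid.

```python
def cfl_prefix(xs, g, mul, unit):
    """All prefix products of xs, using blocks of length g(len(xs))."""
    n = len(xs)
    if n <= 2:                                        # base case: multiply directly
        out, p = [], unit
        for x in xs:
            p = mul(p, x)
            out.append(p)
        return out
    b = max(2, g(n))                                  # block length (at least 2)
    blocks = [xs[i:i + b] for i in range(0, n, b)]
    block_prods = []                                  # Step 1: product of each block
    for blk in blocks:
        p = unit
        for x in blk:
            p = mul(p, x)
        block_prods.append(p)
    block_prefix = cfl_prefix(block_prods, g, mul, unit)   # Step 2, recursively
    out = []
    for j, blk in enumerate(blocks):                  # Step 3: reassemble per position
        p = unit if j == 0 else block_prefix[j - 1]
        for x in blk:
            p = mul(p, x)
            out.append(p)
    return out

# Example in the free monoid (string concatenation): prefixes come out literally.
xs = list("abcabcab")
pre = cfl_prefix(xs, lambda n: int(n ** 0.5), lambda u, v: u + v, "")
assert pre == ["".join(xs[:i + 1]) for i in range(len(xs))]
```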


The previous lemma states something non-trivial, as a naïve construction of prefix- or suffix-product circuits would produce circuits of at least quadratic size. We can also derive the following lemma from the CFL procedure.

Lemma 6.4. Let B be a complete basis. Let M be a monoid, and let i ≥ 0 be an integer. If there is a size O(n·λ_{i+1}(n)), depth d_s circuit family over B for computing the product over M and a size O(n·λ_i(n)²), depth d_p circuit family over B for computing the prefix-product over M, then there is a size O(n·λ_{i+1}(n)²), depth 2d_s + d_p circuit family over B for computing the prefix-product and the suffix-product over M.

Proof. Use the CFL procedure and choose g(n) = λ_i(n)², s(n) = λ_{i+1}(n) and p(n) = λ_i(n)². Since λ_i(n)² is non-decreasing, for all ℓ > 1, ⌈ℓ/g(ℓ)⌉ · p(⌈ℓ/g(ℓ)⌉) ≤ 2ℓ. Since λ_i(n) ≤ n^{1/4}, we get λ_i(λ_i(n)²)² ≤ λ_i(n). Thus g*(n) = (λ_i²)*(n) ≤ 2λ_i*(n) = 2λ_{i+1}(n). Using equation (1), the size of the resulting circuit is of order at most

  Σ_{j=0}^{g*(n)} n·λ_{i+1}(g^(j)(n)) + g*(n)·2n + n·g*(n)·λ_{i+1}(g*(n))
    ≤ g*(n)·n·λ_{i+1}(n) + 2g*(n)·n + n·g*(n)·λ_{i+1}(n) ≤ 6n·λ_{i+1}(n)².

It is trivial that if we can compute the prefix-product over some monoid M by circuits of size O(n·λ_i(n)²) then we can also compute the product by circuits of the same size. The above lemma provides essentially the other direction; that is, it allows one to build efficient circuits for the prefix-product from circuits for the product. We state the following key consequence, which follows from the lemma above by induction on i.

Lemma 6.5. Let B be a complete basis. Let M be a monoid. If for each i ≥ 0 there is a constant-depth circuit family over B of size O(n·λ_{i+1}(n)) computing the product over M, then for each i ≥ 0 there is a constant-depth circuit family over B of size O(n·λ_{i+1}(n)) computing the prefix-product and the suffix-product over M.

Proof of Theorem 6.2. The star-free case. We will prove the theorem first for star-free languages. So assume that we have a star-free language L. In this case the proof is similar to the proof of Proposition 4.1, but we have to be more careful about the size of the resulting circuits. A star-free language L can be described by a regular expression F built using Boolean and concatenation operations from the atomic language ∅. Let e ∉ A. We will first construct a circuit for L_e, the extension of L by the neutral letter e. From Proposition 4.4 we know that F describes L_e when interpreted over A ∪ {e}. From now on we will interpret F as such. By induction on the structure of the expression F we will show that for any d > 0, the language described by F is computable by a circuit family of size O(n·λ_d(n)) and constant depth consisting of unbounded fan-in And, Or and unary Not gates.

If F is the atomic language ∅, then trivially we have a constant size circuit family deciding the language described by F. Assume that F equals (A ∪ {e})* ∖ F_1 for some regular expression F_1. By the induction hypothesis, for each d′ > 0 we have a circuit family of size O(n·λ_{d′}(n)) and constant depth recognising the language described by F_1. To obtain a circuit family for the language of F, we just negate the output of each circuit by an additional Not gate. This increases the size and the depth of each circuit only by one, so we obtain the required circuit family of asymptotically the same size and depth. The case of F being a union or intersection of two regular expressions is similar.

Thus assume that F is (F_1, a, F_2) for some a ∈ A and two regular expressions F_1 and F_2. We will build a circuit deciding L, the language described by F, on inputs of length n as follows. Given the input word w ∈ (A ∪ {e})^n we need to decide whether for some i ∈ {0, ..., n+1}, w_1 w_2 ⋯ w_i ∈ L_1, w_{i+1} = a and w_{i+2} ⋯ w_n ∈ L_2. Thus, our circuit will first compute for each prefix of w whether it belongs to L_1 and for each suffix whether it belongs to L_2. Then, as in Proposition 4.1, using a linear number of additional constant fan-in And gates and a single Or gate of fan-in n + 2, we decide the language L. So we only need to efficiently decide the membership of all the prefixes and suffixes in L_1 and L_2, respectively.

By the induction hypothesis, for each d > 0 we have circuit families of size O(n·λ_d(n)) and constant depth recognising the languages L_1 and L_2 described by F_1 and F_2, respectively. Since we interpret all regular expressions over A ∪ {e}, both L_1 and L_2 have the neutral letter e. Thus by Propositions 4.3 and 3.1, we obtain circuits of asymptotically the same size and depth for the products over the syntactic monoids Synt(L_1) and Synt(L_2). By Lemma 6.5 we also have circuits of asymptotically the same size and depth for the prefix-product over Synt(L_1) and the suffix-product over Synt(L_2).

So for a given d > 0, the circuit computes as follows: first it maps each letter of the input word w ∈ (A ∪ {e})^n to the corresponding monoid elements in Synt(L_1) and Synt(L_2). This gives words x ∈ Synt(L_1)^n and y ∈ Synt(L_2)^n. Then it computes the prefix-product of x over Synt(L_1) and the suffix-product of y over Synt(L_2) by circuits of size O(n·λ_d(n)) and constant depth. By a linear amount of extra circuitry the circuit converts the prefix-product and suffix-product into sequences of n + 1 bits each, which determine whether a given prefix or suffix of the input word w belongs to L_1 or L_2, respectively. A linear amount of extra circuitry produces the desired result. Clearly, the circuit is of constant depth and total size O(n·λ_d(n)).

Non-star-free regular languages in AC0. The proof for non-star-free regular languages from AC0 is given in [13]. We present here only a brief sketch, due to space constraints. We will reduce the problem to star-free languages. One can show that if for every k > 0 the image of A^k under the syntactic morphism of L does not contain a non-trivial group, then there is k > 1 such that the image of (A^k)* under the syntactic morphism does not contain a non-trivial group. Hence, for any regular language L ∈ AC0 there is k > 0 such that the image of (A^k)* under the syntactic morphism is an aperiodic monoid M. It is easy to show that any word problem over M is a star-free language over the alphabet M. (The syntactic monoid of each of the word problems has to divide M, so it cannot contain a non-trivial group.) Hence, by the star-free case of this theorem, for each a ∈ M and each integer d > 0, the a-word problem over M is computable by a family of constant depth circuits of size O(n·λ_d(n)).


Hence, for any n > 1, a circuit for L is constructed as follows. Let n = mk + r, for 0 ≤ r < k. From left to right, break the input word w of length n into blocks of k letters to get words w′_1, w′_2, ..., w′_{m+1}, where all the words w′_i are of size k except for w′_{m+1}. So w = w′_1 w′_2 ⋯ w′_{m+1}. For i = 1, ..., m, compute the image x_i of w′_i under the syntactic morphism and, using the circuits for word products over M on m elements, compute x_1 x_2 ⋯ x_m, which is the image of w′_1 w′_2 ⋯ w′_m. To obtain the image of w′_1 w′_2 ⋯ w′_m w′_{m+1}, multiply the result by the image of w′_{m+1}. This requires only an additional constant size circuit. If we use circuits for word products over M of size O(m·λ_d(m)) and constant depth, we obtain a circuit for L on inputs of length n of size O(n·λ_d(n)) and constant depth.

Observe that we could extend the star-free part of the above proof to cover also the mod_p-concatenation if we were to allow Mod_q gates in the resulting circuit. Together with Theorem 5.12 this leads to the following theorem, given by Koucký in [13].

Theorem 6.6 ([13]). If a regular language L has a solvable syntactic monoid then for some q > 1 and any constant d > 0, it is computable by circuits of size O(n·λ_d(n)) and constant depth consisting of unbounded fan-in And, Or, and Mod_q gates and unary Not gates.

Assuming that ACC0 ≠ NC1, this implies that all regular languages in ACC0 are computable by circuits of quasi-linear size. It is a major open problem in circuit complexity whether this is the case.

6.1. Regular languages computable by linear size circuits. In the previous section we have seen that regular languages in AC0 and ACC0 are computable by appropriate circuits of almost linear size. It is natural to ask whether all regular languages are in fact computable by linear size circuits of their particular type. This would reflect the situation with NC1. However, Koucký, Pudlák and Thérien [15] show that when we measure the size of circuits in terms of the number of wires this is not the case. [15] provides characterisations of the regular languages in AC0 and ACC0 that can be computed by appropriate circuits of linear size.

Let L_0, ..., L_k ⊆ A* and a_1, ..., a_k ∈ A. We recall that L_0 a_1 L_1 a_2 ⋯ L_k is unambiguous if for any word w ∈ A* there is at most one decomposition of w of the form w_0 a_1 w_1 ⋯ a_k w_k, where w_i ∈ L_i for each i. We say that L ⊆ A* is an unambiguous language if it is a disjoint union of languages that are unambiguous products of the form A_0* a_1 A_1* a_2 ⋯ A_k*, where A_i ⊆ A for each i. There are two prominent languages over the alphabet A = {a, b, c} that are not unambiguous: K = (c*ac*b)*c* and U = A*ac*aA*. They are, in a certain specific sense, minimal languages that are not unambiguous. However, both of them are in AC0. The following theorem was proven by Koucký et al.

Theorem 6.7 ([15]). A regular language L with a neutral letter is computable by constant-depth circuits with a linear number of wires consisting of unbounded fan-in And, Or, and unary Not gates if and only if L is unambiguous.


This theorem provides a precise characterisation of the regular languages in AC0 with a neutral letter that are computable by circuits with a linear number of wires. This result can be extended to languages without any neutral letter using the techniques of [3] and [27]. There are several other characterisations of unambiguous languages. Schwentick, Thérien and Vollmer [19] characterise unambiguous languages in terms of partially-ordered two-way finite automata and in terms of turtle programs. A partially-ordered two-way automaton is a two-way finite automaton whose states can be linearly ordered so that all transitions are either back to the same state or to a state of higher order. Hence, the graph of transitions is acyclic except for the self-loops. Schwentick et al. prove that unambiguous languages are precisely those languages that are recognised by partially-ordered two-way finite automata.

They also define turtle programs. A turtle program T is a sequence of instructions T = (d_1, a_1), (d_2, a_2), ..., (d_k, a_k), where d_j ∈ {L, R} and a_j ∈ A. The program executes on an input w ∈ A* placed between two end-markers not in A, by moving a turtle along the input. If d_1 = R, the turtle is initially on the left end-marker (at position 0); if d_1 = L, it is on the right end-marker (at position n + 1). To execute an instruction (d, a), the turtle moves in direction d from its current position to the next position that contains a; if there is no occurrence of a in the given direction, the program stops and rejects. The language L(T) recognised by T is the set of strings for which the program executes all instructions without rejecting. In this case, we say that L(T) is a turtle language. Schwentick et al. prove that unambiguous languages are precisely all the Boolean combinations of turtle languages.

Languages in ACC0 computable by circuits with a linear number of wires allow a similar kind of characterisation. We say that a language is mod-counting-unambiguous if it is a disjoint union of unambiguous concatenations of the form L_0 a_1 L_1 a_2 ⋯ L_k, where each L_i is recognised by a commutative monoid. The following theorem was proven by Koucký et al.

Theorem 6.8 ([15]). A regular language L with a neutral letter is computable by constant-depth circuits with a linear number of wires consisting of unbounded fan-in And, Or and Mod_q gates and unary Not gates if and only if L is mod-counting-unambiguous.

Koucký et al. provide a characterisation of these languages in terms of super-turtle programs. A super-turtle program S over a commutative group G is a sequence of instructions S = I_1, I_2, ..., I_k, where each I_j is either a turtle instruction (d_j, a_j) or a super-turtle instruction (d_j, A_j, g_j, f_j), where d_j ∈ {L, R}, A_j ⊆ A, g_j ∈ G and f_j: A_j → G. The program executes on an input w ∈ A* placed between two distinguished end-markers by moving a turtle along the input similarly to a turtle program. Each instruction is executed in turn; if the instruction is a turtle instruction then it is executed in the usual way; if it is a super-turtle instruction (d, A′, g, f) then it is executed as follows: the turtle moves from its current position (at least one step) in the direction d until it reaches the first position with a symbol not in A′. The turtle stops at that position, which may be a position of an end-marker. Then it verifies that the product of f(w_i) over all the symbols w_i that lie between the new position of the turtle and the position before the beginning of the execution of this instruction is equal to g. (The turtle computes the product of the f(w_i) while travelling.) If the product is not equal to g, the computation stops and rejects. The language L(S) recognised by S is the set of strings for which the program executes all instructions without rejecting. We say that L is a super-turtle language if there is a super-turtle program S over some commutative group G such that L = L(S). Again, Boolean combinations of super-turtle languages correspond to the mod-counting-unambiguous languages [15] and [28].

Neither the language K nor U is a mod-counting-unambiguous language. They are both in ACC0 but not computable by circuits with a linear number of wires. However, K is computable by bounded-depth circuits with a linear number of gates consisting of And, Or, Mod_q and Not gates. Hence, there is a clear distinction between the measures of circuit size in terms of wires and gates. [15] gives a similar example also for AC0, but that language is not regular.
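A small Python interpreter for turtle programs as defined above (a sketch, not taken from [19] or the chapter; it assumes that a move goes to the nearest occurrence strictly in the given direction):

```python
def turtle_accepts(program, w):
    n = len(w)
    # Start on the left end-marker (position 0) if the first move is R,
    # otherwise on the right end-marker (position n + 1).
    pos = 0 if program and program[0][0] == 'R' else n + 1
    for d, a in program:
        step = 1 if d == 'R' else -1
        pos += step
        while 0 < pos <= n and w[pos - 1] != a:
            pos += step
        if not (0 < pos <= n):      # fell off an end-marker: no occurrence of a
            return False
    return True

# T = (R, a), (R, b), (L, a): an 'a', later a 'b', and an 'a' before that 'b'.
T = [('R', 'a'), ('R', 'b'), ('L', 'a')]
print(turtle_accepts(T, "cacb"))   # True
print(turtle_accepts(T, "cbca"))   # False: no 'b' after the first 'a'
```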

7. Final remarks

There are several other topics we did not cover in this chapter due to space constraints. One of them is the tight connection of descriptive complexity and first-order logic to regular languages and their circuit complexity. Straubing's and Immerman's books [26] and [12] provide excellent expositions of this connection. Another topic we did not cover is the regular languages that correspond to the circuit classes CC0 and CC0[m]. This topic was thoroughly investigated by Straubing [25].

There are many open problems in the area of circuit complexity of regular languages. Indeed, many of these problems are directly related to major open problems in circuit complexity that have been open for decades, so it is unlikely that they will be resolved any time soon.

Acknowledgements. I am indebted to Denis Thérien for his contribution to this chapter. He graciously shared with me his extensive expertise in the area and provided many valuable comments on various versions of this chapter. I would also like to thank Eric Allender for his careful reading of this chapter and for providing many comments and corrections. This work was partially supported by the Sino-Danish Center CTIC (funded under the grant 61061130540) and by the Center of Excellence CE-ITI under the grant P202/12/G061 of GA ČR.

References

[1] M. Ajtai, Σ¹₁-formulae on finite structures. Ann. Pure Appl. Logic 24 (1983), no. 1, 1–48. MR 0706289 Zbl 0519.03021 q.v. 498


[2] D. A. Barrington, Bounded-width polynomial-size branching programs recognize exactly those languages in NC1 . J. Comput. System Sci. 38 (1989), no. 1, 150–164. 18th Annual ACM Symposium on Theory of Computing (Berkeley, CA, 1986). MR 0990054 Zbl 0667.68059 q.v. 501, 507 [3] D. A. M. Barrington, K. J. Compton, H. Straubing, and D. Thérien, Regular languages in NC1 . J. Comput. System Sci. 44 (1992), no. 3, 478–499. MR 1163944 Zbl 0757.68057 q.v. 507, 508, 509, 512, 513, 520 [4] D. A. M. Barrington, N. Immerman, and H. Straubing, On uniformity within NC1 . J. Comput. System Sci. 41 (1990), no. 3, 274–306. MR 1079468 Zbl 0719.68023 q.v. 496 [5] D. A. M. Barrington and D. Thérien, Finite monoids and the fine structure of NC1 . J. Assoc. Comput. Mach. 35 (1988), no. 4, 941–952. MR 1072406 Zbl 0667.68068 q.v. 501, 509, 510, 511, 512, 513, 514 [6] J. A. Brzozowski, Hierarchies of aperiodic languages. Rev. Française Automat. Informat. Recherche Opérationnelle Sér. Rouge Informat. Théor. 10 (1976), no. R–2, 33–49. MR 0428813 q.v. 510 [7] J. A. Brzozowski and R. Knast, The dot-depth hierarchy of star-free languages is infinite. J. Comput. System Sci. 16 (1978), no. 1, 37–55. MR 0471451 Zbl 0368.68074 q.v. 511, 512 [8] A. K. Chandra, S. Fortune, and R. J. Lipton, Unbounded fan-in circuits and associative functions. J. Comput. System Sci. 30 (1985), no. 2, 222–234. MR 0801824 Zbl 0604.68051 q.v. 509, 515 [9] S. Eilenberg, Automata, languages, and machines. Vol. A. Pure and Applied Mathematics, 58. Academic Press, New York etc., 1974. Vol. B. With two chapters by B. Tilson. Pure and Applied Mathematics, 59. Academic Press, New York etc., 1976. MR 0530382 (Vol. A) MR 0530383 (Vol. B) Zbl 0317.94045 (Vol. A) Zbl 0359.94067 (Vol. B) q.v. 512 [10] M. Furst, J. B. Saxe, and M. Sipser, Parity, circuits, and the polynomial-time hierarchy. Math. Systems Theory 17 (1984), no. 1, 13–27. MR 0738749 Zbl 0534.94008 q.v. 498, 509, 512 [11] J. Håstad, Computational limitations of small-depth circuits. The MIT Press, Cambridge, MA, 1987. q.v. 498, 510 [12] N. Immerman, Descriptive complexity. Graduate Texts in Computer Science. Springer, New York, 1999. MR 1732784 Zbl 0918.68031 q.v. 521 [13] M. Koucký, Circuit complexity of regular languages. Theory Comput. Syst. 45 (2009), no. 4, 865–879. MR 2529750 Zbl 1185.68390 q.v. 514, 515, 518, 519 [14] M. Koucký, C. Lautemann, S. Poloczek, and D. Thérien, Circuit lower bounds via Ehrenfeucht–Fraissé games. In 21 st Annual IEEE Conference on Computational Complexity (CCC ’06). Held in Prague, July 16-20, 2006. IEEE Press, Los Alamitos, CA, 2006, 190–201. IEEEXplore 1663737 q.v. 514 [15] M. Koucký, P. Pudlák, and D. Thérien, Bounded-depth circuits: separating wires from gates. In STOC ’05: Proceedings of the 37 th Annual ACM Symposium on Theory of Computing (H. N. Gabow and R. Fagin, eds.). Held in Baltimore, MD, May 22–24, 2005. Association for Computing Machinery, 2005, 257–265. MR 2181625 Zbl 1192.68298 q.v. 519, 520, 521 [16] R. McNaughton and S. A. Papert, Counter-free automata. With an appendix by W. Henneman. MIT Research Monograph, 65. The MIT Press, Cambridge, MA, and London, 1971. MR 0371538 Zbl 0232.94024 q.v. 509


[17] A. A. Razborov, Lower bounds on the size of bounded depth circuits over a complete basis with logical addition. Mat. Zametki 41 (1987), no. 4, 598–607, 623. MR 0897705 Zbl 0632.94030 q.v. 512 [18] W. L. Ruzzo, On uniform circuit complexity. J. Comput. System Sci. 22 (1981), no. 3, 365–383. Special issue dedicated to M. Machtey. MR 0633540 Zbl 0462.68013 q.v. 496 [19] T. Schwentick, D. Thérien, and H. Vollmer, Partially ordered two-way automata: a new characterization of DA. In Developments in language theory (W. Kuich, G. Rozenberg, and A. Salomaa, eds.). Revised papers from the 5th International Conference (DLT 2001) held at Technische Universität Wien, Vienna, July 16–21, 2001. Lecture Notes in Computer Science, 2295. Springer, Berlin, 2002, 239–250. MR 1964176 Zbl 1073.68051 q.v. 520 [20] M. Sipser, Borel sets and circuit complexity. In 15 th Annual ACM Symposium on Theory of Computing (D. S. Johnson, R. Fagin, M. L. Fredman, D. Harel, R. M. Karp, N. A. Lynch, C. H. Papadimitriou, R. L. Rivest, W. L. Ruzzo, and J. I. Seiferas, eds.). Held in Boston, MA, April 25–27. 1983. Association for Computing Machinery, 1983, 61–69. q.v. 510 [21] R. Smolensky, Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proceedings of the 19 th Annual ACM Symposium on Theory of Computing (A. V. Aho, ed.). Held in New York, N.Y., May 25–27. 1987. Association for Computing Machinery, 1987, 77–82. q.v. 512 [22] H. Straubing, Families of recognizable sets corresponding to certain varieties of finite monoids. J. Pure Appl. Algebra 15 (1979), no. 3, 305–318. MR 0537503 Zbl 0414.20056 q.v. 502, 513 [23] H. Straubing, A generalization of the Schützenberger product of finite monoids. Theoret. Comput. Sci. 13 (1981), no. 2, 137–150. MR 0594057 Zbl 0456.20048 q.v. 510 [24] H. Straubing, Finite semigroup varieties of the form V D . J. Pure Appl. Algebra 36 (1985), no. 1, 53–94. MR 0782639 Zbl 0561.20042 q.v. 510 [25] H. Straubing, Constant-depth periodic circuits. Internat. J. Algebra Comput. 1 (1991), no. 1, 49–87. MR 1112299 Zbl 0718.68045 q.v. 521 [26] H. Straubing, Finite automata, formal logic, and circuit complexity. Progress in Theoretical Computer Science. Birkhäuser Boston, Boston, MA, 1994. MR 1269544 Zbl 0816.68086 q.v. 498, 512, 521 [27] H. Straubing, D. Thérien, and W. Thomas, Regular languages defined with generalized quantifiers. Inform. and Comput. 118 (1995), no. 2, 289–301. MR 1331729 Zbl 0826.68072 q.v. 520 [28] P. Tesson and D. Thérien, Bridges between algebraic automata theory and complexity theory. Bull. Eur. Assoc. Theor. Comput. Sci. 88 (2006), 37–64. MR 2222335Zbl 1169.68434 q.v. 521 [29] D. Thérien, Classification of finite monoids: the language approach. Theoret. Comput. Sci. 14 (1981), no. 2, 195–208. MR 0614416 Zbl 0471.20055 q.v. 510, 513 [30] H. Vollmer, Introduction to circuit complexity. A uniform approach. Texts in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 1999. MR 1704235 Zbl 0931.68055 q.v. 496, 498 [31] I. Wegener, Branching programs and binary decision diagrams. Theory and applications. SIAM Monographs on Discrete Mathematics and Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2000. MR 1775233 Zbl 0956.68068 q.v. 501

Chapter 15

Černý’s conjecture and the road colouring problem

Jarkko Kari and Mikhail Volkov

Contents
1. Synchronising automata, their origins and importance  525
2. Algorithmic and complexity issues  530
3. Around the Černý’s conjecture  535
4. The road colouring problem  547
References  557

1. Synchronising automata, their origins and importance

Let 𝒜 be a complete deterministic finite automaton (DFA) with input alphabet A and state set Q. We say that such an automaton is synchronising if there exists a word w ∈ A* such that reading the word always leaves 𝒜 in one fixed state, regardless of the state one starts from. More precisely, there exists w such that q · w = q′ · w for all q, q′ ∈ Q. If a word w has this property, it is said to be a reset word for the automaton. Figure 1 shows a synchronising automaton with 4 states¹ denoted by C4. The reader can easily verify that the word ab³ab³a resets the automaton, leaving it in the state 1. With somewhat more effort, one can check that ab³ab³a is also the shortest reset word for C4.

a; b

b 3 a

1 b

b

2 a

Figure 1. The automaton C4

1 Here and below we adopt the convention that edges bearing multiple labels represent two or more

parallel edges. In particular, the edge 0 b

0 ! 1.

a;b

a

! 1 in Figure 1 represents the two parallel edges 0 ! 1 and

526

Jarkko Kari and Mikhail Volkov

The concept of a synchronising automaton was formalised at the beginning of the 1960s. In the vast literature on synchronising automata, the 1964 paper [23] by Černý, a Slovak computer scientist, serves as a standard reference. (In particular, the example in Figure 1 is taken from [23].) However, it would be fair to mention other researchers who independently and about the same time came to the very same idea. Chung Laung Liu’s Ph.D. thesis [71], submitted in 1962, contains a whole chapter devoted to a systematic study of synchronising automata. Moreover, the term “synchronising automata” seems to originate from Liu’s thesis: Liu used the term “synchronisable” while Černý called such automata “directable.” Synchronising automata also appeared in Laemmel’s technical reports [68] and [69] under the name “resettable machines.” Laemmel’s reports are less elaborated in comparison with [23] and [71], but still contain some valuable observations. In [23] the notion of a synchronising automaton arose within the classic framework of Moore’s “Gedanken-experiments” [72]. For Moore and his followers, finite automata served as a mathematical model of devices working in a discrete mode, such as computers or relay control systems. This leads to the following natural problem: how can we restore control over such a device if we do not know its current state, but can observe outputs produced by the device under various actions? Moore [72] has shown that under certain conditions one can uniquely determine the state at which the automaton arrives after a suitable sequence of actions (called an experiment). Moore’s experiments were adaptive, that is, each action was selected on the basis of the outputs caused by the previous actions. Ginsburg [46] considered more restricted experiments that he called uniform. A uniform experiment 2 is just a fixed sequence of actions, that is, a word over the input alphabet; thus, in Ginsburg’s experiments outputs were only used for calculating the resulting state at the end of an experiment. From this, just one further step was needed to come to the setting in which outputs were not used at all. It should be noted that this setting is by no means artificial – there exist many practical situations when it is technically impossible to observe output signals. (Think of a satellite which loops around the Moon and cannot be controlled from the Earth while “behind” the Moon.) In Liu’s thesis [71] three motivating applications were mentioned. The first one is resetting an automaton whose current state is unknown to a preselected state, so it is exactly the same problem as discussed in the preceding paragraph. The second application is a variation of the first when one deals with several copies of identical automata that are in different initial states and can accept identical input sequences in parallel, and one wants to make these copies work synchronously. The third application relates synchronising automata to variable-length codes – Liu explains how synchronising automata can provide codes that are able to restore synchronisation between sender and recipient after a channel error. This connection, which is indeed of utmost importance, is discussed in more detail below. Laemmel’s motivations were similar, too: he also referred to Ginsburg’s experiments as did Černý, and also mentioned connections between his “resettable machines” 2 After [45], the name homing sequence has become standard for the notion.

15. Černý’s conjecture and the road colouring problem

527

and variable-length codes. Specifically, Laemmel related synchronising automata to socalled ergodic codes considered by Schützenberger [103]. The original “Gedanken-experiments” motivation for studying synchronising automata is still of importance, and reset words are frequently applied in model-based testing of reactive systems. See [27] and [17] for typical samples of technical contributions to the area and [102] for a survey. As for applications to coding theory, we refer to Chapters 3 and 10 in [15] for a detailed account of profound connections between codes and automata. Here we restrict ourselves to the special (but still very important) case of maximal prefix codes. Recall that a prefix code over a finite alphabet A is a set X of words in A such that no word of X is a prefix of another word of X . A prefix code is maximal if it is not contained in another prefix code over the same alphabet. A maximal prefix code X over A is synchronised if there is a word x 2 X  such that for any word w 2 A , one has wx 2 X  . Such a word x is called a synchronising word for X . The advantage of synchronised codes is that they are able to recover after a loss of synchronisation between the decoder and the coder caused by channel errors: in the case of such a loss, it suffices to transmit a synchronising word and the following symbols will be decoded correctly. Moreover, since the probability that a word v 2 A contains a fixed factor x tends to 1 as the length of v increases, synchronised codes eventually resynchronise by themselves, after sufficiently many symbols being sent. (As shown in [19], the latter property in fact characterises synchronised codes.) The following simple example illustrates these ideas: let A D ¹0; 1º and X D ¹000; 0010; 0011; 010; 0110; 0111; 10; 110; 111º. Then X is a maximal prefix code and one can easily check that each of the words 010, 011110, 011111110, . . . is a synchronising word for X . For instance, if the code word 000 has been sent but, due to a channel error, the word 100 has been received, the decoder interprets 10 as a code word, and thus, loses synchronisation. However, with a high probability this synchronisation loss only propagates for a short while; in particular, the decoder definitely resynchronises as soon as it encounters one of the segments 010, 011110, 011111110, . . . in the received stream of symbols. A few samples of such streams are shown in Table 1, where vertical lines show the partition of each stream into code words and the boldfaced code words indicate the position at which the decoder resynchronises.

Table 1. Restoring synchronisation Sent Received Sent Received Sent Received

000 j 0010 j 0111 j ::: 10 j 000 j 10 j 0111 j ::: 000 j 0111 j 110 j 0011 j 000 j 10 j 110 j ::: 10 j 0011 j 111 j 000 j 110 j 0010 j 110 j ::: 000 j 000 j 111 j 10 j ::: 10 j 000 j 0111 j 10 j :::

If X is a finite prefix code over an alphabet A, then its decoding can be implemented by a deterministic automaton that is defined as follows. Let Q be the set of all proper

Jarkko Kari and Mikhail Volkov

528

prefixes of the words in X (including the empty word "). For q 2 Q and a 2 A, define ´ qa if qa is a proper prefix of a word of X ; qa D " if qa 2 X :

The resulting automaton AX is complete whenever the code X is maximal, and it is easy to see that AX is a synchronising automaton if and only if X is a synchronised code. Moreover, a word x is synchronising for X if and only if x is a reset word for AX and sends all states in Q to the state ". Figure 2 illustrates this construction for the code X D ¹000; 0010; 0011; 010; 0110; 0111; 10; 110; 111º considered above. The solid/dashed lines correspond to (the action of) 0/1. "

0

1

00

01

000

001

0010

010

0011

10 011

0110

11 110

111

0111

"

0

00

1

11

01

001

011

Figure 2. A synchronised code (at the top) and its automaton (at the bottom)

An additional source of problems related to synchronising automata has come from robotics or, more precisely, from part handling problems in industrial automation such as part feeding, fixturing, loading, assembly and packing. Within this framework, the concept of a synchronising automaton was again rediscovered in the mid-1980s by Natarajan [73] and [74] who showed how synchronising automata can be used to design

15. Černý’s conjecture and the road colouring problem

529

sensor-free orienters for polygonal parts. See § 1 in [116] for a transparent example illustrating Natarajan’s approach in a nutshell. Since the 1990s synchronising automata usage in the area of robotic manipulation has grown into a prolific research direction, but it is fair to say that publications in this area deal mostly with implementation technicalities. However, amongst them there are papers of significant theoretical importance, such as [36], [47], and [25]. Recently, it has been realised that a notion that arose in studying of substitution systems is also closely related to synchronising automata. Let X be a finite alphabet. Any map X ! X C is called a morphism or a substitution. Following the monograph [85], we use the term substitution in this chapter. A substitution W X ! X C is said to be of constant length if all words .x/, x 2 X , have the same length. One extends  to a map X C ! X C in a usual way and says that  satisfies the coincidence condition if there exist positive integers m and k such that all words  k .x/ have the same letter in the m-th position. For an example, consider the substitution  on X D ¹0; 1; 2º defined by 0 7! 11; 1 7! 12; 2 7! 20. Calculating the iterations of  up to  4 (see Table 2), we observe that  satisfies the coincidence condition (with k D 4, m D 7). Table 2. A substitution satisfying the coincidence condition 0 7 ! 11 7 ! 1212 7 ! 12201220 7 ! 1220201112202011

1 7 ! 12 7 ! 1220 7 ! 12202011 7 ! 1220201120111212

2 7 ! 20 7 ! 2011 7 ! 20111212 7 ! 2011121212201220

The importance of the coincidence condition comes from the crucial fact (established by Dekking [31]) that it is this condition that completely characterises the constant length substitutions which give rise to dynamical systems measure-theoretically isomorphic to a translation on a compact Abelian group; see Chapter 7 in [85] for a survey. For us, however, the coincidence condition is primarily interesting as yet another incarnation of synchronisability. Indeed, there is a straightforward bijection between DFAs and constant length substitutions. Each DFA A D .Q; A/ with A D ¹a1 ; : : : ; a` º defines a length ` substitution on Q that maps every q 2 Q to the word .q  a1 /    .q  a` / 2 QC . (For instance, the automaton C4 in Figure 1 induces the substitution 0 7! 11; 1 7! 12; 2 7! 23; 3 7! 30.) Conversely, each substitution W X ! X C such that all words .x/, x 2 X , have the same length ` gives rise to a DFA for which X serves as the state set and which has ` input letters a1 ; : : : ; a` , say, acting on X as follows: x  ai is the symbol in the i -th position of the word .x/. (For instance, the substitution  considered above defines the automaton shown in Figure 3.) It is clear that under the described bijection substitutions satisfying the coincidence condition correspond precisely to synchronising automata, and moreover, given a substitution, the number of iterations at which the coincidence first occurs is equal to the minimum length of reset words for the corresponding automaton. We mention in passing a purely algebraic framework within which synchronising automata also appear in a natural way. One may treat DFAs as unary algebras, since each letter of the input alphabet defines a unary operation on the state set. A term in the language of such unary algebras is an expression t of the form xw , where x is a variable

530

Jarkko Kari and Mikhail Volkov

0 a2 2 a1

a1 ; a2 a2

1 a1

Figure 3. The automaton induced by the substitution 0 7! 11, 1 7! 12, 2 7! 20

and w is a word over an alphabet A. An identity is a formal equality between two terms. A DFA A D .Q; A/ satisfies an identity t1 D t2 , where the words involved in the terms t1 and t2 are over A, if t1 and t2 take the same value under each interpretation of their variables in the set Q. Identities of unary algebras can be of the form either x  u D x  v (homotypical identities) or x  u D y  v with x ¤ y (heterotypical identities. It is easy to see that a DFA is synchronising if and only if it satisfies a heterotypical identity, and thus, studying synchronising automata may be considered as a part of the equational logic of unary algebras. In particular, synchronising automata over a fixed alphabet form a pseudovariety of unary algebras. See [16] for a survey of numerous publications in this direction; it is fair to say, however, that so far this algebraic approach has not proved to be really useful for understanding the combinatorial nature of synchronising automata.

2. Algorithmic and complexity issues It should be clear that not every DFA is synchronising. Therefore, the very first question that we should address is the following one: given an automaton A, how to determine whether or not A is synchronising? The most straightforward solution to this question uses the classic subset construction by Rabin and Scott [88] and can be found already in Laemmel’s reports [68] and [69]. Given a DFA A D .Q; A/, we define its subset automaton P.A/ on the set of the non-empty subsets of Q by setting P  a D ¹p  a j p 2 P º for each non-empty subset P of Q and each a 2 A. (Since we start with a deterministic automaton, we do not need to add the empty set to the state set of P.A/.) Figure 4 presents the subset automaton for the DFA C4 shown in Figure 1. Now it is obvious that a word w 2 A is a reset word for the DFA A if and only if w labels a path in P.A/ starting at Q and ending at a singleton. (For instance, the bold path in Figure 4 represents the shortest reset word ab 3 ab 3 a of the automaton C4 .) Thus, the question of whether or not a given DFA A is synchronising reduces to the following reachability question in the underlying graph 3 of the subset automaton P.A/: is there a 3 By a graph we mean a quadruple of sets and maps: the set of vertices V , the set of edges E , a map t W E ! V that maps every edge to its tail vertex, and a map hW E ! V that maps every edge to its head

vertex. Notice that in a graph, there may be several edges with the same tail and head. (Thus, our graphs are in fact directed multigraphs, but since no other graph species show up in this chapter, we use a short name.) We assume the reader’s acquaintance with basic notions of graph theory such as path, cycle, etc. The underlying graph of an automaton A is the graph obtained from A by forgetting edge labels.

15. Černý’s conjecture and the road colouring problem

531

path from Q to a singleton? The latter question can be easily answered by breadth-first search; see, e.g., § 22.2 in [28]. b

a a a

0123

132

023

b

b

b b

012 a

a 0

a; b

b 3 a

b b

01

1 a

b 03

2 a

b

b

013 a

a 12

b

02

b

a

23

a

13

a

b

a

Figure 4. The subset automaton P.C4 )

The described procedure is conceptually very simple but rather inefficient because the subset automaton P.A/ is exponentially larger than A. However, the following criterion of synchronisability (established independently in the pioneering works by Liu in Theorem 15 in [71] and Černý in Theorem 2 in [23]) gives rise to a polynomial algorithm. Proposition 2.1. A DFA A D .Q; A/ is synchronising if and only if for every q; q 0 2 Q there exists a word w 2 A such that q  w D q 0  w .

Proof. Of course, only sufficiency needs a proof. For this, take two states q; q 0 2 Q and consider a word w1 such that q  w1 D q 0  w1 . Then jQ  w1 j < jQj. If jQ  w1 j D 1, then w1 is a reset word and A is synchronising. If jQ  w1 j > 1, take two states p; p 0 2 Q  w1 and consider a word w2 such that p  w2 D p 0  w2 . Then jQ  w1 w2 j < jQ  w1 j. If jQ  w1 w2 j D 1, then w1 w2 is a reset word; otherwise we repeat the process. Clearly, a reset word for A will be constructed in at most jQj 1 steps.

One can treat Proposition 2.1 as a reduction of the synchronisability problem to a reachability problem in the subautomaton PŒ2 .A/ of P.A/ whose states are 2-element subsets and singletons of Q. Since the subautomaton has jQj.jQjC1/ states, breadth-first 2 2 search solves this problem in O.jQj  jAj/ time. This complexity bound assumes that no reset word is explicitly calculated. If one requires that, whenever A turns out to be synchronising, a reset word is produced, then the best of the known algorithms (which is basically due to Eppstein, see Theorem 6 in [36]; also see Theorem 1.15 in [102]) has an implementation that consumes O.jQj3 C jQj2  jAj/ time and O.jQj2 C jQj  jAj/ working space, not counting the space for the output, which is O.jQj3 /.

Jarkko Kari and Mikhail Volkov

532

Based on a deep study of synchronisation of random automata, Berlinkov has suggested in [13] an algorithm that, given a DFA A D .Q; A/, checks whether A is synchronising and spends time O.jQj  jAj/ on average. The worst-case complexity of Berlinkov’s algorithm is still O.jQj2  jAj/. Ageev [3] has reported a successful implementation of Berlinkov’s algorithm that outperforms the algorithm based on Proposition 2.1 for random DFAs with > 30 states. For a synchronising automaton, the subset automaton can be used to construct shortest reset words, as they correspond to shortest paths from the whole state set Q to a singleton. Of course, this requires exponential (in jQj) time in the worst case. Nevertheless, there were attempts to implement this approach; see, e.g., [90], [113], and [60]. One might hope that, as above, a suitable calculation in the “polynomial” subautomaton PŒ2 .A/ may yield a polynomial-time algorithm. However, this is not the case, and moreover, as we will see, it is very unlikely that any reasonable algorithm exists for finding shortest reset words in general synchronising automata. In the following discussion we assume the reader’s acquaintance with some basics of computational complexity (such as the definitions of the complexity classes NP and coNP) that can be found, e.g., in [41] and [78]. Consider the following decision problem.

S h o rt- R es et-Wo r d. Given a synchronising automaton A and a positive integer `, is it true that A has a reset word of length `? Clearly, S h o rt-R es et-Wo r d belongs to the complexity class NP: one can nondeterministically guess a word w 2 A of length ` and then check if w is a reset word for A in time `jQj. Several authors ([96], [36], [50], [100], and [101]) have proved that S h o rt-R es et-Wo r d is NP-hard by a polynomial reduction from SAT (the satisfiability problem for a system of clauses, that is, disjunctions of literals). Here we reproduce Eppstein’s reduction from [36]. Given an arbitrary instance of SAT with n variables x1 ; : : : ; xn and m clauses c1 ; : : : ; cm , we construct a DFA A. / with 2 input letters a and b as follows. The state set Q of A. / consists of .n C 1/m states qi;j , 1 6 i 6 m, 1 6 j 6 n C 1, and a special state z . The transitions are defined by ´ z if the literal xj occurs in ci ; qi;j  a D for 1 6 i 6 m, 1 6 j 6 n; qi;j C1 otherwise, ´ z if the literal :xj occurs in ci ; qi;j  b D for 1 6 i 6 m, 1 6 j 6 n; qi;j C1 otherwise, qi;nC1  a D qi;nC1  b D z  a D z  b D z

for 1 6 i 6 m.

Figure 5 shows two automata of the form A. / build for the SAT instances 1 2

D ¹x1 _ x2 _ x3 ; :x1 _ x2 ; :x2 _ x3 ; :x2 _ :x3 º; D ¹x1 _ x2 ; :x1 _ x2 ; :x2 _ x3 ; :x2 _ :x3 º:

15. Černý’s conjecture and the road colouring problem

533

If at some state q 2 Q in Figure 5 there is no outgoing edge labelled c 2 ¹a; bº, the c edge q ! z is assumed (those edges are omitted to improve readability). The two instances differ only in the first clause: in 1 it contains the literal x3 while in 2 it does not. Correspondingly, the automata A. 1 / and A. 2 / differ only by the outgoing edge labelled a at the state q1;3 : in A. 1 / it leads to z (and therefore, it is not shown) while in A. 2 / it leads to the state q1;4 and is shown by the dashed line. Observe that 1 is satisfiable for the truth assignment x1 D x2 D 0, x3 D 1 while is not satisfiable. It is not hard to check that the word bba resets A. 1 / while A. 2 / 2 is reset by no word of length 3 but by every word of length 4. x1 c4

q4;1

c3

q3;1

a; b a; b

x2 q4;2

q3;2

a

a

x3 q4;3

q4;4

a

b

q3;3

q3;4 z

c2

q2;1

a

q1;1

b

q2;2

b

q1;2

b

a; b

q2;3

a in A. c1

q1;3

q2;4 2/

q1;4 b

Figure 5. The automata A.

1/

and A.

2/

In general, it is easy to see that A. / is reset by every word of length n C 1 and is reset by a word of length n if and only if is satisfiable. Therefore assigning the instance .A. /; n/ of S h o rt-R es et-Wo r d to an arbitrary n-variable instance of SAT, one obtains a polynomial reduction of the latter problem to the former. Since SAT is NP-complete and S h o rt-R es et-Wo r d lies in NP, we obtain the following. Proposition 2.2. The problem S h o rt-R es et-Wo r d is NP-complete. In fact, as observed by Samotij [101], the above construction yields slightly more.4 Consider the following decision problem.

S h o rt est-R es et-Wo r d. Given a synchronising automaton A and a positive integer `, is it true that the minimum length of a reset word for A is equal to `? Assigning the instance .A. /; nC1/ of S h o rt est-R es et-Wo r d to an arbitrary system of clauses on n variables, one sees that the answer to the instance is “yes” if 4 Actually, the reduction in [101] is not correct, but the result claimed can be easily recovered as shown.

534

Jarkko Kari and Mikhail Volkov

and only if is not satisfiable. Thus, we have a polynomial reduction from the negation of SAT to S h o rt est-R es et-Wo r d whence the latter problem is coNP-hard. As a corollary, S h o rt est-R es et-Wo r d cannot belong to NP unless NP D coNP which is commonly considered to be very unlikely. In other words, even non-deterministic algorithms cannot decide the reset threshold of a given synchronising automaton (that is, the minimum length of its reset words) in polynomial time. The exact complexity of the problem S h o rt est-R es et-Wo r d has been determined by Gawrychowski [42] and, independently, by Olschewski and Ummels [77]. It turns out that the appropriate complexity class is DP (Difference Polynomial-Time) introduced by Papadimitriou and Yannakakis [79]; DP consists of languages of the form L1 \ L2 where L1 is a language from NP and L2 is a language in coNP. A “standard” DP-complete problem is SAT-U N SAT whose instance is a pair of clause systems ; , say, and whose question is whether is satisfiable and  is unsatisfiable. Proposition 2.3. The problem S h o rt est-R es et-Wo r d is DP-complete. Proposition 2.3 follows from mutual reductions between S h o rt est-R es etWo r d and SAT-U N SAT obtained in [42] and [77]. The complexity class PNPŒlog consists of all problems solvable by a deterministic polynomial-time Turing machine that has an access to an oracle for an NP-complete problem, with the number of queries being logarithmic in the size of the input. The class DP is contained in PNPŒlog (in fact, for every problem in DP two oracle queries suffice) and the inclusion is believed to be strict. Olschewski and Ummels [77] have shown that the problem of computing the reset threshold (as opposed to deciding whether it is equal to a given integer) is complete for the functional analogue FPNPŒlog of the class PNPŒlog (see [104] for a discussion of functional complexity classes). Hence, this problem appears to be even harder than deciding the reset threshold. The problem of finding a reset word of minimum length (as opposed to computing only the length without writing down the word itself) may be even more difficult. From the cited result of [77] it follows that the problem is FPNPŒlog -hard but its exact complexity is not yet known. Since the exact value of the reset threshold is hard to compute, it is natural to ask for polynomial-time approximation algorithms. It turns out that even approximating the reset threshold is hard. Various evidence for this claim has been found in [10], [11], and [44]. The ultimate non-approximation result has been established by Gawrychowski and Straszak [43]: no polynomial-time algorithm can approximate the reset threshold for every synchronising automaton with n states within a factor of n1  for any  > 0, unless P D NP. In contrast, for every fixed integer k > 2, Gerbush and Heeringa [44] have constructed an algorithm that, given a synchronising automaton with n > k states, m input letters, and reset threshold `, finds its reset word  ˙ 4 with length 6 kn 11 ` in time O kmnk C nk . Ananichev and Gusev [5] have shown ˙  that the approximation factor nk 11 cannot be improved within a very wide family of polynomial-time algorithms that approximate the reset threshold; this family includes the algorithms from [44].

15. Černý’s conjecture and the road colouring problem

535

There have been many attempts to develop practical approaches for finding short reset words in synchronising automata. These approaches have been based on certain heuristics [55], [4], and [56] and/or popular techniques, including (but not limiting to) binary decision diagrams [84], genetic and evolutionary algorithms [92], and [67], satisfiability solvers [105], answer set programming [51], hierarchical classifiers [86], and machine learning [87]. Some polynomial-time algorithms yielding short (though not necessarily shortest) reset words have been reported in [113], [91], [26], and [95]. Also, some algorithms for finding reset words are discussed in the next section.

3. Around the Černý’s conjecture The Černý conjecture. A very natural question to ask is the following: given a positive integer n, how long can be reset words for synchronising automata with n states? Černý [23] found a lower bound by constructing, for each n > 1, a synchronising automaton Cn with n states and 2 input letters whose shortest reset word has length .n 1/2 . We assume that the state set of Cn is Q D ¹0; 1; 2; : : : ; n 1º and the input letters are a and b , subject to the following action on Q: ´ i if i > 0; i a D i  b D i C 1 .mod n/: 1 if i D 0;

Our first example of a synchronising automaton (see Figure 1) is, in fact, C4 . A generic automaton Cn is shown in Figure 6 on the left. 1

1 a; b

b a

0

b; c c

0

b n 1

a

a 2

b

b

2

b; c

b; c n 1

3

3

a

Figure 6. The DFA Cn and the DFA Wn induced by the actions of b and c D ab

The series ¹Cn ºnD2;3;::: was rediscovered many times (see, e.g., [70], [37], [36], and [39]). It is easy to see that the word .ab n 1 /n 2 a of length n.n 2/ C 1 D .n 1/2 resets Cn . Proposition 3.1 ([23], Lemma 1). Any reset word for Cn has length at least .n

1/2 .

There are several nice proofs of this result. Here we present a proof from [6]; it is based on a transparent idea and reveals an interesting connection between Černý’s automata Cn and an extremal series of graphs discovered in Wielandt’s classic paper [119].

536

Jarkko Kari and Mikhail Volkov

Proof of Proposition 3.1. Let w be a reset word of minimum length for Cn . Since the letter b acts on Q as a cyclic permutation, the word w cannot end with b . (Otherwise removing the last letter gives a shorter reset word.) Thus, w D w 0 a for some w 0 2 ¹a; bº such that the image of Q under the action of w 0 is precisely the set ¹0; 1º. Since the letter a fixes each state in its image ¹1; 2; : : : ; n 1º, every occurrence of a in w except the last one is followed by an occurrence of b . (Otherwise a2 occurs in w as a factor and reducing this factor to just a results in a shorter reset word.) Therefore, if we let c D ab , then the word w 0 can be rewritten as a word v over the alphabet ¹b; cº. The actions of b and c induce a new DFA on the state set Q; we denote this induced DFA (shown in Figure 6 on the right) by Wn . Since w 0 and v act on Q in the same way, the word vc is a reset word for Wn and brings the automaton to the state 2. If u 2 ¹b; cº , the word uvc also is a reset word for Wn and it also brings the automaton to 2. Hence, for every ` > jvcj, there is a path of length ` in Wn from any given state i to 2. In particular, setting i D 2, we conclude that for every ` > jvcj there is a cycle of length ` in Wn . The underlying graph of Wn has simple cycles only of two lengths: n and n 1. Each cycle of Wn must consist of simple cycles of these two lengths, whence each number ` > jvcj must be expressible as a non-negative integer combination of n and n 1. Here we invoke the following well-known and elementary result from number theory. Lemma 3.2 ([89], Theorem 2.1.1). If k1 ; k2 are relatively prime positive integers, then k1 k2 k1 k2 is the largest integer that is not expressible as a non-negative integer combination of k1 and k2 . Lemma 3.2 implies that jvcj > n.n 1/ n .n 1/ D n2 3n C 1. Suppose that jvcj D n2 3n C 2. Then there should be a path of this length from the state 1 to the state 2. Every outgoing edge of 1 leads to 2, and thus, in the path it must be followed by a cycle of length n2 3n C 1. No cycle of such length may exist by Lemma 3.2. Hence jvcj > n2 3n C 3. Since the action of b on any set S of states cannot change the cardinality of S and the action of c can decrease the cardinality by 1 at most, the word vc must contain at least n 1 occurrences of c . Hence the length of v over ¹b; cº is at least n2 3n C 2 and v contains at least n 2 occurrences of c . Since each occurrence of c in v corresponds to an occurrence of the factor ab in w 0 , we conclude that the length of w 0 over ¹a; bº is at least n2 3nC2 Cn 2 D n2 2n. Thus, jwj D jw 0 aj > n2 2nC1 D .n 1/2 . If we define the Černý function C.n/ as the maximum length of shortest reset words for synchronising automata with n states, the above property of the series ¹Cn º, n D 2; 3; : : : , yields the inequality C.n/ > .n 1/2 . The Černý conjecture is the claim that the equality C.n/ D .n 1/2 holds. In the literature, one often refers to Černý’s paper [23] as the source of the Černý conjecture. In fact, the conjecture was not yet formulated in that paper. There Černý only observed that .n 1/2 6 C.n/ 6 2n n 1 and concluded the paper with the following remark.

15. Černý’s conjecture and the road colouring problem

537

“The difference between the bounds increases rapidly and it is necessary to sharpen them. One can expect an improvement mainly for the upper bound.” The conjecture in its present-day form was formulated a bit later, after the expectation in the above quotation was confirmed by Starke [107]. (Namely, Starke improved the 2/ upper bound from [23] to 1 C n.n 1/.n , which was the first polynomial upper bound 2 for C.n/.) Černý explicitly stated the conjecture C.n/ D .n 1/2 in his talks in the second half of the 1960s; in print the conjecture first appeared in [24]. Upper bounds. Until to recently, the best upper bound known for the Černý function 3 3 was n 6 n . For each synchronising automaton with n states, a reset word of length n 6 n arises as the output of the following greedy algorithm. Algorithm 1 Compression algorithm calculating a reset word for A D .Q; A/ G r eedy Co mp r essi o n.A/ 1. w " F initialising the current word 2. P Q F initialising the current set 3. while jP j > 1 do 4. if jP  uj D jP j for all u 2 A then 5. return Failure 6. else 7. take a word v 2 A of minimum length with jP  vj < jP j 8. w ! wv F updating the current word 9. P !P v F updating the current set 10. return w

If jQj D n, then clearly the main loop of Algorithm 1 is executed at most n 1 times. Finding the word v in line 7 amounts to reading the labels along a shortest path between a 2-element subset of P and a singleton in the automaton PŒ2 .A/ (see the discussion after Proposition 2.1). Breadth-first search does this in O.n2  jAj/ time. Thus, Algorithm 1 is polynomial in the size of A. In order to evaluate the length of the output word w , we estimate the length of each word v produced by the main loop. Consider a generic step at which jP j D k > 1 and let v D a1    a` with ai 2 A, i D 1; : : : ; `. Then each of the sets P1 D P;

P2 D P1  a1 ;

:::;

P` D P`

1

 a`

1

contains exactly k states. Furthermore, since jP`  a` j < jP` j, there exist two distinct states p` ; p`0 2 P` such that p`  a` D p`0  a` . Now define the 2-element subset Ri D ¹pi ; pi0 º  Pi , i D 1; : : : ; `, such that pi  ai D pi C1 , pi0  ai D pi0 C1 for i D 1; : : : ; ` 1. Then the condition that v is a word of minimum length with jP  vj < jP j implies that Ri ª Pj for 1 6 j < i 6 `. Indeed, if Ri  Pj for some j < i , then already the word a1    aj ai    a` of length j C ` i < ` would satisfy jP  a1    aj ai    a` j < jP j contradicting the choice of v . Thus, we arrive at a problem from combinatorics of finite sets that can be stated as follows. Let 1 < k 6 n.

Jarkko Kari and Mikhail Volkov

538

A sequence of k -element subsets P1 ; P2 ; : : : of an n-element set is called 2-renewing if each Pi contains a 2-element subset Ri such that Ri ª Pj for each j < i . What is the maximum length of a 2-renewing sequence as a function of n and k ?

p10 p1 P1

a1

a1

p20 p2

a2

:::

a`

1

p`0

a`

a2

:::

a`

1

p`

a`

P2

P`

Figure 7. Combinatorial configuration at a generic step of Algorithm 1

The problem was solved by Frankl [38], who proved the following result.5 Proposition 3.3. The maximum length  of a 2-renewing sequence of k -element subsets in an n-element set is equal to n kC2 . 2

Thus, if `k is the length of the word v that Algorithm 1 appends to the current word w after the iteration step that the algorithm enters while the current set P contains . Summing up all these k states, then Proposition 3.3 guarantees that `k 6 n kC2 2 inequalities from k D n to k D 2, one arrives at the aforementioned bound C.n/ 6

n3

n 6

:

(1)

In the literature the bound (1) is usually attributed to Pin, who explained the above connection between Algorithm 1 and the combinatorial problem on the maximum length of 2-renewing sequences and conjectured the estimate n kC2 for this length 2 in his talk at the Colloquium on Graph Theory and Combinatorics held in Marseille in 1981. (Frankl learned this conjecture from Pin – and proved it – during another colloquium on combinatorics held in Bielefeld in November 1981.) Accordingly, the usual reference for (1) is the paper [83] based on the talk. The full story is, however, more complicated. Actually, the bound (1) first appeared in [37] where it was deduced from a combinatorial conjecture equivalent to Pin’s one. The conjecture, however, remained unproved. The bound (1) then reoccurred in [65] and [66], but the argument justifying it in these papers was insufficient. In 1987 both (1) and Proposition 3.3 were independently rediscovered by Klyachko, Rystsov and Spivak [64] who were aware of [37], [65], and [66], but neither [83] nor [38]. Here we include a proof of Frankl’s result following [64]. 5 Actually Frankl [38] considered and solved a more general problem concerning the maximum length of (analogously defined) m-renewing sequences of k -element subsets in an n-element set for any fixed m 6 k.

15. Černý’s conjecture and the road colouring problem

539

 Proof of Proposition 3.3. Let Q D ¹1; 2; : : : ; nº and denote n kC2 by m. First, we 2 exhibit a 2-renewing sequence of k -element subsets in Q of length m. For this put W D ¹1; : : : ; k 2º, list all m 2-element subsets of Q n W in some order and let Ti be the union of W with the i -th subset in the list. Clearly, the sequence T1 ; : : : ; Tm is 2-renewing. Now we assign to each k -element subset S D ¹s1 ; : : : ; sk º of Q the following polynomial D.S / in variables xs1 ; : : : ; xsk over the field R of reals: ˇ ˇ1 ˇ ˇ1 ˇ D.S / D ˇ : ˇ :: ˇ ˇ1

s1 s2 :: : sk

s12 s22 :: : sk2

  :: : 

s1k s2k :: : skk

3 3

3

xs1 xs2 :: : xsk

ˇ xs21 ˇ ˇ xs22 ˇˇ : :: ˇˇ : ˇ xs2k ˇkk

Observe that for any 2-renewing sequence S1 ; : : : ; S` of k -element subsets in Q, the polynomials D.S1 /; : : : ; D.S` / are linearly independent. Indeed, if they were linearly dependent, then by a basic lemma of linear algebra, some polynomial D.Sj / would be expressible as a linear combination of the preceding polynomials D.S1 /; : : : ; D.Sj 1 /. By the definition of a 2-renewing sequence, Sj contains a subset ¹s; s 0 º such that ¹s; s 0 º ª Si for all i < j . If we substitute xs D s , xs 0 D s 0 , and x t D 0 for t ¤ s; s 0 in each polynomial D.S1 /; : : : ; D.Sj /, then the polynomials D.S1 /; : : : ; D.Sj 1 / vanish (since the two last columns in each of the resulting determinants become proportional) and so does any linear combination of the polynomials. The value of D.Sj /, however, is a determinant that is the product of a Vandermonde ..k 2/  .k 2//-determinant with the .2  2/-determinant ˇ ˇs ˇ 0 ˇs

ˇ s 2 ˇˇ ; .s 0 /2 ˇ

whence this value is not 0. Hence D.Sj / cannot be equal to a linear combination of D.S1 /; : : : ; D.Sj 1 /. We see that the length of any 2-renewing sequence cannot exceed the dimension of the linear space over R spanned by all polynomials of the form D.S /. In order to n kC2 prove that the dimension is at most m D , it suffices to show that the space 2 is spanned by the polynomials D.T1 /; : : : ; D.Tm /, where T1 ; : : : ; Tm is the 2-renewing sequence constructed in the first paragraph of the proof. For this, take an arbitrary k -element subset S D ¹s1 ; : : : ; sk º of Q. We claim that the polynomial D.S / is a linear combination of D.T1 /; : : : ; D.Tm /. We induct on the cardinality of the set S n W . If jS n W j D 2, then S is the union of W with some 2-element subset from Q n W , whence S D Ti for some i D 1; : : : ; m. Thus, D.S / D D.Ti / and our claim holds. If jS n W j > 2, there is s0 2 W n S . Let S 0 D S [ ¹s0 º. There exists a polynomial p.x/ D ˛0 C ˛1 x C ˛2 x 2 C    C ˛k 3 x k 3 over R such that p.s0 / D 1 and p.s/ D 0

540

Jarkko Kari and Mikhail Volkov

for all s 2 W n ¹s0 º. Consider the determinant ˇ ˇp.s0 / 1 s0 s 2    s k ˇ 0 0 ˇp.s / 1 s s 2    s k 1 ˇ 1 1 1 ˇ 2 k  D ˇˇp.s2 / 1 s2 s2    s2 :: :: :: : : :: ˇ :: : ˇ : : : : : ˇ ˇp.sk / 1 sk s 2    s k k k

3 3 3

3

xs0 xs1 xs2 :: : xsk

ˇ xs20 ˇˇ xs21 ˇˇ ˇ xs22 ˇ : :: ˇˇ ˇ : ˇ xs2k ˇ.kC1/.kC1/

Clearly,  D 0 as the first column is the sum of the next k 2 columns with the coefficients ˛0 ; ˛1 ; ˛2 ; : : : ; ˛k 3 . Thus, expanding  by the first column gives the identity k X . 1/j p.sj /D.S 0 n ¹sj º/ D 0: j D0

Since p.s0 / D 1 and S 0 n ¹s0 º D S , the identity can be rewritten as D.S / D

k X

. 1/j C1 p.sj /D.S 0 n ¹sj º/;

(2)

j D1

and since p.s/ D 0 for all s 2 W n ¹s0 º all the non-zero summands in the right-hand side are such that sj … W . For each such sj , we have .S 0 n ¹sj º/ n W D S 0 n .W [ ¹sj º/ D .S [ ¹s0 º/ n .W [ ¹sj º/ D .S n W / n ¹sj º;

whence j.S 0 n¹sj º/nW j D jS nW j 1 and by the inductive assumption, the polynomials D.S 0 n ¹sj º/ are linear combinations of D.T1 /; : : : ; D.Tm /. From (2) we conclude that this holds for the polynomial D.S / as well. If one executes Algorithm 1 on the Černý automaton C4 (Figure 4 is quite helpful here), one sees that the algorithm returns the word ab 2 abab 3 a of length 10, which is not the shortest reset word for C4 . This reveals one of the main intrinsic difficulties of the synchronisation problem: the standard optimality principle does not apply here, since it is not true that the optimal solution behaves optimally also in all intermediate steps. In our example, the optimal solution is the word ab 3 ab 3 a, but it cannot be found by Algorithm 1 because the algorithm chooses v D b 2 a rather than v D b 3 a on the second execution of the main loop. Actually, the gap between the reset threshold of a synchronising automaton and the length of the reset word that Algorithm 1 returns on the automaton may be arbitrarily large: 6 one can calculate that for the Černý automaton Cn whose reset threshold is .n 1/2 , Algorithm 1 produces a reset word of length .n2 log n/, see [52] for details and generalisations. The behaviour of Algorithm 1 on average is not yet understood; practically it behaves rather well. 6 We observe that this does not immediately follow from the non-approximation results discussed in § 2 because Algorithm 1 is not really deterministic. Indeed, in general there may be several words satisfying the conditions in line 7 of the algorithm, and it has not been specified which one of the words should be taken.

15. Černý’s conjecture and the road colouring problem

541

As mentioned, the bound (1) remained the best for more than three decades. In 2 16/ . Unfortu2011, Trahtman [115] published a better upper bound, namely n.7n C6n 48 nately, the proof in [115] contained an error; see [49] for a detailed discussion. Recently, Szykuła [111] has managed to partly rescue Trahtman’s idea, and this has finally led to the first improvement on (1). The new bound is C.n/ 6

85059n3 C 90024n2 C 196504n 511104

it improves the leading coefficient by

125 511104

10648

I

 0:000245.

The extension algorithm. While studying Algorithm 1 has provided the best currently known upper bounds for the Černý function in the general case, the most impressive partial results proving the Černý conjecture for some special classes of automata have been obtained via analysis of a different algorithm. This algorithm also operates in a greedy manner, but builds a reset word in the opposite direction. For a DFA A D .Q; A/, a subset P  Q and a word w 2 A , we denote by P w 1 the full pre-image of P under the action of w , that is, P w 1 D ¹q 2 Q j q  w 2 P º. In what follows, we use the same notation for a singleton set and its single element. In contrast to Algorithm 1, it is not clear whether Algorithm 2 admits a polynomialtime implementation. Moreover, in general we know no non-trivial bound on the length of the words v that the main loop of Algorithm 2 appends to the current word. However, one can isolate some cases in which rather strong bounds on jvj do exist. The following definition is convenient for subsequent discussion. Given a number ˛ > 0, a DFA A D .Q; A/ is said to be ˛ -extensible if for each proper non-singleton subset S  Q, there exists a word u 2 A of length at most ˛jQj such that jS u 1 j > jS j. The following observation explains the importance of this property. Algorithm 2 Extension algorithm calculating a reset word for A D .Q; A/ G r eedy Ex t en si o n.A/ 1 j D 1 for all q 2 Q and a 2 A then 1. if jqa 2. return Failure 3. else 4. w a such that jqa 1 j > 1 F initialising the current word 5. P qa 1 such that jqa 1 j > 1 F initialising the current set 6. while jP j < jQj do 7. if jP u 1 j 6 jP j for all u 2 A then 8. return Failure 9. else 10. take a word v 2 A of minimum length with jP v 1 j > jP j 11. w vw F updating the current word 12. P Pv 1 F updating the current set 13. return w

542

Jarkko Kari and Mikhail Volkov

Proposition 3.4. If A is an ˛ -extensible automaton with n states, then A is synchronising and the reset threshold of A is at most 1 C ˛n.n 2/. In particular, the Černý conjecture holds for 1-extensible automata. Proof. If we run Algorithm 2 on A, the main loop is executed at most n 2 times and each word that it appends to the current word has length at most ˛n. Hence the length of the reset word returned by the algorithm does not exceed 1 C ˛n.n 2/. If ˛ D 1, then we get the bound 1 C n.n 2/ D .n 1/2 , which complies with the Černý conjecture. The approach to the Černý conjecture via extensibility traces back to Pin’s paper [82] of 1978. Pin observed that every DFA A D .Q; A/ such that jQj is prime and some letter acts as a cyclic permutation of Q is 1-extensible provided some other letter acts on Q as a non-permutation. Thus, such an A is synchronising and its reset threshold does not exceed .jQj 1/2 . Twenty years later Dubuc [33] generalised Pin’s result by showing that every synchronising automaton in which some letter acts as a cyclic permutation of the state set is 1-extensible. Kari [59] proved 1-extensibility of Eulerian 7 synchronising automata. In all these papers 1-extensibility is obtained via linear-algebraic arguments; here we include a proof from [59] as quite a representative example of these linearisation techniques. Theorem 3.5 ([59], Theorem 2). If a synchronising automaton A D .Q; A/ is Eulerian, then it has a reset word of length at most .n 2/.n 1/ C 1, where n D jQj.

Proof. For every vertex in an Eulerian graph, its in-degree and its out-degree are equal. In the underlying graph of a DFA the out-degree of every vertex is equal to the cardinality of the input alphabet. Hence, if jAj D k , then each vertex in the underlying graph of A has in-degree k and for every subset P  Q, the equality X jP a 1 j D kjP j (3) a2A

holds, since the left-hand side of (3) is the number of edges in the underlying graph of A with ends in P . The equality (3) readily implies that for each P  Q, one of the following alternatives takes place: either jP a 1 j D jP j for all letters a 2 A or jP b 1 j > jP j for some b 2 A. Now assume that a subset S  Q and a word u 2 AC are such that jS u 1 j ¤ jS j and u is a word of minimum length with this property. We write u D aw for some a 2 A and w 2 A and let P D S w 1 . Then jP j D jS j by the choice of u and P a 1 D S u 1 whence jP a 1 j ¤ jP j. Thus, P must fall into the second of the above alternatives and so jP b 1 j > jP j for some b 2 A. The word v D bw has the same length as u and has the property that jS v 1 j > jS j. Having this in mind, we now aim to prove that for every proper subset S  Q, there exists a word u 2 A of length at most n 1 such that jS u 1 j ¤ jS j. 7 A graph is strongly connected if for every pair of its vertices, there exists a path from one to the other. A graph is Eulerian if it is strongly connected and each of its vertices serves as the tail and as the head for the same number of edges. A DFA is said to be Eulerian if so is its underlying graph. More generally, we freely transfer graph notions (such as strong connectivity) from graphs to automata they underlie.

15. Černý’s conjecture and the road colouring problem

543

It is here where linear algebra comes into play. We assume Q D ¹1; 2; : : : ; nº. Assign to each subset P  Q its characteristic vector ŒP  in the linear space Rn of n-dimensional row vectors over R as follows: i -th entry of ŒP  is 1 if i 2 P , otherwise it is equal to 0. For instance, ŒQ is the all-ones row vector and the vectors Œ1; : : : ; Œn form the standard basis of Rn . Observe that for any vector x 2 Rn , the inner product hx; ŒQi is equal to the sum of all entries of x . In particular, for each subset P  Q, we have hŒP ; ŒQi D jP j. Further, assign to each word w 2 A the linear operator 'w on Rn defined by 'w .Œi / D Œiw 1  for each i 2 Q. It is then clear that 'w .ŒP / D ŒP w 1  for each P  Q. The inequality jS u 1 j ¤ jS j that we look for can be rewritten as h'u .ŒS /; ŒQi ¤ ŒQ. Then x ¤ 0 as S ¤ Q hŒS ; ŒQi or h'u .ŒS / ŒS ; ŒQi ¤ 0. Let x D ŒS  jSj n 1 and hx; ŒQi D 0. Since Qu D Q for every word u, we have 'u .ŒQ/ D ŒQ. Hence h'u .ŒS /

D     E jS j jS j ŒS ; ŒQi D 'u x C ŒQ xC ŒQ ; ŒQ n n E D jS j jS j ŒQ x ŒQ; ŒQ D 'u .x/ C n n D h'u .x/ x; ŒQi D h'u .x/; ŒQi:

Thus, a word u satisfies jS u 1 j ¤ jS j if and only if the vector 'u .x/ lies outside the subspace U of all vectors orthogonal to ŒQ. We aim to bound the minimum length of such a word u but first we explain why words sending x outside U exist. Since the automaton A is synchronising and strongly connected (as it is Eulerian), there exists a word w 2 A such that Q  w  S ; one can first synchronise A to a state q and then move q into S by applying a word that labels a path from q to a state in S . Then  'w .x/ D 'w ŒS 

 jS j ŒQ D 'w .ŒS / n

 jS j 'w .ŒQ/ D 1 n

jS j  ŒQ ¤ 0: n

Now consider the chain of subspaces U0  U1     , where Uj is spanned by all vectors of the form 'w .x/ with jwj 6 j . Clearly, if Uj C1 D Uj for some j , then 'a .Uj /  Uj for all a 2 A, whence Ui D Uj for every i > j . Let ` be the least number such that 'u .x/ … U for some word u of length `, that is, the smallest ` such that U` ª U . Then in the chain U0  U1      U` all inclusions are strict, whence 1 D dim U0 < dim U1 <    < dim U`

1

< dim U`

and, in particular, dim U` 1 > `. But by our choice of ` we have U` 1  U , whence dim U` 1 6 dim U . Since U is the orthogonal complement of a 1-dimensional subspace, dim U D n 1, and we conclude that ` 6 n 1. As shown in the first paragraph of the proof, the above implies that for every proper subset S  Q, there exists a word u 2 A of length at most n 1 such that jS u 1 j > jS j. Then Algorithm 2 run on A returns a reset word of length at most .n 2/.n 1/C1.

Jarkko Kari and Mikhail Volkov

544

We mention in passing that the upper bound provided by Theorem 3.5 is far from being tight. The best lower bound for the restriction of the Černý function to the class  2 ˘ of Eulerian synchronising automata published so far is n 2 3 ; see [112]. Let us return to our discussion of extensibility. Even though the approach to the Černý conjecture via 1-extensibility has proved to be productive in several special cases, it cannot resolve the general case because there exist strongly connected synchronising automata that are not 1-extensible. The first example here was the 6-state automaton K6 discovered by Kari [57]; see Figure 8. This automaton is synchronising with reset threshold 25, the shortest reset word being ba.ab/3 a2 b.ba/3 ab.ba2 /ab . Kari found K6 as a counterexample to a generalised form of the Černý conjecture proposed in Pin’s thesis [81], but the automaton is remarkable in several other respects. a b

0

a

1

b b b

3

a

4

2

a

a

b

b

5

a

Figure 8. Kari’s automaton K6

In particular, one can verify that no word v of length 6 or 7 is such that the full pre-image of the set ¹2; 3; 4; 5º under the action of v has more than 4 elements. Moreover, Kisielewicz and Szykuła [63] have constructed a series of synchronising automata that for each ˛ contains an automaton that is not ˛ -extensible. On the other hand, 2-extensibility (and thus – by Proposition 3.4 – a quadratic upper bound on the reset threshold) has been established for several classes of synchronising automata in [97], [98], [99], and [48]. A slightly relaxed version of 2-extensibility has been verified by Béal, Berlinkov, and Perrin [8] and [7] for the important class of the socalled one-cluster automata. A DFA A D .Q; A/ is called one-cluster if there exists a letter a 2 A that labels only one simple cycle. (For instance, the automata Cn and Wn shown in Figure 6 are one-cluster while Kari’s automaton K6 shown in Figure 8, is not. A large class of examples of one-cluster automata is provided by the decoders of finite maximal prefix codes discussed in § 1.) If C is this cycle, then it is easy to see that Q  ajQj jC j D C , and one can modify Algorithm 2 into Algorithm 3. In [8] and [7] it has been shown that the length of each word v appended by the main loop of Algorithm 3 does not exceed 2jQj, and this clearly implies a quadratic (in jQj) upper bound on the reset threshold for one-cluster synchronising automata. A similar result was obtained by Carpi and D’Alessandro [21]. Steinberg in [108] and [109] generalised the above approach and slightly improved the upper bound.

15. Černý’s conjecture and the road colouring problem Algorithm 3

545

Modified extension algorithm for a one-cluster automaton A D .Q; A/ with C being a unique simple cycle labelled a

R elat i v eEx t en si o n.A; C; a/ 1. w " 2. P ¹qº where q 2 C 3. while jP j < jC j do 4. if jP u 1 \ C j 6 jP j for all u 2 A then 5. return Failure 6. else 7. take a word v 2 A of minimum length with jP v 8. w vw 9. P Pv 1 \C jQj jC j w 10. return a

F initialising the current word F initialising the current set

1

\ C j > jP j F updating the current word F updating the current set

Namely, he proved that a one-cluster synchronising automaton with n states has a reset word of length at most 2n2 9n C 14. Steinberg also verified the Černý conjecture for synchronising automata in which a letter labels only one simple cycle and this cycle is of prime cardinality. Experimental studies and synchronisation of random automata. The Černý conjecture keeps resisting researchers’ efforts for more than half a century, and we still do not know whether or not it holds in general. Exhaustive computer experiments (see [113], [6], [62], [110], [61], [30], and [32]) have confirmed the conjecture for all DFAs up to 7 states and any size of the input alphabet, as well as for all 3-letter DFAs with 8 states and all 2-letter DFAs with 6 12 states. The experiments also have revealed interesting peculiarities in the distribution of possible values of the reset thresholds for synchronising automata with a given number of states. For instance, among synchronising automata with 6 states, some 8 have reset threshold 25 but none have reset threshold 24. This gap between the maximal and the second largest value of the reset thresholds for synchronising automata with n > 6 states was first observed by Trahtman [113] for n 6 10 and was reconfirmed by further experiments that also covered n D 11 and n D 12. It has been conjectured in Conjecture 1 in [6] that, for n > 6, the second largest value of the reset thresholds for synchronising automata with n states is equal to n2 3n C 4 and that, for n > 7, there exists a unique (up to isomorphism and omitting inessential letters) n-state DFA on which this value is attained, namely, the automaton Dn with the state set Q D ¹0; 1; 2; : : : ; n 1º and the input letters a and b that act on Q as follows: 8 ˆ 8, there is a second gap in possible values of the reset thresholds for synchronising automata with n states and 2 input letters. Namely, there exist no such automata with reset threshold smaller than n2 3n C 2 and greater than n2 4n C 7 if n is odd or n2 4n C 6 if n is even. A third gap of a similar sort was registered for n > 10 in [62] and [110]. For a detailed study of the gap phenomenon see [35]. Along with exhaustive experiments with “small” automata, extensive experiments with randomly generated “big” DFAs have been performed; see, e.g., [26], [105], [60], and [61]. All these experiments have operated with the simplest model of a random DFA with n states and k letters in which such a DFA is represented by a k -tuple of maps chosen uniformly at random from all nn transformations of the state set. Experiments with random DFAs also have led to interesting observations, some of which already have found theoretical justifications. First of all, it turns out that an overwhelming majority of random DFAs are synchronising and the rate of synchronising automata with n states among all n-state DFAs quickly tends to 1 as n grows. For instance, Kisielewicz, Kowalski, and Szykuła [60] reported that in their experiments, only 2250 of one million 100-state 2-letter DFAs and only 5 of 10,000 DFAs with 300 states and 2 letters failed to be synchronising. Cameron [18] conjectured that the probability that a random DFA with n states and 2 letters is synchronising is 1 o.1/. Cameron’s conjecture has been confirmed by Berlinkov [12] who has even found exact asymptotics: the probability of being synchronising for a random DFA with k n states and k > 2 letters is equal to 1 .n 2 /. 
In the context of the Černý conjecture, experiments with random DFAs have demonstrated that whenever a randomly generated DFA is synchronising, its reset threshold is much smaller than the Černý bound. For instance, the maximum reset threshold observed in [60] for synchronising automata amongst one million 100-state 2-letter random DFSa analyzed in that paper is equal to 41 – recall that the Černý bound for the reset threshold of synchronising automata with 100 states is 992 D 9801. In particular, the experiments suggest that even if the Černý conjecture does not hold in general, it does hold for “almost all” synchronising automata. First partial results towards a proof of this claim were obtained by Skvortsov and Zaks in [106] and [120], and later the claim was proved by Nicaud [75] in the following rather strong form.

15. Černý’s conjecture and the road colouring problem

547

Theorem 3.6 ([75], Theorem 3). An n-state random DFA with at least two input letters admits a reset word of length O.n log3 n/ with probability that tends to 1 as 1 n ! 1. More precisely, the probability that no such word exists is O.n 8 log4 n/.

Based on the proof of Theorem 3.6, Berlinkov and Szykuła in Corollary 14 in [14] have established that the probability that the Černý conjecture does not hold for a random synchronising automaton with at least 2 letters is exponentially small in terms of the number of states. More precisely, for any  > 0 and n large enough, with probability  at least 1 O.exp.n 4 //, a random n-state automaton with at least 2 letters has a reset 7 word of length at most n 4 C6 .1 C o.1//, and so satisfies the Černý conjecture. Observe that even though the upper bound O.n log3 n/ in Theorem 3.6 is much 2 lower p than the Černý bound .n 1/ , it is still far from the hypothetical upper bound ‚. n/ that is suggested by experimental results.

4. The road colouring problem

A graph Γ in which each vertex has the same out-degree (say, k) is called a graph of constant out-degree, and the number k is referred to as the out-degree of Γ. If we take an alphabet A whose size is equal to the out-degree of Γ, then we can label the edges of Γ by letters of A in such a way that the resulting automaton is complete and deterministic. Any DFA obtained this way is referred to as a colouring of Γ. Given a graph, it is reasonable to ask under which conditions it admits a colouring satisfying some “good” properties. In this section we analyze the so-called road colouring problem, which is certainly the most famous question within this framework. The road colouring problem asks under which conditions graphs of constant out-degree admit a synchronising colouring. The problem was explicitly stated by Adler, Goodwyn, and Weiss [1] in 1977; in an implicit form it was present already in an earlier memoir by Adler and Weiss [2]. Adler, Goodwyn, and Weiss considered only strongly connected graphs; as we shall see below, this is quite a natural assumption since the general case easily reduces to the case of strongly connected graphs.

The name of the problem suggested in [1] comes from the following interpretation. In every strongly connected synchronising automaton 𝒜 = (Q, A), one can assign to each state q ∈ Q an instruction (a reset word) w_q such that following w_q one will surely arrive at q from any initial state. (Indeed, for this one should first follow an arbitrary reset word leading to some state p, say, and then follow a word that labels a path connecting p and q; such a path exists because of strong connectivity.) Thus, in order to help a traveler lost on a given strongly connected graph Γ of constant out-degree to find his/her way from wherever he/she could be, we should, if possible, colour (that is, label) the edges of Γ such that Γ becomes a synchronising automaton and then tell the traveler the magic sequence of colours representing a reset word leading to the traveler's destination. The original motivation in [2] and [1] came from symbolic dynamics. However, the road colouring problem is quite natural also from the viewpoint of the “reverse


engineering” of synchronising automata: we aim to relate geometric properties of graphs to combinatorial properties of automata built on those graphs.

The following necessary condition was first published in [1], but it can already be found in Laemmel's report [69].

Proposition 4.1. If a strongly connected graph Γ admits a synchronising colouring, then the gcd of the lengths of all cycles in Γ is equal to 1.

Proof. Arguing by contradiction, let k > 1 be a common divisor of the lengths of the cycles in Γ. Let V denote the vertex set of Γ. Take a vertex v₀ ∈ V and, for i = 0, …, k − 1, let

V_i = {v ∈ V | there exists a path from v₀ to v of length ≡ i (mod k)}.

Clearly, V = ⋃_{i=0}^{k−1} V_i. We claim that V_i ∩ V_j = ∅ if i ≠ j. Let v ∈ V_i ∩ V_j where i ≠ j. This means that in Γ there are two paths from v₀ to v: one of length ℓ ≡ i (mod k) and one of length m ≡ j (mod k). Since Γ is strongly connected, there also exists a path from v to v₀, of length n, say. Combining it with each of the two paths above we get a cycle of length ℓ + n and a cycle of length m + n. Since k divides the length of any cycle in Γ, we deduce that ℓ + n ≡ i + n ≡ 0 (mod k) and m + n ≡ j + n ≡ 0 (mod k), whence i ≡ j (mod k), a contradiction. Thus, V is a disjoint union of V₀, V₁, …, V_{k−1}, and by the definition each edge in Γ leads from V_i to V_{i+1 mod k}. Then Γ definitely cannot be converted into a synchronising automaton by any colouring of its edges: two paths of the same length ℓ, one originating in V₀ and the other in V₁, cannot terminate at the same vertex because they end in V_{ℓ mod k} and in V_{ℓ+1 mod k}, respectively.

Graphs satisfying the conclusion of Proposition 4.1 are called primitive.⁹ Adler, Goodwyn, and Weiss [1] conjectured that primitivity is not only necessary for a graph to have a synchronising colouring but also sufficient. In other words, they suggested the following road colouring conjecture: every strongly connected primitive graph with constant out-degree admits a synchronising colouring.

The road colouring conjecture has attracted much attention. There were several interesting partial results (see, e.g., [76], [40], [80], [54], [20], [58], and [59]), and finally the problem was solved (in the affirmative) by Trahtman [114]. Trahtman's proof crucially depends on the idea of stability, which is due to Culik, Karhumäki, and Kari [29]. Let 𝒜 = (Q, A) be a DFA. We define the stability relation ρ on Q as follows:

q ρ q′  ⟺  ∀u ∈ A* ∃v ∈ A*: q·uv = q′·uv.

Any pair (q, q′) such that q ≠ q′ and q ρ q′ is called stable. The key observation by Culik, Karhumäki, and Kari [29] is the following.

⁹ In the literature such graphs are sometimes called aperiodic. The term “primitive” comes from the notion of a primitive matrix in the Perron–Frobenius theory of non-negative matrices: it is known (and easy to see) that a graph is primitive if and only if so is its incidence matrix.
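Primitivity itself is easy to test algorithmically. For a strongly connected graph, the gcd of the lengths of all cycles coincides with the gcd of the quantities dist(u) + 1 − dist(v) taken over all edges (u, v), where dist is the breadth-first-search distance from an arbitrarily fixed vertex; this is the standard way of computing the period of a strongly connected digraph. A small sketch (our own function names), in the same Python style as the earlier fragments:

```python
from math import gcd
from collections import deque

def cycle_gcd(edges, n):
    """gcd of the lengths of all cycles of a strongly connected digraph on
    vertices 0..n-1: every edge (u, v) contributes dist(u) + 1 - dist(v)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    g = 0
    for u, v in edges:
        g = gcd(g, dist[u] + 1 - dist[v])
    return g

def is_primitive(edges, n):
    return cycle_gcd(edges, n) == 1

# A directed 3-cycle with a chord has cycles of lengths 3 and 2, so it is primitive;
# a plain directed 3-cycle is not.
print(is_primitive([(0, 1), (1, 2), (2, 0), (0, 2)], 3))  # True
print(is_primitive([(0, 1), (1, 2), (2, 0)], 3))          # False
```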


Proposition 4.2. If every strongly connected primitive graph with constant out-degree and more than one vertex has a colouring with a stable pair of vertices, then the road colouring conjecture is true.

Proof. Let Γ be a strongly connected primitive graph with constant out-degree. We show that Γ has a synchronising colouring by induction on the number of vertices in Γ. If Γ has only one vertex, there is nothing to prove. If Γ has more than one vertex, then it admits a colouring with a stable pair of states by the letters of some alphabet A. Let 𝒜 be the automaton resulting from this colouring. It is easy to check that the stability relation is a congruence of 𝒜. Since the relation is non-trivial, the quotient automaton 𝒜/ρ has fewer vertices. It is clear that 𝒜/ρ is strongly connected; moreover, since each cycle in 𝒜 induces a cycle of the same length in 𝒜/ρ, the underlying graph of the latter automaton is primitive as well. Therefore, that graph admits a synchronising colouring by the induction assumption. We lift this colouring to a colouring of Γ in the following natural way. Every transition p → q labelled a in the automaton 𝒜 induces the transition [p] → [q] labelled a in 𝒜/ρ (here [p] and [q] stand for the ρ-classes of p and q respectively). Now, if the transition [p] → [q] labelled a is being recoloured so that it gets a new label a′ ∈ A, then the transition p → q gets the label a′ as well. A crucial feature of this recolouring procedure is that it is consistent with the stability relation ρ in the following sense. Suppose p → q and p′ → q′ are two transitions with the same label a in 𝒜 such that p ρ p′ and q ρ q′. Then [p] = [p′], [q] = [q′], and the two transitions induce the same transition [p] → [q] labelled a in 𝒜/ρ. If the latter is being recoloured with a new label a′ ∈ A, then the two transitions are changed such that the resulting transitions p → q and p′ → q′ still have a common label (namely a′). Let ℬ be the automaton resulting from the described recolouring; we want to show that ℬ is synchronising. Take a reset word w for the synchronising colouring of Γ/ρ that we started with. If we apply w to the states of the automaton ℬ, it will lead them all into a set S that is contained in a single class of the relation ρ. We induct on |S|. If |S| = 1, then w is a reset word for ℬ. If |S| > 1, take two states q, q′ ∈ S. Since they form a stable pair in 𝒜, there exists a word v such that q ·_𝒜 v = q′ ·_𝒜 v. (Here and below, subscripts indicate the automaton in which paths are being considered.) As discussed above, since q ρ q′, the paths started at q and q′ and labelled v in 𝒜 have a common label v′, say, in ℬ as well. Thus, q ·_ℬ v′ = q′ ·_ℬ v′. Consider the set S ·_ℬ v′ of the end points of all paths in ℬ that originate in S and are labelled v′. Observe that |S ·_ℬ v′| < |S| and, since S ·_ℬ v′ = S ·_𝒜 v, the set is still contained in a single class of the relation ρ. Therefore the induction assumption applies.
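The stability relation can be computed directly from its definition: a pair (p, q) is stable exactly when every pair reachable from it in the pair automaton can still be merged by some word. The following sketch is our own illustration (function names are ours; it recomputes the set of mergeable pairs on every call, which a real implementation would of course cache).

```python
from collections import deque

def compressible_pairs(delta):
    """Pairs {p, q} that can be merged by some word: backward BFS from the
    diagonal in the pair automaton."""
    n, k = len(delta[0]), len(delta)
    pred = {}
    for a in range(k):
        for p in range(n):
            for q in range(p, n):
                img = tuple(sorted((delta[a][p], delta[a][q])))
                pred.setdefault(img, []).append((p, q))
    good = {(q, q) for q in range(n)}
    queue = deque(good)
    while queue:
        pair = queue.popleft()
        for pq in pred.get(pair, ()):
            if pq not in good:
                good.add(pq)
                queue.append(pq)
    return good

def is_stable(delta, p, q):
    """(p, q) is stable iff every pair forward-reachable from it in the pair
    automaton is compressible, i.e. for every u there is v with p.uv = q.uv."""
    good = compressible_pairs(delta)
    k = len(delta)
    start = tuple(sorted((p, q)))
    seen = {start}
    queue = deque([start])
    while queue:
        r, s = queue.popleft()
        if (r, s) not in good:
            return False
        for a in range(k):
            nxt = tuple(sorted((delta[a][r], delta[a][s])))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```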

Proposition 4.2 “localises” the initial task: while synchronisation is a “global” property in which all vertices are involved, the proposition shows that we may look at some pair of vertices. We need a further localisation that allows us to concentrate on the action of a single letter. For this, we need some auxiliary notions and results.


Let 𝒜 = (Q, A) be a DFA. A pair (p, q) of distinct vertices is compressible if p·w = q·w for some w ∈ A*; otherwise it is incompressible. A subset P ⊆ Q is said to be compressible if P contains a compressible pair, and to be incompressible if every pair of distinct vertices in P is incompressible. Clearly, if P is incompressible, then for every word u ∈ A*, the set P·u = {p·u | p ∈ P} also is incompressible and |P| = |P·u|.

Lemma 4.3. Let 𝒜 = (Q, A) be a DFA and let P ⊆ Q be an incompressible set of maximum size in 𝒜. Suppose that there exists a word w ∈ A* that fixes all but one of the states in P. Then 𝒜 has a stable pair.

Proof. Let q ∈ P be such that q′ = q·w ≠ q but p·w = p for all p ∈ P′ = P \ {q}. Take an arbitrary word u ∈ A*; we aim to show that q·uv = q′·uv for a suitable word v ∈ A*. Clearly, we may assume that q·u ≠ q′·u. Since the set P·wu is incompressible, the state q′·u = q·wu forms an incompressible pair with every state in P′·u = P′·wu. Similarly, since the set P·u is incompressible, the state q·u also forms an incompressible pair with every state in P′·u, and of course every pair of distinct states in P′·u is incompressible too. Now P′·u ∪ {q·u, q′·u} has more than |P| elements, so it must be compressible, and the above analysis shows that the only pair in P′·u ∪ {q·u, q′·u} which may be compressible is the pair (q·u, q′·u). Thus, there is a word v ∈ A* such that q·uv = q′·uv, and the pair (q, q′) is stable.

Suppose that 𝒜 = (Q, A) is a DFA. Fix a letter a ∈ A and remove all edges of 𝒜 except those labelled a. The remaining graph is called the underlying graph of a or simply the a-graph. Thus, in the a-graph every vertex is the tail of exactly one edge. From every state q ∈ Q, one can start a path in the a-graph:

q → q·a → q·a² → ⋯ → q·a^k → ⋯

Since the set Q is finite, states in this path eventually begin repeating, that is, for some non-negative integer ℓ and some integer m > ℓ we have q·a^ℓ = q·a^m. In other words, each path in the a-graph eventually arrives at a cycle, see Figure 10. The least non-negative integer ℓ such that q·a^ℓ = q·a^m for some m > ℓ is called the a-level of the state q, and the state q·a^ℓ is called the root of q. The cycles of the a-graph are referred to as a-cycles.


Figure 10. The orbit of a state in the underlying graph of a letter

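For a concrete automaton, a-levels and roots are easy to compute, since the a-graph is the functional graph of the map q ↦ q·a. A small sketch (our own helper names): the states lying on a-cycles are obtained by iterating the map on the whole state set until the image stabilises, and the a-level of q is the number of steps needed to reach such a state.

```python
def a_levels_and_roots(delta_a):
    """delta_a[q] = q.a for a single letter a; returns the a-level and the
    root of every state."""
    n = len(delta_a)
    # states lying on a-cycles: the image of the whole state set under n
    # iterations of the map consists exactly of the cyclic states
    on_cycle = set(range(n))
    for _ in range(n):
        on_cycle = {delta_a[q] for q in on_cycle}
    level, root = [0] * n, [0] * n
    for q in range(n):
        cur, steps = q, 0
        while cur not in on_cycle:
            cur = delta_a[cur]
            steps += 1
        level[q], root[q] = steps, cur
    return level, root

# A letter acting as a cyclic permutation puts every state on its a-cycle:
print(a_levels_and_roots([1, 2, 3, 0]))  # ([0, 0, 0, 0], [0, 1, 2, 3])
```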


Lemma 4.4. Let 𝒜 = (Q, A) be a strongly connected DFA. Suppose that there is a letter a ∈ A such that all states of maximal a-level L > 0 have the same root. Then 𝒜 has a stable pair.

Proof. Let M be the set of all states of a-level L. Then q·a^L = q′·a^L for all q, q′ ∈ M, whence no pair of vertices from M is incompressible. Thus, every incompressible set in 𝒜 has at most one common state with M. Take an incompressible set S of maximum size in 𝒜 and choose any state p ∈ S. Since the automaton 𝒜 is strongly connected, there is a path from p to a state in M. If u ∈ A* is the word that labels this path, then S′ = S·u is an incompressible set of maximum size and it has exactly one common state with M (namely, p·u). Then S″ = S′·a^(L−1) is an incompressible set of maximum size that has all its states except one (namely, p·ua^(L−1)) in some a-cycles – the latter conclusion is ensured by our choice of L. If m is the lcm of the lengths of all simple a-cycles, then a^m fixes all states in every a-cycle but (p·ua^(L−1))·a^m = p·ua^(L−1+m) ≠ p·ua^(L−1). We see that Lemma 4.3 applies (with S″ in the role of P and a^m in the role of w).

Now we are ready to prove the following result.

Theorem 4.5 ([114]). Every strongly connected primitive graph Γ with constant out-degree admits a synchronising colouring.

Proof. If Γ has just one vertex, there is nothing to prove. Thus, we assume that Γ has more than one vertex and prove that it admits a colouring with a stable pair of states – the result will then follow from Proposition 4.2. Fix an arbitrary colouring of Γ by letters from an alphabet A and take an arbitrary letter a ∈ A. We induct on the number N of states that do not lie on any a-cycle in the chosen colouring. We say that a vertex p of Γ is ramified if it serves as the tail for some edges with different heads. Suppose that N = 0. This means that all states lie on the a-cycles. If we suppose that no vertex in Γ is ramified, then there is just one simple a-cycle (since Γ is strongly connected) and all simple cycles in Γ have the same length. This contradicts the assumption that Γ is primitive.¹⁰ Thus, let p be a vertex that is ramified. Then there exists a letter b ∈ A such that the states q = p·a and r = p·b are not equal. We exchange the labels of the edges p → q (labelled a) and p → r (labelled b), see Figure 11. It is clear that in the new colouring there is only one state of maximal a-level, namely, the state q. Thus, Lemma 4.4 applies and the induction basis is verified. Now suppose that N > 0. We denote by L the maximum a-level of the states in the chosen colouring. Observe that N > 0 implies L > 0. Let p be a state of level L. Since Γ is strongly connected, there is an edge p′ → p with p′ ≠ p, and by the choice of p, the label of this edge is some letter b ≠ a.

¹⁰ This is the only place in the whole proof where primitivity is used!


Figure 11. Recolouring in the induction basis

Let t = p′·a. One has t ≠ p. Let r = p·a^L be the root of p and let C be the a-cycle on which r lies. The following considerations split into several cases. In each case except one we can recolour Γ by swapping the labels of two edges so that the new colouring either satisfies the premise of Lemma 4.4 (all states of maximal a-level have the same root) or has more states on the a-cycles (and the induction assumption applies). In the remaining case finding a stable pair will be easy.

Case 1: p′ is not on C. We swap the labels of the edges p′ → p (labelled b) and p′ → t (labelled a), see Figure 12. If p′ was on the a-path from p to r, then the swapping creates a new a-cycle, increasing the number of states on the a-cycles. If p′ was not on the a-path from p to r, then the a-level of p′ becomes L + 1, whence all states of maximal a-level in the new automaton are a-ascendants of p′ and thus have r as the common root.


Figure 12. Recolouring in Case 1

Case 2: p′ is on C. Let k₁ be the least integer such that r·a^(k₁) = p′. The state t = p′·a is also on C. Let k₂ be the least integer such that t·a^(k₂) = r. Then the length of C is k₁ + k₂ + 1.

Subcase 2.1: k₂ ≠ L. Again, we swap the labels of the edges p′ → p (labelled b) and p′ → t (labelled a), see Figure 13. If k₂ < L, then the swapping creates an a-cycle of length k₁ + L + 1 > k₁ + k₂ + 1, increasing the number of states on the a-cycles. If k₂ > L, then the a-level of t becomes k₂, whence all states of maximal a-level in the new automaton are a-ascendants of t and thus have the same root.


Figure 13. Recolouring in Subcase 2.1

Let s be the state of C such that s·a = r.

Subcase 2.2: k₂ = L and s is ramified. Since s is ramified, there is a letter c such that s′ = s·c ≠ r. We swap the labels of the edges s → s′ (labelled c) and s → r (labelled a); see Figure 14. If r still lies on an a-cycle, then the length of that a-cycle is at least k₁ + k₂ + 2 and the number of states on the a-cycles increases. Otherwise, the a-level of r becomes at least 1, whence the a-level of p becomes at least L + 1. Hence all states of maximal a-level in the new automaton are a-ascendants of p and have a common root.


Figure 14. Recolouring in Subcase 2.2

Let q be the state on the a-path from p to r such that q·a = r.

Subcase 2.3: k₂ = L and q is ramified. Since q is ramified, there is a letter c such that q′ = q·c ≠ r. If we swap the labels of the edges p′ → p (labelled b) and p′ → t (labelled a), then we find ourselves in the conditions of Subcase 2.2 (with q and q′ playing the roles of s and s′, respectively); see Figure 15.

Subcase 2.4: k₂ = L and neither s nor q is ramified. In this subcase it is clear that q and s form a stable pair whichever colouring of Γ is chosen; see Figure 16. This completes the proof.


Figure 15. Recolouring reducing Subcase 2.3 to Subcase 2.2


Figure 16. Subcase 2.4

The above proof of Theorem 4.5 is constructive and can be “unfolded” to an algorithm that, given a strongly connected primitive graph Γ with constant out-degree, finds a synchronising colouring of Γ; moreover, as shown by Béal and Perrin [9], this can be done in time quadratic in the number of vertices of Γ. If one drops the primitivity condition, one can prove (basically by the same method) the following generalisation of the Road Colouring theorem; see [9].

Theorem 4.6. Suppose that d is the gcd of the lengths of the cycles in a strongly connected graph Γ = (V, E) with constant out-degree. Then Γ admits a colouring for which there is a word w such that |V·w| = d.

Finally, we discuss a general version of the road colouring problem in which graphs are not assumed to be strongly connected. Given an arbitrary graph Γ, a vertex q is said to be reachable from a vertex p if there is a path from p to q. Clearly, the reachability relation is transitive, and the mutual reachability relation is an equivalence on the vertex set of Γ. The subgraphs induced on the classes of the mutual reachability relation are strongly connected and are called the strongly connected components of the graph Γ. The reachability relation induces a partial order on the set of the strongly connected components: a component Γ₁ precedes a component Γ₂ in this order if some vertex of Γ₁ is reachable from some vertex of Γ₂. The following result shows that the general case of the road colouring problem easily reduces to its strongly connected case (solved by Theorem 4.5).

Corollary 4.7. A graph Γ with constant out-degree admits a synchronising colouring if and only if Γ has a least strongly connected component and this component is primitive.
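Corollary 4.7 translates into a simple brute-force test, sketched below in Python (our own function names; the naive reachability bookkeeping is only meant for small graphs). The least strongly connected component, when it exists, consists exactly of the vertices that are reachable from every vertex, and it then only remains to check that this component is primitive.

```python
from collections import deque
from math import gcd

def reachable_from(adj, src):
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def cycle_gcd(edges, n):
    """gcd of cycle lengths of a strongly connected digraph, via BFS levels."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return abs(sum(0 for _ in ()) or __import__('functools').reduce(gcd, [dist[u] + 1 - dist[v] for u, v in edges], 0))

def admits_synchronising_colouring(edges, n):
    """Test of Corollary 4.7 for a graph of constant out-degree given as an
    edge list on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    reach = [reachable_from(adj, v) for v in range(n)]
    least = set.intersection(*reach)    # vertices reachable from every vertex
    if not least:
        return False                    # no least strongly connected component
    # 'least' is a sink component, so all edges leaving it stay inside it
    idx = {v: i for i, v in enumerate(sorted(least))}
    sub = [(idx[u], idx[v]) for u, v in edges if u in least]
    return cycle_gcd(sub, len(least)) == 1
```

The one-liner computing the gcd inside cycle_gcd is just a compact fold over the edge contributions dist(u) + 1 − dist(v); a plain loop accumulating the gcd would do equally well.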


An interesting issue related to the road colouring problem is the choice of an optimal synchronising colouring for a given graph. Clearly, graphs admitting a synchronising colouring may have several such colourings, and the reset thresholds of the resulting synchronising automata may drastically differ. For instance, it is easy to see that the Černý automaton C_n, whose reset threshold (n − 1)² is believed to be the maximum possible for an n-state automaton, admits a recolouring with reset threshold as low as n − 1 (and moreover, every strongly connected graph Γ with constant out-degree that has a loop admits a synchronising colouring whose reset threshold is less than the number of vertices of Γ). Nevertheless, there exist graphs all of whose synchronising colourings are “slowly” synchronising automata. As an example, consider the Wielandt graph W_n shown in Figure 17.


Figure 17. The graph Wn

It has n vertices 0, 1, …, n − 1, say, and 2n edges: two edges from i to i + 1 mod n for each i = 1, …, n − 1, and the edges from 0 to 1 and 2. The graph (more precisely, its incidence matrix) first appeared in Wielandt's seminal paper [119], where Wielandt stated that for every primitive non-negative n × n-matrix M, the matrix M^((n−1)²+1) is positive. The incidence matrix of W_n was used to show that this bound is tight (that is, its (n−1)²-th power still has some 0 entries); later it was observed to be the only (up to a simultaneous permutation of rows and columns) matrix with this property, see [34]. It is easy to realise that every colouring of the graph W_n is isomorphic to the automaton 𝒲_n shown in Figure 6 on the right. Since W_n is strongly connected and primitive, the Road Colouring Theorem implies that 𝒲_n is synchronising (of course, this can also be verified directly). In [6] it is shown that the reset threshold of 𝒲_n is n² − 3n + 3; see the proof of Proposition 3.1 above. The aforementioned extremal property of the Wielandt graphs gives some evidence for conjecturing that this series of graphs may also yield the extremal value for the reset threshold of synchronising colourings of n-vertex graphs. The following conjecture is in a sense parallel to Černý's.

Conjecture 4.8 ([6], Conjecture 2). Every strongly connected primitive graph with constant out-degree and n vertices admits a synchronising colouring that can be reset by a word of length n² − 3n + 3.
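The value n² − 3n + 3 can be confirmed for small n by brute force. The sketch below (our own illustration) constructs the colouring of W_n in which the letter a acts as the cyclic shift i ↦ i + 1 mod n and the letter b differs from a only at the vertex 0, which it sends to 2; by the description of W_n above, every colouring is isomorphic to this one. It then computes the exact reset threshold by a breadth-first search in the power automaton, which is exponential in general and only meant for small n.

```python
from collections import deque

def wielandt_automaton(n):
    a = [(i + 1) % n for i in range(n)]   # cyclic shift
    b = a[:]
    b[0] = 2                              # the second edge leaving vertex 0
    return [a, b]

def reset_threshold(delta):
    """Length of a shortest reset word, found by BFS in the power automaton."""
    n = len(delta[0])
    start = frozenset(range(n))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if len(s) == 1:
            return dist[s]
        for a in range(len(delta)):
            t = frozenset(delta[a][q] for q in s)
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return None   # not synchronising

for n in range(4, 9):
    # the second column should agree with n**2 - 3*n + 3, the value from [6]
    print(n, reset_threshold(wielandt_automaton(n)), n * n - 3 * n + 3)
```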

We observe that while there is a clear analogy between Conjecture 4.8 and the Černý conjecture, neither of them immediately implies the validity of the other.


Some preliminary partial results related to Conjecture 4.8 can be found in [22] and [109]. Vorel, Roman, and Drewienkowski (see [93], [94], [117], and [118]) have shown that the problem of finding the optimal synchronising colouring for a given graph is computationally hard. Namely, the following decision problem is NP-complete.

Bounded-Synchronizing-Coloring. Given a strongly connected primitive graph Γ with constant out-degree ≥ 2, is it true that Γ has a synchronising colouring with a reset word of length 4?

Finally, we briefly mention a quantitative aspect of the road colouring conjecture. Due to the Road Colouring Theorem, every strongly connected primitive graph with constant out-degree admits at least one synchronising colouring, but how many of its colourings may be synchronising? Gusev and Szykuła [53] have studied the synchronising ratio of a primitive graph, defined as the ratio of the number of its synchronising colourings to the number of all possible colourings. They have performed extensive experiments revealing various phenomena concerning the synchronising ratio. In particular, Gusev and Szykuła have observed that for small n and k, all primitive strongly connected graphs with constant out-degree k ≥ 2 have synchronising ratio at least 1 − 1/k, except for a single particular graph with n = 6 and k = 2 which has synchronising ratio 30/64. This has led them to the following conjecture.

Conjecture 4.9 ([53], Conjecture 1). The minimum value of the synchronising ratio among all strongly connected primitive graphs with n vertices and constant out-degree k ≥ 2 is equal to 1 − 1/k, except for the case k = 2 and n = 6 when it is equal to 30/64.

If Conjecture 4.9 holds true, one can use random sampling of colourings as a simple and efficient randomised algorithm for finding a synchronising colouring of a given strongly connected primitive graph. Yet another interesting conjecture in [53] is related to the notion of a totally synchronising graph, that is, a strongly connected primitive graph with constant out-degree all of whose colourings are synchronising. (In terms of the synchronising ratio, totally synchronising graphs are precisely the graphs of synchronising ratio 1.) For instance, the Wielandt graph W_n shown in Figure 17 is totally synchronising, and so are the underlying graphs of the Černý automata C_n (see Proposition 2 in [53]) as well as those of the automata D_n shown in Figure 9. No polynomial-time algorithm for recognising the property of being totally synchronising is known, but experiments have shown that the property is sufficiently frequent to formulate the following conjecture.

Conjecture 4.10 ([53], Conjecture 3). For every k ≥ 2, the fraction of totally synchronising graphs among all strongly connected primitive graphs with n vertices and constant out-degree k tends to 1 as n → ∞.
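The synchronising ratio is straightforward to compute exhaustively for small graphs: a colouring is a choice, at every vertex, of a bijection between the letters and the outgoing edges, so a graph with n vertices and out-degree k has (k!)ⁿ colourings, consistent with the denominator 64 = 2⁶ in the exceptional example above. The brute-force sketch below is our own illustration (function names are ours); the synchronisation test is the same pair-automaton check used in the earlier fragments.

```python
from itertools import product, permutations
from collections import deque

def is_synchronising(delta):
    n, k = len(delta[0]), len(delta)
    pred = {}
    for a in range(k):
        for p in range(n):
            for q in range(p, n):
                img = tuple(sorted((delta[a][p], delta[a][q])))
                pred.setdefault(img, []).append((p, q))
    mergeable = {(q, q) for q in range(n)}
    queue = deque(mergeable)
    while queue:
        pair = queue.popleft()
        for pq in pred.get(pair, ()):
            if pq not in mergeable:
                mergeable.add(pq)
                queue.append(pq)
    return len(mergeable) == n * (n + 1) // 2

def synchronising_ratio(out_edges):
    """out_edges[v] lists the k heads of the edges leaving vertex v;
    a colouring assigns the k letters bijectively to these edges."""
    n, k = len(out_edges), len(out_edges[0])
    sync = total = 0
    for choice in product(permutations(range(k)), repeat=n):
        delta = [[out_edges[v][choice[v][a]] for v in range(n)] for a in range(k)]
        total += 1
        sync += is_synchronising(delta)
    return sync, total

# The Wielandt graph W_4: vertex 0 has edges to 1 and 2, every other vertex i
# has two parallel edges to i + 1 mod 4.  Since W_n is stated above to be
# totally synchronising, the expected output is (16, 16).
print(synchronising_ratio([(1, 2), (2, 2), (3, 3), (0, 0)]))
```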


References [1] R. L. Adler, L. W. Goodwyn, and B. Weiss, Equivalence of topological Markov shifts. Israel J. Math. 27 (1977), no. 1, 48–63. MR 0437715 Zbl 0362.54034 q.v. 547, 548 [2] R. L. Adler and B. Weiss, Similarity of automorphisms of the torus. Memoirs of the American Mathematical Society, 98. American Mathematical Society, Providence, R.I., 1970. MR 0257315 Zbl 0195.06104 q.v. 547 [3] P. Ageev, Implementation of the algorithm for testing an automaton for synchronization in linear expected time. J. Autom. Lang. Comb. 24 (2019), no. 2–4, 139–152. MR 4023058 Zbl 1429.68096 q.v. 532 [4] Ö. F. Altun, K. T. Atam, S. Karahoda, and K. Kaya, Synchronizing heuristics: Speeding up the slowest. In Testing Software and Systems (N. Yevtushenko, A. R. Cavalli, and H. Yenigün, eds.) Proceedings of the 29th IFIP WG 6.1 International Conference. ICTSS 2017. St. Petersburg, Russia, October 9–11, 2017. Lecture Notes in Computer Science, 10533 Springer, Cham, 2017, 243–256. q.v. 535 [5] D. S. Ananichev and V. V. Gusev, Approximation of reset thresholds with greedy algorithms. Fund. Inform. 145 (2016), no. 3, 221–227. MR 3545831 Zbl 1368.68228 q.v. 534 [6] D. S. Ananichev, M. V. Volkov, and V. V. Gusev, Primitive digraphs with large exponents and slowly synchronizing automata. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 402 (2012), Kombinatorika i Teoriya Grafov. IV, 9–39, 218. In Russian. English translation, J. Math. Sci. (N.Y.) 192 (2013), no. 3, 263–278. MR 2981976 Zbl 1276.68089 q.v. 535, 545, 546, 555 [7] M.-P. Béal, M. V. Berlinkov, and D. Perrin, A quadratic upper bound on the size of a synchronizing word in one-cluster automata. Internat. J. Found. Comput. Sci. 22 (2011), no. 2, 277–288. MR 2772809 Zbl 1217.68121 q.v. 544 [8] M.-P. Béal and D. Perrin, A quadratic upper bound on the size of a synchronizing word in one-cluster automata. In Developments in language theory (V. Diekert and D. Nowotka, eds.). Proceedings of the 13th International Conference (DLT 2009) held in Stuttgart, June 30–July 3, 2009. Lecture Notes in Computer Science, 5583. Springer, Berlin, 2009, 81–90. MR 2544692 Zbl 1217.68122 q.v. 544 [9] M.-P. Béal and D. Perrin, A quadratic algorithm for road coloring. Discrete Appl. Math. 169 (2014), 15–29. MR 3175054 Zbl 1288.05080 q.v. 554 [10] M. V. Berlinkov, Approximating the minimum length of synchronizing words is hard. Theory Comput. Syst. 54 (2014), no. 2, 211–223. MR 3159991 Zbl 1380.68247 q.v. 534 [11] M. V. Berlinkov, On two algorithmic problems about synchronizing automata. In Developments in language theory (A. M. Shur and M. V. Volkov, eds.) Proceedings of the 18th International Conference (DLT 2014) held at the Ural Federal University, Ekaterinburg, August 26–29, 2014. Springer, Cham, 2014, 61–67. MR 3253090 Zbl 1405.68163 q.v. 534 [12] M. V. Berlinkov, On the probability of being synchronizable. In Algorithms and discrete applied mathematics (S. Govindarajan and A. Maheshwari, eds.). Proceedings of the 2nd International Conference (CALDAM 2016) held in Thiruvananthapuram, February 18–20, 2016. Lecture Notes in Computer Science, 9602. Springer, Cham, 2016, 73–84. MR 3509748 Zbl 1398.68298 q.v. 546 [13] M. V. Berlinkov, On the probability of being synchronizable. Preprint, 2020. arXiv:1304.5774v22 [cs.FL] q.v. 532


[14] M. V. Berlinkov and M. Szykuła, Algebraic synchronization criterion and computing reset words. Inf. Sci. 369 (2016), 718–730. q.v. 547 [15] J. Berstel, D. Perrin, and C. Reutenauer, Codes and automata. Encyclopedia of Mathematics and its Applications, 129. Cambridge University Press, Cambridge, 2010. MR 2567477 Zbl 1187.94001 q.v. 527 [16] S. Bogdanović, B. Imreh, M. Ćirić, and T. Petković, Directable automata and their generalizations: a survey. Novi Sad J. Math. 29 (1999), no. 2, 29–69. Proceedings of the VIII International Conference “Algebra and Logic” (Novi Sad, 1998). MR 1818327 Zbl 1009.68076 q.v. 530 [17] V. Boppana, S. Rajan, K. Takayama, and M. Fujita, Model checking based on sequential ATPG. In Computer aided verification (N. Halbwachs and D. A. Peled, eds.). Proceedings of the 11th International Conference (CAV ’99) held in Trento, Italy, July 6–10, 1999. Lecture Notes in Computer Science, 1633. Springer, Berlin, 1999, 418–430. Zbl 1046.68580 q.v. 527 [18] P. J. Cameron, Dixon’s theorem and random synchronization. Discrete Math. 313 (2013), no. 11, 1233–1236. MR 3034755 Zbl 1277.05119 q.v. 546 [19] R. M. Capocelli, L. Gargano, and U. Vaccaro, On the characterization of statistically synchronizable variable-length codes. IEEE Trans. Inform. Theory 34 (1988), no. 4, 817–825. MR 0966751 Zbl 0656.94022 IEEEXplore 9779 q.v. 527 [20] A. Carbone, Cycles of relatively prime length and the road coloring problem. Israel J. Math. 123 (2001), 303–316. MR 1835302 Zbl 0979.05046 q.v. 548 [21] A. Carpi and F. D’Alessandro, The synchronization problem for locally strongly transitive automata. In Mathematical foundations of computer science 2009. (R. Královič and D. Niwiński, eds.). Proceedings of the 34th International Symposium (MFCS 2009) held in Novy Smokovec, August 24–28, 2009. Lecture Notes in Computer Science, 5734. Springer, Berlin, 2009, 211–222. MR 2539493 Zbl 1250.68150 q.v. 544 [22] A. Carpi and F. D’Alessandro, On the hybrid Černý-road coloring problem and Hamiltonian paths. In Developments in language theory (Y. Gao, H. Lu, S. Seki, and S. Yu, eds.). Proceedings of the 14th International Conference (DLT 2010) held at the University of Western Ontario, London, ON, August 17–20, 2010. Lecture Notes in Computer Science, 6224. Springer, Berlin, 2010, 124–135. MR 2725638 Zbl 1250.05046 q.v. 556 [23] J. Černý, Poznámka k homogénnym eksperimentom s konečnými automatami. Matematicko-fyzikalny Časopis Slovenskej Akadémie Vied 14(3) (1964), 208–216. English translation, A note on homogeneous experiments with finite automata. Translated from the Slovak by M. Holzer and B. Truthe. J. Autom. Lang. Comb. 24 (2019), no. 2–4, 123–132. MR 0168429 MR 4023056 (translation) Zbl 1380.68247 (translation) q.v. 526, 531, 535, 536, 537 [24] J. Černý, A. Pirická, and B. Rosenauerová, On directable automata. Kybernetika (Prague) 7 (1971), 289–298. MR 0302347 Zbl 0223.94029 q.v. 537 [25] Y.-B. Chen and D. J. Ierardi, The complexity of oblivious plans for orienting and distinguishing polygonal parts. Algorithmica 14 (1995), no. 5, 367–397. MR 1350133 Zbl 0830.68062 q.v. 529 [26] K. Chmiel and A. Roman, COMPAS – A computing package for synchronization. In Implementation and application of automata (M. Domaratzki and K. Salomaa, eds.). Revised selected papers from the 15th International Conference (CIAA 2010) held at the University of Manitoba, Winnipeg, MB, August 12–15, 2010. Lecture Notes in Computer Science, 6482. Springer, Berlin, 2011, 79–86. MR 2776279 Zbl 1297.68115 q.v. 535, 546


[27] H. Cho, S.-W. Jeong, F. Somenzi, and C. Pixley, Synchronizing sequences and symbolic traversal techniques in test generation. J. Electronic Testing 4 (1993), 19–31. q.v. 527 [28] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms. Third edition. MIT Press, Cambridge, MA, 2009. MR 2572804 Zbl 1187.68679 q.v. 531 [29] K. Culik II, J. Karhumäki, and J. Kari, A note on synchronized automata and road coloring problem. Internat. J. Found. Comput. Sci. 13 (2002), no. 3, 459–471. MR 1904496 Zbl 1066.68065 q.v. 548 [30] M. de Bondt, H. Don, and H. Zantema, DFAs and PFAs with long shortest synchronizing word length. In Developments in language theory (É. Charlier, J. Leroy, and M. Rigo, eds.). Proceedings of the 21st International Conference (DLT 2017) held in Liège, August 7–11, 2017. Lecture Notes in Computer Science, 10396. Springer, Cham, 2017, 122–133. MR 3691074 Zbl 1410.68201 q.v. 545 [31] F. M. Dekking, The spectrum of dynamical systems arising from substitutions of constant length. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 41 (1977/78), no. 3, 221–239. MR 0461470 Zbl 0348.54034 q.v. 529 [32] H. Don and H. Zantema, Finding DFAs with maximal shortest synchronizing word length. In Language and automata theory and applications (F. Drewes, C. Martín-Vide, and B. Truthe, eds.). Proceedings of the 11th International Conference (LATA 2017) held in Umeå, March 6–9, 2017. Lecture Notes in Computer Science, 10168. Springer, Cham, 2017, 249–260. MR 3639810 Zbl 06725141 q.v. 545 [33] L. Dubuc, Sur le automates circulaires et la conjecture de Černý. RAIRO Inform. Théor. Appl. 32 (1998), no. 1–3, 21–34. MR 1657507 q.v. 542 [34] A. L. Dulmage and N. S. Mendelsohn, Gaps in the exponent set of primitive matrices. Illinois J. Math. 8 (1964), 642–656. MR 0181645 Zbl 0181645 q.v. 555 [35] M. Dżyga, R. Ferens, V. V. Gusev, and M. Szykuła, Attainable values of reset thresholds. In 42 nd International Symposium on Mathematical Foundations of Computer Science (K. G. Larsen, H. L. Bodlaender, and J.-F. Raskin, eds.). Proceedings of the symposium (MFCS 2017) held in Aalborg, August 21–25, 2017. LIPIcs. Leibniz International Proceedings in Informatics, 83. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2017, Art. no. 40, 14 pp. MR 3755333 q.v. 546 [36] D. Eppstein, Reset sequences for monotonic automata. SIAM J. Comput. 19 (1990), no. 3, 500–510. MR 1041543 Zbl 0698.68058 q.v. 529, 531, 532, 535 [37] M. A. Fischler and M. Tannenbaum, Synchronizing and representation problems for sequential machines with masked outputs. In Proceedings of the 11 th Annual Symposium on Switching and Automata Theory. SWAT ’70. October 28–30, 1970. IEEE Press, Los Alamitos, CA, 1970, 97–103. IEEEXplore 4569639 q.v. 535, 538 [38] P. Frankl, An extremal problem for two families of sets. European J. Combin. 3 (1982), no. 2, 125–127. MR 0670845 Zbl 0488.05004 q.v. 538 [39] D. Frettlöh and B. Sing, Computing modular coincidences for substitution tilings and point sets. Discrete Comput. Geom. 37 (2007), no. 3, 381–407. MR 2301525 Zbl 1112.37008 q.v. 535 [40] J. Friedman, On the road coloring problem. Proc. Amer. Math. Soc. 110 (1990), no. 4, 1133–1135. MR 0953004 Zbl 0745.05031 q.v. 548 [41] M. R. Garey and D. S. Johnson, Computers and intractability. A guide to the theory of NP-completeness. A Series of Books in the Mathematical Sciences. W. H. Freeman and Co., San Francisco, CA, 1979. MR 0519066 Zbl 0411.68039 q.v. 532


[42] P. Gawrychowski, Complexity of shortest synchronizing word. Private communiction, 2008. q.v. 534 [43] P. Gawrychowski and D. Straszak, Strong inapproximability of the shortest reset word. In Mathematical foundations of computer science 2015. (G. F. Italiano, G. Pighizzini, and D. Sannella, eds.). Part I. Proceedings of the 40 th International Symposium (MFCS 2015) held in Milan, August 24–28, 2015. Lecture Notes in Computer Science, 9234. Springer, Berlin, 2015, 243–255. MR 3419431 Zbl 06482739 q.v. 534 [44] M. Gerbush and B. Heeringa, Approximating minimum reset sequences. In Implementation and application of automata (M. Domaratzki and K. Salomaa, eds.). Revised selected papers from the 15th International Conference (CIAA 2010) held at the University of Manitoba, Winnipeg, MB, August 12–15, 2010. Lecture Notes in Computer Science, 6482. Springer, Berlin, 2011, 154–162. MR 2776287 Zbl 1297.68131 q.v. 534 [45] A. Gill, State-identification experiments in finite automata. Information and Control 4 (1961), 132–154. MR 0136025 Zbl 0129.26203 q.v. 526 [46] S. Ginsburg, On the length of the smallest uniform experiment which distinguishes the terminal states of a machine. J. Assoc. Comput. Mach. 5 (1958), 266–280. MR 0120125 Zbl 0088.34406 q.v. 526 [47] K. Goldberg, Orienting polygonal parts without sensors. Algorithmica 10 (1993), no. 2–4, 201–225. MR 1231364 Zbl 0777.68104 q.v. 529 [48] F. Gonze, V. V. Gusev, B. Gerencsér, R. M. Jungers, and M. V. Volkov, On the interplay between Babai and Černý’s conjectures. Internat. J. Found. Comput. Sci. 30 (2019), no. 1, 93–114. MR 3919503 Zbl 1415.68129 q.v. 544 [49] F. Gonze, R. M. Jungers, and A. Trakhtman, A note on a recent attempt to improve the Pin–Frankl bound. Discrete Math. Theor. Comput. Sci. 17 (2015), no. 1, 307–308. MR 3342342 Zbl 1314.68177 q.v. 541 [50] P. Goralčík and V. Koubek, Rank problems for composite transformations. Internat. J. Algebra Comput. 5 (1995), no. 3, 309–316. MR 1331737 Zbl 0831.20089 q.v. 532 [51] C. Güniçen, E. Erdem, and H. Yenigün, Generating shortest synchronizing sequences using answer set programming. Preprint, 2013. arXiv:1312.6146 [cs.AI] q.v. 535 [52] V. V. Gusev, R. M. Jungers, and D. Průša, Dynamics of the independence number and automata synchronization. In Developments in language theory (M. Hoshi and S. Seki, eds.). Proceedings of the 22nd International Conference (DLT 2018) held in Tokyo, September 10–14, 2018. Lecture Notes in Computer Science, 11088. Springer, Cham, 2018, 379–391. MR 3855957 Zbl 06983393 q.v. 540 [53] V. V. Gusev and M. Szykuła, On the number of synchronizing colorings of digraphs. In Implementation and application of automata (F. Drewes, ed.). Proceedings of the 20 th International Conference (CIAA 2015) held in Umeå, August 18–21, 2015. Lecture Notes in Computer Science, 9223. Springer, Cham, 2015, 127–139. MR 3447348 Zbl 06484736 q.v. 556 [54] N. Jonoska and S. Suen, Monocyclic decomposition of graphs and the road coloring problem. Congr. Numer. 110 (1995), 201–209. MR 1369333 Zbl 0905.05066 q.v. 548 [55] S. Karahoda, O. T. Erenay, K. Kaya, U. C. Türker, and H. Yenigün, Parallelizing heuristics for generating synchronizing sequences. In Testing Software and Systems (F. Wotawa, M. Nica, and N. Kushik, eds.). Proceedings of the 28th IFIP WG 6.1 International Conference, ICTSS 2016, held in Graz, Austria, October 17–19, 2016, Lecture Notes in Computer Science, 9976. Springer, Cham, 2016, 106–122. q.v. 535


[56] S. Karahoda, K. Kaya, and H. Yenigün, Synchronizing heuristics: Speeding up the fastest. Expert Syst. Appl. 94 (2018), 265–275. q.v. 535 [57] J. Kari, A counter example to a conjecture concerning synchronizing words in finite automata. Bull. Eur. Assoc. Theor. Comput. Sci. 73 (2001), 146. MR 1835978 Zbl 0977.68055 q.v. 544 [58] J. Kari, Synchronization and stability of finite automata. J. UCS 8 (2002), no. 2, 270–277. MR 1895803 Zbl 1258.68084 q.v. 548 [59] J. Kari, Synchronizing finite automata on Eulerian digraphs. Theoret. Comput. Sci. 295 (2003), no. 1–3, 223–232. Mathematical foundations of computer science (Mariánské Lázně, 2001). MR 1964667 Zbl 1045.68082 q.v. 542, 548 [60] A. Kisielewicz, J. Kowalski, and M. Szykuła, Computing the shortest reset words of synchronizing automata. J. Comb. Optim. 29 (2015), no. 1, 88–124. MR 3296258 Zbl 1331.68136 q.v. 532, 546 [61] A. Kisielewicz, J. Kowalski, and M. Szykuła, Experiments with synchronizing automata. In Implementation and application of automata (Y.-S. Han and K. Salomaa, eds.). Proceedings of the 21st International Conference (CIAA 2016) held in Seoul, July 19–22, 2016. Lecture Notes in Computer Science, 9705. Springer, Cham, 2016, 176–188. MR 3537538 Zbl 06650032 q.v. 545, 546 [62] A. Kisielewicz and M. Szykuła, Generating small automata and the Černý conjecture. In Implementation and application of automata (S. Konstantinidis, ed.). Proceedings of the 18th International Conference (CIAA 2013) held at Saint Mary’s University, Halifax, NS, July 16–19, 2013. Lecture Notes in Computer Science, 7982. Springer, Berlin, 2013, 340–348. MR 3111215 Zbl 1298.68143 q.v. 545, 546 [63] A. Kisielewicz and M. Szykuła, Synchronizing automata with extremal properties. In Mathematical foundations of computer science 2015. (G. F. Italiano, G. Pighizzini, and D. Sannella, eds.). Part I. Proceedings of the 40 th International Symposium (MFCS 2015) held in Milan, August 24–28, 2015. Lecture Notes in Computer Science, 9234. Springer, Berlin, 2015, 331–343. MR 3419438 Zbl 06482746 q.v. 544 [64] A. A. Klyachko, I. K. Rystsov, and M. A. Spivak, An extremal combinatorial problem associated with the bound of the length of a synchronizing word in an automaton. Kibernetika (Kiev) 1987, no. 2, 16–20, 25, 132. In Russian. English translation, Cybernetics 23 (1987), no. 2, 165–171. MR 0897921 Zbl 0691.05025 q.v. 538 [65] Z. Kohavi and J. Winograd, Bounds on the length of synchronizing sequences and the order of information losslessness. In Theory of machines and computations (Z. Kohavi and A. Paz, eds.). Proceedings of an International Symposium on the Theory of Machines and Computations, held at Technion in Haifa, Israel, August 16–19, 1971. Academic Press, New York and London, 1971, 288–299. MR 0319676 q.v. 538 [66] Z. Kohavi and J. Winograd, Establishing certain bounds concerning finite automata. J. Comput. System Sci. 7 (1973), 288–299. MR 0319676 Zbl 0283.68047 q.v. 538 [67] J. Kowalski and A. Roman, A new evolutionary algorithm for synchronization. In Applications of evolutionary computation (G. Squillero and K. Sim, eds.). Part I. Proceedings of the 20 th European Conference, EvoApplications 2017, held in Amsterdam, April 19–21, 2017. Lecture Notes in Computer Sciences, 10199. Springer, Cham, 2017, 620–635. q.v. 535 [68] A. E. Laemmel, A general class of discrete codes and certain of their properties. Research Report R-459-55, PIB-389. Microwave Research Institute. Polytechnic Institute of Brooklyn, N.Y., 1956. q.v. 526, 530


[69] A. E. Laemmel, Study on application of coding theory. Technical Report PIBMRI-895.563. Department of Electrophysics. Microwave Research Institute. Polytechnic Institute of Brooklyn, N.Y., 1963. q.v. 526, 530, 548 [70] A. E. Laemmel and B. Rudner, Study of the application of coding theory. Technical Report PIBEP-69-034. Department of Electrophysics. Polytechnic Institute of Brooklyn, N.Y., 1969. q.v. 535 [71] C. L. Liu, Some memory aspects of finite automata. Technical Report 411, Research Laburatory in Electronics. Massachusetts Institute of Technology, Cambridge, MA, 1963. q.v. 526, 531 [72] E. F. Moore, Gedanken-experiments on sequential machines. In Automata studies (C. E. Shannon and J. McCarthy, eds.). Annals of Mathematics Studies, 34. Princeton University Press, Princeton, N.Y., 1956, 129–153. q.v. 526 [73] B. K. Natarajan, An algorithmic approach to the automated design of parts orienters. In Proceedings of the 27 th Annual Symposium on Foundations of Computer Science. SFCS ’86. October 27–29, 1986. IEEE Press, Los Alamitos, CA, 132–142. IEEEXplore 4568204 q.v. 528 [74] B. K. Natarajan, Some paradigms for the automated design of parts feeders. Internat. J. Robotics Research 8 (1989), no. 6, 89–109. q.v. 528 [75] C. Nicaud, The Černý Conjecture holds with high probability. J. Autom. Lang. Comb. 24 (2019), no. 2–4, 343–365. MR 4023067 Zbl 1429.68131 q.v. 546, 547 [76] G. L. O’Brien, The road-colouring problem. Israel J. Math. 39 (1981), no. 1–2, 145–154. MR 0617297 Zbl 0471.05033 q.v. 548 [77] J. Olschewski and M. Ummels, The complexity of finding reset words in finite automata In Mathematical foundations of computer science 2010 (P. Hliněný and A. Kučera, eds.). Proceedings of the 35th International Symposium (MFCS 2010) held at Masaryk University, Brno, August 23–27, 2010. Lecture Notes in Computer Science, 6281. Springer, Berlin, 2010, 568–579. MR 2727259 Zbl 1287.68099 q.v. 534 [78] C. H. Papadimitriou, Computational complexity. Addison-Wesley Publishing Company, Reading, MA, 1994. MR 1251285 Zbl 0833.68049 q.v. 532 [79] C. H. Papadimitriou and M. Yannakakis, The complexity of facets (and some facets of complexity). J. Comput. System Sci. 28 (1984), no. 2, 244–259. MR 0760546 Zbl 0571.68028 q.v. 534 [80] D. Perrin and M. P. Schützenberger, Synchronizing prefix codes and automata and the road coloring problem. In Symbolic dynamics and its applications (P. Walters, ed.). Proceedings of the AMS Conference held at Yale University, New Haven, Connecticut, July 28–August 2, 1991. Contemporary Mathematics, 135. American Mathematical Society, Providence, R.I., 1992, 295–318. MR 1185096 Zbl 0787.68073 q.v. 548 [81] J.-É. Pin, Le problème de la synchronisation et la conjecture de Černý. Thèse de 3ème cycle. Université Paris VI, Paris, 1978. q.v. 544 [82] J.-É. Pin, Sur un cas particulier de la conjecture de Černý. In Automata, languages and programming (G. Ausiello and C. Böhm, eds.). Proceedings of the 5th International Colloquium held at Udine, July 17–21, 1978. Lecture Notes in Computer Science, 62. Springer, Berlin, 1978, 345–352. MR 0520853 Zbl 0389.68036 q.v. 542 [83] J.-É. Pin, On two combinatorial problems arising from automata theory. In Combinatorial mathematics (C. Berge, D. Bresson, P. Camion, J.-F. Maurras and F. Sterboul, eds.). Proceedings of the international colloquium on graph theory and combinatorics. Held at the University of Marseille-Luminy, Marseille-Luminy, June 14–19, 1981. North-Holland


Mathematics Studies, 75. Annals of Discrete Mathematics, 17. North-Holland Publishing Co., Amsterdam, 1983, 535–548. MR 0841339 Zbl 0523.68042 q.v. 538
[84] C. Pixley, S.-W. Jeong, and G. D. Hachtel, Exact calculation of synchronization sequences based on binary decision diagrams. In Proceedings of the 29th ACM/IEEE Design Automation Conference. Held in Anaheim, CA, June 8–12, 1992. IEEE Press, Los Alamitos, CA, 620–623. IEEEXplore 227811 q.v. 535
[85] N. Pytheas Fogg, Substitutions in dynamics, arithmetics and combinatorics (V. Berthé, S. Ferenczi, C. Mauduit and A. Siegel, eds.). Lecture Notes in Mathematics, 1794. Springer, Berlin, 2002. MR 1970385 Zbl 1014.11015 q.v. 529
[86] I. T. Podolak, A. Roman, and D. Jędrzejczyk, Application of hierarchical classifier to minimal synchronizing word problem. In Artificial intelligence and soft computing (L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L. A. Zadeh, and J. M. Zurada, eds.). Part I. Proceedings of the 11th International Conference (ICAISC 2012) held in Zakopane, Poland, April 29–May 3, 2012. Lecture Notes in Computer Science, 7267. Springer, Berlin, 2012, 421–429. q.v. 535
[87] I. T. Podolak, A. Roman, M. Szykuła, and B. Zieliński, A machine learning approach to synchronization of automata. Expert Syst. Appl. 97 (2018), 357–371. q.v. 535
[88] M. O. Rabin and D. Scott, Finite automata and their decision problems. IBM J. Res. Develop. 3 (1959), 114–125. MR 0103795 Zbl 0158.25404 q.v. 530
[89] J. L. Ramírez Alfonsín, The diophantine Frobenius problem. Oxford Lecture Series in Mathematics and its Applications, 30. Oxford University Press, Oxford, 2005. MR 2260521 Zbl 1134.11012 q.v. 536
[90] J.-K. Rho, F. Somenzi, and C. Pixley, Minimum length synchronizing sequences of finite state machine. In Proceedings of the 30th ACM/IEEE Design Automation Conference. Held in Dallas, TX, June 14–18, 1993. IEEE Press, Los Alamitos, CA, 463–468. IEEEXplore 1600266 q.v. 532
[91] A. Roman, Genetic algorithm for synchronization. In Language and automata theory and applications (A. Dediu, A. Ionescu, and C. Martín-Vide, eds.). Third International Conference (LATA 2009) held in Tarragona, Spain, April 2009. Lecture Notes in Computer Science, 5457. Springer, Berlin, 2009, 684–695. MR 2544456 Zbl 1234.68237 q.v. 535
[92] A. Roman, Synchronizing finite automata with short reset words. Appl. Math. Comput. 209 (2009), no. 1, 125–136. MR 2493291 Zbl 1163.68024 q.v. 535
[93] A. Roman, The NP-completeness of the road coloring problem. Inform. Process. Lett. 111 (2011), no. 7, 342–347. MR 2790163 Zbl 1260.68162 q.v. 556
[94] A. Roman and M. Drewienkowski, A complete solution to the complexity of synchronizing road coloring for non-binary alphabets. Inform. and Comput. 242 (2015), 383–393. MR 3351005 Zbl 1316.68059 q.v. 556
[95] A. Roman and M. Szykuła, Forward and backward synchronizing algorithms. Expert Syst. Appl. 42 (2015), no. 24, 9512–9527. q.v. 535
[96] I. K. Rystsov, О минимизации синхронизирующих слов для конечных автоматов (On minimizing the length of synchronizing words for finite automata). In Теоретические вопросы проектирования вычислительных систем (Theory of designing of computing systems). Institute of Cybernetics of the Ukrainian Acad. Sci., 1980, 75–82, 101–102. In Russian. MR 0598668 q.v. 532
[97] I. K. Rystsov, Almost optimal bound of recurrent word length for regular automata. Kibernet. Sistem. Anal. 1995, no. 5, 40–48, 187. In Russian. English translation, Cybernet. Systems Anal. 31 (1995), no. 5, 669–674. MR 1374728 Zbl 0856.68104 q.v. 544


[98] I. K. Rystsov, Quasioptimal bound for the length of reset words for regular automata. Acta Cybernet. 12 (1995), no. 2, 145–152. MR 1370976 Zbl 0844.68085 q.v. 544 [99] I. K. Rystsov, Reset words for automata with simple idempotents. Kibernet. Sistem. Anal. 2000, no. 3, 32–39, 187. In Russian. English translation, Cybernet. Systems Anal. 36 (2000), no. 3, 339–344. MR 1839243 Zbl 0999.68115 q.v. 544 [100] A. Salomaa, Composition sequences for functions over a finite domain. Theoret. Comput. Sci. 292 (2003), no. 1, 263–281. Selected papers in honor of J. Berstel. MR 1964638 q.v. 532 [101] W. Samotij, A note on the complexity of the problem of finding shortest synchronizing words. In Proc. AutoMathA 2007, Automata: from Mathematics to Applications. Università di Palermo, Palermo, 2007. CD. q.v. 532, 533 [102] S. Sandberg, Homing and synchronizing sequences. In Model-based testing of reactive systems. (M. Broy, B. Jonsson, J.-P. Katoen, M. Leucker, and A. Pretschner, eds.). Advanced lectures. Papers from the Research Seminar held in Schloss Dagstuhl, January 2004. Lecture Notes in Computer Science, 3472. Springer, Berlin, 2005, 5–33. q.v. 527, 531 [103] M.-P. Schützenberger, On an application of semi groups methods to some problems in coding. IEEE Trans. Inform. Theory 2 (1956), no. 3, 47–60. IEEEXplore 1056809 q.v. 527 [104] A. L. Selman, A taxonomy of complexity classes of functions. J. Comput. System Sci. 48 (1994), no. 2, 357–381. MR 1275039 Zbl 0806.68049 q.v. 534 [105] E. Skvortsov and E. Tipikin, Experimental study of the shortest reset word of random automata. In Implementation and application of automata (B. Bouchou-Markhoff, P. Caron, J.-M. Champarnaud, and D. Maurel, eds.). Revised selected papers from the 16th International Conference (CIAA 11) held at the Université Francois Rabelais Tours, Blois, July 13–16, 2011. Lecture Notes in Computer Science, 6807. Springer, Berlin, 2011, 290–298. MR 2862922 Zbl 1297.68167 q.v. 535, 546 [106] E. Skvortsov and Y. Zaks, Synchronizing random automata. Discrete Math. Theor. Comput. Sci. 12 (2010), no. 4, 95–108. MR 2767895 Zbl 1286.68295 q.v. 546 [107] P. H. Starke, Eine Bemerkung über homogene Experimente. Elektron. Inform.-verarb. Kybernetik 2 (1966), 257–259. English translation, A remark about homogeneous experiments. J. Autom. Lang. Comb. 24 (2019), no. 2–4, 133–137. Translated from the German by M. Holzer and B. Truthe. MR 4023057 (translation) Zbl 0166.27003 Zbl 1429.68134 (translation) q.v. 537 [108] B. Steinberg, The averaging trick and the Černý conjecture. In Developments in language theory (Y. Gao, H. Lu, S. Seki, and S. Yu, eds.). Proceedings of the 14th International Conference (DLT 2010) held at the University of Western Ontario, London, ON, August 17–20, 2010. Lecture Notes in Computer Science, 6224. Springer, Berlin, 2010, 423–431. MR 2725663 Zbl 1250.68180 q.v. 544 [109] B. Steinberg, The Černý conjecture for one-cluster automata with prime length cycle. Theoret. Comput. Sci. 412 (2011), no. 39, 5487–5491. MR 2857694 Zbl 1243.68204 q.v. 544, 556 [110] M. Szykuła, Algorithms for synchronizing automata. Ph.D. thesis, Institute of Computer Science. University of Wrocław, Wrocław, 2014. q.v. 545, 546 [111] M. Szykuła, Improving the upper bound on the length of the shortest reset word. In 35 th International Symposium on Theoretical Aspects of Computer Science (R. Niedermeier and B. Vallée, eds.) Proceedings of the symposium (STACS 2018) held in Caen, France,


February 28–March 3, 2018. Leibniz International Proceedings in Informatics, 96. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2018, Art. no. 56, 13 pp. q.v. 541
[112] M. Szykuła and V. Vorel, An extremal series of Eulerian synchronizing automata. In Developments in language theory (S. Brlek and C. Reutenauer, eds.). Proceedings of the 20th International Conference (DLT 2016) held in Montréal, QC, July 25–28, 2016. Lecture Notes in Computer Science, 9840. Springer, Berlin, 2016, 380–392. MR 3558116 Zbl 1362.68156 q.v. 544
[113] A. Trahtman, An efficient algorithm finds noticeable trends and examples concerning the Černý conjecture. In Mathematical foundations of computer science 2006 (R. Královič and P. Urzyczyn, eds.). Proceedings of the 31st International Symposium (MFCS 2006) held in Stará Lesná, August 28–September 1, 2006. Lecture Notes in Computer Science, 4162. Springer, Berlin, 2006, 789–800. MR 2298228 Zbl 1132.68463 q.v. 532, 535, 545
[114] A. Trahtman, The road coloring problem. Israel J. Math. 172 (2009), 51–60. MR 2534238 Zbl 1175.05058 q.v. 548, 551
[115] A. Trahtman, Modifying the upper bound on the length of minimal synchronizing word. In Fundamentals of computation theory (O. Owe, M. Steffen, and J. Telle, eds.). Proceedings of the 18th International Symposium (FCT 2011) held in Oslo, August 22–25, 2011. Lecture Notes in Computer Science, 6914. Springer, Berlin, 2011, 173–180. MR 2886904 Zbl 1342.68188 q.v. 541
[116] M. V. Volkov, Synchronizing automata and the Černý conjecture. In Language and automata theory and applications (C. Martín-Vide, F. Otto, and H. Fernau, eds.). Revised papers from the 2nd International Conference (LATA 2008) held in Tarragona, March 13–19, 2008. Lecture Notes in Computer Science, 5196. Springer, Berlin, 2008, 11–27. MR 2540309 Zbl 1156.68466 q.v. 529
[117] V. Vorel and A. Roman, Complexity of road coloring with prescribed reset words. In Language and automata theory and applications (A. Dediu, E. Formenti, C. Martín-Vide, and B. Truthe, eds.). Proceedings of the 9th International Conference (LATA 2015) held in Nice, March 2–6, 2015. Lecture Notes in Computer Science, 8977. Springer, Cham, 2015, 161–172. MR 3344800 Zbl 06566775 q.v. 556
[118] V. Vorel and A. Roman, Parameterized complexity of synchronization and road coloring. Discrete Math. Theor. Comput. Sci. 17 (2015), no. 1, 283–305. MR 3342341 Zbl 1326.68178 q.v. 556
[119] H. Wielandt, Unzerlegbare, nicht negative Matrizen. Math. Z. 52 (1950), 642–648. MR 0035265 Zbl 0035.29101 q.v. 535, 555
[120] Y. Zaks and E. Skvortsov, Synchronizing random automata on a 4-letter alphabet. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 402 (2012), Kombinatorika i Teoriya Grafov. IV, 83–90, 219. In Russian. English translation, J. Math. Sci. (N.Y.) 192 (2013), no. 3, 303–306. MR 2981980 Zbl 1276.68099 q.v. 546

Part III

Algebraic and topological theory of automata

Chapter 16

Varieties

Howard Straubing and Pascal Weil

Contents

1. Motivation and examples 569
2. Equations, identities and families of languages 576
3. Connections with logic 592
4. Operations on classes of languages 598
5. Varieties in other algebraic frameworks 607
References 609

1. Motivation and examples

We refer the readers to Chapter 1, and specifically to § 4.2 and § 4.3, for the notion of a language recognised by a morphism into a finite monoid, and for the definition of the syntactic monoid Synt(L) of a language L.

1.1. Idempotent and commutative monoids. When one begins the study of abstract algebra, groups are usually encountered before semigroups and monoids. The simplest example of a monoid that is not a group is the set {0, 1} with the usual multiplication. We denote this monoid U₁. What are the regular languages recognised by U₁? If A is a finite alphabet and φ: A* → U₁ is a morphism, then any language L ⊆ A* recognised by φ – that is, any set of the form φ⁻¹(X) where X ⊆ U₁ – has either the form B* or A* \ B*, where B ⊆ A. In particular, membership of a word w in L depends only on the set α(w) of letters occurring in w (see Example 4.9 in Chapter 1). The property “membership of w in L depends only on α(w)” is preserved under union and complement, and thus defines a Boolean algebra of regular languages. Of course, not every language in this Boolean algebra is recognised by U₁; for example, we could take L = a* ∪ b*. However, it follows from basic properties of the syntactic monoid that this Boolean algebra consists of precisely the languages recognised by finite direct products of copies of U₁. We have thus characterised a syntactic property of regular languages in terms of an algebraic property of its syntactic monoid. The family of finite monoids that divide a direct product of a finite number of copies of U₁ is itself closed under finite direct


The family of finite monoids that divide a direct product of a finite number of copies of U₁ is itself closed under finite direct products and division. Such a family of finite monoids is called a pseudovariety. This particular pseudovariety is often denoted J₁ in the literature (it is also written Sl because its elements are called semilattices).

1.1.1. Decidability and equational description. Thus if we want to decide whether a given language L ⊆ A* has this syntactic property, we can compute Synt(L) and try to determine whether Synt(L) ∈ J₁. But how do we do that? There are, after all, infinitely many monoids in J₁. We can, however, bound the size of the search space in terms of |A|. It is not hard to prove that if M is a finite monoid and

    φ: A* → M × ⋯ × M   (r factors)

is a morphism, then N = φ(A*) embeds into

    M × ⋯ × M   (s factors),

where s = |M|^|A|. This settles, in a not very satisfactory way, the question of deciding whether Synt(L) is in J₁: the resulting "decision procedure" – check all the divisors of the 2^|A|-fold direct power of U₁ and see if Synt(L) is isomorphic to any of them! – is of course ridiculously impractical. Fortunately, there is a better approach: U₁ is both commutative and idempotent (i.e., all its elements are idempotents). These two properties are preserved under direct products and division, and consequently shared by all members of J₁. That is, the idempotent and commutative monoids form a pseudovariety that contains J₁. Conversely, every idempotent and commutative finite monoid belongs to J₁. To see this, we make note of a fact that will play a large role in this chapter: if M is a finite monoid and φ: A* → M an onto morphism, then

    M ≺ ∏_{m∈M} Synt(φ⁻¹(m)).

In particular, every pseudovariety is generated by the syntactic monoids it contains. We now observe that if α(w₁) = α(w₂), and if φ: A* → M is a morphism onto an idempotent and commutative monoid, then φ(w₁) = φ(w₂), since we can permute letters and eliminate duplications in any word w without changing its value under φ. Thus each φ⁻¹(m) satisfies our syntactic property, and so by the remark just made, M ∈ J₁. We can express "M is idempotent and commutative" by saying that M satisfies the identities xy = yx and x² = x. This means that these equations hold no matter how we substitute elements of M for the variables x and y. This equational characterisation of J₁ provides a much more satisfactory procedure for determining if a monoid M belongs to J₁: if M is given by its multiplication table, then we can verify the identities in time polynomial in |M|.
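The polynomial-time verification mentioned above is a direct scan of the multiplication table. The following Python sketch (our own table encoding, with elements numbered 0..n−1) checks the identities xy = yx and x² = x in time quadratic in |M|.

    # Check whether a finite monoid, given by its multiplication table, is
    # idempotent and commutative, i.e. satisfies x*x = x and x*y = y*x.

    def is_idempotent_and_commutative(table):
        n = len(table)
        for x in range(n):
            if table[x][x] != x:                 # the identity x^2 = x fails
                return False
            for y in range(n):
                if table[x][y] != table[y][x]:   # the identity xy = yx fails
                    return False
        return True

    # U1 = {1, 0} under multiplication, encoded with the identity element as 0.
    U1 = [[0, 1],
          [1, 1]]
    assert is_idempotent_and_commutative(U1)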


1.1.2. Connection to logic. Before leaving this example, we note a connection with formal logic. We express properties of words over A by sentences of first-order logic in which variables denote positions in a word. For each a ∈ A, our logic contains a unary predicate Q_a, where Q_a x is interpreted to mean "the letter in position x is a." We allow only these formulas Q_a x as atomic formulas – in particular, we do not include equality as a predicate. A sentence in this logic, for example (with A = {a, b, c})

    ∃x ∃y ∀z (Q_a x ∧ Q_b y ∧ ¬Q_c z)

defines a language over A*, in this case the set of all words containing both a and b, but with no occurrence of c. It is easy to see that the languages definable in this logic are exactly those in which membership of a word w depends only on α(w). The following theorem summarises the results of this subsection.

Theorem 1.1. Let A be a finite alphabet and let L ⊆ A* be a regular language. The following are equivalent:
1. membership of w in L depends only on the set α(w) of letters appearing in w;
2. Synt(L) ∈ J₁, that is, Synt(L) divides a finite direct product of copies of U₁;
3. Synt(L) satisfies the identities xy = yx and x² = x;
4. L is definable by a first-order sentence over the predicates Q_a, a ∈ A.

1.2. Piecewise-testable languages. Suppose that instead of testing for occurrences of individual letters in a word, we test for occurrences of non-contiguous sequences of letters, or subwords. More precisely, we say that v D a1    ak , where each ai 2 A, is a subword of w 2 A if w D w0 a1 w1    ak wk  for some w0 ; : : : ; wk 2 A . We also say that the empty word 1 is a subword of every word in A . The set of all words in A that contain v as a subword is thus the regular language Lv D A a1 A    ak A : We say that a language is piecewise-testable if it belongs to the Boolean algebra generated by the Lv . 1.2.1. Decidability and equational description. It is not clear that we can effectively decide whether a given regular language is piecewise testable. For the language class of § 1.1, we were able to settle this question by in effect observing that for every finite alphabet A there were only finitely many languages of the class in A . For piecewisetestable languages, this is no longer the case. It is possible, however, to obtain an algebraic characterisation of the piecewise-testable languages, and this leads to a fairly efficient decision procedure. We first note two relatively easy-to-prove facts. First, the monoids Synt.Lv / are all J-trivial: this means that if m; m0 ; s; t; s 0 ; t 0 2 Synt.Lv / are such that m D s 0 m0 t 0 , m0 D smt , then m D m0 . Second, the family J of J-trivial monoids forms a pseudovariety. It follows then that the syntactic monoid of every piecewise-testable language is J-trivial. A deep theorem, due to I. Simon [63], shows that the converse is true as well: every language recognised by a finite J-trivial monoid is piecewise-testable.
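As a quick illustration (a Python sketch under our own conventions, not taken from the chapter), membership in L_v is just a greedy subsequence test: v is a subword of w exactly when the letters of v can be matched left to right inside w.

    # Membership in L_v = A* a1 A* ... ak A*: is v a (scattered) subword of w?

    def is_subword(v, w):
        it = iter(w)
        # each letter of v must be found, in order, somewhere further right in w
        return all(any(a == b for b in it) for a in v)

    assert is_subword("ac", "abc")        # a ... c occurs as a subsequence
    assert not is_subword("ca", "abc")    # order matters
    assert is_subword("", "abc")          # the empty word is a subword of every word

    # A piecewise-testable language is a Boolean combination of such tests, e.g.
    # "contains ab as a subword but not aa":
    L = lambda w: is_subword("ab", w) and not is_subword("aa", w)
    assert L("ab") and not L("aab")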


Clearly, we can effectively determine, from the multiplication table of a finite monoid M, all the pairs (m, m′) ∈ M × M such that m′ = smt for some s, t ∈ M, and thus determine if M ∈ J. This gives us an algebraic decision procedure for piecewise-testability. Can the pseudovariety J be defined by identities in the same manner as J₁? The short answer is no. This is because satisfaction of an identity u = v, where u and v are words over an alphabet {x, y, …} of variables, is preserved by infinite direct products as well as finite direct products and divisors. Now consider the monoids

    M_j = {1, m, m², …, m^j = m^{j+1}}.

Each M_j ∈ J, but ∏_{j≥1} M_j contains an isomorphic copy of the infinite cyclic monoid {1, a, a², …}, which has every finite cyclic group as a quotient. Thus every identity satisfied by all the monoids in J is also satisfied by all the finite cyclic groups, which are not in J. In spite of this, we can still obtain an equational description of J, provided we adopt an expanded notion of what constitutes an identity. If s is an element of a finite monoid M, then we denote by s^ω the unique idempotent power of s. We will allow identities in which the operation x ↦ x^ω is allowed to appear; these are special instances of what we will call profinite identities. It is not hard to see that satisfaction of these new identities is preserved under finite direct products and quotients, and thus every set of such identities defines a pseudovariety. For example, the profinite identity

    x^ω = x x^ω

is satisfied by precisely the finite monoids that contain no nontrivial groups. This is the pseudovariety of aperiodic monoids, which we denote Ap. Similarly, the profinite identity

    x^ω = 1

defines the pseudovariety G of finite groups. As was the case with J, neither of these pseudovarieties can be defined by a set of ordinary identities. It can be shown that the pseudovariety J of finite J-trivial monoids is defined by the pair of profinite identities

    (xy)^ω x = (xy)^ω,   y(xy)^ω = (xy)^ω,

or, alternatively, by the pair

    (xy)^ω = (yx)^ω,   x x^ω = x^ω.
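The algebraic decision procedure mentioned at the start of this subsection is easy to carry out on a multiplication table. The sketch below (Python, our own encoding of monoids as tables over elements 0..n−1) checks J-triviality by comparing two-sided ideals, exactly as described above.

    # Decide J-triviality from a multiplication table: M is J-trivial iff whenever
    # m and m' generate the same two-sided ideal (m' = s m t and m = s' m' t'),
    # they are equal.

    from itertools import product

    def two_sided_ideal(table, m):
        n = len(table)
        return {table[table[s][m]][t] for s, t in product(range(n), repeat=2)}

    def is_J_trivial(table):
        n = len(table)
        ideals = [two_sided_ideal(table, m) for m in range(n)]
        return all(m == m2
                   for m, m2 in product(range(n), repeat=2)
                   if m in ideals[m2] and m2 in ideals[m])

    assert is_J_trivial(U1)                 # U1 from the earlier sketch
    Z2 = [[0, 1], [1, 0]]                   # the two-element group is not J-trivial
    assert not is_J_trivial(Z2)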

1.2.2. Connection with logic. Let us supplement the first-order logic for words that we introduced earlier with atomic formulas of the form x < y, which is interpreted to mean "position x is strictly to the left of position y." The language L_v, where v = a₁⋯a_k, is defined by the sentence

    ∃x₁ ∃x₂ ⋯ ∃x_k (x₁ < x₂ ∧ x₂ < x₃ ∧ ⋯ ∧ x_{k−1} < x_k ∧ Q_{a₁} x₁ ∧ ⋯ ∧ Q_{a_k} x_k).


This is a Σ₁-sentence – one in which all the quantifiers are in a single block of existential quantifiers at the start of the sentence. It follows easily that a language is piecewise-testable if and only if it is defined by a Boolean combination of Σ₁-sentences. The following theorem summarises the results of this subsection.

Theorem 1.2. Let A be a finite alphabet and let L ⊆ A* be a regular language. The following are equivalent:
1. L is piecewise-testable;
2. Synt(L) ∈ J, that is, Synt(L) is J-trivial;
3. Synt(L) satisfies the identities (xy)^ω = (yx)^ω and x x^ω = x^ω;
4. Synt(L) satisfies the identities (xy)^ω x = (xy)^ω = y(xy)^ω;
5. L is definable by a Boolean combination of Σ₁-sentences over the predicates < and Q_a, a ∈ A.

1.3. Pseudovarieties of monoids and varieties of languages. We tentatively extract a few general principles from the preceding discussion. These will be explored at length in the subsequent sections. Given a pseudovariety V of finite monoids and a finite alphabet A, we form the family A*V of all regular languages L ⊆ A* for which Synt(L) ∈ V. We can think of V itself as an operator that associates with each finite alphabet A a family of regular languages over A. V is called a variety of languages. (We will give a very different, although equivalent definition of this term in our formal discussion in § 2.) From our earlier observation that pseudovarieties are generated by the syntactic monoids they contain, it follows that if V and W are distinct pseudovarieties, then the associated varieties of languages V and W are also distinct. Thus there is a one-to-one correspondence between varieties of languages and pseudovarieties of finite monoids. Often we are interested in the following sort of decision problem: given a regular language L ⊆ A*, does it belong to some predefined family V of regular languages, for example, the languages definable in some logic? If V forms a variety of languages, then we can answer the question if we have some effective criterion for determining if a given finite monoid belongs to the corresponding pseudovariety V. (The converse is true as well: if we could decide the question about membership in the variety of languages, we would be able to decide membership in V.) Pseudovarieties are precisely the families of finite monoids defined by sets of profinite identities. For the time being this assertion – a theorem due to Reiterman – will have to remain somewhat vague, since we haven't even come close to saying what a profinite identity actually is! Such equational characterisations of pseudovarieties are frequently the source of the decision procedures discussed above. If V is a variety of languages, then, as we have seen, each A*V is closed under Boolean operations. Observe further that if L ∈ A*V and v ∈ A*, then both of the quotient languages

    v⁻¹L = {w ∈ A* | vw ∈ L},   Lv⁻¹ = {w ∈ A* | wv ∈ L}


are in A*V, because any monoid recognising L also recognises the quotients. For the same reason, if φ: B* → A* is a morphism, then φ⁻¹(L) is in B*V. An important result, due to Eilenberg, showed that these closure properties characterise varieties of languages.

Theorem 1.3. Let V assign to each finite alphabet A a family A*V of regular languages in A*. V is a variety of languages if and only if the following three conditions hold:
1. each A*V is closed under Boolean operations;
2. if L ∈ A*V and w ∈ A*, then w⁻¹L ∈ A*V and Lw⁻¹ ∈ A*V;
3. if L ∈ A*V and φ: B* → A* is a morphism of finitely generated free monoids, then φ⁻¹(L) ∈ B*V.

This theorem can be quite useful for showing, in the absence of an explicit algebraic characterisation of the corresponding pseudovariety of monoids, that a combinatorially or logically defined family of languages forms a variety. We conclude from this that such an algebraic characterisation in principle exists. Although it is somewhat involved, the proof of Theorem 1.3 is quite elementary; see [20] and [44]. In the next section we will revisit the definition of varieties of languages and profinite identities in a way that will permit us to prove both Theorem 1.3 and Reiterman's theorem in a single argument. Before we proceed with this program, we briefly describe certain classes of regular languages which admit syntactic characterisations (that is, characterisations in terms of syntactic monoids and syntactic morphisms), but which are not varieties in the sense described above.

1.4. Extensions. Interesting classes of regular languages frequently admit characterisations in terms of their syntactic monoids and syntactic morphisms, and the theory sketched above is meant to provide a formal setting for this algebraic classification of regular languages. However, the framework is not adequate to capture all the examples of interest that arise. Here we give three examples. Consider, first, the family A*K₁ of languages L ⊆ A* for which membership of w in L is determined by the leftmost letter of w. This class forms a Boolean algebra closed under quotients, but is not a variety of languages. To see this, note that a(a + b)* ∈ {a, b}*K₁ and c*a(a + b + c)* ∉ {a, b, c}*K₁, even though the two languages have the same syntactic monoid. Alternatively, we can reason using Theorem 1.3, and note that the second language is an inverse homomorphic image of the first, and thus K₁ fails to be a variety of languages. More generally, we can define the family A*K_d of languages L for which membership of w in L depends only on the leftmost min(|w|, d) letters of w, as well as A*K = ⋃_{d>0} A*K_d. All these families are closed under Boolean operations and quotients, yet fail to be varieties of languages. We obtain an example with a similar flavor if we supplement the predicate logic described earlier by atomic formulas x ≡ 0 (mod q), where q > 1, which is interpreted to mean that position x is divisible by q. (We assume that positions in a word are numbered, beginning with 1 for the leftmost position.) We denote by A*QA the family of languages over A definable in this logic. Languages in A*QA arise as the regular languages definable in the circuit complexity class AC⁰ (see [11]). Each A*QA is a Boolean


algebra closed under quotients; however, QA is not a variety of languages. To see this, consider the morphism {a, b}* → {a}* that maps a to a and b to the empty word. The set {a^{2n} | n ≥ 0} is in {a}*QA, as it is defined by the sentence

    ∀x (∀y (y ≤ x) → x ≡ 0 (mod 2)).

However, the inverse image of this language under the morphism is the set of strings over {a, b} with an even number of occurrences of a, and it is possible to prove by model-theoretic means that this language is not definable in our logic. Finally, consider the family A*J⁺ of languages definable by Σ₁-sentences over the predicates < and Q_a with a ∈ A (in contrast to the languages definable by Boolean combinations of Σ₁-sentences, which we considered earlier). It is easy to see that if L ∈ A*J⁺ and w ∈ L, then L_w ⊆ L. This readily implies that A*J⁺ is not closed under complement, since, for example, the complement of (a + b)*a(a + b)* does not have this property. Thus J⁺ is not a variety of languages. On the other hand, it does satisfy many of the properties of varieties of languages: it is closed under finite unions and intersections, quotients, and inverse images of morphisms between free monoids. It turns out that each of these three examples admits an algebraic characterisation in terms of classes that are very much like pseudovarieties. For our first example, in which membership of a word in a language is determined by the leftmost letter, the correct generalisation of pseudovarieties was already known to Eilenberg: one looks not at the syntactic monoid of a language L, but at the image of the set A⁺ of nonempty words under the syntactic morphism. This is called the syntactic semigroup of L. We can define pseudovarieties of finite semigroups just as we defined pseudovarieties of finite monoids. Then L ∈ A*K₁ if and only if its syntactic semigroup belongs to the pseudovariety of semigroups defined by the identity xy = x. While K₁ is not closed under inverse images of morphisms between free monoids, it is closed if we restrict ourselves to non-erasing morphisms – those that map every letter to a nonempty word. We can use a similar method to characterise the class QA. Once again we look not just at the syntactic monoid of a language L, but at the additional structure provided by the syntactic morphism μ_L. It is known that L ∈ A*QA if and only if for every k > 0, μ_L(A^k) contains no nontrivial groups [11]. The family QA of morphisms from free monoids onto finite monoids with this property forms a kind of pseudovariety with respect to appropriately modified definitions of direct product and division. An equational characterisation of QA is provided by the identity

    (x^{ω−1} y)^ω = (x^{ω−1} y)^{ω+1},

where the identity is interpreted in the following sense: φ ∈ QA if and only if for all words u and v of the same length, x = φ(u) and y = φ(v) satisfy the identity. QA is closed under inverse images of morphisms f: B* → A* such that f(B) ⊆ A^k for some k > 0; these are called length-multiplying morphisms. In fact, these last two examples are instances of a single phenomenon: families of morphisms φ: A* → M onto finite monoids that form pseudovarieties with respect to some underlying composition-closed class C of morphisms between free monoids.


For the example J⁺ of Σ₁-definable languages, the algebraic characterisation involves a different generalisation of pseudovarieties. Here the additional structure on the syntactic monoid is provided by the embedding of μ_L(L) in Synt(L). If m₁, m₂ ∈ Synt(L), then we say m₁ ≤_L m₂ if

    {(s, t) ∈ Synt(L) × Synt(L) | s m₂ t ∈ μ_L(L)} ⊆ {(s, t) ∈ Synt(L) × Synt(L) | s m₁ t ∈ μ_L(L)}.

This gives a partial order on Synt(L) compatible with multiplication (see § 4.4 of Chapter 1). We then find that L ∈ A*J⁺ if and only if this ordered syntactic monoid satisfies the inequality x ≤ 1 for each element x. The family of partially ordered monoids satisfying this inequality is a pseudovariety of ordered finite monoids – it is closed under finite direct products, and order-compatible submonoids and quotients. The theory of pseudovarieties of ordered monoids and the corresponding positive varieties of languages is due to Pin [45]. In the next section we will formally develop the framework that gives the correspondence between pseudovarieties and language varieties, and the definition by profinite identities, in a very general setting. Pseudovarieties of finite monoids, as well as all the generalisations mentioned above, will appear as special cases.
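The syntactic order just defined is directly computable from a multiplication table. The following Python sketch (our own encoding: a table over elements 0..n−1, identity assumed to be element 0, and P the set of elements whose inverse image lies in L) computes ≤_L and tests the inequality x ≤ 1 that characterises J⁺.

    # Compute the syntactic order on a finite (syntactic) monoid:
    # m1 <= m2 iff every context (s, t) that puts m2 into P also puts m1 into P.

    from itertools import product

    def syntactic_order(table, P):
        n = len(table)
        def contexts(m):
            return {(s, t) for s, t in product(range(n), repeat=2)
                    if table[table[s][m]][t] in P}
        ctx = [contexts(m) for m in range(n)]
        # leq[m1][m2] is True when m1 <= m2 in the syntactic order
        return [[ctx[m2] <= ctx[m1] for m2 in range(n)] for m1 in range(n)]

    def satisfies_x_leq_1(table, P, identity=0):
        leq = syntactic_order(table, P)
        return all(leq[x][identity] for x in range(len(table)))

A language L then belongs to A*J⁺ exactly when `satisfies_x_leq_1` returns True on its syntactic monoid with P = μ_L(L), assuming the ordering convention of this chapter (x ≤ 1 for every x).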

2. Equations, identities and families of languages

The original statement of Eilenberg's theorem dealt exclusively with varieties of languages. Here we will show how to use a whole hierarchy of increasingly complex equational characterisations of increasingly structured families of languages. Before we describe these results, we need to give a quick introduction to the free profinite monoid and its connection to the theory of regular languages.

2.1. The free profinite monoid. Say that a finite monoid M separates two words u, v ∈ A* if there exists a morphism φ: A* → M such that φ(u) ≠ φ(v). Note that if u ≠ v, there always exists such a monoid. Indeed, for each n ≥ 1, consider the quotient monoid A*/A^{≥n}: it consists of the set of words of length less than n, plus a zero, and each product with length at least n (in A*) is equal to 0. Then A*/A^{≥n} separates u and v if n > max(|u|, |v|). We denote by r(u, v) the minimum cardinality of a monoid separating u and v. The profinite distance on A* is defined by letting d(u, v) = 2^{−r(u,v)} if u ≠ v, and d(u, u) = 0. One verifies easily that d is in fact an ultrametric distance (it satisfies the ultrametric inequality d(u, v) ≤ max(d(u, w), d(v, w)), stronger than the triangle inequality), and the above discussion shows that the resulting metric space is Hausdorff. The topology thus defined on A* is not especially interesting: we get a discrete space, where a sequence (u_n)_n converges to a word u if and only if (u_n)_n is ultimately equal to u. This can be verified using the monoids A*/A^{≥n} described above. There are, however, non-trivial Cauchy sequences. In fact, one can show the following.


Proposition 2.1. A sequence (u_n)_n is Cauchy if and only if, for each morphism φ: A* → M into a finite monoid, the sequence (φ(u_n))_n is ultimately constant.

For instance, if u is a word, then (u^{n!})_n is a Cauchy sequence (this can be deduced from the fact that its image under any morphism into a finite monoid is ultimately constant), but it is non-trivial if u ≠ 1. In topological terms, the uniform structure defined by the profinite distance is non-trivial. Using a classical construction from topology (analogous to the construction of the real numbers from the rationals), we can now consider the completion of (A*, d), denoted by Â*. It can be viewed as the quotient of the set of Cauchy sequences in (A*, d) by the relation identifying two sequences (u_n) and (v_n) if the mixed sequence, alternating the terms of (u_n) and (v_n), is Cauchy as well. In particular, A* is naturally seen as a dense subset of Â*. The following results can be verified by elementary means.

Proposition 2.2. Let A be an alphabet.
1. The multiplication operation (u, v) ↦ uv in A* is uniformly continuous.
2. Every morphism φ: A* → B* between free monoids, and every morphism ψ: A* → M from a free monoid to a finite monoid (equipped with the discrete distance), is uniformly continuous.
3. Â* is a compact space.

By a standard property of completions, it follows from Proposition 2.2 (1) that the multiplication of A* can be extended to Â*: the resulting monoid is called the free profinite monoid on A. Similarly, Proposition 2.2 (3) shows that each morphism φ: A* → B* between free monoids (resp. each morphism ψ: A* → M from a free monoid to a finite monoid) admits a uniquely defined continuous extension φ̂: Â* → B̂* (resp. ψ̂: Â* → M).

For example, consider the Cauchy sequence (u^{n!})_n, where u ∈ A*, which we discussed above. This represents an element of Â*, which we will denote u^ω. Observe that for any morphism φ from A* into a finite monoid, the sequence φ̂(u^{n!}) is ultimately constant and equal to the unique idempotent power of φ(u), so in the notation we introduced earlier we have, very conveniently,

    φ̂(u^ω) = (φ(u))^ω.

We can similarly define u^{ω−1} as the element of Â* represented by the Cauchy sequence (u^{n!−1})_n.
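The idempotent power that interprets u^ω in a finite monoid is straightforward to compute from a multiplication table. Here is a small Python sketch (our own helper, reusing the table conventions of the earlier sketches), which one can combine with any morphism to illustrate the identity φ̂(u^ω) = (φ(u))^ω.

    # s^omega: the unique idempotent power of an element s of a finite monoid.

    def omega_power(table, s):
        seen, x = [], s
        while x not in seen:          # iterate s, s^2, s^3, ... until a repetition
            seen.append(x)
            x = table[x][s]
        for y in seen:
            if table[y][y] == y:      # the powers of s contain exactly one idempotent
                return y
        raise ValueError("no idempotent power: not a finite monoid table")

    Z3 = [[(x + y) % 3 for y in range(3)] for x in range(3)]   # cyclic group of order 3
    assert omega_power(Z3, 1) == 0                             # g^omega is the identity
    assert omega_power(U1, 1) == 1                             # 0^omega = 0 in U1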

Finally, we note the strong connection between regular languages and free profinite monoids.

Proposition 2.3. Let A be an alphabet and let L ⊆ A*.
1. L is regular if and only if its topological closure L̄ in Â* is clopen (i.e., open and closed), if and only if L = K ∩ A* for some clopen set K ⊆ Â*.


2. If L is regular and u ∈ Â*, then the following are equivalent:
   i. u ∈ L̄;
   ii. φ̂(u) ∈ φ(L) for every morphism φ from A* to a finite monoid;
   iii. φ̂(u) ∈ φ(L) for every morphism φ from A* to a finite monoid recognising L;
   iv. μ̂(u) ∈ μ(L), where μ is the syntactic morphism of L.

2.2. Equations and lattices of languages. We begin our study of families of regular languages with the simplest such family: a lattice of languages over a fixed alphabet. In this chapter, we define a lattice of languages over an alphabet A to be a set of languages over A which is closed under finite union and finite intersection, and which contains A* and ∅ (respectively, the union and the intersection of an empty family of languages). A profinite equation on A is a pair (u, v) of elements of Â*, usually denoted by u → v. If u, v ∈ A*, the equation is called explicit. A language L ⊆ A* is said to satisfy the equation u → v, written L ⊨ u → v, if

    u ∈ L̄ ⟹ v ∈ L̄.

Remark 2.4. It is important to note that u, v and the words in L are all defined over the same alphabet A. In contrast to the identities we encountered in § 1, in this definition the letters occurring in u and v are not considered as variables, to be replaced by arbitrary elements. We will formally define identities in § 2.4.

The notion of equation is particularly relevant for regular languages. The following results directly from Proposition 2.3.

Proposition 2.5. Let L ⊆ A* be regular and let u, v ∈ Â*.

1. If u, v ∈ A*, then L ⊨ u → v if and only if u ∈ L ⟹ v ∈ L.
2. If μ is the syntactic morphism of L, then L ⊨ u → v if and only if μ̂(u) ∈ μ(L) ⟹ μ̂(v) ∈ μ(L).
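For explicit equations, item 1 reduces satisfaction to two membership tests. A minimal sketch (Python, assuming a DFA encoded as a transition dictionary, which is our own convention and not the chapter's):

    # Does the language of a DFA satisfy the explicit equation u -> v,
    # i.e. u in L  implies  v in L ?  (Proposition 2.5, item 1.)

    def accepts(dfa, word):
        state = dfa["initial"]
        for letter in word:
            state = dfa["delta"][state, letter]
        return state in dfa["final"]

    def satisfies_explicit_equation(dfa, u, v):
        return (not accepts(dfa, u)) or accepts(dfa, v)

    # Example: L = words over {a, b} containing the letter a.
    dfa = {
        "initial": 0,
        "final": {1},
        "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1},
    }
    assert satisfies_explicit_equation(dfa, "b", "ba")    # b not in L: nothing to check
    assert satisfies_explicit_equation(dfa, "a", "ab")    # a in L and ab in L
    assert not satisfies_explicit_equation(dfa, "ab", "b")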

Let E be a set of equations on A. We denote by ℒ(E) the set of regular languages in A* which satisfy all the equations in E. It is immediately verified that this set is closed under unions and intersections. Further, both ∅ and A* satisfy every equation. So ℒ(E) is a lattice. The main theorem of this section states that all lattices of regular languages arise this way.

Theorem 2.6. Let ℒ be a class of regular languages in A*. Then ℒ is a lattice if and only if there exists a set E of profinite equations on A such that ℒ = ℒ(E).

We have already seen that one direction of this equivalence holds: every set of the form ℒ(E) is a lattice. The proof of the converse is obtained after several steps. The first concerns the set of equations satisfied by a given language. If L ⊆ A*, let

    E_L = {(u, v) ∈ Â* × Â* | L ⊨ u → v}.

Lemma 2.7. If L is regular, then EL is clopen.


Proof. By definition of the satisfaction of equations,

    E_L = {(u, v) ∈ Â* × Â* | (u ∉ L̄) ∨ (v ∈ L̄)} = (L̄ᶜ × Â*) ∪ (Â* × L̄).

Lemma 2.7 follows from the fact that Â*, L̄ and L̄ᶜ are compact (since L is regular).

The proof of the next claim illustrates the crucial role played by the compactness of Â*. Let ℒ be a lattice of regular languages in A* and let E_ℒ = ⋂_{L∈ℒ} E_L.

Lemma 2.8. Let L be a regular language in ℒ(E_ℒ): that is, L satisfies all the profinite equations satisfied by all the elements of ℒ. Then there exists a finite subset 𝒦 of ℒ such that L ∈ ℒ(E_𝒦).

Proof. By Lemma 2.7, E_L and each E_K^c (K ∈ ℒ) are open sets. Moreover, if (u, v) does not belong to any of the E_K^c (K ∈ ℒ), then (u, v) belongs to each E_K, that is, every language in ℒ satisfies u → v. It follows that L satisfies u → v as well, that is, (u, v) ∈ E_L. Therefore E_L and the E_K^c (K ∈ ℒ) form an open cover of Â* × Â*. By compactness, there exists a finite subcollection 𝒦 of ℒ such that Â* × Â* is covered by E_L and the E_K^c, K ∈ 𝒦. It follows that E_L contains the complement of ⋃_{K∈𝒦} E_K^c, namely the intersection ⋂_{K∈𝒦} E_K. That is, L satisfies all the equations satisfied by the elements of 𝒦, which establishes the claim.

We are now ready to prove Theorem 2.6, by showing that if ℒ is a lattice of regular languages in A*, then ℒ = ℒ(E_ℒ). It is immediate by construction that ℒ is contained in ℒ(E_ℒ). Let us now consider a language L ∈ ℒ(E_ℒ). By Lemma 2.8, we have L ∈ ℒ(E_𝒦) for a finite subset 𝒦 of ℒ. For each u ∈ L, let K(u) be the intersection of the languages K ∈ 𝒦 containing u. Even though L may be infinite, K(u) takes only finitely many values since 𝒦 is finite. By definition of the K(u), we have L ⊆ ⋃_{u∈L} K(u), a finite union. Conversely, let v ∈ ⋃_{u∈L} K(u). Then there exists a word u ∈ L such that v belongs to every K ∈ 𝒦 containing u. That is, every K ∈ 𝒦 satisfies the equation u → v. In other words, u → v lies in E_𝒦, and hence L satisfies that equation. Since u ∈ L, it follows that v ∈ L. Thus L = ⋃_{u∈L} K(u) and hence L ∈ ℒ, which concludes the proof.

2.3. More classes of languages: from lattices to varieties. Here we explore how classes of regular languages that are more structured than lattices can be defined by more structured sets of equations. We start with an elementary lemma.

Lemma 2.9. Let ℒ be a lattice of regular languages satisfying the profinite equation u → v.
1. If ℒ is closed under complementation, then ℒ also satisfies v → u.
2. If ℒ is closed under quotients, then ℒ satisfies the equations xuy → xvy, for all x, y ∈ Â*.

Proof. It follows from the definition of equations that a language L satisfies u → v if and only if its complement satisfies v → u. The first part of the claim follows immediately.


It is also elementary that, if x, y ∈ A* and x⁻¹Ly⁻¹ ⊨ u → v, then L ⊨ xuy → xvy. Thus, if ℒ is closed under quotients, then ℒ satisfies all the equations xuy → xvy with x, y ∈ A*. This holds also if x, y ∈ Â*, since E_L is closed and A* is dense in Â*.

We now extend the notion of profinite equations as follows: if u, v ∈ Â*, we say that a language L satisfies the symmetrical equation u ↔ v if L satisfies both u → v and v → u. We also say that a language L satisfies the profinite inequality v ≤ u if it satisfies all the equations of the form xuy → xvy with x, y ∈ Â*, and it satisfies the profinite equality u = v if it satisfies both u ≤ v and v ≤ u. The verification of the following corollary is now elementary.

Corollary 2.10. Let ℒ be a set of regular languages in A*.
1. ℒ is a Boolean algebra if and only if ℒ = ℒ(E) for some set E of symmetrical profinite equations on A.
2. ℒ is a lattice closed under quotients if and only if ℒ = ℒ(E) for some set E of profinite inequalities on A.
3. ℒ is a Boolean algebra closed under quotients if and only if ℒ = ℒ(E) for some set E of profinite equalities on A.

2.4. Identities and varieties. We now come to the historically and mathematically important class of varieties. Varieties of languages were defined in § 1.3 but we will not use this definition here. In fact, in the course of this section, we will give an alternate, equivalent definition of varieties. An important difference between varieties and the lattices of languages over a fixed alphabet discussed so far in § 2 is that a variety V consists of a collection of lattices A*V, one for each finite alphabet A. More generally, we define a class of regular languages V to be an operator which assigns to each finite alphabet A a family A*V of regular languages in A*. First, we prove a technical lemma.

Lemma 2.11. Let φ: A* → B* be a morphism, L ⊆ B* and u, v ∈ Â*.
1. φ̂(u) ∈ L̄ if and only if u belongs to the closure of φ⁻¹(L).
2. L satisfies φ̂(u) → φ̂(v) if and only if φ⁻¹(L) satisfies u → v.

Proof. The first statement is trivial if u, v ∈ A*: indeed, φ and φ̂ coincide on words, and the intersection of L̄ (resp. of the closure of φ⁻¹(L)) with B* (resp. A*) is L (resp. φ⁻¹(L)). The extension to the case where u, v ∈ Â* is obtained by density. The second statement follows immediately from the first and the definition of profinite equations.

We extend the notion of profinite equations, this time to profinite identities, to permit the treatment of classes of regular languages instead of lattices of regular languages


over a fixed alphabet. Since there is no alphabet of reference anymore, we will usually denote by X the alphabet over which profinite identities are written. c Let C be a composition-closed class of morphisms between free monoids, u; v 2 X  and L  A , where X and A are finite, but possibly different alphabets. We say that L C-identically satisfies u ! v if, for each morphism 'W X  ! A in C, L satisfies '.u/ O ! '.v/ O . We say that a class of regular languages V C-identically satisfies an equation if A V does, for each finite alphabet A. The following statement is a direct application of Lemma 2.11. Corollary 2.12. Let V be a class of regular languages, let C be a family of morphisms between free monoids closed under composition, such that whenever 'W A ! B  is in C and L 2 B  V, then ' 1 .L/ 2 A V. c ), then V C-identically If X  V satisfies the profinite equation u ! v (with u; v 2 X satisfies u ! v . Using the notions introduced in § 2.3, we say that L satisfies the profinite C-identity u D v (resp. profinite ordered C-identity u 6 v ) if L C-identically satisfies u D v (resp. u 6 v ). If E is a set of profinite equations and for each finite alphabet A, A V is the set of regular languages in A which C-identically satisfy the elements of E , we say that the resulting class of regular languages V is C-defined by E . Let us now define (positive) C-varieties: a class V of regular languages is a positive C-variety (resp. a C-variety) of languages if each A V is a lattice (resp. a Boolean algebra) closed under quotients and if, for each 'W A ! B  in C and each L 2 B  V, we have ' 1 .L/ 2 A V. If C is the class of all morphisms between free monoids, we drop the prefix C and simply talk of (ordered) profinite identities and (positive) varieties of languages. Collecting Corollaries 2.10 and 2.12, we have the following characterisations.

Theorem 2.13. Let V be a class of regular languages and let C be a compositionclosed class of morphisms between free monoids. Then V is a positive C-variety (resp. a C-variety) if and only if V is C-defined by a set of profinite ordered C-identities (resp. profinite C-identities). Remark 2.14. In § 1.3, we gave a different definition of varieties of languages, and Theorem 1.3 stated that it was equivalent to the definition given above. We will prove this equivalence in § 2.5 below, thus formally reconciling the two definitions. 2.5. Eilenberg’s and Reiterman’s theorems. We note that (in)equalities can be interpreted in the (ordered) syntactic monoid of a language. Let L be a regular language c . By Proposition 2.5, if  is the syntactic morphism of L, then in A and let u; v 2 A L ` v 6 u if and only if .v/ O 6L .u/ O . Thus membership of a regular language L in a lattice of regular languages closed under quotients is characterised by properties of the syntactic morphism of L. We can also interpret identities in abstract finite ordered monoids – that is, finite c , monoids in which there is a partial order 6 compatible with multiplication: if u; v 2 X


we say that a finite ordered monoid M satisfies the profinite identity u 6 v if for every morphism 'W X  ! M we have '.u/ O D '.v/ O . Likewise a monoid M satisfies the profinite identity u D v if for each such ' we have '.u/ O D '.v/ O . We extend this notion further to C-satisfaction of identities. We call a morphism 'W A ! M , where M is finite and ' maps onto M , a stamp. We also define ordered stamps as morphisms from a free monoid A onto an ordered finite monoid. (Such morphisms are automatically order-preserving if we consider the trivial ordering on A in which w1 6 w2 if and only if w1 D w2 .) Let C be a class of morphisms between finitely generated free monoids that is closed under composition and that contains all the length-preserving morphisms. We say that the ordered stamp 'W A ! .M; 6/ C-satisfies the profinite c if and only if for all morphisms W X  ! A with identity u 6 v with u; v 2 X 2 C, we have 'O O .u/ 6 'O O .v/. We similarly define C-satisfaction of identities u D v by (not necessarily ordered) stamps. We have already defined pseudovarieties of finite monoids in § 1. We can extend this definition to define C-pseudovarieties of stamps. We call a collection V of stamps a C-pseudovariety if it satisfies the following two conditions: 1. if 'W A ! M is in V, W B  ! A is in C, and  is a morphism from Im.' / onto a finite monoid N , then ' W B  ! N is in V; 2. if 'i W A ! Mi are stamps in V for i D 1; 2, then their direct product '1  '2 W A ! Im.'1  '2 /  M1  M2 is in V.

If we restrict the morphisms occurring in these definitions to order-preserving morphisms or ordered monoids, we obtain the definition of ordered C-pseudovarieties of stamps. Ordinary pseudovarieties coincide with C-pseudovarieties in the case where C contains all morphisms between finitely-generated free monoids. We say that a class V of finite (ordered) monoids is defined by a set E of identities (written V D ŒŒE) if V consists of all the finite (ordered)monoids that satisfy all of the identities in E . Similarly, we say that a family V of stamps is C-defined by E (we write V D ŒŒEC ) if V consists of all the stamps that C-satisfy these identities. Further if V is a class of monoids or stamps, ordered or unordered, we define the corresponding class V of languages by setting L 2 A V if and only if Synt.L/ 2 V (if V is a class of monoids) or L 2 V (if V is a class of stamps). We write V 7! V to denote this correspondence. This leads us to a restatement of Eilenberg’s theorem, Theorem 1.3 above, as well as its generalisation to C-varieties, and allows us to prove it simultaneously with Reiterman’s theorem. Theorem 2.15. The following statements hold. 1. E i len b erg ’ s t h eo r em . If V is a pseudovariety (respectively C-pseudovariety, ordered pseudovariety) and V 7! V, then V is a variety of languages (respectively C-variety of languages, positive variety of languages) and in each case this gives a one-to-one correspondence between pseudovarieties and varieties of languages.


2. R ei t er m a n ’ s t h eo r em . A class V of monoids (stamps, ordered monoids) is a pseudovariety (respectively C-pseudovariety, ordered pseudovariety) if and only if it is defined (C-defined) by a set of profinite identities. In the argument we sketch below, we confine ourselves to the case of ordinary monoids, but everything generalises in an entirely straightforward fashion to ordered monoids and stamps. The key to the proofs of both parts of the theorem is Theorem 2.13 above, along with the following elementary but very useful lemma, already brought to the reader’s attention in § 1.1. Lemma 2.16. Let 'W A ! M be a morphism into a finite monoid. Then M divides the direct product of the syntactic monoids of the languages ' 1 .m/, m 2 M . Proof. For each m 2 M , let m W A ! Synt.' 1 .m// be the syntactic morphism of ' 1 .m/. It suffices to show that for each u; v 2 A , m .u/ D m .v/ for each m 2 M implies '.u/ D '.v/. Indeed, let m D '.u/. Then u 2 ' 1 .m/ and since m .v/ D m .u/, we have v 2 ' 1 .m/, '.v/ D m D '.u/. Corollary 2.17. Every pseudovariety of monoids is generated by the syntactic monoids it contains. Proof. The result follows directly from Lemma 2.16, since M recognises each ' 1 .M / (m 2 M ): thus each Synt.' 1 .m// divides M and hence lies in the pseudovarieties containing M . Now let V be a variety of languages and let E be a set of profinite identities defining V. Let also V be the class of finite monoids satisfying the profinite identities in E . It is easily verified that V is a pseudovariety. Moreover, if L is a regular language in A , we have L 2 A V if and only if L ` E , if and only if Synt.L/ satisfies the profinite identities in E , if and only if Synt.L/ 2 V. Thus V 7! V in the correspondence described in § 1.3. If W is another pseudovariety such that W 7! V, then V and W contain the same syntactic monoids, and Corollary 2.17 shows that V D W. This establishes Eilenberg’s theorem. For Reiterman’s theorem, we start with a pseudovariety V and consider the associated variety of languages V. The above reasoning shows that V is defined by any set of profinite identities which, seen in the setting of classes of languages, defines V. Note that these proofs are different from the classical proofs of Eilenberg’s theorem, in [20] or [44], and of Reiterman’s theorem, in [3], [49], or [58]. 2.6. Examples of varieties. We now look at some concrete instances of varieties, revisiting our examples from § 1, among others, in light of the theory presented above. In doing so, we will work from both sides of the correspondence between pseudovarieties and varieties of languages, at times beginning with a variety of languages, at others with a property of a class of finite monoids.


2.6.1. Idempotent and commutative monoids. We begin, as before, with the variety of languages corresponding to the pseudovariety J₁. For each finite alphabet A, let A*J₁ be the smallest Boolean-closed family of subsets of A* that contains all the languages B*, where B ⊆ A. Equivalently, it is the smallest Boolean-closed set containing all the A*aA* (a ∈ A). Putting it again differently, A*J₁ is precisely the family of languages L in A* for which membership of a word w in L depends only on the set α(w) of letters of w. This is because

    {v ∈ A* | α(v) = α(w)} = α(w)* \ ⋃_{B⊊α(w)} B*.

Observe that for all a ∈ A and B ⊆ A,

    a⁻¹B* = B*a⁻¹ = ∅ if a ∉ B,   and   a⁻¹B* = B*a⁻¹ = B* if a ∈ B.

Further, if C is another finite alphabet and φ: C* → A* is a morphism,

    φ⁻¹(B*) = (C ∩ φ⁻¹(B*))*.

Left and right quotient and inverse image under morphisms all commute with Boolean operations. So these two observations imply, independently of any algebraic considerations, that J1 is a variety of languages, and thus, by Theorem 2.13 is defined by a set of profinite identities. Further, from our proof of Eilenberg’s theorem, the same set of identities defines the corresponding pseudovariety of finite monoids. Of course, we have already exhibited these identities, but let us see what they look like in the context of our equational theory. Let X D ¹x; yº, and let A be any finite alphabet. Every language L 2 A J1 , satisfies the identities xy D yx and x 2 D x , since for any morphism 'W X  ! A and any u; v 2 A ; ˛.u'.xy/v/ D ˛.u'.yx/v/, and ˛.u'.x 2 /v/ D ˛.u'.x/v/. Conversely, suppose L  A satisfies these identities. We will show L 2 A J1 . Let w; w 0 2 B  , with w 2 L and ˛.w/ D ˛.w 0 /. We claim w 0 2 L. Since ˛.w/ D ˛.w 0 /, we can transform both w and w 0 into a common normal form w 00 by successively interchanging adjacent letters until the word is sorted (with respect to some total ordering on A) and then replacing occurrences of aa by a, where a 2 A. Interchanging adjacent letters entails replacing ua1 a2 v by ua2 a1 v , where u; v 2 A and a1 ; a2 2 A. Since L satisfies the identity xy D yx , if ua1 a2 v 2 L then ua2 a1 v 2 L (using the morphism 'W X  ! A that maps x; y to a1 ; a2 , respectively.). Similarly, replacing aa by a preserves membership in L, since L satisfies the identity x 2 D x . Thus J1 is defined by this pair of identities. It follows that the corresponding pseudovariety J1 of finite monoids is defined by the same pair of identities, and thus consists of the idempotent and commutative monoids. 2.6.2. Piecewise-testable languages. Let us consider the piecewise-testable languages of § 1.2. We denote the family of piecewise-testable languages over a finite alphabet A by A J. Let us look at the profinite identities satisfied by these languages. As observed earlier (§ 2.1), if u 2 X  then the sequence .unŠ /n is a Cauchy sequence whose limit is written u! . Moreover, for any morphism 'W X  ! A , where A is a finite


alphabet, φ̂(u^ω) = (φ̂(u))^ω (the idempotent power of φ̂(u)). Now let X = {x, y}. We claim that every piecewise-testable language L over A satisfies the profinite identities

    (xy)^ω x = (xy)^ω = y(xy)^ω.

This is equivalent to saying that for all s, t, u, v ∈ A*,

    s(tu)^ω t v ∈ L̄  ⟺  s(tu)^ω v ∈ L̄  ⟺  s u(tu)^ω v ∈ L̄.

Now fix an integer k > 0. For sufficiently large values of n, the words

    s(tu)^{n!} t v,   s(tu)^{n!} v,   s u(tu)^{n!} v

contain the same subwords of length k . Since L is piecewise-testable, for sufficiently large n, all but finitely many of the terms of the three sequences are either all in L or x is clopen, the three respective limits are either all in L x or all all outside of L. Since L x outside L. Thus, as we showed in § 2.5, the syntactic monoid of any piecewise testable language satisfies these same profinite identities. We arrive again at the observation that the syntactic monoid of every piecewise-testable language satisfies the identities .xy/! x D .xy/! D y.xy/! . That these identities define the pseudovariety J of finite J-trivial monoids is simple to establish. That they completely characterise the variety of piecewise-testable languages is the deep content of Simon’s theorem [63]. 2.6.3. Group languages. Similarly, the pseudovariety G of finite groups is defined by the profinite identity x ! D 1. As a consequence, the corresponding variety G of languages is defined by the same profinite identity. In contrast to the other examples presented here, we do not possess a simple description of G in terms of basic operations on words. 2.6.4. Left-zero semigroups. We already appealed to Eilenberg’s theorem in § 1 to show that the class K1 is not a variety of languages. But we can show here that it is a C-variety for a slightly restricted class C of morphisms. Let Cne denote the class of non-erasing morphisms between finitely-generated free monoids, those 'W A ! B  such that for all a 2 A, '.a/ ¤ 1. Let L 2 A K1 . If s; t; u; v 2 A , and t; u ¤ 1, then stuv 2 L if and only if stv 2 L. Moreover, this property of L characterises membership in A K1 . One way to state this property is that the variety of languages K1 is defined by the Cne -identity xy D x . Equivalently, the corresponding Cne -pseudovariety K1 of stamps is defined by the same Cne -identity. This means .'W A ! M / 2 K1 if '.uv/ D '.u/, for u; v 2 AC . Alternatively, one may consider, instead of the Cne -pseudovariety generated by the syntactic morphisms of languages in K1 , the pseudovariety of finite semigroups generated by the images of nonempty words under the syntactic morphisms. This was the approach originally taken, but here we prefer to emphasise that all these many different flavors of pseudovarieties can be treated in the same general setting.
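Checking the single profinite identity x^ω = 1 that defines G is again a finite computation on a multiplication table: a finite monoid satisfies it exactly when it is a group. A short Python sketch (reusing `omega_power`, `U1` and `Z3` from the earlier sketches; the identity is assumed to be element 0, our own convention):

    # The pseudovariety G: a finite monoid satisfies x^omega = 1 iff it is a group.

    def is_group(table, identity=0):
        return all(omega_power(table, x) == identity for x in range(len(table)))

    assert is_group(Z3)          # the cyclic group of order 3
    assert not is_group(U1)      # 0^omega = 0 != 1 in U1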


2.6.5. Quasiaperiodic stamps. Whenever we have a morphism φ: A* → M, the family of sets {φ(A^s) | s > 0} forms a subsemigroup of the power set semigroup P(M). As this is a finite cyclic semigroup, generated by φ(A), it contains a unique idempotent. Thus there is some s > 0 such that φ(A^s) = φ(A^{2s}), so that φ(A^s) is a subsemigroup of M. We call this the stable semigroup of φ. Let QA denote the set of morphisms φ from a free finitely generated monoid onto a finite monoid such that φ is surjective, and the stable semigroup of φ is aperiodic. We claim QA is a C_lm-pseudovariety of stamps, where C_lm consists of morphisms ψ: A* → B* between finitely generated free monoids such that all ψ(a), where a ∈ A, are nonempty words having the same length. (The letters lm stand for length-multiplying, since the lengths of all words in A* are multiplied by a constant factor when ψ is applied.) To see this, suppose (φ: B* → M) ∈ QA and (ψ: A* → B*) ∈ C_lm. Let φ(B^s) be the stable semigroup of φ, φψ(A^t) the stable semigroup of φψ: A* → Im(φψ), and k the length of each ψ(a) for a ∈ A. Then φψ(A^t) = φψ(A^{st}) ⊆ φ(B^{kst}) = φ(B^s), and thus the stable semigroup of φψ is also aperiodic. Further, if the stable semigroups φ_j(A^{s_j}) of stamps φ_j: A* → M_j, for j = 1, 2, are aperiodic, then the stable semigroup of φ₁ × φ₂ is contained in φ₁(A^{s₁}) × φ₂(A^{s₂}), and is therefore aperiodic. Thus QA is a C_lm-pseudovariety, and is accordingly defined by a set of profinite C_lm-identities. What does it mean for a stamp φ: A* → M to satisfy a C_lm-identity u = v? In such an identity, u and v are elements of X̂* for some finite alphabet X. The identity is satisfied if for every morphism ψ: X* → A* in C_lm, φ̂ψ̂(u) = φ̂ψ̂(v). Informally, this says that so long as we replace the letters in u and v by elements of A⁺ that all have the same length, the images in M are identical. We claim that QA is defined by the single profinite C_lm-identity

    (x^{ω−1} y)^ω = (x^{ω−1} y)^{ω+1}.

Let us prove this. First, we show that QA satisfies the identity. Let (φ: A* → M) ∈ QA, and choose p > 0 such that for all m ∈ M, m^p is idempotent. We then also have m^{ps} idempotent for all m ∈ M, where φ(A^s) is the stable semigroup of φ. If the identity is not satisfied, then there exist words u and v in A*, both of length k > 0, such that

    (φ(u^{ps−1} v))^{ps} ≠ (φ(u^{ps−1} v))^{ps+1}.

Thus {(φ(u^{ps−1} v))^{ps+r} | r ≥ 0} is a nontrivial group in φ((A^s)⁺) = φ(A^s), contradicting membership in QA. Conversely, suppose a stamp φ: A* → M satisfies the identity. Suppose the stable semigroup φ(A^s) contains a group element g = φ(u), with |u| = s. Let e = φ(v), where |v| = s, be the identity of this group. Since φ satisfies the identity,

    e = φ̂((u^{ω−1} v)^ω) = φ̂((u^{ω−1} v)^{ω+1}) = g⁻¹,

so every group in φ(A^s) is trivial. We introduced the C_lm-pseudovariety QA in § 1 in quite different terms, by giving a logical description of the corresponding C_lm-variety of languages. We will show in § 3 that they do in fact correspond.
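Membership of a stamp in QA can also be tested directly: compute the stable semigroup φ(A^s) by iterating the set φ(A) under products, then check that it contains no nontrivial group. A Python sketch (reusing `omega_power` from above; the representation of a stamp as a multiplication table together with the letter images is our own assumption):

    # Membership of a stamp phi: A* -> M in QA.

    def stable_semigroup(table, letter_images):
        def product_set(S, T):
            return frozenset(table[s][t] for s in S for t in T)
        gen = frozenset(letter_images)          # phi(A^1)
        seen, current = [], gen
        while current not in seen:              # iterate phi(A^1), phi(A^2), ...
            seen.append(current)
            current = product_set(current, gen)
        for S in seen:
            if product_set(S, S) == S:          # the unique idempotent of this cycle
                return S
        raise ValueError("not a valid multiplication table")

    def is_aperiodic(table, S):
        # x^omega = x^omega * x for every x in S, i.e. S contains no nontrivial group
        return all(omega_power(table, x) == table[omega_power(table, x)][x] for x in S)

    def in_QA(table, letter_images):
        return is_aperiodic(table, stable_semigroup(table, letter_images))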


2.6.6. Σ₁-languages. As in § 1.2.2, we denote by A*J⁺ the family of languages over A defined by Σ₁-sentences. Languages in this family are precisely the finite unions of the languages L_v, where v ∈ A*. We claim that J⁺ is defined by the profinite ordered identity x ≤ 1. A language L satisfies this identity if and only if for all u, v, w ∈ A*, whenever uw ∈ L, then uvw ∈ L. Clearly, each L_v satisfies this identity. We must show, conversely, that any language satisfying this identity is a finite union of L_v for various v ∈ A*. Certainly, if L satisfies the identity and v ∈ L, then L_v ⊆ L, so that

    L = ⋃_{v∈L} L_v.

We need to show that this can be replaced by a finite union. Let T consist of the subword-minimal elements of L, that is, those v ∈ L such that no proper subword of v is in L. Then

    L = ⋃_{v∈T} L_v.

We now invoke a theorem of G. Higman [30]: the subword ordering in A* has no infinite antichains. That is, any set T of words in which no element is a strict subword of another element is finite. The corresponding ordered pseudovariety J⁺ consequently consists of all partially ordered finite monoids for which the identity 1 is the maximum element, and thus a language belongs to A*J⁺ if and only if its ordered syntactic monoid satisfies this property.

2.6.7. Languages with zero. All of our examples so far have concerned some flavor of varieties of languages, language families that are defined across all finite alphabets and are closed under inverse images of morphisms between free monoids. Part of the great novelty of the equational theory of Gehrke et al. [25] presented here is that it applies to language classes with weaker closure properties. Here we give a simple example. We say a regular language L ⊆ A* is a language with zero if Synt(L) has a zero. This is equivalent to saying that there is a two-sided ideal J in A* such that either J ⊆ L or L ∩ J = ∅. This property is easily seen to be closed under Boolean operations and quotients. It is not, however, closed under inverse images of any composition-closed class C of morphisms that contains the length-preserving morphisms. Indeed, let L ⊆ A* be any regular language without a zero, and let b be a new letter. Then, viewed as a subset of (A ∪ {b})*, L has a zero, so this class is not closed under the inverse image of the length-preserving morphism that embeds A* in (A ∪ {b})*. Nonetheless, by our Corollary 2.10, this class of languages is defined by a set of profinite inequalities. We now exhibit such a set of inequalities. We start by defining three sequences of words in A*. Let u₁, u₂, … be any enumeration of the elements of A*, let

    v_n = u₁⋯u_n,


and

    w₁ = 1,   w_{n+1} = (w_n v_n w_n)^{n!}.

Look at the image of the w_i under a surjective morphism φ: A* → M, where M is finite. Since every u ∈ A* occurs as a factor of all but finitely many w_i, almost all φ(w_i) are in the minimal ideal K of M. Since for all m ∈ M, m^{n!} is idempotent for sufficiently large n, almost all φ(w_i) are idempotents in the minimal ideal of M. Finally, if φ(w_i) is such an idempotent e, then φ(w_{i+1}) is an idempotent in eKe, and so is itself equal to e. Thus for every finite monoid, the sequence (φ(w_n))_n is convergent, so (w_n)_n converges to an element ρ_A of Â*, such that φ̂(ρ_A) is an idempotent in the minimal ideal of φ(A*). Suppose L ⊆ A* has a zero. Then the minimal ideal of Synt(L) consists of this 0 alone, so if μ is the syntactic morphism of L and a ∈ A, μ̂(ρ_A) = μ̂(a ρ_A) = μ̂(ρ_A a). Thus L satisfies the equalities

    a ρ_A = ρ_A = ρ_A a

for all a ∈ A. Conversely, if L satisfies these equalities, then the minimal ideal of μ(A*) contains just one element, so L is a language with zero. So these equalities define the class of languages with zero.

2.6.8. Languages defined by density. Say that a language L ⊆ A* is dense if every word of A* occurs as a factor of a word in L, that is, L ∩ A*uA* ≠ ∅ for every u ∈ A*. The set consisting of A* and the non-dense languages forms a quotient-closed lattice, which is defined by the profinite inequalities x ≤ 0 (x ∈ Â*) – this is short for a ρ_A = ρ_A a = ρ_A for every a ∈ A and x ≤ ρ_A for every x ∈ Â*; see [25]. Now define the density of a language L as the function d_L(n) which counts the number of words of length n in L. A language with bounded density (also called slender) is easily seen to be a finite union of languages of the form x u* y (x, u, y ∈ A*). Similarly, a language of polynomial density, also called sparse, can be shown to be a finite union of languages of the form u₀ v₁* u₁ ⋯ v_n* u_n, where the u_i and v_j are in A*. Together with A*, the set of slender (resp. sparse) languages in A* forms a quotient-closed lattice of languages, for which defining profinite inequalities can be found in [25].

2.7. Deciding membership in an equationally defined class of languages. We are often interested in decision problems for families of regular languages. We say that a family F of regular languages over a finite alphabet A is decidable if there is an algorithm that, given a regular language L ⊆ A* as input, determines whether L ∈ F. Here a regular language L is given by specifying a DFA that recognises L, or some other formalism (e.g., regular expression, logical formula) from which a DFA can be effectively computed. The problem arises, for example, if we are looking for a test of whether a given language is expressible in some logic for defining regular languages (see § 3). We can similarly define decidable families of finite monoids. Such a family F is decidable if there is an algorithm that, given the multiplication table for a finite monoid M, determines whether M ∈ F. The definition extends in the obvious fashion to families of ordered monoids and stamps. For ordered monoids the input includes,


in addition to the multiplication table of M, a representation of the graph of the partial order on M. For stamps φ: A* → M we are also given the values φ(a) for a ∈ A. We will say that a variety V of languages is decidable if A*V is decidable for every finite alphabet A. In this case the Eilenberg correspondence theorem gives a rather obvious connection between the two kinds of decidable families.

Theorem 2.18. A (positive) variety (respectively, C-variety) of languages is decidable if and only if the corresponding pseudovariety of (ordered) monoids (respectively, stamps) is decidable.

Proof. We give the proof just for the case of ordinary varieties of languages and pseudovarieties of monoids; the argument is essentially the same for all the other variants. Let V be a variety of languages and V the corresponding pseudovariety of monoids. Suppose first that V is decidable. Let 𝒜 = (Q, A, i, F) be a DFA recognising a language L ⊆ A*. From 𝒜 we can effectively construct the multiplication table of Synt(L). We then apply the algorithm for V to decide whether Synt(L) ∈ V, and thus whether L ∈ A*V. Conversely, suppose V is decidable. Let M be a finite monoid and choose a finite alphabet A together with a surjective morphism φ: A* → M. (For example, we could choose A = M and φ the extension to A* of the identity map on M.) Then by Lemma 2.16 and Corollary 2.17, M divides the direct product of the monoids Synt(φ⁻¹(m)) for m ∈ M, and each of the Synt(φ⁻¹(m)) in turn divides M. Thus M ∈ V if and only if each of the languages φ⁻¹(m) is in A*V. Furthermore, from φ we can construct a DFA (M, A, 1, {m}) recognising φ⁻¹(m), and thus decide whether each is in A*V. Thus V is decidable.

Decision problems for varieties of regular languages can have arbitrarily large computational complexity, or indeed be undecidable. To see this, observe simply that if P is any set of primes, then we can form the pseudovariety G_P of finite groups G such that every prime divisor of |G| is in P. Testing membership of a given prime p in P then reduces, in time polynomial in p, to testing membership in G_P, so G_P is at least as complex as P. On the other hand, Reiterman's theorem, which says varieties are defined by sets of profinite identities, suggests that we could determine membership in varieties simply by verifying whether identities hold in finite monoids. This is deceptive, since elements of X̂* do not generally have simple descriptions that make it possible to evaluate their images in finite monoids, and, further, the equational description of a pseudovariety might require infinitely many profinite identities. We can nonetheless say something definitive about the complexity of the decision problems in the case where the equational definition consists of a finite set of profinite identities σ = τ, where σ and τ are ω-terms in X̂*. This means that σ and τ are formed from elements of X by successive application of concatenation and of the operation sending a term to its ω-power.
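For such finite sets of ω-term identities, verification over a given multiplication table is entirely mechanical. The sketch below (Python; the term encoding is our own, and it reuses `omega_power`, `U1` and `Z3` from the earlier sketches) evaluates ω-terms under an assignment of monoid elements to variables and checks an identity over all assignments, a brute-force version of the logspace procedure in Theorem 2.19 below.

    # An omega-term is encoded as a nested tuple:
    #   a variable name 'x', a concatenation ('.', t1, t2), or an omega power ('w', t).
    from itertools import product

    def eval_term(term, table, env):
        if isinstance(term, str):
            return env[term]
        op, *args = term
        if op == '.':
            return table[eval_term(args[0], table, env)][eval_term(args[1], table, env)]
        if op == 'w':
            return omega_power(table, eval_term(args[0], table, env))
        raise ValueError(op)

    def satisfies_identity(table, lhs, rhs, variables):
        n = len(table)
        return all(eval_term(lhs, table, dict(zip(variables, vals))) ==
                   eval_term(rhs, table, dict(zip(variables, vals)))
                   for vals in product(range(n), repeat=len(variables)))

    # (xy)^w x = (xy)^w, one of the identities defining J
    xy  = ('.', 'x', 'y')
    lhs = ('.', ('w', xy), 'x')
    rhs = ('w', xy)
    assert satisfies_identity(U1, lhs, rhs, ['x', 'y'])        # U1 is J-trivial
    assert not satisfies_identity(Z3, lhs, rhs, ['x', 'y'])    # a nontrivial group is not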


Theorem 2.19. Let 𝒱 be a variety of languages defined by a finite set of profinite identities of the form σ = τ, where σ and τ are ω-terms, and let V be the corresponding pseudovariety of finite monoids. Then V is decidable by a logspace algorithm in the size of the input multiplication table, and 𝒱 is decidable by a polynomial space algorithm in the size of the input automaton.

Proof. We first consider testing membership of a monoid M in V. Let |M| = n. The multiplication table of M can be represented in O(n² log n) bits and each element of M by O(log n) bits. We will show how to determine membership of M in V using k · log₂ n additional bits of workspace, where the constant k is determined by the length of the longest ω-term occurring in the defining profinite identities for V. To make the proof easier to follow, let us suppose we have an identity ((x^ω y)^ω z)^ω = (xz)^ω. The algorithm loops through all triples (x, y, z) of elements of M and writes them in the workspace. It then uses log₂ n bits of additional workspace to compute x^ω. This is done by repeatedly consulting the multiplication table, writing x², x³, … in the same workspace, and after each write, consulting the multiplication table to check if the element is idempotent. We similarly compute (x^ω y)^ω, ((x^ω y)^ω z)^ω, and (xz)^ω. All in all, we used 7 · log₂ n bits of workspace. After all the values are computed, we compare the last two. The algorithm rejects if it finds a mismatch. If it finds none, it goes on to the next identity, and accepts if all the identities are tested with no mismatch.

We now turn to testing membership in 𝒱. The algorithm we give is actually a nondeterministic polynomial space algorithm for non-membership of a regular language in A*𝒱. Since, by Savitch's theorem (see [60] and also Sipser [64]), nondeterministic polynomial space is equivalent to deterministic polynomial space, and the latter is closed under complement, this will be enough. Let us work with the same example identity we used in the first part of the proof. The algorithm begins by guessing words x, y, z and computing the vectors

(q₁·x, …, qₙ·x),  (q₁·y, …, qₙ·y),  (q₁·z, …, qₙ·z),

where {q₁, …, qₙ} is the set of states of the input DFA. Observe that the words x, y, z themselves are not stored. Instead they are guessed letter by letter, and only the vectors of states are written in the workspace. This requires O(n log n) bits, where n is the number of states of the DFA. Observe as well that once we have the vector (q₁·u, …, qₙ·u) we can, with an additional n log₂ n bits, compute the vector (q₁·u^ω, …, qₙ·u^ω), since we can write the vectors of the successive powers (q₁·u^k, …, qₙ·u^k) reusing the same workspace, and then check after each write whether q·u^k = q·u^{2k} for each state q. As a result we obtain the vectors (q₁·φ̂(σ), …, qₙ·φ̂(σ)) and (q₁·φ̂(τ), …, qₙ·φ̂(τ)) for some morphism φ: X* → A*. If these vectors turn out to be different, we accept. Thus this algorithm nondeterministically recognises the complement of A*𝒱, using O(n log n) space.

The foregoing theorem illustrates a potentially large gap in complexity between testing membership in 𝒱 from an input DFA and testing membership in the corresponding pseudovariety V from the multiplication table of a monoid. This is to be expected,


since an automaton is in general exponentially more succinct than the multiplication table of its transition monoid. In some instances, however, it is possible to give efficient algorithms that begin with automata, using so-called forbidden pattern characterisations of varieties. We illustrate this with a very simple example, using the ordered variety J⁺. Consider the following figure:

[Figure: two states q₁ and q₂, an edge labelled v leading from q₁ to q₂, and an edge labelled w leaving each of the two states.]

We say that a DFA (Q, A, i, F) contains this pattern if there are states q₁, q₂ and words u, v, w ∈ A* such that i·u = q₁, q₂ = q₁·v, q₁·w ∈ F, q₂·w ∉ F. We say the DFA avoids the pattern if it does not contain it. It is easy to see that a DFA recognising a language L avoids this pattern if and only if whenever uw ∈ L, then uvw ∈ L. Thus the languages in A* avoiding the pattern are exactly those that satisfy the inequality x ≤ 1; that is, the language family A*J⁺. We use this to prove the following.

Theorem 2.20. There is an algorithm determining membership in J⁺ that runs in nondeterministic logspace in the size of an accepting DFA. (In particular, membership can be determined in polynomial time.)

Proof. We nondeterministically guess letters to obtain an accessible state q₁, using log₂ n bits, where n is the number of states in the automaton. We then further guess letters to obtain another state q₂ = q₁·v, written on another log₂ n-bit field in the workspace. Finally, we guess more letters, applying them to both components of the pair (q₁, q₂) and arrive at a state (q₁·w, q₂·w). We accept if the first member of this pair of states is an accepting state of the DFA and the second is not. Thus we have a nondeterministic logspace algorithm for the regular languages outside of J⁺. But by the theorem of Immerman and Szelepcsényi (see [32], [73], and also [64]), nondeterministic logspace is closed under complement, so we have the desired result.

The same reasoning is used in many proofs showing that varieties of languages are decidable in nondeterministic logspace: find a forbidden pattern characterisation of the variety using a fixed number of states. (For instance, Pin and Weil [50], Glasser and Schmitz [26].) While such results appear to bridge the complexity gap between polynomial-time algorithms that begin with a multiplication table and exponential-time algorithms that begin with an automaton, forbidden pattern arguments are not always available. In particular, we have the following result, which we cite without proof, from Cho and Huynh [18].

Theorem 2.21. Testing whether a regular language given by a DFA is aperiodic is PSPACE-complete.
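The forbidden-pattern test behind Theorem 2.20 can also be run as a deterministic polynomial-time search, by replacing the nondeterministic guesses with reachability computations in the DFA and in its pair automaton. The following sketch is our own rendering, not the chapter's; the DFA encoding (a transition dictionary, a set of final states) is an assumption made here.

```python
from collections import deque

def violates_Jplus(states, alphabet, delta, init, finals):
    """Return True iff the DFA contains the forbidden pattern of
    Theorem 2.20, i.e. its language is not closed under insertion
    (uw in L but uvw not in L for some u, v, w), hence not in A*J+.
    delta: dict (state, letter) -> state; finals: set of states."""
    def reachable(sources, step):
        seen, queue = set(sources), deque(sources)
        while queue:
            x = queue.popleft()
            for a in alphabet:
                y = step(x, a)
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return seen

    one_step = lambda q, a: delta[(q, a)]
    accessible = reachable({init}, one_step)
    for q1 in accessible:
        for q2 in reachable({q1}, one_step):          # q2 = q1 . v for some v
            pairs = reachable({(q1, q2)},
                              lambda p, a: (delta[(p[0], a)], delta[(p[1], a)]))
            # look for w with q1.w accepting and q2.w rejecting
            if any(p1 in finals and p2 not in finals for (p1, p2) in pairs):
                return True
    return False
```

The three nested reachability computations make the search polynomial in the number of states, matching the parenthetical remark in Theorem 2.20.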


3. Connections with logic

In § 1 we outlined, in an informal way, some of the logical apparatus for expressing properties of words over a finite alphabet. Here we give a more precise and general description. As before, variable symbols x, y, x₁, x₂, etc., denote positions in a word. For each a ∈ A our logics have a unary predicate symbol Q_a, where Q_a x is interpreted to mean "the symbol in position x is a." We also have a binary predicate symbol s, where s(x, y) is interpreted to mean "position y is the successor of position x." We will usually use the alternative notation y = x + 1 for this. We now consider monadic second-order formulas over this base of predicates. These are formulas built not merely by quantifying over individual positions, but also by quantifying over sets of positions, denoted by upper-case variable letters, and employing an additional relation symbol x ∈ X between positions (first-order variables) and sets of positions (second-order variables). For example, consider the monadic second-order formula φ:

∃x ∃y ∃X (Q_a x ∧ Q_b y ∧ x ∈ X ∧ y ∈ X ∧ φ₁ ∧ φ₂),

where φ₁ is

¬∃z (x = z + 1 ∧ z ∈ X) ∧ ¬∃z (z = y + 1 ∧ z ∈ X),

and φ₂ is

∀z (z ∈ X → (y = z ∨ ∃u (u ∈ X ∧ u = z + 1))).

The formula φ is a sentence; that is, it has no free variables. Thus φ defines a language L_φ over A = {a, b}, namely the set of all words in which the formula is true. The sentence asserts the existence of positions x and y with letters a and b respectively, and of a set X of positions that contains both x and y, that contains the successor of each of its elements with the exception of y, and that contains no elements less than x. Thus L_φ is the regular language A*aA*bA*.
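To make the semantics of φ concrete, the following brute-force evaluator — our own illustration, exponential in the word length and useful only for very small examples — checks the sentence above on a word over {a, b} by enumerating all choices of the positions x, y and of the set X.

```python
from itertools import chain, combinations

def satisfies_phi(word):
    """Naive evaluation of the example MSO sentence phi on a word."""
    n = len(word)
    positions = range(n)
    all_subsets = chain.from_iterable(combinations(positions, k)
                                      for k in range(n + 1))
    for X in map(set, all_subsets):
        for x in positions:
            for y in positions:
                if word[x] != 'a' or word[y] != 'b' or x not in X or y not in X:
                    continue
                # phi_1: the predecessor of x and the successor of y are not in X
                phi1 = (not any(x == z + 1 and z in X for z in positions) and
                        not any(z == y + 1 and z in X for z in positions))
                # phi_2: every element of X other than y has its successor in X
                phi2 = all(z == y or any(u in X and u == z + 1 for u in positions)
                           for z in X)
                if phi1 and phi2:
                    return True
    return False

print(satisfies_phi("aab"), satisfies_phi("ba"))   # True False
```

The outputs agree with the description of L_φ as A*aA*bA*.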

This example is an instance of the following important theorem, due to J. R. Büchi, see [17]; see also [41] and [71].

Theorem 3.1. A language L ⊆ A* is regular if and only if L = L_φ for some sentence φ of monadic second-order logic.

We obtain subclasses of regular languages by restricting these second-order formulas in various ways. One obvious such restriction is to study first-order formulas: those formulas that use no second-order quantification. We denote this logic, as well as the family of regular languages that can be defined in it, by FO[+1]. More generally, consider any k-ary relation α on the set of positions in a word that does not depend on the letters that appear in the word. Suppose further that α(x₁, …, x_k) is definable by a formula of monadic second-order logic. Then we obtain a subclass of the regular languages by considering those languages definable by first-order sentences in which α is allowed as an atomic formula. We denote this class FO[α], and similarly write FO[α₁, α₂, …] when there are several such predicates. For example, the relation x < y is definable in monadic second-order logic, by a formula much like the one used above


to define the language L = A*aA*bA*. Thus we obtain the logic and the language class FO[<] […] for each k > 1, a binary predicate ≡_k that says two positions are equivalent modulo k. These predicates, too, are definable in monadic second-order logic, and thus we obtain language classes such as FO[<, ≡]. […] Let r > 0. In the r-round game G_r(u, u′, α₁, …, α_m), Spoiler makes a play by placing a new pebble x_{k+1} in u or x′_{k+1} in u′. If Spoiler played in u then Duplicator must respond with x′_{k+1} in u′. Otherwise Duplicator responds with x_{k+1} in u. The result is two new pebbled words v, v′. Spoiler and Duplicator proceed to play the game


G_{r−1}(v, v′, α₁, …, α_m). Whoever wins this (r − 1)-round game is the winner of the r-round game. Ordinary words may be considered as special instances of pebbled words and thus we can consider the games G_r(w, w′, α₁, …, α_m), where w, w′ ∈ A*. The fundamental property of such games is given by the following theorem.

Theorem 3.2. Let w, w′ ∈ A* and r ≥ 0. The words w and w′ satisfy the same sentences in FO[α₁, …, α_m] of quantifier depth r or less if and only if Duplicator has a winning strategy in G_r(w, w′, α₁, …, α_m).

See, for example, [41] and [71]. Here is an example. Consider the two words w = aab and w′ = aaab. Spoiler has a winning strategy in G₂(w, w′, <) […]

[…] n ≥ 1, t ∈ T_n, and s₂, …, s_n ∈ S, we have t_S(s, s₂, …, s_n) ∈ L if and only if t_S(s′, s₂, …, s_n) ∈ L. For some varieties, such as the varieties of semigroups, monoids, groups, or rings, and for any finitely generated variety of lattices, it turns out that, rather than considering all terms in the preceding equivalence, it suffices to consider a finite number of them. For instance, for the variety of monoids, it suffices to consider the single term t = (xy)z, as in the usual definition of the syntactic congruence for monoids. See Clark et al. [46] for alternative characterisations of varieties with such a finiteness property.
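As a small illustration of this last remark — our own sketch, with an ad hoc multiplication-table encoding of a finite monoid — the following computes the classes of the syntactic congruence of a subset L using only two-sided contexts, i.e. the single term mentioned above.

```python
from itertools import product

def syntactic_classes(elements, mult, L):
    """Partition a finite monoid (mult[x][y] = x*y) into the classes of the
    syntactic congruence of the subset L: s ~ s' iff, for all x, y, the
    products x*s*y and x*s'*y are either both in L or both outside L."""
    L = set(L)

    def profile(s):
        # the set of two-sided contexts (x, y) that put s into L
        return frozenset((x, y) for x, y in product(elements, repeat=2)
                         if mult[mult[x][s]][y] in L)

    classes = {}
    for s in elements:
        classes.setdefault(profile(s), []).append(s)
    return list(classes.values())
```

The subset L is recognisable (has a syntactic congruence of finite index) exactly when such a partition into finitely many classes exists, which is automatic here since the ambient monoid is finite.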


For an algebra S, we say that a subset L of S is recognised by a homomorphism φ: S → T if L = φ⁻¹(φ(L)). In other words, L is a union of classes of the kernel congruence ker φ = (φ × φ)⁻¹(Δ_T) or, equivalently, ker φ is contained in the syntactic congruence of L. For a class C of algebras, we say that a subset L of S is C-recognisable if L is recognised by a homomorphism φ: S → T into some algebra T from C. In particular L is recognisable by some finite algebra if and only if the syntactic congruence of L has finite index, in which case we also say simply that L is recognisable.

2.2. Pseudometric and uniform spaces. A pseudometric on a set X is a function d from X × X to the non-negative reals such that the following conditions hold:
i. d(x, x) = 0 for every x ∈ X;
ii. d(x, y) = d(y, x) for all x, y ∈ X;
iii. triangle inequality: d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.
If, additionally, d(x, y) = 0 implies x = y, then we say that d is a metric on X. If, instead of the triangle inequality, we impose the stronger
iv. ultrametric inequality: d(x, z) ≤ max{d(x, y), d(y, z)} for all x, y, z ∈ X,
then we refer respectively to a pseudo-ultrametric and an ultrametric. For each of these types of "something" metrics, a "something" metric space is a set endowed with a "same thing" metric.

The remainder of this section is dedicated to recalling the notion of a uniform space. We build up here on the approach of [32]. The reader may prefer to consult a book on general topology such as [109].

Definition 2.1. A uniformity on a set X is a set U of reflexive binary relations on X such that the following conditions hold:
1. if R₁ ∈ U and R₁ ⊆ R₂, then R₂ ∈ U;
2. if R₁, R₂ ∈ U, then there exists R₃ ∈ U such that R₃ ⊆ R₁ ∩ R₂;
3. if R ∈ U, then there exists R′ ∈ U such that R′ ∘ R′ ⊆ R;
4. if R ∈ U, then R⁻¹ ∈ U.

An element of a uniformity is called an entourage. A uniform space is a set endowed with a uniformity, which is usually understood and not mentioned explicitly. A uniformity basis on a set X is a set U of reflexive binary relations on X satisfying the above conditions (2)–(4). The uniformity generated by U consists of all binary relations on X that contain some member of U. A uniformity U is transitive if it admits a basis consisting of transitive relations. The notion of a uniform space generalises that of a pseudometric space. In this respect, the following notation is suggestive of the intuition behind the generalisation.


For an entourage R and elements x, y ∈ X, we write d(x, y) < R to indicate that (x, y) ∈ R. Indeed, given a metric d on X, if we let R_ε denote the set of pairs (x, y) ∈ X × X such that d(x, y) < ε, then the set U_d of all R_ε, with ε > 0, is a uniformity basis on X such that d(x, y) < R_ε if and only if d(x, y) < ε. The uniformity U_d is said to be defined by d. The topology of a uniform space X (or induced by its uniformity) has, for each x ∈ X, a neighborhood basis consisting of all sets of the form B_R(x) = {y ∈ X : d(x, y) < R}. Not every topology is induced by a uniformity, see Theorem 38.2 in [109]. Note that the topology induced by a uniformity U on X is Hausdorff if and only if the intersection ⋂U is the diagonal (equality) relation Δ_X. In general, it follows from the definition of uniformity that ⋂U is an equivalence relation on X. The quotient set X/⋂U is then naturally endowed with the quotient uniformity, whose entourages are the relations R/⋂U, with R ∈ U. Of course, the quotient space X/⋂U is Hausdorff and we call it the Hausdorffisation of X, while the natural mapping X → X/⋂U is called the natural Hausdorffisation mapping. Given a uniformity U on a set X and a subset Y, the relative uniformity on Y consists of the entourages of the form R ∩ (Y × Y) with R ∈ U. Endowed with this uniformity, Y is said to be a uniform subspace of X.

Recall that a net in a set X is a function f: I → X, where I is a directed set, meaning a set endowed with a partial order ≤ such that, for all i, j ∈ I, there is some k ∈ I with i ≤ k and j ≤ k. A subnet of such a net is a net g: J → X for which there is an order-preserving function μ: J → I such that g = f ∘ μ and, for every i ∈ I, there is some j ∈ J with i ≤ μ(j), that is, μ has cofinal image in I. Usually, the net f is represented by (x_i)_{i∈I}, where x_i = f(i). The subnet g is then represented by (x_{i_j})_{j∈J}, where i_j = μ(j). In case X is a topological space, we say that the net (x_i)_{i∈I} converges to x ∈ X if, for every neighborhood N of x, there is some i ∈ I such that x_j ∈ N whenever j ≥ i. A net (x_i)_{i∈I} in a uniform space X is said to be a Cauchy net if, for every entourage R, there is some i ∈ I such that d(x_j, x_k) < R whenever j, k ≥ i. A uniform space is said to be complete if every Cauchy net converges.

A Hausdorff topological space X is said to be compact if every open covering of X contains a finite subcovering. Equivalently, every net in X has a convergent subnet. A topological space is said to be zero-dimensional if it admits a basis consisting of clopen sets, that is, sets that are both closed and open. It is well known that a compact space is zero-dimensional if and only if it is totally disconnected, meaning that all its connected components are singleton sets. One can also show that a compact space has a unique uniformity that induces its topology, see Theorem 36.19 in [109]. A uniform space X is totally bounded if, for every entourage R, there is a finite cover X = U₁ ∪ ⋯ ∪ U_n such that ⋃_{k=1}^{n} U_k × U_k ⊆ R. It is well known that a Hausdorff uniform space is compact if and only if it is complete and totally bounded, see Theorem 39.9 in [109]. A function φ: X → Y between two uniform spaces is uniformly continuous if, for every entourage R of Y, there is some entourage R′ of X such that d(x₁, x₂) < R′ implies d(φ(x₁), φ(x₂)) < R. Equivalently, φ maps Cauchy nets to Cauchy nets.
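The entourages R_ε of a pseudometric and the basis conditions of Definition 2.1 can be made tangible on a finite set. The sketch below is our own illustration (the choice of a three-point set with the discrete metric is arbitrary) and checks conditions (2)–(4) literally, as stated in the definition.

```python
from itertools import product

def entourage(points, d, eps):
    """R_eps = {(x, y) : d(x, y) < eps}, a basic entourage of the
    uniformity U_d defined by the pseudometric d."""
    return {(x, y) for x, y in product(points, repeat=2) if d(x, y) < eps}

def is_uniformity_basis(relations):
    """Check conditions (2)-(4) of Definition 2.1 for a finite family of
    reflexive relations (each given as a set of pairs) on a finite set."""
    compose = lambda R, S: {(x, z) for (x, y) in R for (y2, z) in S if y == y2}
    inverse = lambda R: {(y, x) for (x, y) in R}
    cond2 = all(any(R3 <= R1 & R2 for R3 in relations)
                for R1 in relations for R2 in relations)
    cond3 = all(any(compose(R0, R0) <= R for R0 in relations) for R in relations)
    cond4 = all(inverse(R) in relations for R in relations)
    return cond2 and cond3 and cond4

# Example: the entourages R_eps of the discrete metric on a three-point set.
pts = ["a", "b", "c"]
d = lambda x, y: 0 if x == y else 1
basis = [entourage(pts, d, eps) for eps in (0.5, 2.0)]
print(is_uniformity_basis(basis))   # True
```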


We say that ' is a uniform isomorphism if it is a uniformly continuous bijection whose inverse is also uniformly continuous. The function ' is a uniform embedding if ' is a uniform isomorphism of X with a subspace of Y . Note that, if 'W X ! Y is a uniformly T continuous function, then ' induces a unique uniformly continuous T function W X= UX ! Y = UY between the corresponding Hausdorffisations such that ı X D Y ı ' , where X and Y are the natural Hausdorffisation mappings. We call the Hausdorffisation of ' . One can show (see Theorem 38.3 in [109]) that a uniformity is defined by some pseudometric (respectively by a pseudo-ultrametric) if and only if it has a countable basis (and, respectively, it is transitive). In the Hausdorff case, one can remove the prefix “pseudo.” Moreover, every uniform space can be uniformly embedded in a product of pseudometric spaces, see Theorem 39.11 in [109]. For every uniform space X there is a complete uniform space Xy such that X embeds uniformly in Xy as a dense subspace. This can be done by first uniformly embedding X in a product of pseudometric spaces and then completing each factor by diagonally embedding it in the space of equivalence classes of Cauchy sequences under the relation .xn /n  .yn /n if lim d.xn ; yn / D 0 (cf. Theorems 39.12 and 24.4 in [109]). Such a space Xy is unique in the sense that, given any other complete uniform space Y in which X embeds uniformly as a dense subspace, there is a unique uniform isomorphism Xy ! Y leaving X pointwise fixed. The uniform space Xy is called the completion of X . It is easy to verify that the Hausdorffisation of the completion of X is the completion of the Hausdorffisation of X ; it is known as the Hausdorff completion of X . Moreover, the Hausdorff completion of X is compact if and only if X is totally bounded. The following is a key property of completions. Proposition 2.1. Let X and Y be uniform spaces and let 'W X ! Y be a uniformly continuous function. Then there is a unique extension of ' to a uniformly continuous function 'W O Xy ! Yy . Let I be a nonempty set. If Ui is a uniformity on a set Xi for each i 2 I , then the Q Cartesian product i 2I Xi may be endowed with the product uniformity, with basis consisting of all sets of the form pi11 .R1 / \    \ pin1 .Rn /, where each Rj 2 Uij and each pi W X  X ! Xi  Xi is the natural projection on each component. From the fact that a nonempty product of complete uniform spaces is complete (see Theorem 39.6 in [109]), it follows that completion and product commute. One can also easily show that Hausdorffisation and product commute. 2.3. Profinite uniformities and metrics. By a topological algebra we mean an algebra endowed with a topology with respect to which each basic operation is continuous. A compact algebra is a topological algebra whose topology is compact. We view finite algebras as topological algebras with respect to the discrete topology. When we write that two topological algebras are isomorphic we mean that there is an algebraic isomorphism between them which is also a homeomorphism. A subset X of a topological algebra S is said to generate S if it generates a dense subalgebra of S .


Similarly, a uniform algebra is an algebra endowed with a uniformity such that the basic operations are uniformly continuous. Note that a uniform algebra is also a topological algebra for the topology induced by the uniformity and that, in case the topology is compact, the basic operations are continuous if and only if they are uniformly continuous (for the unique uniformity inducing the topology). Consistently with the choice of the discrete topology for finite algebras, we endow them with the discrete uniformity, in which every reflexive relation is an entourage. Let F be a class of finite algebras. A subset L of a topological (respectively uniform) algebra S is said to be F-recognisable if there is a continuous (resp. uniformly continuous) homomorphism 'W S ! P into some P 2 F such that L D ' 1 'L. In case F consists of all finite algebras, we say simply that L is recognisable to mean that it is F-recognisable. Let T be a class of topological algebras. A topological algebra S is said to be residually in T if, for every pair of distinct points s; t 2 S , there exists a continuous homomorphism 'W S ! P , into some P 2 T , such that '.s/ ¤ '.t/. Suppose that S is a topological algebra and Q is a pseudoquasivariety. The case that will interest us the most is when Q is a pseudovariety and S is a discrete algebra. The pro-Q uniformity on S , denoted UQ , is generated by the basis consisting of all congruences  such that S= 2 Q and the natural mapping S ! S= is continuous. Note that UQ is indeed a uniformity on S , which is transitive. In case Q consists of all finite algebras, we also call the pro-Q uniformity the profinite uniformity. The pro-Q uniformity on S is Hausdorff if and only if S is residually in Q as a topological algebra. More precisely, the Hausdorffisation of S is given by the pro-Q uniform structure of S=Q , under the quotient topology. The topology induced by the pro-Q uniformity of the algebra S is also called its pro-Q topology. Sets that are open in this topology are also said to be Q-open and a similar terminology is adopted for closed and clopen sets. Similar notions can be defined if we start with a uniform algebra instead of a topological algebra, replacing continuity by uniform continuity, but we will have no use for them here. Note that the pro-Q uniformity UQ is totally bounded for a pseudoquasivariety Q. Given a subset L of an algebra S , we denote by EL the equivalence relation whose classesTare L and its complement S n L. Note that, for a congruence  on S , we have  D L EL , where the intersection runs over all  -classes. The following is now immediate. Proposition 2.2. Suppose that Q is a pseudoquasivariety and S is a topological algebra. 1. The Hausdorff completion of S under UQ is compact. 2. A subset L of S is Q-recognisable if and only if EL belongs to UQ . In case Q is a pseudovariety, a further equivalent condition is that the syntactic congruence L belong to UQ . 3. The Q-recognisable subsets of S are Q-clopen and constitute a basis of the pro-Q topology of S . In particular, the pro-Q topology of S is zero-dimensional and a subset L of S is Q-open if and only if L is a union of Q-recognisable sets.


In contrast, not every Q-clopen subset of an algebra S needs to be Q-recognisable. For instance, for the pseudovariety N of all finite nilpotent semigroups, one may easily show that the pro-N topology on the (discrete) free semigroup A⁺ over a finite alphabet A is discrete, and so every subset is clopen, while it is well known that the N-recognisable subsets of A⁺ are the finite and cofinite languages.

For a pseudoquasivariety Q and a topological algebra S, we define two functions on S × S as follows. For s, t ∈ S, r_Q(s, t) is the minimum of the cardinalities of algebras P from Q for which there is some continuous homomorphism φ: S → P such that φ(s) ≠ φ(t), where we set min ∅ = ∞. We then put d_Q(s, t) = 2^{−r_Q(s, t)} with the convention that 2^{−∞} = 0. One can easily check that d_Q is a pseudo-ultrametric on S, which is called the pro-Q pseudo-ultrametric on S. The following result is an immediate generalisation of § 3 in [85], where the hypothesis that the signature is finite serves to guarantee that there are at most countably many isomorphism classes of finite algebras over it.

Proposition 2.3. Suppose that the signature is finite. For a pseudoquasivariety Q and a topological algebra S, the following conditions are equivalent:
1. the pro-Q uniformity on S is defined by the pro-Q pseudo-ultrametric on S;
2. the pro-Q uniformity on S is defined by some pseudo-ultrametric on S;
3. there are at most countably many Q-recognisable subsets of S;
4. for every P ∈ Q, there are at most countably many homomorphisms S → P.
In particular, all these conditions hold in case S is finitely generated. Moreover, if Q contains nontrivial algebras then, for the discrete free algebra F_A Q over the variety generated by Q, the pro-Q uniformity is defined by the pro-Q pseudo-ultrametric if and only if A is finite.

The next result gives a different way of looking into pro-Q topologies and uniformities.

Proposition 2.4. Let S be a topological algebra and Q a pseudoquasivariety.
1. The pro-Q uniformity of S is the smallest uniformity U on S for which all continuous homomorphisms from S into members of Q are uniformly continuous.
2. The pro-Q topology of S is the smallest topology T on S for which all continuous homomorphisms from S into members of Q remain continuous.
3. The algebra S is a uniform algebra with respect to its pro-Q uniformity. In particular, it is a topological algebra for its pro-Q topology.

Following [85], we say that a function φ: S → T between two topological algebras is (Q, R)-uniformly continuous if it is uniformly continuous with respect to the uniformities U_Q, on S, and U_R, on T. Similarly, we say that φ is (Q, R)-continuous if it is continuous with respect to the Q-topology of S and the R-topology of T. It is now easy to deduce the following result, which is a straightforward generalisation of Theorem 4.1 in [85].

Proposition 2.5. Let Q and R be two pseudoquasivarieties, S and T be two topological algebras, and φ: S → T an arbitrary function.


1. φ is (Q, R)-uniformly continuous if and only if, for every R-recognisable subset L of T, φ⁻¹(L) is a Q-recognisable subset of S.
2. φ is (Q, R)-continuous if and only if, for every R-recognisable subset L of T, φ⁻¹(L) is a union of Q-recognisable subsets of S.

Proposition 2.5 was motivated by the work of Pin and Silva [86] on non-commutative versions of Mahler's theorem in p-adic number theory, which states that a function N → Z is uniformly continuous with respect to the p-adic metric if and only if it can be uniformly approximated by polynomial functions.

2.4. Profinite algebras. This subsection is mostly based on [10], where the reader may find further details. For a class T of topological algebras, a pro-T algebra is a compact algebra that is residually in T. A profinite algebra is a pro-T algebra where T is the class of all finite algebras. An inverse system ℐ = (I, S_i, φ_ij) of topological algebras consists of a family (S_i)_{i∈I} of such algebras, indexed by a directed set I, together with a family (φ_ij)_{i,j∈I; i≥j} of functions, the connecting homomorphisms, such that the following conditions hold:
i. each φ_ij is a continuous homomorphism S_i → S_j;
ii. each φ_ii is the identity function on S_i;
iii. for all i, j, k ∈ I such that i ≥ j ≥ k, the equality φ_jk ∘ φ_ij = φ_ik holds.
The inverse limit of an inverse system ℐ = (I, S_i, φ_ij) is the subspace lim← ℐ of ∏_{i∈I} S_i consisting of the families (s_i)_{i∈I} such that φ_ij(s_i) = s_j whenever i ≥ j. Note that, in case lim← ℐ is nonempty, it is a subalgebra of ∏_{i∈I} S_i and, therefore, a topological algebra. The inverse limit may be empty. For instance, the inverse limit of the inverse system (N, [n, +∞[, φ_nm) is empty, where the intervals are viewed as semilattices under the usual ordering and with the inclusion mappings as connecting homomorphisms φ_nm. In contrast, if all the S_i are compact algebras, then so is lim← ℐ, see Exercise 29C in [109]. The following is a key property of pro-V algebras for a pseudovariety V.

Proposition 2.6. Let V be a pseudovariety, S a pro-V algebra, and φ: S → T a continuous homomorphism onto a finite algebra. Then T belongs to V.

More generally, for a pseudoquasivariety Q, the following alternative characterisations of pro-Q algebras are straightforward extensions of the pseudovariety case for semigroups, which can be found, for instance, in Proposition 4.3 in [10].

Proposition 2.7. Let Q be a pseudoquasivariety. Then the class Q̄ of all pro-Q algebras consists of all inverse limits of algebras from Q and it is the smallest class of topological algebras containing Q that is closed under taking isomorphic algebras, closed subalgebras, and arbitrary direct products. The classes Q̄ and Q have the same finite members. In case Q is a pseudovariety, the class Q̄ is additionally closed under taking profinite continuous homomorphic images.
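A toy illustration of the compatibility condition defining lim← ℐ — an example chosen by us, not taken from the chapter — is the pro-p completion of the integers, with the finite cyclic groups Z/p^k Z as the algebras S_i and reduction modulo p^j as the connecting homomorphisms.

```python
def in_inverse_limit(family, levels, connect):
    """Check that a family (s_i) indexed by `levels` lies in the inverse
    limit: connect(i, j, s_i) must equal s_j whenever i >= j."""
    return all(connect(i, j, family[i]) == family[j]
               for i in levels for j in levels if i >= j)

# Inverse system: S_k = Z/p^k Z, connecting maps = reduction mod p^j (i >= j).
p, depth = 3, 6
levels = range(1, depth + 1)
connect = lambda i, j, s: s % p**j

minus_one = {k: p**k - 1 for k in levels}   # the compatible family representing -1
print(in_inverse_limit(minus_one, levels, connect))   # True

broken = {**minus_one, 3: 5}                # perturb one coordinate
print(in_inverse_limit(broken, levels, connect))      # False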


Since every compact metric space is a continuous image of the Cantor set (see Theorem 30.7 in [109]), the profiniteness assumption in the second part of Proposition 2.7 cannot be dropped. The nontrivial parts of the next theorem were first observed in [36] to follow from the arguments in [4], which in turn extend the case of semigroups, due to Numakura [80], through the approach of Hunter [66]. The key ingredient is the following lemma, first stated explicitly and proved by Hunter in Lemma 4 in [66] for semigroups although, in this case, it can also be extracted from [80].

Lemma 2.8. Let S be a compact zero-dimensional algebra and let L be a subset of S for which the syntactic congruence is determined by finitely many terms. Then L is recognisable if and only if L is clopen.

The reader may wish to compare Lemma 2.8 with Proposition 2.2(3) and the subsequent comments.

Theorem 2.9. Let S be a compact algebra and consider the following conditions:
1. S is profinite;
2. S is an inverse limit of an inverse system of finite algebras;
3. S is isomorphic to a closed subalgebra of a direct product of finite algebras;
4. S is a compact zero-dimensional algebra.
Then the implications (1) ⟺ (2) ⟺ (3) ⟹ (4) always hold, while (4) ⟹ (3) also holds in case the syntactic congruence of S is determined by a finite number of terms.

One can find in [46] explicit proofs of Lemma 2.8 and Theorem 2.9. As mentioned in § 2.1, the same paper provides characterisations of the finiteness assumption in Theorem 2.9. In particular, compact zero-dimensional semigroups, monoids, groups, rings, and lattices in finitely generated varieties of lattices are profinite. The finitely generated case of the following variant of Lemma 2.8 can be found in [4]. The essential step for the proof of the general case can be found in Lemma 4.1 in [10].

Proposition 2.10. Let Q be a pseudoquasivariety and let S be a pro-Q algebra. Then a subset L of S is clopen if and only if it is Q-recognisable, if and only if it is recognisable. In particular, the topology of S is the smallest topology for which all continuous homomorphisms from S into algebras from Q (or, alternatively, into finite algebras) are continuous with respect to it. Hence, a topological algebra is a pro-Q algebra if and only if it is compact and its topology coincides with its pro-Q topology.

A way of constructing profinite algebras is via the Hausdorff completion of an arbitrary topological algebra S with respect to its pro-Q uniformity. We denote this completion by C_Q(S). The next result can be easily deduced from Propositions 2.1, 2.2, and 2.4.


Proposition 2.11. Let S be a topological algebra and Q a pseudoquasivariety. Then C_Q(S) is a pro-Q algebra. Moreover, if S is residually in Q, then the topology of S coincides with the induced topology as a subspace of C_Q(S).

It is important to keep in mind that the topology of a pro-Q algebra S may not be its pro-Q topology when S is viewed as a discrete algebra. To give an example, we introduce a pseudovariety which is central in the theory of finite semigroups: the class A of all finite aperiodic semigroups, that is, of all finite semigroups whose subgroups are trivial.

Example 2.2. Let N be the discrete additive semigroup of natural numbers and consider its pro-A completion C_A(N), which is obtained by adding one point, denote it ∞, which is such that n + ∞ = ∞ + n = ∞ and lim n = ∞. Then the mapping that sends natural numbers to 1 and ∞ to 0 is a homomorphism into the semilattice {0, 1} which is not continuous for the topology of C_A(N) but which is continuous for the pro-A topology.

In contrast, it is a deep and difficult result that, for every finitely generated profinite group, its topology coincides with its profinite topology as a discrete group [79]. The proof of this result depends on the classification of finite simple groups.

The Q-recognisable subsets of an algebra S constitute a subalgebra P_Q(S) of the Boolean algebra P(S) of all its subsets. On the other hand, a compact zero-dimensional space is also known as a Boolean space. The two types of Boolean structures are linked through Stone duality (cf. § IV.4 in [45]), whose easily described direction associates with a Boolean space its Boolean algebra of clopen subsets; every Boolean algebra is obtained in this way. The following result shows that the Boolean space C_Q(S) and the Boolean algebra P_Q(S) are Stone duals. In it, we adopt a convenient abuse of notation: for the natural mapping ι: S → C_Q(S) and a subset K of C_Q(S), we write K ∩ S for ι⁻¹(K), while, for a subset L of S, we write L̄ for the closure of L in C_Q(S).

Theorem 2.12. Let Q be a pseudoquasivariety and let S be an arbitrary topological algebra. Then the following are equivalent for a subset L of S:
1. the set L is Q-recognisable;
2. the set L is of the form K ∩ S for some clopen subset K of C_Q(S);
3. the set L̄ is open and L̄ ∩ S = L.
When the pro-Q topology of S is discrete, a further equivalent condition is that L̄ is open. Moreover, the clopen sets of the form L̄ with L a Q-recognisable subset of S form a basis of the pro-Q topology of S.
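A small numerical illustration of Example 2.2 — our own sketch, with the semilattice operation on {0, 1} taken to be ordinary multiplication — checks that the map in question is a homomorphism and exhibits its failure of continuity along the sequence n → ∞.

```python
import math

INF = math.inf                      # the single point added to N in C_A(N)

def add(m, n):                      # addition on C_A(N) = N union {inf}
    return INF if INF in (m, n) else m + n

def h(x):                           # the map of Example 2.2
    return 0 if x == INF else 1

samples = list(range(5)) + [INF]
# h is a homomorphism into the semilattice ({0, 1}, *):
print(all(h(add(m, n)) == h(m) * h(n) for m in samples for n in samples))  # True
# but it is not continuous for the topology of C_A(N): n -> inf while h(n) = 1, h(inf) = 0
print([h(n) for n in range(5)], h(INF))    # [1, 1, 1, 1, 1] 0
```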

Since C_Q(S) has further structure involved besides its topology, which is the only structure that intervenes in Stone duality, one may ask what further structure is reflected in the Boolean algebra. This question has been investigated in [57] and [58], in the context of the theory of semigroups and its connections with regular languages. For a topological algebra S, we denote by End(S) the monoid of continuous endomorphisms of S. It can be viewed as a subspace of the product space S^S, that is


with the pointwise convergence topology. A classical alternative is the compact-open topology, for which a basis consists of all sets of the form (K, U), which in turn consist of all self-maps φ of S such that φ(K) ⊆ U, where K is compact and U is open. These two topologies on a space of self-maps of S in general do not coincide. However, for finitely generated profinite algebras they coincide on End(S). This was first proved by Hunter (Proposition 1 in [65]) and rediscovered by the first author (Theorem 4.14 in [12]) in the context of profinite semigroups. Steinberg [105] showed how this is related with the classical theorem of Ascoli on function spaces. The proofs extend easily to an arbitrary algebraic setting.

Theorem 2.13. For a finitely generated profinite algebra S, the pointwise convergence and compact-open topologies coincide on End(S) and turn it into a profinite monoid such that the evaluation mapping End(S) × S → S, sending (φ, s) to φ(s), is continuous.

A further result from [105] that extends to the general algebraic setting is that finitely generated profinite algebras are Hopfian in the sense that all continuous onto endomorphisms are automorphisms. Denote by Aut(S) the group of units of End(S), consisting of all continuous automorphisms of S whose inverse is also continuous, the latter restriction being superfluous in case S is compact. From Theorem 2.13, it follows that, for a finitely generated profinite algebra S, Aut(S) is a profinite group. In case S is a profinite group, this result as well as the Hopfian property of S are well known in group theory [99].

2.5. Relatively free profinite algebras. Let Q be a pseudoquasivariety. We say that a pro-Q algebra S is free pro-Q over a set A if there is a mapping ι: A → S satisfying the following universal property: for every function φ: A → T into a pro-Q algebra, there is a unique continuous homomorphism φ̂: S → T such that φ̂ ∘ ι = φ. The mapping ι is usually not unique and it is said to be a choice of free generators. The following result is well known [10].

Proposition 2.14. For every pseudoquasivariety Q and every set A, there exists a free pro-Q algebra over A, namely the inverse limit of all A-generated algebras from Q, with connecting homomorphisms respecting the choice of generators. Up to isomorphism respecting the choice of free generators, it is unique.

We denote the free pro-Q algebra over a set A by Ω̄_A Q. The notation is justified below. An alternative way of constructing free pro-Q algebras is through the pro-Q Hausdorff completion of free algebras.

Proposition 2.15. Let Q be a pseudoquasivariety and let A be a set. Let V be the variety generated by Q. Then the pro-Q Hausdorff completion of the free algebra FA V is a free pro-Q algebra over A.


Note that, by Proposition 2.3, if A is finite, then Ω̄_A Q is metrisable. In contrast, the argument presented at the end of § 3 of [32] for pseudovarieties of monoids may be extended to every nontrivial pseudoquasivariety Q to show that, if A is infinite, then Ω̄_A Q is not metrisable.

A topological algebra S is self-free with basis A if A is a generating subset of S such that every mapping A → S extends uniquely to a continuous endomorphism of S.

Theorem 2.16. The following conditions are equivalent for a profinite algebra S:
1. the topological algebra S is self-free with basis A;
2. there is a pseudoquasivariety Q such that S is isomorphic with Ω̄_A Q;
3. there is a pseudovariety V such that S is isomorphic with Ω̄_A V.

Proof. The implications (3) ⟹ (2) ⟹ (1) are obvious, so it remains to prove that (1) ⟹ (3). Suppose that (1) holds and let V be the pseudovariety generated by all finite algebras that are continuous homomorphic images of S. We claim that S is isomorphic with Ω̄_A V.

We first observe that, since S is a profinite algebra, it is an inverse limit of finite algebras, which may be chosen to be continuous homomorphic images of S. Hence S is a pro-V algebra and, therefore, there is a unique continuous homomorphism φ: Ω̄_A V → S such that, for a choice of free generators ι: A → Ω̄_A V, the composite φ ∘ ι is the inclusion mapping A ↪ S. Since S is generated by A as a topological algebra, the function φ is surjective. It suffices to show that it is injective.

Let u, v be distinct points of Ω̄_A V. Since Ω̄_A V is residually in V, there is some continuous homomorphism ψ: Ω̄_A V → T, onto some T ∈ V, such that ψ(u) ≠ ψ(v). By the definition of V, there are continuous homomorphisms θ_i: S → V_i (i = 1, …, n) onto finite algebras, a subalgebra U of ∏_{i=1}^{n} V_i, and a surjective homomorphism ρ: U → T. Since ρ is surjective, there is a mapping λ: A → U such that ρ ∘ λ = ψ ∘ ι. Let π_i: ∏_{j=1}^{n} V_j → V_i be the i-th component projection. Since θ_i is surjective, there is a function λ_i: A → S such that θ_i ∘ λ_i = π_i ∘ λ. By self-freeness of S, with basis A, it follows that there is a continuous endomorphism λ̂_i of S such that λ̂_i|_A = λ_i. Let θ: S → ∏_{i=1}^{n} V_i be the unique continuous homomorphism such that π_i ∘ θ = θ_i ∘ λ̂_i for i = 1, …, n. The following diagram depicts the relationships between these mappings:

[Commutative diagram relating A, Ω̄_A V, S, T, U, ∏_{j=1}^{n} V_j and V_i through the mappings ι, φ, ψ, ρ, λ, λ_i, λ̂_i, θ, θ_i and π_i.]

Note that π_i ∘ θ|_A = θ_i ∘ λ_i = π_i ∘ λ for i = 1, …, n, which shows that θ|_A = λ, and so the image of θ is contained in U and the chain of equalities ρ ∘ θ ∘ φ ∘ ι = ρ ∘ θ|_A = ρ ∘ λ = ψ ∘ ι holds, which yields ρ ∘ θ ∘ φ = ψ. Since ψ(u) ≠ ψ(v), we deduce that φ(u) ≠ φ(v), which establishes the claim that φ is injective.

Theorem 2.16 not only gives a characterisation of relatively free profinite algebras in terms of properties that only involve the algebras themselves, but also shows that, when talking about such algebras, we may as well deal only with pseudovarieties. Yet another description of relatively free profinite algebras is given by algebras of implicit operations, which further provide a useful viewpoint. For a class C of profinite algebras and a set A, an A-ary implicit operation w on C is a correspondence associating with each S 2 C a continuous operation wS W S A ! S such that, for every continuous homomorphism 'W S ! T between members of C, the equality wT .' ıf / D '.wS .f // holds for every f 2 S A . We call wS the interpretation of w in S .

Proposition 2.17. Let C be a class of finite algebras, let V be the pseudovariety it generates, and let A be a set. For w ∈ Ω̄_A V and a pro-V algebra S, let w̄_S: S^A → S be defined by w̄_S(φ) = φ̂(w), where φ̂ is the unique continuous homomorphism Ω̄_A V → S such that φ̂ ∘ ι = φ. Then w̄ is an A-ary implicit operation on the class of all pro-V algebras and every such operation is of this form. Moreover, the correspondence associating to w the restriction of w̄ to C is injective and, therefore, so is the correspondence w ↦ w̄.

Thus, we may as well identify each w ∈ Ω̄_A V with the implicit operation w̄ that it determines. In terms of implicit operations, the interpretation of the basic operations is quite transparent: for an n-ary operation symbol f, implicit operations w₁, …, w_n ∈ Ω̄_A V, a pro-V algebra S, and a function φ ∈ S^A,

(f^{Ω̄_A V}(w₁, …, w_n))_S(φ) = f^S((w₁)_S(φ), …, (w_n)_S(φ)).

In other words, the basic operations are interpreted pointwise. Among the implicit operations on the class of all profinite algebras, we have the projections x_a. More precisely, for a set A and a ∈ A, the A-ary projection on the a-component is interpreted in a profinite algebra S by (x_a)_S(φ) = φ(a) for each φ ∈ S^A. By restriction to pro-V algebras, we also obtain corresponding implicit operations, which we still denote x_a. The subalgebra of Ω̄_A V generated by the x_a with a ∈ A is denoted Ω_A V. Its elements are also known as A-ary explicit operations on pro-V algebras. From the universal property of Ω̄_A V, it follows immediately that Ω_A V is the free algebra F_A 𝒱, where 𝒱 is the variety generated by V. The following result explains the notation.

Proposition 2.18. Let V be a pseudovariety. Then the algebra Ω_A V is dense in Ω̄_A V.

The operational point of view has the advantage that pro-V algebras are automatically endowed with a structure of profinite algebras over any enriched signature obtained by adding implicit operations on V. This idea is essential for § 2.6. A formal equality u = v between members of some Ω̄_A V is said to be a pseudoidentity for V and the elements of A are called the variables of the pseudoidentity. It is said to hold in a pro-V algebra S if u_S = v_S. In case V is the pseudovariety of all finite algebras, we omit reference to V. For a set Σ of pseudoidentities for V, the class of all algebras from V that satisfy all pseudoidentities from Σ is denoted ⟦Σ⟧; this class is said to be defined by Σ and Σ to be a basis of pseudoidentities for it.


Theorem 2.19 (Reiterman [93]). A subclass of a pseudovariety V is a pseudovariety if and only if it is defined by some set of pseudoidentities for V. There are many alternative proofs of Reiterman’s theorem, as well as extensions to various generalisations of the algebras considered in this chapter. The most relevant in the context of this handbook seems to be the one obtained by Molchanov [77] for “pseudovarieties” of algebras with predicates, also proved independently by Pin and Weil [88]. The interest in Reiterman’s theorem stems from the fact that it provides a language to obtain elegant descriptions of pseudovarieties. Moreover, namely through the techniques described in the next subsection, they sometimes lead to decidability results, even if in a somewhat indirect way. 2.6. Decidability and tameness. In the theory of regular word or tree languages, pseudovarieties serve the purpose of providing an algebraic classification tool for certain combinatorial properties. The properties that are amenable to this approach have been identified, first by Eilenberg [56] for word languages, and later by the first author in [5] and [6] and by Steinby in [106] for tree languages. By considering additional relational structure on the algebras, further combinatorial properties may be captured (see [83] and [91]). Basically, in such an algebraic approach, one seeks to decide whether a language has a certain combinatorial property by testing whether its syntactic algebra has the corresponding algebraic property, that is, if this algebra belongs to a certain pseudovariety. Thus, a property of major interest that pseudovarieties may have is decidability of the membership problem: given a finite algebra, decide whether or not it belongs to the pseudovariety. We then simply say that the pseudovariety is decidable. One way to establish that a pseudovariety is decidable is to prove that it has a finite basis of pseudoidentities which are equalities between implicit operations that can be effectively computed, so that the pseudoidentities in the basis can be effectively checked. In fact, for most commonly encountered implicit operations, the computation can be done in polynomial time, in terms of the size of the algebra, and so the verification of the basic pseudoidentities can then be done in polynomial time. However, many pseudovarieties of interest are not finitely based. For instance, it is easy to see that, if a pseudovariety is generated by a single algebra, then it is decidable, but it may not be finitely based, an important example being the pseudovariety generated by the syntactic monoid B21 of the language .ab/ over the 2-letter alphabet, see [81] and [100]. Moreover, contrary to a conjecture proposed by the first author [6], a pseudovariety for which the membership problem is solvable in polynomial time may not admit a finite basis of pseudoidentities [107]. Sapir has even shown that there is a finite semigroup that generates such a pseudovariety (see Theorem 3.53 in [69]). It has recently been announced by M. Jackson that the membership problem for V.B21 / is NP -hard and so, provided P ¤ NP , that problem cannot be decided in polynomial time, which would solve Problem 3.11 in [69].
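As a concrete instance of checking a finite basis of pseudoidentities in polynomial time, the following sketch (our own illustration, with an ad hoc multiplication-table encoding of a finite semigroup) decides membership in the pseudovariety A of Example 2.2, using the well-known one-element basis x^{ω+1} = x^ω.

```python
def is_aperiodic(elements, mult):
    """Check the pseudoidentity x^{omega+1} = x^{omega} on a finite semigroup
    given by its multiplication table (mult[a][b] = a*b); this decides
    membership in the pseudovariety A of finite aperiodic semigroups."""
    for x in elements:
        power, seen = x, set()
        while power not in seen:             # walk x, x^2, x^3, ...
            seen.add(power)
            if mult[power][power] == power:  # reached the idempotent x^omega
                break
            power = mult[power][x]
        if mult[power][x] != power:          # x^{omega+1} != x^{omega}
            return False
    return True

z2 = [[0, 1], [1, 0]]            # the group Z/2Z: not aperiodic
semilattice = [[0, 0], [0, 1]]   # {0, 1} under min: aperiodic
print(is_aperiodic([0, 1], z2), is_aperiodic([0, 1], semilattice))  # False True
```

The cost is one walk through the powers of each element, so the test is quadratic in the size of the semigroup, in line with the remark above that such verifications are typically polynomial.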


Pseudovarieties are often described by (infinite) generating sets of algebras. This comes about by applying some natural operator on other pseudovarieties, like the join in the lattice of pseudovarieties. In general, for any construction C.S1 ; : : : ; Sn / of an algebra from given algebras Si , perhaps under suitable restrictions or additional data (like in the definition of semidirect product, where an action of one of the factors on the other is required), one may consider the pseudovariety C.V1 ; : : : ; Vn / generated by all algebras of the form C.S1 ; : : : ; Sn / with each Si in a given pseudovariety Vi . The join is obtained in this way by considering the usual direct product. Another type of operator of interest is the following: for two pseudovarieties V and W, their Mal’cev M W is the pseudovariety generated by all algebras S for which there is a product V congruence  such that S= belongs to W, and each class which is a subalgebra belongs to V. Since most such natural operators in the case of semigroups do not preserve decidability (see [1] and [42]), it is of interest to develop methods that, under suitable additional assumptions on the given pseudovarieties, guarantee that the operator produces a decidable pseudovariety. The starting point in the profinite approach is to obtain a basis of pseudoidentities for the resulting pseudovariety. In the context of semigroups and monoids, bases theorems of this kind have been established for Mal’cev products [87] and various types of semidirect products [38]. Unfortunately, there is a gap in the proof of the latter, so that the results are only known to hold under certain additional finiteness hypotheses. 1 The bases provided by such theorems for a binary operator C.V; W/ consist of pseudoidentities which are built from pseudoidentities determined by V by substituting the variables by certain implicit operations. The implicit operations that should be considered to test membership in C.V; W/ of a given finite A-generated alx A W, determined by the gebra S are the solutions of certain systems of equations in  operator C , subject to regular constraints determined by each specific evaluation of the variables in S which is to be tested. This approach was first introduced in [8] and [7], improved in [30], and later extended in [10] and, independently and in a much more systematic way, also in [96]. The reader is referred to [7], [30], and [10] for the proofs of the results presented in this section. We proceed to formalise the above ideas. Consider a set † of pseudoidentities, which we view as a system of equations. The sides of the equations u D v in † are x X U on a suitable ambient pseudovariety U over a fixed implicit operations u; v 2  alphabet X , whose letters are called the variables of the system. We may say that † consists of U-equations to emphasise this condition. Additionally, we impose for x A U over another fixed alphabet. The each variable x a clopen constraint Kx   x A U. We say that the constrained system constraints are thus recognisable subsets of  x A U is a function such has a solution in an A-generated pro-U algebra T if W X !  xX U !  x A U and W  x A U ! T are the that the following two conditions hold, where O W  unique continuous homomorphisms respectively extending and respecting the choice of generators of T : 1 See [96] for a discussion and a general basis theorem, which in turn has not led to decidability results.


1. for each variable x 2 X , the constraint .x/ 2 Kx is satisfied; 2. for each equation u D v in †, the equality . .u// O D . .v// O holds.

The following is a simple compactness result which can be found for instance in [10]. Theorem 2.20. A system of U-equations over a set of variables X with clopen conx A U (x 2 X ) has a solution in every A-generated algebra from a given straints Kx   x A V. subpseudovariety V of U if and only if it has a solution in  If the set of variables X is finite, which we assume from here on, then there is a x A U ! S into a finite algebra S that recognises all the continuous homomorphism 'W  x A U (x 2 X ). Then the existence of a solution for the system given constraints Kx   in an A-generated algebra T 2 U is equivalent to the existence of a solution in T for the same system for at least one of a certain set of constraints of the form Kx0 D ' 1 .s/ with s 2 S . Thus, one may prefer to give the constraints in the form of a function X ! S into an A-generated finite algebra S . Another formulation of the above ideas is in terms of relational morphisms, which is the perspective initially taken in [7] and which prevails in [96]. A relational morphism between two topological algebras S and T is a closed subalgebra  of the direct product S  T whose projection in the first component is onto. Note that, if S and T are pro-U algebras then so is  and if  is A-generated, then the induced continuous x A U ! S and W  x A U ! T are such that  is obtained by homomorphisms 'W  1 x x A U  T . This is called a canonical composing the relations '  S  A U and   factorisation of . An example of such a relational morphism is obtained as follows. Let 'W A ! S be a generating mapping for a pro-U algebra S and let V be a subpseudovariety of U. x A U ! S and W  xAU !  x AV Consider the unique continuous homomorphisms 'W O  1 respecting the choice of generators. Then V;A D 'O is a relational morphism from x A V. S to  We say that the system of U-equations † with constraints given by a function W X ! S into a finite algebra S is inevitable with respect to a relational morphism   S  T , where T is a profinite algebra, if there is a continuous homomorphism x X U ! T such that the following conditions hold: ıW  1. for each variable x 2 X , the constraint ..x/; ı.x// 2  is satisfied; 2. for each equation u D v in †, the equality ı.u/ D ı.v/ holds. One can easily check that this property is equivalent to the existence of a solution of x A U, where  D 'O 1 is the the system subject to the constraints Kx D 'O 1 ..x//   canonical factorisation associated with a finite generating set A for . Theorem 2.20 then yields the following similar compactness theorem for inevitability. Theorem 2.21. For a system of U-equations over a finite set X , of variables, with constraints given by a mapping X ! S into a finite algebra S and a subpseudovariety V of U, the following conditions are equivalent: 1. the constrained system is inevitable with respect to every relational morphism  from S into an arbitrary algebra from V;


2. the constrained system is inevitable with respect to every relational morphism  from S into an arbitrary pro-V algebra; 3. for some finite generating set A of S , the constrained system is inevitable with respect to the relational morphism V;A ; 4. for every finite generating set A of S , the constrained system is inevitable with respect to the relational morphism V;A . Let V be a subpseudovariety of U. We say that a constrained system is V-inevitable if it satisfies the equivalent conditions of Theorem 2.21. The pseudovariety V is said to be hyperdecidable with respect to a class S of systems of U-equations with constraints in algebras from U if there is an algorithm that decides, for each constrained system in S, whether it is V-inevitable. An approach to prove hyperdecidability which was devised by Steinberg and the first author (see [30] and [31]), inspired by seminal work of Ash [41], was to draw this property from other either more familiar or more conceptual properties. Assume that the class S consists of finite systems, that it is recursively enumerable, and that the implicit operations that appear on the sides of the equations of the systems are computable. Moreover, suppose that V is recursively enumerable. One can then effectively check whether a constrained system in S is inevitable with respect to a relational morphism from the constraining algebra into an algebra from V, which gives a semi-algorithm to enumerate the constrained systems which are not V-inevitable. To decide whether x A V it thus suffices to add hypotheses a constrained system from S has a solution in  to guarantee that there is also a semi-algorithm to enumerate the systems that are Vinevitable. To do so, the idea is to prove that if the system is V-inevitable, then there is a solution of a special kind, so that the candidates for such special solutions can be effectively enumerated and whether such a candidate is indeed a solution can be effectively checked. To formalise this idea, consider a recursively enumerable set  of computable implicit operations on U, including the basic operations. We call such a set  a computable implicit signature over U. Note that every pro-U algebra has automatically the structure of a  -algebra (see Proposition 2.17). For a subpseudovariety V of U, we x A V generated by A. It follows from the definition denote by A V the  -subalgebra of   of free pro-V algebra that A V is freely generated by A in the variety of  -algebras generated by V. The word problem for A V consists in, given two  -terms over the alphabet A, deciding whether they represent the same element of A V. We may now state the following key definition. Definition 2.2. Let V be a recursively enumerable subpseudovariety of U and let S be a class of constrained systems of U-equations. We say that V is  -reducible with respect x A U in  x A V, it has a to S if, whenever a constrained system in S has a solution W X !  x A V. If, moreover, the word problem for  V is decidable, solution 0 W X ! A U in 2  A 2 A topological formulation of the notion of  -reducibility was recently found in [26]. It simply states x A V from S taking values in  U that, for each system from S, forgetting the constraints, the solutions in  A x A V. are dense in the set of all solutions in 


If, moreover, the word problem for $\Omega^\sigma_A V$ is decidable, then we say that V is $\sigma$-tame with respect to $\mathcal S$. We say that V is completely $\sigma$-tame if it is $\sigma$-tame with respect to the class of all finite constrained systems of equations of $\sigma$-terms. The following result summarises the above discussion.

Theorem 2.22. Let U be a recursively enumerable pseudovariety and let $\sigma$ be a computable implicit signature over U. Let $\mathcal S$ be a recursively enumerable class of constrained systems of equations between $\sigma$-terms. Finally, let V be a subpseudovariety of U. If V is $\sigma$-tame with respect to $\mathcal S$, then V is hyperdecidable with respect to $\mathcal S$.

Several important examples of tame pseudovarieties are discussed in § 3.2. Here, we only present tameness results which hold in the general algebraic context to which this section is dedicated. Before doing so, we introduce a weaker version of tameness which is also of interest.

Let $S$ be an $A$-generated algebra from U and let $\sigma$ be a computable implicit signature. The relational morphism $\bar\rho^{\,\sigma}_{V,A} \subseteq S \times \Omega^\sigma_A V$ is obtained by taking the intersection of $\mu_{V,A}$ with $S \times \Omega^\sigma_A V$. We say that V is weakly $\sigma$-reducible for a class $\mathcal S$ of constrained systems of U-equations if, for every V-inevitable constrained system in $\mathcal S$, say with constraints in the $A$-generated algebra $S \in U$, the system is inevitable with respect to the relational morphism $\bar\rho^{\,\sigma}_{V,A}$. Replacing $\sigma$-reducibility by weak $\sigma$-reducibility in the definition of $\sigma$-tameness, we speak of weak $\sigma$-tameness.

Viewing $\Omega^\sigma_A V$ as a discrete algebra, there is another natural relational morphism $\rho^\sigma_{V,A} \subseteq S \times \Omega^\sigma_A V$, namely the $\sigma$-subalgebra generated by the pairs of the form $(a, a)$ with $a \in A$. The notation is justified since, as is easily proved, the relation $\bar\rho^{\,\sigma}_{V,A}$ is the closure of $\rho^\sigma_{V,A}$ in $S \times \Omega^\sigma_A V$ with respect to the discrete topology in the first component and the pro-V topology in the second component. We say that V is $\sigma$-full if the two relational morphisms coincide for every $A$-generated algebra $S$ from U. Note that a weakly $\sigma$-reducible $\sigma$-full pseudovariety is $\sigma$-reducible. Conversely, the terminology is justified by the fact that, if V is $\sigma$-reducible with respect to a constrained system $\Sigma$ of U-equations, then it is also weakly $\sigma$-reducible with respect to $\Sigma$.

We say that the pseudovariety V has computable $\sigma$-closures if there is an algorithm which, given a finite alphabet $A$, a regular subset $L$ of $\Omega^\sigma_A V$ and an element $v \in \Omega^\sigma_A V$, determines whether or not $v$ belongs to the closure of $L$ in the pro-V topology of $\Omega^\sigma_A V$. The following combines a couple of results from [30].

Theorem 2.23. Let V be a recursively enumerable subpseudovariety of a recursively enumerable pseudovariety U, let $\sigma$ be a computable implicit signature, and suppose that the word problem for each $\Omega^\sigma_A V$ is decidable.
1. If V is $\sigma$-full, then V has computable $\sigma$-closures.
2. If V is weakly $\sigma$-reducible for a class $\mathcal S$ of constrained systems of U-equations and V has computable $\sigma$-closures, then V is hyperdecidable with respect to $\mathcal S$.

We say that a class of algebras is locally finite if all finitely generated algebras in the variety it generates are finite. This is the case, for instance, for a pseudovariety generated by a single algebra, but not every locally finite pseudovariety is of this kind.


A well-known example in the realm of semigroups is provided by the pseudovariety of all finite bands (in which every element is idempotent).

A decidable locally finite pseudovariety V is said to be order computable if the function that associates with each positive integer $n$ the cardinality of the algebra $\overline{\Omega}_n V$ is computable. It seems to be an open problem whether every locally finite pseudovariety is order computable. The following result is an immediate extension of Theorem 4.18 in [30], which is based on the "slice theorem" of Steinberg [102].

Theorem 2.24. Let V be a $\sigma$-tame pseudovariety with respect to a class $\mathcal S$ of systems of equations and let W be an order-computable pseudovariety. Then the join $V \vee W$ is also $\sigma$-tame with respect to $\mathcal S$.

One of the ingredients behind the proof of Theorem 2.24 is that $\overline{\Omega}_A W = \Omega^\sigma_A W$ for every locally finite pseudovariety W and every implicit signature $\sigma$. Under this weaker property for a computable implicit signature $\sigma$, tameness becomes much simpler. The following result is a simple corollary of some of the above results. We do not know whether the $\sigma$-fullness hypothesis can be dropped.

Proposition 2.25. Let $\sigma$ be a computable implicit signature and let V be a recursively enumerable pseudovariety such that the equality $\overline{\Omega}_A V = \Omega^\sigma_A V$ holds for every finite set $A$ and V is $\sigma$-full. Then V is completely $\sigma$-tame if and only if the word problem for each $\Omega^\sigma_A V$ is decidable.
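As a concrete illustration of order computability (an aside, not part of the results cited above): when a pseudovariety is generated by a single finite semigroup $S$, the free semigroup on $n$ generators in the variety generated by $S$ can be realised inside $S^{S^n}$ as the subsemigroup generated by the $n$ projection maps, so its cardinality, and hence the order function, is directly computable. The following Python sketch assumes that $S$ is given by its multiplication table; the function names are illustrative.

```python
from itertools import product

def free_semigroup_on_n_generators(S, mul, n):
    """Free n-generated semigroup in the variety generated by the finite
    semigroup S (elements S, multiplication table mul[a][b]).

    It is realised as the subsemigroup of S^(S^n) generated by the n
    projection maps, with pointwise multiplication; its size is the value
    at n of the order function of the pseudovariety generated by S."""
    tuples = list(product(S, repeat=n))               # the domain S^n
    # each generator is the i-th projection, stored as a tuple of values
    gens = [tuple(t[i] for t in tuples) for i in range(n)]
    elements = set(gens)
    frontier = list(gens)
    while frontier:
        new = []
        for f in frontier:
            for g in elements | set(new):
                for h in (tuple(mul[a][b] for a, b in zip(f, g)),
                          tuple(mul[a][b] for a, b in zip(g, f))):
                    if h not in elements and h not in new:
                        new.append(h)
        elements.update(new)
        frontier = new
    return elements

# Example: S = the two-element semilattice ({0, 1}, min); the variety it
# generates is that of semilattices.
S = [0, 1]
mul = {a: {b: min(a, b) for b in S} for a in S}
for n in (1, 2, 3):
    print(n, len(free_semigroup_on_n_generators(S, mul, n)))   # 1, 3, 7
```

For the two-element semilattice the printed values 1, 3, 7 agree with the count $2^n - 1$ of nonempty subsets of an $n$-element set, which is the size of the free semilattice on $n$ generators.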

3. The case of semigroups

The motivation to study profinite topologies in finite semigroup theory comes from automata and language theory: Eilenberg's correspondence theorem [56] shows the relevance of investigating pseudovarieties of semigroups and monoids. The results mentioned in this section by no means cover the whole of the literature currently available in the area. In particular, we stick to the more classical case of semigroups, while for instance the cases of ordered semigroups or stamps have come to play a significant role, as can be seen in Chapter 16. It turns out that in all these cases the same relatively free profinite semigroups intervene and the results are also often quite similar, although sometimes their proofs involve additional technical difficulties.

3.1. Computing profinite closures. There are several reasons why profinite topologies are relevant for automata theory, and § 1 and § 2 provide many of them. We start this subsection by formulating a simple problem which has a direct translation in terms of profinite topologies.

Let $A$ be a finite alphabet and let $L \subseteq A^+$ be a regular language. Membership in $L$ of a word $w \in A^+$ can be effectively tested by checking whether the action of $w$ on the initial state of the minimal automaton of $L$ leads to a final state, or whether the syntactic image of $w$ belongs to that of $L$. But one may be interested in a weaker test, such as whether $w$ may be separated from $L$ by a regular language $K$ of a particular type, for


instance a group language (i.e., a language whose syntactic semigroup is a group): does there exist a group language $K \subseteq A^+$ such that $w \in K$ and $K \cap L = \emptyset$?

Let G denote the pseudovariety of all finite groups. In view of Proposition 2.2(3), the above separation property is equivalent to being able to separate $w$ from $L$ by some open set in the pro-G topology of $A^+$. Thus, in terms of the pro-G topology, the above question translates into testing membership of $w$ in the closure of $L$ in the pro-G topology of $A^+$. More generally, we have the following result, where we denote by $\mathrm{cl}_V(L)$ the closure of $L$ in the pro-V topology of $A^+$.

Proposition 3.1. Let $A$ be a finite alphabet and let V be a pseudovariety of semigroups. For a regular language $L \subseteq A^+$, a word $w \in A^+$ can be separated from $L$ by a V-recognisable language if and only if $w \notin \mathrm{cl}_V(L)$.

Thus, for a pseudovariety V, having computable $\sigma$-closures for the implicit signature reduced to multiplication (see § 2.6) is a property of immediate automata-theoretic relevance. The special case of separation by group languages has particular historical importance. It was first considered by Pin and Reutenauer [84], who proposed the following recursive procedure to compute $\mathrm{cl}_G(L)$, where $FG(A)$ denotes the group freely generated by $A$.

Theorem 3.2 (Theorem 2.4 in [84]). Given a regular expression for a language $L \subseteq A^+$, replace the operation $K \mapsto K^+$ by that of taking the subgroup of $FG(A)$ generated by the argument $K$. The resulting expression describes a subset of $FG(A)$ and $\mathrm{cl}_G(L)$ is its intersection with $A^+$.

The correctness of the algorithm described in Theorem 3.2 was reduced to the proposed conjecture that the product of finitely many finitely generated subgroups of a free group is closed in its profinite topology, thus generalising M. Hall's result that finitely generated subgroups of the free group are closed in the profinite topology [60]. This conjecture was established by Ribes and Zalesskiĭ [97] using profinite group theory.

The original motivation for computing $\mathrm{cl}_G(L)$ comes from the fact that Pin and Reutenauer also showed that the correctness of their procedure implies that the "type II conjecture" holds. This other conjecture gives a constructive description of the group kernel $K_G(M)$ of a finite monoid $M$. More generally, for a pseudovariety H of groups, the H-kernel $K_H(M)$ consists of all $m \in M$ such that, for every relational morphism $\mu$ from $M$ to a group in H, $(m, 1)$ belongs to $\mu$. The type II conjecture states that $K_G(M)$ is the smallest submonoid of $M$ that contains the idempotents and is closed under the operation that sends $m$ to $amb$ if $aba = a$ or $bab = b$ (weak conjugation). An independent proof of the type II conjecture was obtained by Ash in [41] and is discussed in § 3.2.

The pro-V closure of regular languages in $A^+$ has also been considered for other pseudovarieties V. For a pseudovariety H of groups, the motivation comes from the membership problem in the pseudovariety W ⓜ H for a pseudovariety of monoids W. Indeed, it is easy to show that a finite monoid $M$ belongs to this Mal'cev product if and


only if $K_H(M)$ belongs to W. On the other hand, Pin [82] observed that, if $\varphi\colon A^* \to M$ is an onto homomorphism and $m \in M$, then $m \in K_H(M)$ if and only if the empty word 1 belongs to $\mathrm{cl}_H(\varphi^{-1}(m))$. Thus, we have the following result.

Proposition 3.3. Let W be a decidable pseudovariety of monoids and H a pseudovariety of groups such that one can decide whether, given a regular language $L \subseteq A^*$, the empty word belongs to $\mathrm{cl}_H(L)$. Then W ⓜ H is decidable.
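The description of $K_G(M)$ recalled above, as the smallest submonoid containing the idempotents and closed under weak conjugation, can be evaluated directly for a finite monoid given by its multiplication table. The following sketch is only an illustration of that closure computation, with illustrative names; it is not the algorithmic content of the cited results.

```python
def group_kernel(M, mul, one):
    """Group kernel K_G(M) of a finite monoid M, computed via the type II
    characterisation: the least submonoid containing the idempotents and
    closed under weak conjugation m -> a*m*b whenever aba = a or bab = b."""
    idempotents = {x for x in M if mul[x][x] == x}
    weak_pairs = [(a, b) for a in M for b in M
                  if mul[mul[a][b]][a] == a or mul[mul[b][a]][b] == b]
    K = {one} | idempotents
    changed = True
    while changed:
        changed = False
        for x in list(K):                      # close under multiplication
            for y in list(K):
                z = mul[x][y]
                if z not in K:
                    K.add(z); changed = True
        for a, b in weak_pairs:                # close under weak conjugation
            for m in list(K):
                z = mul[mul[a][m]][b]
                if z not in K:
                    K.add(z); changed = True
    return K

# Example: for a finite group the group kernel is trivial.
Z2 = [0, 1]
mul = {a: {b: (a + b) % 2 for b in Z2} for a in Z2}
print(group_kernel(Z2, mul, 0))    # {0}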

The problem of computing the pro-H closure in $A^*$ of a regular language $L \subseteq A^*$ has been considered for other pseudovarieties of groups such as Ab (Abelian groups), see [53], $G_p$ ($p$-groups), $G_{nil}$ (nilpotent groups), and $G_{sol}$ (solvable groups), see [98] and [75]. Suppose that H is a pseudovariety of groups such that the free group $FG(A)$ is residually in H. Then we have natural embeddings $A^* \hookrightarrow FG(A) \hookrightarrow \overline{\Omega}_A H$. The pro-H topology of a subalgebra of $\overline{\Omega}_A H$ is its subspace topology by Propositions 2.11 and 2.15. Thus, an equivalent problem to computing the H-kernel of a finite monoid is to decide whether 1 belongs to the closure in $FG(A)$ of a regular language $L \subseteq A^*$. In case H is closed under extensions (or, equivalently, under semidirect product), Ribes and Zalesskiĭ (Theorem 5.1 in [98]) have shown that, for the pro-H topology, the product of finitely many finitely generated closed subgroups of $FG(A)$ is also closed. Using the Pin–Reutenauer techniques, they reduced the computation of the pro-H closure of a regular language $L \subseteq A^*$ to the computation of the pro-H closure in $FG(A)$ of a given finitely generated subgroup. That these results also apply to $G_{nil}$ has been recently shown in [27]. Algorithms for the computation of the pro-$G_p$ and pro-$G_{nil}$ closures of finitely generated subgroups of a free group can be found in [98] and [75]. The case of $G_{sol}$ remains open.

For an element $s$ of a profinite semigroup $S$, $s^\omega$ denotes the unique idempotent in the closed subsemigroup $T$ generated by $s$, and $s^{\omega-1}$ the inverse of $s^{\omega+1} = s s^\omega$ in the maximal subgroup of $T$. Consider the implicit signature $\kappa$ consisting of multiplication together with the unary operation $x \mapsto x^{\omega-1}$. The free group $FG(A)$ may then be identified with the free algebra $\Omega^\kappa_A G$. This suggests generalisations of the Pin–Reutenauer procedure for computing the pro-G closure of a regular language $L \subseteq A^*$ to other pseudovarieties. The analogue of the procedure is shown in [22] to hold for the pseudovariety A (and the signature $\kappa$).

Another important application of the separation problem has been given by Place and Zeitoun [89] in the study of the decidability problem for the Straubing–Thérien hierarchy of star-free languages.

3.2. Tameness. There is a natural way of associating a system of semigroup equations with a finite digraph that is relevant for the computation of semidirect products of pseudovarieties of semigroups [7]. Namely, the variables of the system are the vertices and the arrows, and each arrow $u \xrightarrow{e} v$ gives rise to an equation $ue = v$, which relates the act of following the arrow with multiplication as in a Cayley graph. The term tameness was introduced in [31] to refer to tameness with respect to such systems of equations in the sense of § 2.6. To avoid confusion, we prefer to call it here graph tameness.
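The passage from a digraph to a system of equations just described is easy to make concrete: each arrow $e\colon u \to v$ contributes one equation $ue = v$, and a candidate assignment of elements of a finite monoid to the variables can be checked by multiplying out both sides. The following minimal sketch, with illustrative names, does exactly this.

```python
def graph_equations(edges):
    """One variable per vertex and per edge; each edge e : u -> v yields
    the equation u e = v (the digraph-to-equations construction above)."""
    return [([u, e], [v]) for e, (u, v) in edges.items()]

def satisfies(assignment, equations, mul):
    """Check a candidate assignment of monoid elements to variables against
    the equations, multiplying out both sides in the finite monoid `mul`."""
    def ev(word):
        x = assignment[word[0]]
        for y in word[1:]:
            x = mul[x][assignment[y]]
        return x
    return all(ev(lhs) == ev(rhs) for lhs, rhs in equations)

# the one-vertex, one-loop digraph: the single equation x y = x
eqs = graph_equations({"y": ("x", "x")})
Z2 = {a: {b: (a + b) % 2 for b in (0, 1)} for a in (0, 1)}
print(satisfies({"x": 0, "y": 0}, eqs, Z2), satisfies({"x": 0, "y": 1}, eqs, Z2))
# True  False
```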

638

Jorge Almeida and Alfredo Costa

We adopt a similar convention for other properties parametrised by systems of equations, such as hyperdecidability and reducibility.

For example, for the one-vertex one-loop digraph, the corresponding equation is $xy = x$. It is easy to verify that, with constraints given by a function $\chi$ into a finite monoid $M$, this equation is G-inevitable if and only if $\chi(y) \in K_G(M)$. For the two-vertex digraph with $n$ arrows from one vertex to the other one, the associated system of equations has the form $x y_i = z$ ($i = 1, \ldots, n$). If $\chi$ is a constraining function into a finite monoid $M$ such that $\chi(x) = 1$, then the system is G-inevitable if and only if, for every relational morphism $\mu$ from $M$ into an arbitrary $G \in \mathrm G$, there is some $g \in G$ such that $\chi(\{y_1, \ldots, y_n, z\}) \times \{g\} \subseteq \mu$. Replacing G by an arbitrary pseudovariety V of monoids, the latter condition is expressed by saying that the subset $\chi(\{y_1, \ldots, y_n, z\})$ of $M$ is V-pointlike.

The first and best known example of a graph tame pseudovariety is the pseudovariety G. This result has been discovered in different disguises, first by Ash [41], as a means of establishing the type II (see § 3.1 and [68]) and pointlike [62] conjectures. In Ash's formulation, the arrows of finite digraphs are labelled with elements of a finite monoid $M$ and the result is said to be inevitable if, for every relational morphism $\mu$ from $M$ into an arbitrary finite group $G$, each label can be replaced by a $\mu$-related label in $G$ such that, for every (not necessarily directed) cycle, the product of the labels of the arrows, or of their inverses for backward arrows, is equal to 1 in $G$. In the notation of § 2.6 (and also taking into account Lemma 4.8 of [30]), Ash's theorem states that such a labelled digraph is inevitable if and only if the preceding property holds for the relational morphism $\mu_{G,A}$ associated with any choice of generating set $A$ for $M$. It then follows easily that Ash's theorem translates to the statement that G is graph $\kappa$-tame.

Fix a finite relational language. A class $\mathcal R$ of relational structures is said to satisfy the finite extension property for partial automorphisms (FEPPA) if, for every finite structure $R$ in $\mathcal R$ and every set $P$ of isomorphisms between substructures of $R$, if there exists in $\mathcal R$ an extension $S$ of $R$ in which all $f \in P$ extend to automorphisms of $S$, then there is such an extension $S$ which is finite. A homomorphism of relational structures is a function that preserves the relations in the forward direction. The exclusion of a class $\mathcal R$ of relational structures is the class of relational structures $S$ such that there is no homomorphism $R \to S$ with $R \in \mathcal R$. Herwig and Lascar [64] showed that, for a finite class $\mathcal R$ of finite relational structures, its exclusion class satisfies FEPPA. They also gave an equivalent formulation of this result in terms of a property of free groups, which Delgado and the first author (see [23] and [24]) proved to be equivalent to the graph $\kappa$-tameness of G. On the other hand, it follows from results of Coulbois and Khélif [52] that the pseudovariety G is not completely $\kappa$-tame. It would be of interest to find a signature $\sigma$ such that G is completely $\sigma$-tame, if any such signature exists.

Theorem 3.4. The pseudovariety G is graph $\kappa$-tame but not completely $\kappa$-tame.

Tameness has also been investigated for other pseudovarieties of groups. The pseudovariety Ab is completely $\kappa$-tame [25]. On the other hand, a pseudovariety of Abelian groups is completely hyperdecidable if and only if it is decidable, while it is completely $\kappa$-tame if and only if it is locally finite or Ab, see [54]. For the pseudovariety $G_p$, the situation is more complicated. Steinberg showed in Theorem 11.12 of [101] that every nontrivial extension-closed pseudovariety of groups H such that the pro-H closure of a finitely generated subgroup of a free group is again finitely generated is graph weakly $\kappa$-reducible. On the other hand, a graph $\sigma$-reducible pseudovariety must admit a basis of pseudoidentities consisting of $\sigma$-identities (see Proposition 4.2 in [30]), which, since free groups are residually in $G_p$, entails that $G_p$ is not graph $\kappa$-tame. Using symbolic dynamics techniques to generate a suitable infinite implicit signature, the first author has established the following result [9].

Theorem 3.5. There is a signature $\sigma$ such that $G_p$ is graph $\sigma$-tame.

Building on the approach of [9], Alibabaei has constructed for each decidable pseudovariety H of Abelian groups an implicit signature with respect to which H is completely tame [2], and also an implicit signature with respect to which $G_{nil}$ is graph tame [3].

A semigroup is said to be completely regular if every element lies in some subgroup. The pseudovariety CR consists of all completely regular finite monoids and OCR is the subpseudovariety consisting of those in which the idempotents constitute a submonoid. Both these pseudovarieties have been shown to be graph $\kappa$-tame (see [34] and [33]),³ results which depend heavily on Theorem 3.4 together with structure theorems for the corresponding relatively free profinite monoids.

³ The conjecture to which the graph tameness of CR is reduced in [34] has been observed by K. Auinger (private communication) to hold using the methods of [23] and [24]. There is also another difficulty, which comes from the fact that free profinite semigroupoids over profinite graphs are considered. As has been shown in [14], there are some rather delicate aspects in the description of such structures when the graph has infinitely many vertices: the free subsemigroupoid generated by a dense subgraph of the profinite graph may not be dense, and one needs in general to iterate algebraic and topological closures transfinitely. However, one can check that for the free profinite semigroupoid in question the iteration stops in one step, from which it follows that the required properties of the suitable free profinite semigroupoid are guaranteed [16].

Several aperiodic pseudovarieties have also been investigated. An interesting example is that of the pseudovariety J of all finite $\mathcal J$-trivial semigroups, corresponding to the variety of piecewise-testable languages (see [56]). For a finite alphabet $A$, the first author has shown that $\overline{\Omega}_A J = \Omega^\kappa_A J$ and also solved the word problem for $\Omega^\kappa_A J$ (see § 8.2 in [6]). Since it is an easy exercise to deduce that J is $\kappa$-full, it follows from Proposition 2.25 that J is completely $\kappa$-tame, and therefore graph hyperdecidable. The construction of a "real algorithm" to decide inevitability turns out to be much more involved [39]. For the pseudovariety R, consisting of all finite $\mathcal R$-trivial semigroups, constructing a concrete algorithm to show that R is graph hyperdecidable is technically complicated, even when only strongly connected digraphs are considered [29]. Building on seminal ideas of Makanin [74] and taking into account the structure of free pro-R semigroups [37], Costa, Zeitoun and the first author [21] have established the following result.

Theorem 3.6. The pseudovariety R is completely $\kappa$-tame.
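As a computational aside, not part of the cited results: membership of a finite semigroup in the pseudovarieties J and R mentioned above can be tested directly from its multiplication table by comparing principal ideals, since $\mathcal J$- and $\mathcal R$-triviality amount to distinct elements generating distinct two-sided, respectively right, principal ideals. The following sketch, with illustrative names, carries this out.

```python
def principal_ideals(S, mul):
    """Right ideals x S^1 and two-sided ideals S^1 x S^1 of a finite
    semigroup given by its multiplication table."""
    right, twosided = {}, {}
    for x in S:
        right[x] = frozenset([x] + [mul[x][s] for s in S])
        twosided[x] = frozenset([x] + [mul[x][s] for s in S]
                                + [mul[s][x] for s in S]
                                + [mul[mul[s][x]][t] for s in S for t in S])
    return right, twosided

def is_R_trivial(S, mul):
    right, _ = principal_ideals(S, mul)
    return all(right[x] != right[y] for x in S for y in S if x != y)

def is_J_trivial(S, mul):
    _, two = principal_ideals(S, mul)
    return all(two[x] != two[y] for x in S for y in S if x != y)

# The two-element semilattice is R- and J-trivial; the two-element group is not.
U1 = [1, 0]; mulU1 = {a: {b: a * b for b in U1} for a in U1}
Z2 = [0, 1]; mulZ2 = {a: {b: (a + b) % 2 for b in Z2} for a in Z2}
print(is_R_trivial(U1, mulU1), is_J_trivial(U1, mulU1), is_R_trivial(Z2, mulZ2))
# True True False
```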


Theorem 3.6 has been extended in [13] to pseudovarieties of the form DRH, consisting of all finite semigroups in which every regular $\mathcal R$-class is a group from the pseudovariety H of groups.

Consider next the pseudovariety LSl of all finite local semilattices, which corresponds to the variety of locally testable languages, see [56]. The proof of the following result involves very delicate combinatorics on words, see [51] and [50].

Theorem 3.7. The pseudovariety LSl is completely $\kappa$-tame.

J. Rhodes announced in a conference held in 1998 in Lincoln, Nebraska, that A is graph $\kappa$-tame. The only part of the programme to establish such a result that has been published is McCammond's solution of the word problem for $\Omega^\kappa_A A$, see [76]. Another, earlier ingredient in Rhodes' ideas comes from Henckell's computation of the A-pointlike subsets of a given finite semigroup [61]. See [90] for a different proof. In [63] there is also an alternative proof and the following generalisation.

Theorem 3.8. If $\pi$ is a recursive set of prime numbers, then there is an algorithm to compute pointlike sets of finite semigroups with respect to the pseudovariety $\overline{\mathrm G}_\pi$, consisting of all finite semigroups whose subgroups are $\pi$-groups.

Once it was discovered that there was a gap in the proof of the basis theorem (see the discussion in § 2.6), which invalidated the reduction of the decidability of the Krohn–Rhodes complexity to proving that A is tame announced in [30], Rhodes withdrew several manuscripts that he claimed would prove that A is tame. The Krohn–Rhodes complexity pseudovarieties are defined recursively by $C_0 = A$ and $C_{n+1} = C_n * G * A$, see [70], which determines a complete and strict hierarchy for the pseudovariety of all finite semigroups. Here, $*$ denotes the semidirect product of pseudovarieties of semigroups as defined in § 10.1 in [6], which is an associative operation.

It has also been investigated whether tameness is preserved under the operators of join and semidirect product. Since tameness is apparently much stronger than decidability, if tameness is preserved by semidirect product then the decidability of the Krohn–Rhodes complexity is indeed reduced to proving that A is tame. But, so far, only very special cases have been treated. An example is the following result [19], which improves on [28].

Theorem 3.9. Let V be a graph $\sigma$-tame pseudovariety and let W be an order-computable pseudovariety. Then $V * W$ is graph $\sigma$-tame.

It is also unknown whether tameness is preserved under join. Yet, several positive results have been obtained. The following theorem combines results from [30] and [20].

Theorem 3.10. Let $\mathcal C$ be a class of constrained systems of equations.
1. Let V be an order-computable pseudovariety. If a pseudovariety W is $\sigma$-tame with respect to $\mathcal C$, then so is $V \vee W$.
2. Let V be a recursively enumerable $\sigma$-full subpseudovariety of J such that the word problem for $\Omega^\sigma_A V$ is decidable. If a pseudovariety W is $\sigma$-tame with respect to $\mathcal C$, then so is $V \vee W$.


3. Let W be a pseudovariety satisfying some pseudoidentity of the form
$$x_1 \cdots x_n\, y^{\omega+1} z\, t^{\omega} = x_1 \cdots x_n\, y z\, t^{\omega}.$$

If W is $\kappa$-tame with respect to $\mathcal C$, then so is $R \vee W$.

Theorem 3.10 yields for instance that the join $J \vee G$ is graph $\kappa$-tame, a result which was also proved by Steinberg [102].
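The operations of the signature $\kappa$ from § 3.1 can be evaluated in any finite semigroup, so whether a finite semigroup satisfies a $\kappa$-pseudoidentity such as the one in Theorem 3.10(3) can be checked by brute force. The following sketch, purely illustrative and with sample semigroups chosen only for demonstration, computes $s^\omega$ as the unique idempotent power of $s$ and tests the instance of that pseudoidentity with $n = 1$.

```python
from itertools import product

def omega(s, mul):
    """s^omega: the unique idempotent in the subsemigroup generated by s."""
    powers, seen, x = [], {}, s
    while x not in seen:
        seen[x] = len(powers); powers.append(x); x = mul[x][s]
    i, p = seen[x] + 1, len(powers) - seen[x]    # index and period of s
    n = -(-i // p) * p                           # least multiple of p >= i
    return powers[n - 1]

def satisfies_pseudoidentity(S, mul):
    """Brute-force check of  x y^{omega+1} z t^omega = x y z t^omega
    (the n = 1 instance of the pseudoidentity in Theorem 3.10(3))."""
    def m(*args):
        r = args[0]
        for a in args[1:]:
            r = mul[r][a]
        return r
    for x, y, z, t in product(S, repeat=4):
        yo, to = omega(y, mul), omega(t, mul)
        if m(x, mul[yo][y], z, to) != m(x, y, z, to):
            return False
    return True

# The two-element semilattice satisfies it (y^{omega+1} = y there);
# the three-element monoid {1, a, 0} with a*a = 0 does not (take x = z = t = 1, y = a).
U1 = [1, 0]; mulU1 = {a: {b: a * b for b in U1} for a in U1}
N3 = ["1", "a", "0"]
mulN3 = {x: {y: (y if x == "1" else x if y == "1" else "0") for y in N3} for x in N3}
print(satisfies_pseudoidentity(U1, mulU1), satisfies_pseudoidentity(N3, mulN3))
# True  False
```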

4. Relatively free profinite semigroups

Several representation theorems and structural results about relatively free profinite semigroups have been obtained for various pseudovarieties, such as J ($\mathcal J$-trivial), see § 8.2 in [6], R ($\mathcal R$-trivial), see [37] and [40], DA (regular elements are idempotent), see [78], and LSl (local semilattices), see [49]. Much remains unknown, particularly in the case of pseudovarieties containing LSl. However, progress has been made in this case too. For instance, in [59] and [18] faithful representations of finitely generated free profinite semigroups over A were obtained. There is a common trend in these faithful representations of free profinite semigroups over A, R, or DA, and also in the partial faithful representations obtained in [67] and [71] for many other pseudovarieties: they consist in viewing pseudowords as linearly ordered sets whose elements are labelled with letters, generalising the fact that words are nothing else than such sets of finite cardinality.

In the most general case, that of the pseudovariety S of all finite semigroups, no meaningful faithful representation is known (albeit we can always get partial information on the elements of $\overline{\Omega}_A S$ by looking at their projection on $\overline{\Omega}_A V$, for some semigroup pseudovariety V, when a suitable representation for $\overline{\Omega}_A V$ is available). This adds motivation for studying the structure of free profinite semigroups over S and other "large" pseudovarieties. In this section we review some results on this subject, mainly about Green's relations, with an emphasis on maximal subgroups. A substantial part of the results originated in connections with symbolic dynamics, most introduced by the first author, sometimes in co-authorship. We highlight some of the progress on this front. Other approaches, for the most part developed by Rhodes and Steinberg, based on expansions of finite semigroups or on wreath product techniques, have also led to results about structural properties of free profinite semigroups over many pseudovarieties containing LSl, as is the case in [94], [104], [103], and [48]. We mention in § 4.2 two results where these other approaches played a key role, namely Theorems 4.9 and 4.10.

4.1. Connections with symbolic dynamics. For a good reference book on symbolic dynamics, see [72]. Even though an introduction to symbolic dynamics appears in Chapter 27, for the convenience of the presentation we include our own brief introduction.

Let $A$ be a finite alphabet. Since $A$ is compact, the product space $A^{\mathbb Z}$ is compact. The shift on $A^{\mathbb Z}$ is the homeomorphism $\sigma\colon A^{\mathbb Z} \to A^{\mathbb Z}$ sending $(x_i)_{i\in\mathbb Z}$ to $(x_{i+1})_{i\in\mathbb Z}$.


A symbolic dynamical system, also called shift space or subshift, is a nonempty⁴ closed subspace X of $A^{\mathbb Z}$ such that $\sigma(X) = X$, for some finite alphabet $A$. A shift space X is minimal if X does not contain subshifts other than X. A block of $(x_i)_{i\in\mathbb Z}$ is a word $x_i x_{i+1} \cdots x_{i+n}$, with $i \in \mathbb Z$ and $n \geq 0$. Let $B(X)$ denote the set of all blocks of elements of X. One has $X \subseteq Y$ if and only if $B(X) \subseteq B(Y)$.

⁴ The empty set is frequently considered a subshift in the literature (e.g., in Chapter 27).

Often, one may define a subshift by an effectively computable amount of data. This happens, for example, if $B(X)$ is a rational language, in which case we say that X is sofic. Sofic subshifts are considered in Chapter 27. Another class of examples, extensively studied, comes from subshifts defined by primitive substitutions, see [92]. Here, by a substitution over a finite alphabet $A$ we mean an endomorphism $\varphi$ of $A^+$. A substitution $\varphi$ over $A$ is primitive if there is $n \geq 1$ such that all letters of $A$ are factors of $\varphi^n(a)$, for every $a \in A$. For such a primitive substitution, there is a unique minimal subshift $X_\varphi$ such that $B(X_\varphi)$ is the set of all factors of words of the form $\varphi^n(a)$, where $n \geq 1$ and $a \in A$.

A subset $L$ of a semigroup $S$ is irreducible if $u, v \in L$ implies $uwv \in L$ for some $w \in S$. A subshift X of $A^{\mathbb Z}$ is irreducible if $B(X)$ is an irreducible language of $A^+$. Minimal subshifts are irreducible. A subshift X is periodic if X is a finite set of the form $\{\sigma^n(x) : n \in \mathbb Z\}$ for some $x \in X$. An irreducible subshift is either periodic or infinite.

For the remainder of this subsection, V is a pseudovariety of semigroups containing all finite nilpotent semigroups. Then $A^+$ is isomorphic with $\Omega_A V$ and embeds in $\overline{\Omega}_A V$. The elements of $A^+$ are isolated in $\overline{\Omega}_A V$. Hence $\overline{L} \cap A^+ = L$ holds for every language $L$ of $A^+$. Therefore $\overline{B(X)}$ captures all the information about X. Clearly, $B(X)$ is closed under taking factors; when V = A ⓜ V, the topological closure of $B(X)$ in $\overline{\Omega}_A V$ is also closed under taking factors, a fact that follows from the multiplication being open in $\overline{\Omega}_A V$ when V = A ⓜ V, cf. Lemma 2.3 and Proposition 2.4 in [14].

Using a compactness argument, in case X is irreducible one shows the existence of a unique $\leq_{\mathcal J}$-minimal $\mathcal J$-class $J_V(X)$ consisting of factors of $\overline{B(X)}$. If V contains LSl, then $\overline{B(X)}$ consists of the elements of $\overline{\Omega}_A V$ whose finite factors belong to $B(X)$, see [47]. From this one gets the following proposition, which is a particular case of Proposition 3.6 in [48].

Proposition 4.1. Let V be a pseudovariety of semigroups containing LSl. Let X and Y be irreducible subshifts. Then $X \subseteq Y$ if and only if $J_V(Y) \leq_{\mathcal J} J_V(X)$.

The following result is taken from [11].

Theorem 4.2. If V contains LSl, then the mapping $X \mapsto J_V(X)$ is a bijection from the set of minimal subshifts of $A^{\mathbb Z}$ onto the set of $\leq_{\mathcal J}$-maximal regular $\mathcal J$-classes of $\overline{\Omega}_A V$.

If $|A| \geq 2$, then there are $2^{\aleph_0}$ minimal subshifts of $A^{\mathbb Z}$ (cf. Chapter 2 in [73]), and a chain with $2^{\aleph_0}$ irreducible subshifts of $A^{\mathbb Z}$, see § 7.3 in [108]. Hence, from Theorem 4.2 and Proposition 4.1 we obtain the following result (a weaker version appears in [49]).


Theorem 4.3. Let V be a pseudovariety containing LSl and let $A$ be an alphabet with at least two letters. For the relation $\leq_{\mathcal J}$ on the regular $\mathcal J$-classes of $\overline{\Omega}_A V$, there are both chains and anti-chains with $2^{\aleph_0}$ elements.

For a pseudoword $w \in \overline{\Omega}_A V$, let $h(w)$ denote its entropy, a measure of the exponential growth rate (in base 2) of the number of distinct factors of $w$ of each given length, introduced and studied in [35].

Proposition 4.4. Suppose that $|A| \geq 2$ and let $w \in \overline{\Omega}_A V$. Then $h(w) \leq \log_2 |A|$, and equality holds if and only if $w$ belongs to the minimum ideal of $\overline{\Omega}_A V$.
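For a concrete feeling for the quantity behind $h$, the following sketch, purely illustrative, collects the length-$n$ blocks of the Prouhet–Thue–Morse substitution subshift from a long prefix of a fixed point and prints $(1/n)\log_2$ of their number; for this substitution the block counts grow only linearly with $n$, so the printed values drift towards 0, well below the bound $\log_2|A| = 1$ of Proposition 4.4.

```python
from math import log2

def blocks(phi, letter, length, iterations=12):
    """Length-`length` blocks of the substitution subshift X_phi, collected
    from the prefix phi^iterations(letter); for small lengths this prefix
    already contains every block of the subshift."""
    w = letter
    for _ in range(iterations):
        w = "".join(phi[c] for c in w)
    return {w[i:i + length] for i in range(len(w) - length + 1)}

tm = {"a": "ab", "b": "ba"}                    # Prouhet--Thue--Morse substitution
for n in (1, 2, 4, 8, 16):
    bn = blocks(tm, "a", n)
    print(n, len(bn), round(log2(len(bn)) / n, 3))
```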

For each $k$ such that $0 < k \leq \log_2 |A|$, consider the set $E_k$ of all $w \in \overline{\Omega}_A V$ with $h(w) < k$. In particular, thanks to Proposition 4.4, $E_{\log_2|A|}$ is the complement of the minimum ideal of $\overline{\Omega}_A V$. The following summarises the most important results from [35].

Theorem 4.5. Let V be a pseudovariety containing LSl and suppose $0 < k \leq \log_2 |A|$.
1. For all $u, v \in \overline{\Omega}_A V$, we have $h(uv) = \max\{h(u), h(v)\}$, and so $E_k$ is a subsemigroup of $\overline{\Omega}_A V$. In particular, the minimum ideal is prime.
2. The set $E_k$ is stable under the application of every $n$-ary implicit operation $w$ whose entropy $h(w)$ is sufficiently small (an explicit bound in terms of $k$, $\log_2|A|$ and $\log_n|A|$ is given in [35]).
3. The set $E_k$ is also stable under the iterated application of a continuous endomorphism $\varphi$ of $\overline{\Omega}_A V$ such that $\varphi(A) \subseteq E_k$, in the following sense: if $\psi$ belongs to the closed subsemigroup of $\mathrm{End}(\overline{\Omega}_A V)$ generated by $\varphi$, then $\psi(E_k) \subseteq E_k$.

If Y is a proper subshift of an irreducible sofic subshift X, then $h(Y) < h(X)$, see Corollary 4.4.9 in [72]. By a reduction to this result, the following theorem, generalising some of the above-mentioned properties of the minimum ideal of $\overline{\Omega}_A V$, is proved in [48].

Theorem 4.6. Let V be a pseudovariety of semigroups containing LSl and let X be a sofic subshift of $A^{\mathbb Z}$. Suppose that V = A ⓜ V or that $B(X)$ is V-recognisable. Then $h(w) < h(X)$ whenever $w \in \overline{B(X)} \setminus J_V(X)$. Moreover, $\overline{\Omega}_A V \setminus J_V(X)$ is a subsemigroup of $\overline{\Omega}_A V$.

4.2. Closed subgroups of relatively free profinite semigroups. Note that maximal subgroups of profinite semigroups are closed. If a closed subsemigroup of a profinite semigroup is a group then, for the induced topology, it is a profinite group. This subsection presents results on the structure of closed subgroups of relatively free profinite semigroups, with an emphasis on maximal subgroups, using symbolic dynamics. We shall see examples of maximal subgroups that are (relatively) free profinite groups. When $A$ is a finite set and H is a nontrivial pseudovariety of groups, it is customary to refer to the cardinal of $A$ as being the rank of $\overline{\Omega}_A H$.


A retract of $\overline{\Omega}_A S$ is the image of a continuous idempotent endomorphism of $\overline{\Omega}_A S$. The free profinite subgroups of $\overline{\Omega}_A S$ of rank $|A|$ that are retracts of $\overline{\Omega}_A S$ are characterised in Theorem 4.4 in [35]. Combining that characterisation with results from [11] leads to the following theorem.

Theorem 4.7. For every finite alphabet $A$, there are maximal subgroups $H$ of $\overline{\Omega}_A S$ such that $H$ is a retract of $\overline{\Omega}_A S$ and a free profinite group of rank $|A|$.
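The results below concern subshifts defined by primitive substitutions. As a small computational aside, not taken from the cited works, the following sketch checks, for a substitution given as a dictionary, primitivity, properness (all images begin with a common letter and end with a common letter, as defined below), and the elementary necessary condition for the extension of the substitution to the free group to be invertible, namely that its incidence matrix have determinant $\pm 1$.

```python
def incidence_matrix(phi, letters):
    """M[i][j] = number of occurrences of letters[i] in phi(letters[j])."""
    return [[phi[b].count(a) for b in letters] for a in letters]

def is_primitive(phi, letters, max_power=20):
    """phi is primitive if some phi^n(a) contains every letter, for all a
    (checked here only up to the bound max_power)."""
    words = {a: a for a in letters}
    for _ in range(max_power):
        words = {a: "".join(phi[c] for c in w) for a, w in words.items()}
        if all(set(w) == set(letters) for w in words.values()):
            return True
    return False

def is_proper(phi, letters):
    """All images start with one common letter and end with one common letter."""
    return (len({phi[a][0] for a in letters}) == 1
            and len({phi[a][-1] for a in letters}) == 1)

def det(M):
    """Integer determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

tm = {"a": "ab", "b": "ba"}     # Prouhet--Thue--Morse: primitive, not proper,
                                # determinant 0, hence not invertible on the free group
print(is_primitive(tm, "ab"), is_proper(tm, "ab"), det(incidence_matrix(tm, "ab")))
# True False 0
```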

The maximal subgroups in a $\mathcal J$-class of a profinite semigroup are isomorphic profinite groups (cf. Theorem A.3.9 in [96]). When X is an irreducible subshift, we may consider the (isomorphism class of the) maximal profinite subgroup of $J_V(X)$, denoted $G_V(X)$. It is invariant under isomorphisms of subshifts, as long as $V = V * D$ and V contains all finite semilattices [47], where D denotes the pseudovariety of all finite semigroups whose idempotents are right zeros.

Let $\varphi\colon A^+ \to A^+$ be a primitive substitution. The substitution $\varphi$ is called periodic if the associated minimal subshift $X_\varphi$ is periodic. If there are $b, c \in A$ such that $\varphi(a)$ starts with $b$ and ends with $c$ for every $a \in A$, then the substitution is said to be proper. Denote respectively by $\varphi_S$ and by $\varphi_G$ the unique extension of $\varphi$ to a continuous endomorphism of $\overline{\Omega}_A S$ and to a (continuous) endomorphism of $\overline{\Omega}_A G$. The following is a result from [15].

Theorem 4.8. If $\varphi$ is a proper non-periodic primitive substitution over $A$, then the retract $\varphi_S^\omega(\overline{\Omega}_A S)$ is a maximal subgroup of $J_S(X_\varphi)$, which is presented as a profinite group by the set of generators $A$ subject to the relations of the form $\varphi_G^\omega(a) = a$ ($a \in A$).

For the general case where $\psi$ is a primitive (not necessarily proper) non-periodic substitution, one finds in [55] an algorithm to build a proper primitive substitution $\varphi$ such that $X_\varphi$ is isomorphic to $X_\psi$, and so the general case can be reduced to the proper case via the invariance of the maximal subgroup under isomorphism of subshifts. An alternative finite presentation for $G_S(X_\psi)$ as a profinite group is given in [15]. These results yield that it is decidable whether a given finite group is a (continuous) homomorphic image of $G_S(X_\psi)$.

Note that, if in Theorem 4.8 the extension of $\varphi$ to the free group over $A$ is invertible, then we immediately get that $G_S(X_\varphi)$ is a free profinite group of rank $|A|$, which is a particular case of Corollary 5.7 in [11]. On the other hand, it was proved in [15] that if $\tau$ is the Prouhet–Thue–Morse substitution, that is, the substitution given by $\tau(a) = ab$ and $\tau(b) = ba$, then $G_S(X_\tau)$ is not a relatively free profinite group.

In [17], further knowledge on $G_S(X)$ was obtained, when X is minimal, without requiring that X is defined by a substitution. Namely, it was shown that $G_S(X)$ is an inverse limit of profinite completions of fundamental groups of a special family of finite graphs that is naturally associated to X.

For the sofic case, concerning groups of the form $G_V(X)$, we first have to introduce a definition which is similar to the definition, given in § 2, of a free profinite group over a pseudovariety H of groups (cf. Chapter 3 in [99]). A subset $X$ of a profinite group $G$ is said to converge to the identity if every neighborhood of the identity element of $G$ contains all but finitely many elements of $X$. A profinite group $F$ is a free pro-H


group with a basis $X$ converging to the identity if $X$ is a subset of $F$ for which every mapping $\varphi\colon X \to G$, with $G$ a pro-H group such that $\varphi(X)$ converges to the identity, has a unique extension to a continuous group homomorphism $\hat\varphi\colon F \to G$. All bases of $F$ converging to the identity have the same cardinality, which is called the rank of $F$, and if $F$ and $F'$ have bases converging to the identity with the same cardinal, then $F$ and $F'$ are isomorphic as profinite groups. Note that, if $|X|$ is finite, then $F$ is isomorphic with $\overline{\Omega}_X H$, and so this definition of rank extends the one given for finitely generated relatively free profinite groups. A free pro-H group in the former sense is also free pro-H with some basis converging to the identity, but the converse is not true; indeed, as follows from Theorem 4.9, for a nontrivial pseudovariety of groups H, the free pro-H group of countable rank is metrisable, but $\overline{\Omega}_A H$ is not metrisable when $A$ is infinite (see the note following Proposition 2.15).

For a pseudovariety H of groups, denote by $\overline{\mathrm H}$ the pseudovariety of all finite semigroups whose subgroups belong to H. Note that $S = \overline{\mathrm G}$. We are now able to cite the result from [48] about maximal subgroups of the form $G_{\overline{\mathrm H}}(X)$. Note that the minimal sofic subshifts are periodic subshifts.

Theorem 4.9. Let H be a nontrivial pseudovariety of groups and X an irreducible sofic subshift. If X is periodic, then $G_{\overline{\mathrm H}}(X)$ is a free pro-H group of rank 1. If X is non-periodic and $B(X)$ is $\overline{\mathrm H}$-recognisable, then $G_{\overline{\mathrm H}}(X)$ is a free pro-H group of rank $\aleph_0$, provided H is extension-closed and contains nontrivial $p$-groups for infinitely many primes $p$.

Note that Theorem 4.9 applies to $X = A^{\mathbb Z}$, in which case $J_{\overline{\mathrm H}}(X)$ is the minimum ideal of $\overline{\Omega}_A \overline{\mathrm H}$. This case was previously shown in [104]. For further results on the structure of the minimum ideal of $\overline{\Omega}_A V$, where V may be among pseudovarieties other than those in Theorem 4.9, see [94] and [104]. In contrast with Theorems 4.8 and 4.9, the $\mathcal H$-class of a non-regular element of $\overline{\Omega}_A S$ is a singleton, Corollary 13.2 in [94].

While not all closed subgroups of $\overline{\Omega}_A S$ are free profinite groups, they do have a property resembling freeness. A profinite group $G$ is projective if, for all continuous onto homomorphisms of profinite groups $\varphi\colon G \to K$ and $\alpha\colon H \to K$, there is a continuous homomorphism $\hat\varphi\colon G \to H$ such that $\alpha \circ \hat\varphi = \varphi$. It is easy to see that all (finitely generated) projective profinite groups embed into (finitely generated) free profinite groups [99]. The following converse is much more difficult to prove.

Theorem 4.10 ([95]). Let V be a pseudovariety of semigroups such that $(V \cap Ab) * V = V$. Then every closed subgroup of a free pro-V semigroup is a projective profinite group.

The definition of projective profinite group can be considered for other algebras. The projective profinite semigroups embed into free profinite semigroups but, in contrast with Theorem 4.10, there are finite subsemigroups of $\overline{\Omega}_A S$ that are not projective. For further details about finite subsemigroups of $\overline{\Omega}_A S$ (and $\overline{\Omega}_A V$ for other V), and their interplay with projective profinite semigroups, see Remark 4.1.34 in [96].

Acknowledgements. The work of Jorge Almeida has been partially supported by CMUP (UID/MAT/00144/2013), which is funded by FCT (Portugal) with national (MCTES) and European structural funds (FEDER), under the partnership agreement


PT2020. The work of Alfredo Costa has been partially supported by the Centre for Mathematics of the University of Coimbra – UID/MAT/00324/2013, funded by the Portuguese Government through FCT/MCTES and co-funded by the European Regional Development Fund through the Partnership Agreement PT2020.

References [1] D. Albert, R. Baldinger, and J. Rhodes, Undecidability of the identity problem for finite semigroups. J. Symbolic Logic 57 (1992), no. 1, 179–192. MR 1150933 Zbl 0780.20035 q.v. 631 [2] K. Alibabaei, Every decidable pseudovariety of abelian groups is completely tame. Semigroup Forum 99 (2019), no. 1, 106–125. MR 3978501 Zbl 07080906 q.v. 639 [3] K. Alibabaei, The pseudovariety of all nilpotent groups is tame. Internat. J. Algebra Comput. 29 (2019), no. 6, 1019–1034. MR 3996983 Zbl 07119216 q.v. 639 [4] J. Almeida, Residually finite congruences and quasi-regular subsets in uniform algebras. Portugal. Math. 46 (1989), no. 3, 313–328. MR 1021193 Zbl 0688.08001 q.v. 625 [5] J. Almeida, On pseudovarieties, varieties of languages, filters of congruences, pseudoidentities and related topics. Algebra Universalis 27 (1990), no. 3, 333–350. MR 1058478 Zbl 0715.08006 q.v. 630 [6] J. Almeida, Finite semigroups and universal algebra. Translated from the 1992 Portuguese original and revised by the author. Series in Algebra, 3. World Scientific, River Edge, N.J., 1994. MR 1331143 Zbl 0844.20039 q.v. 630, 639, 640, 641 [7] J. Almeida, Hyperdecidable pseudovarieties and the calculation of semidirect products. Internat. J. Algebra Comput. 9 (1999), no. 3–4, 241–261. Dedicated to the memory of Marcel-Paul Schützenberger. MR 1722629 Zbl 1028.20038 q.v. 631, 632, 637 [8] J. Almeida, Some algorithmic problems for pseudovarieties. Publ. Math. Debrecen 54 (1999), suppl., 531–552. Automata and formal languages, VIII (Salgótarján, 1996). MR 1709911 Zbl 0981.20049 q.v. 631 [9] J. Almeida, Dynamics of implicit operations and tameness of pseudovarieties of groups. Trans. Amer. Math. Soc. 354 (2002), no. 1, 387–411. MR 1859280 Zbl 0988.20014 q.v. 639 [10] J. Almeida, Finite semigroups: an introduction to a unified theory of pseudovarieties. In Semigroups, algorithms, automata and languages (G. M. S. Gomes, J.-É. Pin, and P. V. Silva, eds.). Papers from the Thematic Term held in Coimbra, May–July 2001. World Scientific, River Edge, N,J., 2002, 3–64. MR 2023783 Zbl 1033.20067 q.v. 624, 625, 627, 631, 632 [11] J. Almeida, Profinite groups associated with weakly primitive substitutions. Fundam. Prikl. Mat. 11 (2005), no. 3, 13–48. In Russian. English translation, J. Math. Sci. (N.Y.) 144 (2007), no. 2, 3881–3903. MR 2176678 Zbl 1110.20022 q.v. 642, 644 [12] J. Almeida, Profinite semigroups and applications. Notes taken by A. Costa. In Structural theory of automata, semigroups, and universal algebra (V. B. Kudryavtsev and I. G. Rosenberg, eds.). Proceedings of the NATO Advanced Study Institute (the 42nd Séminaire des Mathématiques Supérieures) held at the University of Montreal, Montreal, QC, July 7–18, 2003, 1–45. MR 2210124 Zbl 1109.20050 q.v. 627


[13] J. Almeida and C. Borlido, Complete  -reducibility of pseudovarieties of the form DRH. Internat. J. Algebra Comput. 27 (2017), no. 2, 189–235. MR 3633403 Zbl 06717910 q.v. 640 [14] J. Almeida and A. Costa, Infinite-vertex free profinite semigroupoids and symbolic dynamics. J. Pure Appl. Algebra 213 (2009), no. 5, 605–631. MR 2494356 Zbl 1179.20050 q.v. 639, 642 [15] J. Almeida and A. Costa, Presentations of Schützenberger groups of minimal subshifts. Israel J. Math. 196 (2013), no. 1, 1–31. MR 3096581 Zbl 1293.20054 q.v. 644 [16] J. Almeida and A. Costa, A note on pseudovarieties of completely regular semigroups. Bull. Aust. Math. Soc. 92 (2015), no. 2, 233–237. MR 3392250 Zbl 1346.20074 q.v. 639 [17] J. Almeida and A. Costa, A geometric interpretation of the Schützenberger group of a minimal subshift. Ark. Mat. 54 (2016), no. 2, 243–275. MR 3546353 Zbl 1394.20033 q.v. 644 [18] J. Almeida, A. Costa, J. C. Costa, and M. Zeitoun, The linear nature of pseudowords. Publ. Mat. 63 (2019), no. 2, 361–422. MR 3980930 Zbl 07094859 q.v. 641 [19] J. Almeida, J. C. Costa, and M. L. Teixeira, Semidirect product with an order-computable pseudovariety and tameness. Semigroup Forum 81 (2010), no. 1, 26–50. MR 2672170 Zbl 1229.20059 q.v. 640 [20] J. Almeida, J. C. Costa, and M. Zeitoun, Tameness of pseudovariety joins involving R. Monatsh. Math. 146 (2005), no. 2, 89–111. MR 2176337 Zbl 1091.20036 q.v. 640 [21] J. Almeida, J. C. Costa, and M. Zeitoun, Complete reducibility of systems of equations with respect to R. Port. Math. (N.S.) 64 (2007), no. 4, 445–508. MR 2374398 Zbl 1148.20042 q.v. 639 [22] J. Almeida, J. C. Costa, and M. Zeitoun, Closures of regular languages for profinite topologies. Semigroup Forum 89 (2014), no. 1, 20–40. MR 3249866 Zbl 1307.20048 q.v. 637 [23] J. Almeida and M. Delgado, Sur certains systèmes d’équations avec contraintes dans un groupe libre. Port. Math. (N.S.) 56 (1999), no. 4, 409–417. MR 1732025 Zbl 0960.20014 q.v. 638, 639 [24] J. Almeida and M. Delgado, Sur certains systèmes d’équations avec contraintes dans un groupe libre – addenda. Port. Math. (N.S.) 58 (2001), no. 4, 379–387. MR 1881867 Zbl 0960.20014 q.v. 638, 639 [25] J. Almeida and M. Delgado, Tameness of the pseudovariety of abelian groups. Internat. J. Algebra Comput. 15 (2005), no. 2, 327–338. MR 2142087 Zbl 1083.20045 q.v. 638 [26] J. Almeida and O. Klíma, Towards a pseudoequational proof theory. Port. Math. (N.S.) 75 (2018), no. 2, 79–119. MR 3892751 Zbl 07031079 q.v. 633 [27] J. Almeida, M. H. Shahzamanian, and B. Steinberg, The pro-nilpotent group topology on a free group. J. Algebra 480 (2017), 332–345. MR 3633311 Zbl 06710531 q.v. 637 [28] J. Almeida and P. V. Silva, On the hyperdecidability of semidirect products of pseudovarieties. Comm. Algebra 26 (1998), no. 12, 4065–4077. MR 1661260 Zbl 0931.20047 q.v. 640 [29] J. Almeida and P. V. Silva, SC-hyperdecidability of R. Theoret. Comput. Sci. 255 (2001), no. 1–2, 569–591. MR 1819091 Zbl 0989.20045 q.v. 639 [30] J. Almeida and B. Steinberg, On the decidability of iterated semidirect products with applications to complexity. Proc. London Math. Soc. (3) 80 (2000), no. 1, 50–74. MR 1719180 Zbl 1027.20033 q.v. 631, 633, 634, 635, 638, 639, 640


[31] J. Almeida and B. Steinberg, Syntactic and global semigroup theory, a synthesis approach. In Algorithmic problems in groups and semigroups (J. C. Birget, S. W. Margolis, J. Meakin, and M. V. Sapir, eds.). Papers from the International Conference held at the University of Nebraska, Lincoln, NE, May 11–16, 1998. Trends in Mathematics. Birkhäuser Boston, Boston, MA, 2000, 1–23. MR 1750489 Zbl 0947.20043 q.v. 633, 637 [32] J. Almeida and B. Steinberg, Rational codes and free profinite monoids. J. Lond. Math. Soc. (2) 79 (2009), no. 2, 465–477. MR 2496524 Zbl 1172.20039 q.v. 619, 628 [33] J. Almeida and P. G. Trotter, Hyperdecidability of pseudovarieties of orthogroups. Glasg. Math. J. 43 (2001), no. 1, 67–83. MR 1825723 Zbl 0982.20040 q.v. 639 [34] J. Almeida and P. G. Trotter, The pseudoidentity problem and reducibility for completely regular semigroups. Bull. Austral. Math. Soc. 63 (2001), no. 3, 407–433. MR 1834943 Zbl 0982.20042 q.v. 639 [35] J. Almeida and M. V. Volkov, Subword complexity of profinite words and subgroups of free profinite semigroups. Internat. J. Algebra Comput. 16 (2006), no. 2, 221–258. MR 2228511 Zbl 1186.20040 q.v. 643, 644 [36] J. Almeida and P. Weil, Relatively free profinite monoids: an introduction and examples. In Semigroups, formal languages and groups (J. B. Fountain, ed.). Proceedings of the NATO Advanced Study Institute held at the University of York, York, August 7–21, 1993. NATO Advanced Science Institutes Series C: Mathematical and Physical Sciences, 466. Kluwer Academic Publishers Group, Dordrecht, 1995, 73–117. MR 1630619 Zbl 0877.20038 q.v. 625 [37] J. Almeida and P. Weil, Free profinite R-trivial monoids. Internat. J. Algebra Comput. 7 (1997), no. 5, 625–671. MR 1470356 Zbl 0892.20035 q.v. 639, 641 [38] J. Almeida and P. Weil, Profinite categories and semidirect products. J. Pure Appl. Algebra 123 (1998), no. 1–3, 1–50. MR 1492894 Zbl 0891.20037 q.v. 631 [39] J. Almeida and M. Zeitoun, The pseudovariety J is hyperdecidable. RAIRO Inform. Théor. Appl. 31 (1997), no. 5, 457–482. MR 1611659 Zbl 0928.20046 q.v. 639 [40] J. Almeida and M. Zeitoun, An automata-theoretic approach to the word problem for ! terms over R. Theoret. Comput. Sci. 370 (2007), no. 1–3, 131–169. MR 2289710 Zbl 1110.68059 q.v. 641 [41] C. J. Ash, Inevitable graphs: a proof of the type II conjecture and some related decision procedures. Internat. J. Algebra Comput. 1 (1991), no. 1, 127–146. MR 1112302 Zbl 0722.20039 q.v. 633, 636, 638 [42] K. Auinger and B. Steinberg, On the extension problem for partial permutations. Proc. Amer. Math. Soc. 131 (2003), no. 9, 2693–2703. MR 1974324 Zbl 1027.20035 q.v. 631 [43] G. Birkhoff, On the structure of abstract algebras. Proc. Cambridge Phil. Soc. 31 (1935), 433–454. JFM 61.1026.07 Zbl 0013.00105 q.v. 617, 618 [44] G. Birkhoff, Moore–Smith convergence in general topology. Ann. of Math. (2) 38 (1937), no. 1, 39–56. MR 1503323 JFM 63.0567.06 Zbl 0016.08502 q.v. 616 [45] S. Burris and H. P. Sankappanavar, A course in universal algebra. Graduate Texts in Mathematics, 78. Springer, Berlin, 1981. MR 0648287 Zbl 0478.08001 q.v. 616, 626 [46] D. Clark, B. A. Davey, R. S. Freese, and M. Jackson, Standard topological algebras: syntactic and principal congruences and profiniteness. Algebra Universalis 52 (2004), no. 2–3, 343–376. MR 2161658 Zbl 1088.08006 q.v. 618, 625


[47] A. Costa, Conjugacy invariants of subshifts: an approach from profinite semigroup theory. Internat. J. Algebra Comput. 16 (2006), no. 4, 629–655. MR 2258833 Zbl 1121.37013 q.v. 642, 644 [48] A. Costa and B. Steinberg, Profinite groups associated to sofic shifts are free. Proc. Lond. Math. Soc. (3) 102 (2011), no. 2, 341–369. MR 2769117 Zbl 1257.20054 q.v. 641, 642, 643, 645 [49] J. C. Costa, Free profinite locally idempotent and locally commutative semigroups. J. Pure Appl. Algebra 163 (2001), no. 1, 19–47. MR 1847374 Zbl 0990.20038 q.v. 641, 642 [50] J. C. Costa and C. Nogueira. Complete reducibility of the pseudovariety LSl. Internat. J. Algebra Comput. 19 (2009), no. 2, 247–282. MR 2512554 Zbl 1188.20069 q.v. 640 [51] J. C. Costa and M. L. Teixeira, Tameness of the pseudovariety LSl. Internat. J. Algebra Comput. 14 (2004), no. 5–6, 627–654. International Conference on Semigroups and Groups in honor of the 65th birthday of Prof. John Rhodes. MR 2104772 Zbl 1087.20039 q.v. 640 [52] T. Coulbois and A. Khélif, Equations in free groups are not finitely approximable. Proc. Amer. Math. Soc. 127 (1999), no. 4, 963–965. MR 1485465 Zbl 0914.20031 q.v. 638 [53] M. Delgado, On the hyperdecidability of pseudovarieties of groups. Internat. J. Algebra Comput. 11 (2001), no. 6, 753–771. MR 1880376 Zbl 1880376 q.v. 637 [54] M. Delgado, A. Masuda, and B. Steinberg, Solving systems of equations modulo pseudovarieties of abelian groups and hyperdecidability. In Semigroups and formal languages (J. M. André, V. H. Fernandes, M. J. J. Branco, G. M. S. Gomes, J. Fountain, and J. C. Meakin, eds.). Proceedings of the International Conference held at the Universidade de Lisboa, Lisboa, July 12–15, 2005. World Scientific, Hackensack, N.J., 2007, 57–65. MR 2364777 Zbl 1133.20046 q.v. 639 [55] F. Durand, B. Host, and C. Skau, Substitutional dynamical systems, Bratteli diagrams and dimension groups. Ergodic Theory Dynam. Systems 19 (1999), no. 4, 953–993. MR 1709427 Zbl 1044.46543 q.v. 644 [56] S. Eilenberg, Automata, languages, and machines. Vol. B. With two chapters by B. Tilson. Pure and Applied Mathematics, 59. Academic Press, New York and London, 1976. MR 0530383 Zbl 0359.94067 q.v. 630, 635, 639, 640 [57] M. Gehrke, S. Grigorieff, and J.-É. Pin, Duality and equational theory of regular languages. In Automata, languages and programming (L. Aceto, I. Damgård, L. A. Goldberg, M. M. Halldórsson, A. Ingólfsdóttir, and I. Walukiewicz, eds.). Part II. Proceedings of the 35th International Colloquium (ICALP 2008) held in Reykjavik, July 7–11, 2008. Lecture Notes in Computer Science, 5126. Springer, Berlin, 2008, 246–257. MR 2503592 Zbl 1165.68049 q.v. 615, 626 [58] M. Gehrke, S. Grigorieff, and J.-É. Pin, A topological approach to recognition. In Automata, languages and programming (S. Abramsky, C. Gavoille, C. Kirchner, F. Meyer auf der Heide and P. G. Spirakis, eds.). Part II. Proceedings of the 37th International Colloquium (ICALP 2010) held in Bordeaux, July 6–10, 2010. Lecture Notes in Computer Science, 6199. Springer, Berlin, 2010, 151–162. MR 2734643 Zbl 1288.68176 q.v. 615, 626 [59] S. J. v. Gool and B. Steinberg, Pro-aperiodic monoids via saturated models. In 34 th Symposium on Theoretical Aspects of Computer Science (H. Vollmer and B. Vallée, eds.). Proceedings of the symposium (STACS 2017) held in Hannover, March 8–11, 2017. LIPIcs. Leibniz International Proceedings in Informatics, 66. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2017, Art. No. 39, 14 pp. MR 3655366 Zbl 1402.68126 q.v. 641


[60] M. Hall, A topology for free groups and related groups. Ann. of Math. (2) 52 (1950), 127–139. MR 0036767 Zbl 0041.36210 Zbl 0045.31204 q.v. 636 [61] K. Henckell, Pointlike sets: the finest aperiodic cover of a finite semigroup. J. Pure Appl. Algebra 55 (1988), no. 1-2, 85–126. MR 0968571 Zbl 0682.20044 q.v. 640 [62] K. Henckell and J. Rhodes, The theorem of Knast, the P G D BG and Type II conjectures. In Monoids and semigroups with applications (J. Rhodes, ed). Proceedings of the Workshop on Monoids held at the University of California, Berkeley, California, July 31–August 5, 1989. World Scientific, River Edge, N.J., 1991, 453–463. MR 1142393 Zbl 0826.20054 q.v. 638 [63] K. Henckell, J. Rhodes, and B. Steinberg, Aperiodic pointlikes and beyond. Internat. J. Algebra Comput. 20 (2010), no. 2, 287–305. MR 2646752 Zbl 1227.20049 q.v. 640 [64] B. Herwig and D. Lascar, Extending partial automorphisms and the profinite topology on free groups. Trans. Amer. Math. Soc. 352 (2000), no. 5, 1985–2021. MR 1621745 Zbl 0947.20018 q.v. 638 [65] R. P. Hunter, Some remarks on subgroups defined by the Bohr compactification. Semigroup Forum 26 (1983), no. 1–2, 125–137. MR 0685122 Zbl 0503.22001 q.v. 627 [66] R. P. Hunter, Certain finitely generated compact zero-dimensional semigroups. J. Austral. Math. Soc. Ser. A 44 (1988), no. 2, 265–270. MR 0922611 Zbl 0649.22002 q.v. 625 [67] M. Huschenbett and M. Kufleitner, Ehrenfeucht–Fraïssé games on omega-terms. In 31 st International Symposium on Theoretical Aspects of Computer Science (E. W. Mayr and N. Portier, eds.). Proceedings of the Symposium (STACS’14) held in Lyon, March 5–8, 2014. LIPIcs. Leibniz International Proceedings in Informatics, 25. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2014, 374–385. MR 3181430 Zbl 1359.03013 q.v. 641 [68] J. Karnofsky and J. Rhodes, Decidability of complexity one-half for finite semigroups. Semigroup Forum 24 (1982), no. 1, 55–66. MR 0645703 Zbl 0503.20028 q.v. 638 [69] O. G. Kharlampovich and M. Sapir, Algorithmic problems in varieties. Internat. J. Algebra Comput. 5 (1995), no. 4-5, 379–602. MR 1361261 Zbl 0837.08002 q.v. 630 [70] K. Krohn and J. Rhodes, Complexity of finite semigroups. Ann. of Math. (2) 88 (1968), 128–160. MR 0236294 Zbl 0162.03902 q.v. 640 [71] M. Kufleitner and J. P. Wächter, The word problem for omega-terms over the Trotter– Weil hierarchy. Theory Comput. Syst. 62 (2018), no. 3, 682–738. MR 3774450 Zbl 06879947 q.v. 641 [72] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge, 1995. MR 1369092 Zbl 1106.37301 q.v. 641, 643 [73] M. Lothaire, Algebraic combinatorics on words. A collective work by J. Berstel, D. Perrin, P. Seebold, J. Cassaigne, A. De Luca, S. Varricchio, A. Lascoux, B. Leclerc, J.-Y. Thibon, V. Bruyère, C. Frougny, F. Mignosi, A. Restivo, C. Reutenauer, D. Foata, G.-N. Han, J. Désarménien, V. Diekert, T. Harju, J. Karhumäki and W. Plandowski. With a preface by Berstel and Perrin. Encyclopedia of Mathematics and its Applications, 90. Cambridge University Press, Cambridge, 2002. MR 905123 Zbl 1001.68093 q.v. 642 [74] G. S. Makanin, The problem of solvability of equations in a free semigroup. Mat. Sb. (N.S.) 103(145) (1977), no. 2, 147–236, 319. In Russian. English translation, Math. USSRSb. 32 (1977), 128–198. MR 0470107 Zbl 0396.20037 q.v. 639


[75] S. Margolis, M. Sapir, and P. Weil, Closed subgroups in pro-V topologies and the extension problem for inverse automata. Internat. J. Algebra Comput. 11 (2001), no. 4, 405–445. MR 1850210 Zbl 1027.20036 q.v. 637 [76] J. McCammond, Normal forms for free aperiodic semigroups. Internat. J. Algebra Comput. 11 (2001), no. 5, 581–625. MR 1869233 Zbl 1026.20037 q.v. 640 [77] V. A. Molchanov, Nonstandard characterization of pseudovarieties. Algebra Universalis 33 (1995), no. 4, 533–547. MR 1331916 Zbl 0840.03055 q.v. 630 [78] A. Moura, Representations of the free profinite object over DA. Internat. J. Algebra Comput. 21 (2011), no. 5, 675–701. MR 2827197 Zbl 1243.20070 q.v. 641 [79] N. Nikolov and D. Segal, Finite index subgroups in profinite groups. C. R. Math. Acad. Sci. Paris 337 (2003), no. 5, 303–308. MR 2016979 Zbl 1033.20029 q.v. 626 [80] K. Numakura, Theorems on compact totally disconnected semigroups and lattices. Proc. Amer. Math. Soc. 8 (1957), 623–626. MR 0087032 Zbl 0081.25602 q.v. 625 [81] P. Perkins, Bases for equational theories of semigroups. J. Algebra 11 (1969), 298–314. MR 0233911 q.v. 630 [82] J.-É. Pin, Topologies for the free monoid. J. Algebra 137 (1991), no. 2, 297–337. MR 1094245 Zbl 0739.20032 q.v. 637 [83] J.-É. Pin, Eilenberg’s theorem for positive varieties of languages. Izv. Vyssh. Uchebn. Zaved. Mat. 1995, no. 1, 80–90. In Russian. English translation, Russian Math. (Iz. VUZ) 39 (1995), no. 1, 74–83. MR 1391325 Zbl 0852.20059 q.v. 630 [84] J.-É. Pin and C. Reutenauer, A conjecture on the Hall topology for the free group. Bull. London Math. Soc. 23 (1991), no. 4, 356–362. MR 1125861 Zbl 0754.20007 q.v. 636 [85] J.-É. Pin and P. V. Silva, On profinite uniform structures defined by varieties of finite monoids. Internat. J. Algebra Comput. 21 (2011), no. 1-2, 295–314. MR 2787462 Zbl 1238.20068 q.v. 623 [86] J.-É. Pin and P. V. Silva, A noncommutative extension of Mahler’s theorem on interpolation series. European J. Combin. 36 (2014), 564–578. MR 3131915 Zbl 1357.11116 q.v. 624 [87] J.-É. Pin and P. Weil, Profinite semigroups, Mal’cev products and identities. J. Algebra 182 (1996), no. 3, 604–626. MR 1398113 Zbl 0857.20040 q.v. 631 [88] J.-É. Pin and P. Weil, A Reiterman theorem for pseudovarieties of finite first-order structures. Algebra Universalis 35 (1996), no. 4, 577–595. MR 1392285 Zbl 0864.03024 q.v. 630 [89] T. Place and M. Zeitoun, Going higher in the first-order quantifier alternation hierarchy on words. In Automata, languages, and programming (J. Esparza, P. Fraigniaud, T. Husfeldt, and E. Koutsoupias, eds.). Part II. Proceedings of the 41st International Colloquium (ICALP 2014) held at the IT University of Copenhagen, Copenhagen, July 8–11, 2014. Lecture Notes in Computer Science, 8573. Springer, Berlin, 2014, 342–353. MR 3238384 Zbl 1407.03055 q.v. 637 [90] T. Place and M. Zeitoun, Separating regular languages with first-order logic. In Proceedings of the Joint Meeting of the Twenty-Third EACSL Annual Conference on Computer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). Held in Vienna, July 14–18, 2014. Association for Computing Machinery, New York, 2014, Article no. 75, 10 pp. MR 3397696 Zbl 1401.68165 q.v. 640 [91] L. Polák, A classification of rational languages by semilattice-ordered monoids. Arch. Math. (Brno) 40 (2004), no. 4, 395–406. MR 2129961 Zbl 1112.68098 q.v. 630

652

Jorge Almeida and Alfredo Costa

[92] N. Pytheas Fogg, Substitutions in dynamics, arithmetics and combinatorics (V. Berthé, S. Ferenczi, C. Mauduit and A. Siegel, eds.). Lecture Notes in Mathematics, 1794. Springer, Berlin, 2002. MR 1970385 Zbl 1014.11015 q.v. 642 [93] J. Reiterman, The Birkhoff theorem for finite algebras. Algebra Universalis 14 (1982), no. 1, 1–10. MR 0634411 Zbl 0484.08007 q.v. 630 [94] J. Rhodes and B. Steinberg, Profinite semigroups, varieties, expansions and the structure of relatively free profinite semigroups. Internat. J. Algebra Comput. 11 (2001), no. 6, 627–672. MR 1880372 Zbl 1026.20057 q.v. 641, 645 [95] J. Rhodes and B. Steinberg, Closed subgroups of free profinite monoids are projective profinite groups. Bull. Lond. Math. Soc. 40 (2008), no. 3, 375–383. MR 2418793 Zbl 1153.20029 q.v. 645 [96] J. Rhodes and B. Steinberg, The q -theory of finite semigroups. Springer Monographs in Mathematics. Springer, New York, 2009. MR 2472427 Zbl 1186.20043 q.v. 631, 632, 644, 645 [97] L. Ribes and P. A. Zalesski˘ı, On the profinite topology on a free group. Bull. London Math. Soc. 25 (1993), no. 1, 37–43. MR 1190361 Zbl 0811.20026 q.v. 636 [98] L. Ribes and P. A. Zalesski˘ı, The pro-p topology of a free group and algorithmic problems in semigroups. Internat. J. Algebra Comput. 4 (1994), no. 3, 359–374. MR 1297146 Zbl 0839.20041 q.v. 637 [99] L. Ribes and P. A. Zalesski˘ı, Profinite groups. Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Folge, 40. Springer, Berlin, 2000. MR 1775104 Zbl 0949.20017 q.v. 627, 644, 645 [100] M. V. Sapir, On the finite basis property for pseudovarieties of finite semigroups. C. R. Acad. Sci. Paris Sér. I Math. 306 (1988), no. 20, 795–797. MR 0949035 Zbl 0658.20033 q.v. 630 [101] B. Steinberg, Inevitable graphs and profinite topologies: some solutions to algorithmic problems in monoid and automata theory, stemming from group theory. Internat. J. Algebra Comput. 11 (2001), no. 1, 25–71. MR 1818661 Zbl 1024.68064 q.v. 639 [102] B. Steinberg, On algorithmic problems for joins of pseudovarieties. Semigroup Forum 62 (2001), no. 1, 1–40. MR 1832252 Zbl 0980.20055 q.v. 635, 641 [103] B. Steinberg, A combinatorial property of ideals in free profinite monoids. J. Pure Appl. Algebra 214 (2010), no. 9, 1693–1695. MR 2593694 Zbl 1203.20052 q.v. 641 [104] B. Steinberg, Maximal subgroups of the minimal ideal of a free profinite monoid are free. Israel J. Math. 176 (2010), 139–155. MR 2653189 Zbl 1220.20045 q.v. 641, 645 [105] B. Steinberg, On the endomorphism monoid of a profinite semigroup. Port. Math. 68 (2011), no. 2, 177–183. MR 2849853 Zbl 1226.22003 q.v. 627 [106] M. Steinby, A theory of tree language varieties. In Tree automata and languages (M. Nivat and A. Podelski, eds.). Papers from the workshop held in Le Touquet, June 1990. Studies in Computer Science and Artificial Intelligence, 10. North-Holland Publishing Co., Amsterdam, 1992, 57–81. MR 1196732 Zbl 0798.68087 q.v. 630 [107] M. V. Volkov, On a class of semigroup pseudovarieties without finite pseudoidentity basis. Internat. J. Algebra Comput. 5 (1995), no. 2, 127–135. MR 1328547 Zbl 0834.20058 q.v. 630 [108] P. Walters, An Introduction to ergodic theory. Graduate Texts in Mathematics, 79. Springer, Berlin, 1982. MR 0648108 Zbl 0475.28009 q.v. 642 [109] S. Willard, General topology. Addison-Wesley, Reading, MA, 1970. MR 0264581 Zbl 0205.26601 q.v. 619, 620, 621, 624, 625

Chapter 18

The factorisation forest theorem Thomas Colcombet

Contents 1. 2. 3. 4. 5. 6.

Introduction . . . . . . . . . . . . . . . . Some definitions . . . . . . . . . . . . . . The factorisation forest theorem . . . . . . Algebraic applications . . . . . . . . . . . Variants of the factorisation forest theorem Applications as an accelerating structure .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

. . . . . .

653 657 659 667 680 685

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

690

1. Introduction In automata theory, it is a very common and elementary argument to remark that, beyond a certain size, every run of a finite state automaton contains some repetition of a state. Once this repetition is witnessed, using copy and paste of the piece of run between these two occurrences, one can produce arbitrarily long valid versions of this run. This is the content of the “pumping lemma,” which is nothing but a direct consequence of the pigeonhole principle. The argument can also be used in the reverse way; whenever a run is too long, it contains a repetition of states, and it is possible to delete the piece of run separating these two occurrences. In this case, it is used for reducing the size of the input. These arguments are used in many situations. The first one is typically used for proving the impossibility of a finite state automaton to perform some task, such as recognising a given language. The second is to use it for proving the existence of small witnesses to the non-emptiness of a regular language, i.e., a small model property in the terminology of logic. These two arguments are among the most important and useful ones in automata theory. They illustrate in the most basic way the central importance of “finding repetitions” in this context. All the content of this chapter is about “finding repetitions” in a more advanced way. In some situations, the above argument of “state repetition” is not sufficient, and one is interested in finding “behaviour repetitions” of the automaton. A behaviour here is any piece of information associated with a word (usually belonging to a finite set of possible behaviours). A typical behaviour of a (non-deterministic finite state) automaton over a word u is the set of pairs .p; q/ such that the automaton has a run from p

654

Thomas Colcombet

to q while reading the word u. This set of pairs gathers all the relevant information concerning how the automaton can behave while reading the word, whatever the context into which the word is plugged. Given an input word, one can associate a behaviour with each factor of the word, and the theorem of Ramsey [34] tells us that every sufficiently long word contains long repetitions of identical behaviours: for each n, every sufficiently long word u can be decomposed into vu1 u2    un w;

()

in which all the words ui ui C1    uj for i 6 j exhibit the same behaviour. Let us emphasise the difference with the pumping argument given above. Indeed, a run is a labelling of the positions in a word, while behaviours label factors of the word. Nevertheless, the theorem of Ramsey is of a similar nature as the pumping lemma, as it relies on a pigeonhole principle. A famous use of this Ramsey argument in automata theory is the proof of closure under complement for Büchi automata [10]. This subject is addressed in Chapter 6. The factorisation forest theorem, which is the subject of this chapter, goes even one step further. Not only does it establish the existence of the repetition of some behaviour (as the theorem of Ramsey does), but it completely factorises each word into a structure (a factorisation tree) which exhibits repetitions of behaviours everywhere. The theorem can be understood as a nested variant of the theorem of Ramsey. Consider some input word. A shown in equation (), a single use of the theorem of Ramsey splits the word into several sub-words corresponding to the same behaviour (plus two words for the extremities). However, each of these sub-words can itself also be very long, and one could again use the theorem of Ramsey on each of them, thus providing a sub-decomposition. In fact, one would like to iterate this process until the word is entirely decomposed, in the sense that the remaining words are just isolated letters that we are not interested in factorising. The result of this process can be represented as a tree, the root of which is the initial word, which has as children the words obtained by the first application of Ramsey, etc. Such trees are called Ramsey factorisation trees in this chapter. In general, there is not much one can guarantee about such an iteration. For instance, there is a priori no upper bound on the number of iterations required to completely decompose the word. What Simon’s factorisation forest theorem teaches us is that, under the correct assumptions, this induction needs only be iterated a bounded number of times. Said differently, there is a bound such that every input word admits a Ramsey factorisation tree of height at most this bound. The required assumption is that the behaviours are equipped with a finite semigroup structure, and that the labelling of the input word by behaviours is consistent with this structure. This means that the behaviours S are equipped with an associative product , such that if u has behaviour a and v has behaviour b , then uv has behaviour a b (simply denoted ab ). Formally, it amounts to require that the mapping from words to behaviours is a morphism of semigroups. The factorisation forest theorem can then be presented as follows.

18. The factorisation forest theorem

655

Factorisation forest theorem (Simon [38]). For all finite semigroups S and all morphisms ˛ from AC to S , there exists a bound k such all words in AC have a Ramsey factorisation tree of height at most k . Though very close in spirit, the factorisation forest theorem and the theorem of Ramsey are not directly related. Simon’s theorem is weaker since it requires an extra hypothesis, namely that the behaviours be equipped with a semigroup structure, and this is a very strong assumption. But under this assumption, the factorisation forest theorem gives a much more precise result than the theorem of Ramsey. Technically, the two results are also proved using very different arguments. The theorem of Ramsey is proved by successive extraction processes, i.e., an extended pigeon-hole principle, while the proof of the factorisation forest theorem is based on algebraic arguments involving the theory of semigroup ideals (the Green’s relations). To conclude, the factorisation forest theorem is to be used when arguments based on the pigeonhole principle and the theorem of Ramsey are no longer sufficient. The price to pay is to provide a semigroup structure for describing the problem. This is often the case when problems arise from automata or logic. Related work. The factorisation forest theorem was stated and established by Simon [38] as a generalisation of the lemma of Brown, see [8] and [9] about locally finite semigroups (see § 4.2). Simon gave another proof of the factorisation forest theorem [39]. Other proofs improving on the bounds have later been given in [13], [16], and [26]. A survey on the factorisation forest theorem has be written by Bojańczyk [3] and another simple proof can be found in the following introduction to Green’s relations [17]. Simon used the theorem to prove the decidability of the finite closure property in the tropical semiring. The tropical semiring is the set of non-negative integers N augmented with infinity and equipped with addition as product, and minimum as sum. The finite closure problem is, given a finite set of square matrices of the same size over the tropical semiring, to determine if their closure under product is finite. Simon proved the decidability of this problem in [36]. The finite section problem is more general, and consists in determining precisely what entries in the matrices can take arbitrary high values. This problem is equivalent to an automaton-related problem: the limitedness of distance automata. Distance automata are non-deterministic finite state automata with weak counting capabilities. These form an instance of weighted automata, as presented in Chapter 4, and more particularly an instance of automata weighted over tropical semirings (very close to the max-plus automata presented in Chapter 5). Such automata compute functions, and the problem of limitedness of distance automata consists of determining whether the function computed by a given distance automaton is bounded. Hashiguchi established the decidability of this problem in [24]. Several proofs are known of this result. Leung proposed a very natural algorithm for solving this problem, the proof of correctness of which is rather complex [28] and [29]. Simon gave a simplified proof of Leung’s algorithm using the forest factorisations theorem [40]. The decidability status of this problem for real weighted automata has a different status, and

656

Thomas Colcombet

is presented in Chapter 5. Finer results in the analysis of the behaviour of quantitative automata make use of the theorem of factorisation forests. This is, for instance, the case for computing the asymptotic behaviour of min-plus and max-plus automata [20] and [21]. In the same vein, the theory of cost-functions over finite words, an extension of regular languages with quantitative capabilities, as well as its extensions, make heavy use of variants of the factorisation forest theorem [19] and [18]. Another application of the factorisation forest theorem is in the characterisation of certain classes of languages. For instance, it has been used by Pin and Weil for giving an effective characterisation of the polynomial languages [32]. Polynomials languages are languages describable as a finite sum of languages of the form A0 a1 A1 a1    an An in which a1 ; : : : ; an are letters, and A1 ; : : : ; An are sets of letters. It is possible to characterise the syntactic ordered monoids of languages that are described by polynomials. The technique has been used for an extended result concerning the polynomial closure of varieties in [7]. Inspired by these works, a similar technique has also been used for characterising another pseudovariety of regular languages by Almeida and Klíma [1]. More recently, Place and Zeitoun established using this approach that deciding the separation question for the nth level of the quantifier alternation hierarchy of word languages was sufficient for deciding the membership to level n C 1, see [33]. The factorisation forest theorem, in its original form, can only be used for finite words. There exists a variant of this theorem that applies to infinite words [15]. This theorem was in particular used for developing the algebraic theory of the regular languages of words of countable length [11]. Another variant of the theorem is the deterministic version [14], which is usable in some situations over trees. It was used in [14] for reducing MSO-formulas over trees to †2 -definability. It was used for fast query evaluation for XML queries [5], [6], and [4]. It was also used for giving another proof of the fact that MSO-queries over trees can be enumerated with constant delay [25]. Content and structure of the chapter. § 2 is devoted to definitions. Words and semigroups are introduced in § 2.1, and a presentation using linear orderings and additive labellings of words and morphisms is given in § 2.2 (this is a point of view is inspired from model-theory and turns out to be very convenient in the development of the paper). In § 2.3, some more advanced, yet classical, definitions and results concerning semigroups are given; in particular Green’s relations are introduced and some standard results recalled. § 3 states the factorisation forest theorem in several ways, and establishes it. Our reference statement, using “Ramsey splits” is stated in § 3.1, and proved in the subsequent § 3.2. The original presentation is given in § 3.3, and questions of optimality of the bound are addressed in § 3.4. § 4 is devoted to giving algebraic applications. In order to shorten the proofs we first provide in § 4.1 two variant presentations of the factorisation forest theorem, often easier to use, namely Theorems 4.1 and 4.2. In § 4.2, we state and establish Brown’s lemma, that we use in § 4.3 for showing the decidability of the finite closure property.

18. The factorisation forest theorem

657

We then establish the more advanced result of decidability of the bounded section for matrices in the tropical semiring (§ 4.4). We finally show the link with language theory with the question of characterising the polynomial languages in § 4.5. § 5 introduces variants of the factorisation forest theorem. The first variant, the deterministic factorisation forest theorem, is the subject of § 5.2. The second variant is one which can be used over all (meaning possibly infinite) linear orderings: it is the subject of § 5.1. Finally, some applications of the factorisation forest theorem as an “accelerating structure” are presented in § 6. The first one is mainly an example, and can be used to evaluate the value of an infix of a word in constant time (§ 6.1). The second one makes use of the deterministic variant of the theorem for showing a result concerning MSO-definable relations in trees (§ 6.2).

2. Some definitions 2.1. Semigroups and monoids. The notions of semigroups, monoids, morphisms for them, as well as submonoids, subsemigroups, and idempotent elements are introduced in Chapter 1. For a thorough introduction to semigroups, we refer the reader to the monographs [27] and [31]. Given a semigroup S D S , S denotes the unique semigroup morphism from S C (the words written over S seen as the alphabet) to S which coincides with the identity on letters, i.e., S .a/ D a and S .ua/ D S .u/a, where a 2 S . Let E.S / D ¹e 2 S j ee D eº denote the set of idempotents of S . For simplicity, we often omit the S subscript and simply write  . Recall that given a semigroup S , S 1 denotes the monoid S itself if S is a monoid, or the semigroup S augmented with a new neutral element 1 otherwise, thus making S a monoid. Finally, given a set A  S , hAiS is the least subsemigroup of S which contains A. It is equal to .AC /. We use the same notations for monoids. 2.2. Linear orderings and multiplicative labellings. A linear ordering is a set equipped with a total order. Apart from § 5.1, we will only consider finite linear orderings. Typically, given a word u D a1    an , we consider its domain dom.u/ D ¹1; : : : ; nº (we can see a word as a function from its domain to its alphabet) and its set of cuts cuts.u/ D ¹0; : : : ; nº. A cut is a position between letters. The cut i for i D 1; : : : ; n 1 represents the position between letters i and i C 1. The cut 0 represents the beginning of the word, and the cut n the end of the word. Cuts among 1; : : : ; n 1 are called inner cuts. The set of inner cuts is inner-cuts.u/. Given two cuts i < j , the factor between positions i and j is ui;j D ai C1 ai C2    aj . Let ˛ be a linear ordering and S a semigroup. A multiplicative labelling 1 is a mapping  from the set of ordered pairs .x; y/ 2 ˛ 2 such that x < y to S such that .x; y/.y; z/ D .x; z/

for all x < y < z in ˛ .

1 It is called an additive labelling in the context of the composition method [35].

Thomas Colcombet

658

Given a semigroup morphism ' from AC to some semigroup S and a word u in AC , there is a natural way to construct a multiplicative labelling 'u from cuts.u/ to S as follows. For every two cuts x < y in cuts.u/, set def

'u .x; y/ D '.ux;y /:

This mapping is naturally a multiplicative labelling since for all x < y < z in cuts.u/, 'u .x; y/'u .y; z/ D '.ux;y /'.uy;z / D '.ux;y uy;z / D '.ux;z / D 'u .x; z/:

This view using linear orderings and multiplicative labellings rather than words and morphisms is non-standard. It has several advantages in the present context. A first technical advantage is that some operations are easier to describe; for instance, restricting a multiplicative labelling to a sub-ordering is straightforward (this is used several times in the main proof in § 3.2). Another advantage is that its extension to infinite linear orderings is more natural than the use of infinite words (see § 5.1). 2.3. Standard results on the structure of finite semigroups. In this section, we recall some basic definitions and gather results concerning finite semigroups. The reader can refer to the monographs [27] and [31] for more details on the subject. Green’s relations are also presented in Chapter 27. Green’s relations are form an essential tool in the understanding of the structure of semigroups. In our case, this is a key ingredient in the proof of the factorisation forest theorem. Apart from this proof, Green’s relations are not used in the chapter. In fact, one way to see the factorisation forest theorem is as a convenient and easy to use result which gives access to non-trivial consequences of the theory of Green’s relations. The Green’s relations in a semigroup S are defined for all a; b 2 S as follows: a 6L b

a 6R b a 6J b a 6H b

if a D cb for some c in S 1 ; 1

if a D bc for some c in S ; 0

0

1

if a D cbc for some c; c in S ; if a 6L b and a 6R b;

aLb

if a 6L b and b 6L a;

aRb

if a 6R b and b 6R a;

aJb

if a 6J b and b 6J a; if a L b and a R b:

aHb

Fact 1. Let a; b; c be in S . If a L b then ac L bc . If a R b then ca R cb . For every a; b in S , there exists c such that a L c R b if and only if a R c 0 L b for some c 0 . As a consequence of the last equivalence, we define the last of Green’s relations: aDb

if a L c R b for some c in S; equivalently if a R c 0 L b for some c 0 in S:

The key result is the following (here the hypothesis of finiteness of S is mandatory). Fact 2. D D J.

For this reason, we refer from now on only to D and not J. However, we will use the preorder 6J (which is an order over the D-classes). An element a in S is called regular if asa D a for some s in S . A D-class is regular if all its elements are regular.

18. The factorisation forest theorem

659

Fact 3. A D-class D is regular, if and only if it contains an idempotent, if and only if every L-class in D contains an idempotent, if and only if every R-class in D contains an idempotent, if and only if there exists a; b in D such that ab 2 D . Fact 4. For every a; b in D such that ab 2 D , a R ab and b L ab . Furthermore, there is an idempotent e in D such that a L e and b R e . Fact 5. All H-classes in a D-class have the same cardinality. Fact 6. Let H be an H-class in S . Either for all a; b in H , ab 62 H ; or for all a; b in H , ab 2 H , and furthermore .H; / is a group.

3. The factorisation forest theorem In this section, we give various statements for the factorisation forest theorem. The description begins with a formulation in terms of splits, Theorem 3.1, in § 3.1. Its proof is the subject of § 3.2. We then describe in § 3.3 the result in the original terminology of Simon using Ramsey factorisation trees, Theorem 3.4. In § 3.4, we provide some optimality considerations relative to the bound involved in these theorems. Note that other presentations of variants of this theorem are also present in the subsequent sections. In § 4, algebraic presentations are given, namely Theorems 4.1 and 4.2. Finally, an infinitary variant of the factorisation forest theorem, Theorem 5.1, is presented in § 3.4, and a deterministic variant, Theorem 5.3, in § 5.2. 3.1. A statement via splits. A split of height h, h being a non-negative integer, over a linear ordering ˛ , is a mapping from the positions of ˛ to ¹1; : : : ; hº. A split s induces an equivalence relation s over ˛ defined for all x 6 y in ˛ by if s.x/ D s.y/ and s.x/ > s.z/ for all x 6 z 6 y:

x s y

A split s of height h is called normalised if s.min ˛/ D h. In the following drawing, the points of the linear ordering .¹0; : : : ; 13º; x . By construction, we have both .y; x/ 2 L.x/ and .x; z/ 2 R.x/. Since furthermore .y; x/.x; z/ D .y; z/ 2 D , using Fact 4, there exists an idempotent e 2 L.x/ \ R.x/. This means by Fact 6 that L.x/ \ R.x/ is a group. The claim holds. Now let H1 ; : : : ; Hk be an enumeration of H-classes included in D that induce groups. We choose Hk D L.min ˛/ \ R.min ˛/, without loss of generality. Let n be the size of H1 (recall that all H -classes inside a D -class have same size according to Fact 5, and hence n is also the size of H2 ; : : : ; Hk ). Note finally that N.D/ D k n. Set Xi to be ¹x j L.x/ \ R.x/ D Hi º for all i D 1; : : : ; k . The Xi ’s are disjoint by construction, and furthermore, according to the above claim, their union equals ˛ . For each i D 1; : : : ; k , .Xi /  Hi . Thus by Lemma 3.2, there exists a normalised split si for ˛jXi of height n which is Ramsey for jXi . Now define the split s for all x 2 ˛ as follows: s.x/ D si .x/ C .i

1/n;

in which i is such that x 2 Xi :

Thomas Colcombet

662

Let us prove that s is Ramsey. Consider an equivalence class C for s . By construction of s , there is some i such that C  Xi . Hence, C is also an equivalence class for si . Since si is Ramsey for jXi by construction, this means that there exists an idempotent e such that .x; y/ D e for all x < y in C . Hence s is Ramsey. Furthermore, the height of s is nk D N.D/, and since min ˛ 2 Xk by choice of Hk , and k is normalised, we get s.min ˛/ D nk D N.D/, i.e., s is normalised. Now, let us complete the proof of Theorem 3.1.

Proof. For every a 2 S , let a"J be ¹b j a 6J bº, and let M.a/ be the maximum of the P sum kiD1 N.Di / in the .Di /s range over all chains of D-classes D1 xi

1

j .xi

1 ; x/

6J aº

for all i > 1:

(If there is no such element x , the constructions stops.) In the end, a sequence x0 < x1 <    < xm

of elements in ˛ is produced. Let X D ¹x0 ; : : : ; xm º. One also defines Y1 ; : : : ; Ym to be the intervals of positions occurring between the xi ’s: formally, Y0 ; : : : ; Ym are defined such that the union of X; Y0 ; : : : ; Ym is ˛ , and x0 < Y0 < x1 < Y1 < x2 <    < xm < Ym

(note that some of the sets Y0 ; : : : ; Ym may be empty).

Remark a. For all i; j such that 0 6 i < j 6 m, one has .xi ; xj / D a. Indeed, a 6J .xi ; xj / since .˛/  a"J by assumption. Furthermore, .xi ; xi C1 / 6J a by construction of xi C1 . Hence .xi ; xj / 6J a also holds. Said differently, .X /  D.a/. Remark b. For all i such that 0 6 i 6 m, we have .¹xi º [ Yi / \ D.a/ D ;. This comes from the minimality argument in the choice of each xi .

Case 1. Let us assume first that a is regular. The principle of the construction is to use Lemma 3.3 over X , apply the induction hypothesis for each of the Yi ’s, and combine the obtained splits. Set N to N.D.a//, and M to M.a/. By Remark a, .X /  D.a/. Thus one can apply Lemma 3.3, and get a normalised Ramsey split s 0 of height N for jX . Thanks to Remark b, one can use the induction

18. The factorisation forest theorem

663

hypothesis (second item) on j¹xi º[Yi , for all 0 6 i 6 m. We obtain a Ramsey split si for jYi of height at most M N . We combine the splits s 0 ,s1 ,. . . into a split s as follows: ´ s 0 .x/ C M N if x 2 X; s.x/ D for all x 2 ˛ . si .x/ for x 2 Yi otherwise,

It is clear that, since s 0 is normalised, the same holds for s . Let us show that this split is Ramsey. Consider an equivalence class C for s . We distinguish two sub-cases. If s.x/ > M N for some x 2 C , this means that the first case in the definition of s.x/ is used for all elements of C . Hence, C  X , and C is an equivalence class for s 0 . Since s 0 is Ramsey, there exists an idempotent e such that .x; y/ D e for all x < y in C . Otherwise, s.x/ 6 M N for all x 2 C . In this case, it is not possible that C contains two elements which are separated by an element from X . We deduce that C  Yi for some i . Furthermore, since s and si coincide over Yi , the class C is also an equivalence class for si . As si is Ramsey by construction, this means that there exists an idempotent e such that .x; y/ D e for all x < y in C . Overall, s 0 is Ramsey for  . This completes the proof of the second item of the induction hypothesis, as well as the first item when a is regular.

Case 2. It remains the case when a is irregular. We claim first that jX j 6 2. Indeed, assume the opposite for the sake of contradiction. This would mean that .x0 ; x1 /, .x1 ; x2 / and .x0 ; x2 / D .x0 ; x1 /.x1 ; x2 / belong all to D.a/. By Fact 3, this means that D.a/ is a regular D-class, contradicting the irregularity of a. This establishes the claim. We can now define the split s for all x 2 ˛ n ¹x0 º of height M as follows: ´ M if x D x1 , s.x/ D for all x 2 ˛ . si .x/ for x 2 Yi otherwise,

Let us show that this split is Ramsey. Consider an equivalence class C for s . Again we distinguish two cases. If s.x/ > M , for some x 2 C , this means that x D x1 , and hence C D ¹x1 º. Otherwise s.x/ 6 M 1 for some x 2 C . The same argument as in the first case of the induction hypothesis can be used. Overall, s is Ramsey for j˛n¹min ˛º . Thus the induction hypothesis holds for all elements a. We can use it to establish Theorem 3.1. Let a be some element in the minimal J-class of S . This means S D a"J . Let ˛ be a finite linear ordering, and  a multiplicative labelling of ˛ by S . Since S D a"J , and a is regular, one can apply the second item of the induction hypothesis on  : there exists a normalised Ramsey split for  of height at most M.a/ D N.S /. Let us now turn to the original statement as proposed by Simon.

3.3. The original statement using factorisation trees. Theorem 3.1 is stated in terms of splits as in [16]. The original statement of Simon [38] uses a different presentation that we describe in this section.

Thomas Colcombet

664

Fix an alphabet A and a semigroup morphism ' from AC to a finite semigroup S . A factorisation tree is an unranked ordered tree in which each node is either a leaf labelled by a letter, or an internal node. The value of a node is the word obtained by reading the leaves below it from left to right. A factorisation tree of a word u 2 AC is a factorisation tree with value u. The height of the tree is defined as usual, with the convention that the height of a single leaf is 0. A factorisation tree is Ramsey (for ' ) if every node either 1. is a leaf, or 2. has two children, or 3. the values of its children are all mapped by ' to the same idempotent e 2 E.S /. Figure 1 presents a Ramsey factorisation tree for the word 1112022201212 over the alphabet ¹0; 1; 2º, with respect to the natural morphism to Z=3Z. Each non-leaf node of the tree is depicted as an horizontal line. The only node which satisfies property 3 is highlighted in a grey surrounding. One can check that indeed, the image by the morphism of the value of each child of this node is 0.

1

1

1

2

0

2

2

2

0

1

2

1

2

Figure 1. A Ramsey factorisation tree in Z=3Z

The factorisation forest theorem reads as follows, in which N.S / is the value introduced in Theorem 3.1: Theorem 3.4 (factorisation forest [38]). For all alphabets A, all semigroup morphisms ' from AC to a finite semigroup S , and all words u 2 AC , there exists a Ramsey factorisation tree for u; ' of height at most 3N.S / 1. The various references given for this result differ in the value of the bound k D 3N.S / 1. In the original proof of Simon [38], the bound was k D 9jS j. Simon gave then a simplified proof [39] yielding a worse bound of 2jSjC1 2 (this proof relied on the Krohn–Rhodes decomposition theorem). A bound of k D 7jS j was obtained by Chalopin and Leung [13]. A bound of 3jS j is given in [15] and [16]. The optimal bound is 3jS j 1 was obtained by Kufleitner [26], see also [17]. Since N.S / 6 jS j the present result improves on the bound of 3jS j 1 to 3N.S / 1. This better bound is essentially obtained by a more careful analysis of the complexity of the construction. Lemma 3.5 describes the relationship between Ramsey splits and Ramsey factorisations. Using it, Theorem 3.4 immediately follows from Theorem 3.1 (recall the definitions from § 2.2).

18. The factorisation forest theorem

665

Lemma 3.5. Let A be an alphabet, let ' be a morphism from AC to a finite semigroup S , and let u 2 AC be a word. Then a. every Ramsey factorisation tree of height k of u induces a Ramsey split of height at most k for 'u jinner-cuts.u/ ; b. every Ramsey split of height k for 'u induces a factorisation tree of height at most 3k of u, of height 3k 1 if the split is furthermore normalised. Proof. For (a), we set the value s.x/ of the split for x 2 inner-cuts.u/, say for x the cut between letter i and letter i C 1 in u, to be the maximal depth of a node that has the i th and the .i C 1/th letter below it. It is not difficult to see that this defines a split of height at most k , and that it is Ramsey for 'u jinner-cuts.u/ . For (b), note that the only class of split value 1 factorises the word u into u D u0 u1    u` in such a way that '.u1 / D    D '.u` 1 / is an idempotent. Hence we construct the prefix of a tree as in the following diagram:

u0

u1

u2

:::

u`

1

u`

and then proceed inductively with the subwords u0 ; : : : ; u` . At the end, we get a Ramsey factorisation tree, and its height is at most 3k . Furthermore note that, if the split is normalised, there is no need to use the root node of the above gadget for the highest -class. Hence the height is at most 3k 1.

3.4. Optimality of the bound. We have seen a bound of N.S / in Theorem 3.1, and a bound of 3N.S / 1 for Theorem 3.4. The question we are addressing in this section is whether this bound is optimal. This question has been the source of some investigations [13] and [26]. Indeed, in some applications, this parameter may have a significant complexity impact (see the applications in § 4 and § 6). It is also natural that a better understanding of this parameter requires a better understanding of the structure of semigroups. This remark itself justifies the interest in this question. Chalopin and Leung [13] and Kufleitner [26] derived lower bounds. The following result of Kufleitner shows that the bound of 3jS j 1 of Theorem 3.4 is optimal for groups (in the case of groups, N.S / D jS j).

Theorem 3.6 ([26]). Let G be non-trivial finite group and 'W G C ! G be its evaluation morphism. There exists a word w 2 G C such that all Ramsey factorisation trees of w have height at least 3jGj 1. In combination with Lemma 3.5, we deduce the optimality of Theorem 3.1 from Theorem 3.6. Corollary 3.7. For all non-trivial finite groups G there exists a multiplicative labelling  from a finite linear ordering to G such that every Ramsey split of  has height at least jGj.

666

Thomas Colcombet

Proof. Consider the word w from Theorem 3.6 and the corresponding multiplicative labelling  D 'w . For the sake of contradiction, assume that there is a Ramsey split of height jS j 1 for  . By Lemma 3.5 this means that there exists a Ramsey factorisation tree of height at most 3.jS j 1/ for w; ' , contradicting Theorem 3.6. In this chapter, we have given an optimised result, with a bound of N.S / in terms of splits (Theorem 3.1), and 3N.S / 1 in terms of the factorisation forest (Theorem 3.4). In some cases N.S / D jS j; this is the case, for instance, when S is a group, but not only (consider for instance the semigroup .¹1; : : : ; nº; max/). However, it can also happen that the gap between N.S / and jS j can be arbitrarily high: for instance, consider, for each positive integer n, the semigroup Sn with elements ¹a1 ; : : : ; an ; 0º for which the product is defined by 0x D x0 for all x , and ai aj is ai if j D i , and 0 otherwise. This semigroup has size n C 1, but N.Sn / D 2 for all n. This shows that a careful analysis can drastically improve the original upper bound of jS j. However, one can still wonder whether the bound N.S / is optimal. More precisely, given a semigroup S , does there exist always a multiplicative labelling such that no split Ramsey for it has height less than N.S /? The answer to this question is negative. Consider for instance the semigroup Sn D .¹1; 2; : : : ; n 1; 1º; C/ (in which the sum is defined in the natural way). Then N.S / D jS j D n. However, for every multiplicative labelling from a (finite) linear ordering to Sn there exists a Ramsey split of height at most dlog2 ne C 2. We give a proof using factorisation trees (this extend to splits using Lemma 3.5). Note that (a) in this semigroup, every word of length greater than n has value 1. Note that (b) in any semigroup, every word of size at most k admits a factorisation tree of height at most dlog2 ke (using a balanced binary factorisation tree of logarithmic height). Combining these remarks, we can construct a factorisation for every word u as follows. One factorises u into u1    u` v in which ju1 j D    D ju` j D n, and jvj < n. By remark (b), all words u1 ; : : : ; u` ; v admit Ramsey factorisation trees t1 ; : : : ; t` ; t 0 of height at most dlog2 ne. Furthermore, by Remark (b) all words u1 ; : : : ; u` have same value 1 (which is an idempotent). This means that one can construct a Ramsey factorisation tree of height dlog2 ne C 2 for it: the root is binary, the right child being the root of t 0 , and the left child being an idempotent node with n children, which are the roots of respectively t1 ; : : : ; t` . It is clear that this tree is a Ramsey factorisation for u, and also that it has height at most dlog2 ne C 2. Thus, the question of characterising the optimal bound for the factorisation forest theorem is still open. Kufleitner gives a finer analysis of the bound for aperiodic semigroups using factorisation trees. Indeed, the result is optimal for groups. What about group-trivial semigroups? The answer is that it is possible to obtain a better upper bound in this case: Theorem 3.8 ([23] and [26]). For every aperiodic (i.e., group-trivial) semigroup S , and every morphism from AC to S , every word u 2 AC admits a Ramsey factorisation tree of height at most 2jS j. Furthermore, for each n, there exists an aperiodic semigroup of size n such that this bound is optimal.

18. The factorisation forest theorem

667

4. Algebraic applications The purpose of this section is to give algebraic consequences to the factorisation forest theorem. In these applications, we deliberately chose to use other presentations of the result, which are at the same time weaker (we lose the information concerning the bound), but much easier to apply (no more trees nor splits). In § 4.1, we give two other presentations of the theorem of factorisations forest, namely Theorems 4.1 and 4.2. Then, our first application, one of the original motivations of Simon, is to provide an elementary proof of Brown’s lemma, Lemma 4.4, concerning locally finite semigroups. In § 4.3, Brown’s lemma is used in a proof of the finite closure problem, Theorem 4.6, a theorem due to Simon stating that it is possible to decide if the closure under product of a set matrices over the tropical semiring is finite. In § 4.4, we establish a significantly stronger result due to Hashiguchi, Theorem 4.10, stating that the boundedness of the function computed by a distance automaton is decidable. Finally, § 4.5 provides an algebraic application of the theorem of factorisation forest of a different nature, namely for the characterisation of languages definable by polynomials. 4.1. An algebraic presentation. In this section, we give two other equivalent presentations of the factorisation forest theorem. Depending on the context, the various presentations may prove easier to use. In particular, the two presentations avoid the use of trees or splits. The first presentation below is particularly interesting when one is interested in effectively computing a presentation for the semigroup generated by a given subset of a monoid. In this survey we do not present any examples of this kind of applications. It is used for instance in [20] and [21] in which the asymptotic behaviour of automata over the tropical semiring are studied with much greater precision than what is done here. Theorem 4.1. Let S be a semigroup, let ' be a semigroup morphism from S to a finite semigroup T , and assume X  S . Let Xn be defined as follows: X0 D X; XnC1 D Xn [ Xn Xn [

[

hXn \ '

1

.e/iS

for all n > 0:

e2E.S/

Then hX iS D X3N.T / 1 . Proof. It is clear, by induction on n, that Xn  hX i. Quite naturally, the proof of the opposite inclusion is by induction on the height of factorisation trees. For all n > 0, define Yn  S as follows: Yn D ¹S .u/ j u 2 X C ;

u has a Ramsey factorisation tree of height at most nº;

where the Ramsey factorisation is with respect to the morphism ' ı S from X C to T .

Thomas Colcombet

668

Let us show by induction on n that Yn  Xn . Assuming this, Theorem 3.4 implies that hX i  X3N.T / 1 , and the results follow. The induction remains to be established. Note first that we have X0 D X D Y0 when n D 0. Now consider some n > 0, and let a 2 YnC1 . We will show that a 2 XnC1 . By definition, there exists a Ramsey factorisation T of height at most n C 1 for some word u 2 X C with S .u/ D a. There are three cases. If T is restricted to a leaf, then u is in fact a single letter word. Hence, a belongs to X  XnC1 . Otherwise, assume the root of T is a binary node. Then u can be decomposed as vw , such that S .v/ 2 Xn and S .w/ 2 Xn . It follows, by the induction hypothesis and the definition of XnC1 that a D S .u/ D S .v/S .w/ 2 Xn Xn  XnC1 . Finally, assume the root of T is an idempotent node. This means that u can be decomposed as v1    vk such that there exists an idempotent e with '.S .vi // D e for all i , and S .vi / 2 Yn for all i . Hence, in combination with the induction hypothesis, we have that S .vi / 2 Xn \ ' 1 .e/ for all i . Thus a D S .u/ D S .v1 /    S .vk / 2 hXn \ ' 1 .e/i  XnC1 by definition of XnC1 . In fact, a closer inspection reveals that the above theorem is equivalent to the forest factorisation theorem. Indeed, a similar inductive proof establishes that Yn D Xn for all n where Yn is as in the above proof. Thus, if one applies Theorem 4.1 to S D AC , one directly deduces Theorem 3.4. Our second variant can be understood as follows. Theorem 4.1 can be seen as an iteration reaching a least fix-point. Theorem 4.2 formalises differently this view of the result. Theorem 4.2. Let S be a semigroup, ' be a semigroup morphism from S to a finite semigroup T , and assume X  S . Then every family P  P.S / such that 1. 2. 3. 4.

for all a 2 T , ' 1 .x/ 2 P , for all A; B 2 P , A [ B 2 P , for all A; B 2 P , AB 2 P , and for all A 2 P with f .A/ D ¹eº for some idempotent e 2 T , hAiS 2 P

satisfies hX i 2 P .

Remark 4.3. In practice, instead of (1), we will frequently use the following slightly stronger condition: 10 . for all A  B 2 P , A 2 P , 100 . X 2 P .

It is clear that (10 ) and (100 ) together imply (1). Proof. Let us assume first that P is minimal such that it satisfies conditions (1)–(4). We first claim that every A 2 P has the restriction property, i.e., for all c 2 T , then A \ ' 1 .c/ 2 P [ ¹;º. It is sufficient for us to prove that the restriction property is preserved under (1)–(4). Indeed, if A D ¹x 2 X j '.x/ D aº 2 P , then, clearly, A \ ' 1 .c/ equals A if c D a, or ;. This settles case (1). Now consider the case A [ B .

18. The factorisation forest theorem

669

It is clear that if both A and B have the restriction property, then A [ B also has the property, since .A [ B/ \ ' 1 .c/ D .A \ ' 1 .c// \ .B \ ' 1 .c// 2 P (by (2)). This proves case (2). Now consider the case AB . We have [ .AB/ \ ' 1 .c/ D .A \ ' 1 .a//.B \ ' 1 .b//: abDc

Thus, assuming that A; B have the restriction property, using (3), AB also has the restriction property. This establishes the case of (3). Finally, assume A 2 P and '.A/ D ¹eº for some idempotent e . Then clearly, if c ¤ e , then hAi \ ' 1 .c/ D ;. Otherwise, when c D e , we have hAi \ ' 1 .c/ D hAi 2 P . Hence hAi has the restriction property, which is the case (4). It follows that every A 2 P (using the minimality assumption) has the restriction property. The claim is established. Now let the Xn be as in Theorem 4.1 In this case, let us prove by induction on n that Xn 2 P . For n D 0, from (1) and (2), we have X0 D X 2 P . Otherwise, assume Xn 2 P . Then clearly, using properties (1)–(4) and the above claim, XnC1 2 P (the claim is mandatory for proving that if Xn 2 P , then Xn \ ' 1 .e/ 2 P ). It follows, using Theorem 4.1, that hX i D X3N.T / 1 2 P . Now consider some P 0 that satisfies conditions (1)–(4) (without any minimality assumption). This means P  P 0 (in which P is minimal). One has hX i 2 P  P 0 . This establishes the general case. Once more, it is easy to show that this result is equivalent to the forest factorisation theorem, as far as the precise bound of 3N.S / 1 is not concerned. 4.2. Brown’s lemma. In this section we show how to derive Brown’s lemma from the above result. Extending Brown’s lemma was one of the motivations of Simon when introducing the factorisation forest theorem. A semigroup S is locally finite if every finite subset X  S generates a finite subsemigroup hX iS . Brown’s theorem is stated as follows:

Lemma 4.4 ([9]). Let f W S ! T be a semigroup morphism. If T is locally finite and f 1 .e/ is locally finite for every idempotent e 2 T , then S is locally finite.

Proof. Let f; S and T be as in the statement of the theorem. Let X  S be finite. We want to show that hX iS is finite. Let T 0 D f .hX iS /. Since f .X / is finite and T is locally finite, we get that T 0 D f .hX iS / D hf .X /iT is finite. Let P be the set of finite subsets of hX iS . Clearly, P satisfies conditions (10 ), (100 ), (2), and (3) of Theorem 4.2 and Remark 4.3. Let us establish the missing condition (4). Consider A 2 P such that f .A/ D ¹eº. This means A  f 1 .e/. Since by hypothesis f 1 .e/ is locally finite and A is finite, it follows that hAiS is finite, i.e., hAiS 2 P . Using Theorem 4.2 we obtain hX iS 2 P , i.e., hX iS is finite. Hence, S is locally finite. 4.3. The finite closure property in the tropical semiring. In this section, we show how to use Brown’s lemma for deciding the finite closure problem in the tropical semiring. In the next section, we extend these techniques to solve a more general result, this time using the factorisation forest theorem. This theory is nicely surveyed in [37].

670

Thomas Colcombet

Semirings were introduced in Chapter 1. Here we consider the tropical semiring T D .N[¹1º; min; C/, (also called the min-plus-semiring). We use standard notation for matrices over this semiring. Let Tmn denote the matrices of size m n. Square matrices of same size over a semiring themselves form a semiring when equipped with the usual multiplication and sum. In this section, we consider the multiplicative semigroup of n  n-square matrices over the tropical semiring, denoted Tnn . The finite closure problem is the following: Input: A positive integer n and matrices A1 ; : : : ; Ak 2 Tnn . Output: “Yes” if the set hA1 ; : : : ; Ak iTnn is finite; “No” otherwise. We prove below that this problem is decidable. On the way we show that the corresponding Burnside problem admits a positive answer. More precisely, one says that a semigroup S is torsion if for every element x 2 S , hxiS is finite. It is clear that every finite semigroup is both finitely generated and torsion. The Burnside problem consists of determining for which semigroups the converse holds. The proof of Simon shows that the Burnside problem admits a positive answer for semigroups of matrices over the tropical semiring, i.e., a subsemigroup of Tnn is finite if and only if it is both finitely generated and torsion. Phrased differently: Theorem 4.5 ([36]). Every torsion subsemigroup of Tnn is locally finite. The corresponding decidability result is established at the same time: Theorem 4.6 ([36]). The finite closure property is decidable inside Tnn . The problem for the decidability proof is that the tropical semiring is infinite, which prevents exploring it entirely. For this reason, the essential argument in the proof consists in translating the question to a question concerning a finite algebraic object. Formally, one constructs a morphism from the tropical semiring to a finite semiring that forgets the exact values of the matrix entries. Let us consider the reduced semiring T1 D .¹0; 1; 1º; min; C/ (in which all operations are natural, and 1 C 1 equals 1). Given an element a 2 T, let aN denote S D 1 and aN D 1 in all other cases; i.e., one the reduced version defined by 0N D 0, 1 approximates every positive integer by 1. The function x is a morphism of semirings. This is the reason it extends in the usual way to matrices, yielding once more a morphism of semirings: the morphism that replaces every positive integer entry in a matrix by 1. Call a matrix A in Tnn idempotent if its image under x is an idempotent (of Tnn 1 ). Conversely, given an element a 2 ¹0; 1; 1º, and a positive integer k , let k  a denote the element a if a 2 ¹0; 1º, and k otherwise. We also extend this operation to matrices. Given a matrix A, let jjAjj denote the maximal positive integer entry it contains (or 1 if there is no such entry). An idempotent matrix A over T is called stable if the set hAiTnn is finite. Lemma 4.7. For every idempotent matrix A 2 Tnn , the following statements are equivalent:

18. The factorisation forest theorem

671

1. A is stable; 2. for all i; j such that Ai;j ¤ 1, there exists k such that Ai;k ¤ 1, Ak;k D 0 and Ak;j ¤ 1; 3. jjAp jj 6 2jjAjj for all p > 1. Proof. (1) H) (2). Assume that A is stable, and consider i; j such that Ai;j ¤ 1. Since A is idempotent, we have that Api;j ¤ 1 for all p > 1. Since furthermore A is stable, this means that Api;j can take only finitely many values when p ranges. Let m be the highest such value, i.e., Api;j 6 m for all p > 1. In particular, for p D .mC1/jQjC2; this is witnessed by the existence of i0 ; i1 ; : : : ; ip such that i0 D i , ip D j , and Ai0 ;i1 CAi1 ;i2 C  CAip 1 ;ip 6 m: Since p D .mC1/jQjC2, there exist 1 6 l < s < p such that Ai` ;i`C1 C    C Ais 1 ;is D 0, and i` D is . Using the idempotence of A, we get for k D i` that Ai;k ¤ 1, Ak;k D 0, and Ai;k ¤ 1. (2) H) (3). Assume (2) holds. For p D 1, (3) is obvious. Consider some p > 2. Let 1 6 i; j 6 n. If Ai;j D 0, we have Api;j D 0 by idempotence of A. The same holds for Ai;j D 1. Now if Ai;j 2 NC , then, by hypothesis, there exists k such that Ai;k ¤ 1, Ak;k D 0 and Ak;j ¤ 1. This means that the term Ai;k C Ak;k C    C Ak;k C Ak;j is involved in the minimum defining the value of Api;j . It follows that Api;j 6 2jjAjj. Overall, we obtain that Api;j 6 2jjAjj. Hence Ai;j 6 2jjAjj. (3) H) (1). Assume (3) holds. Each matrix B 2 hAi is such that both Bx D Ax (by idempotence) and jjBjj 6 2jjAjj (by item 3). Since there are only finitely many such matrices, hAi is finite. Hence A is stable. Corollary 4.8. Let A; B in Tnn with Ax D Bx, A is stable if and only if B is stable. Furthermore, the stability of an idempotent matrix is decidable. Thanks to the above corollary, it is meaningful to say that an idempotent matrix A over T1 is stable if there exists one matrix B 2 Tnn such that Bx D A is stable, or equivalently if this holds for every matrix B such that Bx D A. The core of the proof is embedded in the following lemma. Lemma 4.9. Given matrices A1 ; : : : ; Ak 2 Tnn , hA1 ; : : : ; Ak iTnn is finite if and only is stable. if every idempotent matrix in hA1 ; : : : ; Ak iTnn 1 Proof. Let C denote hA1 ; : : : ; Ak iTnn , and Cx denote hA1 ; : : : ; Ak iTnn . 1 If there is an unstable matrix in Cx , this means that there exists an unstable matrix in A 2 C . By definition of stability, this means that hAiTnn is infinite, and hence C is infinite. Conversely, assume that every idempotent matrix in Cx is stable. We apply Brown’s lemma to the morphism x that sends C to Cx . Since Cx  Tnn , it is finite, and as 1 a consequence also locally finite. Now consider some idempotent matrix A 2 Tnn , 1

672

Thomas Colcombet

let us show that ¹B 2 C j Bx D Aº is locally finite. Consider some finite setX  Tnn such that Xx D ¹Aº. Let m D maxB2X .jjBjj/, and consider some B1 ; : : : ; Bn 2 X ; we have jjB1    Bn jj 6 jj.m  A/    .m  A/jj 6 2m

(the last inequality is from (3) of Lemma 4.7). We obtain that jjBjj 6 2m for all B 2 hX iTnn , and as a consequence hX iTnn is finite. Hence ¹B j Bx D Aº is locally finite. Using Brown’s lemma, we directly obtain that C is locally finite. Since C is generated by finitely many matrices (namely A1 ; : : : ; Ak ), this means that C is finite. From the above lemma, one immediately obtains Theorem 4.5: consider a torsion subsemigroup S of Tnn . Then every idempotent matrix in S is stable. Hence if X is x only contains stable idempotents. By Lemma 4.9, this a finite subset of S , then hXi means that hX i is finite. We conclude that S is locally finite. The lemma also yields a decision procedure: compute the closure hA1 ; : : : ; Ak i, and check whether there is an unstable matrix in this set. We obtain Theorem 4.6. Technically, in this application, we did not use directly the factorisation forest theorem, but rather Brown’s lemma which is one of its consequences. In the next section, we study a generalisation of the above problem, and this time, Brown’s lemma is no longer sufficient. 4.4. The bounded section in the tropical semiring. We have seen in the above section how to decide whether the closure under product of a set of matrices over the tropical semiring is finite. The bounded section problem is a generalisation of this problem, which requires a more subtle analysis. The problem now is not to check whether infinitely many matrices are generated, but more precisely to determine which entries in the matrix are unbounded. Formally, the bounded section problem is the following. Input: A positive integer n, a finite set of matrices X  Tnn , and two n-tuples I; F 2 ¹0; 1ºn .

Output: “Yes” if there is m such that I t AF 6 m for all A 2 hX iTnn ; “no” otherwise. Before presenting a decision procedure for this problem (Theorem 4.10 below), we introduce a related problem, the limitedness problem for distance automata. Distance automata are non-deterministic finite automata in which each transition is labelled by a cost among 0; 1. The cost of a run of such an automaton is the sum of the costs of its transitions. The cost of a word is the minimum cost over all possible runs of the automaton over this input. This value can be 1 if there are no such runs; otherwise it is a non-negative integer. For instance, the following automaton computes the minimal size of a maximal segment of consecutive a’s (i.e., maps an1 ban2    bank to max .n1 ; : : : ; nk /):

18. The factorisation forest theorem

a; b W 0 p

a; b W 0

aW1 bW0

q

673

bW0

r

It does this by guessing the position of the maximal segment of consecutive a’s of shortest size and using state q along this segment, state p before, and the state r after. The corresponding run computes the length of this segment by using cost 1 for each a-transition between q -states. The boundedness problem is the following: Input: A distance automaton A. Output: “Yes” if the function it computes is bounded; “no” otherwise. This problem is close to the original limitedness problem studied by Hashiguchi which asks whether the function is bounded over its domain, i.e., over the words that are mapped to an integer by the automaton (a closer inspection shows simple mutual reductions between the two problems; we do not develop it here). The bounded section problem and the boundedness problem are in fact the same problem. The proof of this equivalence uses the classical argument that weighted automata can be represented by matrices (see Chapter 4). Indeed, given a distance automaton, it is possible to associate with each letter a a transition matrix A over ¹0; 1; 1º whose rows and columns are indexed by Q in the following way. The entry with index p; q of the matrix is 0 if there is a transition from p to q reading letter a with cost 0 in the automaton, the entry is 1 if there is a transition from p to q reading letter a with cost 1 (but none of cost 0), and finally it is 1 if there are no transitions of the automaton at all from p to q while reading letter a. Using this translation, each finite word over a1    a` can be transformed into a sequence of matrices A1 ; : : : ; A` 2 Tnn . One can prove (by induction) that the entry p; q in the product matrix A1    A` has value m if and only if there is a run of the automaton over a1    a` starting in state p , ending in state q , and m is the least cost among all such runs. The entry is 1 if there is no such run. The sets of initial states and final states can be translated into vectors I and F over ¹0; 1º by I.p/ D 0 if p is initial, 1 otherwise, and F .p/ D 0 if p is final, 1 otherwise. It is then easy to see that I t A1    A` F is exactly the value computed by the automaton while reading the word a1    a` . Hence the existence of a bound on the function computed by the automaton has been reduced to the bounded section problem. The converse reduction is similar: there is a straightforward translation from a set of matrices A1 ; : : : ; Ak to a distance automaton over an alphabet of size k (note here that a distance automaton uses only the costs 0 and 1. Therefore the reduction replaces every positive integer by 1. This approximation is valid since we are only interested in boundedness questions.) Such automata are special cases of weighted automata (see Chapter 4) that are weighted over the min-plus semiring. Theorem 4.10 ([24]). The bounded section problem is decidable.

Thomas Colcombet

674

We present a proof based on the algorithm of Leung [29] and the theorem of factorisation forest for establishing its correctness. Simon later gave another proof for Theorem 4.10 using the factorisation forest theorem [40], but the complexity is not as good as the one obtained by Leung (which is optimal). The principal idea of this algorithm is similar to the one for the finite closure problem: one “approximates” the matrices in Tnn by matrices in Tnn . However, checking 1 whether there exist unstable matrices in hX iTnn is now no longer sufficient: indeed, the stability property ensures that no entry in the matrix is unbounded. Here, we are interested in determining which entries are unbounded. For this, we use the stabilisation operation ] introduced by Leung. Given any idempotent matrix M in Tnn , 1 ] it transforms it into a stable matrix M , more precisely the least one (for componentwise order) which is above or equal M . Indeed, when an idempotent matrix A in Tnn is not stable, iterating it yields an infinite set of matrices. Thus, some of its entries take on arbitrary large values when the matrix is iterated. We define Ax ] , the stabilisation of the matrix Ax, to be obtained from the matrix Ax by setting those entries to 1 whose values are unbounded when the matrix is iterated. This matrix happens to be stable, and it essentially represents the result of iterating the matrix A “many times.” For instance, consider the following idempotent matrix A and its iterations: 

0 AD 1

 1 ; 1



0 A D 1 2

 1 ; 2

:::;



0 A D 1 n

 1 ; n

::::

The bottom right entry is the only non-infinity one which tends toward infinity in this sequence. The stabilisation reflects this fact in that the corresponding entry is set to infinity: Ax D



   0 1 0 1 is stabilised into Ax ] D : 1 1 1 1

Formally, given an idempotent M 2 Tnn , the matrix M ] 2 Tnn is defined as follows: 1 1 ]

Mi;j

8 ˆ k . Since the second statement exactly corresponds to the case of a negative answer to the bounded section problem, we obtain a decision procedure for the boundedness problem by taking the set of input matrices X , closing it under product and stabilisation of idempotents, and verifying that Ixt B Fx ¤ 1 for all the obtained matrices B . This completes the proof of Theorem 4.10. This procedure is exponential, but a closer inspection of the structure of hX i] reveals in fact that the algorithm can be performed in PSPACE [29]. This also matches the known lower bound from [30]. The remainder of this section is devoted to the proof of Lemma 4.11. This requires to introduce some notation. Given a 2 T and some k > 1 let aN k 2 T1 be 0 if a D 0, be 1 if 1 6 a 6 k , and be 1 otherwise. The intention is that 1 2 T1 represents a “small” value, while 1 2 T1 represents a “large” value (not necessarily infinity). Seen like this, the mapping which associates aN k with a tells us whether the value a should be considered as small or large, where k denotes the threshold between “small” and “large.” One easily checks that ab 2k 6 aN k bN k 6 ab k . Since this operation is order preserving, this inequality extends to matrices in a natural way: if A; B are matrices over the tropical semiring, then AB 2k 6 Ax k Bx k 6 AB k where x k is extended to matrices component-wise. More generally, A1    Am mk 6 A1 k    Am k 6 A1    Am k . Given matrices A1 ; : : : ; Am 2 Tnn (we also use the same definition for matrices in Tnn ), a path from i0 to i` in A1    Am is a sequence p D i0 ; : : : ; im of elements 1 among 1; : : : ; n such that i D i0 and im D j . Its value v.p/ is the sum .A1 /i0 ;i1 C    C .Am /im 1 ;im . This definition is related to the product of matrices in the following way: .A1    Am /i;j is the minimum value over all paths from i to j in A1 ; : : : ; Am . Lemma 4.12. For all M 2 hXx i] and all k > 1, there exists A 2 hX i such that M 6 Ax k . Proof. The proof is by induction on the number of operations needed to produce the matrix M from matrices in Xx . Fix k . If M 2 Xx , then M D Ax for some A 2 X . Hence M D Ax 6 Ax k (whatever k is). If the induction hypothesis holds for M; N , i.e., there are A; B 2 hX i such that M 6 Ax k and N 6 Bx k , then it holds for MN since AB 2 hX i and MN 6 Ax k Bx k 6 AB k . Finally, the interesting case is when a stabilisation operation is used: the induction hypothesis holds for an idempotent matrix E , and we have to show that is also holds for E ] . Assume there exists B 2 hX i such that E 6 Bx k . Now consider some K sufficiently large (for instance K D .k C 2/n C 2). We claim that E ] 6 Ax k where A D B K (which belongs to hX i). Consider i; j D 1; : : : ; n, and a path p D i0 ; : : : ; iK ] from i to j in B K . We have to prove that Ei;j 6 v.p/ k . Since we already know that k E D E K 6 Ax k    Ax k 6 AK , the only interesting case is for entries in which E ] and E ] differ, i.e., when Ei;j D 1 and Ei;j D 1. By definition of stabilisation, this

Thomas Colcombet

676

means that for all ` D 1; : : : ; n,

either Ei;` D 1;

or

E`;j D 1;

or

E`;` > 1:

(?)

Since we have chosen K sufficiently large, there is some state ` which appears at least k C 2 times among i1 ; : : : ; iK 1 . This induces a decomposition of p into p0 ; : : : ; pkC2 , in which each pm is a path in some B Km : The path p0 is from i to `, the paths p1 to pkC1 are from ` to `, and the path pkC2 is from ` to j . We distinguish three cases depending on (?). If Ei;` D 1, then k k k 0 K0 1 D Ei;` D .E K0 /i;` 6 .Ax k /K i;` 6 v.p0 / 6 v.p/ ; i;` 6 A

from which we deduce that v.p/ > k . The same holds if Ej;` D 1. The third case is when E`;` > 1. Then, the same chain of inequalities yields v.pm / k > 1 for all m D 1; : : : ; k C 1. Hence v.pm / > 1. As a consequence, we have once more v.p/ > k . Corollary 4.13. Statement (1) of Lemma 4.11 implies statement (2). t

Proof. Assume that Ix M Fx D 1, and fix k . By Lemma 4.12, there exists A 2 hX i t t such that M 6 Ax k . Hence 1 D Ix M Fx 6 Ix Ax k Fx 6 I t AF k , i.e., I t AF > k .

The second implication is the more involved one. It amounts to proving the following lemma. Lemma 4.14. There exists k such that for all A 2 hX i, Ax k 6 M for some M 2 hXx i] . Corollary 4.15. Statement (2) of Lemma 4.11 implies statement (1).

Proof of Corollary 4.15. Assume that (2) holds. Let k be the positive integer obtained from Lemma 4.14. Then there is some A 2 hX i such that I t AF > k , i.e., I t AF k D 1. x ] . We obtain 1 D I t AF k D Furthermore, by Lemma 4.11, Ax k 6 M for some M 2 hXi t xk x t x x x I A F 6 I M F . This establishes (1). It remains to prove Lemma 4.14. For the rest of this section, let us say that a set Y  Tnn covers a set X 2 Tnn if there exists k > 1 such that, for all A 2 X , 1 there exists M 2 Y such that Ax k 6 M . In this case, k is called the witness. Using x ] covers hX i.” Call this terminology, Lemma 4.14 can be rephrased simply as “hXi nn two matrices over T 0-equivalent if they coincide on their 0 entries. Call a matrix 0-idempotent if it is 0-equivalent to its square. Lemma 4.16. If Y covers X , and all matrices in X are 0-equivalent and all are 0-idempotents, then hY i] covers hX i.

Proof. Let A1 ; : : : ; An 2 X , and set A D A1    An . We have to prove that there is some M 2 hY i] such that Ax k 6 M , in which k must be constructed independently from A1 ; : : : ; An (and in particular independently from n). We claim that for all k and all idempotents E 2 hY i] , if A1 k 6 E and An k 6 E then Ax 2k 6 E ] (note that we do not make any assumptions on A2 ; : : : ; An 1 here). ] ] Indeed, consider i; j D 1; : : : ; n. If Ei;j D 1, we of course have Ax 2k 6 1 D Ei;j .

18. The factorisation forest theorem

677

] If Ei;j D 0, this means that Ei;j D 0 and, as a consequence, there is a path from i to j in E n with value 0. Since all the 0-entries in E are also 0-entries in each Am , ] the same path can be used in A1 ; : : : ; An . The last case is Ei;j D 1. By definition of stabilisation, this implies that there is some ` such that Ei;` 6 1, E`;` D 0 and E`;j 6 1. Consider the path p D i; `; : : : ; `; j in A1 ; : : : ; An . Since A1 k 6 E and Ei;` 6 1, we have .A1 /i;` 6 k . In the same way .An /`;j 6 k . Furthermore, since E`;` D 0 and using the 0-equivalence assumption, we obtain .Am /`;` D 0 for all m. Hence the value of p is at most 2k . This concludes the claim. Consider now the general case. Let k be the witness that Y covers X and fix some matrices M1 ; : : : ; Mn 2 Y such that Ai k 6 Mi for each i . We choose a sufficiently large K . Given an element N 2 hYx i] , we say that N appears in M1 ; : : : ; Mn between positions m; m0 if N D Mm    Mm0 and m0 m < K . The proof is by induction on the number of idempotents appearing in M1 ; : : : ; Mn . More precisely, we prove that for each i there exist a constant ki such that, if at most i distinct idempotents appear in M1 ; : : : ; Mn , then A1    An ki 6 M for some M 2 hX i] . For i D 0, no idempotents appear in M1 ; : : : ; Mn . This means that n is small, i.e., n < K (indeed, by the theorem of Ramsey or the factorisation forest theorem, every sufficiently long product has to contain an idempotent). We have

A1    An Kk 6 A1    An nk 6 A1 k    An k 6 M1    Mn 2 hX i] :

Now suppose that i > 1 idempotents appear in M1 ; : : : ; Mn . Let E be one of them. We first treat the case where E appears both at the beginning and the end of M1 ; : : : ; Mn , i.e., both between positions 1; m, and between position m0 ; n. There are two cases. If m C 1 > m0 , the two appearances of E overlap or are contiguous. In this case, by definition of appearance, n 6 2K and, as in the case i D 0, we obtain that A1    An 2Kk 6 N for some N 2 hX i] . Otherwise, we know that A1    Am Kk 6 E , and Am0    An Kk 6 E . Hence we can use our first claim on the following sequence of matrices: .A1    Am /; AmC1 ; : : : ; Am0

1 ; .Am0

   An /;

and we obtain A1    An 2Kk 6 E ] . The general case is now easy. Consider a sequence A1 ; : : : ; An . It can be decomposed into three sequences U D .A1 ; : : : ; Am

1 /;

A1    Am 2KkC2ki

1

V D .Am ; : : : ; Am0

1 /;

W D .Am0 ; : : : ; An /;

such that E does not appear in U nor W , but both at the beginning and the end of V . According to the induction hypothesis on U and W , there exists M; M 0 2 hX i] such that we have A1    Am 1 ki 1 6 M and Am0    An ki 1 6 M 0 . Using the previous case with E appearing at the beginning and the end of V , we also have Am    Am0 1 2Kk 6 N for some N 2 hX i] . Overall, 6 A1    Am

1

ki

1

6 MNM 0 2 hX i] :

Am    Am0

1

2Kk

This establishes the induction hypothesis with ki D 2Kk C 2ki

Am0    An ki 1.

1

678

Thomas Colcombet

We can now conclude the proof of Lemma 4.11 using the factorisation forest theorem. Proof of Lemma 4.11. Let P be the set of all subsets Y  hX i that are covered by hXx i] . We also say that a set covered by hX i has property P . Consider the morphism ' mapping each element of hX i to its 0-equivalence class. Let us show that one can apply Theorem 4.2 to hX i, which is generated by X , the morphism being ' and the family P . We prove the following. x ] , it is clear that the same holds for every subset of Y . 10 . If Y is covered by hXi 00 1 . Let k be the maximum over jjAjj for all A 2 X (it exists since X is finite). Then, we have Ax k 6 Ax 2 Xx for all A 2 X . Hence X is covered by Xx . 2. If Y; Z are covered by hXx i] with respective witnesses kY and kZ , then Y [ Z x ] , taking as witness max.kY ; kZ /. is covered by hXi 3. If Y; Z are covered by hXx i] , witnessed by kY and kZ , then we have AB kA CkB 6 x ]. Ax kA Bx kB . Hence kA C kB is a witness that Y Z is covered by hXi ] 4. Finally, suppose that Y is covered by hXx i and that '.Y / D ¹Eº an idempotent E . Since Y is covered by hXx i] , Lemma 4.14 implies that hY i is covered x ] i] , i.e., by hXx i] . by hhXi x ] covers hX i. This concludes the proof Overall, by Theorem 4.2, we conclude that hXi of Lemma 4.14, and hence of Theorem 4.10. 4.5. Polynomial closure. Our last algebraic application of the factorisation forest theorem concerns the problem of finding characterisations of families of regular languages. In Chapters 16 and 17 this topic is treated much more deeply. The factorisation forest theorem is used in this context to obtain characterisations (possibly non-effective) of the polynomial closure of a class of languages. Given a class of languages L, a language K belongs to its polynomial closure Pol.L/ if it is a finite union of languages of the form L0 a1 L1    an Ln , where each Li belongs to L and the ai s are letters. In general, the idea is to transform a characterisation of L (by profinite equations, identities, etc.) into another one for Pol.L/. The first use of this technique appear in Pin and Weil [32] for positive varieties, and the most general and recent such result treats the case of the polynomial closure of any lattice of regular languages [7]. We present here the simplest among the results of this kind: the characterisation of polynomial languages. This corresponds to the case when L contains the languages of the form B  where B is any set of letters. The interest of this particular case is that the family of languages obtained in this way coincides with the ones definable in †2 , i.e., the fragment of first-order logic consisting of formulas which take the form of a block of existential quantifiers, followed by a block of universal quantifiers, followed by a quantifier-free formula. A monomial language is a language of the form A0 a1 A1    an An , where a1 ; : : : ; an are letters and A0 ; : : : ; An are sets of letters. For instance ¹"º is the monomial language defined by ; , and ¹aº is defined as ; a; . A polynomial language is a finite union of monomial languages.

18. The factorisation forest theorem

679

Theorem 4.17 ([32]). A language is a polynomial language if and only if its syntactic ordered monoid satisfies for all idempotent e .

e > eh¹s j e 6J sºie

The exact content of the inequality e > eh¹s j e 6J sºie may be at first sight difficult to grasp. To give some more intuition, let us make the following remark before entering the proof. Remark 4.18. Let alph.u/ denote the set of letters occurring in a word u 2 A . Then h¹s j a 6J sºi D ¹f .u/ j alph.u/  alph.f

1

.a//º

for all a 2 M , i.e., this set represents the possible values of all words that consists of letters that could appear in a word evaluating to a. This means that the central condition e > eh¹s j e 6J sºie tests the occurring letters in an idempotent context. Now consider the particularly simple monomial language B  for some B  A. Clearly, the membership to this language has only to do with the occurring letters; more precisely, if alph.u/  alph.v/ and v 2 B  , then u 2 B  . The above property e > eh¹s j e 6J sºie is a form of generalisation of this remark to polynomial languages: in particular it implies that whenever u is an idempotent word, if xuy is in the language, xuvuy is also in the language for all v such that alph.v/  alph.u/. Proof. From left to right. Assume that L is a polynomial language and let k be the maximal degree of a monomial it contains. Consider now an idempotent word u and assume xuy 2 L for some words x; y , then xukC1 y 2 L (by idempotence of u). This word belongs to one of the monomials of L, say K D A0 a1    a` A` , with ` 6 k . Since there are k C 1 occurrences of the word u in v , at least one is contained in one of the Ai . This means that for some s such that 0 6 s 6 k , we have xus 2 A0 a1    ai Ai , u 2 Ai , and uk s 2 Ai ai C1    a` A` . Now let w be any word such that alph.w/  alph.u/. From the above decomposition, we have that xus uwuuk s y also belongs to K and hence to L. Using the idempotence of u, this is also the case for xuwuy . From right to left. This direction uses the factorisation forest theorem. Let P  P.A / denote the set

¹X  A j for every a 2 M; there exists a polynomial language Ka such that X \ f 1 .a#/  Ka  f 1 .a#/º

We will apply Theorem 4.2 to the family P to show that A 2 P . By the definition of P , this means that, for every a 2 M there exists a polynomial Ka such that Ka D f 1 .a#/. Since polynomial languages are closed under finite union of polynomials, it follows that, for every downward closed set I  M , f 1 .I / is a polynomial language. This concludes this direction of the proof. It remains to show that Theorem 4.2 can indeed be applied to P . It is clear that A 2 P since every finite language is a polynomial language. It is also clear from the definition that P is closed under taking subsets, and under unions (since

Thomas Colcombet

680

polynomial languages are closed under unions). Now consider A; B in P . Let us show that AB 2 P . Let .Kx /x2M (resp. .Kx0 /x2M ) be the polynomial languages witnessing the fact that A 2 P (resp. B 2 P ). Now consider the polynomial language X KD Kx Ky0 : xy6a

By construction, f .u/ 6 a for every word u 2 K . Hence K  f 1 .a#/. Now consider u 2 .AB/ \ f 1 .a#/. Since u 2 AB , u can be decomposed as u D vw with v 2 A, w 2 B . By hypothesis, v 2 A, and hence v 2 Kf .v/ . Similarly w 2 Kf0 .w/ . We get u D vw 2 Kf .v/ Kf0 .w/ . Since furthermore u 2 f 1 .a#/, we have f .v/f .w/ D f .u/ 6 a and, as a consequence, Kf .v/ Kf0 .w/  K . Overall u 2 K . It remains to check the last condition. Assume A 2 P and f .A/ D ¹eº for some idempotent e . For x 2 M , let Kx be the polynomial language witnessing that A 2 P . Consider the following polynomial: K 0 D Ke C Ke alph.f

1

.e// Ke :

By the above remark, we know that f .K 0 /  f 1 .e#/. Conversely, assume that u 2 AC . Then u can be written as u1    un with ui 2 A for all i . Clearly, if n D 1, then u 2 Ke  K 0 . Otherwise, since u2 ; : : : ; un 1 belong to alph.f 1 .e//, we have u 2 K 0 . Hence AC  K 0  f

C

1

.e#/:

Let us prove that A 2 P using the above inequalities. Consider some a 2 M . If e 62 f 1 .a#/, set Ka D ; since f .AC / D ¹eº we have AC \ f 1 .a#/ D ;  Ka  f 1 .a#/. Otherwise, set Ka D K 0 , and we have by the above inequalities AC \ f

1

.a#/  K 0 D Ka  f

1

.e#/  f

1

.a#/:

Hence AC 2 A, and Theorem 4.2 can indeed be applied to P .

5. Variants of the factorisation forest theorem This section present two variant of the theorem of the factorisation forest theorem. The first one generalises the factorisation forest theorem to infinite words (§ 5.1). This version had applications to the MSO theory of countable linear orderings. The second variant establishes the existence of weaker forms of factorisations that can be constructed in a deterministic manner (§ 5.2). This had applications to trees. 5.1. Infinitary variant. So far, we have seen the factorisation forest theorem for finite linear orderings/finite words. In fact, the finiteness assumption is not so relevant for the result. For the presentation of infinitary variants, the machinery of splits is easier to use than factorisation trees. We only consider splits in this section. From what we have seen so far, we can already deduce an infinitary variant of the result. Consider the linear ordering .N; 0 1 0 A D 1 0! 2 †20 \ …02 D 02 .

1.2.2. The Hausdorff–Kuratowski hierarchy. The Borel hierarchy was then refined, first by Hausdorff, then Kuratowski, revealing !1 - new layers between any two of its consecutive levels. This yields the Hausdorff–Kuratowski hierarchy, whose idea is to rearrange the 0˛C1 -sets into differences of †˛0 -sets. Given any countable ordinal ˇ > 0, 0 0 a set X belongs to S exists an -increasing sequence .X /  X , where  … par.ˇ/ holds when  and ˇ have different parities and  is less than 6 ˇ . 6 The parity of an infinite ordinal is the parity of the remainder of its division by ! . So for instance: ! , !  2, ! 2 are all even, but ! C 1, !  3 C ! C 5 and ! 2 C 3 are all odd.

Jacques Duparc

700

Example 1.4. Given a sequence X0  X1  X2  X3  X4 of †˛0 -sets, the set X D .X4 X X3 / [ .X2 X X1 / [ X0 belongs to D5 .†˛0 /. X0 X1 X2 X3

X X4

Figure 2. A difference of five sets

Example 1.5. A 0.A X ¹0º/! 2 D2 .†10 /, for it can be written as X1 X X0 where X0 D 1 0A! and X1 D A 0A 0A! are both open sets.

Proposition 1.1 (Hausdorff and Kuratowski). Given any countable ordinal ˛ > 0, we have [ Dˇ .†˛0 / D 0˛C1 : 0w X1 >w X2 >w    >w Xn >w XnC1 >w    :

The reader interested in the proof may consult [30], [45], [46], and [47]. As a consequence of Lemmas 1.8 and 1.9, on Borel sets, the Wadge order 6w becomes a well-order on the quotient by the equivalence relation X  Y () X  Y or X  Y { . Before we start analyzing this well-order, we distinguish between two kinds of Borel sets: the self dual (s.d.) sets that satisfy X 6w X { (or equivalently X w X { ) and the other ones called non-self dual (n.s.d.). Example 1.12. 1. 0A! ; 0A 0A! [ 10! are self dual. 2. ;; A! ; A 0A! ; .A 0/! are non-self dual.

A similar proof as the one of Lemma 1.9 gives the following characterisation.

Lemma 1.10 (Martin). Given X  A! Borel, X Šw X { () 9x 2 A! 8n 2 N .x n/

1

X w X:

We recall that for u 2 A and X  A! , u 1 X stands for ¹y 2 A! j uy 2 X º:

12 The reader should also keep in mind that, by the very definition of the Wadge order, Y 6w X { and Y { 6w X are equivalent statements.

708

Jacques Duparc

For a proof, see [28] and [46]. Given X  A! we use the notation 13 ˙X for 0X [ .A X ¹0º/X {. The reader may easily verify that ˙X is s.d. The previous result brings a perspicuous insight on the s.d. sets. Proposition 1.11. Given any Borel set X  A! (A finite), we have

X 6w X { () 9Y 0 X n , for all X  M . This includes the semirings P .A / and P .c.A //. The semiring Rel A of relations on A is a Conway semiring in the same way. When r 2 Rel A , the relation r  is the reflexive-transitive closure of r . In § 5 and § 6 we will discuss complete and continuous semirings that give rise to Conway semirings (and in fact iteration semirings; see below). The semirings of regular languages or rational power series defined below are also Conway semirings (or partial Conway semirings). Any semiring S is a partial Conway semiring with D.S / D ¹0º and 0 D 1. We mention a particular 4-element commutative idempotent Conway semiring, called “Conway’s leap” in [101]. Its elements are 0; 1; a; a , ordered as indicated, whose sum operation is maximum and whose multiplication operation is also idempotent with aa D a . It holds that 1 D 1 and a D a . An important class of partial Conway semirings arises by considering semirings in which certain linear equations have unique solutions. Definition 2.3. Suppose that S is a semiring with a distinguished ideal I . We say that S is a partial iterative semiring if for each a 2 I and s 2 S , there is a unique solution to the equation x D ax C s . A morphism .S; I / ! .S 0 ; I 0 / of partial iterative semirings is a semiring morphism hW S ! S 0 with h.I /  I 0 .

For example, when the semiring S is a complete metric space, and for any a 2 I and s 2 S , the function x 7! ax C s is a proper contraction, then .S; I / is a partial iterative semiring, since by Banach’s fixed point theorem [5], any proper contraction has a unique fixed point. Thus, when .S; I / is a partial iterative semiring and a 2 I and b 2 S , then the equation x D ax C b has a b as its unique solution, where a is the unique solution of x D ax C 1. Thus, we obtain an operation  W I ! S . The following proposition, see [21], connects partial iterative semirings and partial Conway semirings: Theorem 2.1. Equipped with the star operation defined above, each partial iterative semiring .S; I / is a partial Conway semiring. Moreover, any morphism of partial iterative semirings is a partial Conway semiring morphism. In certain situations, a partial iterative semiring can be extended to a Conway semiring, see [17] and [18]. Theorem 2.2. Suppose that S is a partial iterative semiring with distinguished ideal I and S0 is a subsemiring of S which is a Conway semiring. Suppose that S is the direct sum of S0 and I , so that each element of S can be written in a unique way as a sum s0 C a with s0 2 S0 and a 2 I . Then there is a unique way of turning S into a Conway semiring whose star operation extends the one defined on S0 . We give an outline of the proof. As explained above, in addition to the star operation on S0 , there is another star operation defined on I that provides unique solutions to equations of the sort x D ax C 1, for all a 2 I . When a is in S0 \ I ,

20. Equational theories for automata

733

the two star operations give the same result since the fixed point identity holds in S0 . If s 2 S with s D s0 Ca, where s0 2 S0 and a 2 I , we are forced to define s  D .s0 a/ s0 , where s0 is taken in S0 and .s0 a/ exists since s0 a is in I . It can be verified that the new star operation extends the ones defined on S0 and I , and that the sum star and product star identities hold. We will use Theorem 2.1 and Theorem 2.2 to show that semirings of formal series over a semiring are partial Conway semirings, and semirings of formal series over Conway semirings are Conway semirings. Problem 1. Is the equational theory of Conway semirings decidable? How about the equational theory of ! -idempotent Conway semirings? Give a concrete description of the free Conway semirings (satisfying 1 D 1). The initial Conway semiring was described in [30]. One can associate an identity with any finite monoid or finite automaton, see [30], [42], and [87]. It is known that an identity associated with a finite monoid or automaton holds in all Conway semirings satisfying 1 D 1 if and only if the monoid or automaton is aperiodic. Conway semirings are closely related to the Conway theories or Conway algebras of [17]. In [11], it is shown that the equational theory of Conway algebras over any finite signature is in PSPACE; moreover, it is PSPACE-complete in most cases. Conway semirings and partial Conway semirings are closed under several constructions that are essential for developing a theory of automata based on these structures. We will discuss some of these constructions in the remaining part of this section. 2.1. Matrix semirings over Conway semirings. It is well known that for each n > 0, the set of all n  n-matrices over S , equipped with the pointwise sum operation and the usual matrix multiplication, is itself a semiring denoted S nn . The zero element of S nn is the zero matrix denoted just 0, and the multiplicative unit is the n  n diagonal matrix En whose diagonal entries are all equal to 1. Suppose that S is a partial Conway semiring with D.S / D I . Then for each n > 0, the collection I nn of all n  n matrices all of whose entries are in I is an ideal of the matrix semiring S nn . Following [30], we define a star operation  W I nn ! S nn by induction on n. When n D 0, I nn contains a single matrix and the star operation is trivial. When n D 1, we use the star operation on I . Assuming that n > 1 we write nn Y , define n D k C 1. For a matrix M D X U V in I  X U

Y V



 ˛ D

 ˇ ; ı

where X 2 S kk , Y 2 S k1 , U 2 S 1k and V 2 S 11 , and where ˛ D .X C Y V  U /

D ıUX 

ˇ D ˛Y V  ; ı D .V C UX  Y / :

(3)

Zoltán Ésik

734

Theorem 2.3. When S is a partial Conway semiring with distinguished ideal D.S / D I , then for each n > 0, S nn is a partial Conway semiring with distinguished ideal D.S nn / D I nn . In fact, .X Y / D En C X.YX / Y

holds for all rectangular matrices X 2 I nm and Y 2 I mn , m; n > 0. Moreover, the matrix star identity (3) holds for all decompositions of a matrix M 2 S nn into four blocks X; Y; U; V such that X and V are square matrices. A proof of Theorem 2.3 is outlined in [30]. Complete proofs have been supplied in [51] and [85]. In [17], Theorem 2.3 is derived from a more general fact for Conway theories. For partial Conway semirings, the theorem appears in [21]. However, the proof is the same as for Conway semirings. Suppose that S is a partial iterative semiring. Then S is a partial Conway semiring, so that by (3) there is a star operation  W I nn ! S nn for each n > 0. On the other hand, by the next result, S nn is also a partial iterative semiring with distinguished ideal I nn . Thus, there is another star operation on matrices: for any M 2 I nn , M  is the unique solution of the equation X D M X C En in the semiring S nn . However, these two operations are the same, since for M 2 I nn the unique solution of the equation X D M X C En may be computed by the matrix star formula (3). The following result from [21] can be extracted from the proof of the matrix extension theorem in [17] and [18]. Theorem 2.4. When S is a partial iterative semiring with distinguished ideal I , then for each n > 0, S nn is a partial iterative semiring with distinguished ideal I nn . Moreover, the star operation on I nn determined by the partial iterative semiring structure is the same as the star operation determined by the matrix star formula (3). In short, if certain linear fixed point equations have unique solutions, then so do appropriate systems of linear fixed point equations whose unique solutions may be calculated using the matrix star formula. For later use we mention that the permutation identity (4) holds in all partial Conway semirings. Proposition 2.5. Suppose that S is a partial Conway semiring and M 2 D.S /nn , where n > 1. Let Œn denote the set ¹1; : : : ; nº and let  be any bijective function Œn ! Œn with associated 0-1-matrix, also denoted  . Then .M  T / D M   T ;

(4)

where  denotes the transpose of  . T

2.2. Formal series. A partial monoid is a structure M D .M; ; 1/, where  is a partially defined associative product operation on M with unit element 1. Thus, for all x; y; z 2 M , .xy/z is defined if and only if x.yz/ is, and if both are defined then .xy/z D x.yz/. Moreover, x1 and 1x are always defined and x1 D x D 1x , for all x 2 M . Clearly, a partial monoid is a monoid if and only if the product of any two elements is defined. We say that a partial monoid M is finitely factorisable if each

20. Equational theories for automata

735

x 2 M can be written in a finite number of different ways as a product x1    xk with xi ¤ 1 for all i . Call an element x 2 M irreducible if x ¤ 1 and x has no factorisation x1    xk with k > 1 and xi ¤ 1 for all i . It is not difficult to show that when M is finitely factorisable then each x 2 M is a finite, possibly empty product of irreducible elements. Clearly, every free monoid A or free commutative monoid c.A / is finitely factorisable. Let A and D be sets, where A is the set of “action symbols” and D is the set of “data.” The set of nonempty alternating words is D.AD/C . (A similar notion may be found in [26], for example.) When ud and d 0 v are in D.AD/C with d; d 0 2 D , define the product .ud /.d 0 v/ exactly when d D d 0 . Moreover, in this case let .ud /.d 0 v/ D udv . By adding a unit element 1 to D.AD/C , we obtain a finitely factorisable partial monoid denoted .A; D/ .

Definition 2.4. Let S be a semiring and M a finitely factorisable partial monoid. The set of (formal) series S hhM ii is the P set of functions sW M ! S . As usual, we express a series s as the formal sum s D m2M .s; m/m and call s.m/ D .s; m/ 2 S the coefficient of m. We turn S hhM ii into a semiring with sum and product s1 C s2 D s1 s2 D

X

..s1 ; m/ C .s2 ; m//m;

m2M

X

X .s1 ; m1 /.s2 ; m2 /m;

m2M m1 m2 Dm

for all s1 ; s2 2 S hhA ii. We define the constant 0 as the series whose coefficients are all 0, and 1 as the series such that the coefficient of 1 2 M is 1 and all other coefficients are 0. The support of a series s 2 S hhM ii is supp.s/ D ¹m j .s; m/ ¤ 0º. A series is proper if its support does not contain 1. We let S hM i denote the subsemiring of S hhM ii formed by all series having a finite support. Note that S may be identified with the subsemiring of S hhM ii consisting of those series whose support is the set ¹1º. Also, each m 2 M may be identified with the series s with .s; m/ D 1 and .s; m0 / D 0 for all m0 ¤ m. When mm0 is not defined in M , then mm0 is 0 in S hhM ii. When M D A , for some set A, S hM i is called a polynomial semiring. Each language L  A may be identified with its characteristic series s 2 BhhA ii with .s; u/ D 1 if and only if u 2 L, for all u 2 A . Since the sum and product operations in BhhA ii correspond to union and concatenation in P .A /, the semirings P .A / and BhhA ii are isomorphic. A similar fact holds for commutative languages in c.A /. If s 2 S hhM ii is proper, where M is finitely factorisable, then for any r 2 S hhM ii, the equation x D sx C r has s  r as its unique solution, where s  D 1 C s C s 2 C    . This infinite sum exists since s is proper and M is finitely factorisable. As a corollary of Theorem 2.1 we have:

Zoltán Ésik

736

Theorem 2.6. Suppose that S is a semiring and M is a finitely factorisable partial monoid. Let P denote the ideal of proper series in S hhM ii. Then .S hhM ii; P / is a partial iterative semiring and thus a partial Conway semiring. In particular, .S hhA ii; P / and .S hhc.A /ii; P / are partial iterative semirings, for all sets A, where P is the ideal of proper series. Below when we say that S hhM ii is a partial iterative or partial Conway semiring, where S is a semiring and M is a finitely factorisable monoid, we will always understand that the distinguished ideal is that of the proper series. Using Theorem 2.2 we have the following: Corollary 2.7. Suppose that S is a Conway semiring and M is a finitely factorisable partial monoid. Then there is a unique way of extending the star operation on S to S hhM ii such that S hhM ii becomes a Conway semiring. This unique star operation is given by X .s  ; x/ D .s; 1/ .s; x1 /.s; 1/    .s; 1/ .s; xn /.s; 1/ x1 xn Dx; xi ¤1

for all x 2 M . See also [17], [18], and [47]. In particular, when S is a Conway semiring, S hhA ii and S hhc.A/ ii are also Conway semirings, for any set A. This applies to the semirings B and N1 , for example. 2.3. Duality. When S is a semiring, its dual semiring S d is obtained by reversing the multiplication operation. Thus, in S d , the product of x and y is yx , the product of y and x in the semiring S . An important property of (partial) Conway semirings is that they are closed under forming duals. We do not know whether the dual of a partial iterative semiring is also a partial iterative semiring (with the same distinguished ideal). Below we will call a partial iterative semiring S with distinguished ideal I a symmetric partial iterative semiring if S d is also a partial iterative semiring with the same ideal. Every partial iterative semiring S hhM ii, where S is any semiring and M is a finitely factorisable partial monoid, is a symmetric partial iterative semiring.

3. Automata in Conway semirings In this section, our aim is to establish Kleene’s theorem in the axiomatic setting of partial Conway semirings. The main ideas used below were already present in [30], where it was essentially proved that Kleene’s theorem holds in ! -idempotent Conway semirings. The setting of partial Conway semirings makes the result applicable in situations where there is no totally defined star operation. In this section, we suppose that S is a partial Conway semiring, S0 is a subsemiring of S and I0  D.S /. We will define (finite) automata-recognisable and rational elements in S over .S0 ; I0 /, and prove that these two notions are equivalent.

20. Equational theories for automata

737

Definition 3.1. An automaton in S over .S0 ; I0 / is a triplet A D .˛; M; ˇ/ consisting of an initial vector ˛ 2 S01n , a transition matrix M 2 S0 hI0 inn , where S0 hI0 i is the set of all finite linear combinations s1 a1 C    C sk ak of elements ai 2 I0 with coefficients si 2 S0 , and a final vector ˇ 2 S0n1 . The integer n > 1 is called the dimension of A. The behaviour of A is jAj D ˛A ˇ . Two automata are equivalent if they have the same behaviour. We say that s 2 S is recognisable over .S0 ; I0 / if s is the behaviour of an automaton over .S0 ; I0 /. The set of all recognisable elements is denoted RecS .S0 ; I0 /. For example, when S is the partial Conway semiring NhhA ii for a set A, S0 is N and I0 D NhAi is the set of all polynomials whose support is included in A, then an automaton A D .˛; M; ˇ/ is just a finite weighted automaton over the alphabet A with weights in the semiring N, and its behaviour is the series recognised by A. This holds because M  D En C M C M 2 C    , where the infinite sum exists since the entries of M are proper and A is finitely factorisable. When S is the Conway semiring BhhA ii, S0 D B and I0 D BhAi, then an automaton A D .˛; M; ˇ/ is just an ordinary nondeterministic finite automaton with a set of initial and a set of final states. The behaviour of A is the (characteristic series) of the language recognised by A. See [12], [40], [50], [89], [108], and [110]. Next we define rational elements. Let Rat0S .S0 ; I0 / denote the smallest set containing I0 [ ¹0º which is closed under the rational operations C, , C and left and right multiplication with elements of S0 . Note that Rat0S .S0 ; I0 /  D.S /. Definition 3.2. We say that s 2 S is rational over .S0 ; I0 / if s D s0 C a for some s0 2 S0 and a 2 Rat0S .S0 ; I0 /. We let RatS .S0 ; I0 / denote the set of rational elements over .S0 ; I0 /. When S D NhhA ii, S0 D N and I0 D NhAi, then RatS .S0 ; I0 / consists of the rational series over A with coefficients in N, see [12], [40], [89], and [108]. When S is the Conway semiring BhhA ii, S0 D B and I0 D BhAi, then RatS .S0 ; I0 / consists of the (characteristic series of) regular subsets of A , and similarly for S D Bhhc.A /ii with I0 D BhAi. The following facts hold.  RatS .S0 ; I0 / contains S0 and is closed under sum and product. Moreover, it is closed under (the partially defined) star operation if and only if it is closed under the plus operation.  RatS .S0 ; I0 / is contained in the least subsemiring of S containing S0 and I0 which is closed under star.  If either S D D.S / and S0 is closed under star, or 8s0 2 S0 8a 2 D.S /

.s0 C a 2 D.S / H) s0 D 0/;

then RatS .S0 ; I0 / is closed under star. Moreover, in either case, RatS .S0 ; I0 / is the least subsemiring of S containing S0 and I0 which is closed under star.

Zoltán Ésik

738

Thus, if each s 2 S has at most one decomposition s D s0 C a with s0 2 S0 and a 2 D.S /, then RatS .S0 ; I0 / is closed under star. In the proof of our Kleene theorem, Theorem 3.2, we will make use of the following fact which is immediate from the matrix star formula (3). Lemma 3.1. Suppose that each entry of the n  n matrix M is in Rat0S .S0 ; I0 /. Then the same holds for the matrix M C . Theorem 3.2. Suppose that S is a partial Conway semiring, S0 is a subsemiring of S and I0  D.S /. Then RecS .S0 ; I0 / D RatS .S0 ; I0 /.

Proof. Let A D .˛; M; ˇ/ be an automaton over .S0 ; I0 /. Then jAj D ˛M  ˇ D ˛ˇ C ˛M C ˇ . Clearly, ˛ˇ 2 S0 . By the previous lemma, each entry of M C is in Rat0S .S0 ; I0 /. Since Rat0S .S0 ; I0 / is closed under left and right multiplication with elements of S0 and since Rat0S .S0 ; I0 / is closed under sum, it follows that ˛M C ˇ is in Rat0S .S0 ; I0 /. Thus, jAj is the sum of an element of S0 and an element of Rat0S .S0 ; I0 /, showing that jAj is in RatS .S0 ; I0 /. This proves that RecS .S0 ; I0 /  RatS .S0 ; I0 /. In order to prove the reverse inclusion, as a first step we show that for each s 2 Rat0S .S0 ; I0 / there exists an automaton A over .S0 ; I0 / whose behaviour is s such that the product of the initial and the final vector of A is 0. Assume that s D 0. Then consider the automaton A0 D .0; 0; 0/ of dimension 1. We have that jA0 j D 0. Next let s D a for some a 2 I0 . Then define the following automaton Aa of dimension 2:      0 a 0 Aa D .1 0/; ; : 0 0 1 We have

 jAa j D .1

0/

   1 a 0 D a: 0 1 1

In the induction step there are five cases to consider. Suppose that s D s1 C s2 or s D s1 s2 such that there exist automata Ai D .˛i ; Mi ; ˇi / over .S0 ; I0 / with jAi j D si satisfying ˛i ˇi D 0, i D 1; 2. We construct automata A1 C A2 and A1  A2 defining s1 C s2 and s1 s2 , respectively. Let      M1 0 ˇ A1 C A2 D .˛1 ˛2 /; ; 1 ˇ2 0 M2 and      M1 ˇ1 ˛2 M2 ˇ ˛ ˇ A1  A2 D .˛1 0/; ; 1 2 2 : 0 M2 ˇ2 Then

 jA1 C A2 j D .˛1

˛2 /

  M1 0

0 M2

D ˛1 M1 ˇ1 C ˛2 M2 ˇ2 D jA1 j C jA2 j;

  ˇ1 ˇ2

20. Equational theories for automata

and

 jA1  A2 j D .˛1

0/

  M1 0

M1 ˇ1 ˛2 M2C M2

739

  ˇ1 ˛2 ˇ2 ˇ2

D ˛1 M1 ˇ1 ˛2 ˇ2 C ˛1 M1 ˇ1 ˛2 M2C ˇ2

D ˛1 M1 ˇ1 ˛2 M2 ˇ2 D jA1 j  jA2 j:

Also,

.˛1

˛2 /.ˇ1

and .˛1

0/.ˇ1 ˛2 ˇ2

ˇ2 /T D ˛1 ˇ1 C ˛2 ˇ2 D 0 ˇ2 /T D ˛1 ˇ1 ˛2 ˇ2 D 0:

Next we show that when s D r C for some r which is the behaviour of an automaton A D .˛; M; ˇ/ over .S0 ; I0 / such that ˛ˇ D 0, then s is the behaviour of an automaton AC over .S0 ; I0 /. Since jAj D ˛M  ˇ D ˛ˇ C ˛M C ˇ D ˛M C ˇ;

r D ˛M  ˇ D ˛M C ˇ . Now let

AC D .˛; M C ˇ˛M; ˇ/:

By .M C ˇ˛M / D M  .ˇ˛M C / , we have

jAC j D ˛M  .ˇ˛M C / ˇ D ˛M C ˇ.˛M C ˇ/ D .˛M C ˇ/C D jAjC D s:

By assumption, we have that ˛ˇ D 0. Last, if s D jAj and s0 2 S0 , where A D .˛; M; ˇ/ is an automaton over .S0 ; I0 / with ˛ˇ D 0, then s0 s D js0 Aj and ss0 D jAs0 j where s0 A D .s0 ˛; M; ˇ/ and As0 D .˛; M; ˇs0 /. Also .s0 ˛/ˇ D ˛.ˇs0 / D 0. We have thus shown that Rat0S .S0 ; I0 /  RecS .S0 ; I0 /. Finally, if s 2 RatS .S0 ; I0 /, so that s D s0 C r for some s0 2 S0 and r 2 Rat0S .S0 ; I0 /, then there is an automaton A D .˛; M; ˇ/ over .S0 ; I0 / whose behaviour is r . Define     0 0 B D .s0 ˛/; ; .1 ˇ/ : 0 M Then

 jBj D .s0

˛/



1 0

0 M

  1 D s0 C ˛M  ˇ D s0 C r D s: ˇ

Remark 3.3. We remark that any automaton A D .˛; M; ˇ/ over .S0 ; I0 / has an equivalent automaton whose initial vector has a single nonzero component which is 1. For example, the automaton A D .˛; M; ˇ/ is equivalent to      0 ˛M ˛ˇ B D .1 0/; ; : 0 M ˇ

740

Zoltán Ésik

Corollary 3.4. Suppose that S is a Conway semiring, S0 is a Conway subsemiring of S and I0  S . Then RecS .S0 ; I0 / D RatS .S0 ; I0 / is the least Conway subsemiring of S which contains S0 [ I0 . In this case we may modify the definition of an automaton by allowing transition matrices M 2 S nn to have entries in S0 hI0 i C S0 , since we may write M as the sum of a matrix M0 2 S0nn with a matrix N 2 S0 hI0 inn and apply the sum star identity M  D .M0 N / M0 to conclude that .˛; M; ˇ/ is equivalent to .˛; M0 N; M0ˇ/. Corollary 3.5. Suppose that S is a partial Conway semiring, S0 is a subsemiring of S and I0  D.S /. Suppose that whenever s0 C a 2 D.S / for some s0 2 S0 and a 2 D.S /, then s0 D 0. Then RecS .S0 ; I0 / D RatS .S0 ; I0 / is the least partial Conway subsemiring of S which contains S0 [ I0 .

The case when the partial Conway semiring is a semiring of series deserves special attention. Let S be a semiring and M a finitely factorisable partial monoid. Consider the partial Conway semiring S hhM ii. Recall that the star operation is defined on the proper series and that S can be identified with a subsemiring of S hhM ii. Let I0 be a subset of the set of irreducible elements of M .

Corollary 3.6. Under the previous assumptions, RecShhM ii .S; I0 / D RatShhM ii .S; I0 /

is the least partial Conway subsemiring of S hhM ii which contains S [ I0 . In particular, when M D .A; D/ and I0 is a subset of the irreducible elements of M , then RecShh.A;D/ii .S; I0 / D RatShh.A;D/ ii .S; I0 /. When S is a semiring and A is any set, we denote RatShhA ii .S; A/ by S rat hhA ii and RecShhA ii .S; A/ by S rec hhA ii. We adopt similar notation for RatShhc.A /ii .S; A/ and RecShhc.A /ii .S; A/.

Corollary 3.7. Suppose that S is a semiring and A is a set. Then S rat hhA ii is the least partial Conway subsemiring of S hhA ii containing S [ A. Moreover, S rat hhA ii is a symmetric partial iterative semiring with star operation defined on the proper rational series and S rat hhA ii D S rec hhA ii.

Recall from Corollary 2.7 that when S is a Conway semiring, then so is S hhA ii, for any set A.

Corollary 3.8. Let S be a Conway semiring. Then S rat hhA ii is the least Conway subsemiring of S hhA ii containing S [ A. Moreover, S rat hhA ii D S rec hhA ii.

Similar facts hold for c.A /. Corollary 3.7 is an instantiation of Corollary 3.6. Further results that can be derived from Corollary 3.7 include the Kleene theorems for timed automata of [25, 39] and [48], Part I. Recall that the Conway semirings BhhA ii and P .A / are isomorphic. Similarly, Bhhc.A /ii and P .c.A // are also isomorphic. A language in A (or commutative language in c.A /) is called regular if its characteristic series is rational. Regular languages in A form a Conway semiring that we denote by RegA . Similarly, commutative regular languages in c.A / form a Conway semiring CReg A .

20. Equational theories for automata

741

4. Iteration semirings Conway semirings provide an equational framework for Kleene’s theorem. The surprising strength of the Conway semiring (or Conway theory [17]) identities can be demonstrated in various different contexts including Hoare logic [15] and [79] and Parikh’s theorem [2] and [67]. However, the Conway semiring identities do not account for certain important constructions on automata such as the power set construction or minimisation, or the Krohn–Rhodes decomposition. In order to be able to carry out these constructions within equational logic and to obtain a complete description of the valid identities between rational expressions (i.e., terms in the language of  -semirings) for regular languages, Conway [30] associated an identity with every finite group. Many important (partial) Conway semirings including complete and continuous semirings, or more generally, the inductive  -semirings discussed below, or the partial iterative semirings defined in the previous section satisfy Conway’s group identities. When a (partial) Conway semiring satisfies the group identities, it will be called a (partial) iteration semiring. Suppose that G is a finite group of order n. Without loss of generality we may assume that the elements of G are the integers in Œn; with each integer i 2 Œn we associate the variable xi . Let  denote the product operation of G . The structure of G can be fully described by an n  n matrix MG whose .i; j /th entry is the variable xi 1 j , for all i; j 2 Œn. Each row and each column of MG is a permutation of the first row. Let us define MG by the matrix star formula (3). Note that each entry of MG is a rational expression in the variables x1 ; : : : ; xn . Let r1 ; : : : ; rn denote the expressions appearing in the first row of MG . It can be seen that, modulo the Conway identities, each row and each column of MG is a permutation of r1 ; : : : ; rn . The group identity associated with the finite group G is r1 C    C rn D .x1 C    C xn / : (5) Using matrix notation, the group identity associated with G can be written as e1 MG un D .x1 C    C xn / ;

where e1 is a n-dimensional row vector whose first component is 1 and whose other components are 0, and un is an n-dimensional column vector whose components are all 1. For example, the group identity associated with a group of order 2 is     x y 1 .1 0/ D .x C y/ y x 1

which by the matrix star formula can be written as

.x C yx  y/ .1 C yx  / D .x C y/ :

(6)

Definition 4.1. An iteration semiring 1 is a Conway semiring satisfying all group identities. A partial iteration semiring is a partial Conway semiring S which satisfies all group identities (5) when each xi is interpreted as an element of D.S /. A morphism of (partial) iteration semirings is a (partial) Conway semiring morphism. 1 The term iteration semiring with a different axiomatisation was first used in [18].

742

Zoltán Ésik

Example 4.1. In Conway semirings, the group identity (6) associated with the cyclic group of order 2 is equivalent to the identity .x 2 / .1 C x/ D x  :

(7)

Indeed, (6) reduces to (7) by substituting 0 for x and x for y . On the other hand, if (7) holds in a Conway semiring, then .x C yx  y/ .1 C yx  / D .x  yx  y/ x  .1 C yx  / D x  .yx  /2 .1 C yx  / D x  .yx  / D .x C y/ proving (6).

An idempotent or commutative (partial) iteration semiring is a (partial) iteration semiring which is idempotent or commutative. An iteration semiring is called ! -idempotent if it is an ! -idempotent Conway semiring. We let IS and OIS respectively denote the variety of all iteration semirings and ! -idempotent iteration semirings. All of the examples of Conway semirings mentioned in § 2 are in fact iteration semirings. Theorem 4.1 ([30]). In ! -idempotent Conway semirings, the group identity associated with a cyclic group of order n is equivalent to the nth power identity: .x n / .1 C x C    C x n

1

/ D x :

We have seen that partial iterative semirings are partial Conway semirings. Actually these semirings are all partial iteration semirings. Theorem 4.2 ([21]). Each partial iterative semiring .S; I / is a partial iteration semiring. Using this result and Theorem 2.2, we have from [18]: Theorem 4.3. Suppose that S is a partial iterative semiring with distinguished ideal I and S0 is a subsemiring of S which is an iteration semiring. Suppose that S is the direct sum of S0 and I . Then there is a unique way of turning S into an iteration semiring whose star operation extends the one defined on S0 . The group identities seem to be extremely difficult to verify in practice. However, they are implied by the simpler “functorial star” condition [17] which is often easier to establish. Proposition 4.4. Suppose that S is a partial Conway semiring. Suppose that for all matrices M 2 D.S /nn and x 2 D.S / the implication M un D un x H) M  un D un x 

(8)

holds. Then S is a partial iteration semiring.

Indeed, when G is a finite group of order n, each row of MG is a permutation of x1 ; : : : ; xn . Thus, MG has constant row sum x D x1 C  Cxn . If the condition involved in Proposition 4.4 holds, then each row sum of MG is x  . Proposition 4.4 describes an infinitely axiomatised quasi-variety (see [35] and [114]) contained in IS. A finitely based quasi-variety of this sort is the one axiomatised by the Conway identities and a quasi-identity in 6 variables introduced in [61]: .x1 C x2 / z D .y1 C y2 / u H) .x1 C x2 / z D .x1 C x2 y2 y1 / .z C x2 y2 u/: (9)

20. Equational theories for automata

743

The fact that the Conway identities together with this quasi-identity imply the functoriality condition of Proposition 4.4 has been shown in [17]. It is not known whether or not there exists an identity which is true of all Conway semirings satisfying (8) or (9) but does not hold in iteration semirings. However, in the ! -idempotent case, there is no such identity. Problem 2. Is the equational theory of iteration semirings decidable? Describe the structure of the free iteration semirings. The structure of the initial iteration semiring is known, see [17]. 4.1. Matrix semirings. The most important fact about matrix semirings over (partial) iteration semirings is that they also form (partial) iteration semirings. Consequently, whenever an identity holds in IS, then the “matrix version” of that identity also holds. The following fact was proved for OIS in [87]. For IS and for partial iteration semirings, the theorem follows from a more general result on iteration theories [42]. Theorem 4.5. Suppose that S is a partial iteration semiring with D.S / D I . If we extend the star operation to matrices in I nn by the matrix star formula (3), then S nn becomes a partial iteration semiring with D.S nn / D I nn . 4.2. Formal series. We have seen in § 2 that when S is any semiring and M is a finitely factorisable partial monoid, then S hhM ii is a (symmetric) partial iterative semiring and thus a partial Conway semiring, where the star operation is defined on the proper series. From Theorems 4.2 and 4.3 we obtain the following: Corollary 4.6. If S is a semiring and M is a finitely factorisable partial monoid, then S hhM ii is a partial iteration semiring in a unique way. Moreover, for any set A, S rat hhA ii and S rat hhc.A /ii with distinguished ideal formed by the proper rational series are partial iteration semirings. Corollary 4.7. If S is an iteration semiring and M is a finitely factorisable partial monoid, then there is unique way to extend the star operation to the semiring S hhM ii such that S hhM ii becomes an iteration semiring. Thus, when S is an iteration semiring, then for any set A, S rat hhA ii and S rat hhc.A /ii are iteration semirings.  rat  rat  Thus, Nrat 1 hhA ii, N1 hhc.A /ii, RegA , CRegA are iteration semirings, and N hhA ii is a partial iteration semiring.

4.3. Duality. Recall that the dual S d of a partial Conway semiring S is a partial Conway semiring, where all operations are the same as in S except for multiplication which is the reverse of the multiplication operation of S . The very same fact holds for (partial) iteration semirings since the “dual” of a group identity is a group identity. See [21] and [47]. In the next two sections, we will discuss complete and continuous semirings which give rise to iteration semirings.

Zoltán Ésik

744

5. Complete semirings DefinitionP 5.1. A semiring S is complete [40] if it is equipped with a summation operation i 2I xi that maps families xi , i 2 I of elements of S into S , where I is any set. The summation operation interacts with the semiring operations:

XX

i 2I j 2Ji

xj D

k2

X

S

i 2I

X

i 2Œn

xk ; Ji

xi D x1 C    C xn ;

X i 2I

 X xi y; xi y D i 2I

(10)

n > 0; y

X i 2I

 X yxi xi D

(11)

i 2I

where we assume that the sets Ji , i 2 I are pairwise disjoint. A morphism of complete semirings is a semiring morphism which preserves summation. In a countably complete semiring, summation is restricted to countable families. Morphisms of countably complete semirings preserve countable sums. Clearly, every complete semiring is countably complete, but the converse fails in general. For more on complete and countably complete semirings, we refer to [58], [59], [60], [66], [83], and [88]. It was shown in [60] that in a complete semiring one may always restrict summation to families indexed by sets bounded by a cardinal that depends on the cardinality of the semiring. For some generalisations of the notion of (countably) complete semirings refer to [46] and [90]. A completely idempotent semiring is a complete semiring S such that for any nonempty P family xi 2 S , i 2 I and for any x 2 S , if xi D x for all i 2 I then i 2I xi D x . Countably idempotent semirings are defined similarly. We note that completely idempotent semirings are the same as the S-algebras of [30], or Blikle nets [13], or (unital) quantales [106]. In the next section, will see that they are the same as the idempotent continuous semirings. Examples P of (countably) complete semirings include N1 and B. In N1 , a (countable) sum i 2I xi is 1 ifPand only if at least one xi is 1, or the set of all i such that xi ¤ 0 is infinite. In B, i 2I xi D 0 if and only if xi D 0 for all i 2 I . The semiring B is completely (and countably) idempotent. More generally, every completely distributive complete lattice is a completely idempotent semiring. Further examples of completely idempotent semirings include P .M / and RelA , where M is a monoid and A is a set and summation is given by set union. The countable subsets of M determine a countably idempotent subsemiring of P .M /. Each complete semiring may be turned into a  -semiring by defining P countably n x D n>0 x . We call this operation the canonical star operation. It is clear that morphisms of countably complete semirings preserve the canonical star operation. The following result is from [20]. 

20. Equational theories for automata

745

Theorem 5.1. Each countably complete semiring is an iteration semiring and satisfies the following identities: 1 x D x1 ;

(12)



 

(13)





(14)

.1 C x/ D 1 x ; 1 D1 :

Each countably idempotent semiring satisfies 1 D 1. The fact that every countably complete semiring is a Conway semiring was proved in [66]. The same fact for completely idempotent semirings was already established in [30]. The group identities hold since (8) does; see [17] and below. We call the iteration semirings resulting from (countably) complete semirings (countably) complete iteration semirings. If S is a (countably) complete semiring, then equipped with the pointwise summation operation, so is each matrix semiring S nn . (In fact, for every (countable) set I , one can turn S I I P into a (countably) complete semiring.) So S nn comes with a  star operation M D k>0 M k : But there is another star operation on S nn , which is obtained from the star operation on S using the matrix formula (3). Luckily, the two star operations are the same [66]. We also note that any countably complete semiring S satisfies XZ D ZY H) X  Z D ZY  for all X 2 S mm , Y 2 S nn and Z 2 S mn . Thus (8) holds. Call a partial monoid M countably factorisable if each m 2 M can be written only a countable number of different ways as a finite product of elements different from 1. Let S be a countably complete semiring. Due to the fact that S has countable sums, S hhM ii can be turned into a semiring. Defining summation pointwise, S hhM ii becomes a countably complete semiring and is equipped with the canonical star operation. Another star operation may be obtained using Corollary 2.7. Again, the two star operations are the same, since both give rise to iteration semirings. When S is complete and M is any partial monoid, S hhM ii is also a complete semiring. We also note that the dual of a (countably) complete semiring is (countably) complete with the same summation. The structure of the free countably complete semirings can be described using formal series. Theorem 5.2. For each set A, the countably complete subsemiring of N1 hhA ii (BhhA ii) determined by all series of countable support is freely generated by A in the class of countably complete (countably idempotent) semirings. Moreover, for any set A, BhhA ii (or P .A /) is a free completely idempotent semiring on A. Note that when A is countable, the above subsemiring is just N1 hhA ii (BhhA ii). In particular, N1 is an initial countably complete semiring and B is both an initial countably idempotent and an initial completely idempotent semiring. In contrast, for cardinality reasons, there exists no initial complete semiring; see [60]. Similar results apply to the semirings N1 hhc.A /ii and Bhhc.A /ii.


6. Continuous semirings

An ordered semiring is a semiring S equipped with a partial order $\le$ which is preserved by the operations. (A more general definition is given in [58].) An ordered semiring S is positively ordered if $0 \le x$ for all $x \in S$. Clearly, every positively ordered semiring is zero-sum free. A morphism of (positively) ordered semirings also preserves the partial order. If S is a semiring, we can always define a preorder $\le_s$ on S by $x \le_s y$ if and only if there exists some z with $x + z = y$. This relation $\le_s$ is preserved by both sum and multiplication. When $\le_s$ is a partial order, we call it the sum order (or difference order or natural order). Any sum ordered semiring is positively ordered. Also, if S is positively ordered by any partial order $\le$, then $\le_s$ is also a partial order and $x \le y$ whenever $x \le_s y$. Thus, if S is positively ordered, its order relation extends the sum order. In the important special case when S is idempotent, the relation $\le_s$ is always a partial order and agrees with the semilattice order: $x \le_s y$ if and only if $x + y = y$. Moreover, it is the only partial order turning S into a positively ordered semiring.

Definition 6.1. A continuous semiring is a positively ordered semiring S such that each nonempty chain $C \subseteq S$ (see [35]) has a supremum $\sup C$, and moreover, the sum and multiplication operations are continuous in the sense that they preserve the supremum of nonempty chains in both arguments. In an ω-continuous semiring, only ω-chains are required to have suprema. A morphism of (ω-)continuous semirings preserves the supremum of (ω-)chains. An idempotent (ω-)continuous semiring is an (ω-)continuous semiring which is idempotent.

It is known that in an (ω-)continuous semiring, the supremum of any (countable) directed set exists; cf. [29] and [91]. Every continuous semiring is ω-continuous, but the converse does not hold in general. Our notion of continuous semiring agrees with the usual notion of a continuous algebra in semantics [63]. In several papers, continuous semirings are sum ordered; see e.g. [84] and [107]. It follows from general results on continuous algebras that every positively ordered semiring can be embedded in a continuous semiring; see [14]. For a recent comparison of various notions of continuity for monoids and semirings we refer to [72]. Fixed point computations over continuous semirings have been considered lately, e.g., in [54] and [55].

In any (ω-)continuous semiring, we define the sum of any (countable) family of elements $x_i$, $i \in I$, as follows:
    $\sum_{i \in I} x_i = \sup_{F \subseteq I} \sum_{j \in F} x_j$,
where F ranges over the set of all finite subsets of I. The following fact is well known:

Proposition 6.1. Equipped with the above summation, each (ω-)continuous semiring S is a (countably) complete semiring. Moreover, every idempotent continuous semiring is a completely idempotent semiring, and every idempotent ω-continuous semiring is a countably idempotent semiring.


Actually, every completely idempotent semiring, equipped with the semilattice order, is an idempotent continuous semiring. A similar fact holds for countably idempotent ω-continuous semirings. By Theorem 5.1 and Proposition 6.1, any (ω-)continuous semiring S gives rise to an iteration semiring with star operation
    $x^* = \sum_{n \ge 0} x^n = \sup_n \sum_{i=0}^{n} x^i$
that we call an (ω-)continuous iteration semiring.

Continuous and ω-continuous semirings are also closed under the constructions of matrix semirings, formal series semirings, and dual semirings. We give some details below. Suppose that S is an (ω-)continuous semiring. Then, equipped with the pointwise partial order, each semiring $S^{n \times n}$ is also (ω-)continuous. An important fact is that for each $M \in S^{n \times n}$ and $Y \in S^{n \times k}$, $M^* Y$ is the least pre-fixed point [95] of the map $S^{n \times k} \to S^{n \times k}$ given by $X \mapsto MX + Y$. Thus $M M^* Y + Y \le M^* Y$, and for all $X \in S^{n \times k}$, if $MX + Y \le X$ then $M^* Y \le X$. It follows that $M^* Y$ is not only a pre-fixed point but also a fixed point, i.e., $M M^* Y + Y = M^* Y$. These facts can be seen directly and also follow from the Bekić–de Bakker–Scott rule [9], [36], and are discussed, e.g., in [17]. If S is ω-continuous and M is countably factorisable then, equipped with the pointwise order, $S\langle\langle M \rangle\rangle$ is ω-continuous. A similar fact is true for continuous semirings without any assumption on M. Moreover, $S^d$, the dual of S, is (ω-)continuous, with the same star operation. The structure of the free continuous semirings is known (see, e.g., [60]):

Theorem 6.2. For any set A, the semiring $\mathbb{N}_\infty\langle\langle A^* \rangle\rangle$, equipped with the pointwise order (or sum order), is freely generated by A in the class of continuous semirings. Similarly, $\mathbb{N}_\infty\langle\langle c(A^*) \rangle\rangle$ is a free commutative continuous semiring on A, and $P(A^*)$ ($P(c(A^*))$, resp.) is a free idempotent (commutative and idempotent, resp.) continuous semiring on A. Considering only series with a countable support, a similar result holds for ω-continuous semirings. The subsemiring formed by the countable sets in $P(A^*)$ (resp., $P(c(A^*))$) is a free idempotent (resp., commutative and idempotent) ω-continuous semiring on A.
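The least pre-fixed point characterisation of $M^* Y$ can be observed directly in the continuous semiring of languages. The sketch below is only an illustration under simplifying assumptions: it takes $n = k = 1$ (so M and Y are single languages), truncates all languages to words of a chosen length bound so that the iteration of $X \mapsto MX + Y$ from $\varnothing$ stabilises, and uses ad hoc names (`concat`, `least_solution`, `bound`) that are not from the chapter.

```python
def concat(x, y, bound):
    """Concatenation of languages, truncated to words of length <= bound."""
    return {u + v for u in x for v in y if len(u) + len(v) <= bound}

def least_solution(m, y, bound, steps=50):
    """Iterate X -> M.X + Y from the empty language.

    Over the continuous semiring of languages the supremum of this
    increasing chain is M*Y; truncating at `bound` makes the chain
    stabilise after finitely many steps."""
    x = set()
    for _ in range(steps):
        nxt = concat(m, x, bound) | y
        if nxt == x:
            break
        x = nxt
    return x

M = {"ab"}          # a single "coefficient" language
Y = {"", "c"}       # the constant term
print(sorted(least_solution(M, Y, bound=6)))
# the words of (ab)*{epsilon, c} of length at most 6
```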

7. Completeness

Iteration semirings were introduced with the aim of obtaining complete systems of identities for regular languages. Conway conjectured that the identities of ω-idempotent iteration semirings are complete for regular languages. This conjecture was confirmed in [87]: the semirings of regular languages can be characterised as the free iteration semirings in OIS. More recently, this result has been extended to rational power series with coefficients in $\mathbb{N}$ or $\mathbb{N}_\infty$.


In this section, we will give a brief account of these results and some related ones. Recall that for any semiring S and set A, $S\langle\langle A^* \rangle\rangle$ and $S^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$ are (symmetric) partial iterative semirings and hence partial iteration semirings. The star operation is defined on the ideal of proper (rational) series.

Theorem 7.1 ([20]). For each set A, the semiring $\mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$ is a free partial iteration semiring on A. This means that for any partial iteration semiring S and function $h\colon A \to S$ with $Ah \subseteq D(S)$ there exists a unique partial iteration semiring morphism $h^{\sharp}\colon \mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle \to S$ extending h.

We give a brief outline of the proof. Let $h_0$ denote the unique semiring morphism $\mathbb{N} \to S$, which exists since $\mathbb{N}$ is an initial semiring. When s is in $\mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$, there is an automaton $A = (\alpha, M, \beta)$ over $(\mathbb{N}, A)$ in $\mathbb{N}\langle\langle A^* \rangle\rangle$ whose behaviour is s. The image of A under h is an automaton $Ah$ over $(\mathbb{N}h_0, Ah)$ in S: $Ah = (\alpha h_0, Mh, \beta h_0)$. We are forced to define $s h^{\sharp}$ as the behaviour of $Ah$. Following the proof of Theorem 3.2, it is not difficult to verify that if $h^{\sharp}$ is a function then it preserves the rational operations as well as the constants. The fact that $h^{\sharp}$ is a function is established using a recent result in [7] and [8]: if two automata A and A' are equivalent (i.e., they have the same behaviour), then they can be connected with a chain of functional simulations and dual functional simulations [17] (coverings and co-coverings in the terminology of [7]). This implies that the family of quasi-identities (8) and their "duals" establish that $Ah$ and $A'h$ are also equivalent. Due to the special structure of $\mathbb{N}$ (being "atomistic" in the terminology of [16]), and to some results in [42], the group identities suffice instead of (8) to establish that $Ah$ and $A'h$ are equivalent. Several extensions of Theorem 7.1 are discussed in [20] and [52].

Since $\mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$ is a symmetric partial iterative semiring and hence a partial iteration semiring, and since any morphism of partial iterative semirings preserves the (partially defined) star operation, we obtain as an immediate consequence of Theorem 7.1 the following result:

Corollary 7.2. For each set A, the semiring $\mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$ is both a free partial iterative semiring and a free symmetric partial iterative semiring on A.

We now turn to rational power series with coefficients in the iteration semiring $\mathbb{N}_\infty$. In this case, for each set A, $\mathbb{N}_\infty\langle\langle A^* \rangle\rangle$ is a continuous iteration semiring containing $\mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$ as a subsemiring. In [20] and [111], it is shown that $\mathbb{N}^{\mathrm{rat}}_\infty\langle\langle A^* \rangle\rangle$ is a Fatou extension of $\mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$, so that $\mathbb{N}\langle\langle A^* \rangle\rangle \cap \mathbb{N}^{\mathrm{rat}}_\infty\langle\langle A^* \rangle\rangle = \mathbb{N}^{\mathrm{rat}}\langle\langle A^* \rangle\rangle$. We have already characterised $\mathbb{N}_\infty\langle\langle A^* \rangle\rangle$ as a free continuous semiring on A. When A is countable, this semiring is also the free countably complete semiring on A.

Theorem 7.3 ([20]). For each set A, the semiring $\mathbb{N}^{\mathrm{rat}}_\infty\langle\langle A^* \rangle\rangle$ is the free iteration semiring on A satisfying the identities (12)–(14).


Corollary 7.4. The following are equivalent for rational expressions $t, t'$ in the variables $x_1, \dots, x_n$:
i. $t$ and $t'$ denote the same rational power series in $\mathbb{N}^{\mathrm{rat}}_\infty\langle\langle \{x_1, \dots, x_n\}^* \rangle\rangle$;
ii. the identity $t = t'$ holds in all semirings $\mathbb{N}^{\mathrm{rat}}_\infty\langle\langle A^* \rangle\rangle$, where A is any set;
iii. the identity $t = t'$ holds in all (ω-)continuous iteration semirings;
iv. $t = t'$ holds in all (countably) complete iteration semirings;
v. $t = t'$ holds in all iteration semirings satisfying (12)–(14).

A closely related result was earlier proved in [87]:

Theorem 7.5. For each alphabet A, the iteration semiring $\mathrm{Reg}_A$ is freely generated by A in the variety OIS.

Theorem 7.5 follows from Theorem 7.3 by observing that $\mathrm{Reg}_A$ is isomorphic to the least ω-idempotent quotient of $\mathbb{N}^{\mathrm{rat}}_\infty\langle\langle A^* \rangle\rangle$. With a more complicated axiomatisation of iteration semirings, the same result was obtained independently in [16]. The following result, which corresponds to Corollary 7.2, is an algebraic version of Salomaa's axiomatisation [109] of regular languages. For extensions to fields, commutative rings, and more generally "Noetherian semirings," see [12], [52], [86], and [93].

Corollary 7.6. For any set A, $\mathrm{Reg}_A$, equipped with the distinguished ideal of all regular languages not containing the empty word, is the free (symmetric) partial iterative semiring on A.

Corollary 7.7. The following are equivalent for rational expressions $t, t'$ in the variables $x_1, \dots, x_n$:
i. $t$ and $t'$ denote the same regular language in $\mathrm{Reg}_A$, where $A = \{x_1, \dots, x_n\}$;
ii. the identity $t = t'$ holds in all semirings $\mathrm{Reg}_A$, where A is any set;
iii. the identity $t = t'$ holds in all (ω-)continuous idempotent iteration semirings;
iv. $t = t'$ holds in all iteration semirings $\mathrm{Rel}_A$ of relations;
v. $t = t'$ holds in OIS.

The equivalence of the second and fourth conditions was shown, for example, in [34] and [41].

Next we turn to commutative regular languages. Recall that a language $L \subseteq c(A^*)$ is regular if its characteristic series is in $\mathbb{B}^{\mathrm{rat}}\langle\langle c(A^*) \rangle\rangle$. An important result [104], also see [30] and [110], describes a complete infinite set of identities for commutative regular languages. In this case, instead of the group identities, the power identities suffice. An alternative formulation of this completeness result is:

Theorem 7.8. For each set A, $\mathrm{CReg}_A$ is freely generated by A in the variety $\mathrm{OIS}_c$ of all ∗-semirings axiomatised by the "classical identities," i.e., the Conway identities and the power identities, the identity $1^* = 1$, and the identities
    $xy = yx$,    (15)
    $(x + y)^* = (xy)^*(x^* + y^*)$.    (16)
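Identity (16) can be tested experimentally in a concrete commutative model, namely the idempotent semiring of subsets of $\mathbb{N}$ (the one-letter case written additively), in which sum is union, product is element-wise addition and $x^*$ collects all finite sums of elements of x. The sketch below, with an arbitrary truncation bound and hypothetical helper names of my own, merely checks $(x + y)^* = (xy)^*(x^* + y^*)$ on random instances; it is an illustration, not a substitute for the completeness results cited above.

```python
import random

BOUND = 40  # work with subsets of {0, ..., BOUND}

def mult(x, y):
    """Element-wise sums, truncated: the product of one-letter languages."""
    return {a + b for a in x for b in y if a + b <= BOUND}

def star(x):
    """All finite sums of elements of x (including the empty sum 0), truncated."""
    result, frontier = {0}, {0}
    while frontier:
        frontier = mult(frontier, x) - result
        result |= frontier
    return result

for _ in range(100):
    x = set(random.sample(range(1, 10), 3))
    y = set(random.sample(range(1, 10), 3))
    lhs = star(x | y)
    rhs = mult(star(mult(x, y)), star(x) | star(y))
    assert lhs == rhs, (x, y)
print("identity (16) holds on all sampled instances (up to the bound)")
```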


Corollary 7.9. The following conditions are equivalent for rational expressions $t, t'$ in the variables $x_1, \dots, x_n$:
i. $t$ and $t'$ evaluate to the same commutative regular language in $\mathrm{CReg}_A$, where $A = \{x_1, \dots, x_n\}$;
ii. $t = t'$ holds in all semirings $\mathrm{CReg}_A$, where A is any set;
iii. $t = t'$ holds in all commutative idempotent (ω-)continuous iteration semirings;
iv. $t = t'$ holds in all commutative ω-idempotent iteration semirings satisfying (16);
v. $t = t'$ holds in all ∗-semirings satisfying the classical identities, the identity $1^* = 1$ and the identities (15) and (16).

Early results in [104] and [105] show that there is no finite system of identities of regular languages, or commutative regular languages, that would prove all identities of regular languages or commutative regular languages in one variable. See also [30] and [110]. Thus, the varieties OIS and $\mathrm{OIS}_c$ are non-finitely based. It is shown in [32] that $\mathrm{OIS}_c$ is the variety generated by the iteration semiring of regular languages over a one-letter alphabet, so that there is no complete system of identities of one-letter regular languages either. In [3], this result is obtained in a direct way, similar to the model construction methods of [30]. In [30], it is proved that each complete system of identities for OIS must contain an infinite number of identities in at least two variables. In [33], it was proved that there is no finite system of identities valid in OIS that would prove all valid identities of OIS not involving the sum operation. Since OIS is defined by a single identity relative to IS, similar facts hold for IS.

The group identities are not all independent in Conway semirings. Let $\mathcal{G}$ denote a set of finite groups. It is known that the ω-idempotent Conway semiring identities and the collection of identities associated with the groups in $\mathcal{G}$ form a complete set of identities for OIS if and only if for each finite simple group G there is a group $H \in \mathcal{G}$ such that G divides H, i.e., G is a quotient of a subgroup of H. A similar fact holds for IS. See [30], [42], and [87]. This gives yet another proof of the fact that OIS and IS are not finitely based. Moreover, since every finite group can be embedded in a finite simple group, no essential simplification can be obtained by considering subsets of the group identities.

Problem 3. Does there exist a minimal complete set of identities for IS? Does there exist an integer k such that the valid identities of IS in at most k variables form a complete set?

One can ask the same questions for OIS. Regarding $\mathrm{OIS}_c$, there is a minimal complete set of identities containing only those power identities that correspond to prime numbers.

Problem 4. Find a complete set of identities of commutative (ω-)continuous semirings.


Again, there is no finite complete set of identities. The decision problem for the equational theory of OIS is equivalent to the equivalence problem of regular expressions and is thus PSPACE-complete; see [113]. The equational theory of $\mathrm{OIS}_c$ is also decidable and is reducible by Corollary 7.9 to the equality problem of semilinear sets. For complexity considerations, see [68] and [69].

8. Inductive ∗-semirings and Kleene algebras

In continuous semirings, the star operation provides least (pre-)fixed point solutions to linear equations. The same fact holds for the semirings of regular languages. Kozen [76] and Krob [87] have shown that in idempotent semirings, this property alone is sound and complete for the equational theory of regular languages and whence for the equational theory of idempotent continuous semirings. This completeness result has recently been extended to all continuous semirings in [20].

Definition 8.1. A right-handed inductive ∗-semiring² [47] is a ∗-semiring which is also an ordered semiring and satisfies the identity
    $x x^* + 1 \le x^*$    (17)
and the right-handed fixed point induction rule
    $ax + b \le x \Longrightarrow a^* b \le x$.    (18)
A right-handed Kleene algebra (RKA, see [76]) is an idempotent right-handed inductive ∗-semiring. A morphism of right-handed inductive ∗-semirings or Kleene algebras is an order preserving ∗-semiring morphism.

Any right-handed inductive ∗-semiring S is positively ordered. Indeed, since $1x + 0 = x$, we have that $0 = 1^* 0 \le x$ for all $x \in S$. Thus, the partial order of a right-handed Kleene algebra is the semilattice order, and any ∗-semiring morphism between right-handed Kleene algebras automatically preserves the order. We also note that the star operation is nondecreasing in all right-handed inductive ∗-semirings. Indeed, if $x \le y$ in a right-handed inductive ∗-semiring, then $x y^* + 1 \le y y^* + 1 \le y^*$ and thus $x^* \le y^*$. Left-handed inductive ∗-semirings and left-handed Kleene algebras are defined dually. It was shown in [75] that right-handed Kleene algebras are not necessarily left-handed. For a refinement of this fact, see [101].

Definition 8.2. A symmetric inductive ∗-semiring [47] is both a right-handed inductive ∗-semiring and a left-handed inductive ∗-semiring. A Kleene algebra (KA, see [76]) is an idempotent symmetric inductive ∗-semiring. A morphism of these structures is an ordered semiring morphism preserving star.

Theorem 8.1 ([47]). Any right-handed inductive ∗-semiring is an iteration semiring.

² Called just inductive ∗-semiring in [47].


Since the dual of an iteration semiring is also an iteration semiring, the same result holds for left-handed inductive ∗-semirings. We note that if a right-handed inductive ∗-semiring satisfies the left-handed fixed point induction rule
    $xa + b \le x \Longrightarrow b a^* \le x$,    (19)
then it is a symmetric inductive ∗-semiring.
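A small numerical illustration of Definition 8.1 (a toy check of my own, not taken from the chapter): the semiring $([0, \infty], \min, +)$, with $\infty$ as the additive unit and $0$ as the multiplicative unit, is a commutative idempotent continuous semiring in which $x^* = 0$ for every x. Note that its semilattice order is the reverse of the usual order on numbers. The sketch verifies (17), (18) and (19) on random samples; all names are ad hoc.

```python
import random

INF = float("inf")

# The (min, +) semiring on [0, oo]: "sum" is min, "product" is +,
# the additive unit is oo, the multiplicative unit is 0.
plus = min
def times(u, v): return u + v
def star(_u): return 0.0          # x* = min(0, x, x+x, ...) = 0 for x >= 0

def leq(u, v):
    """Semilattice order of the idempotent semiring: u <= v iff u + v = v."""
    return plus(u, v) == v

samples = [random.uniform(0, 10) for _ in range(200)] + [0.0, INF]

# identity (17): x x* + 1 <= x*  (here "1" is the multiplicative unit 0.0)
assert all(leq(plus(times(x, star(x)), 0.0), star(x)) for x in samples)

# induction rules (18) and (19): if ax + b <= x then a*b <= x (and dually)
for _ in range(1000):
    a, b, x = (random.uniform(0, 10) for _ in range(3))
    if leq(plus(times(a, x), b), x):
        assert leq(times(star(a), b), x)   # (18)
        assert leq(times(b, star(a)), x)   # (19), by commutativity
print("(17), (18), (19) hold on all samples")
```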

Theorem 8.2 ([20]). The semiring $\mathbb{N}^{\mathrm{rat}}_\infty\langle\langle A^* \rangle\rangle$, equipped with the sum order, is both the free right-handed inductive ∗-semiring generated by A which additionally satisfies the identity $x 1^* \le 1^* x$, and the free symmetric inductive ∗-semiring on A.

We note that the sum order on $\mathbb{N}^{\mathrm{rat}}_\infty\langle\langle A^* \rangle\rangle$ is not the same as the pointwise order; see [20].

Theorem 8.3 ([76] and [87]³). The iteration semiring $\mathrm{Reg}_A$, equipped with the inclusion order, is both a free right-handed Kleene algebra and a free Kleene algebra on A.

Corollary 8.4. An identity between rational expressions holds in the variety of iteration semirings satisfying (12)–(14) if and only if it holds in all right-handed inductive ∗-semirings satisfying $x 1^* \le 1^* x$ if and only if it holds in all symmetric inductive ∗-semirings. An identity between rational terms holds in OIS if and only if it holds in RKA, or in KA.

(In [76], only the second result concerning KA is proved.) Generalisations of Theorem 8.3 are given in [45] and [53].

Theorem 8.5. The iteration semiring $\mathrm{CReg}_A$ is a free commutative Kleene algebra on A.

Indeed, $\mathrm{CReg}_A$ embeds in the commutative idempotent continuous iteration semiring $P(c(A^*))$, which is in KA. Thus, by Theorem 8.3 and Theorem 7.8, it remains to show that (16) holds in all commutative Kleene algebras. Now it is clear that
    $(xy)^*(x^* + y^*) \le (x + y)^*$
holds. The reverse inequality follows by fixed point induction from
    $(x + y)(xy)^*(x^* + y^*) + 1 \le (xy)^*(x^* + y^*)$,
which can be established in commutative Kleene algebras by a routine calculation.

Corollary 8.6. An identity between rational expressions holds in $\mathrm{OIS}_c$ if and only if it holds in all commutative Kleene algebras.

Although the equational theory of KA is the same as the equational theory of idempotent continuous iteration semirings, the quasi-equational theory of KA is strictly contained in the quasi-equational theory of idempotent continuous iteration semirings.

³ In [87], this result appears in a slightly different form.


For example, the quasi-identity (22) below holds in all idempotent continuous ∗-semirings but does not hold in KA, cf. [101]. The computational complexity of the quasi-equational theory of KA and the quasi-equational theory of idempotent continuous iteration semirings were studied in [80]. It was proved that the quasi-equational theory of KA is $\Sigma^0_1$-complete, whereas the quasi-equational theory of idempotent continuous iteration semirings is $\Pi^1_1$-complete. See also [34]. Several syntactically restricted subclasses of the quasi-equational theory of KA have been studied in [80]. The quasi-equational theory of relational Kleene algebras is also $\Pi^1_1$-complete; see [64].

If S is a right-handed inductive ∗-semiring then so is each matrix semiring $S^{n \times n}$, for any $n \ge 0$. Moreover, for any set A, $S\langle\langle A^* \rangle\rangle$ is a right-handed inductive ∗-semiring. Similar facts hold for symmetric inductive ∗-semirings, and for RKA and KA. Power series semirings over inductive ∗-semirings are again inductive ∗-semirings. See [17], [47], and [76].

Kleene algebras have many applications in programming and programming logics. Here we briefly mention some important algebraic structures related to Kleene algebras that have arisen in connection with these applications. A Kleene algebra with tests (KAT) [78] and [81] is a Kleene algebra with an embedded Boolean algebra. The axioms of KAT are sound in several models including relational and language models. In [81], it is shown that the axioms of KAT are complete for the equational theory of relational models. In [79], it is shown that the soundness and relational completeness of Hoare logic can be reduced to KAT. (Actually one does not need the full strength of KAT for this purpose.) The relationship between KAT and Cook's relative completeness theorem [31] is analysed in [82]. Closely related results were proved earlier in [15] and [17], where the soundness and relative completeness of Hoare logic over flowchart algorithms was reduced to Conway theories. Kleene algebras with domain [37] are Kleene algebras extended with a domain and codomain operation subject to certain simple equational axioms. For Kleene lattices, see [70] and [77]. For action algebras, see § 9. The relationship between Kleene algebras and dynamic algebras [74], [99], and [102] is discussed in [65]. For Kleene algebras with converse see [10] and [22].

8.1. Transitive closure. RKA and KA are finitely axiomatised quasi-varieties of idempotent ∗-semirings capturing the equational theory of OIS and idempotent continuous iteration semirings. In this section, we exhibit yet another finitely axiomatised quasi-variety of this sort. Following [100] (see also [76]), we present a simplified version of the fixed point induction rule.

Proposition 8.7. In idempotent ∗-semirings, the right-handed fixed point induction rule is equivalent to the following one:
    $ax \le x \Longrightarrow a^* x \le x$.    (20)

Indeed, suppose that S is an idempotent ∗-semiring. If the right-handed fixed point induction rule holds and $ax \le x$, then $ax + x \le x$, so $a^* x \le x$. If (20) holds and $ax + b \le x$, then $ax \le x$ and $a^* x \le x$. Since also $b \le x$, we conclude that $a^* b \le a^* x \le x$. Of course, a similar fact holds for the left-handed fixed point induction rule.


Definition 8.3. Let S be an idempotent ∗-semiring and $x \in S$. We say that x is reflexive if $1 \le x$ and transitive if $xx \le x$. Let TCL denote the class of all idempotent ∗-semirings S such that for each $x \in S$, $x^*$ is the reflexive-transitive closure of x. Formally, an idempotent ∗-semiring S belongs to TCL if it satisfies the identity
    $1 + x + x^* x^* \le x^*$    (21)
and the quasi-identity
    $1 + x + yy \le y \Longrightarrow x^* \le y$.    (22)

Proposition 8.8. An idempotent ∗-semiring is in TCL if and only if it satisfies (21),
    $x^* \le (x + y)^*$,    (23)
and
    $x^2 \le x \Longrightarrow x^* \le 1 + x$.    (24)
Also, (24) may be replaced by (25):
    $x^2 \le x \Longrightarrow x^+ \le x$.    (25)

Note that (23) asserts that star is nondecreasing. Indeed, if (21) and (22) hold in an idempotent ∗-semiring then star is nondecreasing, and if $x^2 \le x$ then $1 + x + (1 + x)^2 \le 1 + x$ and thus $x^* \le 1 + x$. Assume now that (21), (23), and (24) hold. If $1 + x + yy \le y$, then $yy \le y$ and thus $y^* \le 1 + y$. Since also $1 \le y$ and $x \le y$, $x^* \le y^* \le 1 + y = y$.

Proposition 8.9. $\mathrm{RKA} \subseteq \mathrm{TCL} \subseteq \mathrm{OIS}$, where each inclusion is proper.

The first inclusion follows from the previous proposition; also see [100]. The second inclusion is shown in [24]. In [101], it is shown that the first inclusion is proper. The second inclusion is proper since TCL is finitely axiomatised but OIS is not; Conway's leap belongs to OIS but does not belong to TCL. Since the least variety of ∗-semirings containing RKA is OIS, we immediately have:

Corollary 8.10 ([24]). The variety generated by TCL is OIS. An equation holds in OIS if and only if it holds in TCL.

For related results, see also [23], [30], and [87]. In fact, TCL may be axiomatised by the idempotent semiring identities together with (23) and (24), or (23) and the quasi-identity $x^2 = x \Longrightarrow x^* = 1 + x$; see [19], [23], and [30].

Remark 8.11. The quasi-variety REL generated by the iteration semirings $\mathrm{Rel}_A$ is included in KA. Moreover, it is strictly included in KA, since REL has no finite axiomatisation [4].
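The ∗-semiring of binary relations on a finite set, with union, relational composition and reflexive-transitive closure, is a standard member of TCL, and the axioms just listed can be checked on it mechanically. The following sketch (an experimental sanity check with ad hoc names and a random sampling of relations, nothing more) tests the identities (21) and (23) and the quasi-identity (24).

```python
import itertools, random

N = 4  # relations on {0, ..., N-1}, represented as sets of pairs

ID = {(i, i) for i in range(N)}

def compose(r, s):
    return {(i, k) for (i, j) in r for (j2, k) in s if j == j2}

def star(r):
    """Reflexive-transitive closure: the union of all powers of r."""
    result, power = set(ID), set(ID)
    while True:
        power = compose(power, r)
        if power <= result:
            return result
        result |= power

def leq(r, s):          # the semilattice order is inclusion
    return r <= s

def random_rel():
    return {p for p in itertools.product(range(N), repeat=2) if random.random() < 0.3}

for _ in range(500):
    x, y = random_rel(), random_rel()
    assert leq(ID | x | compose(star(x), star(x)), star(x))       # (21)
    assert leq(star(x), star(x | y))                               # (23)
    if leq(compose(x, x), x):                                      # (24)
        assert leq(star(x), ID | x)
print("(21), (23), (24) hold on all sampled relations")
```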

9. Residuation

As we have already seen, OIS is a non-finitely based variety, i.e., there is no finite complete set of identities of regular languages. Surprisingly, by adding the residuation operations to the rational operations, it is possible to obtain a finitely based equational theory that proves all valid identities of regular languages in the rational operations, cf. [101].


Definition 9.1. An action algebra [101] $A = (A, +, \cdot, 0, 1, {}^*, \to, \leftarrow)$ consists of a commutative idempotent monoid $(A, +, 0)$ and a monoid $(A, \cdot, 1)$ equipped with the operations ${}^*$, $\to$ and $\leftarrow$ such that (21), (22), and the following axiom hold:
    $x \le (z \leftarrow y) \iff xy \le z \iff y \le (x \to z)$.    (26)

Since the last axiom breaks down to 4 quasi-identities, the class ACT of action algebras is a quasi-variety. The operations $\leftarrow$ and $\to$ are called left and right residuation (or residuals), respectively. Every action algebra is an idempotent semiring and is in TCL (more precisely, its reduct obtained by removing the residuation operations belongs to TCL). Thus, ACT is TCL with residuation. While TCL properly includes RKA and KA, TCL with residuation is the same as KA with residuation. Any action algebra is partially ordered by the semilattice order. The axiom (26) asserts that for any fixed x, the operation $x \to z$ is a right adjoint [6] to the operation $xy$, left multiplication with x, and symmetrically, $z \leftarrow x$ is right adjoint to $yx$, right multiplication with x.

Equipped with the semilattice order, every quantale Q is a complete lattice with $\sum_{i \in I} x_i = \sup_{i \in I} x_i$ for all families $x_i$, $i \in I$, of elements of Q. Moreover, multiplication preserves arbitrary suprema in both arguments. Each quantale Q can be turned into an action algebra in a unique way. We define
    $(y \leftarrow x) = \sup\{z \mid zx \le y\}$,
    $(x \to y) = \sup\{z \mid xz \le y\}$.

In the quantale of all languages over a set A, the residuation operations are the duals of the usual quotient operations [30], [108], and [110]. For example, $(y \leftarrow x) = \overline{\bar{y}/x}$, where for any x and y, $y/x$ denotes the right quotient of y with respect to x and $\bar{x}$ is the complement of x. Another example is the quantale of all binary relations on a set, where sum and multiplication are set union and relation composition. Residuals have their usual meaning, cf. [71].

Definition 9.2. A residuated Kleene algebra is a Kleene algebra equipped with operations $\leftarrow$ and $\to$ satisfying (26). Let KAR denote the class of all residuated Kleene algebras.

The following fact was proved in [77] and [101]:

Theorem 9.1. KAR = ACT.
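In the language quantale the residuals can also be computed by brute force on a bounded universe of words. The sketch below is an approximation of the infinite quantale: the alphabet, the length bound and the sample languages are arbitrary choices of mine, and the computed residuals happen to be exact here only because the divisor language is nonempty and the dividend is finite. It computes $x \to y$ and $y \leftarrow x$ and checks the Galois connection (26) on a few sample languages.

```python
from itertools import product

ALPHABET = "ab"
MAXLEN = 4
UNIVERSE = [""] + ["".join(w) for n in range(1, MAXLEN + 1)
                   for w in product(ALPHABET, repeat=n)]

def cat(x, y):
    return {u + v for u in x for v in y}

def right_residual(x, y):
    """x -> y: the largest z with x.z contained in y."""
    return {w for w in UNIVERSE if all(u + w in y for u in x)}

def left_residual(y, x):
    """y <- x: the largest z with z.x contained in y."""
    return {w for w in UNIVERSE if all(w + u in y for u in x)}

x = {"a", "ab"}
y = {"a", "ab", "aba", "abb", "abab"}
to = right_residual(x, y)
frm = left_residual(y, x)

# the Galois connection (26): z <= (y <- x) iff z.x <= y, and dually for ->
for z in ({""}, {"b"}, to, frm, {"ab"}):
    assert (z <= frm) == (cat(z, x) <= y)
    assert (z <= to) == (cat(x, z) <= y)
print("x -> y =", sorted(to), " y <- x =", sorted(frm))
```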


A major result on action algebras is that ACT is a variety capturing the equational theory of regular languages.

Theorem 9.2 ([101]). ACT is a variety. A complete set of identities for ACT consists of the idempotent semiring axioms, the identities
    $y \leftarrow x \le (y + y') \leftarrow x$,    $(y \leftarrow x)x \le y$,    $y \le yx \leftarrow x$,
    $x \to y \le x \to (y + y')$,    $x(x \to y) \le y$,    $y \le x \to xy$,
(21), (23), and the identity of pure induction:
    $(x \to x)^* \le x \to x$.

As shown in [101], the idempotent semiring identities may be weakened, since in conjunction with the other identities they hold if and only if the additive structure is a commutative idempotent monoid, the multiplicative structure is a monoid, and multiplication is nondecreasing in either argument.

Theorem 9.3 ([101]). An identity between rational terms holds in ACT if and only if it holds in OIS.

Actually, one-sided residuation suffices for the above theorem. As argued in [101], OIS, or equivalently the equational theory of (regular) languages with the regular (or rational) operations, is both too big and too small. It is too big in the sense that it is not finitely based, and it is too small for not being able to rule out even finite "nonstandard" models, such as Conway's leap. In contrast, action algebras form a finitely based variety having only standard finite models, since every finite idempotent semiring can be expanded in a unique way to an action algebra, see [101].

In [28], it is shown that the equational theory of "∗-continuous action algebras" is $\Pi^0_1$-complete, whereas the equational theory of ACT is $\Sigma^0_1$. Consequently, there is an identity true of all ∗-continuous action algebras which does not hold in ACT. By [27], the equational theory of action algebras of (regular) languages is also $\Pi^0_1$-complete. These results should be compared with those in [80]. The logic associated with action algebras may be seen as an enrichment of Lambek's calculus with a star operation. For the relationship between Lambek's calculus, Kleene algebras and action algebras, we refer to [27], [28], and [70].

Two important subvarieties of action algebras are action lattices and Boolean action lattices, see [70] and [77]. It is shown in [77] that the Kleene algebra of matrices over an action algebra expands to an action algebra if and only if it is an action lattice. Nice discussions on the relationship between relation algebras with star or transitive closure (see [71] and [94]) and Kleene algebras, action algebras, and dynamic algebras can be found in [100], [101], and [103].


10. Some extensions

The star operation is a fixed point operation in iteration semirings, since $s^*$ provides a (canonical) solution to fixed point equations of the sort $x = sx + 1$. A general study of the equational properties of fixed point operations has been carried out in elementary category theory. The iteration theories and their algebras, called iteration algebras [17], capture the equational properties of fixed point operations in several different contexts. By Theorem 7.5, the classical language (or trace) semantics of finite automata can be characterised by a finite number of identities relative to iteration semirings. A bisimulation based semantics of finite automata was introduced in [92] and [96]. In the algebraic structures arising in the bisimulation semantics, multiplication distributes over finite sums only on the right. Sound and complete sets of equational and quasi-equational axioms have been obtained for bisimulation equivalence in [17], [57], and [56], and for simulation equivalence in [43]. Other extensions deal with infinite words (see [97]) and Büchi automata (see [17], [49], [48], and [96]), probabilistic bisimulation equivalence (see [1]), and tree automata (see [44]).

Acknowledgement. The work of this article was partially supported by the National Foundation of Hungary for Scientific Research (OTKA), Grant No. K 108448.

References [1] L. Aceto, Z. Ésik, and A. Ingólfsdóttir, Equational axioms for probabilistic bisimilarity. In Algebraic methodology and software technology (H. Kirchner and C. Ringeissen, eds.). Lecture Notes in Computer Science, 2422. Springer, Berlin, 2002, 239–253. MR 2050826 Zbl 1275.68099 q.v. 757 [2] L. Aceto, Z. Ésik, and A. Ingólfsdóttir, A fully equational proof of Parikh’s theorem. Theor. Inform. Appl. 36 (2002), no. 2, 129–153. MR 1948766 Zbl 1024.68070 q.v. 741 [3] L. Aceto, W. Fokkink, and A. Ingólfsdóttir, On a question of A. Salomaa: the equational theory of regular expressions over a singleton alphabet is not finitely based. Theoret. Comput. Sci. 209 (1998), no. 1–2, 163–178. MR 1647510 Zbl 0915.68117 q.v. 750 [4] H. Andréka, Representations of distributive lattice-ordered semigroups with binary relations. Algebra Universalis 28 (1991), no. 1, 12–25. MR 1083817 Zbl 0725.06007 q.v. 754 [5] S. Banach, Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales. Fund. Math. 3 (1922), 133–181. MR 3949898 JFM 48.0201.01 q.v. 732 [6] M. Barr and C. Wells, Category theory for computing science. Prentice Hall International Series in Computer Science. Prentice Hall International, New York, 1990. MR 1094561 Zbl 0714.18001 q.v. 755 [7] M.-P. Béal, S. Lombardy, and J. Sakarovitch, On the equivalence of Z-automata. In Automata, languages and programming (L. Caires, G. F. Italiano, L. Monteiro, C. Palamidessi, and M. Yung, eds.). Proceedings of the 32nd International Colloquium (ICALP 2005) held in Lisbon, July 11–15, 2005. Springer, Berlin, 2005, 397–409. MR 2184648 Zbl 1082.68069 q.v. 748 [8] M.-P. Béal, S. Lombardy, and J. Sakarovitch, Conjugacy and equivalence of weighted automata and functional transducers. In Computer science—theory and applications

(D. Grigoriev, J. Harrison, and E. A. Hirsch, eds.). Proceedings of the 1st International Symposium on Computer Science in Russia (CSR 2006) held in St. Petersburg, June 8–12, 2006. Lecture Notes in Computer Science, 3967. Springer, Berlin, 2006, 58–69. MR 2260982 Zbl 1185.68381 q.v. 748
[9] H. Bekić, Definable operations in general algebras, and the theory of automata and flowcharts. Technical report. IBM Laboratory, Vienna, 1969. q.v. 747
[10] L. Bernátsky and Z. Ésik, Equational properties of Kleene algebras of relations with conversion. Theoret. Comput. Sci. 137 (1995), no. 2, 237–251. MR 1311223 Zbl 0872.08004 q.v. 753
[11] L. Bernátsky and Z. Ésik, Semantics of flowchart programs and the free Conway theories. RAIRO Inform. Théor. Appl. 32 (1998), no. 1–3, 35–78. MR 1657511 q.v. 733
[12] J. Berstel and C. Reutenauer, Noncommutative rational series with applications. Encyclopedia of Mathematics and its Applications, 137. Cambridge University Press, Cambridge, 2011. (New version of Rational series and their languages. EATCS Monographs on Theoretical Computer Science, 12. Springer, Berlin, 1988.) MR 2760561 Zbl 1250.68007 q.v. 730, 737, 749
[13] A. J. Blikle, Nets: complete lattices with composition. Bull. Acad. Polon. Sci., Sér. Sci. Math. Astr. Phys. 19 (1971), 1123–1127. Zbl 0226.68032 q.v. 744
[14] S. L. Bloom, Varieties of ordered algebras. J. Comput. System Sci. 13 (1976), no. 2, 200–212. MR 0427204 Zbl 0337.06008 q.v. 746
[15] S. L. Bloom and Z. Ésik, Floyd–Hoare logic in iteration theories. J. Assoc. Comput. Mach. 38 (1991), no. 4, 887–934. MR 1134520 Zbl 0799.68042 q.v. 741, 753
[16] S. L. Bloom and Z. Ésik, Equational axioms for regular sets. Math. Structures Comput. Sci. 3 (1993), no. 1, 1–24. MR 1206870 Zbl 0796.68153 q.v. 748, 749
[17] S. L. Bloom and Z. Ésik, Iteration theories. The equational logic of iterative processes. EATCS Monographs on Theoretical Computer Science. Springer, Berlin, 1993. MR 1295433 Zbl 0773.03033 q.v. 731, 732, 733, 734, 736, 741, 742, 743, 745, 747, 748, 753, 757
[18] S. L. Bloom and Z. Ésik, Matrix and matricial iteration theories. I. J. Comput. System Sci. 46 (1993), no. 3, 381–408. MR 1228813 Zbl 0791.08006 q.v. 732, 734, 736, 741, 742
[19] S. L. Bloom and Z. Ésik, Two axiomatizations of a star semiring quasi-variety. Bull. European Assoc. Theor. Comput. Sci. 59 (1996), 150–152. Zbl 0856.08009 q.v. 754
[20] S. L. Bloom and Z. Ésik, Axiomatizing rational power series over natural numbers. Inform. and Comput. 207 (2009), no. 7, 793–811. MR 2519073 Zbl 1167.68036 q.v. 744, 748, 751, 752
[21] S. L. Bloom, Z. Ésik, and W. Kuich, Partial Conway and iteration semirings. Fund. Inform. 86 (2008), no. 1–2, 19–40. MR 2462488 Zbl 1167.08002 q.v. 731, 732, 734, 742, 743
[22] S. L. Bloom, Z. Ésik, and G. Stefanescu, Notes on equational theories of relations. Algebra Universalis 33 (1995), no. 1, 98–126. MR 1303634 Zbl 0834.08004 q.v. 753
[23] M. Boffa, Une remarque sur les systèmes complets d'identités rationnelles. RAIRO Inform. Théor. Appl. 24 (1990), no. 4, 419–423. MR 1079723 Zbl 0701.68059 q.v. 754
[24] M. Boffa, Une condition impliquant toutes les identités rationnelles. RAIRO Inform. Théor. Appl. 29 (1995), no. 6, 515–518. MR 1377029 Zbl 0881.68071 q.v. 754


[25] P. Bouyer and A. Petit, A Kleene/Büchi-like theorem for clock languages. J. Autom. Lang. Comb. 7, no. 2 (2001), 167–186. q.v. 740 [26] P. Bouyer, A. Petit, and D. Thérien, An algebraic characterization of data and timed languages. In CONCUR 2001 – concurrency theory (K. G. Larsen and M. Nielsen, eds.). Proceedings of the 12th International Conference held in Aalborg, August 20–25, 2001. Lecture Notes in Computer Science, 2154. Springer, Berlin, 2001, 248–261. MR 1905477 Zbl 1006.68078 q.v. 735 [27] W. Buszkowski, On the complexity of the equational theory of relational action algebras. In Relations and Kleene algebra in computer science (R. A. Schmidt, ed.). Proceedings of the 9th International Conference on Relational Methods in Computer Science (RelMiCS-9) and the 4th International Workshop on Applications of Kleene Algebra (AKA 2006) held at the University of Manchester, Manchester, August 29–September 2, 2006. Lecture Notes in Computer Science, 4136. Springer, Berlin, 2006, 106–119. MR 2281595 Zbl 1135.68018 q.v. 756 [28] W. Buszkowski, On action logic: equational theories of action algebras. J. Logic Comput. 17 (2007), no. 1, 199–217. MR 2305044 Zbl 1118.03013 q.v. 756 [29] P. M. Cohn, Universal algebra. Harper & Row, Publishers, New York and London, 1965. MR 0175948 Zbl 0141.01002 q.v. 746 [30] J. C. Conway, Regular algebra and finite machines. Chapman & Hall/CRC Mathematics. Chapman and Hall, London, 1971. MR 3967692 Zbl 0231.94041 q.v. 729, 731, 733, 734, 736, 741, 742, 744, 745, 749, 750, 754, 755 [31] S. A. Cook, Soundness and completeness of an axiom system for program verification. SIAM J. Comput. 7 (1978), no. 1, 70–90. MR 0495086 Zbl 0374.68009 q.v. 753 [32] S. Crvenković, I. Dolinka, and Z. Ésik, A note on equations for commutative regular languages. Inform. Process. Lett. 70 (1999), no. 6, 265–267. MR 1716197 Zbl 0999.68098 q.v. 750 [33] S. Crvenković, I. Dolinka, and Z. Ésik, On equations for union-free regular languages. Inform. and Comput. 164 (2001), no. 1, 152–172. MR 1808636 Zbl 1005.08001 q.v. 750 [34] S. Crvenković and R. Madarász, On Kleene algebras. Theoret. Comput. Sci. 108 (1993), no. 1, 17–24. International Colloquium on Words, Languages and Combinatorics (Kyoto, 1990). MR 1203820 Zbl 0778.03006 q.v. 749, 753 [35] B. A. Davey and H. A. Priestley, Introduction to lattices and order. Cambridge Mathematical Textbooks. Cambridge University Press, Cambridge, 1990. MR 1058437 Zbl 0701.06001 q.v. 730, 742, 746 [36] J. W. de Bakker and D. Scott, A theory of programs. Technical report. IBM Laboratory, Vienna, 1969. q.v. 747 [37] J. Desharnais, B. Möller, and G. Struth, Kleene algebra with domain. ACM Trans. Comput. Log. 7 (2006), no. 4, 798–833. MR 2264424 Zbl 1367.68205 q.v. 753 [38] M. Droste and W. Kuich, Semirings and formal power series. In Handbook of weighted automata. (M. Droste, W. Kuich, and H. Vogler, eds). Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 2009, Chapter 1, 3–28. MR 2777727 q.v. 730 [39] M. Droste and K. Quaas, A Kleene–Schützenberger theorem for weighted timed automata. Theoret. Comput. Sci. 412 (2011), no. 12–14, 1140–1153. MR 2797755 Zbl 1217.68128 q.v. 740 [40] S. Eilenberg, Automata, languages and machines. Vol. A. Pure and Applied Mathematics, 58. Academic Press, New York, 1974. MR 0530382 Zbl 0317.94045 q.v. 737, 744


[41] J. Engelfriet, Simple program schemes and formal languages. Lecture Notes in Computer Science, 20. Springer, Berlin, 1974. MR 0502130 Zbl 0288.68030 q.v. 749 [42] Z. Ésik, Group axioms for iteration. Inform. and Comput. 148 (1999), no. 2, 131–180. MR 1674307 Zbl 0924.68143 q.v. 733, 743, 748, 750 [43] Z. Ésik, Axiomatizing the least fixed point operation and binary supremum. In Computer science logic (P. Clote and H. Schwichtenberg, eds.). Proceedings of the 14th International Workshop (CSL 2000) held at the Annual Conference of the European Association for Computer Science Logic (EACSL) in Fischbachau, August 21–26, 2000, 302–316. MR 1859446 Zbl 1859434 q.v. 757 [44] Z. Ésik, Axiomatizing the equational theory of regular tree languages. J. Log. Algebr. Program. 79 (2010), no. 2, 189–213. MR 2598374 Zbl 1184.68315 q.v. 757 [45] Z. Ésik and W. Kuich, A generalization of Kozen’s axiomatization of the equational theory of regular sets. In Words, semigroups, & transductions (G. P. M. Ito and S. Yu, eds.). Festschrift in honor of G. Thierrin. World Scientific, River Edge, N.J., 2001, 99–114. MR 1914752 q.v. 752 [46] Z. Ésik and W. Kuich, Rationally additive semirings. J. UCS 8 (2002), no. 2, 173–183. MR 1895796 Zbl 1257.16034 q.v. 744 [47] Z. Ésik and W. Kuich, Inductive  -semirings. Theoret. Comput. Sci. 324 (2004), no. 1, 3–33. MR 2083926 Zbl 1105.68062 q.v. 736, 743, 751, 753 [48] Z. Ésik and W. Kuich, A semiring-semimodule generalization of ! -regular languages. (1) I. J. Autom. Lang. Comb. 10 (2005), no. 2–3, 203–242. (2) II. J. Autom. Lang. Comb. 10 (2005), no. 2–3, 243–264. MR 2285329 (1) MR 2285330 (2) Zbl 1161.68025 (1) Zbl 1161.68524 (2) q.v. 740, 757 [49] Z. Ésik and W. Kuich, On iteration semiring-semimodule pairs. Semigroup Forum 75 (2007), no. 1, 129–159. MR 2351928 Zbl 1155.16035 q.v. 757 [50] Z. Ésik and W. Kuich, Finite automata. In Handbook of weighted automata. (M. Droste, W. Kuich, and H. Vogler, eds). Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 2009, Chapter 3, 69–104. MR 2777729 q.v. 737 [51] Z. Ésik and W. Kuich, Modern automata theory. Preprint 2009. https://www.dmg.tuwien.ac.at/kuich/mat.ps q.v. 734 [52] Z. Ésik and W. Kuich, Free iterative and iteration K -semialgebras. Algebra Universalis 67 (2012), no. 2, 141–162. MR 2898722 Zbl 1260.08001 q.v. 748, 749 [53] Z. Ésik and W. Kuich, Free inductive K -semialgebras. J. Log. Algebr. Program. 82 (2013), 111–122. Zbl 1286.68332 q.v. 752 [54] J. Esparza, S. Kiefer, and M. Luttenberger, Newtonian program analysis. J. ACM 57 (2010), no. 6, art. 33, 47 pp. MR 2739024 Zbl 1327.68079 q.v. 746 [55] K. Etessami and M. Yannakakis, Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. J. ACM 56 (2009), no. 1, art. 1, 66 pp. MR 2541335 Zbl 1325.68091 q.v. 746 [56] W. Fokkink, On the completeness of the equations for the Kleene star in bisimulation. In Algebraic methodology and software technology (M. Wirsing and M. Nivat, eds.). Proceedings of the 5th International Conference (AMAST ’96) held in Munich, July 1–5, 1996. Lecture Notes in Computer Science, 1101. Springer, Berlin, 1996, 180–194. MR 1480695 Zbl 0886.03032 q.v. 757 [57] W. Fokkink and H. Zantema, Basic process algebra with iteration: completeness of its equational axioms. Comput. J. 37 (1994), no. 4, 259–268. q.v. 757


[58] J. S. Golan, The theory of semirings with applications in computer science. Pitman Monographs and Surveys in Pure and Applied Mathematics, 54. Longman Scientific & Technical, Harlow, and John Wiley & Sons, New York, 1992. MR 1163371 Zbl 0780.16036 q.v. 730, 744, 746 [59] J. S. Golan, Semirings and affine equations over them: theory and applications. Mathematics and its Applications, 556. Kluwer Academic Publishers Group, Dordrecht, 2003. MR 1997126 Zbl 1042.16038 q.v. 730, 731, 744 [60] M. Goldstern, Completion of semirings. Preprint, 2002. arXiv:math/0208134 [math.RA] q.v. 744, 745, 747 [61] P. V. Gorshkov and K. V. Arkhangelsky, Conditional identities in the algebra of regular languages. Dokl. Akad. Nauk Ukrain. SSR Ser. A 1987, no. 10, 67–69, 88. MR 0926807 Zbl 0626.68059 q.v. 742 [62] G. Grätzer, Universal algebra. Second edition. Springer, Berlin, 1979. MR 0538623 Zbl 0412.08001 q.v. 730 [63] I. Guessarian, Algebraic semantics. With a preface by M. Nivat. Lecture Notes in Computer Science, 99. Springer, Berlin, 1981. MR 0617908 Zbl 0474.68010 q.v. 746 [64] C. Hardin and D. Kozen, On the complexity of the Horn theory of REL. Technical report, Cornell University, Ithaca, N.Y., 2003. q.v. 753 [65] D. Harel, D. Kozen, and J. Tiuryn, Dynamic logic. Foundations of Computing Series. MIT Press, Cambridge, MA, 2000. MR 1791342 Zbl 0976.68108 q.v. 753 [66] U. Hebisch, The Kleene theorem in countably complete semirings. Bayreuth. Math. Schr. 31 (1990), 55–66. MR 1056148 Zbl 0715.68060 q.v. 744, 745 [67] M. Hopkins and D. Kozen, Parikh’s theorem in commutative Kleene algebra. In 14 th Symposium on Logic in Computer Science. Proceedings of the symposium (LICS’99) held in Trento, July 2–5, 1999. IEEE Computer Society, Los Alamitos, CA, 1999, 394–401. MR 1943433 IEEEXplore 782634 q.v. 741 [68] D. T. Hu`ynh, The complexity of equivalence problems for commutative grammars. Inform. and Control 66 (1985), no. 1–2, 103–121. MR 0818858 Zbl 0601.68048 q.v. 751 [69] D. T. Hu`ynh, A simple proof for the †p 2 upper bound of the inequivalence problem for semilinear sets. Elektron. Informationsverarb. Kybernet. 22 (1986), no. 4, 147–156. MR 0845419 Zbl 0617.68050 q.v. 751 [70] P. Jipsen, From semirings to residuated Kleene lattices. Studia Logica 76 (2004), no. 2, 291–303. MR 2072987 Zbl 1045.03049 q.v. 753, 756 [71] B. Jónnson, The theory of binary relations. In Algebraic logic (H. Andréka, J. D. Monk, and I. Németi, eds.). Papers from the colloquium held in Budapest, August 8–14, 1988. Colloquia Mathematica Societatis János Bolyai, 54. North-Holland Publishing Co., Amsterdam, 1991, 245–292. MR 1153429 Zbl 0760.03018 q.v. 755, 756 [72] G. Karner, Continuous monoids and semirings. Theoret. Comput. Sci. 318 (2004), no. 3, 355–372. MR 2060518 Zbl 1068.68079 q.v. 746 [73] S. C. Kleene, Representation of events in nerve nets and finite automata. In Automata Studies (C. E. Shannon, and J. McCarthy, John eds.). Annals of Mathematics Studies 34. University Press, Princeton, N.Y, 1956, 3–41. MR 0077478 q.v. 729 [74] D. Kozen, On the duality of dynamic algebras and Kripke models. In Logic of programs (E. Engeler, ed.). Papers from the workshop held in Zürich, May–July 1979. Lecture Notes in Computer Science, 125. Springer, Berlin, 1981, 1–11. MR 0735274 Zbl 0482.03008 q.v. 753


[75] D. Kozen, On Kleene algebras and closed semirings. In Mathematical foundations of computer science. 1990 (B. Rovan, ed.). Proceedings of the Fifteenth Symposium held in Banská Bystrica, August 27–31, 1990. Lecture Notes in Computer Science, 452. Springer, Berlin, 1990, 26–47. MR 1084822 Zbl 0732.03047 q.v. 751 [76] D. Kozen, A completeness theorem for Kleene algebras and the algebra of regular events. Inform. and Comput. 110 (1994), no. 2, 366–390. MR 1276741 Zbl 0806.68082 q.v. 751, 752, 753 [77] D. Kozen, On action algebras. In Logic and information flow (J. van Eijck and A. Visser, eds.). Foundations of Computing Series. MIT Press, Cambridge, MA, 1994, 78–88. MR 1295061 q.v. 753, 755, 756 [78] D. Kozen, Kleene algebra with tests. ACM Trans. Prog. Lang. Syst. 19 (1997) 427–443. q.v. 753 [79] D. Kozen, On Hoare logic and Kleene algebra with tests. In 14 th Symposium on Logic in Computer Science. Proceedings of the symposium (LICS’99) held in Trento, July 2–5, 1999. IEEE Computer Society, Los Alamitos, CA, 1999, 160–172. MR 1943411 Zbl 1365.68326 IEEEXplore 782610 q.v. 741, 753 [80] D. Kozen, On the complexity of reasoning in Kleene algebra. Inform. and Comput. 179 (2002), no. 2, 152–162. LICS’97 (Warsaw). MR 1948308 Zbl 1096.03077 q.v. 753, 756 [81] D. Kozen and F. Smith, Kleene algebra with tests: completeness and decidability. In Computer science logic (D. van Dalen and M. Bezem, eds.). Selected papers from the 10 th International Workshop (CSL’96) held at the 5th Annual Conference of the European Association for Computer Science Logic (EACSL) at the University of Utrecht, Utrecht, September 21–27, 1996. Lecture Notes in Computer Science, 1258. Springer, Berlin, 1997, 244–259. MR 1611502 Zbl 0882.03064 q.v. 753 [82] D. Kozen and J. Tiuryn, On the completeness of propositional Hoare logic. Inform. Sci. 139 (2001), no. 3–4, 187–195. MR 1873597 Zbl 0996.03022 q.v. 753 [83] D. Krob, Monoïdes et semi-anneaux complets. Semigroup Forum 36 (1987), no. 3, 323–339. MR 0916429 Zbl 0636.16019 q.v. 744 [84] D. Krob, Monoïdes et semi-anneaux continus. Semigroup Forum 37 (1988), no. 1, 59–78. MR 0929444 Zbl 0649.16032 q.v. 746 [85] D. Krob, Complete systems of B-rational identities. Theoret. Comput. Sci. 89 (1991), no. 2, 207–343. MR 1133622 Zbl 0737.68053 q.v. 734 [86] D. Krob, Expressions rationnelles sur un anneau. In Topics in invariant theory (M.-P. Malliavin, ed.). Papers from the Dubreil–Malliavin Algebra Seminar held in Paris, 1989–1990. Lecture Notes in Mathematics, 1478. Springer, Berlin, 1991, 215–243. q.v. 749 [87] D. Krob, Matrix versions of aperiodic K -rational identities. RAIRO Inform. Théor. Appl. 25 (1991), no. 5, 423–444. MR 1144008 Zbl 0768.68077 q.v. 733, 743, 747, 749, 750, 751, 752, 754 [88] W. Kuich, The Kleene and Parikh theorem in complete semirings. In Automata, languages and programming (T. Ottmann, ed.). Proceedings of the fourteenth international colloquium held at the University of Karlsruhe, Karlsruhe, July 13–17, 1987. Lecture Notes in Computer Science, 267. Springer, Berlin, 1987, 212–225. MR 0912710 Zbl 0625.16026 q.v. 744 [89] W. Kuich and A. Salomaa, Semirings, automata, and languages. EATCS Monographs on Theoretical Computer Science, 5. Springer, Berlin, 1986. MR 0817983 q.v. 730, 737


[90] E. G. Manes and M. A. Arbib, Algebraic approaches to program semantics. Texts and Monographs in Computer Science. AKM Series in Theoretical Computer Science. Springer, New York, 1986. MR 0860560 Zbl 0599.68008 q.v. 744 [91] G. Markowsky, Chain-complete posets and directed sets with applications. Algebra Universalis 6 (1976), no. 1, 53–68. MR 0398913 Zbl 0332.06001 q.v. 746 [92] R. Milner, A calculus of communicating systems. Lecture Notes in Computer Science, 92. Springer, Berlin, 1980. MR 0590046 Zbl 0452.68027 q.v. 757 [93] M. Morisaki and K. Sakai, A complete axiom system for rational sets with multiplicity. Theoret. Comput. Sci. 11 (1980), no. 1, 79–92. MR 0566695 Zbl 0443.68068 q.v. 749 [94] K. C. Ng and A. Tarski, Relation algebras with transitive closure. Notices Amer. Math. Soc. 24 (1977), A29–A30. q.v. 756 [95] D. M. R. Park, Fixpoint induction and proofs of program properties. In Machine intelligence. 5. (B. Meltzer and D. Michie, eds). American Elsevier, New York, 1970, 59–78. MR 0323149 Zbl 0219.68007 q.v. 747 [96] D. M. R. Park, Concurrency and automata on infinite sequences. Theoret. Comput. Sci. 104 (1981), 167–183. 5th GI-Conf., Karlsruhe 1981. Zbl 0457.68049 q.v. 757 [97] D. Perrin and J.-É. Pin, Infinite words. Automata, semigroups, logic and games. Pure and Applied Mathematics (Amsterdam) 141. Elsevier/Academic Press, Amsterdam, 2004. Zbl 1094.68052 q.v. 757 [98] J.-É. Pin, Tropical semirings. In Idempotency (J. Gunawardena, ed.). Papers from the workshop held in Bristol, October 3–7, 1994. Publications of the Newton Institute, 11. Cambridge University Press, Cambridge, 1998, 50–69. MR 1608374 Zbl 0909.16028 q.v. 730 [99] V. R. Pratt, Dynamic algebras and the nature of induction. In STOC ’80 (R. E. Miller, S. Ginsburg, W. A. Burkhard, and R. J. Lipton, eds.). Proceedings of the twelfth annual ACM symposium on Theory of computing, Los Angeles, April 28–30, 1980. Association for Computing Machinery, New York, 1980, 22–28. q.v. 753 [100] V. R. Pratt, Dynamic algebras as a well-behaved fragment of relation algebras. In Algebraic logic and universal algebra in computer science (C. Bergman, R. D. Maddux, and D. L. Pigozzi, eds.). Lecture Notes in Computer Science, 425. Springer, Berlin, 1990, 77–110. MR 1077838 Zbl 0783.03036 q.v. 753, 754, 756 [101] V. R. Pratt, Action logic and pure induction. In Logics in AI (J. van Eijck, ed.). Proceedings of the European Workshop (JELIA ’90) held in Amsterdam, September 10–14, 1990. Lecture Notes in Artificial Intelligence. Springer, Berlin, 1991, 97–120. MR 1099624 Zbl 0814.03024 q.v. 732, 751, 753, 754, 755, 756 [102] V. R. Pratt, Dynamic algebras: examples, constructions, applications. Studia Logica 50 (1991), no. 3–4, 571–605. MR 1170187 Zbl 0752.03033 q.v. 753 [103] V. R. Pratt, Origins of the calculus of binary relations. In Proceedings of the Seventh Annual IEEE Symposium on Logic in Computer Science. Held in Santa Cruz, CA, June 22–25, 1992. IEEE Computer Society, Los Alamitos, CA, 1992, 248–254. IEEEXplore 185537 q.v. 756 [104] V. N. Red’ko, On algebra of commutative events. Ukrain. Mat. Ž. 16 (1964), 185–195. In Russian. MR 0162712 Zbl 0199.04303 q.v. 749, 750 [105] V. N. Red’ko, On the determining totality of relations of an algebra of regular events. Ukrain. Mat. Ž. 16 (1964), 120–126. In Russian. MR 0179033 Zbl 0199.04302 q.v. 750


[106] K. I. Rosenthal, Quantales and their applications. Pitman Research Notes in Mathematics Series, 234. Longman Scientific & Technical, Harlow, and John Wiley & Sons, New York, 1990. MR 1088258 Zbl 0703.06007 q.v. 744 [107] J. Sakarovitch, Kleene’s theorem revisited. In Trends, techniques, and problems in theoretical computer science (A. Kelemenová and J. Kelemen, eds.). Papers from the fourth international meeting of young computer scientists held in Smolenice, October 13–17, 1986. Lecture Notes in Computer Science, 281. Springer, Berlin, 1987, 39–50. MR 0921502 Zbl 0637.68096 q.v. 746 [108] J. Sakarovitch, Éléments de théorie des automates. Vuibert Informatique, Paris, 2003. English translation, Elements of automata theory. Cambridge University Press, 2009. Translated by R. Thomas. Cambridge University Press, Cambridge, 2009. MR 2567276 Zbl 1188.68177 (English ed.) Zbl 1178.68002 (French ed.) q.v. 730, 737, 755 [109] A. Salomaa, Two complete axiom systems for the algebra of regular events. J. Assoc. Comput. Mach. 13 (1966), 158–169. MR 0189995 Zbl 0149.24902 q.v. 749 [110] A. Salomaa, Theory of automata. International Series of Monographs in Pure and Applied Mathematics, 100. Pergamon Press, Oxford etc., 1969. MR 0262021 Zbl 0193.32901 q.v. 737, 749, 750, 755 [111] A. Salomaa and M. Soittola, Automata-theoretic aspects of formal power series. Texts and Monographs in Computer Science. Springer, Berlin, 1978. MR 0483721 Zbl 0377.68039 q.v. 748 [112] M. P. Schützenberger, Certain elementary families of automata. In Proceedings of the Symposium on Mathematical Theory of Automata. New York, April 24–26, 1962. Microwave Research Institute Symposia Series, XII. Polytechnic Press of Polytechnic Institute of Brooklyn, Brooklyn, N.Y., 1963, 139–153. MR 0168433 Zbl 0221.94080 q.v. 729 [113] L. J. Stockmeyer and A. R. Meyer, Word problems requiring exponential time. In Fifth Annual ACM Symposium on Theory of Computing (A. V. Aho, A. Borodin, R. L. Constable, R. W. Floyd, M. A. Harrison, R. M. Karp, and H. R. Strong, eds.) Papers presented at the Symposium, Austin, Tex., April 30–May 2, 1973. Association for Computing Machinery, New York, 1973, 1–9. MR 0418518 Zbl 0359.68050 q.v. 751 [114] W. Wechler, Universal algebra for computer scientists. EATCS Monographs on Theoretical Computer Science, 25. Springer, Berlin, 1992. MR 1177406 Zbl 0748.68002 q.v. 730, 742

Chapter 21

Language equations
Michal Kunc and Alexander Okhotin

Contents
1. Introduction
2. General properties of operations
3. Equations with one-sided concatenation
4. Resolved systems of equations
5. Equations with constant sides
6. Equations of the general form
7. Equations with erasing operations
References

1. Introduction

Language equations are equations of the form $\varphi(X_1, \dots, X_n) = \psi(X_1, \dots, X_n)$, where the variables $X_i$ represent formal languages, and the expressions $\varphi$ and $\psi$ are composed of the variables, constant languages, and some language-theoretic operations. The first and the best known use of language equations was to define the semantics of the context-free grammars (Ginsburg and Rice [27]). Many other applications of language equations arise in virtually all areas where formal languages appear. Numerous theoretical results in this area have been obtained in recent years, ending with several characterisations of effective computability by equations of a very simple form.

In this chapter, the work on language equations is arranged according to the general form of equations and the sets of allowed operations. The properties of language equations particularly depend upon monotonicity and continuity of operations, defined in § 2. The typical operations used in equations are concatenation and the Boolean set-theoretic operations. Concatenation is sometimes restricted to be linear (where one of the arguments is a constant language) or one-sided linear (where the constant is always the right argument or always the left argument).

Equations with one-sided concatenation and Boolean operations are notable for being directly expressible in monadic second-order logic over infinite k-ary trees (SkS), described in Chapter 8. It follows that whenever such an equation has a unique solution, all languages in it are regular. The basic problems for these equations (such as testing whether a given system of equations has any solutions, a unique solution, etc.) are decidable by expressing them in SkS and using its decision procedure. Some more efficient special methods for dealing with these equations are considered in § 3.


§ 4 is concerned with systems of the resolved form $X_i = \varphi_i(X_1, \dots, X_n)$ with $i \in \{1, \dots, n\}$. These systems define languages inductively, and hence are naturally connected to formal grammars and parsing. Context-free grammars are the most well-known kind of such equations, with the operations restricted to union and concatenation. Using other sets of operations leads to natural variants and generalisations of the context-free grammars, such as conjunctive grammars [62].

The next kind of equations are those of the form $\varphi(X_1, \dots, X_n) = C$ with constant right-hand sides, which are considered in § 5. If the operations used in the left-hand sides are limited to union and concatenation, then such a system can be analysed within the syntactic monoid of all constants on the right-hand side. In particular, the regularity of all maximal solutions can be established, and these solutions can be constructed by an algorithm. Furthermore, solution existence can be effectively decided, and the computational complexity of testing this property is known.

A more general kind of system, with left- and right-hand sides of an unrestricted form and with the only limitation that the operations are continuous, is considered in § 6. The upper bound on the expressive power of unique, least and greatest solutions of such equations is given by recursive, recursively enumerable (r.e.) and co-r.e. sets, respectively. This upper bound is actually reached by all but the very simplest equations. In particular, equations over a one-letter alphabet using concatenation as the only operation are already computationally universal. However, there exist some special cases of equations in which the solutions are known to be regular.

Finally, there are language equations with non-continuous operations, such as homomorphisms and the quotient, which are discussed in § 7. Such operations are typically erasing. Under the weak assumption that the operations are definable in first-order arithmetic, unique solutions of such equations must be hyper-arithmetical sets; conversely, an arbitrary such set over an arbitrary alphabet can be represented by a unique solution of a system using union, concatenation and quotient.

2. General properties of operations

Most of the language equations studied in the literature are limited to the basic operations on languages: concatenation, Boolean operations, and occasionally the Kleene star. This section assumes an abstract outlook on the operations, regarding them simply as functions $\varphi\colon (2^{A^*})^n \to 2^{A^*}$, and presents two general properties of such functions that are particularly important in language equations: continuity and monotonicity.

Continuity of functions is generally understood as preserving limits. Define a convergent sequence of languages $\{L_k\}_{k=1}^{\infty}$ by imposing the condition that for every word $w$ there is a number $k_w$ with $w$ belonging either to all $L_k$ with $k \geqslant k_w$, or to none of them; denote $\lim L_k = \{w \mid w \in L_{k_w}\}$. Then an operation $\varphi\colon 2^{A^*} \to 2^{A^*}$ is continuous if $\varphi(\lim L_k) = \lim \varphi(L_k)$ for every convergent sequence of languages $\{L_k\}_{k=1}^{\infty}$. Convergence of a sequence of $n$-tuples of languages is defined componentwise, and the definition of continuity accordingly extends to functions of multiple


arguments. Proving that Boolean operations and concatenation are continuous is an exercise. On the other hand, erasing homomorphisms are not continuous: if $h(a) = \varepsilon$, then $h(\lim a^{\geqslant k}) = h(\varnothing) = \varnothing$, but $\lim h(a^{\geqslant k}) = \lim \{\varepsilon\} = \{\varepsilon\}$.

The notion of continuity of operations on languages defined above is actually the continuity of functions in a metric space defined by $d(K, L) = 2^{-\min\{|w| \,:\, w \in K \triangle L\}}$, where $K \triangle L$ denotes the symmetric difference of $K$ and $L$. The definition extends to $n$-tuples of languages by setting $d((K_1, \dots, K_n), (L_1, \dots, L_n)) = \max_i d(K_i, L_i)$. This metric is an ultrametric, because it satisfies the inequality $d(K, L) \leqslant \max(d(K, M), d(M, L))$, and the set of $n$-tuples of languages equipped with $d$ forms a compact ultrametric space.

Two languages $K$ and $L$ are said to be equal modulo $A^{\leqslant \ell}$, for some $\ell \geqslant 0$, if a word $w \in A^*$ of length at most $\ell$ is in $K$ if and only if it is in $L$, see [70]. This definition is naturally extended to $n$-tuples of languages. The set of languages equal to a given language modulo $A^{\leqslant \ell}$ forms an open ball of radius $2^{-\ell}$ in the metric space. Note that since the space is compact, continuity is equivalent to uniform continuity, and the latter can be expressed in terms of these balls as follows: an operation $\varphi$ is continuous if for every $\ell \geqslant 0$ there exists $m = m(\ell) \geqslant 0$, such that $K = L \pmod{A^{\leqslant m}}$ implies $\varphi(K) = \varphi(L) \pmod{A^{\leqslant \ell}}$. For concatenation, Kleene star and Boolean operations, this condition holds with $m(\ell) = \ell$, that is, equivalence of arguments modulo $A^{\leqslant \ell}$ implies the equivalence of results modulo $A^{\leqslant \ell}$. On the other hand, the quotient with a letter, $\varphi(L) = a^{-1}L = \{w \mid aw \in L\}$, is uniformly continuous with $m(\ell) = \ell + 1$. Examples of continuous operations with $m(\ell)$ increasing faster than any given function can also be constructed.

Consider a system of language equations with continuous operations. An $n$-tuple of languages $L$ is its solution modulo $A^{\leqslant \ell}$ if each equation holds modulo $A^{\leqslant \ell}$ under the substitution of these languages for the variables. Due to the continuity, this property depends only on the value of $L$ modulo $A^{\leqslant m}$, for an appropriate $m = m(\ell)$. This notion leads to the following important characterisation of solutions in the proper sense by solutions modulo $A^{\leqslant \ell}$ for different numbers $\ell$. An $n$-tuple $L$ of subsets of $A^{\leqslant m}$ is said to be extendable to a solution modulo $A^{\leqslant \ell}$ if there exists a solution $L'$ modulo $A^{\leqslant \ell}$ such that $L' = L \pmod{A^{\leqslant m}}$. Such an $n$-tuple $L$ is said to be extendable to a solution if the system has a solution (in the proper sense) that equals $L$ modulo $A^{\leqslant m}$.

Lemma 2.1 (generalising Okhotin [70]). Let $\varphi_i = \psi_i$ be a system of equations, in which $\varphi_i$ and $\psi_i$ are continuous functions. Then for every number $m \geqslant 0$ there exists a number $\ell \geqslant 0$ such that all $n$-tuples of subsets of $A^{\leqslant m}$ that are extendable to solutions modulo $A^{\leqslant \ell}$ are extendable to solutions.

The proof of Lemma 2.1 is based on the following small result on equations in compact metric spaces.

Lemma 2.2. Let $X$ and $Y$ be metric spaces, with $X$ compact, and let $\varphi, \psi\colon X \to Y$ be continuous mappings. Then there exists $\varepsilon > 0$ such that if $d(\varphi(K), \psi(K)) \leqslant \varepsilon$ for some $K \in X$, then some $K' \in X$ satisfies $\varphi(K') = \psi(K')$.
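To make the ultrametric concrete, the following minimal Python sketch (not part of the original presentation; all function names are ours) computes $d(K, L)$ for finite fragments of languages and checks equality modulo $A^{\leqslant \ell}$.

```python
# Illustrative sketch: the ultrametric on languages, evaluated on finite
# fragments of languages represented as sets of Python strings.

def distance(K, L):
    """d(K, L) = 2^(-min length of a word in the symmetric difference)."""
    diff = K ^ L                       # symmetric difference K (triangle) L
    if not diff:
        return 0.0                     # the two fragments coincide
    return 2.0 ** (-min(len(w) for w in diff))

def equal_modulo(K, L, ell):
    """K = L (mod A^<=ell): the same words of length at most ell."""
    return {w for w in K if len(w) <= ell} == {w for w in L if len(w) <= ell}

# Example: K and L agree on all words of length <= 2, so d(K, L) = 2^(-3).
K = {"", "a", "ab", "aab"}
L = {"", "a", "ab", "bba"}
assert equal_modulo(K, L, 2)
assert distance(K, L) == 2.0 ** (-3)
```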


Using this result, Lemma 2.1 is established as follows. For every $n$-tuple $L$ of subsets of $A^{\leqslant m}$, let $X_L$ be the set of all $n$-tuples of languages equal to $L$ modulo $A^{\leqslant m}$, that is, the closed ball of radius $2^{-m-1}$ with centre $L$. Since $X_L$ is compact, by Lemma 2.2 there exists $\ell_L \geqslant 0$ such that if $X_L$ contains a solution modulo $A^{\leqslant \ell_L}$, then it contains a solution. Because the whole space is the union of (finitely many) sets $X_L$, it is sufficient to take $\ell$ as the maximum of $\ell_L$ over all $L$.

From Lemma 2.1, one can infer the following necessary and sufficient conditions for solution existence and solution uniqueness.

Theorem 2.3 (generalising Okhotin [70]). A system of language equations with continuous operations has a solution if and only if for every $m \geqslant 0$ there exists a solution modulo $A^{\leqslant m}$. A system has a unique solution if and only if for every $m \geqslant 0$ there exists $\ell \geqslant m$ such that the system has at least one solution modulo $A^{\leqslant \ell}$, and all the solutions modulo $A^{\leqslant \ell}$ are equal modulo $A^{\leqslant m}$.

The second important abstract property of operations is monotonicity. It is defined with respect to the partial order of componentwise inclusion of $n$-tuples of languages, under which $(K_1, \dots, K_n) \sqsubseteq (L_1, \dots, L_n)$ if $K_i \subseteq L_i$ for all $i$. An $n$-ary operation on languages $\varphi\colon (2^{A^*})^n \to 2^{A^*}$ is monotone if $\varphi(K) \subseteq \varphi(L)$ whenever $K \sqsubseteq L$. For example, union, intersection and concatenation are both monotone and continuous, complementation is continuous but not monotone, while erasing homomorphisms are monotone but not continuous.

A large, commonly used class of monotone operations on languages are the word-based operations. Every such operation is induced by a multiple-valued operation on words $f\colon (A^*)^m \to 2^{A^*}$ and defined as $\hat{f}(L_1, \dots, L_m) = \bigcup_{w_i \in L_i} f(w_1, \dots, w_m)$. Concatenation, quotient and shuffle are examples of such operations; union and intersection also fall under this definition. Note that though every word-based operation is monotone, it is not necessarily continuous: for example, the quotient of two languages is not continuous.

The partial order of $n$-tuples of languages by componentwise inclusion is also commonly applied to the set of solutions of a given system of equations. This allows talking about minimal and maximal solutions of an equation (that is, those for which there is no smaller and no greater solution, respectively), as well as least and greatest solutions (those that are less than all solutions and greater than all solutions, respectively). Not every equation has a least or a greatest solution: for instance, the equation $XY = \{a\}$ has two incomparable solutions, $X = \{\varepsilon\}$, $Y = \{a\}$ and vice versa.

Lemma 2.4. Let $\varphi_i = \psi_i$ be a system of equations, in which $\varphi_i$ and $\psi_i$ are continuous functions. Then every solution $L$ of the system is contained in a maximal solution and contains a minimal solution.

Such a maximal solution can be obtained as a componentwise union of an ascending sequence of solutions, as follows (a minimal solution is obtained in the same way as a componentwise intersection). Let $L^{(0)} = L$. If a solution $L^{(k)}$ is not yet maximal, then find a shortest word $w$ which belongs to some component of some solution


$L^{(k+1)}$ greater than $L^{(k)}$, while not belonging to the corresponding component of $L^{(k)}$. Proceeding in this way either eventually leads to a maximal solution, or produces an infinite ascending sequence of solutions $L^{(k)}$, which converges to their componentwise union. Then this limit is also a solution, since continuity of the operations implies
$$\varphi_i(\lim L^{(k)}) = \lim \varphi_i(L^{(k)}) = \lim \psi_i(L^{(k)}) = \psi_i(\lim L^{(k)}).$$
Because the length of the words $w$ considered during the construction of the sequence $L^{(k)}$ increases, if a word belongs to some component of a solution greater than $\lim L^{(k)}$, then it must have been added to this component of some $L^{(k)}$ before longer words $w$ were considered. Therefore, the solution $\lim L^{(k)}$ is maximal.
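The characterisation of Theorem 2.3 can be exercised directly on very small instances. The following Python sketch (our illustration; the toy equation, bound and helper names are all ours) enumerates all candidate solutions of the single resolved equation $X = \{a\}X\{b\} \cup \{\varepsilon\}$ modulo $A^{\leqslant 3}$ by brute force, confirming that exactly one of them exists.

```python
# Illustrative brute-force use of the solution-modulo idea of Theorem 2.3
# on the toy equation X = {a}X{b} u {eps} over A = {a, b}.
from itertools import chain, combinations, product

A = ("a", "b")

def words_up_to(ell):
    ws = [""]
    for n in range(1, ell + 1):
        ws += ["".join(p) for p in product(A, repeat=n)]
    return ws

def cat(K, L, ell):
    """Concatenation, truncated to words of length at most ell."""
    return {u + v for u in K for v in L if len(u) + len(v) <= ell}

def is_solution_modulo(X, ell):
    rhs = cat({"a"}, cat(X, {"b"}, ell), ell) | {""}
    return {w for w in X if len(w) <= ell} == {w for w in rhs if len(w) <= ell}

ell = 3
universe = words_up_to(ell)
solutions = [set(s) for s in chain.from_iterable(
                 combinations(universe, r) for r in range(len(universe) + 1))
             if is_solution_modulo(set(s), ell)]
# Exactly one solution modulo A^<=3, namely {eps, ab}, consistent with the
# unique solution {a^n b^n | n >= 0} of this strict equation.
assert solutions == [{"", "ab"}]
```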

3. Equations with one-sided concatenation

In language equations of the simplest kind, the concatenation is one-sided, that is, it may be used only in expressions $C \cdot \theta$, where $C$ is a constant language and $\theta$ is a subexpression (or, alternatively, only in expressions $\theta \cdot C$, but these two types of expressions may not be mixed in a single system). These equations are closely connected to finite automata.

3.1. Representation of finite automata. The most basic connection between language equations and computation is the representation of finite automata as systems of equations $X_i = \varphi_i(X_1, \dots, X_n)$ with union and one-sided concatenation. This representation is briefly described in this section; more details can be found in Chapter 1, § 5.3. Each state $q_i$ of an NFA $\mathcal{A} = (Q, q_1, \delta, F)$ with $Q = \{q_1, \dots, q_n\}$ is represented by a variable $X_i$, with its equation transcribing the outgoing transitions:
$$X_i = \bigcup_{(q_i, a, q_j) \in \delta} \{a\} \cdot X_j \;\cup\; \underbrace{\{\varepsilon\}}_{\text{if } q_i \in F}.$$

This construction is illustrated in the following example.

Example 3.1. The following system of language equations transcribes a two-state finite automaton recognising the language
$$L = \{w \in \{a, b\}^* \mid \text{the number of } b\text{'s in } w \text{ is odd}\},$$
that is, the system
$$X_1 = aX_1 \cup bX_2, \qquad X_2 = aX_2 \cup bX_1 \cup \{\varepsilon\}$$
has a unique solution with $X_1 = L$.

[Figure: the two-state automaton, with an $a$-loop on each of the states $q_1$ and $q_2$, and $b$-transitions between them in both directions.]
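The transcription of § 3.1 is mechanical, and the following short Python sketch (an illustration of ours; all identifiers are ours, and the automaton data is that of Example 3.1) prints the system of equations obtained from a transition table.

```python
# Illustrative sketch: transcribing an NFA (Q, q1, delta, F) into the
# resolved system X_i = U {a}X_j  [ u {eps} if q_i in F ]  of Section 3.1.

def nfa_to_equations(states, delta, accepting):
    """delta: set of (q, a, q') transitions; returns one equation per state."""
    equations = {}
    for q in states:
        terms = [f"{{{a}}}X_{q2}" for (q1, a, q2) in sorted(delta) if q1 == q]
        if q in accepting:
            terms.append("{eps}")
        equations[q] = f"X_{q} = " + (" u ".join(terms) if terms else "{}")
    return equations

# The automaton of Example 3.1: initial state q1, accepting state q2,
# recognising the words over {a, b} with an odd number of b's.
states = ["q1", "q2"]
delta = {("q1", "a", "q1"), ("q1", "b", "q2"),
         ("q2", "a", "q2"), ("q2", "b", "q1")}
accepting = {"q2"}

eqs = nfa_to_equations(states, delta, accepting)
for q in states:
    print(eqs[q])
# X_q1 = {a}X_q1 u {b}X_q2
# X_q2 = {a}X_q2 u {b}X_q1 u {eps}
```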


This representation was first proposed by Bodnarchuk [12] and further explored in the monograph by A. Salomaa [84]. Brzozowski and Leiss [13] extended these equations with all Boolean operations to define a generalisation of alternating finite automata.

3.2. Regularity and decidability. Consider equations of the general form
$$\varphi(X_1, \dots, X_n) = \psi(X_1, \dots, X_n),$$
with all Boolean operations, one-sided concatenation of the form $\theta(X_1, \dots, X_n) \cdot \{a\}$, and the constant $\{\varepsilon\}$ (other regular constants may be represented as in the previous subsection). These equations can be directly expressed in the MSO logic on infinite trees, which is described by Löding in Chapter 8 of this handbook. Nodes of these trees correspond to words, and each arc represents appending a letter to the end of the current word. Every expression $\theta(X_1, \dots, X_n)$ occurring in an equation is transcribed as an MSO formula $f_\theta(x)$, with a first-order variable $x$ and with second-order variables $X_1, \dots, X_n$, so that $f_\theta(x)$ is true if and only if the word $x$ is in $\theta(X_1, \dots, X_n)$. This is done inductively, with
$$f_{X_i}(x) = (x \in X_i), \qquad f_{\theta a}(x) = (\exists y)(x = ya \wedge f_\theta(y)), \qquad f_{\theta \cup \eta}(x) = f_\theta(x) \vee f_\eta(x),$$

and with the rest of the Boolean operations defined similarly. Finally, an equation ' D is represented as a formula .8x/f' .x/ $ f .x/. Then, by the general results on this logic, if a system of equations has any solutions, there must be a regular solution among them. All natural properties, such as uniqueness of a solution, can be directly expressed in the logic, and then decided by applying its decision procedure. However, the above approach is too general to yield any efficient algorithms. 3.3. Complexity of decision problems. Several efficient algorithms for testing properties of language equations with one-sided concatenation, as well as for computing their solutions, have been designed. In the more general case of set constraints, an exponential-time algorithm for testing solution existence was given by Aiken et al. [1]. This problem was proved to be EXPTIME-complete for language equations by Baader and Küsters [3], while Baader and Narendran [4] established EXPTIME-completeness of the existence of a finite solution. The idea of these arguments was generalised by Baader and Okhotin [6] to a general method of dealing with these equations. The method is based upon converting a system of language equations to a certain structure representing all its solutions. This structure is a nondeterministic tree automaton, operating on an unlabelled infinite tree without any acceptance conditions. Furthermore, the state in each successor of a node is determined independently of the choice of the states in its siblings. Such a tree automaton is known as an independent looping tree automaton (ILTA), and it is induced by an NFA .Q; I; ı; F / with an input alphabet A as follows. A run of the ILTA is any function rW A ! Q with r."/ 2 I and .r.w/; a; r.wa// 2 ı for all w 2 A and a 2 A. Every run defines a language


$L(r) = \{w \in A^* \mid r(w) \in F\}$, and the ILTA defines the set of languages corresponding to all valid runs. Using multiple sets of accepting states $F_1, \dots, F_n$ allows defining a set of $n$-tuples of languages.

Theorem 3.1 (Baader and Okhotin [6]). For every system of language equations using one-sided concatenation and Boolean operations, one can construct, in exponential time, an ILTA representing the set of solutions of this system.

Using this representation, the basic questions about the set of solutions can be answered by simple algorithms analysing the graph structure of the ILTA. This, in particular, yields exponential-time algorithms for testing whether an equation has (a) any solutions, (b) a unique solution, (c) finitely many solutions, (d) countably many solutions, (e) least or greatest solutions. All these problems are EXPTIME-complete [6]. Finite automata for unique, least and greatest solutions can be constructed as well. These results have later been extended to mixed systems of equations $\varphi = \psi$ and inequations $\varphi \neq \psi$, see [5].

3.4. Inequalities with union and one-sided concatenation. Consider systems of inequalities with one-sided concatenation, where the only allowed Boolean operation is union, that is, where inequalities are of the form
$$K_0 \cup X_1 K_1 \cup \dots \cup X_n K_n \subseteq L_0 \cup X_1 L_1 \cup \dots \cup X_n L_n.$$

Such a system always possesses a greatest solution, and this solution is regular, as long as all constant languages L0 , L1 , . . . , Ln in the right-hand sides are regular, and with no restrictions on the constants K0 , K1 , . . . , Kn (Kunc [45]). Moreover, for any given right-hand sides, there are only finitely many candidate languages, which may occur as components of greatest solutions of such systems for arbitrary constants K1 , . . . , Kn . These languages are regular and can be algorithmically calculated. Once the list of candidate languages is compiled, there are finitely many possible greatest solutions to be checked. In order to check each of them, one should have an effective procedure for testing the containment of the languages on the left-hand sides in regular languages. Hence, for instance, the greatest solution can be effectively computed for context-free constants on the left-hand sides.

4. Resolved systems of equations

This section is about systems of equations of the form $X_i = \varphi_i(X_1, \dots, X_n)$ with $i \in \{1, \dots, n\}$, resolved with respect to their variables. These systems are known in the literature as resolved or explicit, and their most important quality is that they represent an inductive definition of languages, in the sense that the properties of longer words are expressed as Boolean combinations of concatenations of shorter words. If the only allowed Boolean operation is union, these are the systems defining the semantics of the context-free grammars, and some other sets of operations yield their natural variants and generalisations.


Lemma 4.1. If $\varphi_1, \dots, \varphi_n$ are monotone and continuous, then the least solution of a system $X_i = \varphi_i(X_1, \dots, X_n)$ with $i \in \{1, \dots, n\}$ is the $n$-tuple $L = \lim \varphi^k(\bot)$, where $\varphi = (\varphi_1, \dots, \varphi_n)$ is the right-hand side regarded as an operator on $(2^{A^*})^n$, and $\bot = (\varnothing, \dots, \varnothing)$. The greatest solution is similarly obtained as $\lim \varphi^k(\top)$, where $\top = (A^*, \dots, A^*)$.

Sketch of a proof. This well-known argument can be regarded as folklore. First, the sequence $\{\varphi^k(\bot)\}_{k=0}^{\infty}$ monotonically increases. Then $L$ is a solution of the system, because
$$\varphi(L) = \varphi(\lim \varphi^k(\bot)) = \lim \varphi(\varphi^k(\bot)) = \lim \varphi^{k+1}(\bot) = L,$$
which essentially uses the continuity of $\varphi$. In order to show that this solution is the least one, consider any other $n$-tuple $K$ with $\varphi(K) = K$. Then each element of the sequence $\{\varphi^k(K)\}_{k=0}^{\infty}$ is a superset of the corresponding element of $\{\varphi^k(\bot)\}_{k=0}^{\infty}$, and this inequality is extended to the least upper bounds of the sequences.

Another important folklore result is the following sufficient condition of solution uniqueness. Let a system $X_i = \varphi_i(X_1, \dots, X_n)$ be called strict if, for every two $n$-tuples of languages $K, L$ that are equal modulo $A^{\leqslant \ell}$, the languages $\varphi_i(K)$ and $\varphi_i(L)$ are equal modulo $A^{\leqslant \ell + 1}$.

which essentially uses the continuity of ' . In order to show that this solution is the least one, consider any other n-tuple K with '.K/ D K . Then each element of the sequence ¹' k .K/º1 is a superset of the corresponding element of ¹' k .?/º1 , and kD0 kD0 this inequality is extended to the least upper bounds of the sequences. Another important folklore result is the following sufficient condition of solution uniqueness. Let a system Xi D 'i .X1 ; : : : ; Xn / be called strict, if for every two n-tuples of languages K; L that are equal modulo A6` , the languages 'i .K/ and 'i .L/ are equal modulo A6`C1 . Lemma 4.2. Every strict system has at most one solution. In other words, ' , as a function on the metric space of n-tuples of languages defined in § 2, is a contraction, and the fact that every contraction has at most one fixed point is a known result of basic analysis. The most typical case of strict systems are those with each 'i being a Boolean combination of concatenations, where every concatenation either contains a constant language without ", or is a single constant language. The proof of Lemma 4.2 for such systems was provided by Autebert et al. [2]. 4.1. Union and concatenation: context-free languages. Resolved systems of the most well-known kind are limited to the operations of union and concatenation. These equations constitute one of the two definitions of the semantics of the context-free grammars. A context-free grammar is a quadruple G D .A; N; R; S /, in which A is a finite alphabet of terminal symbols, N is a finite set of variables, also known as nonterminal symbols, disjoint with A; R is a finite set of rules, each of the form X !  with X 2 N and  2 .A [ N / ; S 2 N is a variable designated as the start symbol. According to Ginsburg and Rice [27], such a grammar is interpreted as a system of language equations with the following equation for each variable X 2 N : [ XD ; X!2R

where each terminal symbol a 2 A in each concatenation  represents a constant language ¹aº, while  D " is transcribed as a constant ¹"º. By Lemma 4.1, this system always has a least solution, and the value of each variable X in this least solution is


taken as the language generated by $X$, denoted by $L_G(X)$. The language generated by the grammar is $L(G) = L_G(S)$.

Another definition of the semantics of the context-free grammars was given by Chomsky [17], who considered a rewriting system operating on strings over the alphabet $A \cup N$. Each rule $X \to \theta$ can be used to rewrite a symbol $X$ by a substring $\theta$. Formally, define a binary relation $\Longrightarrow$ of one-step derivability on $(A \cup N)^*$ by $\alpha X \beta \Longrightarrow \alpha \theta \beta$ for all $\alpha, \beta \in (A \cup N)^*$ and $X \to \theta \in R$. Its reflexive and transitive closure $\Longrightarrow^*$ corresponds to derivability in zero or more steps, and through this relation, the language generated by each $\alpha \in (A \cup N)^*$ is defined as $L_G(\alpha) = \{w \in A^* \mid \alpha \Longrightarrow^* w\}$. The language generated by the grammar is $L(G) = L_G(S)$. These two definitions are known to be equivalent.

Consider the following basic examples of context-free grammars, each presented in the form of language equations.

Example 4.1. The equation $S = aSb \cup \{\varepsilon\}$ has the unique solution $L = \{a^n b^n \mid n \geqslant 0\}$. It is the limit of the sequence $\varnothing$, $\{\varepsilon\}$, $\{\varepsilon, ab\}$, $\{\varepsilon, ab, aabb\}$, etc. (as in Lemma 4.1).

Example 4.2. The least solution of the equation $S = SS \cup aSb \cup \{\varepsilon\}$ is the set of all strings of balanced parentheses, known as the Dyck language. Its greatest solution is $S = A^*$.
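The ascending iteration of Lemma 4.1 can be carried out mechanically on length-truncated languages. The following Python sketch (ours; the truncation bound and all names are assumptions made for the illustration) computes the approximations for the equation of Example 4.1.

```python
# Illustrative sketch: the ascending fixed-point iteration of Lemma 4.1 for
# the single equation S = {a}S{b} u {eps} (Example 4.1), with all languages
# truncated to words of length at most MAX_LEN.
MAX_LEN = 8

def cat(K, L):
    return {u + v for u in K for v in L if len(u) + len(v) <= MAX_LEN}

def phi(S):
    """Right-hand side of the equation, applied to an approximation of S."""
    return cat({"a"}, cat(S, {"b"})) | {""}

approx = set()                      # bottom element: the empty language
while True:
    nxt = phi(approx)
    if nxt == approx:               # reached the least fixed point (truncated)
        break
    approx = nxt

# The least (here: unique) solution is {a^n b^n | n >= 0}.
assert approx == {"a" * n + "b" * n for n in range(MAX_LEN // 2 + 1)}
```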

Because of their practical importance, context-free grammars became one of the most widely studied formalisms in the theory of computation. For an account of their properties, the reader is directed to a survey by Autebert et al. [2], and also to a survey of formal grammars by Okhotin [72]. As far as language equations are concerned, it is worth mentioning that every context-free grammar that does not generate " can be transformed to an equivalent grammar in Chomsky normal form, with all rules of the form X ! Y Z with Y; Z 2 N , or X ! a with a 2 A. The system of language equations corresponding to such a grammar always has a unique solution in "-free languages. Another important form is Greibach normal form with rules X ! a with  2 .A [ N / or X ! "; the corresponding system of language equations is strict, and hence has a unique solution. Context-free languages over a one-letter alphabet are known to be regular [27]. Some properties of the context-free languages are already undecidable, and they are important as the germs of the undecidability found in more general language equations. In particular, it is undecidable whether a given context-free grammar generates A , and this directly implies that testing whether a given resolved system with union and concatenation has a unique solution is undecidable; in contrast, testing the existence of a solution is trivially decidable (every system has one). A general undecidability technique for context-free languages was introduced by Hartmanis [29]. A generalisation of context-free grammars to their probabilistic, stochastic or fuzzy variants is important in some applications, where the most probable parse of a word is being sought. These grammars can be defined by equations over a generalisation of languages to mappings from A to a semiring, known as formal power series in noncommuting variables. Ordinary formal languages are then regarded as mappings from


A to the Boolean semiring. For an introduction to the theory of equations over power series, the reader is referred to Petre and Salomaa [80].

4.2. Union, intersection and concatenation: conjunctive languages. The next kind of language equation is a direct generalisation of the previous one, featuring an extra operation of intersection. Systems of such equations always have a least and a greatest solution due to Lemma 4.1. These systems can be used to define formal languages similarly to the context-free grammars, and hence are naturally regarded as another, more general family of formal grammars.

Conjunctive grammars were introduced by Okhotin [62] as a generalisation of the context-free grammars with an explicit conjunction operation in the rules. A conjunctive grammar is again a quadruple $G = (A, N, R, S)$, in which $A$, $N$ and $S$ are as in the context-free case, while the rules in $R$ are of the form
$$X \to \theta_1 \,\&\, \cdots \,\&\, \theta_m \qquad (\text{with } X \in N,\ m \geqslant 1 \text{ and } \theta_1, \dots, \theta_m \in (A \cup N)^*),$$
and such a rule informally means that every string generated by each of the conjuncts $\theta_i$ is therefore generated by $X$. If $\theta_1, \dots, \theta_m \in A^* N A^* \cup A^*$ in every rule, such a grammar is called linear conjunctive. The system of language equations associated to a grammar has equations of the form
$$X = \bigcup_{X \to \theta_1 \& \cdots \& \theta_m \,\in\, R} \;\bigcap_{i=1}^{m} \theta_i \qquad (\text{for all } X \in N).$$
Its least solution defines the languages $L_G(X)$ for $X \in N$, and $L(G) = L_G(S)$.

An equivalent definition of the semantics of conjunctive grammars is given by term rewriting, which generalises the string rewriting used for context-free grammars. Given a conjunctive grammar $G$, consider terms with concatenation and conjunction as operation symbols and with symbols from $A \cup N$ as atomic terms. A term $\psi$ is derivable in one step from a term $\varphi$, written $\varphi \Longrightarrow \psi$, if $\psi$ is obtained from $\varphi$
- either by rewriting a subterm $X \in N$ with $(\theta_1 \& \cdots \& \theta_m)$, using a rule $X \to \theta_1 \& \cdots \& \theta_m$ from $R$,
- or by rewriting, for $w \in A^*$, a subterm $(w \& \cdots \& w)$ with a single string $w$.

The language generated by a term $\varphi$ is $L_G(\varphi) = \{w \in A^* \mid \varphi \Longrightarrow^* w\}$. The language generated by the grammar is $L(G) = L_G(S) = \{w \in A^* \mid S \Longrightarrow^* w\}$. The two definitions of the semantics are again equivalent [63].

Since intersection is explicit in these equations, any finite intersection of context-free languages, such as $\{a^n b^n c^n \mid n \geqslant 0\}$, can be represented by a conjunctive grammar. However, conjunctive grammars can represent many languages outside of this intersection closure, such as the language $\{wcw \mid w \in \{a, b\}^*\}$.


Example 4.3 (Okhotin [62]). The system of language equations
$$\begin{aligned}
S &= U \cap T,\\
U &= aUa \cup aUb \cup bUa \cup bUb \cup c,\\
T &= (aX \cap aT) \cup (bY \cap bT) \cup cW,\\
X &= aXa \cup aXb \cup bXa \cup bXb \cup cWa,\\
Y &= aYa \cup aYb \cup bYa \cup bYb \cup cWb,\\
W &= aW \cup bW \cup \varepsilon
\end{aligned}$$

has a unique solution with $S = \{wcw \mid w \in \{a, b\}^*\}$ and $T = \{uczu \mid u, z \in \{a, b\}^*\}$. The equation for $T$ matches a single symbol in the left part of a word $ucv$ to the corresponding symbol in its right part using $X$ or $Y$, and the recursive reference to $aT$ or $bT$ makes the remaining symbols be compared in the same way. The intersection with the language $U = \{ucv \mid u, v \in \{a, b\}^*,\ |u| = |v|\}$ completes the definition of the language.

Three normal forms for conjunctive grammars are known. One of them is a direct generalisation of the Chomsky normal form [62], with the rules of the form $X \to Y_1 Z_1 \& \cdots \& Y_m Z_m$ with $m \geqslant 1$ and $Y_i, Z_i \in N$, or $X \to a$ with $a \in A$. Another normal form [75] has all rules of the form $X \to Y_1 a_1 Z_1 \& \cdots \& Y_m a_m Z_m$ with $m \geqslant 1$, $Y_i, Z_i \in N$ and $a_i \in A$, or $X \to a$ with $a \in A$, or $S \to aX$ or $S \to Xa$ with $a \in A$ and $X \in N$; the system of equations corresponding to such a grammar is strict. One more normal form theorem [75] converts an arbitrary conjunctive grammar to one with the corresponding system of language equations of the form $X_i = F_i \cup \bigcap_{j=1}^{m_i} \theta_{i,j}$, where $F_i \subseteq A^*$ is a finite set, $m_i \geqslant 1$ and $\theta_{i,j} \in (A \cup N)^*$, that is, with union restricted to union with a singleton constant.

The known bounds on the time complexity of membership queries for conjunctive grammars are the same as in the context-free case: there is a straightforward algorithm working in time $O(n^3)$ [64], which can be accelerated to $O(n^\omega)$ with $\omega < 3$ by offloading some computations to a matrix multiplication procedure [73]. These algorithms actually apply to the larger family of Boolean grammars [64], which allow an explicit negation in the rules. The most general definition of Boolean grammars, given by Kountouriotis et al. [43], involves a generalisation of languages to mappings $L\colon A^* \to \{0, \tfrac{1}{2}, 1\}$, where $\tfrac{1}{2}$ represents uncertainty, and all operations are redefined according to the three-valued logic. A further generalisation to a semiring-valued formalism was devised by Ésik and Kuich [25]. The properties of Boolean grammars are beyond the scope of this chapter, and the interested reader is directed to a recent survey [72].

Conjunctive grammars over a unary alphabet form a special area of study. Such grammars are completely irrelevant to parsing applications, but on the other hand they are important as a nontrivial special case of language equations, and they form the basis for the study of more general language equations over a unary alphabet described in § 6.4. The following grammar for $\{a^{4^n} \mid n \geqslant 0\}$ was the first evidence of their non-triviality.


Example 4.4 (Jeż [32]). The least solution of the system
$$\begin{aligned}
X_1 &= (X_1 X_3 \cap X_2 X_2) \cup \{a\},\\
X_2 &= (X_1 X_1 \cap X_2 X_6) \cup \{aa\},\\
X_3 &= (X_1 X_2 \cap X_6 X_6) \cup \{aaa\},\\
X_6 &= (X_1 X_2 \cap X_3 X_3)
\end{aligned}$$
is $X_i = \{a^{i \cdot 4^n} \mid n \geqslant 0\}$ for $i \in \{1, 2, 3, 6\}$.
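Since all languages here are unary, the least solution can be approximated numerically by representing each language by the set of its word lengths, so that concatenation becomes addition of integers. The following Python sketch (ours; the bound $N$ and all names are assumptions of the illustration) iterates the right-hand sides to a fixed point and checks the claim of Example 4.4 up to that bound.

```python
# Illustrative numerical check of Example 4.4: unary languages as sets of
# word lengths, truncated to lengths at most N.
N = 5000

def cat(K, L):
    return {k + l for k in K for l in L if k + l <= N}

def step(X1, X2, X3, X6):
    return (cat(X1, X3) & cat(X2, X2) | {1},
            cat(X1, X1) & cat(X2, X6) | {2},
            cat(X1, X2) & cat(X6, X6) | {3},
            cat(X1, X2) & cat(X3, X3))

sol = (set(), set(), set(), set())
while True:
    nxt = step(*sol)
    if nxt == sol:
        break
    sol = nxt

expected = [{i * 4 ** n for n in range(20) if i * 4 ** n <= N}
            for i in (1, 2, 3, 6)]
assert list(sol) == expected   # X_i = { a^(i*4^n) | n >= 0 }, up to length N
```

Truncating at $N$ is harmless here: membership of a length depends only on strictly smaller lengths, so the iteration computes exactly the least solution restricted to lengths at most $N$.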

This example is best explained in terms of the base-4 representation of the lengths of the words. Then each variable $X_i$ represents all base-4 numbers of the form $i\,0 \cdots 0$. Substituting these four languages into the first equation, the first concatenation $X_1 X_3$ produces all numbers with the base-4 notation $1 0^* 3 0^*$, $3 0^* 1 0^*$ and $1 0^+$. The second concatenation $X_2 X_2$ yields $2 0^* 2 0^*$ and $1 0^+$. The intersection of these two concatenations is the set of words whose lengths have base-4 notation $1 0^+$; in other words, both concatenations contain some garbage, yet the garbage in the two concatenations is disjoint, and is accordingly filtered out by the intersection. Finally, the union with $\{a\}$ yields the language $\{a^{4^n} \mid n \geqslant 0\}$, and thus the first equation turns into an equality. The rest of the equations are verified similarly.

The idea of manipulating the positional notation of numbers, as in Example 4.4, was extended to the following general result.

Theorem 4.3 (Jeż and Okhotin [33]). Let $A_k = \{0, 1, \dots, k-1\}$ with $k \geqslant 2$ be an alphabet of $k$-ary digits, and let $L \subseteq A_k^*$ be a language generated by a linear conjunctive grammar, which contains no words beginning with $0$. Then there exists a conjunctive grammar generating the language $\{a^n \mid \text{the } k\text{-ary representation of } n \text{ is in } L\}$, and it can be effectively constructed.

A similar technique was used to construct an EXPTIME-complete set of numbers with its unary representation generated by a conjunctive grammar [34]. Theorem 4.3 is used to establish undecidability results for conjunctive grammars over $\{a\}$. Consider the language of valid accepting computations of a Turing machine $T$, $\mathrm{VALC}(T)$, described in more detail in § 6.2. It is known to be linear conjunctive. Let it be defined over some $k$-letter alphabet. Then the symbols of this alphabet can be re-interpreted as digits in $A_k$, which leads to a conjunctive grammar for the language of unary representations of the numbers whose $k$-ary representations are in $\mathrm{VALC}(T)$. Since $\mathrm{VALC}(T)$ is empty if and only if $L(T)$ is empty, the emptiness problem for conjunctive grammars is undecidable. The following stronger result holds.

Theorem 4.4 (Jeż and Okhotin [33]). For every fixed conjunctive language $L_0 \subseteq a^*$, the problem of whether a given conjunctive grammar over $\{a\}$ generates $L_0$ is $\Pi^0_1$-complete.

Consider conjunctive grammars over a unary alphabet with a single variable. The first example of the nontriviality of such grammars was given by Okhotin and Rondogiannis [76], by encoding the four languages $X_i$ with $i \in \{1, 2, 3, 6\}$ of Example 4.4 within a single language $\bigcup_{i \in \{1,2,3,6\}} \{a^{np + d_i} \mid a^n \in X_i\}$, for some $p, d_1, d_2, d_3, d_6 \geqslant 1$. This example was extended by Jeż and Okhotin [35] to a general technique of encoding any unary conjunctive language in a solution of a univariate language equation $X = \varphi(X)$, leading to the following undecidability results: testing whether a given


conjunctive grammar over a unary alphabet and with a single variable generates a finite language is $\Sigma^0_1$-complete, testing co-finiteness is $\Sigma^0_1$-complete as well, and testing equivalence of two given grammars is $\Pi^0_1$-complete.

4.3. Union, intersection, and linear concatenation: trellis automata. Resolved systems of language equations with union, intersection and linear concatenation correspond to linear conjunctive grammars (Okhotin [65]). For instance, the grammars in the above Examples 4.1 and 4.3 are linear. The grammar for the Dyck language given in Example 4.2 is not linear, but a linear conjunctive grammar for this language can be obtained using the method of Dyer [23], as in Example 2 of [65].

[Figure: the triangular array of nodes of a trellis automaton processing an input word $a_1 a_2 a_3 a_4$.]

An important property of this family of language equations is their computational equivalence to the simplest type of cellular automata, known as one-way real-time cellular automata, or trellis automata, studied by Dyer [23], Čulík et al. [20] and [21], Ibarra and Kim [30], and others. A trellis automaton (see [20] and [65]), defined as a quadruple $(Q, I, \delta, F)$, processes an input word $a_1 \cdots a_m$ of length $m \geqslant 1$ using a uniform array of $\frac{m(m+1)}{2}$ nodes, as presented in the figure. Each node computes a value from the finite set of states $Q$. The nodes in the bottom row obtain their values directly from the input symbols using a function $I\colon A \to Q$. The value of every other node is computed from the values of its predecessors using the function $\delta\colon Q \times Q \to Q$. The word is accepted if and only if the value computed by the topmost node belongs to the set of accepting states $F \subseteq Q$.
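The bottom-up computation on the triangular array is easy to simulate row by row. The following Python sketch is our illustration; the toy automaton at the end (tracking first and last letters) is our own choice and not an example from the text.

```python
# Illustrative sketch: simulating a trellis automaton (Q, I, delta, F)
# on its triangular array of nodes.
def trellis_accepts(word, init, delta, accepting):
    """Bottom-up simulation; row[i] holds the state of the node for the
    substring of the current length starting at position i."""
    if not word:
        return False
    row = [init(c) for c in word]            # bottom row, one node per letter
    while len(row) > 1:                      # each pass shortens the row by one
        row = [delta(row[i], row[i + 1]) for i in range(len(row) - 1)]
    return row[0] in accepting

# Toy instance (ours): states are pairs (first letter, last letter); the
# automaton accepts the non-empty words whose first and last letters coincide.
init = lambda c: (c, c)
delta = lambda left, right: (left[0], right[1])
accepting = {("a", "a"), ("b", "b")}

assert trellis_accepts("abba", init, delta, accepting)
assert not trellis_accepts("ab", init, delta, accepting)
```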

A general method for proving non-representability of languages by trellis automata was discovered by Terrier [85], and it is based on counting the number of cases that the trellis automaton needs to distinguish in the last few levels of its computation. Let L  A be a language, let k > 1, and consider the last k levels of a possible computation of a trellis automaton on a word w D a1    am with m > k . In these k levels, an automaton would have to decide on the membership of k.kC1/ subwords of w in L, as 2 represented in the following set: SL;k;w D ¹.i; j / j i; j > 0; i C j < k; ai C1    am

j

2 Lº:


Hence, for each $k$, a trellis automaton recognising $L$ must distinguish between the different sets $S_{L,k,w}$ that occur for different words $w \in A^*$, that is, between as many as
$$f_L(k) = |\{S_{L,k,w} \mid w \in A^{\geqslant k}\}|$$

cases, and must do so on the basis of k states in the k -th last line. Theorem 4.6 (Terrier [85]). If L is linear conjunctive, then fL .k/ 6 p k for some p > 2. In particular, Terrier [85] used this result to show that the square of a certain linear context-free language is not recognised by any trellis automaton, and hence this language family is not closed under concatenation. Another interesting result concerning this family is the existence of a trellis automaton for ¹am b mCn an j m; n > 1º, demonstrated by Čulík [19] by simulating a cellular automaton solving the well-known firing squad synchronisation problem. The existence of P-complete languages recognised by trellis automata was discovered by Ibarra and Kim [30]. All the above results apply to solutions of language equations via Theorem 4.5. 4.4. Complementation and concatenation. Turning to systems of equations of the form Xi D 'i .X1 ; : : : ; Xn / with the operations of concatenation and complementation, the first thing to note is that they are not subject to Lemma 4.1, as complementation is not monotone. Such systems need not have a solution, as demonstrated by an equation X D Xx. However, this possibility of expressing a contradiction works exclusively modulo the empty word. Once a system has a solution modulo ¹"º, it can be extended to a solution: Lemma 4.7 (Okhotin and Yakimova [77]). If a system Xi D 'i .X1 ; : : : ; Xn / (with 1 6 i 6 n) with concatenation and complementation and with any constant languages has a solution .L1 ; : : : ; Ln / modulo A6` for some ` > 0, then the system has a solution y 1; : : : ; L y n /, which is equal to .L1 ; : : : ; Ln / modulo A6` . .L The following is the first example of nontriviality in these equations (and actually the first known language equation over a unary alphabet with a non-regular unique solution): Example 4.5 (Leiss [53]). The strict language equation X D a  Xx 2 solution ¹an j 9i > 0W 23i 6 n < 23i C2º.

2

2

has the unique

Example 4.6 (Okhotin and Yakimova [78]). The strict equation X D aXb over an alphabet A D ¹a; bº has a non-regular unique solution ¹an wb n j w D " or w 2 bA º. There exists a method of proving non-representability of languages by such equations, based upon the notion of a prime language (a language L is called prime if L D MN implies M D ¹"º or N D ¹"º).



Lemma 4.8 (Okhotin and Yakimova [78]). Let $L \subseteq A^*$ and its complement $\overline{L}$ be prime languages. Then, if a system of language equations $X_i = \varphi_i(X_1, \dots, X_n)$ with concatenation and complementation has a unique solution with $L$ or $\overline{L}$ among its components, one of these languages must be among the constant languages used in the system.

For example, the regular language L D aA b [ bA a [ "  ¹a; bº and its x D aA a [ bA b [ ¹a; bº are both prime, and thus the lemma asserts that complement L neither of them is representable by these equations with finite constants. Similarly, the non-regular language L0 D .aA b [ bA a [ "/ n ¹an b n j n > 1º and its complement L0 are prime, and therefore non-representable by equations with concatenation and complementation and regular constants. Turning to the basic decision problems for this class of language equations, solution existence can be checked modulo ¹"º according to Lemma 4.7, and it is NP-complete for large classes of constants [77]. For the solution uniqueness, it is not known whether the problem is decidable; it is only known to be co-r.e. and PSPACE-hard. In the special case of a one-letter alphabet, testing solution uniqueness is complete for the complexity class US (which stands for unique satisfiability) studied by Blass and Gurevich [11]. 4.5. Concatenation and various sets of Boolean operations. Consider resolved systems of language equations with concatenation and all Boolean operations. These equations are powerful enough to simulate equations of the general form ' D . Indeed, such an equation is equivalent to X D Xx \ .'  /, where X is a new variable and  denotes symmetric difference. A systematic study of the families of languages represented by resolved systems Xi D 'i .X1 ; : : : ; Xn / with concatenation, singleton constants and any possible fixed set of Boolean operations was done by Okhotin [74]. The study is based upon Post’s classification of all functions of Boolean logic. For language equations, there are exactly seven distinct classes of languages represented by unique solutions of equations over different bases of Boolean functions. First, there are three trivial subregular classes generated by the sets of operations ¿, ¹>º and ¹\; >º, where the logical truth > represents the language A . Next come the context-free languages with ¹[º, the conjunctive languages with ¹[; \º, and the class generated by complementation only. Finally, if all Boolean operations are used, then these equations define exactly the recursive languages by their unique solutions; furthermore, rather than all Boolean x /, operations, it is sufficient to use any of the three functions f1 .K; L; M / D K [.L\ M x / and f3 .K; L; M / D K  L  M . The properties of these f2 .K; L; M / D K \ .L [ M equations are considered in § 6. The above analysis essentially depends on having the concatenation operation, which contains some Boolean logic inside its definition. Indeed, the membership of a word w in K L is defined as a disjunction, over all partitions w D uv , of the conjunction of u 2 K and v 2 L. If the logic therein is modified, then language equations with the resulting quasi-concatenation and various sets of Boolean operations may have entirely different properties. Recently, Bakinova et al. [7] investigated language equations



with two operations: symmetric difference, and a variant of concatenation based on GF(2) logic, that is, using sum modulo two over all partitions of w . Since formal languages form a ring under symmetric difference and GF(2)-concatenation, in which all languages containing the empty word are invertible, the resulting language equations have quite special properties that deserve further investigation. A family of formal grammars using these operations was defined as well, and it looks like a promising variant of context-free grammars with the same complexity upper bounds [7].

5. Equations with constant sides This section is about systems of equations of the form 'i .X1 ; : : : ; Xn / D Ci , sometimes referred to as implicit equations. The most well-studied case are equations with only rational operations, which can be analysed within the syntactic monoid of Ci . This technique, along with its implications for computability of solutions and decidability of their properties, is described in the first subsection. The next subsection surveys some precise results on the complexity of solvability, while equations with other sets of operations are considered in § 5.3. 5.1. Regularity of maximal solutions. Consider equations 'i .X1 ; : : : ; Xn / D Ci , with regular constants Ci and with 'i using union, concatenation, Kleene star and regular constants. Since 'i is monotone, every maximal solution of such a system is also a maximal solution of the system of inequalities 'i .X1 ; : : : ; Xn /  Ci . The latter system shall be proved to have finitely many maximal solutions, which are regular and can be algorithmically computed, and among them one can find the solutions of the original system. Such a result was first formulated by Conway [18]. It is based on the fact that all maximal solutions of the system can be calculated within any monoid recognising all constant languages on the right-hand sides. This idea can also be used to prove regularity of maximal solutions when the left-hand sides employ more general operations, such as infinitary union and arbitrary constants. All assertions of the following theorem also hold if any inequalities are replaced by equations. S Theorem 5.1. Let j 2Ii Eij  Ci , i D 1; : : : ; m, be a system of language inequalities, where Ci are constant regular languages, Ii are ( possibly infinite) index sets and expressions Eij .X1 ; : : : ; Xn / are products of arbitrary constant languages and variables. Then the system has only finitely many maximal solutions and every maximal solution has all components regular. If all left-hand sides in the system are regular expressions, then all maximal solutions can be algorithmically computed. Proof. The idea of the proof is to consider some fixed congruence  of A of finite index, under which all languages Ci are closed, and to show that every solution can be extended to a solution whose every component is a union of -classes. If .K1 ; : : : ; Kn / is an arbitrary solution, the new solution .L1 ; : : : ; Ln / is obtained as the closure of the original solution under , that is, Lj D ¹ w 2 A j 9v 2 Kj W v  w º. To see that .L1 ; : : : ; Ln / is a solution of the system, consider any w 2 Eij .L1 ; : : : ; Ln /. Then



there exists v 2 Eij .K1 ; : : : ; Kn / satisfying v  w . Since .K1 ; : : : ; Kn / is a solution, the word v belongs to Ci . This shows that w belongs to Ci , because Ci is closed under . Now, once it is known that every solution is contained in some -closed solution, all statements of the theorem can be easily verified; for instance, in order to find all maximal solutions, it is sufficient to test finitely many candidates for being a solution. Note that the state of the minimal DFA of a language L reached by a word w 2 A can be identified with the greatest solution of the language inequality wX  L with constant right-hand side (see Chapter 1, § 4.1). Similarly, maximal solutions of the inequality X Y  L in two variables X and Y are exactly states of the so-called universal (non-deterministic) automaton of the language L, see § II.4 in [83]. Therefore, the existence of a finite universal automaton of every regular language can be viewed as a special case of the above theorem. 5.2. Complexity of decision problems. The method used to prove Theorem 5.1 was employed by Bala [8] to analyse the computational complexity of solvability of systems of equations or inequalities with constant right-hand sides. These constants are given by nondeterministic automata, and the left-hand sides use the operations of union, concatenation and star. In order to decide the existence of a non-empty solution of a system of inequalities, due to the monotonicity of operations, only singleton solutions .¹w1 º; : : : ; ¹wn º/ need to be considered. According to the proof of Theorem 5.1, such a solution can be considered as an n-tuple of classes of the congruence, to which the words wj belong. The algorithm [8] guesses an n-tuple of such congruence classes, represents these classes by relations on the states of NFAs defining the constants Ci , and checks them for being a solution using a polynomial-space algorithm for testing containment of NFAs. In order to decide solvability for equations, it is sufficient to look for maximal solutions. According to the proof of Theorem 5.1, every component of a maximal solution is a union of classes of the congruence. Therefore, the existence of a solution can be decided in exponential space by calculating such a congruence, nondeterministically choosing some of its classes for every component of the solution, and verifying the equality of the resulting regular languages. On the other hand, the EXPSPACE-complete problem of universality of rational expressions with intersection, where intersections are not nested, can be expressed as solvability of systems of equations, so the solvability problem is EXPSPACE-complete [8]. Additionally, using a known polynomial-space algorithm for the limitedness problem of desert automata due to Bala [8] and Kirsten [42], one can test whether a solution of a system can be replaced by a finite one, and so the problem of existence of finite solutions turns out to be EXPSPACE-complete as well [8]. For the particular equation X Y D C , where C is given by a DFA, PSPACE-completeness of testing whether it has any nontrivial solution has recently been proved by Martens et al. [58].



5.3. Generalisations to other operations. The above results on systems of the form 'i .X1 ; : : : ; Xn / D Ci with union and concatenation essentially depend on having union as the only Boolean operation. If, for instance, the symmetric difference  is allowed, then an arbitrary equation ' D can be expressed as '  D ¿, and hence such equations fall under the more general case described in the next section. The same happens if both union and intersection are allowed. Consider a variant of these equations, in which concatenation is replaced by a  different word operation ÞW A  A ! 2A . Then one can define another word operation  by the rule w 2 u  v () u 2 w Þ v , for all words u, v and w . For example, if Þ is concatenation, then  is the quotient operation. As proved by L. Kari [41], the greatest solution of an inequality X Þ L  C is X D Cx  L. The solution existence problem for equations of this special form, with regular or contextfree constants C and L and with different pairs of operations .Þ; /, were studied by L. Kari [41] and Domaratzki and K. Salomaa [22]; most of the operations are derived from the shuffle along trajectories [60]. In the case of shuffle along letter-bounded regular sets of trajectories, a result analogous to Theorem 5.1 was proved [22], which allowed dealing with equations of the form X Þ Y D C , that is, with the problem of existence of a decomposition of a given regular language. This decomposition problem was particularly studied for the ordinary shuffle operation, where only very little is known, see [15] and [10].
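For the particular case where $\Diamond$ is ordinary concatenation, the derived operation is the right quotient, and the quoted formula for the greatest solution of $X \Diamond L \subseteq C$ can be checked on small finite instances. The following Python sketch is our illustration; the toy languages, the universes $U$ and $V$, and all names are assumptions made for this example only.

```python
# Illustrative check of the greatest solution of X.L <= C (concatenation),
# namely X = complement(complement(C) / L), computed inside finite universes:
# candidate solutions live in U, complements are taken in a larger V that
# contains every product w + v.
from itertools import product

A = ("a", "b")
def words(max_len):
    return {""} | {"".join(p) for n in range(1, max_len + 1)
                   for p in product(A, repeat=n)}

U, V = words(3), words(5)
C = {w for w in V if len(w) <= 3 and w.endswith("a")}   # constant language C
L = {"a", "ba"}                                          # constant language L

greatest = {w for w in U if all(w + v in C for v in L)}  # {w | wL is in C}

not_C = V - C
quotient = {w for w in U if any(w + v in not_C for v in L)}  # complement(C)/L
assert U - quotient == greatest
```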

6. Equations of the general form Consider systems of equations of the general form 'i .X1 ; : : : ; Xn / D i .X1 ; : : : ; Xn /, with different allowed operations. It is assumed that all operations are continuous (the case of non-continuous operations is considered in § 7), and furthermore, computable in the sense defined in § 6.1. Typically, these will be concatenation and Boolean operations. In almost all cases, these equations are computationally universal. 



6.1. Upper bounds. Let 'W .2A /n ! 2A be a continuous operation on languages, that is, for every ` > 0 there exists m D m.`/ > 0, such that K D L .mod A6m / implies '.K/ D '.L/ .mod A6` /. Further assume that this definition is algorithmically effective, that is, that m.`/ can be effectively computed, and that '.L/ \ A6` can be computed from L \ A6m and `. Such an operation is called computable, in accordance with a more general definition given in the theory of computable analysis on metric spaces [86]. All standard continuous operations on languages, such as concatenation, Boolean operations, Kleene star, shuffle, non-erasing homomorphisms, etc., are computable. Consider a system of language equations with continuous and computable sides. Then the characterisations of solution existence and solution uniqueness given in Theorem 2.3 can be regarded as first-order formulae of the form .8m/R.m/ and .8m/.9`/R1 .m; `/, respectively, where the predicates R and R1 are recursive due to



the computability condition. This immediately implies that, as decision problems, solution existence and solution uniqueness are in the arithmetical hierarchy: in …01 and in …02 , respectively. Some further analysis leads to the following upper bounds on the sets representable as solutions: Theorem 6.1 (generalising Okhotin [70]). If a system of language equations with continuous and computable operations and with any recursive constants has a unique (least, greatest) solution, then the components of this solution are recursive (r.e., co-r.e., respectively). The cited paper establishes the theorem for equations with Boolean operations and concatenation, but the proof can be extended to computable ultrametric spaces. These bounds shall now be proved tight for all but the simplest language equations. 6.2. Computationally universal solutions. The first signs of computational universality in language equations were discovered by Parikh et al. [79], who used a concise ad hoc argument to prove that testing whether a language equation with Boolean operations and concatenation has any solutions is undecidable. Later systematic studies of language equations adopted a uniform method for encoding universal computation in formal languages, which is based upon the language of computation histories of a Turing machine T , introduced by Hartmanis [29]. This language is denoted by VALC.T/ D ¹w\CT .w/ j w 2 L.T/º;

where the string CT .w/ over a second alphabet € somehow lists all consecutive configurations in the computation of T on the input w , and the symbol \ … A [ € is used as a separator. For a suitable encoding CT , this language is an intersection of two contextfree languages [29] (actually, these can be LL(1) linear context-free languages [71]), and hence can be defined by a resolved system of language equations, of the form as in § 4.3, which has a unique solution with X1 D VALC.T/. These equations use the operations of union, intersection and linear concatenation, though intersection can be eliminated [68]. Adding an extra equation X1 D ¿, the system has a solution if and only if L.T/ D ¿. The same construction can be used to represent the language L.T/ as a component of the least solution of some system. Once equations defining X D VALC.T/ are constructed, it remains to “extract” L.T/ out of X D VALC.T/ using additional equations. Let Y be a new variable and consider the inequality VALC.T/  Y \€  ;

which can be formally rewritten as an equation X [ Y \€  D Y \€  . This inequality states that for every w 2 L.T/, the string w\CT .w/ should be in Y \€  , that is, w should be in Y . This makes L.T/ the least solution of this inequality. By a dual argument involving the complement of VALC.T/, the complement of an arbitrary r.e. set can be represented by a greatest solution of some system. These two constructions can then be combined to represent every recursive set by a unique solution.



Theorem 6.2 (Okhotin [70], [68], and [71]). Every recursive (r.e., co-r.e.) set is representable as a component of a unique solution (least, greatest, respectively) of a system of language equations with union, linear concatenation and singleton constants. The same result holds for equations with intersection or symmetric difference instead of the union. Using the same construction based on VALC.T/, the undecidability level of the main decision problems can be precisely determined. Already for systems 'i D i with union (or intersection), linear concatenation and singleton constants, (a) solution existence is co-r.e.-complete, (b) solution uniqueness is …02 -complete, (c) existence of a least/greatest solution is …02 -complete (see [70] and [68]), and (d) having finitely many solutions is †03 -complete (see [67]). All these problems maintain their complexity for systems of equations with all Boolean operations, unrestricted concatenation and recursive constants. 6.3. Equations XK D LX and related systems. The equations XL D LX , known as the commutation equations, were first considered by Conway [18]. Since then, the commutation equations, as well as the more general conjugacy equation XK D LX , have attracted attention as the simplest examples of language equations, which cannot be handled using the known methods. For any constant language L, the union of any solutions of these equations is a solution as well, and hence there is a greatest solution among them. Conway [18] asked whether this greatest solution is always regular, as long as the constants are regular; this question became known as Conway’s problem. It was anticipated that this greatest solution could be analysed using the methods of combinatorics on words, which was achieved for some very special classes of constants: by Karhumäki et al. [39] for regular codes, and by Massazza and Salmela [59] for languages whose lexicographically minimal word is sufficiently distinguished. In a related work, Frid [26] characterised all pairs of commuting languages over a binary alphabet that are closed under taking subwords. Yet in the end it turned out that the greatest solution of the equation XL D LX can encode universal computation already for a finite constant L: Theorem 6.3 (Kunc [48]). There exists a finite language L over a binary alphabet, such that the greatest solution of the equation XL D LX is co-r.e.-complete. The proof is best explained using the following intuition about the commutation equation. This equation can be viewed as a game between two players, the attacker and the defender, in which the defender tries to prove that a word belongs to a solution of the equation. Therefore, a position of the game is an arbitrary word w 2 A . In each round of the game, the attacker first chooses any word from L and adds it to one or the other side of w . In this way, the attacker decides which of the two inclusions he wants to verify. Now the defender has to respond by removing some word belonging to L from the opposite side of the resulting word. The word thus obtained is a new position of the game. The attacker wins the game if the defender has no move available, and the defender wins if he manages to continue playing forever. Then w belongs to the greatest



solution of XL D LX if and only if the defender has a winning strategy when the game begins with w , and all positions of the game that can be reached from w when the defender follows his winning strategy form a solution of the equation that contains w . The proof of the theorem consists of two steps. At the first step, the complement of an arbitrary language recognised by a computationally universal machine is encoded into the greatest solution of a commutation equation with a regular constant L. The following example shows the principles of this construction by sketching an encoding of two counters and the operation of simultaneous decrementation of both counters, together with testing whether both counters are equal to zero. This example is already sufficient to prove that the greatest solution need not be regular. Example 6.1 (Kunc [48]). There exists such a regular language L over an alphabet A D ¹a; b; e; e; O f; f;O g; gº O , that the greatest solution of the equation XL D LX contains a word am ban with 0 6 m 6 n if and only if m D n. This is a game in which the defender has to prove that the values of two counters are equal, while the attacker tries to show that the value of the first counter is in fact smaller. If the values of the two counters are m; n > 0, this is represented by the word am ban . Stages of the computation are encoded using auxiliary letters e , eO , f , fO, g , and gO , placed around the word representing counter values. The game is played on words of the form uam ban v , where u is a suffix of the word efg, and v is a prefix of the word gO fOeO . Furthermore, it is required that the words u and v together contain exactly 3 letters, that is, the only allowed pairs are .efg; "/, .g; gO fO/, ."; gO fOe/ O and .fg; g/ O . Finally, words of the form efgban are disallowed, since they represent the situation when the first counter already has value zero, and the attacker has been successful in proving that the value of the first counter is smaller. Let M denote the set of all these correct configurations. A configuration can be modified by adding or removing one of the words from the set N D ¹ef; ga; e; fg; ag; O fOe; O gO f;O eº: O The attacker verifies the correctness of a current configuration by placing the letter c at either side of the word. If c is added to an incorrect configuration, then the defender is unable to remove a word belonging to L from the other side, and loses. All words obtained from a correct configuration by adding c are placed into the language L, so that the defender immediately wins, whenever the attacker plays c on a correct word. To ensure that the attacker does not produce incorrect configurations himself, all incorrect words that can be possibly produced by the attacker (either containing two occurrences of b or with an incorrect ordering of letters), are collected in a set L0 . Finally, the configuration fgbagO is added to L to make it winning for the defender, since it can be reached only right before successfully decrementing both counters to zero. Altogether, the regular language L is defined as L D N [ ¹c; fgbagº O [ cM [ M c [ L0 :


If the game begins with the word efg a^m b a^n, it has to proceed according to the scenario described below (possibly changing the direction of computation in any position). Whenever one of the players performs a different action, he immediately loses the game: the attacker can use the letter c to verify correctness of the configuration, while the defender can delete the whole incorrect word produced by the attacker, because it belongs to L₀. The attacker begins by appending ĝf̂ to produce the word efg a^m b a^n ĝf̂, from which the defender has to remove ef to preserve the number of auxiliary letters. The resulting position of the game is g a^m b a^n ĝf̂. Then the attacker appends ê, and the defender removes ga, and so the next position is a^{m−1} b a^n ĝf̂ê, with the first counter decremented. By performing the symmetric moves, that is, adding fg and e, the attacker then forces the defender to decrement the second counter as well, and the resulting position is efg a^{m−1} b a^{n−1}. If m < n, then a position of the form efg b a^n will eventually be reached, and the attacker can win the game, because this word does not belong to M. If m = n, then the word fgbaĝ will be obtained as the last position before both counters reach zero. Because this word belongs to L, it represents a winning position for the defender. Altogether, this means that the set of defender's winning positions cannot be regular, due to the pumping lemma.

Jeandel and Ollinger [31] describe how this first step of the construction can be split into several more transparent constructions using certain special combinatorial games.

Once an infinite regular language L, for which the equation XL = LX has a co-r.e.-complete greatest solution, is constructed, the second step of the proof of Theorem 6.3 consists in encoding the finite automaton for L into a finite language. This is achieved by splitting the words from L into finitely many fragments annotated by special symbols, which describe the computation of this automaton. These fragments are incorporated into a finite language L′, along with auxiliary words, so that the attacker–defender game on L′ ensures that the fragments are concatenated to a word accepted by the automaton for L.

Note that the equation XL = LX can be viewed as a system of two inequalities XL ⊆ LX and LX ⊆ XL. This suggests that similar universality results as for commutation can be obtained also for other systems employing inequalities of the form XK ⊆ LX.

Theorem 6.4 (Kunc [46]). There exist finite languages K, P and regular languages L, M, R such that the greatest solutions of both systems {XK ⊆ LX, X ⊆ M} and {XK ⊆ LX, XP ⊆ RX} are co-r.e.-complete.

The commutation equation is generalised to the conjugacy equation XK = LX. Languages K and L are called conjugates via the language M if M is a non-empty solution of this equation. Conjugacy of languages was proved decidable in the case of finite biprefix codes by Cassaigne et al. [16]. On the other hand, the expressive power of commutation can be used to show that one cannot decide whether given regular languages are conjugates via a language containing the empty word:


Theorem 6.5 (Kunc [48]). One cannot algorithmically decide whether for two given regular languages K and L there exists a language M, which contains the empty word and satisfies MK = LM.

This result exhibits a class of simple systems of language equations with regular constants and the only operation of concatenation, where solvability is not algorithmically decidable, namely systems of the form XK = LX, XA* = A*. The present proof of this result is deeply based on the construction for the commutation equation, and no proof is known which would either be independent or would use only the computational universality of commutation. It is still an open question whether a similar result can be proved also for conjugacy in general or for conjugacy of finite languages.

The results on commutation of languages contrast with the known properties of words, polynomials and formal series in non-commuting variables over a field, where two elements commute if and only if they are powers of the same element (see § 1.2 of Lothaire [55] and Lothaire [56], Chapter 9).

6.4. Equations over a unary alphabet. It was already shown in § 4.2 that language equations over a unary alphabet A = {a} have quite a significant expressive power. In particular, Theorem 4.3 asserts that for every trellis automaton recognising base-k representations of numbers, there is a system of language equations defining the language of unary representations of those numbers by its unique solution. This result allows representing the unary encoding of the language VALC(T) of computation histories of any Turing machine T. In the multiple-letter case, the logical dependency of L(T) upon VALC(T) can be expressed in a simple language equation explained in § 6.2. This argument was recreated for the case of a unary alphabet by using a very special encoding of VALC(T), leading to the following theorem:

Theorem 6.6 (Jeż and Okhotin [37]). For every recursive (r.e., co-r.e.) language L ⊆ a* there is a system of language equations of the form φᵢ(X₁, …, Xₙ) = ψᵢ(X₁, …, Xₙ) over the alphabet {a}, with φᵢ, ψᵢ using union, concatenation and singleton constants, which has a unique (least, greatest, respectively) solution with X₁ = L. The same results hold for unresolved systems with intersection, concatenation and singleton constants.

The construction is relatively compact if both union and intersection can be used, where Theorem 4.3 can be directly applied. However, using a single Boolean operation, whether it is union or intersection, requires remaking the construction in Theorem 4.3 using equations of this particular form.

The systems constructed in Theorem 6.6 can be further transformed to eliminate the union, leaving concatenation as the only operation. This is done by encoding both union and concatenation by concatenation of unary languages of a specific form, and using additional equations to ensure that specific form. Every variable X of the original system is represented by a new variable X′ of the new system, with each solution Xᵢ = Lᵢ ⊆ a* of the original system corresponding to a solution Xᵢ′ = σ(Lᵢ) of the new system, where σ: 2^{a*} → 2^{a*} is a certain encoding function.


As proved by Jeż and Okhotin [38], the mapping σ can be chosen so that a^n ∈ L if and only if a^{16n+13} ∈ σ(L), while σ(L) \ a^{13}(a^{16})* is a regular constant that does not depend upon L; this is called representing L on track 13 of σ(L). The encoding is checked by an equation X′C₁ = C₂ with fixed regular constants C₁, C₂ ⊆ a*, which is satisfied by X′ = σ(L) for any L ⊆ a*, and which does not hold for X′ of any other form. The simulation of equations is based on the second property of the encoding, that for certain finite constants C₃ and C₄, the concatenation KL is represented on track 10 of σ(K)σ(L)C₃ (and hence σ(K)σ(L)C₃ depends entirely on KL), and similarly σ(K)σ(L)C₄ represents K ∪ L on track 13; the shift of track under concatenation reflects the arithmetic (16m+13) + (16n+13) = 16(m+n+1) + 10. Then an equation XY = UV of the original system can be replaced by X′Y′C₃ = U′V′C₃, every equation X ∪ Y = U ∪ V is simulated by X′Y′C₄ = U′V′C₄, and X = C becomes X′ = σ(C).

Applying one more layer of a similar encoding to the resulting system allows further simplification by encoding all variables into one [51], with a word a^n in the variable Xᵢ of the original system represented by the word a^{pn+dᵢ} in the unique variable of the new system, for some suitable numbers p and dᵢ. Finally, all equations can be transcoded into just two equations by yet another encoding of the same kind. Since all these encodings are applicable to a system with an arbitrary solution given by Theorem 6.6, the following computational universality result holds:

Theorem 6.7 (Jeż and Okhotin [38]; Lehtinen and Okhotin [51]). For every recursive (r.e., co-r.e.) set of numbers S ⊆ ℕ₀ there exist numbers p, d ≥ 1, finite languages K, L ⊆ a* and regular languages M, N ⊆ a*, such that the system of two equations

XXK = XXL,    XM = N

with a variable X ⊆ a* has a unique (least, greatest, respectively) solution, such that n ∈ S if and only if a^{pn+d} ∈ X. Given a Turing machine recognising S (the complement of S in the case of a greatest solution), such p, d, K, L, M and N can be effectively constructed.

Thus the systems of such a simple form may already have computationally universal solutions. The same technique is used to prove that their decision problems are undecidable: solution existence is co-r.e.-complete, uniqueness is Π⁰₂-complete, and existence of finitely many solutions is Σ⁰₃-complete, see [37], [38], and [51].

Although encoded forms of computationally complete sets can be represented by equations as simple as in Theorem 6.7, equations with concatenation and no Boolean operations have some combinatorial limitations, which make some simple unary languages non-representable. The known non-representability method is based upon the notions of prime languages (as in § 4.4) and fragile languages. A unary language L ⊆ a* is said to be fragile if L · {a^{n₁}, a^{n₂}} is co-finite for all n₁, n₂ ∈ ℕ with n₁ ≠ n₂, whereas L itself is not co-finite. Examples of fragile sets representable by these equations are known, and prime sets can be represented as well. However, no set that possesses both properties can be represented:


Theorem 6.8 (Lehtinen and Okhotin [52]). Let L ⊆ a* be prime and fragile. Then it is not representable as a component of a least or a greatest solution of any system of language equations over a unary alphabet, with concatenation as the only operation and using regular constants.

An example of a fragile and prime language is the set of all words a^n for which the (n+1)-th bit of the binary sequence 100011110000⋯ (built from the blocks 1^{2^k} 0^{2^k} for k = 2, 3, …) is 1.

6.5. Infinite systems of equations. A system of infinitely many language equations can be regarded as a binary relation between left-hand sides and right-hand sides, and effectively described by any suitable means, such as a finite transducer recognising the set of equations. Though one could consider different sets of operations and constants, it is known that satisfiability is undecidable already for very simple infinite systems without any occurrences of variables and using only concatenation and finite constants. In this case, the satisfiability problem actually asks whether a given infinite system of equalities holds true. This was first proved by Lisovik [54] and later improved by Karhumäki and Lisovik [40] and Kunc [49] to systems of the simplest form for which the problem is not trivially decidable:

Theorem 6.9. It is undecidable whether three given finite languages K, L, M satisfy K^n M = L^n M for every integer n ≥ 0. In particular, the system of language equations { X^n Z = Y^n Z | n ∈ ℕ₀ } in variables X, Y, Z is not equivalent to any finite system.
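
The statement of Theorem 6.9 is easy to probe experimentally for concrete finite languages. The following sketch (Python; the helper names are ours) checks the equalities K^n M = L^n M for n up to a given bound; by the theorem, no bound can suffice in general, so such a test can only refute the infinite system, never confirm it.

    def concat(X, Y):
        # concatenation of two finite languages given as sets of strings
        return {x + y for x in X for y in Y}

    def power_times(X, n, M):
        # computes X^n M
        result = set(M)
        for _ in range(n):
            result = concat(X, result)
        return result

    def equal_up_to(K, L, M, bound):
        return all(power_times(K, n, M) == power_times(L, n, M)
                   for n in range(bound + 1))

    # Example: K = {"ab"}, L = {"a"}, M = {"b"}; K^n M = L^n M already fails at n = 1.
    print(equal_up_to({"ab"}, {"a"}, {"b"}, bound=3))   # False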

Consider infinite systems of language equations with constants restricted to be singletons and with concatenation as the only operation. Such language equations are similar to word equations, with words as unknowns. An arbitrary solution of a word equation constitutes a solution of a language equation in singleton languages. Conversely, for any non-empty solution of a language equation with singleton constants and concatenation, one can construct a solution in words by taking the lexicographically smallest word of minimal length from each component of the original solution. Therefore, solvability of infinite systems of language equations with singleton constants and concatenation, given by finite transducers, is reduced to solvability of the corresponding word equations, which is decidable (see the survey by Harju and Karhumäki [28], §§ 6.1–7.2).

6.6. Well quasi-orders. This section presents a powerful technique of proving regularity of maximal solutions that generalises the method using congruences of finite index, employed in § 5.1. This technique is based on the fact that in order to prove regularity of a given language, it is sufficient to show that the language is upward closed with respect to a monotone well quasi-order on A*. A quasi-order ≤ is monotone if from the inequalities u ≤ v and ũ ≤ ṽ it follows that uũ ≤ vṽ. The upward closure of a language K ⊆ A* with respect to a quasi-order ≤ on A* is ⟨K⟩_≤ = { u ∈ A* | ∃v ∈ K: v ≤ u }. A quasi-order ≤ on A* is called a well quasi-order (wqo) if for every language L ⊆ A* upward closed with respect to ≤ there exists a finite subset K of L such that L = ⟨K⟩_≤.
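
A classical instance of a monotone wqo on A* is the scattered-subword ordering (Higman's lemma). The sketch below (Python; a standard construction, not specific to this chapter) builds a regular expression for the upward closure of a finite language with respect to this ordering, illustrating concretely how upward closedness yields regularity in the sense of the definitions above.

    import re

    def upward_closure_regex(K, alphabet="ab"):
        # upward closure of K under the scattered-subword order:
        # a word is in the closure iff it contains some w in K as a subsequence
        any_word = "[" + alphabet + "]*"
        parts = [any_word + any_word.join(re.escape(c) for c in w) + any_word
                 for w in K]
        return "|".join(parts) if parts else "(?!)"   # empty K: empty closure

    pattern = re.compile("^(" + upward_closure_regex({"ab", "ba"}) + ")$")
    print(bool(pattern.match("aab")), bool(pattern.match("aaa")))   # True False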


Theorem 6.10 (Ehrenfeucht et al. [24]). A language L ⊆ A* is regular if and only if it is upward closed with respect to some monotone wqo on A*.

The goal is to show that every solution of a certain system of language inequalities is contained in some solution upward closed with respect to a certain monotone wqo. Then Theorem 6.10 implies that all its maximal solutions are regular. The advantage of using wqos, compared to the use of congruences of finite index in § 5.1, is that they can also be employed when variables and concatenation occur in right-hand sides of inequalities. This is because a quasi-order can classify words according to their decompositions into several factors, taking into account the contexts of these factors in the constant languages occurring in the inequalities.

Case study I: Restrictions on operations and constants. Bucher et al. [14] introduced an important class of monotone quasi-orders of words, based on homomorphisms to finite ordered semigroups, that is, semigroups equipped with a monotone ordering of their elements. Given a homomorphism μ: A⁺ → S to a finite ordered semigroup (S, ≤), a monotone quasi-order ≤_μ on A* is defined as follows: for u, v ∈ A*, where v = a₁ ⋯ aₘ with aₖ ∈ A, let v ≤_μ u if there exists a decomposition u = u₁ ⋯ uₘ, with uₖ ∈ A⁺, such that μ(uₖ) ≤ μ(aₖ) for k = 1, …, m. Then the languages upward closed with respect to ≤_μ are precisely those which are generated from some set of initial words using the context-free rewriting rules a → u, for a ∈ A and u ∈ A⁺ satisfying μ(u) ≤ μ(a). Note that ≤_μ saturates the quasi-order induced on A⁺ by μ, that is, if v ≤_μ u then either u = v = ε or u, v ∈ A⁺ and μ(v) ≥ μ(u). This means that every language L ⊆ A⁺ recognised by μ is upward closed with respect to ≤_μ.

Characterising all homomorphisms μ: A⁺ → S to an ordered semigroup, for which ≤_μ is a wqo, is an open problem. The answer is known only for semigroups (S, =) ordered by the equality relation.

Theorem 6.11 (Kunc [47]). For an arbitrary homomorphism μ: A⁺ → S to a finite semigroup ordered by equality, the relation ≤_μ is a well quasi-order on A* if and only if for arbitrary words u, v ∈ A⁺ there exist x, y ∈ A* such that μ(xuvy) is equal either to μ(u) or to μ(v) (algebraically, this means that μ(A⁺) is a chain of simple semigroups).

The following generalisation of a result of Kunc [47] states that all maximal solutions of very general systems of inequalities are always regular, provided that all operations used in these inequalities (including constants) respect some well quasi-order of the form ≤_μ, where μ is a fixed homomorphism to a finite ordered semigroup.

Theorem 6.12. Let μ: A⁺ → S be a homomorphism to a finite ordered semigroup (S, ≤), such that ≤_μ is a wqo on A*. Let φᵢ(X₁, …, Xₙ) ⊆ ψᵢ(X₁, …, Xₙ), for i ∈ I, be a (possibly infinite) system of language inequalities, where

φᵢ(⟨L₁⟩_{≤μ}, …, ⟨Lₙ⟩_{≤μ}) ⊆ ⟨φᵢ(L₁, …, Lₙ)⟩_{≤μ}   and   ⟨ψᵢ(L₁, …, Lₙ)⟩_{≤μ} ⊆ ψᵢ(⟨L₁⟩_{≤μ}, …, ⟨Lₙ⟩_{≤μ})

for every n-tuple of languages (L₁, …, Lₙ).


Then each component of every maximal solution of the system is regular and can be expressed as a finite union of concatenations of languages recognised by μ.

Proof. The proof proceeds similarly to the proof of Theorem 5.1, but instead of taking the closure of a given solution (L₁, …, Lₙ) with respect to a congruence relation, one has to take its upward closure with respect to ≤_μ, that is, (⟨L₁⟩_{≤μ}, …, ⟨Lₙ⟩_{≤μ}). To verify that this n-tuple is also a solution, it is sufficient to combine the assumptions on the functions φᵢ and ψᵢ with the inequality ⟨φᵢ(L₁, …, Lₙ)⟩_{≤μ} ⊆ ⟨ψᵢ(L₁, …, Lₙ)⟩_{≤μ}, which holds because (L₁, …, Lₙ) is a solution. Since ≤_μ is a wqo, every language ⟨Lⱼ⟩_{≤μ} is a union of finitely many languages of the form ⟨a₁ ⋯ aₘ⟩_{≤μ} with aₖ ∈ A, and for this particular wqo, the latter language is equal to {u₁ ⋯ uₘ | μ(uₖ) ≤ μ(aₖ)} = ⟨a₁⟩_{≤μ} ⋯ ⟨aₘ⟩_{≤μ}. Since each language ⟨aₖ⟩_{≤μ} is equal to μ⁻¹{ s ∈ S | s ≤ μ(aₖ) }, it is recognised by μ. Therefore, the language ⟨Lⱼ⟩_{≤μ} is of the required form.

If the sides of inequalities are given as formulae over some basic operations, then the next corollary gives sufficient conditions on these operations, which ensure that their compositions satisfy the theorem.

Corollary 6.13. Assume that a system φᵢ ⊆ ψᵢ has all φᵢ, ψᵢ composed of (1) monotone operations f, which satisfy τ(f(L₁, …, Lₙ)) = f(τ(L₁), …, τ(Lₙ)) for all substitutions τ: A → 2^{A⁺} and for all languages L₁, …, Lₙ, and (2) constant languages recognised by a fixed homomorphism μ: A⁺ → S to a finite ordered semigroup (S, ≤), such that ≤_μ is a wqo on A*. Then the conclusions of Theorem 6.12 hold.

Since the operations of concatenation, Kleene star and (infinitary) union satisfy the requirements of the corollary, their unrestricted use in equations agrees with the assumptions of Theorem 6.12. Moreover, the theorem also applies if shuffle and (infinitary) intersection are used in the right-hand sides and arbitrary constants are used on the left. If the constants occurring in the system are finitely many group languages, then μ can be taken as a homomorphism to a finite group, and Theorem 6.12 can be applied, because ≤_μ is a wqo by Theorem 6.11. Therefore, one of the consequences of Theorem 6.12 is that the class of polynomials of group languages (that is, regular languages open in the group topology [81]; also see Beaudry et al. [9], Corollary 6.1) is closed under taking maximal solutions of arbitrary systems with rational operations.

Case study II: Inequalities XK ⊆ LX. Using well quasi-orders, it can also be proved that, unlike in the case of systems of inequalities XK ⊆ LX considered in § 6.3, the greatest solution of a single inequality of this form is always regular.

Theorem 6.14 (Kunc [47]). If K ⊆ A* is arbitrary and L ⊆ A* is regular, then the greatest solution of the inequality XK ⊆ LX is regular.

The basic idea of this result is to think of the inequality XK ⊆ LX as a game, as in the case of equations XL = LX in § 6.3, and encode all strategies of the defender on a given word as a labelled tree.


Then the standard quasi-ordering of trees, which is a wqo due to Kruskal's tree theorem [44], corresponds to simulating the defender's winning strategies. Therefore, the wqo of words, induced by this wqo of the corresponding trees, recognises the greatest solution.

Theorem 6.14, in particular, implies that for a context-free language K and a regular language L, one can algorithmically decide whether a given word belongs to the greatest solution of XK ⊆ LX. On the one hand, the greatest solution is co-r.e. by Theorem 6.1. On the other hand, it is r.e., because one can test every regular language for being a solution, and any element of the greatest solution shall be found in one of these solutions. However, it is not known whether an automaton for the greatest solution can be algorithmically constructed, except for the two special cases dealt with by Ly and Wu [57].

6.7. Identity checking and inequations. A language equation is an identity if any substitution of languages for its variables turns it into an equality. A typical decision problem is testing whether a given language equation is an identity, and the state of the art on this problem is its decidability for equations with singleton constants, union, concatenation, Kleene star and shuffle, established by Meyer and Rabinovich [61]. Without constants, this problem is decidable if intersection is allowed as well [61]. On the other hand, it remains open whether identity testing is decidable for equations allowing intersection and regular constants along with the above operations.

Note that φ = ψ is an identity if and only if the inequation φ ≠ ψ has no solutions. This suggests the study of such inequations, as well as of mixed systems of equations φ = ψ and inequations φ ≠ ψ, with the same research problems as for any other language equations. The framework of § 2 and § 6.1 is directly applicable to equations in such systems, and can be applied to inequations as to logical negations of equations. Thus, a necessary and sufficient condition for solution existence was obtained [67]: a mixed system with concatenation and Boolean operations has a solution if and only if there exists a number k ≥ 0, such that for every ℓ ≥ k there exists a solution of the equations modulo A^{≤ℓ} that satisfies the inequations modulo A^{≤k}. In order to characterise solution uniqueness, consider that a mixed system has at most one solution if and only if for every ℓ ≥ 0, there exists m ≥ ℓ, such that all solutions of the equations modulo A^{≤m} that satisfy the inequations modulo A^{≤ℓ} coincide modulo A^{≤ℓ}. Uniqueness of a solution is a conjunction of these two conditions [67].

One can infer from the above conditions that unique solutions of mixed systems using Boolean operations and concatenation must be recursive languages, as in the case of systems of standard equations considered in Theorem 6.1. However, there is an important distinction. Unique solutions of systems of equations are effectively recursive, in the sense that one can construct a halting Turing machine recognising the solution. But for mixed systems, they are non-effectively recursive: it is already undecidable whether the unique solution contains the empty word [67]. Decision problems for mixed systems have the following complexity [67]: testing whether a system has a unique solution is complete for the Boolean closure of Σ⁰₂, while solution existence is Σ⁰₂-complete.
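
The conditions above compare the two sides of equations and inequations "modulo A^{≤ℓ}". The following sketch (Python; the expression encoding and all names are ours) shows what such a comparison amounts to for a single equation built from union, intersection and concatenation: both sides are evaluated on candidate languages and then truncated to words of length at most k.

    def eval_expr(expr, env, k):
        # expr is a nested tuple: ("const", {...}), ("var", "X"),
        # ("union", e1, e2), ("inter", e1, e2) or ("concat", e1, e2)
        op = expr[0]
        if op == "const":
            return {w for w in expr[1] if len(w) <= k}
        if op == "var":
            return {w for w in env[expr[1]] if len(w) <= k}
        left, right = eval_expr(expr[1], env, k), eval_expr(expr[2], env, k)
        if op == "union":
            return left | right
        if op == "inter":
            return left & right
        if op == "concat":
            return {u + v for u in left for v in right if len(u + v) <= k}
        raise ValueError(op)

    def equation_holds_modulo(lhs, rhs, env, k):
        return eval_expr(lhs, env, k) == eval_expr(rhs, env, k)

    # Example: does X = {a}X ∪ {ε} hold modulo A^{≤3} for X = {ε, a, aa, aaa}?
    lhs = ("var", "X")
    rhs = ("union", ("concat", ("const", {"a"}), ("var", "X")), ("const", {""}))
    print(equation_holds_modulo(lhs, rhs, {"X": {"", "a", "aa", "aaa"}}, 3))   # True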


7. Equations with erasing operations

All equations considered so far have been restricted to continuous operations. However, some non-continuous operations, such as erasing homomorphisms and the quotient, are commonly used, and equations involving them are also of interest. An early paper by Ruohonen [82] considered equations with homomorphisms and union in connection with Lindenmayer systems. Later, Okhotin [69] briefly investigated equations with quotient, and used them to represent all languages in the arithmetical hierarchy. But this was, in fact, still below the actual expressive power of such equations.

Consider an obvious upper bound on the unique solutions of language equations, assuming that each operation is representable in first-order arithmetic, which is about the weakest reasonable assumption one can make. Then every equation φ(X₁, …, Xₙ) = ψ(X₁, …, Xₙ) can be transcribed by an arithmetical formula f(X₁, …, Xₙ) using free second-order variables X₁, …, Xₙ (as well as any quantified first-order variables). Assume that the equation has a unique solution; then membership of words in this solution can be equivalently represented either as g(x) = (∃X₁) ⋯ (∃Xₙ) f(X₁, …, Xₙ) ∧ x ∈ X₁, or as g′(x) = (∀X₁) ⋯ (∀Xₙ) f(X₁, …, Xₙ) → x ∈ X₁. Therefore, this solution belongs to the class Σ¹₁ ∩ Π¹₁, known as the class of hyper-arithmetical sets. On the other hand, an arbitrary hyper-arithmetical subset of a* can be represented by a unique solution of a system of language equations of a rather simple form:

Theorem 7.1 (Jeż and Okhotin [36]). For every hyper-arithmetical set S ⊆ ℕ there exists a system of language equations over the alphabet A = {a}, using the operations of concatenation, quotient and union, as well as singleton constants, which has a unique solution with X₁ = {a^n | n ∈ S}. Testing solution existence for such systems of equations is Σ¹₁-complete.

The argument is technically based upon the constructions of Theorem 6.6. It proceeds by expressing the known definition of hyper-arithmetical sets by infinite unions and intersections in a system of language equations. One can also represent hyper-arithmetical sets over multiple-letter alphabets by the same kind of equations as in Theorem 7.1, using a technically much simpler construction.

Recently, Lehtinen [50] showed how to encode each system of the form constructed in Theorem 7.1 into a system of two equations XXK = (XX⁻¹)L, XM = N, where K, L, M, N ⊆ a* are regular constants and X is an unknown language. Thus, for every hyper-arithmetical set S there is such a system of two equations with a unique solution satisfying a^{pn+d} ∈ X if and only if n ∈ S, for some p, d ≥ 1 (cf. Theorem 6.7).
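
As a small concrete aside (our own illustration, not part of the constructions above), the quotient that makes these equations "erasing" is easily computed for finite languages; unlike the continuous operations considered earlier, it can shorten words.

    def right_quotient(X, Y):
        # X Y^{-1} = { u | there is a word y in Y with u·y in X };
        # right_quotient(X, X) corresponds to the operation XX^{-1} appearing above
        return {x[:len(x) - len(y)] for x in X for y in Y if x.endswith(y)}

    # Example over a unary alphabet:
    print(right_quotient({"aaa", "aaaaa"}, {"a", "aa"}))   # {'a', 'aa', 'aaa', 'aaaa'}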

References [1] A. Aiken, D. Kozen, M. Vardi, and E. Wimmers, The complexity of set constraints. In Computer science logic (E. Börger, Y. Gurevich, and K. Meinke, eds.). Selected papers from the Seventh Workshop (CSL ’93) held in Swansea, September 13–17, 1993. Lecture Notes in Computer Science, 832. Springer, Berlin, 1994, 1–17. MR 1316895 Zbl 0953.68557 q.v. 770


[2] J.-M. Autebert, J. Berstel, and L. Boasson, Context-free languages and pushdown automata. In Handbook of formal languages. (G. Rozenberg and A. Salomaa, eds.). Vol. 1. Springer, Berlin, 1997, 111–174. MR 1469995 q.v. 772, 773 [3] F. Baader and R. Küsters, Unification in a description logic with transitive closure of roles. In Logic for programming, artificial intelligence, and reasoning (R. Nieuwenhuis and A. Voronkov, eds.). Papers from the 8th International Conference (LPAR 2001) held at the University of Havana, Havana, December 3–7, 2001. Lecture Notes in Computer Science, 2250. Lecture Notes in Artificial Intelligence, 217–232. MR 1933175 Zbl 1275.68134 q.v. 770 [4] F. Baader and P. Narendran, Unification of concept terms in description logics. J. Symbolic Comput. 31 (2001), no. 3, 277–305. MR 1814334 Zbl 0970.68166 q.v. 770 [5] F. Baader and A. Okhotin, Solving language equations and disequations with applications to disunification in description logics and monadic set constraints. In Logic for programming, artificial intelligence, and reasoning (N. Bjørner and A. Voronkov, eds.) Proceedings of the 18th International Conference (LPAR-18) held in Mérida, March 11–15, 2012. Lecture Notes in Computer Science, 7180. Springer, Berlin, 2012, 107–121. MR 2965638 Zbl 1352.68124 q.v. 771 [6] F. Baader and A. Okhotin, On language equations with one-sided concatenation. Fund. Inform. 126 (2013), no. 1, 1–35. MR 3114191 Zbl 1359.68155 q.v. 770, 771 [7] E. Bakinova, A. Basharin, I. Batmanov, K. Lyubort, A. Okhotin, and E. Sazhneva, Formal languages over GF.2/. In Language and automata theory and applications (S. T. Klein, C. Martín-Vide, and D. Shapira, eds.). Proceedings of the 12th International Conference, LATA 2018, Ramat Gan, Israel, April 9–11, 2018. Lecture Notes in Computer Science, 10792. Springer, Cham, 2018, 68–79. MR 3788042 Zbl 06894740 q.v. 779, 780 [8] S. Bala, Complexity of regular language matching and other decidable cases of the satisfiability problem for constraints between regular open terms. Theory Comput. Syst. 39 (2006), no. 1, 137–163. MR 2189804 Zbl 1101.68050 q.v. 781 [9] M. Beaudry, F. Lemieux, and D. Thérien, Finite loops recognize exactly the regular open languages. In Automata, languages and programming (P. Degano, R. Gorrieri, and A. Marchetti-Spaccamela, eds.). Proceedings of the 24th International Colloquium (ICALP’97) held in Bologna, July 7–11, 1997. Lecture Notes in Computer Science, 1256. Springer, Berlin, 1997, 110–120. MR 1616178 Zbl 1401.68213 q.v. 791 [10] F. Biegler, M. Daley, M. Holzer, and I. McQuillan, On the uniqueness of shuffle on words and finite languages. Theoret. Comput. Sci. 410 (2009), no. 38–40, 3711–3724. MR 2553324 Zbl 1343.68132 q.v. 782 [11] A. Blass and Y. Gurevich, On the unique satisfiability problem. Inform. and Control 55 (1982), no. 1–3, 80–88. MR 0727739 Zbl 0543.03027 q.v. 779 [12] V. Bodnarchuk, Systems of equations in the algebra of events. Ž. Vyčisl. Mat i Mat. Fiz. 3 (1963), 1077–1088. In Russian. English translation, U.S.S.R. Comput. Math. Math. Phys. 3 (1963), 1470–1487. MR 0162711 Zbl 0148.25001 q.v. 770 [13] J. A. Brzozowski and E. L. Leiss, On equations for regular languages, finite automata, and sequential networks. Theoret. Comput. Sci. 10 (1980), no. 1, 19–35. MR 0549752 Zbl 0415.68023 q.v. 770 [14] W. Bucher, A. Ehrenfeucht, and D. Haussler, On total regulators generated by derivation relations. Theoret. Comput. Sci. 40 (1985), no. 2–3, 131–148. MR 0835410 Zbl 0606.68074 q.v. 790


[15] C. Câmpeanu, K. Salomaa, and S. Vágvölgyi, Shuffle decompositions of regular languages. Internat. J. Found. Comput. Sci. 13 (2002), no. 6, 799–816. MR 1941491 Zbl 1067.68085 q.v. 782 [16] J. Cassaigne, J. Karhumäki, and P. Salmela, Conjugacy of finite biprefix codes. Theoret. Comput. Sci. 410 (2009), no. 24–25, 2345–2351. MR 2522439 Zbl 1168.68023 q.v. 786 [17] N. Chomsky, Three models for the description of language. IRE Trans. Info. Theory 2 (1956), no. 3, 113–124. Also published in Readings Math. Psychol. 2 (1965), 105-124. Zbl 0156.25401 q.v. 773 [18] J. Conway, Regular algebra and finite machines. Chapman & Hall/CRC Mathematics. Chapman and Hall, London, 1971. MR 3967692 Zbl 0231.94041 q.v. 780, 784 [19] K. Culik, II, Variations of the firing squad problem and applications. Inform. Process. Lett. 30 (1989), no. 3, 153–157. MR 0983761 Zbl 0665.68043 q.v. 778 [20] K. Culik, II, J. Gruska, and A. Salomaa, Systolic trellis automata. I. Internat. J. Comput. Math. 15 (1984), no. 3–4, 195–212. MR 0754266 Zbl 0571.68041 q.v. 777 [21] K. Culik, II, J. Gruska, and A. Salomaa, Systolic trellis automata. II. Internat. J. Comput. Math. 16 (1984), no. 1, 3–22. MR 0757600 Zbl 0571.68042 q.v. 777 [22] M. Domaratzki and K. Salomaa, Decidability of trajectory-based equations. Theoret. Comput. Sci. 345 (2005), no. 2–3, 304–330. MR 2171616 Zbl 1079.68049 q.v. 782 [23] C. R. Dyer, One-way bounded cellular automata. Inform. and Control 44 (1980), no. 3, 261–281. MR 0574487 Zbl 0442.68082 q.v. 777 [24] A. Ehrenfeucht, D. Haussler, and G. Rozenberg, On regularity of context-free languages. Theoret. Comput. Sci. 27 (1983), no. 3, 311–332. MR 0731068 Zbl 0553.68044 q.v. 790 [25] Z. Ésik and W. Kuich, Boolean fuzzy sets. Internat. J. Found. Comput. Sci. 18 (2007), no. 6, 1197–1207. MR 2363800 Zbl 1183.68338 q.v. 775 [26] A. E. Frid, Simple equations on binary factorial languages. Theoret. Comput. Sci. 410 (2009), no. 30–32, 2947–2956. MR 2543347 Zbl 1176.68104 q.v. 784 [27] S. Ginsburg and H. G. Rice, Two families of languages related to ALGOL. J. Assoc. Comput. Mach. 9 (1962), 350–371. MR 0152158 Zbl 0161.13903 Zbl 0196.01803 q.v. 765, 772, 773 [28] T. Harju and J. Karhumäki, Morphisms. In Handbook of formal languages (G. Rozenberg and A. Salomaa, eds.). Vol. 1. Word, language, grammar. Springer, Berlin, 1997, 439–510. MR 1469999 q.v. 789 [29] J. Hartmanis, Context-free languages and Turing machine computations. In Mathematical aspects of computer science (J. T. Schwartz, ed.). Proceedings of Symposia in Applied Mathematics. XIX. American Mathematical Society, Providence, R.I., 1967, 42–51. MR 0235938 Zbl 0189.29101 q.v. 773, 783 [30] O. H. Ibarra and S. M. Kim, Characterizations and computational complexity of systolic trellis automata. Theoret. Comput. Sci. 29 (1984), no. 1–2, 123–153. MR 0742405 Zbl 0536.68048 q.v. 777, 778 [31] E. Jeandel and N. Ollinger, Playing with Conway’s problem. Theoret. Comput. Sci. 409 (2008), no. 3, 557–564. MR 2473930 Zbl 1156.68030 q.v. 786 [32] A. Jeż, Conjunctive grammars generate non-regular unary languages. Internat. J. Found. Comput. Sci. 19 (2008), no. 3, 597–615. MR 2417958 Zbl 1155.68040 q.v. 776 [33] A. Jeż and A. Okhotin, Conjunctive grammars over a unary alphabet: undecidability and unbounded growth. Theory Comput. Syst. 46 (2010), no. 1, 27–58. MR 2574644 Zbl 1183.68327 q.v. 776


[34] A. Jeż and A. Okhotin, Complexity of equations over sets of natural numbers. Theory Comput. Syst. 48 (2011), no. 2, 319–342. MR 2763104 Zbl 1209.68263 q.v. 776 [35] A. Jeż and A. Okhotin, One-nonterminal conjunctive grammars over a unary alphabet. Theory Comput. Syst. 49 (2011), no. 2, 319–342. MR 2805893 Zbl 1248.68304 q.v. 776 [36] A. Jeż and A. Okhotin, Representing hyper-arithmetical sets by equations over sets of integers. Theory Comput. Syst. 51 (2012), no. 2, 196–228. MR 2922697 Zbl 1279.68158 q.v. 793 [37] A. Jeż and A. Okhotin, Computational completeness of equations over sets of natural numbers. Inform. and Comput. 237 (2014), 56–94. MR 3231931 Zbl 1291.03078 q.v. 787, 788 [38] A. Jeż and A. Okhotin, Equations over sets of integers with addition only. J. Comput. System Sci. 82 (2016), no. 6, 1007–1019. MR 3510412 Zbl 1342.68185 q.v. 788 [39] J. Karhumäki, M. Latteux, and I. Petre, Commutation with codes. Theoret. Comput. Sci. 340 (2005), no. 2, 322–333. MR 2150757 Zbl 1079.68051 q.v. 784 [40] J. Karhumäki and L. P. Lisovik, The equivalence problem of finite substitutions on ab  c , with applications. Internat. J. Found. Comput. Sci. 14 (2003), no. 4, 699–710. MR 2010592 Zbl 1101.68660 q.v. 789 [41] L. Kari, On language equations with invertible operations. Theoret. Comput. Sci. 132 (1994), no. 1–2, 129–150. MR 1290539 Zbl 0821.68075 q.v. 782 [42] D. Kirsten, A Burnside approach to the finite substitution problem. Theory Comput. Syst. 39 (2006), no. 1, 15–50. MR 2189557 Zbl 1102.68498 q.v. 781 [43] V. Kountouriotis, C. Nomikos, and P. Rondogiannis, Well-founded semantics for Boolean grammars. Inform. and Comput. 207 (2009), no. 9, 945–967. MR 2547854 Zbl 1181.68162 q.v. 775 [44] J. B. Kruskal, Well-quasi-ordering, the Tree Theorem, and Vazsonyi’s conjecture. Trans. Amer. Math. Soc. 95 (1960), 210–225. MR 0111704 Zbl 0158.27002 q.v. 792 [45] M. Kunc, Largest solutions of left-linear language inequalities. In Automata and formal languages (Z. Ésik and Z. Fülöp, eds.). Proceedings of the 11th International Conference (AFL 2005) held in Dobogókő, May 17–20, 2005. University of Szeged. Institute of Informatics, Szeged, 2005, 178–186. MR 2153677 q.v. 771 [46] M. Kunc, On language inequalities XK  LX . In Developments in language theory (C. de Felice and A. Restivo, eds.). Papers from the 9th International Conference (DLT 2005) held in Palermo, July 4–8, 2005. Lecture Notes in Computer Science, 3572. Springer, Berlin, 2005, 327–337. MR 2187274 Zbl 1132.68454 q.v. 786 [47] M. Kunc, Regular solutions of language inequalities and well quasi-orders. Theoret. Comput. Sci. 348 (2005), no. 2–3, 277–293. MR 2181382 Zbl 1081.68047 q.v. 790, 791 [48] M. Kunc, The power of commuting with finite sets of words. Theory Comput. Syst. 40 (2007), no. 4, 521–551. MR 2305376 Zbl 1121.68065 q.v. 784, 785, 787 [49] M. Kunc, The simplest language where equivalence of finite substitutions is undecidable. In Fundamentals of computation theory (E. Csuhaj-Varjú and Z. Ésik, eds.). Proceedings of the 16th international symposium, FCT 2007, Budapest, Hungary, August 27–30, 2007. Lecture Notes in Computer Science, 4639. Springer, Berlin, 2007, 365–375. Zbl 1135.68457 q.v. 789 [50] T. Lehtinen, Equations X C A D B and .X C X/ C C D .X X/ C D over sets of natural numbers. In Mathematical foundations of computer science 2012 (B. Rovan, V. Sassone, and P. Widmayer, eds.). Proceedings of the 37th International Symposium (MFCS 2012) held in Bratislava, August 27–31, 2012. Lecture Notes in Computer Science, 7464. 
Springer, Berlin, 2012, 615–629. MR 3030466 Zbl 1365.68308 q.v. 793


[51] T. Lehtinen and A. Okhotin, On language equations XXK D XXL and XM D N over a unary alphabet. In Developments in language theory (Y. Gao, H. Lu, S. Seki, and S. Yu, eds.). Proceedings of the 14th International Conference (DLT 2010) held at the University of Western Ontario, London, ON, August 17–20, 2010. Lecture Notes in Computer Science, 6224. Springer, Berlin, 2010, 291–302. MR 2725652 Zbl 1205.68207 q.v. 788 [52] T. Lehtinen and A. Okhotin, On equations over sets of numbers and their limitations. Internat. J. Found. Comput. Sci. 22 (2011), no. 2, 377–393. MR 2772815 Zbl 1209.68301 q.v. 789 [53] E. L. Leiss, Unrestricted complementation in language equations over a one-letter alphabet. Theoret. Comput. Sci. 132 (1994), no. 1–2, 71–84. MR 1290536 Zbl 0821.68076 q.v. 778 [54] L. P. Lisovik, The equivalence problem for finite substitutions in a regular language. Dokl. Akad. Nauk 357 (1997), no. 3, 299–301. In Russian. English translation, Dokl. Math. 56 (1997), no. 3, 867–869. MR 1606433 Zbl 0961.68531 q.v. 789 [55] M. Lothaire, Combinatorics on words. A collective work by D. Perrin, J. Berstel, C. Choffrut, R. Cori, D. Foata, J.-É. Pin, G. Pirillo, C. Reutenauer, M.-P. Schützenberger, J. Sakarovitch, and I. Simon. With a foreword by R. Lyndon and a preface by Perrin. Corrected reprint of the 1983 original, with a new preface by Perrin. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 1997. MR 1475463 Zbl 0874.20040 q.v. 787 [56] M. Lothaire, Algebraic combinatorics on words. A collective work by J. Berstel, D. Perrin, P. Seebold, J. Cassaigne, A. De Luca, S. Varricchio, A. Lascoux, B. Leclerc, J.-Y. Thibon, V. Bruyère, C. Frougny, F. Mignosi, A. Restivo, C. Reutenauer, D. Foata, G.-N. Han, J. Désarménien, V. Diekert, T. Harju, J. Karhumäki and W. Plandowski. With a preface by Berstel and Perrin. Encyclopedia of Mathematics and its Applications, 90. Cambridge University Press, Cambridge, 2002. MR 905123 Zbl 1001.68093 q.v. 787 [57] O. Ly and Z. Wu, On effective construction of the greatest solution of language inequality XA  BX . Theoret. Comput. Sci. 528 (2014), 12–31. MR 3175076 Zbl 1282.68155 q.v. 792 [58] W. Martens, M. Niewerth, and T. Schwentick, Schema design for xml repositories: complexity and tractability. In PODS ’10. Proceedings of the twenty-ninth ACM SIGMODSIGACT-SIGART symposium on Principles of database systems. Indianapolis, June 6–11, 2010. Association for Computing Machinery, New York, 2010, 239–250. q.v. 781 [59] P. Massazza and P. Salmela, On the simplest centralizer of a language. Theor. Inform. Appl. 40 (2006), no. 2, 295–301. MR 2252640 Zbl 1112.68097 q.v. 784 [60] A. Mateescu, G. Rozenberg, and A. Salomaa, Shuffle on trajectories: syntactic constraints. Theoret. Comput. Sci. 197 (1998), no. 1–2, 1–56. MR 1615787 Zbl 0902.68096 q.v. 782 [61] A. R. Meyer and A. Rabinovich, Valid identity problem for shuffle regular expressions. J. Autom. Lang. Comb. 7 (2002), no. 1, 109–125. MR 1915294 Zbl 1021.68085 q.v. 792 [62] A. Okhotin, Conjunctive grammars. J. Autom. Lang. Comb. 6 (2001), no. 4, 519–535. 2nd Workshop on Descriptional Complexity of Automata, Grammars and Related Structures (London, ON, 2000). MR 1897299 Zbl 1004.68082 q.v. 766, 774, 775 [63] A. Okhotin, Conjunctive grammars and systems of language equations. Programmirovanie 2002, no. 5, 3–11. In Russian. English translation, Comput. Software 28 (2002), no. 5, 243–249. MR 2023633 Zbl 1036.68063 q.v. 774 [64] A. Okhotin, Boolean grammars. Inform. and Comput. 194 (2004), no. 1, 19–48. 
MR 2084724 Zbl 1073.68037 q.v. 775


[65] A. Okhotin, On the equivalence of linear conjunctive grammars and trellis automata. Theor. Inform. Appl. 38 (2004), no. 1, 69–88. MR 2059029 Zbl 1084.68079 q.v. 777 [66] A. Okhotin, On the number of nonterminals in linear conjunctive grammars. Theoret. Comput. Sci. 320 (2004), no. 2–3, 419–448. MR 2064310 Zbl 1068.68072 q.v. 777 [67] A. Okhotin, Strict language inequalities and their decision problems. In Mathematical foundations of computer science 2005 (J. Jędrzejowicz and A. Szepietowski, eds.). Proceedings of the 30th International Symposium (MFCS 2005) held in Gdansk, August 29–September 2, 2005. Lecture Notes in Computer Science, 3618. Springer, Berlin, 2005, 708–719. MR 2237410 Zbl 1156.68461 q.v. 784, 792 [68] A. Okhotin, Unresolved systems of language equations: expressive power and decision problems. Theoret. Comput. Sci. 349 (2005), no. 3, 283–308. MR 2183157 Zbl 1086.68077 q.v. 783, 784 [69] A. Okhotin, Computational universality in one-variable language equations. Fund. Inform. 74 (2006), no. 4, 563–578. MR 2286863 Zbl 1106.68063 q.v. 793 [70] A. Okhotin, Decision problems for language equations. J. Comput. System Sci. 76 (2010), no. 3–4, 251–266. MR 2656491 Zbl 1201.68067 q.v. 767, 768, 783, 784 [71] A. Okhotin, Language equations with symmetric difference. Fund. Inform. 116 (2012), no. 1–4, 205–222. MR 2977844 Zbl 1263.68101 q.v. 783, 784 [72] A. Okhotin, Conjunctive and Boolean grammars: the true general case of the context-free grammars. Comput. Sci. Rev. 9 (2013), 27–59. Zbl 1286.68268 q.v. 773, 775 [73] A. Okhotin, Parsing by matrix multiplication generalized to Boolean grammars. Theoret. Comput. Sci. 516 (2014), 101–120. MR 3141587 Zbl 1277.68108 q.v. 775 [74] A. Okhotin, On language equations with concatenation and various sets of Boolean operations. RAIRO Theor. Inform. Appl. 49 (2015), no. 3, 205–232. MR 3434599 Zbl 1347.68209 q.v. 779 [75] A. Okhotin and C. Reitwießner, Conjunctive grammars with restricted disjunction. Theoret. Comput. Sci. 411 (2010), no. 26–28, 2559–2571. MR 2666349 Zbl 1203.68078 q.v. 775 [76] A. Okhotin and P. Rondogiannis, On the expressive power of univariate equations over sets of natural numbers. Inform. and Comput. 212 (2012), 1–14. MR 2889459 Zbl 1263.68102 q.v. 776 [77] A. Okhotin and O. Yakimova, Language equations with complementation: decision problems. Theoret. Comput. Sci. 376 (2007), no. 1–2, 112–126. MR 2316395 Zbl 1111.68062 q.v. 778, 779 [78] A. Okhotin and O. Yakimova, Language equations with complementation: expressive power. Theoret. Comput. Sci. 416 (2012), 71–86. MR 2876109 Zbl 1279.68168 q.v. 778, 779 [79] R. Parikh, A. Chandra, J. Halpern, and A. Meyer, Equations between regular terms and an application to process logic. SIAM J. Comput. 14 (1985), no. 4, 935–942. MR 0807892 Zbl 0587.68031 q.v. 783 [80] I. Petre and A. Salomaa, Algebraic systems and pushdown automata. In Handbook of weighted automata (M. Droste, W. Kuich, and H. Vogler, eds.). Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 2009, 257–289. MR 2777733 q.v. 774 [81] J.-É. Pin, Polynomial closure of group languages and open sets of the Hall topology. Theoret. Comput. Sci. 169 (1996), no. 2, 185–200. ICALP ’94 (Jerusalem, 1994). MR 1426411 Zbl 0877.68076 q.v. 791


[82] K. Ruohonen, A note on language equations involving morphisms. Inform. Process. Lett. 7 (1978), no. 5, 209–212. MR 0483780 Zbl 0385.68058 q.v. 793 [83] J. Sakarovitch, Éléments de théorie des automates. Vuibert Informatique, Paris, 2003. English translation, Elements of automata theory. Cambridge University Press, 2009. Translated by R. Thomas. Cambridge University Press, Cambridge, 2009. MR 2567276 Zbl 1188.68177 (English ed.) Zbl 1178.68002 (French ed.) q.v. 781 [84] A. Salomaa, Theory of automata. International Series of Monographs in Pure and Applied Mathematics, 100. Pergamon Press, Oxford etc., 1969. MR 0262021 Zbl 0193.32901 q.v. 770 [85] V. Terrier, On real time one-way cellular array. Theoret. Comput. Sci. 141 (1995), no. 1–2, 331–335. MR 1323161 Zbl 0873.68114 q.v. 777, 778 [86] K. Weihrauch, Computable analysis. An introduction. Texts in Theoretical Computer Science. An EATCS Series. Springer, Berlin, 2000. MR 1795407 Zbl 0956.68056 q.v. 782

Chapter 22

Algebra for trees

Mikołaj Bojańczyk

Contents

1. Introduction
2. Trees as ground terms
3. A recipe for designing an algebra
4. Preclones
5. Forest algebra
6. Seminearring
7. Nesting algebras
8. Recent developments
References

1. Introduction

This chapter presents several algebraic approaches to regular tree languages. An algebra is a powerful tool for studying the structure of a regular tree language, often more powerful than tree automata. This makes algebra a natural choice for proving lower bounds. Another area that makes frequent use of algebra is the search for effective characterisations. Since this is the guiding motivation for this chapter, we begin with a discussion of effective characterisations.

Effective characterisations. Let L be a class of regular languages (for words or trees). We say that L has an effective characterisation if there is an algorithm that inputs a representation of a regular language, and says if the language belongs to L. We are mainly interested in decidability, so we do not pay too much attention to the format in which the input language is represented; it could be, e.g., an automaton or a formula of monadic second-order logic.

On the surface, finding an effective characterisation looks like a strange problem. Its practical applications seem limited. And yet for the last several decades, people have intensively searched for effective characterisations of language classes, originally for word languages, and recently also for tree languages. Why? The reason is that each time someone proved an effective characterisation of some class of languages, the proof would be accompanied by a deeper insight into the structure of the class. One can say that "give an effective characterisation" is a synonym for "understand."


In this sense, we have still not understood numerous tree logics, including first-order logic with descendant, chain logic, PDL, CTL*, and CTL.

A classical example is first-order logic for words. It is one thing to prove that first-order logic cannot define some language, such as (aa)*. It is another thing to prove the theorem of Schützenberger, McNaughton, and Papert, which says that a word language can be defined in first-order logic if and only if its syntactic monoid is group-free. The theorem does not just give one example of a language that cannot be defined in first-order logic, but it describes all such examples – in this case the languages whose syntactic monoids contain a group. Also, the theorem establishes a beautiful connection between formal language theory and algebra.

In recent years, many researchers have tried to extend effective characterisations from word languages to tree languages. This chapter describes some of the known effective characterisations for tree languages, and it gives references to the others. Nevertheless, as mentioned above, many important logics for trees are missing effective characterisations. Of course, algebra has played an important role in the research on tree languages. The goal of this chapter is to describe the algebraic structures that have been developed. However, part of the focus is always on the applications to logic. Therefore, this chapter is as much about logics for trees as it is about algebras for trees. This chapter is exclusively about finite trees.

For word languages, the algebraic approach is to use monoids or semigroups. For tree languages, there is much more diversity. This chapter describes four different algebraic approaches, and gives a recipe to define any number of new ones. One reason for this diversity is that trees are more complicated than words, and there are more parameters to control, such as: are trees ranked, or unranked? are trees sibling-ordered or not? Another reason is that the algebraic theory of tree languages is in a state of flux. We still do not know which of the competing algebraic approaches will prove more successful in the long run. Instead of trying to choose the right algebra, we take a more pragmatic approach. Every algebraic approach is illustrated with some results on tree logics that can be proved using the approach, preferably examples which would require more work using the other approaches.

There is always the question: what is an algebra? What is the difference between an algebra and an automaton? Two differences are mentioned below, but we make no attempt to answer this interesting question definitively. The first difference is that, simply put, an algebra has more elements than an automaton. For instance, in the word case, a deterministic automaton assigns meaningful information to every prefix of a word, while a homomorphism into a finite monoid assigns meaningful information to every infix. This richer structure is often technically useful. For instance, in monoids, one can define a transitive relation on elements, which considers an infix s to be simpler than its extension ust. This relation is used as an induction parameter in numerous proofs about monoids. The relation does not make sense for automata.
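
The Schützenberger–McNaughton–Papert theorem mentioned above is also pleasantly concrete: the syntactic monoid can be computed as the transition monoid of the minimal deterministic automaton, and group-freeness can then be tested directly. The sketch below (Python; the encoding of automata and all helper names are ours) does this for the language (aa)*, whose two-element syntactic monoid contains a group and is therefore not first-order definable.

    def transition_monoid(n_states, alphabet, delta):
        # delta[q][c] is the target state; transformations are tuples over range(n_states)
        gens = {c: tuple(delta[q][c] for q in range(n_states)) for c in alphabet}
        monoid = {tuple(range(n_states))}                      # identity transformation
        while True:
            new = {tuple(f[g[i]] for i in range(n_states))     # apply g, then f
                   for f in monoid for g in gens.values()} - monoid
            if not new:
                return monoid
            monoid |= new

    def aperiodic(monoid, n_states):
        def power(f, k):
            g = tuple(range(n_states))
            for _ in range(k):
                g = tuple(f[g[i]] for i in range(n_states))
            return g
        # group-free (s^omega = s^{omega+1}): testing s^n = s^{n+1} with n = |monoid| suffices
        return all(power(f, len(monoid)) == power(f, len(monoid) + 1) for f in monoid)

    # Minimal DFA for (aa)*: the letter a swaps the two states.
    delta = [{"a": 1}, {"a": 0}]
    M = transition_monoid(2, "a", delta)
    print(aperiodic(M, 2))   # False: (aa)* is not first-order definable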


The second difference is that in algebra, unlike in automata, the recognising device is usually split into two parts: a homomorphism and a target algebra. Consider, as an example, the case of words and monoids. If we just know the target monoid and not the homomorphism, it is impossible to tell if the identity of the monoid is the image of only the empty word, or also of some other words. For reasons that are unclear, this loss of information seems to actually make proofs easier and not harder. At any rate, separating the recognising device into two parts is a method of abstraction that is distinctive of the algebraic approach.

Potthoff's example. Before we begin our discussion of the various algebras, we present a beautiful example due to Andreas Potthoff, see Lemma 5.1.8 in [34]. This example shows how intuitions gained from studying words can fail for trees. Consider an alphabet {a, b}. This alphabet is ranked in the sense that each label of the alphabet determines the number of children used; in this particular example we only consider trees where nodes with letter a have two children, and nodes with letter b are leaves. Let P be the set of such trees where every leaf is at even depth. For instance, P contains the tree b, which has only one node labelled b, but it does not contain the tree a(b, b), which has a root with label a and two children which are leaves labelled by b. We assume here that the depth of a node x is the number of ancestors including x, e.g., the depth of the root is 1.

Intuition suggests that P cannot be defined in first-order logic, for the same reasons that first-order logic on words cannot define (aa)*. If we consider first-order logic with the descendant predicate, then this intuition is correct. Consider the balanced tree tₙ of depth n, defined by t₀ = b and t_{n+1} = a(tₙ, tₙ). An argument using Ehrenfeucht–Fraïssé games shows that every formula of size n will give the same results for all balanced trees of depth greater than 2^n.

What if, apart from the descendant order, we also allow formulas to use sibling order? Sibling order is the relation x < y, which holds when x is a sibling of y, and x is to the left of y. For the alphabet in question, sibling order could be replaced by a unary "left child" predicate, but we use sibling order, since it works well for unranked trees. We first show that a formula with descendant and sibling orders can distinguish binary complete trees of even and odd depth. The idea is to look at the zigzag path, which begins in the root, goes to the left child, then the right child, then the left child and so on until it reaches a leaf. We say a tree satisfies the zigzag property if the zigzag path has even length, which is the same as saying that the unique leaf on the zigzag path is a left child. Consequently, a balanced binary tree of depth n satisfies the zigzag property if and only if n is even. The zigzag property can be defined in first-order logic, using the descendant and sibling orders: one says that there exists a leaf x which is a left child, and such that for every ancestor y of x that has parent z, one of the nodes y, z is a left child, and the other is a right child or the root.

What is more, the zigzag property can be used to actually define the language P.


A tree does not have all leaves at even depth if and only if either: a) the zigzag property is not satisfied; or b) for some two siblings x, y the zigzag property is satisfied by the subtree of x, but not the subtree of y. These conditions can be formalised in first-order logic, and therefore the language P can be defined in first-order logic with descendant and sibling orders.

This example can be used to disprove some intuitions. For instance, the language P is order invariant (i.e., invariant under swapping sibling subtrees). It follows that first-order logic with descendant order only is strictly weaker than order invariant first-order logic with descendant and sibling orders.
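
The interplay between the zigzag property and the language P can also be checked mechanically on small trees. The sketch below (Python; trees are encoded as nested tuples, and all function names are ours) follows the convention that the root has depth 1 and verifies, for all trees up to depth 4, the characterisation just given: a tree has all leaves at even depth exactly when it satisfies the zigzag property and no two sibling subtrees disagree on it.

    def all_leaves_even(t, depth=1):
        if t == "b":
            return depth % 2 == 0
        _, left, right = t
        return all_leaves_even(left, depth + 1) and all_leaves_even(right, depth + 1)

    def zigzag(t):
        # follow the path root -> left -> right -> left -> ...; the property holds
        # iff the leaf reached this way lies at even depth (i.e. is a left child)
        depth, go_left = 1, True
        while t != "b":
            _, left, right = t
            t = left if go_left else right
            go_left = not go_left
            depth += 1
        return depth % 2 == 0

    def siblings_agree(t):
        if t == "b":
            return True
        _, left, right = t
        return (zigzag(left) == zigzag(right)
                and siblings_agree(left) and siblings_agree(right))

    def trees_up_to(depth):
        if depth == 1:
            return ["b"]
        smaller = trees_up_to(depth - 1)
        return ["b"] + [("a", l, r) for l in smaller for r in smaller]

    for t in trees_up_to(4):
        assert all_leaves_even(t) == (zigzag(t) and siblings_agree(t))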

2. Trees as ground terms

The first type of algebra that we talk about works for ranked trees. Ranked trees are built using a ranked alphabet, where each letter is assigned a number, called the letter's arity. A tree over a ranked alphabet is a tree where the number of children of each node is the same as the arity of its label. We write t, s for trees. In particular, leaves are letters of arity zero, also called nullary letters. Since we are considering finite trees, it only makes sense to consider alphabets with at least one nullary letter.

The algebraic approach is to see trees as terms, in an algebra whose signature is given by the ranked alphabet. (More exactly, trees are ground terms, i.e., terms that do not use any variables.) The free algebra corresponds to the set of all trees. A finite algebra corresponds to a (deterministic bottom-up) tree automaton, where the domain of the algebra is the state space. The original paper [38] on regular tree languages by Thatcher and Wright talks about trees and tree automata this way.

Here is the setup. A ranked alphabet is treated as a signature in the sense of universal algebra. Each letter of the ranked alphabet is a function symbol of the same arity. In particular, the nullary letters, or leaves, are constants. We study algebras over this signature. We call them A-algebras when the alphabet is A. Following universal algebra, an A-algebra A is defined by giving a domain H, and an interpretation of each n-ary letter a in the ranked alphabet as a function a^A: H^n → H. A morphism from one A-algebra to another is a mapping from the domain of the first algebra to the domain of the second algebra that preserves the operations in the usual sense. We are only interested in algebras that are accessible, which means that every element of the domain can be obtained by evaluating some expression built out of constants and function symbols. This makes morphisms uninteresting, since there is exactly one morphism between any two accessible A-algebras. We use two types of A-algebra: the free algebra, and algebras with finite domain. The free algebra will correspond to trees, and the finite algebras will correspond to automata.

The free algebra. The domain of the free A-algebra is the set of all trees over the ranked alphabet A, which we denote trees(A). Each n-ary letter a is interpreted as an n-ary operation which takes trees t₁, …, tₙ and returns the tree a(t₁, …, tₙ) shown below:


[figure: the tree with root labelled a and subtrees t₁, t₂, …, tₙ as its children]

This algebra is free in the following sense. For every A-algebra A, there is a unique morphism α from the algebra to A. If A is an A-algebra and t is a tree (an element of the free A-algebra), we write t^A for the image α(t) under this unique morphism. (Unlike the case of monoids or semigroups, the alphabet is interpreted in the signature, and not as generators. In this sense, the set of generators for the free A-algebra is empty, since the trees are built out of constants, or nullary letters. This will change for the other algebras in this chapter.)

Recognising languages. A tree language L over alphabet A is said to be recognised by an algebra A if membership t ∈ L depends only on t^A. When A is finite, it can be viewed as a deterministic bottom-up tree automaton: the state t^A assigned to a tree depends only on the root label of t and the states assigned to the children. Consequently, a tree language over a ranked alphabet is regular if and only if it is recognised by some finite algebra.

Syntactic algebra. We now define the syntactic algebra of a tree language, which plays the same role as the syntactic (or minimal) deterministic automaton for a word language. The definition uses a Myhill–Nerode style congruence, which talks about putting trees in different contexts. Here, a context over alphabet A is defined as a tree over an extended alphabet A ∪ {□}, which includes an additional nullary hole symbol □. A context must use the hole symbol exactly once. The hole plays the role of a variable, but we use the name hole and the symbol □ for consistency with the other parts of this chapter. We write p, q, r for contexts. If p is a context and s is a tree, we write ps for the tree obtained by replacing the hole of p by s.

We now define the syntactic A-algebra of a tree language L over an alphabet A. A non-regular language also has a syntactic A-algebra, but it is infinite. We say that two trees s and t are L-equivalent if there is no context that distinguishes them, i.e., no context p such that exactly one of the trees ps and pt is in L. This equivalence relation is a congruence in the free A-algebra, so it makes sense to consider the quotient of the free A-algebra with respect to L-equivalence. This quotient is called the syntactic A-algebra of L. One can show that the syntactic A-algebra is a morphic image of any other A-algebra that recognises L. A consequence is that a tree language is regular if and only if its syntactic A-algebra is finite.

Limitations of A-algebras. An advantage of the approach described above is that it uses very simple concepts to describe regular tree languages. Arguably, the definition is simpler than the algebraic approach to word languages via semigroups. But is it fair to use the name algebra for an A-algebra? Or is it an automaton? Below we describe some benefits from using algebra (semigroups and monoids) in the word case which are not available for A-algebras.

806

Mikołaj Bojańczyk

correspond to properties of syntactic semigroups, such as “the semigroup is group-free.” Also, important properties of semigroups can be stated by identities, such as the identity s ! D s !C1 ;

which says that a semigroup is group-free, or the identities st D ts

and ss D s;

which say that a semigroup is commutative and idempotent. Unfortunately, we run into problems when we try to do this for A-algebras. The first problem is that the signature (i.e., the set of operations) in an A-algebra depends on the ranked alphabet A. Suppose that we want to talk about the class of tree languages that are invariant under reordering siblings. This property can be expressed using identities, but in a cumbersome way: for each letter a of the alphabet, of arity n, we need identities that imply that the children can be reordered, e.g., a.x1 ; : : : ; xn / D a.xj ; x2 ; : : : ; xj

1 ; x1 ; xj C1 ; : : : ; xn 1 ; xn /

for j D 2; : : : ; n:

A second, and more important, problem is that the set of objects is not rich enough. Consider the group-free identity s ! D s !C1 . What makes this identity so powerful is that it says that the left side can be replaced by the right side in any environment. In terms of words, this means that for sufficiently large n, an infix w n can be replaced by an infix w nC1 . In A-algebras, elements of the algebra correspond to subtrees, and any identity will only allow replacing one subtree with another. For words, this would be like using identities to describe suffixes, and not infixes. This means that very few important properties of tree languages can be described using identities in A-algebras. Terms. As we remarked above, talking about trees and subtrees may be insufficient. Sometimes, we want to talk about contexts, or contexts with several holes, which we call “multicontexts.” Formally speaking, a multicontext over the alphabet A is a tree over alphabet A [ ¹º, where  is a nullary letter. In a multicontext, there is no restriction on the number of times (possibly zero) the hole symbol  is used; this number is called the arity of the multicontext. We number the holes from left to right, beginning with 1 and ending with the arity. There are two kinds of substitution for multicontexts. Suppose that p is an n-ary multicontext, and q is an m-ary multicontext. The first kind of substitution places q in one hole. For any i 2 ¹1; : : : ; nº, we can replace the i -th hole of p by q , the resulting multicontext is denoted p i q , and its arity is n C m 1. The second kind of substitution places q in all holes simultaneously; the resulting multicontext is denoted p  q , and its arity is m  n. When talking about ranked trees, we will only use the first kind of substitution. In the language of universal algebras, the first kind of substitution corresponds to treating a multicontext as a term with multiple variables, each of which is used exactly once, while the second kind of substitution corresponds to treating a multicontext as a term with one variable, which is used possibly many times.

22. Algebra for trees

807

Suppose that A is an A-algebra, with domain H . Every k -ary multicontext p over alphabet A can be interpreted, in a natural way, as a function p A W H k ! H:

For technical reasons, we assume that p is not the empty multicontext . We use the name k -ary A-term for any such function. Nullary A-terms can be identified with the domain H . When A is the free A-algebra, A-terms can be identified with the set of k -ary multicontexts over A. There is a natural definition of substitution for A-terms which mirrors substitution on multicontexts, defined by p A i q A D .p i q/A :

2.1. Definite languages. To illustrate the power of A-algebras, but also the difficulties of trees, we will use A-algebras to study definite languages. This is a class of languages, which in the case of words has a simple and elegant algebraic characterisation. Definite word languages. A word language L is called definite if there is a threshold n 2 N such that membership w 2 L depends only on the first n letters of w . Stated differently, a definite word language over alphabet A is a finite Boolean combination of languages of the form wA for w 2 A . There is simple algebraic characterisation of definite word languages. Suppose that L  A is a word language, whose syntactic semigroup morphism is ˛W AC ! S . Then L is definite if and only if the identity s! t D s! u

(1)

holds for any two elements s; t; u 2 S . The idea is that s represents a long word, and anything after it is not important.1 We prove this characterisation below. We say that elements s; t of the syntactic semigroup S have arbitrarily long common prefixes if for any n 2 N, there are words w; v 2 AC , which are mapped by ˛ to s; t respectively, and which have the same labels up to position n. This can be stated without referring to the morphism ˛ as 8n 2 N

9u1 ; : : : ; un 2 S

!

s; t 2 u1    un S:

It is easy to see that a language is definite if and only if any two elements of S that have arbitrarily long common prefixes are equal. We now explain how the latter property is captured by the identity (1). If n is sufficiently large, then a Ramsey argument can be used to show that for any elements u1 ; : : : ; un , there exist 1 < i < j < n 2 ¹1; : : : ; nº such that ui    uj is idempotent. Therefore, a condition necessary for s; t having arbitrarily long common prefixes is s; t 2 xy ! zS

for some x; y; z 2 S:

(2)

It suffices to take x D u1    ui 1 , y D ui    uj and z D uj C1    un . It is not difficult to see that the above condition is also sufficient, since y ! is of the form u1    un for 1 It is important that ˛ is the semigroup morphism, which represents nonempty words, and not the syntactic monoid morphism, which also represents the empty word. Otherwise, we would have to restrict the identity (1) so that s represents at least one nonempty word.

808

Mikołaj Bojańczyk

arbitrarily large n, e.g., by taking u1 D    D un D y . It follows that L is definite if and only if its syntactic semigroup S satisfies the identity xy ! zs 0 D xy ! zt 0 :

One can show that the above identity is equivalent to (1). Definite tree languages. We now try to generalise the ideas above from words to trees. As long as we are only interested in testing if a tree language is definite, then the above approach works. On the other hand, if we want to know which trees have arbitrarily long prefixes, perhaps in a language that is not definite, then the above approach no longer works. We explain this in more detail below. Consider an A-algebra A. We say that g; h 2 A have arbitrarily deep common prefixes if for any n 2 N there are trees s; t from the free A-algebra, with t A D g and s A D h, which have the same nodes and labels up to depth n. We would like to give an alternative definition, which does not mention trees, which are elements of the infinite free A-algebra. Preferably, the alternative definition would give an effective criterion to decide which elements have arbitrarily deep common prefixes. A simple tree analogue of (2) would be g; h 2 xy ! zA D ¹.xy ! z/.f /W f 2 Aº

for some unary A-terms x; y; z:

(3)

(In the notation xy ! z we treat unary A-terms as elements of a finite semigroup.) Unfortunately, the condition above is not the same as saying that g; h have arbitrarily deep common prefixes. The condition is sufficient (for sufficiency, it is important that our definition of A-term does not allow the empty context ), but not necessary, as demonstrated by the following example, which is due to Igor Walukiewicz. Example 2.1. The alphabet has a binary letter a and nullary letters b; c . Consider the language “all leaves have the same label,” and its syntactic algebra, which has three elements: hb D all leaves have label b; hc D all leaves have label c; ? D the rest:

All three elements of the syntactic algebra have arbitrarily deep common prefixes, but the elements hb and hc cannot be presented as hb D xy ! zgb ;

hc D xy ! zgc

for any choice of gb ; gc . The only possible choice would be gb D hb and gc D hc . The problem is that each context x; y; z comes with its own leaves (recall that the symbol a is binary). The first equality requires the contexts x; y; z to have all leaves with label b , and the second equality requires all leaves to have label c .

22. Algebra for trees

809

Tree prefix game. The above example indicates that idempotents in the semigroup of unary A-terms are not the right tool to determine which trees have arbitrarily deep common prefixes. Then what is the right tool? We propose a game, called the tree prefix game. The game is played by two players, Spoiler and Duplicator. It is played in rounds, and may have infinite duration. At the beginning of each round, there are two elements f1 ; f2 of the algebra, which are initially f1 D g and f2 D h. A round is played as follows. First player Duplicator chooses a letter a of the alphabet, say of arity n, and 2n elements of the algebra f11 ; : : : ; f1n

and f21 ; : : : ; f2n

with f1 D aA .f11 ; : : : ; f1n /

and f2 D aA .f21 ; : : : ; f2n /:

If Duplicator cannot find such elements the game is terminated, and Spoiler wins. Otherwise, Spoiler chooses some i 2 ¹1; : : : ; nº and the game proceeds to the next round, with the elements f1i and f2i . If n D 0 (which implies f1 D f2 ), then the game is terminated, and Duplicator wins. If the game continues forever, then Duplicator wins. Theorem 2.1. Two elements of an algebra have arbitrarily deep prefixes if and only if Duplicator wins the tree prefix game. Note that the above theorem gives a polynomial-time algorithm to decide if two elements of a finite algebra have arbitrarily deep common prefixes, since the tree prefix game is a safety game with a polynomial-size arena, which can be solved in polynomial time. Definite tree languages, again. We have seen before that idempotents are not the right tool to describe trees with arbitrarily deep common prefixes. The solution we proposed was the tree prefix game. This game provides an algorithm that decides if a tree language is definite: try every pair of distinct elements in the syntactic algebra, and see if Duplicator can win the game. If there is a pair where Duplicator wins, then the language is not definite. If there is no such pair, then the language is definite. However, if we are only interested in arbitrarily deep common prefixes as a tool to check if a tree language is definite, then idempotents are enough, as shown by the following theorem. (The language in Example 2.1 is not definite.) Theorem 2.2. Let L be a tree language whose syntactic algebra is A. Then L is definite if and only if every unary A-term u and every pair of elements f; g 2 A satisfy u! f D u! g:

Proof. It is easy to see the “only if” direction. We prove the “if” direction. Let ˛ be the syntactic morphism. Since unary A-terms form a finite semigroup, there must be a number n such that for any unary A-terms u1 ; : : : ; un , there is a decomposition u1    un D xy ! z for some unary A-terms x; y; z . Let a be some nullary (leaf) letter in the alphabet. We claim that every tree s over alphabet A has the same image under ˛ as the tree sO obtained from s by replacing all nodes at depth n by a leaf

810

Mikołaj Bojańczyk

with label a. This implies that L is definite, since ˛ gives the same result for any two trees that have the same nodes up to depth n. So consider a tree s . We prove the claim that ˛.s/ D ˛.Os / by induction on the number of nodes in s that have depth n and are not a leaf with label a. The base case, when all nodes at depth n have label a, is immediate, since it implies s D sO . Consider the induction step. Let v be a node at depth n inside s that is not a leaf with label a. Let p be the context obtained from s by putting the hole in v , and let t be the subtree of v ; we have s D pt . By choice of v and n, there exist unary A-terms x; y; z with ˛.p/ D xy ! z . We use the identity from the statement of the theorem to prove that pt and pa have the same image under ˛ : ˛.pt/ D ˛.p/˛.t/ D xy ! z˛.t/ D xy ! z˛.a/ D ˛.p/˛.a/ D ˛.pa/:

The tree pa has more nodes at depth n with label a than the tree s , so we can use the induction assumption to conclude that pa has the same image under ˛ as p ca D sO.

2.2. First-order logic with child relations. In this section, we state one of the more advanced results connecting logic and algebra. The result talks about a variant of firstorder logic that is allowed to use the child predicate, but not the descendant predicate. Fix a ranked alphabet A. We now define a logic that is used to describe trees over alphabet A. For each label a 2 A, there is a unary predicate a.x/, which says that node x has label a. Let n be the maximal arity of a symbol from A. For any i 2 ¹1; : : : ; nº we have a binary predicate, which says that y is the i -th child of x . Importantly, we do not have a predicate x 6 y for the descendant relation. In this section, we talk about first-order logic with these predicates, which we call first-order logic with child relations. Which tree languages can be defined in first-order logic with child relations? We begin with the straightforward observation that the logic can only define “local” properties. Then we state the main result, Theorem 2.3, which characterises the logic in terms of two identities. Suppose that p is a multicontext of arity k . We say that p appears in node x of a tree t , if the subtree of t in node x can be decomposed as p.t1 ; : : : ; tk / for some trees t1 ; : : : ; tk . A local formula is a statement of the form “multicontext p appears in at least m nodes of the tree,” or a statement of the form “multicontext p appears in the root of the tree.” Of course every local formula can be expressed in first-order logic with child relations. The Hanf locality theorem gives the converse: any formula of first-order logic with child relations is equivalent, over trees, to a Boolean combination of local formulas. This normal form using local formulas explains what can and what cannot be expressed in first-order logic with child relations. We give an illustration below. Example 2.2. The alphabet has a binary letter a, a unary letter b and nullary letters c; d . Consider the language L that consists of trees where the root has label a, and some descendant of the root’s left child has label c . We will show that L cannot be described

22. Algebra for trees

811

by a Boolean combination of local formulas, and therefore L cannot be defined in firstorder logic with child relations. For n 2 N, consider the two trees a.b n .c/; b n .d // 2 L

and a.b n .d /; b n .c// 62 L:

Consider any multicontext p . If all holes in p are at depth at most n, then p appears in the same number of nodes in both trees above. Consequently, the two trees cannot be distinguished by any local formula that uses a multicontext with all holes at depth at most n. It follows that any Boolean combination of local formulas will confuse the two trees, for sufficiently large n. In the example above, any local formula would be confused by swapping two subtrees b n .c/ and b n .d / for sufficiently large n. The reason is that the two subtrees agree on nodes up to depth n. This leads us back to the notion of trees that have arbitrarily deep common prefixes, which was discussed in § 2.1. This notion will be key to the following Theorem 2.3, which characterises the tree languages that can be defined in first-order logic with child relations. To state the theorem, we extend the notion of having arbitrarily deep common prefixes from elements of A to A-terms. We say two A-terms u; v have arbitrarily deep common prefixes if for any n 2 N, one can find multicontexts p; q that have the same nodes and labels up to depth n, and such that u D p A and v D q A . Theorem 2.3. A tree language is definable in first-order logic with child relations if and only if its syntactic algebra A satisfies the following two conditions.  Vertical swap. Suppose that u1 ; u2 are unary A-terms with arbitrarily deep common prefixes, and likewise for v1 ; v2 . Then v1 u1 v2 u2 D v2 u1 v1 u2 :  Horizontal swap. Suppose that h1 ; h2 2 A have arbitrarily deep common prefixes, and w is a binary A-term. Then w.h1 ; h2 / D w.h2 ; h1 /:

References. The algebraic approach presented in this section dates from the first paper on regular tree languages [38]. Variety theory for tree languages seen as term algebras was developed in [37]. Theorem 2.2, one of the first effective characterisations of logics for trees, was first proved in [23]. The idea to study languages via the monoid of contexts is from [39]. The generalisation of classical concepts, such as aperiodicity, star-freeness, and definability in logics such as first-order logic, chain logic or antichain logic was studied in [39], [24], [25], [36], and [35]. Theorem 2.3 was proved in [1]. Effective characterisations for some temporal logics on ranked trees were given in [15], [19], and [32], while [33] gives an effective characterisation of locally testable tree languages.

812

Mikołaj Bojańczyk

3. A recipe for designing an algebra Here are some disadvantages of the A-algebras discussed in Section 2.  The principal disadvantage is that the set of objects described by an A-algebra, namely trees, is not rich enough. Almost any nontrivial analysis of a tree language requires talking about contexts (terms with one hole), or even terms with more than one hole. Why not make these part of the algebra?  The set of operations depends on the ranked alphabet A. One consequence of this is that one cannot define any class of tree languages by a single set of identities; since identities need to refer to a common set of operations. Of course a quick fix is to give separate identities for each alphabet.  The trees are ranked. Unranked trees, where there is no limit on the number of children of a node, are important in computer science, especially in XML. In the unranked case, an alphabet just provides names for the letters, without specifying their arities.  From the point of view of many logics, fixing the number of children for each node is artificial. Consider modal logic, which accesses the tree structure via operators “in some child ' ” and “in all children ' .” A property of trees defined in modal logic should be closed under reordering and duplicating children. Reordering is not a problem, but duplicating is disallowed by the syntax of ranked trees. In the rest of this chapter, we present some algebras that try to solve these problems. However, a problem with designing an algebra for trees is that there are so many parameters to control. Are the trees ranked or unranked? Does the algebra represent only trees? Or does it also represent contexts? Or perhaps also terms of arbitrary arity? Is it legal for a context to have the hole in the root? When studying unranked trees, it makes sense to study trees, contexts and terms which have many roots – which leads to a whole new set of parameters. For each choice of parameters there is an algebra. Due to a lack of space and interest, we will not enumerate all these algebras. One solution for controlling the parameters is the framework of C-varieties given in [31], which also works for trees. We choose a different solution. We give a general recipe for designing an algebra, and then uses it to design some algebras for trees. The recipe requires three steps. Each step is described by a question. 1. What are the objects? In the first step, we choose what objects will be represented. Some possible choices: a. Multicontexts of arities ¹0; 1; : : :º, which may have several roots; b. multicontexts as above, but where none of the holes is in a root; c. multicontexts of arity at most one, which may have several roots. If there are different kinds of objects, then a multisorted algebra might be needed, i.e., an algebra with several different types of objects. For instance, in the last case we would have two sorts: for arities zero and one.

22. Algebra for trees

813

2. What are the operations? In the second step, we design the operations. The operations should not depend on the alphabet. These are designed so that if we take an unranked alphabet A, and start with contexts of the form a (a node with label a, with a single child that is a hole), then all the other objects can be generated using the operations. There are other ways of interpreting letters as generators, but we stay with a in the interest of reducing the already large number of models. Note that the objects represented in the algebra, as chosen in the first step, must include at least the generator contexts. 3. What are the axioms? In the first two steps, we have basically designed the free algebra. In the last step, we provide the axioms. These should be chosen so that the free algebra designed in the first two steps is free in the sense of universal algebra. In other words, if we take all possible expressions that can be constructed from the generators a and the operations designed in the second step; and take quotients of these expressions by the least congruence including the axioms, then we get the objects designed in the first step. We use the recipe to design three algebras: preclones in § 4, seminearrings in § 6, and forest algebra in § 5. The reader can use the recipe to design other algebras. An important algebra not included in this chapter is the tree algebra of Thomas Wilke; see [40]. The algebraic approach to tree languages, as described in the recipe above, was pioneered by this tree algebra. Also, tree algebra was used to give one of the first nontrivial effective characterisations of a tree logic, namely an effective characterisation of frontier testable languages; again see [40]. Nevertheless, tree algebra is omitted from this chapter, mainly due to its close similarity with the forest algebra, which is described in § 5, and which was inspired by tree algebra.

4. Preclones As we saw in the study of definite tree languages in the previous section, in some cases it is convenient to extend an A-algebra with terms of arities 1; 2; 3 and so on. So why not include all these objects in the algebra? This is the idea behind preclones. Objects. The objects represented by a preclone are all multicontexts. For each arity, there is a separate sort. Consequently, there are infinitely many sorts. Operations. Suppose that A is a ranked alphabet. Each letter of arity k can be treated as an element of the sort for arity k . We want to design the operations so that from the letters, all possible multicontexts can be built. All we need is substitution: for an m-ary multicontext, an n-ary multicontext, and a hole number i 2 ¹1; : : : ; mº, return the multicontext p i q of arity m Cn 1, obtained by replacing the i -th hole of p with q . Formally speaking, if the sorts are ¹Tm ºm2N , then we have an infinite set of operations .u 2 Tm ; v 2 Tn / 7 ! u i v 2 TmCn

1

for m; n 2 N and i 2 ¹1; : : : ; mº.

814

Mikołaj Bojańczyk

Axioms. A preclone should satisfy the following associativity axiom for any arities k; n; m and terms u; v; w of these arities, respectively: 8 ˆ 2 in the clone, and a nullary term h. For i 2 N, consider the term ui of arity k i defined by u1 D u;

ui C1 D u.ui ; : : : ; ui /:

Let hi be the nullary term obtained from ui by substituting h in all holes. Since the clone is finitary, there must be some i < j such that hi D hj . By induction on expression size, one shows that any expression built out of uj i and hi that evaluates to a nullary term has value hi . Therefore, the sub-preclone generated by uj i and hi has only one nullary term, namely hi . We will use a lemma on finite semigroups, which is stated below. A proof can be found in [8]. An alternative proof would use Green’s relations. Lemma 4.3. Let s; t be elements of a finite semigroup S . For some s 0 ; t 0 2 S we have sO D sOsO D sOtO and tO D tOtO D tOsO for sO D ss 0 ; tO D t t 0 : Proof of Theorem 4.1. Let the finitary clone be T . Thanks to Lemma 4.2, we may assume that T has only one nullary term, call it h. Let u be some binary term in T . Our goal is to find a sub-preclone U, which has only one unary term v1 , and one binary term v2 . Define two unary terms: s D u 2 h;

t D u 1 h:

Consider the semigroup S of unary terms generated by s; t , and apply Lemma 4.3, resulting in unary terms sO; tO. The term sO will be the unique unary term v1 in the subpreclone U that we are defining. The unique binary term v2 is defined as v1 u s0

t0

v1

v1

Mikołaj Bojańczyk

816

Let U be the sub-preclone of T generated by h; v1 and v2 . We claim that v1 is the only unary term in U and that v2 is the only binary term in U. To prove the claim, we use two sets of identities. The first set of identities says that extending any term from U with v1 , either at the root or in some hole, does not affect that term: v1  v D v

and v i v1 D v

for any k -ary term v in U and i 2 ¹1; : : : ; kº:

(In v1  v , we use the operation  which substitutes v for all holes of v1 . Since v1 is a unary term, this is the same as v1 1 v .) The identities hold when k D 0, since there is only one nullary term. When k > 1, then the root and all holes of v are padded by v1 , which is idempotent, since it was obtained from Lemma 4.3. The second set of identities says that plugging either hole in v2 with the unique nullary term h gives v1 : v2 1 h D v1

and v2 2 h D v1 :

We only prove the first identity; the second one is shown the same way. v2 1 h D

v1

D v1

u

u

s0

t0

v1

v1

h

D v1 D v1 : t t0

t0

v1

v1

h

The first equality is by definition of v2 . The second equality is because h is the only nullary term. The third equality is by definition of s D u 1 h. The last equality is by the properties of v1 D sO from Lemma 4.3. Using the two sets of identities, one shows that if an expression built from h; v1 and v2 evaluates to a term of arity at most two, then that term is one of h; v1 ; v2 . 4.2. An application to logic. In this section we present an application of Theorem 4.1 to logic. Consider words, and first-order logic with the order relation 6 on word positions. Every sentence (a formula without free variables) is logically equivalent to a sentence that uses only three variables. This follows, for instance from Kamp’s theorem [28] on the equivalence of first-order logic and LTL, or from the McNaughtonPapert theorem [30] on the equivalence of first-order logic and star-free expressions. The reason is that any LTL formula, or any star-free expression, can be translated into a sentence of first-order logic with at most three variables. The three-variable theorem fails on trees if the signature has the descendant order, but does not have access to sibling order. The counterexample is very simple. Consider an alphabet with a letter a of arity 2n C 1, and nullary letters b and c . A sentence with n

22. Algebra for trees

817

variables, which uses the descendant order and labels, cannot distinguish the two trees n times

n C 1 times

‚ …„ ƒ ‚ …„ ƒ a. b; : : : ; b; c; : : : ; c /

n C 1 times n times

‚ …„ ƒ ‚ …„ ƒ and a.b; : : : ; b ; c; : : : ; c /:

A sentence with n C 1 variables can distinguish the two trees. For n D 3, the example above shows that three variables are not sufficient to capture all first-order logic. The counterexample above no longer works if we allow sibling order. (Sibling order is the partial order which orders siblings, and does not order node pairs that are not siblings. Equivalently, one can use the lexicographic linear order on nodes. This is because in the presence of the descendant order, the sibling and lexicographic orders can be defined in terms of each other.) Actually, under the signature which has the descendant and sibling orders, every sentence can be expressed using only three variables. The picture becomes more interesting if we allow free variables in formulas. Of course, if a formula has more than three free variables, then some special statement of the three-variable theorem is needed. Here is a solution, which works for words: any formula with free variables x1 ; : : : ; xn is equivalent, over words, to a Boolean combination of formulas .xi ; xj / that have two free variables and use three variables. What about trees? The following theorem shows that the result fails. Theorem 4.4. Consider first-order logic with the descendant and sibling orders. The following formula with free variables x; y; z , 8u

.u 6 x ^ u 6 y/ H) .u 6 z/;

is not equivalent to any Boolean combination of formulas with two free variables. A corollary is that the formula from the theorem is not equivalent to any formula which uses only three variables, including the bound variables. Suppose that the equivalent formula is '.x; y; z/. By stripping ' to the first quantifiers, we see that ' is a Boolean combination of two-variable formulas, which is impossible by Theorem 4.4. The rest of this section is devoted to proving Theorem 4.4. The proof uses preclones and Theorem 4.1. We first show how preclones can describe formulas with free variables. Let xN D x1 ; : : : ; xn be a tuple of nodes in a tree t . The order type of xN in t consists of information about how the nodes in the tuple are related with respect to the descendant and lexicographic order. Define x0 to be the root. The xN -decomposition of t is a tuple .p0 ; p1 ; : : : ; pn /, where pi is the multicontext obtained from t by setting the root in xi and placing holes in all the minimal proper descendants of xi from ¹x1 ; : : : ; xn º.

Lemma 4.5. Let  be a formula with free variables, over a ranked alphabet A. There is a morphism ˛ from the free preclone over A into a finitary preclone T such that the answer of  for a tuple of nodes xN in a tree t depends only on the order type of xN and the image under ˛ of the xN -decomposition of t .

The lemma above actually holds even if  is defined in a stronger logic, namely MSO. Since the lemma is our only interface to the logic in the proof below, we see that

Mikołaj Bojańczyk

818

a stronger version of Theorem 4.4 holds: the formula from the theorem is not equivalent to any Boolean combination of MSO formulas with two free variables. Proof of Theorem 4.4. Suppose that .x; y; z/ is equivalent to a Boolean combination of formulas with two free variables. These formulas can be of the form .x; y/, .y; z/ or .x; z/. Let € be the set of these formulas  . Toward a contradiction, we will construct trees t1 ; t2 and tuples of nodes .x1 ; y1 ; z1 /; .x2 ; y2 ; z2 / such that t1 ˆ

.x1 ; y1 ; z1 /

but for every formula  2 € , we have

and t2 6ˆ

.x2 ; y2 ; z2 /;

(4)

t1 ˆ .x1 ; y1 / () t2 ˆ .x2 ; y2 /; t1 ˆ .y1 ; z1 / () t2 ˆ .y2 ; z2 /; t1 ˆ .x1 ; z1 / () t2 ˆ .x2 ; z2 /:

For each  2 € , apply Lemma 4.5, resulting in a morphism ˛ from the free preclone over alphabet A into a finitary preclone T . Let T be the product of these preclones, and let ˛ be the corresponding product morphism. Apply Theorem 4.1 to the preclone T , obtaining a sub-preclone U. Let s be some tree that is mapped by ˛ to the unique nullary term v0 in U, and let p be some binary multicontext that is mapped by ˛ to the unique binary term v2 in U. Consider the two trees t1 D

t2 D

p s

p

p s

p s

s

s s

Define x1 ; y1 ; y1 to be the roots of the three s trees in t1 , from left to right. Likewise for x2 ; y2 ; z2 . It is not difficult to see that (4) holds. We now show that each binary query  2  gives the same answer for .x1 ; y1 / in t1 as it does for .x2 ; y2 / in t2 . The same argument works for the other two combinations of variables. By Lemma 4.5, it suffices to show that the order type is the same for .x1 ; y1 / is the same as for .x2 ; y2 /, which it is, and that the images under ˛ are the same for the .x1 ; y1 /-decomposition of t1 and for the .x2 ; y2 /-decomposition of t2 . But these images are necessarily the same, since they belong to the pre-clone U, which has only one term in the nullary and binary sorts. References. Preclones where introduced in [20]. One of the themes studied in [20] was the connection of first-order logic to the block product for preclones; we will come back to such questions in § 7. Theorem 4.1 was proved in [8], although not in the formalism of preclones. Theorem 4.4 was suggested by Balder ten Cate.

5. Forest algebra We now present the second algebraic structure designed according to the recipe from § 3, which is called forest algebra [16]. Forest algebra is defined for unranked trees. In an unranked tree, the alphabet A does not give arities for the letters. A tree

22. Algebra for trees

819

over an unranked alphabet has no restriction on the number of children, apart from finiteness. Objects. We work with ordered sequences of unranked trees, which we call forests. We adapt the definition of contexts to the unranked setting in the natural way, with the added difference that we allow several roots. More formally, a context over an unranked alphabet A is an ordered sequence of trees over alphabet A [ ¹º, where the symbol  appears in exactly one leaf. We allow the hole to appear in a root, and also a context  that consists exclusively of the hole. (Multicontexts, which have several holes, are not considered in forest algebra. They will appear in seminearrings, an algebra described in § 6). Here are some examples: a a

a b

c

a

b a

c

b

a tree

c

b

a tree

b b

a

b

c

b

a c

a

a b

a b

c

a forest

a

c

c

a context

c

c

a context

In a forest algebra, we choose our objects to be forests and contexts. These live in two separate sorts. These sorts are denoted H (as in horizontal), for the forest sort, and V (as in vertical) for the context sort. Operations. We interpret each letter a of an unranked alphabet as the following context, which is also denoted a. The operations below are designed so that for any alphabet, starting with contexts above, we can build all forests and contexts.  Two constants: an empty forest 0, and an identity context .

 Concatenating forests s and t , written s Ct . This operation is illustrated below: a a

b c

D

c

b a

b

C b

a a

b c

C c a

D b

a a

b b

C c a

b

c

We can also concatenate a context p and a forest t , the result is a context p C t ; likewise we can get t C p . We cannot concatenate two contexts, since the result would have arity two. Formally speaking, there are three concatenation operations, of types forest-forest, context-forest and forest-context, which formally should be denoted CHH , CVH and CHV . In most cases, we will skip these subscripts, which are determined by the sorts of the arguments.

Mikołaj Bojańczyk

820

 Composing two contexts p and q , written p  q . This operation is illustrated below: a c a b a c

a a

a b

a b

c

c

c

p

c

c

a a c a b

b q

c

c

b

pq

We can also substitute a forest t into the hole of a context p , the result is written p  t . (Again, we have two different types of  operation, which should formally be distinguished by subscripts V V and VH .) As usual with multiplicative notation, we sometimes skip the dot and write pq instead of p  q . We also assume that  has precedence over C, so pq C s means .p  q/ C s and not p  .q C s/. It is not difficult to see that any forest or context over alphabet A can be constructed using the above operations from the contexts 0,  and ¹aºa2A . The construction is by induction on the size of the forest or context, and corresponds to a bottom up-pass. Axioms. So far, we know that a forest algebra is presented by giving two sorts: forests H and contexts V , along with operations: CHH W H  H ! H;

0 2 H;  2 V; CHV W H  V ! V;

V V W V  V

! V;

CVH W V  H ! V;

VH W V  H ! H:

Of course, there are some axioms that need to satisfied, if we want the objects represented by the algebra to be forests and contexts: 1. .H; CHH ; 0/ is a monoid (called the horizontal monoid); 2. .V; V V ; / is a monoid (called the vertical monoid); 3. the operations CHV W H  V

! V;

CVH W V  H ! V;

VH W V  H ! H

are, respectively, a left monoidal action of H on V , a right monoidal action of H on V , and a left monoidal action of V on H . In other words, the following hold for any g; h 2 H and v; w 2 V : .h CHH g/ CHV v D h CHV .g CHV v/; 0 CHV v D v; v CVH .h C g/ D .v CVH h/ CVH g; v CVH 0 D v; .v V V w/ VH h D v VH .w VH h/;  VH h D h; .h CHV v/ CVH g D h CHV .v CVH g/:

This completes the design process.

22. Algebra for trees

821

Free forest algebra. The notion of forest algebra morphism is inherited from universal algebra. A forest algebra morphism between two forest algebras is a function which maps the horizontal sort of the first algebra into the horizontal sort of the second algebra, and which maps the vertical sort of the first algebra into the vertical sort of the second algebra. We write a morphism ˛ from a forest algebra .H; V / into a forest algebra .G; W / as ˛W .H; V / ! .G; W /:

For an unranked alphabet A, we define free forest algebra over an alphabet A, which is denoted by .HA ; VA /. The elements of the horizontal sort HA are all forests over A, and the elements of the vertical sort are all contexts over A. This is indeed a free object in the category of forest algebras, as stated by the following result. Let .H; V / be a forest algebra, and consider any function f W A ! V . There exists a unique forest algebra morphism ˛W .HA ; VA / ! .H; V / which extends the function f in the sense that ˛.a/ D f .a/ for all a 2 A.

Recognising languages. A forest language L over alphabet A is said recognised by a morphism ˛ from the free forest algebra over A into a forest algebra .H; V / if membership t 2 L depends only on ˛.t/. A language is recognised by a forest algebra if it is recognised by a forest algebra morphism into that algebra. One can show finite forest algebras recognise exactly the regular forest languages, in any one of the many equivalent definitions of regularity for forest languages (such as hedge automata or MSO). Syntactic forest algebra. Forest algebra also has a notion of syntactic object. For an alphabet A, consider a forest language L, which is not necessarily regular. We define a Myhill–Nerode equivalence relation on the free forest algebra as follows: s L t p L q

holds for s; t 2 HA if ps 2 L () pt 2 L for all p 2 VA ;

holds for p; q 2 VA if rps 2 L () rqs 2 L for all r 2 VA , s 2 HA :

This two-sorted equivalence relation is a congruence for all operations of forest algebra, hence it makes sense to consider a quotient of the free forest algebra with respect to the equivalence. This quotient is called the syntactic forest algebra of L, and it is denoted by .HL ; VL /. The morphism which maps each forest or context to its equivalence class is called the syntactic forest algebra morphism, and is denoted by ˛L . As is typical for syntactic morphisms, any (surjective) forest algebra morphism that recognises L can be uniquely extended to ˛L . Consequently, a language is regular if and only if its syntactic forest algebra is finite. A simple example: label testable languages. A forest language is called label testable if membership of a forest in the language depends only on the set of labels that appear in the forest. In other words, this is a Boolean combinations of languages of the form “forests that contain some node with label a.”

Mikołaj Bojańczyk

822

Theorem 5.1. A forest language L is label testable if and only if its syntactic forest algebra .HL ; VL / satisfies the identities: vv D v;

vw D wv

for v; w 2 VL :

Proof. The “only if” part is straightforward; we only concentrate on the “if” part. Suppose that identity in the statement of the theorem is satisfied. We will show that for every forest t , its image ˛L .t/ under the syntactic forest algebra morphism ˛L depends only on the set of labels appearing in t . We start by showing that the two equations from the statement of the theorem imply another three. The first is the idempotency of the horizontal monoid: h C h D .h C /.h C /0 D .h C /0 D h:

The second is the commutativity of the horizontal monoid:

h C g D .h C /.g C /0 D .g C /.h C /0 D g C h:

Finally, we have an equation that allows us to flatten the trees: vh D h C v0:

The proof uses, once again, commutativity of the vertical monoid: vh D v.h C /0 D .h C /v0 D h C v0:

We will show that using the identities above, every forest t has the same image under ˛ as a forest in a normal form a1 0 C    C an 0, where each tree contains only one node, labelled ai . Furthermore, the labels a1 ; : : : ; an are exactly the labels used in t , sorted without repetition under some arbitrary order on the set A. Starting from the normal form one can first use idempotency to “produce” as many copies of each label as the number of its appearances in the tree. Then using the last equation and the commutativity one can reconstruct the tree starting from leaves and proceeding to the root. If we omit the equation vv D v , we get languages that can be defined by a Boolean combination of clauses of the forms: “label a occurs at least k times,” or “the number of occurrences of label a is k mod n.” 5.1. A more difficult example: the logic EF. In this section we use forest algebras to give an effective characterisation of a temporal logic called EF. Even thought the proof is a bit involved (but not too much), we present it in full, because it illustrates how concepts familiar from semigroup theory appear in forest algebra, in a suitably generalised form. These concepts include ideals and Green’s relations. The logic EF. We begin by defining the logic. Because forest algebras are better suited to studying forests, rather than trees, we provide a slightly unusual definition of the logic EF, which allows formulas to express properties of forests. There are two kinds of EF formulas: tree formulas, which define tree languages, and forest formulas, which define forest languages. The most basic formula is a, for any letter of the alphabet, this is a tree formula that is satisfied by trees with a in the

22. Algebra for trees

823

root. If ' is a tree formula, then EF' is a forest formula, which defines the set of forests t where some subtree satisfies ' . (If t1 ; : : : ; tn are trees, then a subtree of the forest t1 C    C tn is a subtree of any of the trees t1 ; : : : ; tn , possibly one of these trees.) If ' is a forest formula, then Œ' is a tree formula that is satisfied by trees of the form at , where t is a forest satisfying ' , and a is any label. Finally, both tree and forest formulas allow Boolean operations _, ^ and :. As far as properties of trees are concerned, our definition of EF is equivalent to the standard definition – just use ŒEF' instead of EF' . Theorem 5.2. A forest language is definable by a forest formula of EF if and only if its syntactic forest algebra satisfies the following identities, called the EF identities: g C h D h C g;

vh D h C vh:

(5) (6)

This theorem is stated for the nonstandard forest variant of EF. However, it can be used to characterise the tree variant. One can show that a tree language L can be defined by a tree formula of EF if and only if for every label a, the forest language ¹tW at 2 Lº can be defined by a forest formulas of EF. We begin the proof with the “only if” implication in the theorem. Fix a forest formula ' of EF. For a forest t , consider the set of tree subformulas of ' that are true in some subtree of t . We say that two forests are ' -equivalent if these sets coincide for them. Observe that if forests s; t are ' -equivalent, then so are ps; pt for any context p . Consider the syntactic forest algebra morphism of the forest language defined by ' . By definition of the syntactic morphism, it follows that ' -equivalent forests have the same image under thee syntactic forest algebra morphism. It is also easy to see that s C t is ' -equivalent to t C s for any forests s; t . Likewise, pt is ' -equivalent to t C pt . It follows that the EF identities must be satisfied by the syntactic forest algebra of a forest language defined by ' . The rest of § 5.1 is devoted to the more interesting “if” implication in Theorem 5.2. Consider a forest algebra morphism ˛ from a free forest algebra .HA ; VA / into a finite forest algebra .H; V / that satisfies the two EF identities. We will show that any forest language recognised by ˛ can be defined by a forest formula of EF. This gives the “if” implication in the case when ˛ is the syntactic forest algebra morphism. The proof is by induction on the size of H . The induction base, when H has one element, is immediate. A forest algebra with one element in H can only recognise two forest languages: all forests, or no forests. Both are definable by forest formulas of EF. The key to the proof is a relation on H , which we call reachability. We say that h 2 H is reachable from g 2 H if there is some v 2 V such that h D vg . Another definition is that h is reachable from g if V h  Vg. That is why we write h 6 g when h is reachable from g . This relation is similar in spirit to Green’s relations. (Note that both H and V are monoids, so they have their own Green’s relations in the classical sense.) Lemma 5.3. If the EF identities are satisfied, then reachability is a partial order.

Mikołaj Bojańczyk

824

Proof. The reachability relation is transitive and reflexive in any forest algebra. To prove that it is antisymmetric, we use the EF identities. Suppose that g; h 2 H are reachable from each other, i.e., g D vh and h D wg for some v; w 2 V . Then they must be equal: (6)

(5)

(6)

h D wvh D vh C wvh D vh C h D h C vh D vh D g:

The reachability order has smallest and greatest elements, which we describe below. Every forest is reachable from the empty forest 0, which is the greatest element. The smallest element, call it h? , is obtained by concatenating all the elements of H into a single forest h1 C    C hn . This element is reachable from all elements of H , by h? D .h1 C    C hi

1

C  C hi C1 C    C hn /hi :

For h 2 H , we define Hh  H to be the set of elements from which h is not reachable. This set is an ideal of forest algebra in the sense that the following inclusions hold: Hh C H  Hh ;

H C Hh  Hh ;

VHh  Hh :

Consider the equivalence h , which identifies all elements of Hh into one equivalence class, and leaves all other elements distinct. We can extend this equivalence to V by keeping all elements of V distinct. Because Hh is an ideal, the resulting two-sorted equivalence is a congruence for all operations of forest algebra. Therefore, it makes sense to consider the quotient forest algebra morphism ˛h from .H; V / into the quotient forest algebra .H; V /=h . Our strategy for the rest of the proof is as follows. For every element h 2 H , we will prove that the inverse image ˛ 1 .h/ can be defined by a forest formula of EF – call it 'h . Consequently, every language recognised by ˛ is definable by a forest formula of EF, as a finite disjunction of formulas 'h . In our analysis, we pay special attention to subminimal elements. An element h 2 H is called subminimal if there is exactly one element g < h, the smallest element h? . Consider an element h that is neither subminimal nor the smallest element. Recall the congruence h and the resulting forest algebra morphism ˛h . By definition of ˛h , for any forest t with h reachable from ˛.t/, the images of t under ˛h and ˛ are the same. It follows that ˛ 1 .h/ is recognised by the morphism ˛h . Since Hh contains at least two elements (a subminimal element and the smallest element) then h is a nontrivial congruence. Consequently, ˛ 1 .h/ can be defined by a forest formula of EF, using the induction assumption. Consider an element h that is subminimal. Then Hh contains the smallest element h? and all the subminimal elements different from h. (Here we use the assumption that 6 is a partial order.) Therefore, if there are at least two subminimal elements, then ˛ 1 .h/ is definable by a forest formula of EF, say 'h , for any element h ¤ h? . It remains toWgive a formula for h? . This formula says that h? corresponds to all other forests: : h¤h? 'h . We are left with the case where there is exactly one subminimal element, say h . By the reasoning above we have formulas for all elements except the smallest h? and

22. Algebra for trees

825

the unique subminimal h . It is therefore enough to give a formula that distinguishes between the two. We do this below. We begin by defining some tree formulas. Consider an element h that is neither h? nor h . Below we define a tree formula h of EF that describes trees mapped by ˛ to h. _ a ^ Œ'g : h D a2A;g>h ˛.a/gDh

Thus far, for each element h other than h? and h we have a forest formula 'h and a tree formula h . We would like similar formulas for h . However, we will have to deal with a certain loss of precision: instead of saying that a tree (or forest) is mapped to h , we will say that it is mapped to either h or h? . This is accomplished by the formulas ^ ^ 'h and 'h D h: h D h¤h? ;h

h¤h? ;h

We now conclude the proof, by giving a forest formula of EF that describes ˛ 1 .h? /. When is a forest t mapped by ˛ to h? ? One possibility is that t has a subtree mapped to h? . If that subtree is taken minimal, then it is of the form as , with ˛.s/ ¤ h? and ˛.a/˛.s/ D h? . Another possibility is that t has no subtrees with mapped to h0 , in which case t is a concatenation of trees t D t1 C    C tn with ˛.t1 /; : : : ; ˛.tn / 2 H

¹h? º;

˛.t1 / C    C ˛.tn / D h? :

By expressing the above analysis in a formula, we see that any forest mapped by ˛ to h? satisfies the following formula: _ _ a ^ Œ'h  _ EF h1 ^    ^ EF hn : a2A;h¤h? ˛.a/hDh?

h1 ;:::;hn 2H ¹h? º h1 CChn Dh?

We also prove the converse implication: any forest that satisfies the above formula is mapped by ˛ to h? . Suppose that a forest satisfies the first disjunct, for some a and h with ˛.a/h D h? . This means that the forest has a subtree as , where s satisfies 'h . If h is not h , then we know that s is mapped to h by ˛ , and therefore as is mapped to h? by ˛ . Any forest that has a subtree mapped to h? must necessarily be mapped to h? itself, by definition of the reachability order. Suppose that a forest t satisfies the second disjunct, for some choice of h1 ; : : : ; hn . Let t1 ; : : : ; tn be the subtrees of t that satisfy the formulas h1 ; : : : ; hn . One possibility is that some hi is h , and the tree ti is mapped to h? (recall the loss of precision in the formula h ). In this case we are done, since also t is mapped to h? . Otherwise, every tree ti is mapped to hi . Now consider any of the trees ti . Since it is a subtree of t , there is a context pi with t D pi ti . Therefore, we have (6)

˛.t/ D ˛.pi ti / D ˛.ti / C ˛.pi ti / D hi C ˛.t/:

Mikołaj Bojańczyk

826

By applying the above to all the trees ti , we get ˛.t/ D h1 C    C hn C ˛.t/ D h? C ˛.t/ D h? :

References. Forest algebra was introduced in [16]. The characterisation of EF also comes from [16], although it is based on a characterisation for ranked trees from [15]. Forest algebra was used to obtain algebraic characterisations of several tree logics: a variant of EF with past modalities [5]; Boolean combinations of purely existential firstorder formulas [13]; languages that correspond to level 2 of the quantifier alternation hierarchy [12]. A variant of forest algebra for infinite trees was proposed in [9].

6. Seminearring In this section we present the third and final algebraic structure that is designed using the recipe in § 3. The algebraic structure is called a seminearring; it is like a semiring, but with some missing axioms. Objects. We want to represent all multicontexts, which are unranked forests with any number of holes. More formally, a multicontext over an unranked alphabet A is an ordered sequence of unranked trees over alphabet A[¹º, where the symbol  appears only in leaves. We allow the hole to appear in a root, and also multicontexts that consist exclusively of holes. Here are some examples: a a c

b b

a

b

c

b

a c

a a c

a b

b a b

c

a forest, which is a multicontext of arity 0

c

c

a context, which is a multicontext of arity 1

c

a multicontext of arity 3

In a seminearring, we choose our objects to be all multicontexts. A seminearring has only one sort; all multicontexts live in the same sort. Operations. We have to design the operations so that for any alphabet A, starting with contexts a, we can build all other multicontexts.  Two constants: a multicontext 0 with no holes and no nodes; and a multicontext  with a single hole.  Concatenating two multicontexts p and q . The result, denoted p C q has an arity which is sum of arities of p and q . This operation is illustrated below: a a

b c

=

c

b a

b

C b

a a

b c

C c a

= b

a a

b b c

C c a

b

22. Algebra for trees

827

 Composing two multicontexts p and q . The result, denoted p  q or pq , is obtained from p by substituting each hole by q . The arity of pq is the product of arities of p and q : a a c a c a a c

a c

a b

b a b

a b

c

c

c

b a b c

a c

c

a b

c

c

p

pq

q

Any multicontext p over alphabet A can be constructed using concatenation and composition from the multicontexts 0,  and ¹aºa2A . The construction is by induction on the size of p , and corresponds to a bottom up-pass. This completes the second state of the design process. Axioms. Consider two expressions that use the operations above, e.g. .a C .b C c//  d

and ..a  d / C .b  d // C .c  d /:

These expressions should be equal, since they describe the same multicontext, namely a

b

c

d

d

d

(As usual, we treat each letter a as describing the multicontext a.) We need to design the axioms so that they imply equality of the two expressions given above, and other equalities like it. We use the following axioms, which can be stated as identities: 1. 2. 3. 4.

concatenation C is associative, with neutral element 0; composition  is associative, with neutral element 2 ; for any p; q; r 2 N we have left-distributivity: .p C q/  r D p  r C q  r ; in the composition monoid, 0 annihilates to the left: 0  p D 0.

The axioms above complete the design process. The algebraic object described above is called a seminearring. It is like a semiring, but some axioms are missing. In § 6.2, we will say what properties of trees can be described by those seminearrings which are semirings. We will write U; V; W for seminearrings, and u; v; w for their elements. 2 For consistency with the other algebras in this chapter, we use  instead of the more common 1.

Mikołaj Bojańczyk

828

Free seminearring. If A is an alphabet, we write VA for the seminearring whose domain consists of all multicontexts over alphabet A. This seminearring is free in the following sense. Let A be an alphabet and V be a seminearring. Any function from ¹aW a 2 Aº to the domain of V extends uniquely to a morphism ˛W VA ! V.

Recognising languages. We say that a forest language L is recognised by a seminearring morphism ˛W VA ! V if for every forest t , treated as a nullary multicontext, membership t 2 L depends only on the image ˛.t/. Syntactic seminearring. A syntactic seminearring can be defined, like for forest algebra, using a Myhill–Nerode equivalence relation on the free seminearring as follows: p 'L q

holds for p; q 2 VA if r1 pr2 0 2 L () r1 qr2 0 2 L for all r1 ; r2 2 VA :

Since elements of VA are multicontexts, and elements of the form r2 0 are all forests, the condition for p 'L q can be restated as rpt 2 L () rqt

for every forest t and multicontext r .

This is a congruence for all operations of seminearring, and hence it makes sense to define the syntactic seminearring and the syntactic seminearring morphism using the quotient under this relation. 6.1. From a seminearring to a forest algebra and back again. Suppose that .H; V / is a forest algebra. There is a natural way to construct a seminearring out of this forest algebra. Consider the least family V of functions vW H ! H which

 contains the constant function g 7! h for every h 2 H ;  contains the function h 7! vh for every v 2 V ;  is closed under concatenation: if v; w belong V, then so does the function h 7! vh C wh;  is closed under composition: if v; w belong V, then so does the function h 7! w.vh/.

When equipped with the seminearring operations in the natural way, this set forms a seminearring, which we call the seminearring induced by the forest algebra .H; V /.

Lemma 6.1. The syntactic seminearring of a forest language is isomorphic to the seminearring induced by its syntactic forest algebra. From the above lemma it follows that two languages have the same syntactic forest algebra, then they have the same syntactic seminearring. In general, the converse fails. For the forest sort, there is no problem: use elements of the form v0. However, there is a problem for the context sort, since there is no way of telling which elements of a seminearring correspond to multicontexts of arity 1. This is illustrated in the following example. Example 6.1. We present two languages, which have the same syntactic seminearring, but different syntactic forest algebras.


• The alphabet is {a, b}. The language is "there is an a." The syntactic seminearring has three elements 0, □, 1. The syntactic morphism is described below:
  0 is the image of multicontexts of arity 0 without an a;
  □ is the image of multicontexts of arity at least 1 without an a;
  1 is the image of multicontexts with an a.
The operations in the seminearring are defined by the axioms and by

  1 + v = v + 1 = 1 · v = v · 1 = 1.

The syntactic forest algebra of this language has two elements in the context sort; these elements correspond to □ and 1. There is no element corresponding to 0, since every context must have a hole.

• The alphabet is {a, b, c}. The language is "some node has label a, but no ancestor with label b." The syntactic seminearring is isomorphic to the one above, only the syntactic morphism is different, as described below:
  0 is the image of multicontexts that have no a without b ancestors, and where every hole (if it exists) has a b ancestor;
  □ is the image of multicontexts that have no a without b ancestors, but where some hole has no b ancestors;
  1 is the image of multicontexts that have an a without b ancestors.
The syntactic forest algebra of this language has three elements in the context sort; these elements correspond to 0, □ and 1.
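As a sanity check (a sketch of my own; the names ZERO, BOX, ONE are ad hoc, and the value □ + □ = □ is taken from the syntactic morphism, since the axioms alone do not fix it), the operation tables of the first item do form a seminearring:

```python
ZERO, BOX, ONE = "0", "B", "1"     # B stands for the box element □
elems = [ZERO, BOX, ONE]

def add(p, q):
    if ONE in (p, q):
        return ONE                 # 1 + v = v + 1 = 1
    if p == ZERO:
        return q                   # 0 is neutral
    if q == ZERO:
        return p
    return BOX                     # B + B = B (value of the syntactic morphism)

def mul(p, q):
    if p == BOX:
        return q                   # B is neutral; otherwise 0 · q = 0 and 1 · q = 1
    return p                       # note that this also gives p · B = p

for p in elems:
    assert add(ZERO, p) == p == add(p, ZERO)
    assert mul(BOX, p) == p == mul(p, BOX)
    assert mul(ZERO, p) == ZERO
    for q in elems:
        for r in elems:
            assert add(add(p, q), r) == add(p, add(q, r))
            assert mul(mul(p, q), r) == mul(p, mul(q, r))
            assert mul(add(p, q), r) == add(mul(p, r), mul(q, r))   # left distributivity
print("the three-element structure is a seminearring")
```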

Note that the first language in the example can be defined in the logic EF. The second example cannot be defined in EF, since it violates identity (6): e.g. the forests ba and ba + a have different images under the syntactic forest algebra morphism. This shows that there is no way of telling if a forest language can be defined in EF just by looking at its syntactic seminearring.

This is a general theme in the algebraic theory of tree languages. When we use an algebra that represents a richer set of objects, we lose information in the syntactic object. The classic example is that when we look at the syntactic monoid instead of the syntactic semigroup, we do not know if the identity element in the syntactic monoid represents some nonempty words in addition to the empty word. Of course a solution is to recover the lost information by taking into account the syntactic morphism. This solution is illustrated by the following theorem, which characterises EF in terms of the syntactic seminearring morphism.

Theorem 6.2. A forest language can be defined by a forest formula of EF if and only if its syntactic seminearring morphism α_L : V_A → V_L satisfies the identity

  v + w = w + v    (7)

for any elements v, w ∈ V_A, and the identity

  v = v + □    (8)

for any element v ∈ V_A that is the image under α_L of a multicontext of arity at least one.


One way to prove the theorem above is to use Theorem 5.2. One shows that the conditions (7) and (8) on the syntactic seminearring morphism above are equivalent to the conditions (5) and (6) on the syntactic forest algebra. Next, one applies Theorem 5.2. Is there another way? What if one tries to prove Theorem 6.2 directly, using seminearrings? If, like in the forest algebra version, one tries to take the quotient under an ideal, the exposition becomes cumbersome, since one must take care to distinguish elements that are images of multicontexts of arity at least one.

The problems indicated above suggest the following heuristic: when studying a class of languages, use the algebraic structure which has the richest possible set of objects, and still allows one to describe the class without referring to the syntactic morphism.

Example 6.2. Consider first-order logic with descendant and sibling orders. This example shows that seminearrings might not be the right formalism, since just by looking at the syntactic seminearring one cannot tell if a language can be defined in the logic.

• The alphabet is A = {a}. The language, call it L, contains trees where each leaf is at even depth, and each node has either zero or two children. As we have shown in the introduction, this language can be defined in first-order logic.
• The alphabet is B = {a, b}. The language, call it K, contains trees where each leaf is at even depth; each node with label a has zero or two children; each node with label b has zero or one child. This language cannot be defined in first-order logic, since it requires distinguishing between bⁿ and bⁿ⁺¹ for arbitrarily large n.

We claim the two languages have the same syntactic seminearring. Consider the two seminearring morphisms

  α : V_A → V_B,    β : V_B → V_A,

which preserve multicontexts over the alphabet {a}, and such that β(b) = a + a. One can show that

  L = α⁻¹(K),    K = β⁻¹(L).

Consequently, L is recognised by the syntactic seminearring of K, and K is recognised by the syntactic seminearring of L. It follows that these seminearrings are isomorphic. On the other hand, one can show that the syntactic forest algebra of a language uniquely determines if the language can be defined in first-order logic (although we still do not know an algorithm which does this).

6.2. Which languages are recognised by semirings? The axioms of a seminearring look very similar to those of a semiring, but some semiring axioms are missing. First, a semiring requires addition to be commutative:

  w + v = v + w.    (9)


This is the least important difference, since in many examples addition will actually be commutative. The second difference with semirings is that we do not require + to distribute over · to the right, i.e. we do not require

  u · (v + w) = u · v + u · w.    (10)

As we will see below, adding the above axiom corresponds to restricting all regular tree languages to a natural subclass, called the path testable languages. Often, one requires a semiring to also satisfy the following axiom:

  v · 0 = 0.    (11)

Unlike right distributivity (10), adding the above axiom does not seem to make any sense for multicontext algebra. We use the term semiring for a seminearring that satisfies the axioms (9) and (10). We do not require 0 to annihilate on the right in composition, i.e. we do not require (11).

Which forest languages are recognised by semirings? It turns out that these are the languages which are determined by the set of words labelling paths in a forest. These are described in more detail below.

Path testable languages. Let x be a node in a forest over an alphabet A. The path leading to x is the set of ancestors of x, including x itself. The labelling of a path is defined to be the word in A* obtained by reading the labels of the nodes in the path, beginning in the unique root in the path, and ending in x. For any word language L ⊆ A*, we define a forest language E_L as follows. A forest belongs to E_L if the labelling of some path (possibly a path that does not end in a leaf) belongs to L. A path testable language is a Boolean combination of languages of the form E_L (there can be different languages L involved in the Boolean combination).

Theorem 6.3. A regular forest language is path testable if and only if its syntactic seminearring is a semiring where addition (i.e. concatenation) is idempotent.

Proof. The "only if" implication is straightforward, because the syntactic seminearring of a path testable language satisfies the two identities. Note that when w = 0, right distributivity says

  u · (v + 0) = u · v + u · 0,

which explains why the paths in path testable languages need not end in a leaf. For instance, the language "some leaf has label a" is not path testable. Consider now the converse implication. We will prove that if V is a semiring with idempotent addition, then any forest language recognised by a seminearring morphism α : V_A → V is path testable. Consider a forest s. For a path π in s, we write tree_s(π) for the tree which consists exclusively of the nodes from s that appear in π (all nodes in this tree, except the last one, have one child). The key observation is that any forest is equivalent to the concatenation of its paths:

  α(s) = Σ_{π a path in s} α(tree_s(π)).    (12)


Note that the symbol Σ can be used without indicating an order on paths, since concatenation is commutative. The statement above is proved by induction on the size of s, using right distributivity. For any v ∈ V, let L_v be the set of forests s where some path π satisfies α(tree_s(π)) = v. This language is clearly path testable. Consequently, for any W ⊆ V, the language L_W, defined as

  L_W = ⋂_{w ∈ W} L_w  ∖  ⋃_{w ∉ W} L_w,

is also path testable; this is the set of forests where paths are mapped by α exactly to the elements of W. By (12) and idempotency of addition, a forest s is mapped by α to an element w ∈ V if and only if there is some set W ⊆ V such that s ∈ L_W and ΣW = w. Therefore, the forests that are mapped by α to w form a path testable language, as a finite union of path testable languages L_W, and likewise for the language L itself.

References. To the author's best knowledge, this chapter is the first time that seminearrings are explicitly used to recognise forest languages. The results on path testable languages in § 6.2 are adapted from the forest algebra characterisation of path testable languages in [16].
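Before moving on to nesting algebras, here is a small self-contained sketch (my own illustration; the helper names are hypothetical) of the path decomposition behind Theorem 6.3: taking the value of a forest to be its set of path labellings, with union as an idempotent and commutative addition, the analogue of equation (12) can be checked directly, and a property of the form E_L is then evaluated from that set alone.

```python
# A forest is a tuple of trees; a tree is a pair (label, forest of children).
s = (("a", (("b", ()), ("c", (("a", ()),)))),
     ("b", ()))

def paths(forest, prefix=""):
    """All path labellings of a forest: for each node x, the word read from the
    root of x's tree down to x (a path need not end in a leaf)."""
    result = set()
    for label, children in forest:
        word = prefix + label
        result.add(word)
        result |= paths(children, word)
    return result

def tree_of_path(word):
    """The unary tree tree_s(pi) spelling a given path labelling."""
    forest = ()
    for label in reversed(word):
        forest = ((label, forest),)
    return forest

# analogue of equation (12): the path labellings of s are the union, over all
# paths pi of s, of the path labellings of tree_s(pi)
assert paths(s) == set().union(*(paths(tree_of_path(w)) for w in paths(s)))

# E_L for L = "words containing an a" only looks at the path labellings
print(any("a" in w for w in paths(s)))   # True
```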

7. Nesting algebras

In this section we define an operation on algebras that simulates nesting of formulas. This type of operation can be defined for all the algebraic structures described in this chapter. We show it for seminearrings.

Wreath product of seminearrings. Consider two seminearrings U, V. We distinguish the operations of these seminearrings by subscripts, e.g., +_U is the concatenation in U and 0_V is the additive neutral element in V. Below we define a new seminearring U ∘ V, which is called the wreath product. In the definition, we use the set V₀ = {v ·_V 0_V : v ∈ V}, whose elements we denote by letters f, g, h. The carrier of the wreath product consists of pairs (φ, v), where the first coordinate is a function φ : V₀ → U and the second coordinate is an element v ∈ V. The carrier can be viewed as a cartesian product of V₀ copies of U and a single copy of V. The cartesian product interpretation explains the constants 0, □ and the concatenation operation in the wreath product, which are defined coordinate-wise. The composition operation is not defined coordinate-wise; it is defined by

  (φ, u) · (ψ, v) = (f ↦ φ(v ·_V f) ·_U ψ(f), u ·_V v)

for φ, ψ : V₀ → U and u, v ∈ V.
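The following sketch (my own, not from the chapter) spells this construction out for finite seminearrings given by operation tables. It takes both factors to be the three-element seminearring {0, B, 1} discussed earlier (writing B for the box element □) and brute-force checks the seminearring axioms for the product, anticipating Lemma 7.1 below.

```python
from itertools import product

class SNR:
    """A finite seminearring given by its carrier, operations and constants."""
    def __init__(self, elems, add, mul, zero, box):
        self.elems, self.add, self.mul, self.zero, self.box = elems, add, mul, zero, box

# the three-element seminearring {0, B, 1} from Example 6.1
def add3(p, q): return "1" if "1" in (p, q) else ("B" if "B" in (p, q) else "0")
def mul3(p, q): return q if p == "B" else p
S3 = SNR(("0", "B", "1"), add3, mul3, "0", "B")

def wreath(U, V):
    """Wreath product U ∘ V; the first coordinate phi : V0 -> U is stored as a
    tuple indexed by the elements of V0 = {v · 0 : v in V}."""
    V0 = tuple(sorted({V.mul(v, V.zero) for v in V.elems}))
    idx = {f: i for i, f in enumerate(V0)}
    elems = tuple((phi, v) for phi in product(U.elems, repeat=len(V0)) for v in V.elems)
    zero = (tuple(U.zero for _ in V0), V.zero)
    box = (tuple(U.box for _ in V0), V.box)
    def add(x, y):                                   # concatenation: coordinate-wise
        (phi, u), (psi, v) = x, y
        return (tuple(U.add(phi[i], psi[i]) for i in range(len(V0))), V.add(u, v))
    def mul(x, y):                                   # composition, as in the definition
        (phi, u), (psi, v) = x, y
        first = tuple(U.mul(phi[idx[V.mul(v, f)]], psi[idx[f]]) for f in V0)
        return (first, V.mul(u, v))
    return SNR(elems, add, mul, zero, box)

W = wreath(S3, S3)
for p in W.elems:
    assert W.add(W.zero, p) == p == W.add(p, W.zero)
    assert W.mul(W.box, p) == p == W.mul(p, W.box)
    assert W.mul(W.zero, p) == W.zero
for p, q, r in product(W.elems, repeat=3):
    assert W.add(W.add(p, q), r) == W.add(p, W.add(q, r))
    assert W.mul(W.mul(p, q), r) == W.mul(p, W.mul(q, r))
    assert W.mul(W.add(p, q), r) == W.add(W.mul(p, r), W.mul(q, r))
print("U ∘ V is a seminearring with", len(W.elems), "elements")   # 27 elements
```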

Lemma 7.1. The wreath product of two seminearrings is also a seminearring.

Proof. The additive structure is a monoid, as a cartesian product of monoids. It is not difficult to see that □ is an identity for composition, and that 0 from the additive monoid is a left zero for composition. For associativity of the composition operation,


one notices that the wreath product of seminearrings is a special case of the wreath product of transformation semigroups, and the latter preserves associativity. It remains to show left distributivity:

  ((φ₁, v₁) + (φ₂, v₂)) · (ψ, v) ≟ ((φ₁, v₁) · (ψ, v)) + ((φ₂, v₂) · (ψ, v)).    (13)

Let us inspect the first coordinate of the left side of the equation,

  f ↦ χ(v ·_V f) ·_U ψ(f),

where χ is the function

  f ↦ φ₁(f) +_U φ₂(f).

Therefore, the above becomes

  f ↦ (φ₁(v ·_V f) +_U φ₂(v ·_V f)) ·_U ψ(f).

By left distributivity of U, the above is the same as

  f ↦ (φ₁(v ·_V f) ·_U ψ(f)) +_U (φ₂(v ·_V f) ·_U ψ(f)),

which is the first coordinate of the right side of (13). The second coordinates agree by assumption on V.

Nesting forest languages. To explain the connection between wreath product and nesting of logical formulas, we provide a general notion of temporal logic, where the operators are given by regular forest languages. Fix an alphabet A, and let L₁, …, L_k be forest languages that partition all forests over A. We can treat this partition as an alphabet B, with one letter per block L_i of the partition. The partition and alphabet are used to define a relabelling, which maps a forest t over alphabet A to a forest t[L₁, …, L_k] over alphabet A × B. The set of nodes in t[L₁, …, L_k] is the same as in t, except that each node x gets a label (a, L_i), where the first coordinate a is the old label of x in t, while the second coordinate L_i is the unique language L_i that contains the subforest of x in t. If L is a forest language over alphabet A × B, we define L{L₁, …, L_k} to be the set of all forests t over alphabet A for which

  t[L₁, …, L_k] ∈ L.

The operation of language composition is similar to formula composition. The definition below uses this intuition, in order to define a "temporal logic" based on operators given as forest languages. First however, we comment on a technical detail concerning alphabets. In the discussion below, a forest language is given by two pieces of information: the forests it contains, and the input alphabet. For instance, we distinguish between the set L₁ of all forests over alphabet {a}, and the set L₂ of all forests over the alphabet {a, b} where b does not appear. The idea is that sometimes it is relevant to consider a language class 𝓛 that contains L₁ but does not contain L₂ (although such classes will not appear in this particular chapter). This distinction will be captured by our notion of language class: a language class is actually a mapping 𝓛, which associates to each finite alphabet a class of languages over this alphabet.


Let 𝓛 be a class of forest languages, which will be called the language base. The temporal logic with language base 𝓛 is defined to be the smallest class TL[𝓛] of forest languages that contains 𝓛, is closed under Boolean operations and under language composition, i.e.,

  L₁, …, L_k, L ∈ TL[𝓛]  ⟹  L{L₁, …, L_k} ∈ TL[𝓛].
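A minimal sketch (my own illustration) of the relabelling and of language composition; it assumes that "the subforest of x" means the forest of subtrees strictly below x, and it uses block indices in place of the letters of B.

```python
# A forest over alphabet A is a tuple of trees (label, forest of children).
def relabel(t, blocks):
    """t[L_1, ..., L_k]: every node keeps its old label and additionally records
    the index of the block of the partition containing its subforest (here taken
    to be the forest strictly below the node)."""
    def block_of(forest):
        return next(i for i, L in enumerate(blocks) if L(forest))
    return tuple(((label, block_of(children)), relabel(children, blocks))
                 for label, children in t)

def language_compose(L, blocks):
    """L{L_1, ..., L_k}: the forests whose relabelling lies in L."""
    return lambda t: L(relabel(t, blocks))

# toy instance: the partition is ("contains an a", "contains no a"), and L asks
# for some node whose second coordinate is the a-block; the composed language
# is then "some node has an a strictly below it"
def contains_a(forest):
    return any(label == "a" or contains_a(children) for label, children in forest)

blocks = [contains_a, lambda f: not contains_a(f)]

def sees_a_block(forest):
    return any(label[1] == 0 or sees_a_block(children) for label, children in forest)

has_a_strictly_below = language_compose(sees_a_block, blocks)

t1 = (("b", (("c", (("a", ()),)),)),)    # an a strictly below the root: accepted
t2 = (("a", ()),)                        # the only a has nothing above it: rejected
print(has_a_strictly_below(t1), has_a_strictly_below(t2))   # True False
```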

Below we give examples of how the above definition can be used to describe some common temporal logics. The proofs in the examples are straightforward inductions.

• Consider the class 𝓛_EF of languages "forests over alphabet A that contain some a," for every alphabet A and letter a ∈ A. Then TL[𝓛_EF] is exactly the class of forest languages that can be defined by a forest formula of EF, as defined in § 5.1.
• Consider the class 𝓛_PT of path testable languages, as defined in § 6.2. Then TL[𝓛_PT] is exactly the class of forest languages that can be defined by a formula of the temporal logic PDL.
• Consider the subclass 𝓛_LTL of path testable languages, where the word languages L in the definition of the languages E_L are restricted to LTL-definable word languages. Then TL[𝓛_LTL] is exactly the class of forest languages that can be defined by a formula of the temporal logic CTL*.

Wreath product and nesting temporal formulas. We can now state the connection between wreath product of seminearrings and nesting of languages.

Theorem 7.2. Let 𝓤 be a class of seminearrings, and 𝓛 the forest languages recognised by seminearrings in 𝓤. Then TL[𝓛] is exactly the class of languages recognised by iterated wreath products of 𝓤, i.e. by seminearrings in {U₁ ∘ ⋯ ∘ Uₙ : U₁, …, Uₙ ∈ 𝓤}.

Corollary 7.3. A forest language is definable in PDL if and only if it is recognised by an iterated wreath product of additively idempotent semirings. Likewise for CTL*, with the additional requirement that the semirings are multiplicatively aperiodic.

The good thing about Corollary 7.3 is that it connects three concepts from different areas: temporal logic, wreath products, and semirings. The bad thing is that it does not really give any insight into the structure of the syntactic seminearrings of languages from PDL and CTL*. All we know is that these syntactic seminearrings are quotients of iterated wreath products; but the whole structure of the wreath product gets lost in a quotient.

References. In this section, we talked about wreath products of seminearrings. Similar operations have been studied for all the other algebraic structures. For A-algebras, one can consider the cascade product, as studied in [18, 4]. For preclones, the more powerful block product was studied in [20]; one can also use the cascade/wreath product [18]. For forest algebras, the natural product seems to be the wreath product, as studied in [14]. In all cases, there is a strong connection with nesting of temporal formulas (an exception is the block product, which is better suited to simulating quantifiers in first-order logic). The definition of language nesting is based on notions introduced by Ésik in [17], and is identical to the definition in [14].


8. Recent developments

Because of the complexity of putting together a big collection like this handbook, a few years have passed between writing this chapter and submitting its final version. Instead of rewriting the chapter – which I would probably do using the terminology of monads [6] – we only add a list of references to some new developments.

Probably the most important material that is missing in this chapter is the connection with the theory of finite algebras that was developed by the universal algebra community, mainly Tame Congruence Theory, developed by Hobby and McKenzie [26] in the 1980s. This material can hardly be called recent, but for some reason it has been unknown in the automata and logic community, a situation that has only begun to change since techniques from universal algebra have received exposure due to their applications to Constraint Satisfaction Problems [29]. An attempt to relate some of the theory of universal algebra to classifying logics on trees can be found in [10].

There has also been some progress on classifying regular tree languages. For example, [22] gives some new sufficient conditions for definability in the logic PDL discussed in Corollary 7.3, and outlines a program for showing that definability in PDL is decidable. Another new theme is infinite trees: [11], [2], [3], and [27] develop algebraic frameworks to classify languages of infinite trees, and apply these frameworks to classify some language classes, such as Boolean combinations of open sets or certain variants of weak monadic second-order logic.

Also, an increasing role in the classification of regular tree languages is being played by the separation problem. In [21], the authors show that it is decidable if two regular tree languages can be separated by a piecewise testable language. This separation result has a more elegant proof than its special case of deciding if a tree language is itself piecewise testable [13]; greater simplicity and generality seems to be typical for separation results. Another result on separation [7] is that it is undecidable if two regular tree languages can be separated by a deterministic tree-walking automaton, a result which suggests that some more effort could be devoted to finding classification questions for regular tree languages that are undecidable.

Acknowledgement. I would like to thank my colleagues for their helpful comments, especially Tomasz Idziaszek, Luc Segoufin, Howard Straubing, Igor Walukiewicz, Pascal Weil, and Thomas Wilke. Special thanks are due to Jeffrey Shallit, for his many helpful suggestions for improving my writing. The work of this chapter was supported by ERC Starting Grant "Sosna."

References [1] M. Benedikt and L. Segoufin, Regular tree languages definable in FO and in FOmod . ACM Trans. Comput. Log. 11 (2009), no. 1, article 4, 1–32. MR 2664302 Zbl 1351.68134 q.v. 811 [2] A. Blumensath, Recognisability for algebras of infinite trees. Theoret. Comput. Sci. 412 (2011), no. 29, 3463–3486. MR 2839691 Zbl 1233.68159 q.v. 835


[3] A. Blumensath, An algebraic proof of Rabin’s tree theorem. Theoret. Comput. Sci. 478 (2013), 1–21. MR 3028619 Zbl 1283.68222 q.v. 835 [4] M. Bojańczyk, Decidable properties of tree languages. Ph.D. thesis, Warsaw University, Warsaw, 2004. q.v. 834 [5] M. Bojańczyk, Two-way unary temporal logic over trees. Log. Methods Comput. Sci. 5 (2009), no. 3, 3:5, 29 pp. MR 2529658 Zbl 1168.03009 q.v. 826 [6] M. Bojańczyk, Recognisable languages over monads. In I. Potapov (ed.), Developments in language theory. Proceedings of the 19th International Conference (DLT 2015) held in Liverpool, July 27–30, 2015. Lecture Notes in Computer Science, 9168. Springer, Cham, 2015, 1–13. MR 3440657 Zbl 1434.68307 q.v. 835 [7] M. Bojańczyk, It is undecidable if two regular tree languages can be separated by a deterministic tree-walking automaton. Fund. Inform. 154 (2017), no. 1–4, 37–46. MR 3690569 Zbl 1390.68374 q.v. 835 [8] M. Bojańczyk and T. Colcombet, Tree-walking automata cannot be determinized. Theoret. Comput. Sci. 350 (2006), no. 2–3, 164–173. MR 2197199 Zbl 1086.68070 q.v. 815, 818 [9] M. Bojańczyk and T. Idziaszek, Algebra for infinite forests with an application to the temporal logic EF. In CONCUR 2009 – concurrency theory (M. Bravetti and G. Zavattaro, eds.). Proceedings of the 20 th International Conference held in Bologna, September 1–4, 2009. Lecture Notes in Computer Science, 5710. Springer, Berlin, 2009, 131–145. MR 2556882 Zbl 1254.68158 q.v. 826 [10] M. Bojańczyk and H. Michalewski, Some connections between universal algebra and logics for trees. Preprint, 2017. arXiv:1703.04736 [cs.FL] q.v. 835 [11] M. Bojańczyk and T. Place, Regular languages of infinite trees that are Boolean combinations of open sets. In Automata, languages, and programming (A. Czumaj, K. Mehlhorn, A. Pitts, and R. Wattenhofer, eds.). Part II. Proceedings of the 39th International Colloquium (ICALP 2012) held at the University of Warwick, Warwick, July 9–13, 2012. Lecture Notes in Computer Science, 7392. Springer, Berlin, 2012, 104–115. MR 2995598 Zbl 1367.68160 q.v. 835 [12] M. Bojańczyk and L. Segoufin, Tree languages defined in first-order logic with one quantifier alternation. Log. Methods Comput. Sci. 6 (2010), no. 4, 4:1, 26 pp. MR 2729598 Zbl 1202.03047 q.v. 826 [13] M. Bojańczyk, L. Segoufin, and H. Straubing, Piecewise testable tree languages. Log. Methods Comput. Sci. 8 (2012), no. 3, 3:26, 32 pp. MR 2987919 Zbl 1261.03126 q.v. 826, 835 [14] M. Bojańczyk, H. Straubing, and I. Walukiewicz, Wreath products of forest algebras, with applications to tree logics. Log. Methods Comput. Sci. 8 (2012), no. 3, 3:19, 39 pp. MR 2981938 Zbl 1258.03044 q.v. 834 [15] M. Bojańczyk and I. Walukiewicz, Characterizing EF and EX tree logics. Theoret. Comput. Sci. 358 (2006), no. 2–3, 255–272. MR 2250435 Zbl 1097.03013 q.v. 811, 826 [16] M. Bojańczyk and I. Walukiewicz, Forest algebras. In Logic and automata (J. Flum, E. Grädel, and T. Wilke, eds.). History and perspectives. Texts in Logic and Games, 2. Amsterdam University Press, Amsterdam, 2008, 107–131. MR 2508742 Zbl 1217.68123 q.v. 818, 826, 832 [17] Z. Ésik, Characterizing CTL-like logics on finite trees. Theoret. Comput. Sci. 356 (2006), no. 1–2, 136–152. MR 2217833 Zbl 1160.68408 q.v. 834 [18] Z. Ésik and S. Iván, Products of tree automata with an application to temporal logic. Fund. Inform. 82 (2008), no. 1–2, 61–78. MR 2372750 Zbl 1136.68037 q.v. 834


[19] Z. Ésik and S. Iván, Some varieties of finite tree automata related to restricted temporal logics. Fund. Inform. 82 (2008), no. 1–2, 79–103. MR 2372751 Zbl 1136.68038 q.v. 811 [20] Z. Ésik and P. Weil, Algebraic recognizability of regular tree languages. Theoret. Comput. Sci. 340 (2005), no. 2, 291–321. MR 2150756 Zbl 1078.68100 q.v. 818, 834 [21] J. Goubault-Larrecq and S. Schmitz, Deciding piecewise testable separability for regular tree languages. In 43 rd International Colloquium on Automata, Languages, and Programming (I. Chatzigiannakis, M. Mitzenmacher, Y. Rabani, and D. Sangiorgi, eds.). Proceedings of the colloquium (ICALP 2016) held in Rome, July 12–15, 2016. LIPIcs. Leibniz International Proceedings in Informatics, 55. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2016, art. no. 97, 15 pp. MR 3577158 Zbl 1388.68172 q.v. 835 [22] M. Hahn, A. Krebs, and H. Straubing, Wreath products of distributive forest algebras. In LICS ’18—33 rd Annual ACM/IEEE Symposium on Logic in Computer Science (A. Dawar and E. Grädel, eds.). Held in Oxford, July, 09–12, 2018. ACM Press, New York, 2018, 512–520. MR 3883758 q.v. 835 [23] U. Heuter, Definite tree languages. Bull. European Assoc. Theor. Comput. Sci. 35 (1988), 137–142. Zbl 0676.68027 q.v. 811 [24] U. Heuter, Zur Klassifizierung regulaerer Baumsprachen. Ph.D. thesis. RWTH Aachen, Aachen, 1989. q.v. 811 [25] U. Heuter, First-order properties of trees, star-free expressions, and aperiodicity. RAIRO Inform. Théor. Appl. 25 (1991), no. 2, 125–145. MR 1110980 Zbl 0741.68065 q.v. 811 [26] D. Hobby and R. McKenzie, The structure of finite algebras. Contemporary Mathematics, 76. American Mathematical Society, Providence, R.I., 1988. MR 0958685 Zbl 0721.08001 q.v. 835 [27] T. Idziaszek, M. Skrzypczak, and M. Bojańczyk, Regular languages of thin trees. Theory Comput. Syst. 58 (2016), no. 4, 614–663. MR 3483906 Zbl 1350.68173 q.v. 835 [28] J. A. Kamp, Tense logic and the theory of linear order. Ph.D. thesis. Univ. of California, Los Angeles, 1968. q.v. 816 [29] A. Krokhin and S. Živný (eds.), The constraint satisfaction problem: complexity and approximability. Papers based on the Dagstuhl Seminar 15301 held in July 2015. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2017. MR 3631046 Zbl 1375.68019 q.v. 835 [30] R. McNaughton and S. A. Papert, Counter-free automata. With an appendix by W. Henneman. MIT Research Monograph, 65. The MIT Press, Cambridge, MA, and London, 1971. MR 0371538 Zbl 0232.94024 q.v. 816 [31] J.-É. Pin and H. Straubing, Some results on C-varieties. Theor. Inform. Appl. 39 (2005), no. 1, 239–262. MR 2132590 Zbl 1083.20059 q.v. 812 [32] T. Place, Characterization of logics over ranked tree languages. In Computer science logic (M. Kaminski and S. Martini, eds.). Proceedings of the 22nd International Workshop (CSL 2008), the 17th Annual Conference of the EACSL, held in Bertinoro, September 16–19, 2008. Springer, Berlin, 2008, 401–415. MR 2540258 Zbl 1156.03328 q.v. 811 [33] T. Place and L. Segoufin, A decidable characterization of locally testable tree languages. Log. Methods Comput. Sci. 7 (2011), no. 4, 4:3, 25 pp. MR 2861683 Zbl 1237.68119 q.v. 811 [34] A. Potthoff, Logische Klassifizierung regulärer Baumsprachen. Ph.D. thesis. Institut für Informatik und Praktische Mathematik. Universität Kiel, Kiel, 1994, Bericht Nr. 9410. q.v. 803


[35] A. Potthoff, Modulo-counting quantifiers over finite trees. Theoret. Comput. Sci. 126 (1994), no. 1, 97–112. MR 1268024 Zbl 0798.03007 q.v. 811 [36] A. Potthoff and W. Thomas, Regular tree languages without unary symbols are star-free. In Fundamentals of computation theory (Z. Ésik, ed.). Proceedings of the Ninth International Conference (FCT ’93) held in Szeged, August 23–27, 1993. Lecture Notes in Computer Science, 710. Springer, Berlin, 1993, 396–405. MR 1260508 Zbl 0794.68088 q.v. 811 [37] M. Steinby, A theory of tree language varieties. In Tree automata and languages (M. Nivat and A. Podelski, eds.). Papers from the workshop held in Le Touquet, June 1990. Studies in Computer Science and Artificial Intelligence, 10. North-Holland Publishing Co., Amsterdam, 1992, 57–81. MR 1196732 Zbl 0798.68087 q.v. 811 [38] J. W. Thatcher and J. B. Wright, Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Systems Theory 2 (1968), 57–81. MR 0224476 Zbl 0157.02201 q.v. 804, 811 [39] W. Thomas, Logical aspects in the study of tree languages. In Ninth colloquium on trees in algebra and programming (B. Courcelle, ed.). Proceedings of the colloquium held in Bordeaux, March 5–7, 1984. Cambridge University Press, Cambridge, 1984, 31–50. MR 0787450 Zbl 0557.68051 q.v. 811 [40] T. Wilke, An algebraic characterization of frontier testable tree languages. Theoret. Comput. Sci. 154 (1996), no. 1, 85–106. MR 1374381 Zbl 0871.68110 q.v. 813

Index

2-renewing sequence . . . . . . . . 538 —A— Ap . . . . . . . . . . . . abelianisation map . . . . AC0 . . . . . . . . . . . ACC0 . . . . . . . . . . accessible – automaton . . . . . . – state . . . . . . . . . action – algebra . . . . . . . . – free – . . . . . . . . . – lattice . . . . . . . . a-cycle . . . . . . . . . . additive theory of the reals advice . . . . . . . . . . a-graph . . . . . . . . . a-level . . . . . . . . . . algebra – action – . . . . . . . – compact – . . . . . . – congruence – . . . . . – free – . . . . . . . . . – homomorphism . . . – Kleene – . . . . . . . – residuated – . . . . – with domain . . . . – with tests . . . . . – locally finite – . . . . – profinite – . . . . . . – Hopfian – . . . . . – self-free – . . . . . – pro-T – . . . . . . . . – quotient – . . . . . . – relatively free – . . .

. . . .

. . . .

. . . .

572, 626 . . 961 497, 574 . . 497

. . . . . . . . . . . .

8 5

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . .

. . 755 . . 621 . . 617 . . 617 . . 617 751–754 . . 755 . . 753 . . 753 . . 634 . . 624 . . 627 . . 628 . . 624 . . 617 . . 617

. . . . . . . . . .

. . . . .

755 871 756 550 229 1483 . 550 . 550

– residually in C . . . . . . . . . 617 – subalgebra . . . . . . . . . . . 617 – syntactic ranked tree – . . . . . 805 – term . . . . . . . . . . . . . . 617 – topological – . . . . . . . . . . 621 – generator . . . . . . . . . . . 621 – recognisable subset . . . . . 622 – residually in a class . . . . . 622 – self-free – . . . . . . . . . . 628 – trivial – . . . . . . . . . . . . 617 – uniform – . . . . . . . . . . . 622 algorithm – compression . . . . . . . . . . 537 – extension . . . . . . . . . . . . 541 – Hopcroft’s – . . . . . . . . . . 345 – McNaughton–Yamada – 45, 49, 66, 72 – state elimination – . . . . . . . 424 almost – finite type shift . . . . . . . . 1022 – prime . . . . . . . . . . . . . . 920 alphabet . . . . . . . . . . . . . . . 4 – canonical – . . . . . . . . . . . 950 – input – . . . . . . . . . . . . . . 81 – involutive – . . . . . . . . . . 842 – output – . . . . . . . . . . . . . 81 – ranked – . . . . . . . . . . . 1300 amalgamated free product . . . . . 864 amenable group . . . . . . . . . . 891 amplitude . . . . . . . . . 1459, 1462 analytic set . . . . . . . . . . . . . 702 anticipation block map . . . . . . . 990 aperiodic – identity . . . . . . . . . . . . 47, 72 – monoid . . . . . . . . . . . 572, 853 – group-free – . . . . . . . . . 499

xxiv

Index

– semigroup . . . . . . . . . . . 626 – tiling . . . . . . . . . . . . . . . 93 Arden’s lemma . . . . . . 46, 49, 65, 67 arena . . . . . . . . . . . . . . . . 274 arithmetic – cardinal – . . . . . . . . . . . 698 – integer – . . . . . . . . . . . 1205 – Presburger – . 957, 1193, 1199, 1201 – progression . . . . . . . . . . . 948 – real – . . . . . . . . . . . . . 1205 arithmetical hierarchy . . . . . . . 793 Artin group . . . . . . . . . . . . 879 Artin–Schreier polynomial . . . . . 938 atom . . . . . . . . . . . . . . . . 413 átomaton . . . . . . . . . . . . . . 413 automata – enumeration . . . . . . . . . . 460 – group . . . . . . . . . 842, 885–902 – minimisation of – . . . 1190, 1472 – morphism of – . . . . . . . . . . 17 – see also automaton automatic – group . . . . . . . . . . . 875–883 – normal forms . . . . . . . . 875 – right/left – . . . . . . . . . . 878 – mapping . . . . . . . . . . . . 885 – monoid . . . . . . . . . . . . . 878 – presentation . . . . . . 1036, 1038 – equivalent – . . . . . . . . 1057 – injective – . . . . . . . . . 1040 – with advice . . . . . . . . 1061 – real numbers . . . . . . . . . . 922 – semigroup . . . . . . . . . . . 878 – sequence . . . . . . . . . . 914, 953 – set . . . . . . . . . . . . . . . 916 – structure . . . . . . 875, 1036, 1038 – geodesic . . . . . . . . . . . 877 – with uniqueness . . . . . . . 877 – transformation . . . . . . . . . 885 automaton . . . . . . . . 5, 42, 61, 737 – 1.5-way quantum – . . . . . . 1478 – 2D – . . . . . . . . . . . . . . 312 – deterministic – . . . . . . . . 316

– accessible – . . . . . . . . . . . 8 – adjacency matrix . . . . . . . . 999 – almost finite type – . . . . . . . 355 – ˛ -extensible – . . . . . . . . . 541 – alternating – – finite – (AFA) . . 443, 770, 1419 – tree . . . . . . . . . . . . 1433 – Antimirov – . . . . . . . . . . 422 – asynchronous – . . . . . . . 1176 – behaviour . . . . . . . . 43, 62, 66 – bideterministic – . . . . . . 417, 442 – bipartite – . . . . . . . . . . 1013 – biseparable – . . . . . . . . . . 414 – bistochastic quantum finite – (BiQFA) . . . . . . . . . . . 1465 – Blum and Hewitt – . . . . . . . 317 – Boolean – . . . . . . . . . 443, 770 – bounded – . . . . . . . . . . . 891 – Büchi – . . . . . . . . . . . 1419 – Carton–Michel – . . . . . . . . 194 – Cayley – . . . . . . . . . 897–898 – cellular – . . . . . . . . . . . . 777 – Černý – . . . . . . . . . . . . 535 – characteristic – . . . . . . . . . 173 – coaccessible – . . . . . . . . . . 8 – Cocke–Younger–Kasami – . . 1407 – communicating – . . . . . . . 1149 – complete – . . . . . . . 8, 444, 1019 – unambiguous – . . . . . . . 194 – conjugate – . . . . . . . . . . . 999 – contained in another . . . . . 1018 – contracting – . . . . . . . . 891, 897 – cyclic – . . . . . . . . . . . . 353 – decomposition – . . . . . . . 1013 – depth – . . . . . . . . . . . . . 343 – desert – . . . . . . . . . . . . 781 – deterministic finite – (DFA) . 7, 411, 1004, 1462 – complete – . . . . . . . . . . 412 – minimal – . . . . . . . . . . 412 – one-way – (1DFA) . . . . . 1462 – quotient . . . . . . . . . . . 412 – two-way – (2DFA) . . . . . 1462

Index

– dimension . . . . . . . . . . . . 42 – distance – . . . . . . . . . . . 672 – Earley’s – . . . . . . . . . . 1405 – edge . . . . . . . . . . . . . . 998 – equation – . . . . . . . . . . . 422 – equivalence . . . . . . . . . . . 6 – essential – . . . . . . . . . . . 998 – Eulerian – . . . . . . . . . . . 542 – expansion . . . . . . . . . . 1026 – extended – . . . . . . . . . . . . 32 – extension – . . . . . . . . . . 1002 – finite tree – (NFTA) . . . . . . 239 – finite truncated – . . . . . . . . 865 – finitely unambiguous – . . . . . 155 – Fischer – . . . . . . . . . . . 1008 – flower – . . . . . . . . . . . . 845 – follow – . . . . . . . . . . 73, 422 – fundamental theorem . 41, 43, 62, 66 – generalised finite – (GFA) 1462, 1466 – Glushkov – . . . . . 54, 68, 73, 421 – guidable – . . . . . . . . . . . 285 – heap . . . . . . . . . . . . . . 154 – hedge – – language . . . . . . . . . . . 256 – inherently weak – . . . . . . 1211 – in-split . . . . . . . . . . . . 1016 – inverse – . . . . . . . . . . . . 844 – involutive – . . . . . . . . . . 844 – irreducible – . . . . . . . . . . 920 – k - – . . . . . . . . . . . . . . 914 – Kari – . . . . . . . . . . . . . 544 – Kondacs–Watrous quantum finite – (KWQFA) . . . . . . . . . . 1465 – Krieger – . . . . . . . . . . . 1005 – latest appearance – (LAA) . . . 202 – Latvian quantum finite – (LaQFA) . . . . . . . . . . . 1465 – left delay . . . . . . . . . . . 1022 – level – . . . . . . . . . . . . 1113 – limited – . . . . . . . . . . . . 428 – nondeterminism . . . . . . . 421 – local – . . . . . . . . 25, 355, 1018 – LR – . . . . . . . . . . 1400–1402

xxv

– max-plus – . . . . . . . . . . . 154 – Mealy – . . . . . . . . . . . . 885 – bireversible – . . . 887, 899–902 – contracting – . . . . . . . . . 891 – dual – . . . . . . . . . . . . 887 – nuclear – . . . . . . . . . . . 891 – reset machine . . . . . . . . 898 – reversible – . . . . . . . 887, 897 – minimal – . . . 17, 339, 1005, 1009 – complete – . . . . . . . . . . . 17 – nondeterministic – . . . . . . 368 – Moore–Crutchfield quantum finite – (MCQFA) . . . . . . . . . . 1465 – multidimensional – . . . . . . . 918 – multi-head – . . . . . . . . . . 426 – multiple initial state . 420–421, 442 – multitape – . . . . . . . . . . . . 82 – Nayak quantum finite – (NaQFA) . . . . . . . . . . . 1465 – Nerode – . . . . . . . . . . . . . 17 – nondeterministic – finite – (NFA) 411, 769, 1418, 1462 – Chrobak normal form 420–421, 425, 431 – one-way – (1NFA) . . . 1462 – two-way – (2NFA) . . . 1462 – finite hedge – (NFHA) . . . . 256 – non-uniform – . . . . . . . . . 501 – nuclear – . . . . . . . . . . . . 891 – of an expression – derived-term . . . . . . 60, 71, 73 – equation – . . . . . . . . . . . 73 – Thompson – . . . . . . . . 57, 62 – on finite-trees . . . . . . . . 1038 – on ! -strings . . . . . . 1036, 1046 – on ! -trees . . . . . . . . . . 1036 – one-cluster – . . . . . . . . . . 544 – one-way – quantum finite – (1QFA) . . 1463 – real-time cellular – . . . . . . 777 – parity – . . . . . . . . . . . . 1220 – partially ordered – . . . . . . . 520 – polynomially unambiguous – . 155

xxvi

Index

– position . . . . . . . . 54, 73, 421 – probabilistic finite – (PFA) . . 1462 – one-way – (1PFA) . . . . . 1462 – prophetic – . . . . . . . . . . . 194 – pushdown – (PDA) . . . 1385, 1389 – deterministic – . . . . . . . 1389 – probabilistic – (PPDA) . . . 1390 – reduced – . . . . . . . . . 1389 – tabulation . . . . . . . . . 1390 – valid prefix property . . . . 1395 – weighted – (WPDA) . . . . 1389 – quasi-reversible – . . . . . . . 442 – quotient – . . . . . . . . . 340, 412 – random – . . . . . . . . . . . . 546 – real – number – (RNA) . . . . . . . 972 – vector – (RVA) 972, 1204–1205, 1208–1209, 1211 – reduced – . . . . . . . . . . 1005 – reduction . . . . . . . . . . . 1008 – region – . . . . . . . . . . . 1269 – residual – . . . . . . . . 369, 1005 – restarting – . . . . . . . . . . . 427 – reversal – . . . . . . . . . . . 341 – reversible – . . . . . . . . 433, 442 – right delay . . . . . . . . . . 1021 – rotating limited – . . . . . . . . 429 – semi- – . . . . . . . . . . . . . 344 – sequential – . . . . . . . . . . 155 – shift recognised – . . . . . . . 998 – simple – . . . . . . . . . . . . 358 – slow – . . . . . . . . . . . . . 344 – for Hopcroft . . . . . . . 344, 352 – for Moore . . . . . . . . 344, 351 – split . . . . . . . . . . . . . . 724 – splitter . . . . . . . . . . . . . 340 – Stallings – . . . . . . . . . . . 845 – standard – . . 10, 55–56, 62, 68, 73 – weighted – . . . . . . . . . . . 68 – standard local – . . . . . . . 1019 – state . . . . . . . . . . . . . . 998 – strongly connected . . . . . . 1007 – subset – . . . . . . . . . . . . 530

– subset accepted . . . . . . . . . 62 – sweeping limited – . . . . . . . 429 – symbolic conjugate – . . . . . 1011 – synchronised – . . . . . . . . 1005 – synchronising – . . . . . . . . 525 – tessellation – . . . . . . . . . . 312 – timed – . . . . . . . . . . . . 1263 – deterministic – . . . . . . . 1288 – diagonal-free – . . . . . . . 1263 – with invariants – . . . . . . 1263 – tree – . . . . . . . . . . . . . . 804 – tree walking – (TWA) . . . . . 250 – trellis – . . . . . . . . . . . . . 777 – trie – . . . . . . . . . . . . . . 356 – trim – . . . . . . . . . 8, 844, 998 – two-way – . . 425, 520, 1462, 1474 – quantum finite – (2QFA) . . 1474 – simulation . . . . . . . . . . 425 – unambiguous – . . . . . . . . . 155 – universal – . . . . . . . . . . . 781 – weak – . . . . . . . . . . . . 1205 – weighted – . . . . . . 65, 115, 1477 – weighted finite – (WAF) . . . 1112 – with advice . . . . . . . . . . 1061 – Zielonka – . . . . . . . 1176, 1250 – see also automata automorphism – orbits . . . . . . . . . . . . . . 865 – Whitehead – . . . . . . . . . . 856 —B— Baire space . . . . . . . . . . . . . 698 base – -complement . . . . . . 1191, 1202 – -k expansion . . . . . . . . . . 914 – numeration – . . . . 949, 1191, 1202 – Bertrand – . . . . . . . . . . 951 – linear – . . . . . . . . . . . 950 basis . . . . . . . . . . . . . . . . 497 – complete – . . . . . . . . . . . 497 – converging to the identity . . . 644 – of self-free topological algebra . 628

Index

– standard – . . . . . . . . . . . 497 – theorem . . . . . . . . . . . . 640 Benois’ theorem . . . . . . . . . . 858 beta-polynomial . . . . . . . . . . 951 biclique edge cover . . . . . . . . 414 bicombing . . . . . . . . . . . . . 883 biseparable residual . . . . . . . . 414 bisimulation . . . . . . . . . . . 1232 – strong timed – . . . . . . . . 1267 – time-abstract – . . . . . . . . 1266 bistochastic quantum finite automaton (BiQFA) . . . . . . . . . . . 1465 black-box checking . . . . . . . . 403 Blikle net . . . . . . . . . . . . . 744 block . . . . . . . . . . . . . . 642, 990 – decomposition . . . . . . . . . . 51 – groups . . . . . . . . . . . . 1470 – map . . . . . . . . . . . . . . 973 – substitution . . . . . . . . . . . 990 blocking condition . . . . . . . . 1230 Blum and Hewitt automaton . . . . 317 Boolean – automaton . . . . . . . . . . . 443 – operator . . . . . . . . . . . 1190 – semiring . . . . . . . . 3, 730, 1388 – space . . . . . . . . . . . . . . 626 bootstrapping . . . . . . . . . . . 471 Borel – hierarchy . . . . . . . . . . . . 699 – set . . . . . . . . . . . . . . . 698 bounded – gap . . . . . . . . . . . . . . . 953 – section problem . . . . . . . . 672 – for CFMs . . . . . . . . . 1153 boundedness problem . . . . . . . 673 Bounded-Synchronising-Colouring 556 branched covering . . . . . . . . . 896 branching-time logic . . . . . . . 1429 – CTL . . . . . . . . . . . . . 1430 – CTL? . . . . . . . . . . . . 1429 – -calculus . . . . . . . . . . 1431 breakpoint construction . . . . 199, 279 Brown’s lemma . . . . . . . . . . 669

xxvii

Büchi – automaton . . . . . . . . . . 1419 – condition . . . . . . . . . . . . 267 – recurrence condition . . . . 193, 267 – ’s theorem . . . . . . . . 311, 1046 – set . . . . . . . . . . . . . . . 193 Büchi–Elgot–Trakhtenbrot theorem 1077 —C— Cantor – normal form – (CNF) . . . . . 697 – set . . . . . . . . . . . . . . 1106 – space . . . . . . . . . . . . . . 698 cardinal – arithmetic . . . . . . . . . . . 698 – number . . . . . . . . . . 695–696 Carton–Michel automaton . . . . . 194 Catalan numbers . . . . . . . . . . 464 CC0 . . . . . . . . . . . . . . . . 497 Černý – automaton . . . . . . . . . . . 535 – conjecture . . . . . . . . . . . 536 – function . . . . . . . . . . . . 536 channel . . . . . . . . . . . . . 1149 – bounded – . . . . . . . . . . 1166 – fifo . . . . . . . . . . . . . . 1149 – system – insertion . . . . . . . . . . 1157 – lossy – . . . . . . . . . . . 1156 characteristic . . . . . . . . . . . . 475 – automaton . . . . . . . . . . . 173 – mapping . . . . . . . . . . . . 379 – polynomial – . . . . . . . . . . 950 – sequence . . . . . . . . . . . . 954 – vector . . . . . . . . . . . . . 543 Chomsky normal form . . . . . . 1407 Chomsky–Schützenberger theorem 463, 465 Christol’s theorem . . . . . . . . . 937 Chrobak normal form 420–421, 425, 431 Church problem 1218–1220, 1222–1224, 1226–1227, 1234–1235

Index

xxviii

circuit . . . . . . . . . . . – class . . . . . . . . . . – critical – . . . . . . . . – depth . . . . . . . . . . – size . . . . . . . . . . . – uniformity . . . . . . . – width . . . . . . . . . . classification theorem . . . clique-width . . . . . . . . clock constraint . . . . . . clopen . . . . . . . . . . . closed timelike curves . . . closure – backward – . . . . . . . – deterministic – . . . . . – horizontal – . . . . . . – concatenation – . . . – polynomial – . . . . . . – properties . . . . . . . – reflexive-transitive – . . – unambiguous – . . . . . – vertical – . . . . . . . . – concatenation – . . . coaccessible – automaton . . . . . . . – state . . . . . . . . . . Cobham–Semenov theorem

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . .

. . . . . . . . . .

. . . . . . . . . .

. . . 54 . . 602 . . 305 . . 326 . . 603 . 1131 732, 754 . . 602 . . 305 . . 326

495 497 153 496 496 496 496 995 689 1263 . 620 1482

. . . . . 8 . . . . . 5 . 948, 1057, 1193, 1205 Cocke–Younger–Kasami automaton 1407 code – bifix – . . . . . . . . . . . . . 849 – sliding block – . . . . . . . . . 973 coding . . . . . . . . . . . . . 915, 954 coefficient in a series . . . . . . . . . 64 cofinality . . . . . . . . . . . . . . 709 coincidence condition . . . . . . . 529 co-induction . . . . . . . . . . . . . 74 colours . . . . . . . . . . . . . . . 221 combing . . . . . . . . . . . . . . 878 communicating – automaton . . . . . . . . . . 1149 – finite-state machine (CFM) . . 1152 – insertion . . . . . . . . . . 1157

– local acceptance . . . . . . 1172 – lossy – . . . . . . . . . . . 1156 – model checking . . . . . . 1166 communication complexity . . . . 416 commutator in groups . . . . . . . 842 compact space . . . . . . . . . . . 620 complement . . . . . . . . . . . 1191 – picture language . . . . . . . . 313 complete – automaton . . . . . . . 8, 444, 1019 – minimal automaton . . . . . . . 17 – set . . . . . . . . . . . . . . . 701 – unambiguous automaton . . . . 194 compressible – pair . . . . . . . . . . . . . . . 550 – set . . . . . . . . . . . . . . . 550 computable real number . . . 921, 1462 computation . . . . . . . . . . . . . 82 concatenation . . . . . . . . . . . 502 – column – . . . . . . . . . . . . 305 – horizontal – . . . . . . . . . . 305 – modp -concatenation . . . . . . 502 – product . . . . . . . . . . . . . . 4 – row – . . . . . . . . . . . . . . 305 – vertical – . . . . . . . . . . . . 305 cone type . . . . . . . . . . . . 873, 882 congruence . . . . . . . 213, 842, 1009 – kernel – . . . . . . . . . . . . 619 – Nerode – . . . . . . . . . . . . 340 – on an algebra . . . . . . . . . . 617 – syntactic – . . . . 21, 214, 618, 1009 conjugacy . . . . . . . . . . . . . 786 – labelled – . . . . . . . . . . . 999 – of shifts . . . . . . . . . . . . 990 – problem . . . . . . . . . . . . 842 conjugate – automata . . . . . . . . . . . . 999 – elements . . . . . . . . . . . . 842 consecutive transitions . . . . . . . . 5 consistency problem . . . . . . . . 387 constant term – of a language . . . . . . . . . . . 42

Index

– of a series . . . . . . . . . . . . 65 – of an expression . . . . . . . . . 42 constrained queue-content decision diagrams (CQDD) . . . . . . 1155 constraint – equality . . . . . . . . 1194, 1205 – inequation . . . . . . . 1196, 1208 – linear – . . 1193–1194, 1196–1197, 1205, 1208 – modular – . . . . . . . 1193, 1197 context – of a word . . . . . . . . . . . 1009 – right – . . . . . . . . . . . . 1005 context-free – grammar (CFG) . . 464, 772, 1388 – probabilistic – (PCFG) . . . 1388 – consistency . . . . . . . 1388 – proper – . . . . . . . . . . 1388 – stochastic – . . . . . . . . 1388 – weighted – (WCFG) . . . . 1388 – Kolam grammar . . . . . . . . 319 – language . . 88, 464, 772–774, 1481 – matrix grammar . . . . . . . . 320 – word problem submonoid . . . 863 continuous – function . . . . . . . . . . . . 700 – operation . . . . . . . . . 766–768 contracting – automaton . . . . . . . . . 891, 897 – group . . . . . . . . . . . . 891, 897 contraction – graph – . . . . . . . . . . . . . 996 – symbol . . . . . . . . . . . . . 996 control problem . . . . . . . . . 1228 controllability condition . . . . . 1228 controller . . . . . . . . . . . . 1227 – synthesis – centralised – . . . . . 1228, 1234 – decentralised – . . . . . . . 1252 – distributed – . . . . . . . . 1237 – generalised – . . . . . . . . 1234 convolution – string . . . . . . . . . . 1038, 1058

xxix

– tree . . . . . . . . . . . 1036, 1038 Conway – partial – semiring . . . . . . . 731 – quasi- – semiring . . . . . . . . 63 – ’s leap . . . . . . . . . . . . . 732 – semiring . . . . . . . . 63, 72, 731 Coxeter group . . . . . . . . . . . 879 Cramér’s model . . . . . . . . . . 919 Crespi–Reghizzi–Pradella tile grammars . . . . . . . . . . . 324 critical – circuit . . . . . . . . . . . . . 153 – graph . . . . . . . . . . . . . . 153 cross-section theorem . . . . . . . . 87 Curtis–Lyndon–Hedlund theorem . 990 cut . . . . . . . . . . . . . . . . . 657 cycle rank . . . . . 417–418, 424, 440 cyclic automaton . . . . . . . . . . 353 cyclicity . . . . . . . . . . . . . . 152 cylinder . . . . . . . . . . . . . . 973 —D— D0L language . . . . . . . . DAG – finitary vertex . . . . . . – infinitary vertex . . . . . – leveled – . . . . . . . . . – peeling . . . . . . . . . – rank function . . . . . – run – . . . . . . . . . . . D-class . . . . . . . . . . . . – rank . . . . . . . . . . . – regular – . . . . . . . . . – structure group . . . . . . decentralised control problem decomposition – of an automaton . . . . . – prefix-suffix – . . . . . . – theorem . . . . . . . . . deduction system . . . . . . . degree of irreversibility . . .

. . . 103 . . . . . . . . . . .

. . . . . . . . . . .

. . . . . .

. . . . .

. 1013 . . 363 . . 995 . 1392 . . 443

199 199 191 204 204 203 1024 1024 1024 1024 1252

xxx

Index

delay – left – . . . . . . . . . . . . . 1022 – right – . . . . . . . . . . . . 1021 density matrix . . . . . . . . . . 1460 derivation – of an expression . . . . . 58, 62, 70 – broken – . . . . . . . . . . . . 73 descriptional complexity measure . . . . . . . . . 412–443 – quotient complexity . . . . 413, 430 – syntactic complexity . . . . . . 433 desert automaton . . . . . . . . . . 781 determinacy . . . . . . . 274, 702–703 – regular – . . . . . . . . . . . . 221 – Wadge–Borel – . . . . . . . . 706 determined game . . . . . . . . . . 703 determinisation . . . . . . . . . . 418 – NDD . . . . . . . . . . . . . 1197 – problem . . . . . . . . . . . . 420 deterministic – 2D automaton . . . . . . . . . 316 – finite automaton (DFA) 7, 411, 1004, 1462 – complete – . . . . . . . . . . 412 – minimal – . . . . . . . . . . 412 – one-way – (1DFA) . . . . . 1462 – quotient . . . . . . . . . . . 412 – two-way – (2DFA) . . . . . 1462 – state complexity . . . . . . 412, 430 – tiling system . . . . . . . . . . 316 – transducer . . . . . . . . . . . . 81 – transition complexity . . . . . . 414 Diag-DREC . . . . . . . . . . . . 316 digit . . . . . . . . . . . . . . . 1191 – sign – . . . . . . . . . 1191, 1203 dimension of an automaton . . . . . 42 Dirac notation . . . . . . . . . . 1460 direct – product . . . . . . . . . . . . . 617 – sum . . . . . . . . . . . . . . 732 Dirichlet’s theorem . . . . . . . . 920 discriminant . . . . . . . . . . . . 471 disjunctive rational subset . . . . . 862

distance – geodesic – . . . . . . . . . . . 850 – prefix metric . . . . . . . . . . 853 – profinite – . . . . . . . . . . . 576 distributed – alphabet . . . . . . . . . . . 1250 – synthesis – architecture . . . . . . . . 1236 – local specifications . . . . . 1239 – problem . . . . . . . . . . 1237 divisible group . . . . . . . . . . . 939 division matrix . . . . . . . . 993–994 document type definitions (DTD) . 259 domain of a tree . . . . . . . . 237, 266 dominant – eigenvalue . . . . . . . . . . . 962 – singularity . . . . . . . . . . . 470 domino system . . . . . . . . . . . 311 Doner–Thatcher–Wright theorem 1084 DOTA . . . . . . . . . . . . . . . 316 dot-depth hierarchy . . . . . 510, 1083 dual formula . . . . . . . . . . . . 278 duplicator player . . . . . . . . . . 226 dyadic rational . . . . . . . . . . 1115 Dyck language . . . . . . . . . 777, 862 dynamical system . . . . . . . . . 972 —E— Earley’s automaton . . . . . . . 1405 edge . . . . . . . . . . . . . . 989, 993 EF logic . . . . . . . . . . . . . . 822 Ehrenfeucht–Fraïssé game . . . . . 593 Eilenberg’s theorem . . . . . . . . 582 elementary – equivalent matrices . . . 995, 1003 – function . . . . . . . . . . . 1079 elimination ordering . . . . . . 424, 468 emptiness problem . . . . . . 241, 1265, 1472–1473, 1480 empty word . . . . . . . . . . . . . 4 encoding – dual – . . . . . . . . . . . . 1203 – extension . . . . . . . . . . . . 255

Index

– first-child-next-sibling (FNCS) . 254 – fcns.t/ . . . . . . . . . . . . 255 – fractional part . . . . . 1202, 1206 – integer – part . . . . . . . . . 1202, 1206 – vectors . . . . . . . . . . . 1191 – integers . . . . . . . . . . . 1191 – operation . . . . . . . . . . . . 255 – real – numbers . . . . . . . . . . 1202 – vectors . . . . . . . . . . . 1203 – relation . . . . . . . . . . . . 1189 – serialised – . 1192, 1196, 1201, 1203 – valid – . . . . . . 1190–1191, 1203 endomorphism – extension . . . . . . . . . . . . 857 – virtually injective – . . . . . . 857 entourage . . . . . . . . . . . . . 619 entropy . . . . . . . . . . . . . 643, 991 equation – existential theory of –s . . . . . 865 – language . . . . . . . . 29–32, 765 – profinite – . . . . . . . . . . . 578 – symmetrical – . . . . . . . . . 580 – system of –s . . . . . . . . . . 631 – explicit – . . . . . . . . . . . 771 – resolved – . . . . . . . . . . 771 – strict – . . . . . . . . . . . . 772 – with rational constraints . . . . 865 – word – . . . . . . . . . . . . . 789 equivalence – Moore – . . . . . . . . . . . . 343 – of automata . . . . . . . . . 1472 error – bounded – . 1462, 1467–1469, 1475, 1478 – one-sided unbounded . . . . 1471 – unbounded – 1461, 1467, 1471, 1480 – zero – . . . . . . . . . . . . 1482 essential – automaton . . . . . . . . . . . 998 – graph . . . . . . . . . . . . . . 989 Euclidean division . . . . . . . . . 697

xxxi

even shift . . . . . . . . . . . . . 989 exact computation . . . . . . . . 1482 example – negative – . . . . . . . . . . . 379 – positive – . . . . . . . . . . . 379 exceptional set . . . . . . . . . . . 471 existential theory of equations . . . 865 expansion – ˛ -expansion . . . . . . . . . . 951 – automaton . . . . . . . . . . 1026 – graph – . . . . . . . . . . . . . 996 – symbol – . . . . . . . . . . . . 996 explicit – operation . . . . . . . . . . . . 629 – profinite equation . . . . . . . 578 expression – broken derivation . . . . . . . . 73 – constant term . . . . . . . . . . . 42 – depth . . . . . . . . . . . . . . . 42 – derivation . . . . . . . . 58, 62, 70 – equivalent – . . . . . . . . . . . 42 – language denoted by . . . . . . . 42 – literal length . . . . . . . . . . . 42 – omega-regular – . . . . . . . . 195 – rational – . . . . . . . . . . 41, 741 – linear – . . . . . . . . . . . . 27 – weighted – . . . . . . . . . . . 65 – reduced – . . . . . . . . . . 46, 66 – regular – . . . . . 42, 411, 459, 502 – defined by grammar . . . . . 463 – length . . . . . . . . . . . . 461 – size . . . . . . . . . . . . . 461 – uncollapsible – . . . . . . . . 462 – series denoted by an – . . . . . . 65 – star-normal form . . . . . . . . . 57 – valid – . . . . . . . . . . . . . . 65 – weighted rational – . . . . . . . . 74 extension – algebraic – . . . . . . . . . . . 852 – finite-index – . . . . . . . . . . 853 – HNN – . . . . . . . . . . . . . 864 – operation . . . . . . . . . . . . 255

xxxii

Index

—F— factor map . . . . . . . . . . . . . 972 factorisation – forest theorem . . . . . . . 654, 659 – tree . . . . . . . . . . . . . . . 654 fellow traveller property . . . . . . 876 field . . . . . . . . . . . . . . . . 466 finite – closure property . . . . . . . . 669 – language . . . . 419–420, 425, 437 – operation problem . . . . . . 435 – model theory . . . . . . . . . 1071 – order element . . . . . . . . . 842 – substitution . . . . . . . . . 80, 103 finitely presented group . 842, 864, 872 first-order logic (FOL) . 289, 687, 1031, 1034, 1074 – interpretation . . . . . . . . . 1049 – on trees . . . . . . . . . . . . 803 – with child relations . . . . . . . 810 Fischer automaton . . . . . . . . 1008 fixed point . . . . . . . . 747, 915, 954 – induction . . . . . . . . . 751–752 – subgroup . . . . . . . . . . . . 856 – see also pre-fixed point fixpoint . . . . . . . . . . . . . 1024 flow equivalent . . . . . . . . . . . 996 flower automaton . . . . . . . . . 845 follow automaton . . . . . . . 73, 422 fooling set . . . . . . . . . . . . . 414 forbidden – factor . . . . . . . . . . . . . . 988 – pattern . . . . . . . . . . . . . 591 – characterisation . . . . . . . 443 forest . . . . . . . . . . . . . . . . 819 – algebra . . . . . . . . . . . . . 818 – syntactic – . . . . . . . . . . 821 formal – Laurent series . . . . . . . . . 466 – power series . . . . . . . . 464, 937 – algebraic – . . . . . . . . . . 464 formula – first-order – . . . . . . . . . . 592

– monadic second-order – . . . . 592 forward diamond condition . . . 1252 forward Ramsey split . . . . . . . 682 fragile language . . . . . . . . . . 788 Franks’ theorem . . . . . . . . . . 997 free – abelian group . . . . . . . 863, 875 – action . . . . . . . . . . . . . 871 – group . . . . . . . . . 843, 892, 901 – basis . . . . . . . . . . . . . 843 – free factor . . . . . . . . . . 851 – generalised word problem . . 847 – rank . . . . . . . . . . . . . 843 – monoid . . . . . . . . . . 4, 58, 498 – ! -semigroup . . . . . . . . . . 721 – partially abelian group . . . . . 863 – product . . . . . . . . . . . . . 901 – amalgamated – . . . . . . . . 864 – profinite – group – rank . . . . . . . . . . 643, 645 – monoid . . . . . . . . . . . 577 – semigroup . . . . . . . . . . . . 4 – variable . . . . . . . . . . . . 957 full shift . . . . . . . . . . . . . . 988 function – automatic real – . . . . . . . 1107 – complexity – . . . . . . . . . . 955 – continuous – . . . . . . . . . . 700 – elementary – . . . . . . . . . 1079 – multi-valued – . . . . . . . . . 470 – quasi-automatic – . . . . . . . 940 – rank – . . . . . . . . . . . . . 204 – rational – . . . . . . . . . . 84, 931 – real – – continuous – . . . . . . . . 1116 – smooth – . . . . . . . . . . 1116 – uniformly continuous – . . . . 620 – word – . . . . . . . . . . . . 1113 functionally recursive group . . . . 888 functorial star . . . . . . . . . . . 742 fundamental group . . 872, 874, 880, 884



—G—
Gaifman's theorem . . . 1081
Gale–Stewart game . . . 703
game . . . 221, 274 – colouring function . . . 221 – colours . . . 221 – cops and robber – . . . 417–418 – determined – . . . 703 – duplicator players . . . 226 – Gale–Stewart – . . . 703 – graph – . . . 417 – membership – . . . 276 – parity – . . . 221, 274 – players Zero and One . . . 221 – positional strategy . . . 222, 275 – positionally determined – . . . 275 – simulation – . . . 225 – spoiler players . . . 226 – two-player – . . . 221 – Wadge – . . . 705 – see also determinacy – see also winning
gate . . . 495 – generalised input . . . 496 – input . . . 495 – output . . . 495
genealogical ordering . . . 960
generalised – centralised controller synthesis . . . 1234 – finite automaton (GFA) . . . 1462, 1466 – power series . . . 938 – sequential machine (GSM) . . . 81 – word problem . . . 842, 883
generator of a topological class . . . 701
geodesic distance . . . 850
gliding point . . . 159, 161
Glushkov automaton . . . 54, 68, 73, 421
golden – mean shift . . . 989 – ratio . . . 442
good substitution . . . 968
graded monoid . . . 46, 64
grammar – 2D – . . . 318 – Boolean – . . . 775 – conjunctive – . . . 774 – linear – . . . 777 – context-free – (CFG) . . . 464, 772, 1388 – context-free matrix – . . . 320 – Kolam – . . . 319 – picture – . . . 318 – Průša grid – (PGG) . . . 323 – puzzle – . . . 319 – regional-tile – (RTG) . . . 324 – symbolic – . . . 319 – tile – (TG) . . . 324 – Crespi–Reghizzi–Pradella – . . . 324 – unambiguous – . . . 464 – with grid . . . 321
graph . . . 530, 989 – adjacency matrix . . . 989 – Cayley – . . . 863, 872 – colouring . . . 547 – completely reducible – . . . 152 – contraction . . . 996 – critical – . . . 153 – edge . . . 989 – essential – . . . 989 – Eulerian – . . . 542 – expansion . . . 996 – game . . . 417 – higher edge – . . . 990 – hyperbolic – . . . 891 – morphism . . . 992 – of a max-plus matrix . . . 153 – of constant out-degree . . . 547 – of groups . . . 864 – path . . . 989 – primitive – . . . 548 – Schreier – . . . 848, 861, 873, 891 – state . . . 989 – strongly connected – . . . 542 – syntactic – . . . 1024 – tameness . . . 637 – totally synchronising – . . . 556



– underlying – . . . . . . . . . . 530 – underlying – of a letter . . . . . 550 – vertex . . . . . . . . . . . . . 989 – Wielandt – . . . . . . . . . . . 555 greedy algorithm – compression . . . . . . . . . . 537 – extension . . . . . . . . . . . . 541 Green’s relations . . . . . . . 658, 1024 grid constructor . . . . . . . . . . 322 Grigorchuk group . 885, 887–888, 890, 892–895 Gröbner basis . . . . . . . . . . . 466 group – affine – . . . . . . . . . . . 888, 898 – Alëshin – . . . . . . . 888, 892, 900 – amenable – . . . . . . . . . . . 891 – Artin – . . . . . . . . . . . . . 879 – asynchronously automatic – . . 878 – automata – . . . . . . 842, 885–902 – regular weakly branch . . . . 893 – automatic – . . . . . . . 875–883 – normal forms . . . . . . . . 875 – right/left – . . . . . . . . . . 878 – ball of radius n . . . . . . . . . 874 – Basilica – . . . . . . 890, 892–893 – Baumslag–Solitar – . 884, 888, 898 – biautomatic – . . . . . . . . . 878 – conjugacy problem . . . . . 881 – bounded – . . . . . . . . 891–892 – Bowen–Franks – . . . . . . . . 997 – braid – . . . . . . . . . . . . . 879 – branch – . . . . . . . . . 893–894 – commensurator . . . . . . . . . 852 – conjugacy separable – . . . . . 890 – contracting – . . . . . . . . 891, 897 – Coxeter – . . . . . . . . . . . 879 – divisible – . . . . . . . . . . . 939 – exponential growth . . . . . . . 894 – finite – . . . . . . . . . . . . . 862 – finitely presented – . . 842, 864, 872 – free – . . . . . . . . . 843, 892, 901 – abelian – . . . . . . . . . 863, 875 – basis . . . . . . . . . . . . . 843

– free factor . . . . . . . . . . 851 – generalised word problem . . 847 – partially abelian – . . . . . . 863 – rank . . . . . . . . . . . . . 843 – functionally recursive – . . . . 888 – fundamental – . . 872, 874, 880, 884 – graph – . . . . . . . . . . . 863, 865 – Grigorchuk – . . 885, 887–888, 890, 892–895 – growth . . . . . . . . . . 894–895 – Gupta–Sidki – . . . . 885, 887–888, 892–893 – Heisenberg – . . . . . . . . . . 884 – identity . . . . . . . . . . . . . 741 – intermediate growth . . . . . . 894 – iterated monodromy . . . . . . 897 – Kazhdan – . . . . . . . . . . . 902 – kernel . . . . . . . . . . . . . 636 – lamplighter – . . . . . . . . . . 888 – language . . . . . . . . . 791, 1470 – mapping class – . . . . . . . . 879 – nilpotent – . . . . . . 855, 864, 884 – non-uniformly exponential growth . . . . . . . . . . . . . 895 – of an automaton . . . . . . . . 885 – p -group . . . . . 499, 855, 888, 892 – polynomial growth . . . . . . . 894 – pure braid – . . . . . . . . . . 879 – regular branch – . . . . . . . . 893 – relatively hyperbolic – . . . . . 883 – relators . . . . . . . . . . . . . 872 – residually finite – . . . . . . . . 855 – right angled Artin – . . . . . . 863 – self-similar – . . . . . . . . 887, 895 – semi-hyperbolic – . . . . . . . 883 – structure – . . . . . . . . . . 1024 – surface – . . . . . . . . . 873–874 – virtually – abelian – . . . . . . . . . . . 864 – free – . . . . . . . 862–863, 865 – weakly branch – . . . . . . 891, 894 – word-hyperbolic – . . 865, 881–884


growing – letter . . . 965 – substitution . . . 966
growth – function . . . 874 – maximal – . . . 966 – of groups . . . 894–895 – series . . . 874 – type . . . 965–966
guidable automaton . . . 285
Gupta–Sidki group . . . 885, 887–888, 892–893

—H— Hadamard’s theorem . . . . . . . . 469 Hahn’s power series . . . . . . . . 938 Hankel matrix . . . . . . . . . . . 314 Hausdorff completion . . . . . . . 621 Hausdorffisation . . . . . . . 620–621 HD0L (ultimate) periodicity problem 977 heap – automaton . . . . . . . . . . . 154 – model . . . . . . . . . . . . . 154 hedge – automaton – language . . . . . . . . . . . 256 – nondeterministic finite – (NFHA) . . . . . . . . . . . . 256 – of a tree . . . . . . . . . . . . 254 Heisenberg group . . . . . . . . . 884 hierarchy – Borel – . . . . . . . . . . . . . 699 – dot-depth – . . . . . . . . 510, 1083 – Straubing – . . . . . . . . . . 1083 – Wadge – . . . . . . . . . . . . 700 higher-order – collapsible pushdown automaton 1315 – pushdown automaton . . . . 1315 – stack . . . . . . . . . . . . . 1313 Higman–Haines set . . . . . . . . 435 history tree . . . . . . . . . . . . . 210 HNN extension . . . . . . . . . . 864 homogeneous partition . . . . . . . 324

Hopcroft's algorithm – configuration . . . 346 – splitter . . . 345 – waiting set . . . 345
hyper-arithmetical set . . . 793
hyperbolic – boundary . . . 854, 857, 883 – graph . . . 891 – plane . . . 873 – space . . . 881

—I— I -closed . . . . . . . . . . . . . 1251 ideal . . . . . . . . . . . . . . 467, 730 idempotent . . . . . . . . . . . . 1024 – tree . . . . . . . . . . . . . . . 814 identity . . . . . . . . . . . . . . . 618 – aperiodic – . . . . . . . . . . 47, 72 – classical – . . . . . . . . . . . 749 – group – . . . . . . . . . . . . . 741 – matrix star – . . . . . . . . . . 734 – natural – . . . . . . . . . . . 46, 67 – of unary algebras . . . . . . . . 530 – permutation – . . . . . . . . . 734 – product star – . . . . . . . . . 731 – profinite . . . . . . . . . . . . 629 – rational – . . . . . . . . . . . . . 45 – sum star – . . . . . . . . . . . 731 – trivial – . . . . . . . . . 46, 67, 69 image . . . . . . . . . . . . . . 1109 implicit operation . . . . . . . . . 629 incidence matrix . . . . . . . . . . 961 inclusion problem . . . . . . . . 1284 incompressible – pair . . . . . . . . . . . . . . . 550 – set . . . . . . . . . . . . . . . 550 independent looping tree automaton (ILTA) . . . . . . . . . . . . . 770 index of a partition . . . . . . . . . 339 inference rule . . . . . . . . . . 1392 – instantiation . . . . . . . . . 1393 – item . . . . . . . . . . . . . 1392 initialisable set . . . . . . . . . . . 714



in-merge – graph morphism . . . 992 – labelled – . . . 1000 – of graph . . . 992
inner-cuts u . . . 657
input alphabet . . . 81
in-split – labelled – . . . 1000 – of graph . . . 992
integer – arithmetic . . . 1205 – multiplicatively – dependent – . . . 948 – independent – . . . 948
intersection . . . 1190
inverse – automaton . . . 844 – monoid . . . 844, 853 – semigroup . . . 865
involutive automaton . . . 844
irreducible – automaton . . . 920 – matrix . . . 961 – shift space . . . 642, 1007 – substitution . . . 961
isomorphism problem . . . 842, 883
isoperimetric inequality . . . 880, 883
iterated – function system . . . 1107 – monodromy group . . . 897

—J—
J1 . . . 570–571, 639
Jensen's inequality . . . 477
join operation (of two lattices) . . . 599
Julia set . . . 897

—K—
k-automaton . . . 914
k-block . . . 990
κ implicit signature . . . 637
k-recognisable set . . . 947, 956, 972
Kazhdan group . . . 902
kernel . . . 84 – congruence . . . 619 – group – . . . 636 – k- – . . . 916, 919, 954
Kleene – lattice . . . 753 – monoid . . . 73 – 's theorem . . . 39, 43, 62, 66
Kleene algebra . . . 751–754
Kleene–Schützenberger theorem . . . 118
Kondacs–Watrous quantum finite automaton (KWQFA) . . . 1465
Krieger automaton . . . 1005
Kripke structure . . . 1430
Krohn–Rhodes – complexity . . . 640 – theorem . . . 606
Kronecker theorem . . . 948

—L—
label . . . 5
Lagrange implicit function theorem . . . 475–476
lamplighter group . . . 888
Landau's function . . . 419
language . . . 4 – accepted – . . . 43 – base . . . 833 – class . . . 580 – computable operation . . . 782 – conjugacy . . . 786 – conjunctive – . . . 774–777 – linear – . . . 777–778 – constant term . . . 42 – context-free – . . . 88, 464, 772–774, 1481 – continuous operation . . . 766–768 – convergent sequence . . . 766 – D0L – . . . 103 – definite – – tree – . . . 808 – word – . . . 807 – dense – . . . 588 – Dyck – . . . 777, 862


– equation . . . . . . . . 29–32, 765 – exclusive stochastic . . . . . 1471 – finite – . . . . . 419–420, 425, 437 – operation problem . . . . . . 435 – forest – – recognised – . . . . . . . . . 821 – fragile – . . . . . . . . . . . . 788 – group – . . . . . . . . . . 791, 1470 – implementable – . . . . . . . 1252 – infinite – – operation problem . . . . . . 430 – label testable – . . . . . . . . . 821 – lattice . . . . . . . . . . . . . 578 – left – . . . . . . . . . . . . . . 339 – local – . . . . . . . . . . . . . . 24 – max-regular – . . . . . . . . . 723 – monomial – . . . . . . . . . . 678 – monotone operation . 768–769, 791 – nesting tree – . . . . . . . . . . 833 – non-counting – . . . . . . . . . 509 – nonstochastic – . . . . . . . . 1480 – of a hedge automaton . . . . . 256 – path testable – . . . . . . . . . 831 – P-complete – . . . . . . . . . . 778 – periodic – . . . . . . . 1469, 1472 – picture – . . . . . . . . . . . . 304 – closure properties . . . . . . 312 – complement . . . . . . . . . 313 – local – . . . . . . . . . . . . 307 – logic formula . . . . . . . . 311 – recognisable – . . . . . . . . 308 – piecewise-testable – . 571, 584, 639 – polynomial – . . . . . . . 678, 1477 – prime – . . . . . . . 778, 788, 1468 – quotient . . . . . . . . . . 58, 573 – rational – . . . . . . . . . 4, 41, 857 – recognisable – . . . . . . 6, 43, 857 – tiling – . . . . . . . . . . . . 308 – recursive – 779, 782–789, 1481–1482 – recursively enumerable – . 782–789, 1471, 1481 – regional – . . . . . . . . . . . 324 – regular – . . . . . . . . . . . 4, 459


– reversible – . . . . . . . . . . 433 – right – . . . . . . . . . . . . . 339 – †1 -language . . . . . . . . . . 587 – slender – . . . . . . . . . . . . 588 – sparse – . . . . . . . . . . . . 588 – star-free – 420, 433, 503, 600, 1080 – stochastic – 1463, 1471, 1473, 1477 – subregular – . . . . . 420, 433, 437 – super-turtle – . . . . . . . . . . 521 – tree – – NFTA-recognisable . . . . . 239 – recognisable – . . . . . . 238, 805 – regular – . . . . . . 243, 268, 805 – turtle – . . . . . . . . . . . . . 520 – unambiguous – . . . . . . . . . 519 – unary – 419–420, 425–426, 428, 437, 440, 1463, 1469–1471, 1473–1474, 1483 – operation problem . . . . 430, 435 – universal witness – . . . . . . . 432 – valid accepted – computation (VALC) . . . . . . . . . . . . 776 – variety . . . . . . . . . . . . . 573 – with zero . . . . . . . . . . . . 587 – word-based operation . . . . . 768 Las Vegas computation . . . . . 1483 latest appearance – automaton (LAA) . . . . . . . 202 – record (LAR) . . . . . . . . . 202 lattice – generated by a set . . . . . . . 599 – of languages . . . . . . . . . . 578 Latvian quantum finite automaton (LaQFA) . . . . . . . . . . . 1465 leaf transitions . . . . . . . . . . . 239 learner . . . . . . . . . . . . . . . 379 learning – from given data . . . . . . . . 387 – in the limit . . . . . . . . . . . 379 – through a minimally adequate teacher (MAT) . . . . . . . . . . . . . 382 length – litteral- . . . . . . . . . . . . . . 42



– of a path . . . 5 – of a regular expression . . . 461 – of a word . . . 4
letter . . . 4 – arity . . . 804 – growing – . . . 965 – neutral – . . . 504 – nullary – . . . 804 – occurrence . . . 4
level automaton . . . 1113
lift construction . . . 200
limit space . . . 897
limited – automaton . . . 428 – nondeterminism . . . 421 – series . . . 157
limitedness problem . . . 673
Lindenmayer system . . . 793, 1107
linear – numeration base . . . 950 – ordering . . . 657 – rational expression . . . 27 – recurrence . . . 929 – representation . . . 118 – temporal logic (LTL) . . . 816, 1081, 1158, 1417 – time computation . . . 922
linearisation . . . 1161
linked pair . . . 721
Liouville – number . . . 923 – 's inequality . . . 922
literal length . . . 42
local – automaton . . . 25, 355, 1018 – language . . . 24 – picture language . . . 307 – specifications . . . 1239
locally finite – algebra . . . 634 – semigroup . . . 669 – semiring . . . 134 – series . . . 119
locally threshold testable . . . 1081
logic – definable relation . . . 1035 – first-order – (FOL) . . . 289, 687, 1031, 1034, 1074 – on trees . . . 803 – with child relations . . . 810 – formula . . . 1035 – picture language . . . 311 – interpretation . . . 1043, 1045, 1049 – linear temporal – (LTL) . . . 816, 1081, 1158, 1417 – monadic second-order – (MSO) . . . 244, 289, 1075, 1163, 1221 – MSO-definable . . . 245 – weak – (WMSO) . . . 290 – weighted – . . . 129 – propositional dynamic – (PDL) . . . 1164 – global formula . . . 1165 – local formula . . . 1165 – path expression . . . 1165 – sentence . . . 1035 – temporal – . . . 228
logical – interpretation . . . 1092 – reduction . . . 1092–1093
loop – complexity . . . 53 – in an ω-automaton . . . 217 – index . . . 53 – testing predicate . . . 1233
lower bound technique . . . 415–418 – communication complexity . . . 416 – pumping method . . . 416
LR automaton . . . 1400–1402
Lyndon word . . . 361

—M—
magic number problem . . . 421
Mal'cev product . . . 602, 631
map – abelianisation – . . . 961 – factor – . . . 972


– in-merging – . . . . . . . . . 1011 – in-splitting – . . . . . . . . . 1011 – out-splitting – . . . . . . . . 1012 – Parikh – . . . . . . . . . . . . 961 – sliding block – . . . . . . . . . 990 Maple . . . . . . . . . . . . . 468, 472 mapping class group . . . . . . . . 879 marker . . . . . . . . . . . . . . . . 89 marking . . . . . . . . . . . . . . . 89 Markov – chain . . . . . . . . . . . . . 1347 – quantum – . . . . . . . . . 1474 – recursive – (RMC) . . . . . 1372 – decision process (MDP) . . . 1350 – recursive – . . . . . . . . . 1377 MAT-learning . . . . . . . . . . . 382 matrix – adjacency – . . . . . . . . 989, 999 – alphabetic – . . . . . . . . . . 999 – block decomposition . . . . . . . 51 – column division – . . . . . . . 993 – context-free – grammar . . . . 320 – density – . . . . . . . . . . . 1460 – elementary equivalent – . 995, 1003 – embedding . . . . . . . . . . . 887 – Hankel – . . . . . . . . . . . . 314 – incidence – . . . . . . . . . . . 961 – irreducible – . . . . . . . . . . 961 – primitive – . . . . . . . . . . . 961 – representation . . . . . . . . . . 87 – row division – . . . . . . . . . 994 – similar – . . . . . . . . . . . 1013 – stabilisation . . . . . . . . . . 674 – stable – . . . . . . . . . . . . . 670 – stochastic – . . . . . . 1462, 1466 – strong shift equivalent – . 995, 1003 – symbolic – elementary equivalent – . . 1013 – strong shift equivalent – . . 1013 – transition – . . . . . . . . . 43, 999 max-plus – automaton . . . . . . . . . . . 154


– convex – hull . . . . . . . . . . . . . 162 – set . . . . . . . . . . . . . . 162 – eigenvalue . . . . . . . . . . . 164 – eigenvector . . . . . . . . . 161, 164 – semiring . . . . . . . . . . . . 152 – series . . . . . . . . . . . . . . 154 – spectral theorem . . . . . . . . 159 max-regular language . . . . . . . 723 Mazurkiewicz trace . . . . . . . 1176 McNaughton–Papert theorem 596, 1080 McNaughton–Yamada algorithm 45, 49, 66, 72 Mealy – automaton . . . . . . . . . . . 885 – bireversible – . . . 887, 899–902 – contracting – . . . . . . . . . 891 – dual – . . . . . . . . . . . . 887 – nuclear – . . . . . . . . . . . 891 – reset machine . . . . . . . . 898 – reversible – . . . . . . . 887, 897 – machine . . . . . . . . . . . . . 82 measure – ergodic – . . . . . . . . . . . . 973 – invariant – . . . . . . . . . . . 973 – uniquely ergodic – . . . . . . . 973 measurement – partial – . . . . . . . . . . . 1460 – quantum – . . . . . . . . . . 1459 membership – game . . . . . . . . . . . . . . 276 – problem . . . . . . . . . . . . 842 – rational subset – . . . . . . . 864 message sequence chart (MSC) . 1161 meta-transition . . . . . . . . . . . 181 method – recursive – . . . . . . . . 45, 51, 66 – state-elimination – . . 45, 47, 66, 72 – system-solution – . . 45, 48, 66–67 metric . . . . . . . . . . . . . . . 619 – space . . . . . . . . . . . . . . 619 – see also pseudometric – see also pseudo-ultrametric



military ordering . . . . . . . . 210, 960 minimal – automaton . . . 17, 339, 1005, 1009 – complete – . . . . . . . . . . . 17 – nondeterministic – . . . . . . 368 – dynamical system . . . . . . . 972 – strongly connected component 1007 – weighted finite automaton (WFA) . . . . . . . . . . . . 1122 minimisation of automata . 1190, 1472 min-plus semiring . . . . . . . 152, 669 model checking . . . . . . 1424, 1448 – CFMs . . . . . . . . . . . . 1166 – program complexity . . . . . 1450 monadic – second-order – logic (MSO) 244, 289, 1075, 1163, 1221 – MSO-definable . . . . . . 245 – weak – (WMSO) . . . . . 290 – weighted – . . . . . . . . . 129 – theory of one successor (S1S) 228 – transitive closure logic (MTC) 1079 monoid . . . . . . . . . . . . . . . . 3 – aperiodic – . . . . . . . . . 572, 853 – group-free – . . . . . . . . . 499 – automatic – . . . . . . . . . . 878 – commutative – . . . . . . . . . 570 – countably factorisable – . . . . 745 – finitely – factorisable – . . . . . . . . 734 – generated – . . . . . . . . 61, 64 – free – . . . . . . . . . . . 4, 58, 498 – free profinite – . . . . . . . . . 577 – generating set . . . . . . . . . . 61 – graded – . . . . . . . . . . . 46, 64 – group kernel – . . . . . . . . . 636 – idempotent – . . . . . . . . . . 570 – inverse – . . . . . . . . . . 844, 853 – J-trivial – . . . . . . . . . . . 571 – Kleene – . . . . . . . . . . . . . 73 – non-solvable – . . . . . . . . . 499 – partial – . . . . . . . . . . . . 734

– rational – . . . . . . . . . . . . . 73 – recognisable – . . . . . . . . . . 37 – solvable – . . . . . . . . . . . 499 – syntactic – . . . . . . . 20, 498, 861 – transition – . . . . . . . . . 844, 853 monomial language . . . . . . . . 678 monotone operation . . . 768–769, 791 Moore – equivalence . . . . . . . . . . 343 – ’s algorithm . . . . . . . . . . 343 – slow automaton . . . . . . 344, 351 Moore–Crutchfield quantum finite automaton (MCQFA) . . . . 1465 morphic – composition . . . . . . . . . . . 80 – equivalence problem . . . . . . 102 morphism . . . . . . . . . . . 529, 915 – fixed point . . . . . . . . . 915, 954 – graph – . . . . . . . . . . . . . 992 – injective – . . . . . . . . . . . . 92 – in-merge – . . . . . . . . . . . 992 – length multiplying – . . . . . . 575 – nonerasing (non-erasing) – . 80, 575, 585 – of automata . . . . . . . . . . . 17 – of deterministic automata . . . 844 – of shifts . . . . . . . . . . . . 990 – prolongable – . . . . . . . 915, 954 – recognising – . . . . . . . . . . . 37 – relational – . . . . . . . . . . . 632 – semigroup – . . . . . . . . . . 996 – syntactic – . . . . . . . . . . . . 20 – uniform – . . . . . . . 80, 915, 954 Morse–Hedlund theorem . . . . . . 955 mu-calculus . . . . . . . . . . . 1231 Muller – recurrence condition . . . . 193, 267 – set . . . . . . . . . . . . . . . 193 multicontext . . . . . . . . . . . . 806 multidimensional – automatic set . . . . . . . . . . 918 – automaton . . . . . . . . . . . 918 multi-head finite automaton . . . . 426


multiple initial state . . . 420–421, 442
multiplication (exterior) . . . 64
multiplicatively – dependent set . . . 948 – independent set . . . 948
multitape automaton . . . 82
multi-valued function . . . 470
Myhill–Nerode – relation . . . 412 – atom . . . 413 – theorem . . . 376

—N— natural language processing (NLF) 1383 Nayak quantum finite automaton (NaQFA) . . . . . . . . . . . 1465 NC . . . . . . . . . . . . . . . . . 497 NC1 . . . . . . . . . . . . . . . . 497 negative example . . . . . . . . . 379 Nerode – automaton . . . . . . . . . . . . 17 – congruence . . . . . . . . . . . 340 – equivalence . . . . . . . . . . . 18 net . . . . . . . . . . . . . . . . . 620 – Cauchy – . . . . . . . . . . . . 620 – convergence . . . . . . . . . . 620 network – communication – . . . . . . . 1149 nilpotent group . . . . . . 855, 864, 884 Nivat – conjecture . . . . . . . . . . . 956 – theorem . . . . . . . . . . . . 127 nondeterministic – finite – automaton (NFA) 411, 769, 1418, 1462 – Chrobak normal form 420–421, 425, 431 – one-way – (1NFA) . . . 1462 – two-way – (2NFA) . . . 1462 – tree automaton (NFTA) . . . 239 – finite hedge automaton (NFHA) 256 – message complexity . . . . . . 414


– minimal automaton . . . 368 – quantum finite automaton (NQFA) . . . 1471 – state complexity . . . 412, 430 – transition complexity . . . 414
nonerasing (non-erasing) morphism . . . 80, 575, 585
non-self dual . . . 707 – class . . . 709 – degree . . . 708
nonstochastic language . . . 1480
non-uniformity . . . 496
normal subgroup . . . 850
normalisation . . . 950
number – β- – . . . 951 – cardinal – . . . 695–696 – decision diagram (NDD) . . . 948, 1192–1194, 1196–1199, 1201 – ordinal – . . . 695 – Parry – . . . 951 – Perron – . . . 962 – Pisot – . . . 950
numeration – base . . . 949, 1191, 1202 – Bertrand – . . . 951 – linear – . . . 950 – system – abstract – (ASN) . . . 959 – Pisot – . . . 950

—O—
observability condition . . . 1230
observation table . . . 377–378
occurrence of a letter . . . 4
ω-automatic – presentation . . . 227 – structures – first-order theory . . . 227
ω-automaton . . . 189 – alternating – . . . 222 – dual . . . 223 – state-controlled – . . . 222



– transition-controlled – . . . . 223 – weak – . . . . . . . . . . . . 224 – cascade . . . . . . . . . . . . . 199 – conditional determination . . . 219 – finite-state – . . . . . . . . . . 192 – loop . . . . . . . . . . . . . . 217 – strongly unambiguous – . . . . 208 – tower . . . . . . . . . . . . . . 218 – unambiguous – . . . . . . . . . 208 – universal – . . . . . . . . . . . 222 – wall . . . . . . . . . . . . . . 218 – with output . . . . . . . . . . . 198 ! -concatenation . . . . . . . . . . 190 ! -idempotent Conway semiring . . 731 ! -language . . . . . . . . . . . . . 189 – initial congruence relation . . . 213 – parity index . . . . . . . . . . 219 – Rabin index . . . . . . . . . . 219 – regular . . . . . . . . . . . . . 194 – saturation . . . . . . . . . . . 215 – syntactic congruence . . . . . . 214 ! -power . . . . . . . . . . . . 190, 577 ! -product . . . . . . . . . . . . . 190 ! -regular expression . . . . . . . . 195 ! -semigroup . . . . . . . . . . . . 720 – free – . . . . . . . . . . . . . . 721 – pointed – . . . . . . . . . . . . 721 – syntactic – . . . . . . . . . . . 721 ! -word . . . . . . . . . . . . 189–190 operation – Boolean – . . . . . . . . 502, 1190 – explicit – . . . . . . . . . . . . 629 – implicit – . . . . . . . . . . . . 629 – problem . . . . . . . 429, 444–445 – finite language . . . . . . . . 435 – for finite automata . . . . 430, 445 – for regular expressions . . . . 440 – infinite language . . . . . . . 430 – unary language . . . . . 430, 435 – star – . . . . . . . . . . . . . . 502 order – problem . . . . . . . . . . 842, 883 – -type . . . . . . . . . . . . . . 696

ordered semigroup . . . 790, 1009
ordering – elimination – . . . 424, 468 – genealogical – . . . 960 – linear – . . . 657 – military – . . . 210, 960 – radix – . . . 960 – term – . . . 467
ordinal number . . . 695
out-merge . . . 994 – labelled – . . . 1001
output alphabet . . . 81
out-split – labelled – . . . 1001
out-splitting map . . . 1012

—P—
Parikh map . . . 961
parity – automaton . . . 1220 – game . . . 221, 274 – index – of an ω-language . . . 219 – problem . . . 288 – recurrence condition . . . 193, 268
parse – forest . . . 1387 – phrase . . . 1384 – tree . . . 1384
parsing . . . 1384 – strategy . . . 1386, 1401 – tabular – method . . . 1387
partial – Conway semiring . . . 731 – signature . . . 356
partition . . . 339, 992 – coarser – . . . 339 – function . . . 1396 – homogeneous – . . . 324 – index . . . 339 – refinement . . . 339
path . . . 5 – accepting – . . . 5


– end . . . . . . . . . . . . . . . . 5 – final – . . . . . . . . . . . . . . 5 – in a graph . . . . . . . . . . . 989 – initial – . . . . . . . . . . . . . . 5 – left-recurring – . . . . . . . . . 207 – length . . . . . . . . . . . . . . 5 – origin . . . . . . . . . . . . . . 5 – successful – . . . . . . . . . . . 5 P-complete language . . . . . . . . 778 perfect field . . . . . . . . . . . . 934 period . . . . . . . . . . . . . 954, 992 periodic – array . . . . . . . . . . . . . . 974 – dynamical system . . . . . . . 972 – inside a subset of Nd . . . . . . 974 – language . . . . . . . . 1469, 1472 – locally – set . . . . . . . . . . 975 – tiling . . . . . . . . . . . . . . . 93 – word . . . . . . . . . . . . . . 954 – see also ultimately periodic Perron – number . . . . . . . . . . . . . 962 – theorem . . . . . . . . . . . . 961 Perron–Frobenius’ theorem . . . . 962 p -group . . . . . . . 499, 855, 888, 892 picture . . . . . . . . . . . . . . . 304 – bordered – . . . . . . . . . . . 304 – domain . . . . . . . . . . . . . 304 – homogeneous – . . . . . . . 304 – grammar . . . . . . . . . . . . 318 – language . . . . . . . . . . . . 304 – closure properties . . . . . . 312 – complement . . . . . . . . . 313 – local – . . . . . . . . . . . . 307 – logic formula . . . . . . . . 311 – recognisable – . . . . . . . . 308 – regional – . . . . . . . . . . . 324 pipeline . . . . . . . . . . . . . 1240 Pisot – number . . . . . . . . . . . . . 950 – numeration system . . . . . . . 950 plant . . . . . . . . . . . . . . . 1227 play . . . . . . . . . . . . . . . . 274


pointlike – conjecture . . . . . . . . . . . 638 – subset . . . . . . . . . . . . . 638 Polish space . . . . . . . . . . . . 698 polynomial – Artin–Schreier – . . . . . . . . 938 – closure . . . . . . . . . . . . . 678 – language . . . . . . . . . 678, 1477 – series . . . . . . . . . . . . . . . 64 position automaton . . . . . . . . . 421 positional strategy . . . . . . . 222, 275 positionally determined game . . . 275 positive example . . . . . . . . . . 379 postselection . . . . . . . . . . . 1482 power series . . . . . . . . . . . . . 64 – algebraic – . . . . . . . . . . . 932 – generalised – . . . . . . . . . . 938 – Hahn’s – . . . . . . . . . . . . 938 powerset construction . . . . . 412, 418 prebase . . . . . . . . . . . . . . . . 73 preclone . . . . . . . . . . . . . . 813 – finitary – . . . . . . . . . . . . 815 – free – . . . . . . . . . . . . . . 814 prefix – -closed . . . . . . . . . . . . . 376 – common – . . . . . . . . . . . 363 – metric . . . . . . . . . . . . . 853 – -suffix decomposition . . . . . 363 – tree acceptor . . . . . . . . . . 376 prefix code . . . . . . . . . . . . . 527 – maximal – . . . . . . . . . . . 527 – synchronised – . . . . . . . . . 527 pre-fixed point . . . . . . . . . . . 747 – see also fixed point preorder . . . . . . . . . . . . . 1009 preperiod . . . . . . . . . . . . . . 954 Presburger arithmetic 957, 1193, 1199, 1201 presentation – ! -automatic – . . . . . . . . . 227 prime language . . . . . 778, 788, 1468 primitive – matrix . . . . . . . . . . . . . 961



– substitution . . . . . . . . . 642, 961 Pringsheim’s theorem . . . . . . . 470 priority function . . . . . . . . . . 193 probabilistic – finite automaton (PFA) . . . . 1462 – pushdown automaton (PPDA) 1390 problem – bounded section – . . . . . . . 672 – boundedness – . . . . . . . . . 673 – for CFMs . . . . . . . . . 1153 – Church – . 1218–1220, 1222–1224, 1226–1227, 1234–1235 – conjugacy – . . . . . . . . . . 842 – decision – . . . . . . . . . 842, 873 – HD0L (ultimate) periodicity problem . . . . . . . . . . . . 977 – emptiness – 241, 1265, 1472–1473, 1480 – inclusion – . . . . . . . . . . 1284 – isomorphism – . . . . . . . 842, 883 – limitedness – . . . . . . . . . . 673 – membership – . . . . . . . . . 842 – rational subset – . . . . . . . 864 – model-checking – for CFMs . 1153 – NP-complete – . . . . . . . . 1471 – order – . . . . . . . . . . . 842, 883 – Post correspondence – . . . . . 864 – promise – . . . . . . . . . . 1482 – reachability – for CFMs 1152–1153, 1167 – universality – . . . . . . . . 1284 – word – 633, 842, 873, 880, 883, 1477 – generalised – . . . . . . . 842, 883 – in automata groups . . . . . . 890 – over a monoid . . . . . . . . 499 – submonoid . . . . . . . 862–863 product . . . . . . . . . . . . . . 3, 499 – 2-sided semidirect – . . . . . . 606 – concatenation – . . . . . . . . . 4 – deterministic – . . . . . . . . . 602 – direct – . . . . . . . . . . . . . 617 – Mal’cev- . . . . . . . . . . 602, 631 – prefix . . . . . . . . . . . . . . 499

– semidirect – . . . . . . . . 604, 640 – suffix . . . . . . . . . . . . . . 499 – unambiguous – . . . . . . . . . 602 – wreath – . . . . . . . . . . . . 605 – principle . . . . . . . . . . . 605 profinite – algebra . . . . . . . . . . . . . 624 – C-identity . . . . . . . . . . . 581 – ordered . . . . . . . . . . . 581 – distance . . . . . . . . . . . . 576 – equality . . . . . . . . . . . . 580 – equation . . . . . . . . . . . . 578 – free – monoid . . . . . . . . . 577 – Hopfian algebra . . . . . . . . 627 – identity . . . . . . . . . . . . . 629 – inequality . . . . . . . . . . . 580 – seft-free algebra . . . . . . . . 628 – topology . . . . . . . . . . . . 622 – uniformity . . . . . . . . . . . 622 program – branching . . . . . . . . . . . 501 – super-turtle – . . . . . . . . . . 520 – turtle – . . . . . . . . . . . . . 520 projection . . . . . . . . . 88, 507, 702 projective hierarchy . . . . . . . . 702 promise problems . . . . . . . . 1482 proof system – Arthur–Merlin – . . . . . . . 1481 – interactive – . . . . . . . . . 1480 property – decidable – . . . . . . . . . . . . 35 – geometric – . . . . . . . . . . 875 prophetic automaton . . . . . . . . 194 propositional dynamic logic (PDL) 1164 – global formula . . . . . . . . 1165 – local formula . . . . . . . . . 1165 – path expression . . . . . . . 1165 Prouhet–Thue–Morse substitution . 644 Průša grid grammar (PGG) . . . . . 323 pseudoidentity . . . . . . . . . . . 629 – basis . . . . . . . . . . . . . . 629 pseudometric . . . . . . . . . . . . 619 pseudoquasivariety . . . . . . . . . 618


pseudo-ultrametric . . . 619 – pro-Q – . . . 623
pseudovariety . . . 570, 618, 1027 – C-pseudovariety of stamps . . . 582 – decidable – . . . 630 – generated by . . . 618 – has computable κ-closures . . . 634 – of finite semigroups . . . 575 – order computable – . . . 635 – κ-full – . . . 634 – weakly – reducible – . . . 634 – tame – . . . 634
pseudoword . . . 615
p-substitution . . . 975
Puiseux series . . . 470
pumping lemma . . . 6, 920
pure induction . . . 756
purely substitutive word . . . 954
pushdown automaton (PDA) . . . 1385, 1389 – deterministic – . . . 1389 – probabilistic – (PPDA) . . . 1390 – reduced – . . . 1389 – tabulation . . . 1390 – valid prefix property . . . 1395 – weighted – (WPDA) . . . 1389
puzzle grammar . . . 319

—Q—
quadtree . . . 323
quantale . . . 744
quantifier – ∃≥ℵ0 . . . 1045 – ∃≥ℵ1 . . . 1045 – ∃mod . . . 1045 – elimination . . . 1032 – generalised – . . . 1048 – Ramsey – . . . 1048 – unary cardinality . . . 1048
quantum – automaton – 1.5-way – . . . 1478 – alternating . . . 1471


– computation . . . . . . 1457–1458 – fingerprints . . . . . . . . . . 1467 – finite automaton (QFA) – bistochastic – (BiQFA) . . 1465 – fully – (CiQFA) . . . . . . 1465 – Kondacs–Watrous – (KWQFA) . . . . . . . . . . 1465 – Latvian – (LaQFA) . . . . 1465 – Moore–Crutchfield – (MCQFA) . . . . . . . . . . 1465 – Nayak – (NaQFA) . . . . . 1465 – nondeterministic – (NQFA) 1471 – one-way nondeterministic – (1NQFA) . . . . . . . . . . 1471 – one-way – (1QFA) . . . . . 1463 – general – . . . . . . . . . 1465 – with quantum and classical states (1QCFA) . . . . . . . . . . . 1465 – two-way – (2QFA) . . . . . 1474 – with ancilla qubits (QFA-A) 1465 – with control language (QFA-CL) . . . . . . . . . . 1465 – Markov chain . . . . . . . . 1474 – measurement . . . 1459, 1463, 1465 – state . . . . . . . . . . . . . 1459 – superoperator . . . . . 1461, 1466 – system . . . . . . . . . . . . 1458 quasi-automatic function . . . . . . 940 quasi-convex subgroup . . . . . . . 881 quasi-Conway semiring . . . . . . . 63 quasi-geodesic . . . . . . . . . . . 883 quasi-identity . . . . . . . . . . . 742 quasi-isometry . . . . . . . . . 863, 875 quasi-variety . . . . . . . . . . 742, 753 queue-content decision diagram (QDD) . . . . . . . . . . . . 1154 quotient – of a language . . . . . . . . 58, 573 – of a series . . . . . . . . . . . . 70

—R—
Rabin – index . . . 219



– pair . . . . . . . . . . . . . . . 193 – recurrence condition . . . . 193, 268 – ’s basis theorem . . . . . . . 1042 – ’s theorem . . . . . . . . . . 1041 radius of convergence . . . . . . . 469 radix ordering . . . . . . . . . . . 960 Ramsey’s theorem . . . . . . . . . 215 random walk . . . . . . . . . 892, 1475 – self-similar – . . . . . . . . . . 892 rank – function . . . . . . . . . . . . 204 – of a D-class . . . . . . . . . 1024 – of a free profinite group . . 643, 645 rational – closure . . . . . . . . . . . . 45, 65 – composition . . . . . . . . . . . 89 – constraint . . . . . . . . . . . 865 – cross-section . . . . . . . . . . 877 – element . . . . . . . . . . . . 737 – expression . . . . . . . . . 41, 741 – linear – . . . . . . . . . . . . 27 – weighted – . . . . . . . . . . . 65 – formal power series . . . . . 1113 – function . . . . . . . . . . 84, 931 – identity . . . . . . . . . . . . . . 45 – language . . . . . . . . . 4, 41, 857 – monoid . . . . . . . . . . . . . . 73 – operation . . . . . . . . . . . . 737 – relation . . . . . . . . . . . . . . 86 – series . . . . . . . . . . 65, 120, 737 – subset . . . . . . . . . . 61, 72, 86 – transduction . . . . . . . . . . . 82 – weighted – expression . . . . . . 74 reachability relation . . . . . . . . 554 real – arithmetic . . . . . . . . . . 1205 – number automaton (RNA) . . . 972 – vector automaton (RVA) . . . . 972, 1204–1205, 1208–1209, 1211 realisability . . . . . . . . . . . 1171 – specification – MSO – . . . . . . . . . . . 1173

– PDL – . . . . . . . . . . . 1174 – sequential – . . . . . . . . 1172 realisation . . . . . . . . . . . . . . 87 recognisable – element . . . . . . . . . . . . 737 – language . . . . . . . . . 6, 43, 857 – tiling – . . . . . . . . . . . . 308 – monoid . . . . . . . . . . . . . . 37 – picture language . . . . . . . . 308 – series . . . . . . . . . . 66, 116, 737 – set . . . . . . . . . . . 62, 72, 949 – tree language . . . . . . . . . . 238 recognition . . . . . . 1388, 1391, 1400 recurrence condition . . . . . . . . 192 – Büchi – . . . . . . . . . . 193, 267 – co-Büchi – . . . . . . . . . 193, 267 – Muller – . . . . . . . . . . 193, 267 – parity – . . . . . . . . . . . 193, 268 – Rabin – . . . . . . . . . . 193, 268 – Streett – . . . . . . . . . . 193, 268 – transition – . . . . . . . . . . . 193 – weak – . . . . . . . . . . . . . 193 recursion scheme . . . . . . 1304, 1307 – Damm-safe – . . . . . . . . . 1335 – homogeneous – . . . . . . . 1334 – safe – . . . . . . . . . . . . 1334 recursive – language . 779, 782–789, 1481–1482 – Markov – chain (RMC) . . . . . . . . 1372 – decision process (MDP) . . 1377 recursively enumerable – language . . . 782–789, 1471, 1481 – pseudovariety – completely  -tame – . . . . . 634 –  -tame – . . . . . . . . . . . 634 reduced – automaton . . . . . . . . . . 1005 – expression . . . . . . . . . . 46, 66 – word . . . . . . . . . . . . . . 843 reduction – of automata . . . . . . . . . 1008 – relation . . . . . . . . . . . . . 701


refinement of a partition . . . . . . 339 region . . . . . . . . . . . . . . 1267 – automaton . . . . . . . . . . 1269 – graph . . . . . . . . . . . . . 1269 regional – picture . . . . . . . . . . . . . 324 – -tile grammar (RTG) . . . . . . 324 regular – D-class . . . . . . . . . . . . 1024 – determinacy . . . . . . . . . . 221 – expression . . . . 42, 411, 459, 502 – 2D – . . . . . . . . 306, 320, 322 – defined by grammar . . . . . 463 – length . . . . . . . . . . . . 461 – size . . . . . . . . . . . . . 461 – language . . . . . . . . . . . 4, 459 – sequence . . . . . . . . . . . . 971 – tree . . . . . . . . . . . . . . . 287 – grammar . . . . . . . . . 246, 258 – language . . . . . . 243, 268, 805 – uncollapsible – expression . . . 462 – winning condition . . . . . . . 221 Reiterman’s theorem . . . . . . . . 582 relation – congruence . . . . . . . . . . 1050 – logically definable – . . . . . 1035 – regular – . . . . . . . . . . . 1036 – synchronous rational – . . . . 1058 relational – morphism . . . . . . . . . . . 632 – structure . . . . . . . . . . . 1073 representation . . . . . . . . . . . . 66 – linear – . . . . . . . . . . . . . 118 – normal – . . . . . . . . . . . . 950 – of a series . . . . . . . . . . . 118 – S- – . . . . . . . . . . . . . . 959 – U - – . . . . . . . . . . . . . . 950 reset – threshold . . . . . . . . . . . . 534 – word . . . . . . . . . . . . . . 525 residual automaton . . . . . . 369, 1005 residually finite group . . . . . . . 855 residuation . . . . . . . . . . . . . 755

xlvii

restarting automaton . . . . . . . . 427 retract . . . . . . . . . . . . . . . 644 return word . . . . . . . . . . . . 963 reversal automaton . . . . . . . . . 341 reverse polish length . . . . . . . . 416 reversible – automaton . . . . . . . . . 433, 442 – language . . . . . . . . . . . . 433 – Mealy automaton . . . . . 887, 897 rewriting – rule – isometric – . . . . . . . . . . 324 – system – confluent – . . . . . . . . . . 843 – length-reducing etc. . . . . 860 Ridout’s theorem . . . . . . . . . . 925 ring . . . . . . . . . . . . . . . . . 3 road colouring problem . . . 547–548 root . . . . . . . . . . . . . . . . 266 – of a vertex in a-graph . . . . . 550 rotating limited automaton . . . . . 429 Roth’s theorem . . . . . . . . . . . 923 run – DAG . . . . . . . . . . . . . . 203 – of a tree automaton . . . . . 267, 770 – of a TWA . . . . . . . . . . . 251 – of an alternating tree automaton 271 – of an NFHA . . . . . . . . . . 256 – of an NFTA . . . . . . . . . . 239 – tree – core of a – . . . . . . . . . . 208 – labelled compressed – . . . . 206 —S— safety properties . . . . . . . . . 1428 Safra’s construction . . . . . . . . 197 Sakarovitch conjecture . . . . . . . 862 Sakoda–Sipser problem . . . . . . 425 satisfiability . . . . . . . . 1424, 1448 saturated subset . . . . . . . . . . 339 Schreier graph . . . . 848, 861, 873, 891 Schützenberger’s theorem . . . . . 600



self dual . . . . . . . . . . . . . . 707 – class . . . . . . . . . . . . . . 709 – degree . . . . . . . . . . . . . 708 self-similarity biset . . . . . . . . . 895 semi-automaton . . . . . . . . . . 344 semidirect product . . . . . . . 604, 640 semigroup . . . . . . . . . . . . . . 3 – aperiodic – . . . . . . . . . . . 626 – automatic – . . . . . . . . . . 878 – completely regular – . . . . . . 639 – free – . . . . . . . . . . . . . . . 4 – inverse – . . . . . . . . . . . . 865 – left-zero – . . . . . . . . . . . 585 – locally finite – . . . . . . . . . 669 – morphism . . . . . . . . . . . 996 – of an automaton . . . . . . . . 885 – ordered – . . . . . . . . . 790, 1009 – projective profinite – . . . . . . 645 – simple – . . . . . . . . . . . . 790 – stable – . . . . . . . . . . . . . 586 – syntactic – . . . . . 575, 1009–1010 – torsion . . . . . . . . . . . . . 669 – transition – . . . . . . . . . . 1009 – zero . . . . . . . . . . . . . 1024 semilattice . . . . . . . . . . . . . 570 – order . . . . . . . . . . . . . . 746 semilinear sets . . . . . . . . . . . . 36 semimodule . . . . . . . . . . . . 125 seminearring . . . . . . . . . . . . 826 – induced – . . . . . . . . . . . 828 semiring . . . . . 3, 63, 115, 730, 1387 –  -semiring . . . . . . . . . . . 731 – left-handed inductive – . . . 751 – partial – . . . . . . . . . . . 731 – right-handed inductive – . . . 751 – symmetric inductive – . . . . 751 – Boolean – . . . . . . . 3, 730, 1388 – commutative – . . . . . . . . . 730 – complete – . . . . . . . . . . . 744 – complete iteration . . . . . . . 745 – completely idempotent – . . . . 744 – continuous – . . . . . . . . . . 746 – iteration . . . . . . . . . . . 747

– Conway – . . . . . . . 63, 72, 731 – countably complete – . . . . . 744 – countably complete iteration . . 745 – countably idempotent – . . . . 744 – dual – . . . . . . . . . . . 736, 743 – formal series . . . . . . . . 735, 743 – idempotent – . . . . . . . . . . 730 – iteration . . . . . . . . . . . . 741 – locally finite – . . . . . . . . . 134 – matrix – . . . . . . . . . . 733, 743 – max-plus – . . . . . . . . . . . 152 – min-plus – . . . . . . . . . 152, 669 – of binary relations . . . . . . . 730 – of languages . . . . . . . . . . 730 – ! -continuous – . . . . . . . . . 746 – ! -continuous iteration . . . . . 747 – ! -idempotent Conway – . . . . 731 – ! -idempotent iteration . . . . . 742 – ordered – . . . . . . . . . . . . 746 – partial – Conway – . . . . . . . . . . 731 – iteration – . . . . . . . . . . 741 – partial iterative – . . . . . . . . 732 – polynomial – . . . . . . . . . . 735 – quasi-Conway – . . . . . . . . . 63 – sum ordered – . . . . . . . . . 746 – symmetric partial iterative – . . 736 – tropical – . . . . . . . . . . 669, 730 – zero-sum free – . . . . . . . . 731 semistructured data . . . . . . . 1088 sentence . . . . . . . . . . . . 245, 592 sentential form . . . . . . . . . 319, 464 separator symbol . . . . . . . . . 1202 sequence – Beatty – . . . . . . . . . . . . . 94 – characteristic – . . . . . . . . . 954 – regular – . . . . . . . . . . . . 971 – Thue–Morse – . . . . . . . . . 914 sequential – automaton . . . . . . . . . . . 155 – series . . . . . . . . . . . . . . 155 – transducer . . . . . . . . . . . . 81 serialisation . . . . . . . . 1192, 1203


series . . . . . . . . . . . . . . . . . 64 – characteristic – . . . . . . . . . 735 – coefficient in a – . . . . . . . . . 64 – constant term . . . . . . . . . . . 65 – formal – . . . . . . . . . . 735, 743 – limited – . . . . . . . . . . . . 157 – locally finite – . . . . . . . . . 119 – max-plus – . . . . . . . . . . . 154 – polynomial – . . . . . . . . . . . 64 – proper – . . . . . . . . . . 65, 735 – quotient . . . . . . . . . . . . . 70 – rational – . . . . . . . . 65, 120, 737 – rational formal power – . . . 1113 – recognisable – . . . . . 66, 116, 737 – representation . . . . . . . . . 118 – sequential – . . . . . . . . . . 155 – support . . . . . . 64, 116, 138, 735 – unambiguous – . . . . . . . . . 155 set . . . . . . . . . . . . . . . . . 414 – analytic – . . . . . . . . . . . 702 – Borel – . . . . . . . . . . . . . 698 – clopen – . . . . . . . . . . . . 620 – complete . . . . . . . . . . . . 701 – compressible . . . . . . . . . . 550 – constraints . . . . . . . . . . . 770 – difference . . . . . . . . . . 1190 – exceptional – . . . . . . . . . . 471 – Higman–Haines – . . . . . . . 435 – hyper-arithmetical – . . . . . . 793 – incompressible- . . . . . . . . 550 – initialisable – . . . . . . . . . . 714 – Julia – . . . . . . . . . . . . . 897 – k -recognisable – . . . 947, 956, 972 – locally periodic – . . . . . . . . 975 – multidimensional automatic – . 918 – periodic inside a subset of Nd . 974 – projection . . . . . . . . . . 1199 – recognisable – . . . . . . . . . 949 – S -recognisable – . . . . . . . . 959 – substitutive – . . . . . . . . . . 959 – syndetic – . . . . . . . . . . . 953 – test – . . . . . . . . . . . . . . . 97 – ultimately periodic – . . . . . . 948

xlix

– U -recognisable – . . . . . . . . 950 – vanishing – . . . . . . . . . . . 467 – well-ordered – . . . . . . . . . 939 shift – conjugacy . . . . . . . . . . . 990 – edge – . . . . . . . . . . . . . 989 – even – . . . . . . . . . . . . . 989 – full – . . . . . . . . . . . . . . 988 – golden mean – . . . . . . . . . 989 – higher block . . . . . . . . . . 990 – sofic – . . . . . . . . . . . 642, 989 – space . . . . . . . . . . . . 642, 988 – almost finite type – . . . . 1022 – entropy . . . . . . . . . 643, 991 – finite type – . . . . . . . . . 988 – flow equivalent – . . . . . . 996 – forbidden factor . . . . . . . 988 – in-splitting – . . . . . . . . 1011 – irreducible – . . . . . . 642, 1007 – minimal – . . . . . . . . . . 642 – morphism . . . . . . . . . . 990 – periodic – . . . . . . . . . . 642 – recognised by an automaton . 998 – transformation . . . . . . . 973, 988 – see also subshift  -algebra . . . . . . . . . . . . . 617 †1 -sentence . . . . . . . . . . . . 573 sign – header . . . . . . . . . 1192, 1203 – symbol . . . . . . . . . 1191, 1203 signature – algebraic – . . . . . . . . . . . 617 – computable implicit – . . . . . 633 –  implicit – . . . . . . . . . . . 637 – of a state . . . . . . . . . . 355, 357 – partial – . . . . . . . . . . . . 356 – tree – . . . . . . . . . . . . . . 356 simple – automaton . . . . . . . . . . . 358 – semigroup . . . . . . . . . . . 790 – transducer . . . . . . . . . . . . 81 simulation – delayed – of an ! -automaton . . 226

l

Index

– direct – of an ! -automaton . . . 225 – forward – simulation of an ! -automaton . . . . . . . . . . 225 – game for an ! -automaton . . . 225 – of limited automata . . . . . . 428 – of multi-head finite automata . . 426 – of restarting automata . . . . . 427 – of two-way finite automata . . . 425 – relation for ! -automata . . . . 225 singularity – analysis . . . . . . . . . . . . 469 – dominant – . . . . . . . . . . . 470 Skolem–Mahler–Lech theorem . . 929 sliding block – code . . . . . . . . . . . . . . 973 – map . . . . . . . . . . . . . . 990 small cancellation . . . . . . . 873, 879 Snake-DREC . . . . . . . . . . . 317 sofic shift . . . . . . . . . . . . . 989 space – Baire – . . . . . . . . . . . . . 698 – Boolean – . . . . . . . . . . . 626 – Cantor . . . . . . . . . . . . . 698 – compact – . . . . . . . . . . . 620 – hyperbolic – . . . . . . . . . . 881 – metric – . . . . . . . . . . . . 619 – Polish – . . . . . . . . . . . . 698 – uniform – . . . . . . . . 619–621 spanning tree . . . . . . . . . . . . 847 spectral vector . . . . . . . . . . . 160 split . . . . . . . . . . . . . . . . 659 – forward Ramsey – . . . . . . . 682 – normalised – . . . . . . . . . . 659 – of an automaton . . . . . . . . 724 – Ramsey – . . . . . . . . . . . 659 spoiler player . . . . . . . . . . . 226 stability relation . . . . . . . . . . 548 stable – matrix . . . . . . . . . . . . . 670 – pair . . . . . . . . . . . . . . . 548 – preorder . . . . . . . . . . . 1009 – subset . . . . . . . . . . . . . . 70 Stalling s’ construction . . 845–849, 864

stamp . . . . . . . . . . . . . . . 582 – ordered – . . . . . . . . . . . . 582 – quasi-aperiodic – . . . . . . . . 586 star – functorial- . . . . . . . . . . . 742 – height . . . . . . . . . . . 416, 440 – of a rational language . . . . . 72 – of an expression . . . . . . . . 53 – preserving homomorphism . 418 – problem . . . . . . . . . . 53, 72 – (restricted) . . . . . . . . . 417 – normal form – strong – . . . . . . . . . 422, 482 – operation . . . . . . . . . . . . . 36 starable element . . . . . . . . . . . 64 star-free language . 420, 433, 503, 600, 1080 star-normal form – of an expression . . . . . 57, 69, 73 state . . . . . . . . . . . . . 5, 81, 989 – accessible – . . . . . . . . . . . 5 – Büchi – . . . . . . . . . . . . 193 – coaccessible – . . . . . . . . . . 5 – co-Büchi – . . . . . . . . . . . 193 – complexity . 412, 1468–1469, 1477, 1483 – confluent – . . . . . . . . . . . 356 – final – . . . . . . . . . . . . 5, 81 – fusion . . . . . . . . . . . . . 355 – future . . . . . . . . . . . . . 339 – height . . . . . . . . . . . . . 357 – initial – . . . . . . . . . . . . 5, 81 – merge . . . . . . . . . . . . . 355 – mergeable – . . . . . . . . . . 355 – partial signature . . . . . . . . 356 – past . . . . . . . . . . . . . . 339 – separated – . . . . . . . . . . . 340 – signature . . . . . . . . . . 355, 357 – weak – . . . . . . . . . . . . . 193 stochastic – language . . 1463, 1471, 1473, 1477 – matrix . . . . . . . . . 1462, 1466

Index

strategy . . . . . . . . . . . . . . 274 – positional – . . . . . . . . 222, 275 – winning – . . . . . . . . . . . 274 Streett pair . . . . . . . . . . . . . 193 strictly connected component . . . 714 strongly connected component . 152, 554 – minimal – . . . . . . . . . . 1007 structure . . . . . . . . . . . . . 1034 – automatic – . . . . . . 1036, 1038 – Boolean algebra . . . . 1040, 1052 – Büchi-automatic – . . . . . . 1038 – domain . . . . . . . . . . . . 1034 – elementary substructure . . . 1060 – free – group . . . . . . . . . . . 1052 – monoid . . . . . . . . . . 1063 – integral domain . . . . 1052, 1063 – isomorphic . . . . . . . . . . 1034 – ordinal – . . 1052, 1055–1056, 1063 – power – . . . . . . . . . . . 1044 – quotient . . 1050–1051, 1063–1064 – Rabin-automatic – . . . 1036, 1038 – random graph 1052, 1054, 1063–1064 – real arithmetic – . . . . 1031, 1064 – relational – . . . . . . . . . . 1073 – signature . . . . . . . . . . . 1034 – universal automatic – . 1039, 1045 – word – . . . . . . . . . . . . 1073 subgroup – finitely generated – . . . . . . . 841 – fixed point – . . . . . . . . . . 856 – index of a – . . . . . . . . . . 842 – intersection . . . . . . . . . . . 851 – normal – . . . . . . . . . . . . 850 – [p -]pure . . . . . . . . . . . . 853 – quasi-convex – . . . . . . . . . 881 subminimal element . . . . . . . . 824 subset – automaton . . . . . . . . . . . 530 – disjunctive rational – . . . . . . 862 – pointlike – . . . . . . . . . . . 638 – recognisable – . . . . . . . . . 619 – of topological algebras . . . . 622

li

– recognised by homomorphism . 619 subshift . . . . . . . . . . . . 642, 973 – entropy . . . . . . . . . . . . . 643 – generated – . . . . . . . . . . . 973 – irreducible – . . . . . . . . . . 642 – minimal – . . . . . . . . . . . 642 – periodic – . . . . . . . . . . . 642 – sofic – . . . . . . . . . . . . . 642 – see also shift substitution . . . . . . . 529, 642, 954 – block – . . . . . . . . . . . . . 990 – erasing – . . . . . . . . . . . . 963 – good – . . . . . . . . . . . . . 968 – growing – . . . . . . . . . . . 966 – irreducible – . . . . . . . . . . 961 – of constant length . . . . . . . 529 – !˛ - – . . . . . . . . . . . . . . 970 – periodic – . . . . . . . . . . . 644 – primitive – . . . . . . . . . 642, 961 – projection . . . . . . . . . . . 963 – proper – . . . . . . . . . . . . 644 – Prouhet–Thue–Morse – . . . . 644 – sub- – . . . . . . . . . . . . . 967 subtree . . . . . . . . . . . . . . . 238 subword . . . . . . . . . . . . . . 571 successor . . . . . . . . . . . . . . 957 suffix-closed . . . . . . . . . . . . 376 sum . . . . . . . . . . . . . . . . . 3 – order . . . . . . . . . . . . . . 746 superposition . . . . . 1459, 1463, 1478 supremum of a countable sequence 712 symbol – contraction – . . . . . . . . . . 996 – expansion . . . . . . . . . . . 996 – terminal – . . . . . . . . . 246, 258 symbolic – conjugacy of automata . . . . 1011 – grammar . . . . . . . . . . . . 319 – representation . . . . . . . . 1189 synchronised automaton . . . . . 1005 synchronising – ratio . . . . . . . . . . . . . . 556


– word . . . . . . . . – of a code . . . . . syndetic set . . . . . . syntactic – congruence . . . . . – of an ! -language – disambiguation . . . – graph . . . . . . . . – monoid . . . . . . . – morphism . . . . . – ! -semigroup . . . . – ranked tree algebra . – semigroup . . . . .


. . . . . 1005 – with invariants – . . . . . . . 1263 . . . . . . 527 topological – algebra . . . . . . . . . . . . . 621 . . . . . . 953 – generator . . . . . . . . . . . 621 – recognisable subset . . . . . 622 . 21, 618, 1009 . . . . . . 214 – residually in a class . . . . . 622 . . . . . 1385 – self-free – . . . . . . . . . . 628 – class . . . . . . . . . . . . . . 701 . . . . . 1024 . . . . 20, 861 – space – compact – . . . . . . . . . . 620 . . . . . . . 20 – totally disconnected – . . . . 620 . . . . . . 721 – zero-dimensional – . . . . . 620 . . . . . . 805 575, 1009–1010 topology – compact-open – . . . . . . . . 627 —T— – pointwise convergence – . . . . 627 tame – profinite – . . . . . . . . . . . 622 – graph . . . . . . . . . . . . . . 637 – pro-Q – . . . . . . . . . . . . . 622 – pseudovariety . . . . . . . . . 634 – pro-V . . . . . . . . . . . . . 855 TC0 . . . . . . . . . . . . . . . . 497 transducer – deterministic – . . . . . . . . . . 81 temporal logic . . . . . . . . . . . 228 term . . . . . . . . . . . . . . . . 804 – normalised . . . . . . . . . . . . 83 – sequential – . . . . . . . . . . . 81 – applicative – . . . . . . . . . 1303 – simple – . . . . . . . . . . . . . 81 – ordering . . . . . . . . . . . . 467 test set . . . . . . . . . . . . . . . . 97 – unambiguous – . . . . . . . . . . 86 – weighted finite – (WFT) . . . 1135 theory . . . . . . . . . . . . . . 1035 – decidable – . 1032, 1035, 1037, 1039, transduction 1063 – finite-state – . . . . . . . . . . . 82 Thompson construction . . . . . . 422 – finite-valued – . . . . . . . . . . 84 Thue–Morse – rational – . . . . . . . . . . . . . 82 – sequence . . . . . . . . . . . . 914 transformation between models . . 418 – word . . . . . . . . . . . . . . 955 transition . . . . . . . . . . . . . 5, 81 tile – consecutive – . . . . . . . . . . 5 – matrix . . . . . . . . . . . 43, 999 – Wang – . . . . . . . . . . . . . 92 tiling . . . . . . . . . . . . . . . . . 92 – monoid . . . . . . . . . . . 844, 853 – aperiodic – . . . . . . . . . . . . 93 – recurrence condition . . . . . . 193 – periodic – . . . . . . . . . . . . 93 – semigroup . . . . . . . . . . 1009 – recognisability . . . . . . . . . 314 – spontaneous – . . . . . . . . . . 54 – system . . . . . . . . . . 308, 1090 – system . . . . . . . . . 1156, 1300 – deterministic – . . . . . . . . 316 transitive closure – logic . . . . . . . . . . . . . 1079 – unambiguous – . . . . . . . 309 timed automaton . . . . . . . . . 1263 – operator . . . . . . . . . . . 1079 – deterministic – . . . . . . . . 1288 tree . . . . . . . . . . . . . . 266, 1300 – diagonal-free – . . . . . . . . 1263 – automaton . . . . . . . . . . . 804


– accepting run . . . . . . . . 268 – nondeterministic – (NFTA) . 239 – unambiguous – . . . . . . . 284 – bounded width . . . . . . . . 1091 – convolution – . . . . . 1036, 1038 – decomposition . . . . . . . . 1091 – domain . . . . . . . . . . . 237, 266 – factorisation – . . . . . . . . . 654 – height . . . . . . . . . . . . . 238 – history – . . . . . . . . . . . . 210 – language – NFTA-recognisable . . . . . 239 – recognisable – . . . . . . 238, 805 – regular – . . . . . . 243, 268, 805 – model property . . . . . . . . 1094 – parse – . . . . . . . . . . . . 1384 – prefix game . . . . . . . . . . 809 – ranked – . . . . . . . . . . . . 804 – regular – . . . . . . . . . . . . 287 – grammar . . . . . . . . . 246, 258 – signature . . . . . . . . . . . . 356 – spanning – . . . . . . . . . . . 847 – Sturmian – . . . . . . . . . . . 353 – unranked – . . . . . . . . . . . 819 – walking automaton (TWA) . . . 250 – width . . . . . . . . . . . . . 1091 – wreath product . . . . . . . . . 832 – see also quadtree – see also subtree trie – representation . . . . . . . . . 473 two-way – automaton . . . . . 520, 1462, 1474 – simulation . . . . . . . . . . 425 – classical head . . . . . . . . 1474 – quantum finite automaton (2QFA) . . . . . . . . . . . 1474 – quantum head . . . . . . . . 1474 type . . . . . . . . . . . . . . . 1302 – homogeneous – . . . . . . . 1334 type II conjecture . . . . . . . 636, 638


—U— U-equation . . . . . . . . . . . . . 631

ultimately periodic – array . . . . . . . . . . . . . . 974 – set . . . . . . . . . . . . . . . 948 – word . . . . . . . . . . . . 190, 954 unambiguous – automaton . . . . . . . . . . . 155 – ! -automaton . . . . . . . . . . 208 – series . . . . . . . . . . . . . . 155 – tiling system . . . . . . . . . . 309 – transducer . . . . . . . . . . . . 86 – tree automaton . . . . . . . . . 284 unary – algebra – heterotypical identity . . . . 530 – homotypical identity . . . . . 530 – language . . 419–420, 425–426, 428, 437, 440, 1463, 1469–1471, 1473–1474, 1483 – operation problem . . . . 430, 435 – term . . . . . . . . . . . . . . 529 undecidability . . . . . . . 1472, 1480 uniform – algebra . . . . . . . . . . . . . 622 – morphism . . . . . . . 80, 915, 954 – space . . . . . . . . . . . 619–621 – winning strategy . . . . . . . . 274 uniformity – basis . . . . . . . . . . . . . . 619 – discrete – . . . . . . . . . . . . 622 – DLOGTIME . . . . . . . . . . 496 – polynomial-time . . . . . . . . 496 – product – . . . . . . . . . . . . 621 – profinite – . . . . . . . . . . . 622 – quotient – . . . . . . . . . . . 620 – transitive – . . . . . . . . . . . 619 union . . . . . . . . . . . . . . . 1190 unitary transformation . . . 1459, 1465 universal – automaton . . . . . . . . . . . 781 – ! -automaton . . . . . . . . . . 222


– cover . . . 872
– witness language . . . 432
universality problem . . . 1284
unranked tree . . . 819

—V— valid accepted language computation (VALC) . . . . . . . . . . . . 776 valuation . . . . . . . . . . . . . 1262 vanishing set . . . . . . . . . . . . 467 variable – free – . . . . . . . . . . . . . . 957 variety . . . . . . . . . . 617, 730, 756 – C-positive . . . . . . . . . . . 581 – generated by – a class of algebras . . . . . . 617 – a set . . . . . . . . . . . . . 599 – of languages . . . . . . . . . . 573 – positive – . . . . . . . . . . . 576 vertex . . . . . . . . . . . . . . . 989 – initial – . . . . . . . . . . . . . 989 – ramified – . . . . . . . . . . . 551 – terminal – . . . . . . . . . . . 989 visualisation . . . . . . . . . . . . 165 Vorobets–Mariya–Yaroslav theorem 901 —W— Wadge – class . . . . . . . . . . – degree . . . . . . . . . – game . . . . . . . . . . – hierarchy . . . . . . . . – order . . . . . . . . . . – rank . . . . . . . . . . Wadge–Borel determinacy . Wang – system . . . . . . . . . – tile . . . . . . . . . . . weak – alternating ! -automaton – automaton . . . . . . . – conjugation . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

701 708 705 700 701 709 706

. . . . 310 . . . . 309 . . . . 224 . . . 1205 . . . . 636

– monadic second-order logic (WMSO) . . . . . . . . . . . . 290 – recurrence condition . . . . . . 193 – state . . . . . . . . . . . . . . 193 – tameness . . . . . . . . . . . . 634 weight . . . . . . . . . . . 1386–1387 weighted – automaton . . . . . . 65, 115, 1477 – context-free grammar (WCFG) 1388 – finite automaton (WAF) . . . 1112 – average preserving – . . . . 1119 – faithful – . . . . . . . . . . 1119 – minimal – . . . . . . . . . 1122 – strongly continuous – . . . 1118 – finite transducer (WFT) . . . 1135 – monadic second-order logic . . 129 – pushdown automaton (WPDA) 1389 – rational expression . . . . . . 65, 74 – relation . . . . . . . . . . . . 1135 well quasi-order (wqo) . . . . 789–792 well-structured transition systems (WSTS) . . . . . . . . . . . 1156 winning – condition . . . . . . . . . . . . 274 – region . . . . . . . . . . . 221, 274 – strategy . . . . . . . . . . . . 274 wire . . . . . . . . . . . . . . . . 496 word . . . . . . . . . . . . . . . . . 4 – accepted – . . . . . . . . . . . . 6 – almost periodic – . . . . . . . . 978 – automatic – . . . . . . . . . . 953 – characteristic – . . . . . . . . . 954 – context . . . . . . . . . . . . 1009 – cyclically reduced – . . . . . . 843 – empty – . . . . . . . . . . . . . 4 – equation . . . . . . . . . . . . 789 – length . . . . . . . . . . . . . . 4 – Lyndon – . . . . . . . . . . . . 361 – maximal growth – . . . . . . . 966 – metric . . . . . . . . . . . . . 874 – !˛ -substitutive – . . . . . . . . 970 – periodic – . . . . . . . . . . . 954 – problem 633, 842, 873, 880, 883, 1477


– generalised – . . . 842, 883
– in automata groups . . . 890
– over a monoid . . . 499
– over a submonoid . . . 862–863
– reduced – . . . 843
– reset – . . . 525
– return – . . . 963
– structure . . . 1073
– substitutive – . . . 954, 962
– synchronising – . . . 1005
– of a code . . . 527
– Thue–Morse – . . . 955
– Tribonacci – . . . 962
– ultimately periodic – . . . 190, 954
– valid – . . . 1158
– see also subword
wreath product . . . 605, 887–888
– for trees . . . 832

—X— XML . . . . . . . . . . . . . . . 1088 – Schema . . . . . . . . . . . . 259 —Z— zeta function . . . . . . . . . . . . 992 Zielonka automaton . . . . 1176, 1250

Handbook of Automata Theory Volume I. Theoretical Foundations

Automata theory is a subject of study at the crossroads of mathematics, theoretical computer science, and applications. In its core it deals with abstract models of systems whose behaviour is based on transitions between states, and it develops methods for the description, classification, analysis, and design of such systems. The Handbook of Automata Theory gives a comprehensive overview of current research in automata theory, and is aimed at a broad readership of researchers and graduate students in mathematics and computer science. Volume I is divided into three parts. The first part presents various types of automata: automata on words, on infinite words, on finite and infinite trees, weighted and maxplus automata, transducers, and two-dimensional models. Complexity aspects are discussed in the second part. Algebraic and topological aspects of automata theory are covered in the third part. Volume II consists of two parts. The first part is dedicated to applications of automata in mathematics: group theory, number theory, symbolic dynamics, logic, and real functions. The second part presents a series of further applications of automata theory such as message-passing systems, symbolic methods, synthesis, timed automata, verification of higher-order programs, analysis of probabilistic processes, natural language processing, formal verification of programs and quantum computing. The two volumes comprise a total of thirty-nine chapters, with extensive references and individual tables of contents for each one, as well as a detailed subject index.

https://ems.press ISBN Set 978-3-98547-006-8 ISBN Vol. I 978-3-98547-002-0

Handbook of Automata Theory Volume II Automata in Mathematics and Selected Applications Edited by Jean-Éric Pin

Editor: Jean-Éric Pin Institut de Recherche en Informatique Fondamentale (IRIF) Université de Paris and CNRS Bâtiment Sophie Germain, Case courier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 E-mail: [email protected] Volume II: 2020 Mathematics Subject Classification: 68Q45; 03B25, 03B70, 03C13, 03D05, 11A67, 11B85, 11J81, 11U05, 20E05, 20E08, 20F10, 20F65, 20F67, 37B10, 37B20, 68Q10, 68Q12, 68Q15, 68Q17, 68Q19, 68Q45, 68Q60, 68Q70, 68Q85, 68R15, 68T50, 68U10 Keywords: free groups, Stallings automata, automatic groups, self-similar groups, automatic sequences, numeration systems, Cobham’s theorem, symbolic dynamics, synchronous automata, finite model theory, fractal image generation, communicating automata, model-checking, Church’s problem, distributed synthesis, timed automata, recursion schemes, Markov decision processes, infinite-state systems, natural language processing, temporal logic, branching time logic, quantum finite automata ISBN Vol. I ISBN Vol. II ISBN Set

978-3-98547-002-0 978-3-98547-003-7 978-3-98547-006-8 (set of both volumes)

Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. Published by EMS Press, an imprint of the European Mathematical Society – EMS – Publishing House GmbH Institut für Mathematik Technische Universität Berlin Straße des 17. Juni 136 10623 Berlin, Germany https://ems.press © 2021 EMS Press Cover drawing of Jacques de Vaucanson’s digesting duck (canard digérant) published in ­Scientific American Vol. 80 (3), 1899. Fractal tree on the first page by Nicolas Janey. Typeset using the authors’ LaTeX sources: Marco Zunino, Savona, Italy Printing and binding: Beltz Bad Langensalza GmbH, Bad Langensalza, Germany ♾ Printed on acid free paper 987654321

Dedicated to the memory of Professor Zoltán Ésik (1951–2016 )

Preface The Handbook of Automata Theory has its origins in the research programme AutoMathA (Automata: from Mathematics to Applications, 2005–2010), a multidisciplinary programme of the European Science Foundation at the crossroads of mathematics, theoretical computer science, and applications. It is designed to provide a broad audience of researchers and students in mathematics and computer science with a comprehensive overview of research in automata theory. Automata theory is one of the longest established areas in computer science. It was born over sixty years ago, with the seminal work of Kleene, who first formalised the early attempts of McCulloch and Pitts, and was originally motivated by the study of neural networks. For many years, its main applications have been computer design, compilation of programming languages, and pattern matching. But over the last twenty years, applications of automata theory have considerably diversified, and now include verification methods to cope with such emerging technical needs as network security, mobile intelligent devices, and high performance computing. At the same time, the mathematical foundations of automata theory rely on more and more advanced parts of mathematics. While only elementary graph theory and combinatorics were required in the early sixties, new tools from non-commutative algebra (semigroups, semirings and formal power series), logic, probability theory, and symbolic dynamics have been successively introduced, and the latest developments borrow ideas from topology and geometry. It was time to gather these mathematical advances and their numerous applications in a reference book. The Handbook of Automata Theory is intended to serve this purpose. It comprises thirty-nine chapters, presented in two volumes: Volume I: Theoretical foundations Volume II: Automata in mathematics and selected applications Together, the two volumes cover most of the topics related to automata. Volume I presents, in the first part, the basic models of the theory: finite automata working on finite words, infinite words, finite trees and infinite trees, transducers, weighted automata and max-plus automata, and two-dimensional models. In the second part, complexity and algorithmic issues are discussed extensively, including connections with circuit complexity and finite model theory. In the third part, the algebraic and topological aspects of automata theory are treated. Volume II first offers a wide range of connections between automata and mathematics, including group theory, number theory, symbolic dynamics, finite model theory, and fractal-type images. Secondly, selected applications are covered, including

viii

Preface

message-passing systems, symbolic methods, synthesis, timed automaton model, verification of higher-order programs, analysis of probabilistic processes, natural language processing, formal verification of programs, and quantum computing. Much of this material had never been published in a book before, making the Handbook a unique reference in automata theory. Due to the length of the Handbook, the chapters are divided into two volumes. For the convenience of the reader, the front matter and the index appear in both volumes (paginated with roman numerals). As this project started over ten years ago, some recent developments may not have been addressed. Nevertheless, the reader will be able to find updates and possible corrections on https://ems.press/isbn/978-3-98547-006-8

Acknowledgements. I would like to thank the European Science Foundation, and in particular the Standing Committee for Physical & Engineering Sciences (PESC), for funding the research programme AutoMathA within their Research Networking Programme (2005–2010). The Handbook would not have been possible without their moral and financial support. The programme AutoMathA brought together a research community of wide scope; its joint work and efforts have been vital for composing the present handbook. The AutoMathA project was initially launched by Jorge Almeida (Lisboa), Stefano Crespi Reghizzi (Milano) and myself. Let me also thank the other members of the AutoMathA Steering Committee: Jacques Duparc, Jozef Gruska, Juhani Karhumäki, Mikołaj Bojańczyk, Søren Eilers, Stuart W. Margolis, Tatiana Jajcayova, Véronique Bruyère, Werner Kuich, Wolfgang Thomas, and Zoltán Ésik. Sadly and unexpectedly, Zoltán Ésik passed away during the final stages of the Handbook project. We dedicate the Handbook to the memory of this great scientist and friend. The constant support of the AutoMathA Steering Committee during the preparation of this handbook was an invaluable help. Narad Rampersad’s assistance during the early stage of the Handbook was also particularly appreciated. All the authors are particularly indebted to Jeffrey Shallit. As one of the few native English-speaking authors of the book, Jeffrey has accepted the daunting task of reviewing all the chapters in their entirety. He not only detected a considerable number of English mistakes, but he also greatly improved the style and mathematical content of the chapters. I would therefore like to express my deepest thanks to him. The advisory board consisting of Søren Eilers (Copenhagen) and Wolfgang Thomas (Aachen) was instrumental in defining the early version of the Handbook project. I am particularly indebted to Wolfgang Thomas for his advice and constant encouragement and help during the long gestation period of this handbook. Of course, this handbook would not have been possible without the authors of the thirty-nine chapters. I would like to thank them all for their high quality scientific contribution and their perseverance during the chapter review process.

Preface

ix

For their patience and extreme care in the production of the Handbook, I would like to thank the typesetter Marco Zunino and all the people of EMS Press I have been working with: Apostolos Damialis, Sylvia Fellmann, Thomas Hintermann, Manfred Karbe, Vera Spillner, and Simon Winter. Special thanks to Nicolas Janey who kindly designed the fractal tree on the first page. Jean-Éric Pin Managing editor Paris, 2021

Contents

VOLUME ONE

Preface . . . vii
List of contributors . . . xvii

Part I Foundations
Chapter 1. Finite automata (Jean-Éric Pin) . . . 3
Chapter 2. Automata and rational expressions (Jacques Sakarovitch) . . . 39
Chapter 3. Finite transducers and rational transductions (Tero Harju and Juhani Karhumäki) . . . 79
Chapter 4. Weighted automata (Manfred Droste and Dietrich Kuske) . . . 113
Chapter 5. Max-plus automata (Sylvain Lombardy and Jean Mairesse) . . . 151
Chapter 6. ω-Automata (Thomas Wilke, revised by Sven Schewe) . . . 189
Chapter 7. Automata on finite trees (Christof Löding and Wolfgang Thomas) . . . 235
Chapter 8. Automata on infinite trees (Christof Löding) . . . 265
Chapter 9. Two-dimensional models (Stefano Crespi Reghizzi, Dora Giammarresi, and Violetta Lonati) . . . 303

Part II Complexity issues
Chapter 10. Minimisation of automata (Jean Berstel, Luc Boasson, Olivier Carton, and Isabelle Fagnot) . . . 337
Chapter 11. Learning algorithms (Henrik Björklund, Johanna Björklund, and Wim Martens) . . . 375
Chapter 12. Descriptional complexity of regular languages (Hermann Gruber, Markus Holzer, and Martin Kutrib) . . . 411
Chapter 13. Enumerating regular expressions and their languages (Hermann Gruber, Jonathan Lee, and Jeffrey Shallit) . . . 459
Chapter 14. Circuit complexity of regular languages (Michal Koucký) . . . 493
Chapter 15. Černý's conjecture and the road colouring problem (Jarkko Kari and Mikhail Volkov) . . . 525

Part III Algebraic and topological theory of automata
Chapter 16. Varieties (Howard Straubing and Pascal Weil) . . . 569
Chapter 17. Profinite topologies (Jorge Almeida and Alfredo Costa) . . . 615
Chapter 18. The factorisation forest theorem (Thomas Colcombet) . . . 653
Chapter 19. Wadge–Wagner hierarchies (Jacques Duparc) . . . 695
Chapter 20. Equational theories for automata (Zoltán Ésik) . . . 729
Chapter 21. Language equations (Michal Kunc and Alexander Okhotin) . . . 765
Chapter 22. Algebra for trees (Mikołaj Bojańczyk) . . . 801

Index . . . xxiii

VOLUME TWO

Preface . . . vii
List of contributors . . . xvii

Part IV Automata in mathematics
Chapter 23. Rational subsets of groups (Laurent Bartholdi and Pedro V. Silva) . . . 841
Chapter 24. Groups defined by automata (Laurent Bartholdi and Pedro V. Silva) . . . 871
Chapter 25. Automata in number theory (Boris Adamczewski and Jason Bell) . . . 913
Chapter 26. On Cobham's theorem (Fabien Durand and Michel Rigo) . . . 947
Chapter 27. Symbolic dynamics (Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin) . . . 987
Chapter 28. Automatic structures (Sasha Rubin) . . . 1031
Chapter 29. Automata and finite model theory (Wouter Gelade and Thomas Schwentick) . . . 1071
Chapter 30. Finite automata, image manipulation, and automatic real functions (Juhani Karhumäki and Jarkko Kari) . . . 1105

Part V Selected applications
Chapter 31. Communicating automata (Dietrich Kuske and Anca Muscholl) . . . 1147
Chapter 32. Symbolic methods and automata (Bernard Boigelot) . . . 1189
Chapter 33. Synthesis with finite automata (Igor Walukiewicz) . . . 1217
Chapter 34. Timed automata (Patricia Bouyer) . . . 1261
Chapter 35. Higher-order recursion schemes and their automata models (Arnaud Carayol and Olivier Serre) . . . 1295
Chapter 36. Analysis of probabilistic processes and automata theory (Kousha Etessami) . . . 1343
Chapter 37. Natural language parsing (Mark-Jan Nederhof and Giorgio Satta) . . . 1383
Chapter 38. Verification (Javier Esparza, Orna Kupferman, and Moshe Y. Vardi) . . . 1415
Chapter 39. Automata and quantum computing (Andris Ambainis and Abuzer Yakaryılmaz) . . . 1457

Index . . . xxiii

List of contributors

Boris Adamczewski (Chapter 25) CNRS, Université de Lyon Institut Camille Jordan 43 boulevard du 11 novembre 1918 69622 Villeurbanne Cedex France

Jason Bell (Chapter 25) Department of Pure Mathematics University of Waterloo Waterloo, ON N2L 3G1 Canada [email protected]

[email protected] Jorge Almeida (Chapter 17) CMUP, Departamento de Matemática Faculdade de Ciências Universidade do Porto Rua do Campo Alegre 687 4169-007 Porto Portugal

Jean Berstel (Chapters 10, 27) Laboratoire d’Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France [email protected]

[email protected] Andris Ambainis (Chapter 39) University of Latvia Faculty of Computing Raina bulv. 19 Rīga 1586 Latvia [email protected] Laurent Bartholdi (Chapters 23, 24) Mathematisches Institut Georg-August Universität zu Göttingen Bunsenstraße 3–5 37073 Göttingen Germany [email protected] Marie-Pierre Béal (Chapter 27) Laboratoire d’Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France [email protected]

Henrik Björklund (Chapter 11) Department of Computing Science Umeå University 90187 Umeå Sweden [email protected] Johanna Björklund (Chapter 11) Department of Computing Science Umeå University 90187 Umeå Sweden [email protected] Luc Boasson (Chapter 10) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France [email protected]

xviii Bernard Boigelot (Chapter 32) Institut Montefiore, B28 Université de Liège 10, Allée de la découverte 4000 Liège Belgium [email protected]

List of contributors Thomas Colcombet (Chapter 18) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France [email protected]

Mikołaj Bojańczyk (Chapter 22) MIMUW Banacha 2 02-097 Warszawa Poland [email protected] Patricia Bouyer (Chapter 34) Université Paris-Saclay CNRS, ENS Paris-Saclay Laboratoire Méthodes Formelles 91190 Gif-sur-Yvette France [email protected] Arnaud Carayol (Chapter 35) LIGM, Université Gustave Eiffel 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France Arnaud.Carayol@univ-eiffel.fr Olivier Carton (Chapter 10) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France [email protected]

Alfredo Costa (Chapter 17) CMUC, Department of Mathematics University of Coimbra Apartado 3008 EC Santa Cruz 3001-501 Coimbra Portugal [email protected] Manfred Droste (Chapter 4) Institut für Informatik Universität Leipzig Augustusplatz 10-11 04109 Leipzig Germany [email protected] Jacques Duparc (Chapter 19) Department of Operations Faculty of Business and Economics University of Lausanne 1015 Lausanne Switzerland [email protected] Fabien Durand (Chapter 26) Université de Picardie Jules Verne CNRS UMR 6140 33 rue Saint Leu 80039 Amiens Cedex 1 France [email protected]

Stefano Crespi Reghizzi (Chapter 9) Dipartimento di Elettronica Informazione e Bioingegneria Politecnico di Milano Piazza Leonardo da Vinci 32 20133 Milano Italy

Søren Eilers (Chapter 27) Institut for Matematiske Fag Københavns Universitet Universitetsparken 5 2100 København Ø Denmark

[email protected]

[email protected]

List of contributors Zoltán Ésik (Chapter 20) Javier Esparza (Chapter 38) Institut für Informatik Technische Universität München Boltzmannstraße 3 85748 Garching bei München Germany [email protected] Kousha Etessami (Chapter 36) School of Informatics University of Edinburgh 10 Crichton Street Edinburgh EH8 9AB United Kingdom [email protected] Isabelle Fagnot (Chapter 10) Laboratoire d’Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France Isabelle.Fagnot@univ-eiffel.fr

xix

Tero Harju (Chapter 3) Department of Mathematics and Statistics University of Turku FI-20014 Turku Finland harju@utu.fi Markus Holzer (Chapter 12) Institut für Informatik Universität Giessen Arndtstraße 2 35392 Giessen Germany [email protected] Juhani Karhumäki (Chapters 3, 30) Department of Mathematics and Statistics University of Turku FI-20014 Turku Finland karhumak@utu.fi Jarkko Kari (Chapters 15, 30) Department of Mathematics and Statistics University of Turku FI-20014 Turku Finland jkari@utu.fi

Wouter Gelade (Chapter 29) Centre of Research in the Economics of Development (CRED) University of Namur Rempart de la Vierge, 8 5000 Namur Belgium

Michal Koucký (Chapter 14) Computer Science Institute of Charles University Malostranské nám 25 118 00 Praha 1 Czech Republic

[email protected]

[email protected]ff.cuni.cz

Dora Giammarresi (Chapter 9) Dipartimento di Matematica Università di Roma “Tor Vergata” via della Ricerca Scientifica 1 00133 Roma Italy

Michal Kunc (Chapter 21) Department of Mathematics and Statistics Masaryk University Kotlářská 2 611 37 Brno Czech Republic

[email protected] Hermann Gruber (Chapters 12, 13) Knowledgepark GmbH Leonrodstr. 68 80636 München Germany [email protected]

[email protected] Orna Kupferman (Chapter 38) School of Computer Science and Engineering Hebrew University Jerusalem 91904 Israel [email protected]

xx Dietrich Kuske (Chapters 4, 31) Institut für Theoretische Informatik Fakultät Informatik und Automatisierung Technische Universtität Ilmenau Postfach 100565 98693 Ilmenau Germany [email protected] Martin Kutrib (Chapter 12) Institut für Informatik Universität Giessen Arndtstraße 2 35392 Giessen Germany [email protected]

List of contributors Jean Mairesse (Chapter 5) LIP6 – Laboratoire d’Informatique de Paris 6 UMR 7606, CNRS Université Pierre et Marie Curie Boîte courrier 169 Tour 26, Couloir 26-00, 2è étage 4 place Jussieu 75252 Paris Cedex 05 France [email protected] Wim Martens (Chapter 11) Institut für Informatik Universität Bayreuth 95440 Bayreuth Germany [email protected]

Jonathan Lee (Chapter 13) Department of Mathematics Stanford University Building 380, Sloan Hall Stanford, CA 94305 USA Christof Löding (Chapters 7, 8) Lehrstuhl Informatik 7 RWTH Aachen 52056 Aachen Germany [email protected] Sylvain Lombardy (Chapter 5) LaBRI, Université de Bordeaux et CNRS Institut Polytechnique de Bordeaux 351 cours de la Libération 33405 Talence Cedex France [email protected] Violetta Lonati (Chapter 9) Dipartimento di Informatica Università degli Studi di Milano via Celoria 18 20100 Milano Italy [email protected]

Anca Muscholl (Chapter 31) LaBRI, Université de Bordeaux et CNRS 351 cours de la Libération 33405 Talence Cedex France [email protected] Mark-Jan Nederhof (Chapter 37) School of Computer Science University of St Andrews North Haugh St Andrews KY16 9SX United Kingdom [email protected] Alexander Okhotin (Chapter 21) Department of Mathematics and Computer Science St. Petersburg State University 14th Line V.O., 29 199178 Saint Petersburg Russian [email protected] Dominique Perrin (Chapter 27) Laboratoire d’Informatique Gaspard-Monge Université Paris-Est Marne-la-Vallée 5, boulevard Descartes Champs-sur-Marne 77454 Marne-la-Vallée Cedex 2 France [email protected]

List of contributors Jean-Éric Pin (Chapter 1) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France

Thomas Schwentick (Chapter 29) Fakultät für Informatik Technische Universität Dortmund Otto-Hahn-Straße 12 44227 Dortmund Germany [email protected]

[email protected] Michel Rigo (Chapter 26) Université de Liège Institut de Mathématiques 12 Allée de la découverte (B37) 4000 Liège Belgium

Olivier Serre (Chapter 35) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France

[email protected]

[email protected]

Sasha Rubin (Chapter 28) School of Computer Science The University of Sydney Building J12/1, Cleveland St. Camperdown NSW 2006 Australia

Jeffrey Shallit (Chapter 13) School of Computer Science University of Waterloo Waterloo, ON N2L 3G1 Canada [email protected]

[email protected] Jacques Sakarovitch (Chapter 2) IRIF, Université de Paris et CNRS Bâtiment Sophie Germain Case courrier 7014 8 Place Aurélie Nemours 75205 Paris Cedex 13 France

Pedro V. Silva (Chapters 23, 24) Centro de Matemática Faculdade de Ciências Universidade do Porto R. Campo Alegre 687 4169-007 Porto Portugal [email protected]

[email protected] Giorgio Satta (Chapter 37) Department of Information Engineering University of Padua via Gradenigo 6/A 35131 Padova Italy [email protected] Sven Schewe (Chapter 6) Department of Computer Science University of Liverpool Ashton Building Ashton Street Liverpool L69 3BX United Kingdom [email protected]

Howard Straubing (Chapter 16) Computer Science Department Boston College Chestnut Hill, MA 02467 USA [email protected] Wolfgang Thomas (Chapter 7) Lehrstuhl Informatik 7 RWTH Aachen 52056 Aachen Germany [email protected]

xxi

xxii

List of contributors

Moshe Y. Vardi (Chapter 38) Department of Computer Science Mail Stop 132 Rice University 6100 S. Main Street Houston, TX 77005-1892 USA

Thomas Wilke (Chapter 6) Department of Computer Science Christian-Albrechts-Universität zu Kiel 24098 Kiel Germany

[email protected]

Abuzer Yakaryılmaz (Chapter 39) Faculty of Computing University of Latvia Raina bulv. 19 Rīga 1586 Latvia

Mikhail Volkov (Chapter 15) Institute of Natural Sciences and Mathematics 620000 Ural Federal University Ekaterinburg Russia [email protected] Igor Walukiewicz (Chapter 33) LaBRI, Université de Bordeaux et CNRS 351 cours de la Libération 33405 Talence Cedex France [email protected] Pascal Weil (Chapter 16) LaBRI, Université de Bordeaux et CNRS 351 cours de la Libération 33405 Talence Cedex France ReLaX, CNRS IRL 2000 and Chennai Mathematical Institute SIPCOT IT Park 603103 Siruseri, Chennai India [email protected]

[email protected]

[email protected]

Part IV

Automata in mathematics

Chapter 23

Rational subsets of groups Laurent Bartholdi and Pedro V. Silva

Contents
1. Introduction . . . 841
2. Finitely generated groups . . . 841
3. Inverse automata and Stallings' construction . . . 843
4. Rational and recognisable subsets . . . 857
References . . . 866

1. Introduction

Over the years, finite automata have been used effectively in the theory of infinite groups to represent rational subsets. This includes the important particular case of finitely generated subgroups (and the beautiful theory of Stallings automata for the free group case), but goes far beyond that: certain inductive procedures need a more general setting than mere subgroups, and rational subsets constitute the natural generalisation. The connections between automata theory and group theory are rich and deep, and many are portrayed in Sims' book [60].
This chapter is divided into three parts: in § 2 we introduce basic concepts, terminology and notation for finitely generated groups, devoting special attention to free groups. These will also be used in Chapter 24. § 3 describes the use of finite inverse automata to study finitely generated subgroups of free groups. The automaton recognises elements of a subgroup, represented as words in the ambient free group. § 4 considers, more generally, rational subsets of groups, when good closure and decidability properties of these subsets are satisfied.
The authors are grateful to Stuart Margolis, Benjamin Steinberg, and Pascal Weil for their remarks on a preliminary version of this text.

2. Finitely generated groups

Let G be a group. Given A ⊆ G, let ⟨A⟩ denote the subgroup of G generated by A, that is, the set of all (possibly empty) products of elements of A ∪ A⁻¹. We say that H ≤ G is finitely generated, and write H ≤f.g. G, if H = ⟨A⟩ for some finite subset A of H.


Given H ≤ G, we let [G : H] denote the index of H in G, that is, the number of right cosets Hg for g ∈ G; or, equivalently, the number of left cosets. If [G : H] is finite, we write H ≤f.i. G. It is well known that every finite-index subgroup of a finitely generated group is finitely generated (see Corollary 2.7.1 in [30] or Example 3.4).
We let 1 denote the identity of G. An element g ∈ G has finite order if ⟨g⟩ is finite. Elements g, h ∈ G are conjugate if h = x⁻¹gx for some x ∈ G. We use the notation g^h = h⁻¹gh and [g, h] = g⁻¹g^h to denote, respectively, conjugates and commutators.
Given an alphabet A, we let A⁻¹ denote a set of formal inverses of A, and write Ã = A ∪ A⁻¹. We say that Ã is an involutive alphabet. We extend ⁻¹ : A → A⁻¹, a ↦ a⁻¹, to an involution on Ã* through
  (a⁻¹)⁻¹ = a,  (uv)⁻¹ = v⁻¹u⁻¹  (a ∈ A, u, v ∈ Ã*).
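The involution is straightforward to realise on a computer. The following sketch is ours and is not part of the chapter: it encodes words over Ã as Python strings, with lowercase letters standing for the letters of A and uppercase letters for their formal inverses; the helper name inv is our own choice. The same encoding is reused in the later sketches.

def inv(w):
    """Formal inverse of a word: (a1 ... ak)^-1 = ak^-1 ... a1^-1."""
    # Encoding assumption: lowercase = letters of A, uppercase = formal inverses.
    return "".join(c.swapcase() for c in reversed(w))

assert inv("aBa") == "AbA"        # (a b^-1 a)^-1 = a^-1 b a^-1
assert inv(inv("abA")) == "abA"   # applying the involution twice gives the word back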

If G = ⟨A⟩, we have a canonical epimorphism π : Ã* → G, mapping a^±1 ∈ Ã to a^±1 ∈ G. We present next some classical decidability problems:

Definition 2.1. Let G = ⟨A⟩ be a finitely generated group.
Word problem: is there an algorithm that, on input a word u ∈ Ã*, determines whether or not π(u) = 1?
Conjugacy problem: is there an algorithm that, on input words u, v ∈ Ã*, determines whether or not π(u) and π(v) are conjugate in G?
Membership problem for K ⊆ 2^G: is there for every X ∈ K an algorithm that, on input a word u ∈ Ã*, determines whether or not π(u) ∈ X?
Generalised word problem: is the membership problem for the class of finitely generated subgroups of G solvable?
Order problem: is there an algorithm that, on input a word u ∈ Ã*, determines whether π(u) has finite or infinite order?
Isomorphism problem for a class G of groups: is there an algorithm that, on input a description of groups G, H ∈ G, decides whether or not G ≅ H? Typically, G may be a subclass of finitely presented groups (given by their presentation), or automata groups (see Chapter 24) given by automata.
We can also require complexity bounds on the algorithms; more precisely, we may ask with which complexity bound an answer to the problem may be obtained, and also with which complexity bound a witness (a normal form for the word problem, an element conjugating π(u) to π(v) in case they are conjugate, an expression of u in the generators of X in the membership problem) may be constructed.

2.1. Free groups. We recall that an equivalence relation ∼ on a semigroup S is a congruence if a ∼ b implies ac ∼ bc and ca ∼ cb for all a, b, c ∈ S.

Definition 2.2. Given an alphabet A, let ∼ denote the congruence on Ã* generated by the relation
  {(aa⁻¹, 1) | a ∈ Ã}.   (1)

23. Rational subsets of groups

843

The quotient FA = Ã*/∼ is the free group on A. We denote by u ↦ [u]∼ the canonical morphism Ã* → FA. Free groups admit the following universal property: for every map f : A → G into a group G, there is a unique group morphism FA → G that extends f. Alternatively, we can view (1) as a confluent length-reducing rewriting system on Ã*, where each word w ∈ Ã* can be transformed into a unique reduced word w̄ with no factor of the form aa⁻¹; see [10]. As a consequence, the equivalence
  u ∼ v ⟺ ū = v̄  (u, v ∈ Ã*)
solves the word problem for FA.
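Since the rewriting system is confluent and length-reducing, reduced forms can be computed greedily; the following sketch (ours, with the lowercase/uppercase encoding introduced above) uses a stack, which yields a linear-time solution of the word problem.

def reduce_word(w):
    """Free reduction: repeatedly delete factors x x^-1; one left-to-right pass with a stack suffices."""
    out = []
    for c in w:
        if out and out[-1] == c.swapcase():
            out.pop()                # the new letter cancels the previous one
        else:
            out.append(c)
    return "".join(out)

def equal_in_FA(u, v):
    """Word problem for FA: u ~ v if and only if the reduced forms coincide."""
    return reduce_word(u) == reduce_word(v)

assert reduce_word("abBAa") == "a"                    # a b b^-1 a^-1 a reduces to a
assert equal_in_FA("aA", "") and not equal_in_FA("ab", "ba")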

We shall use the notation RA = {w̄ | w ∈ Ã*} for the set of reduced words. It is well known that FA is isomorphic to RA under the binary operation
  u ⋆ v = the reduced form of uv  (u, v ∈ RA).

We recall that the length |g| of g ∈ FA is the length of the reduced form of g, which is also denoted by ḡ. The letters of A provide a natural basis for FA: they generate FA and satisfy no nontrivial relations, that is, all reduced words on these generators represent distinct elements of FA. A group is free if and only if it has a basis. Throughout this chapter, we assume A to be a finite alphabet. It is well known that free groups FA and FB are isomorphic if and only if Card A = Card B, see Theorem 2.4 in [30]. This leads to the concept of rank of a free group F: the cardinality of a basis of F, denoted by rk F. It is common to use the notation Fn to denote a free group of rank n.
We recall that a reduced word u is cyclically reduced if uu is also reduced. Any reduced word u ∈ RA admits a unique decomposition of the form u = vwv⁻¹ with w cyclically reduced. A solution for the conjugacy problem easily follows from this: first reduce the words cyclically; then two cyclically reduced words in RA are conjugate if and only if they are cyclic permutations of each other. On the other hand, the order problem admits a trivial solution: only the identity has finite order. Finally, the generalised word problem will be discussed in the following section.
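The conjugacy criterion can be mechanised in the same spirit. The sketch below is ours and reuses reduce_word from the previous snippet; conjugate_in_FA is a hypothetical helper name.

def cyclically_reduce(w):
    """Return the cyclically reduced core w' of w, where the reduced form of w is v w' v^-1."""
    w = reduce_word(w)                             # free reduction first (previous sketch)
    while len(w) >= 2 and w[0] == w[-1].swapcase():
        w = w[1:-1]                                # strip a cancelling first/last pair
    return w

def conjugate_in_FA(u, v):
    """Cyclically reduced words are conjugate iff they are cyclic permutations of each other."""
    u, v = cyclically_reduce(u), cyclically_reduce(v)
    return len(u) == len(v) and (u in v + v if u else v == "")

assert conjugate_in_FA("Aba", "b")                 # a^-1 b a is conjugate to b
assert not conjugate_in_FA("ab", "aB")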

3. Inverse automata and Stallings' construction

The study of finitely generated subgroups of free groups entered a new era in the early eighties, when Stallings made explicit and effective a construction [61] that can be traced back to the early part of the twentieth century, in Schreier's coset graphs (see [60] and § 2 of Chapter 24) and in Serre's work [52]. Stallings' seminal paper was built on immersions of finite graphs, but the alternative approach using finite inverse automata became much more popular over the years; for more on their link, see [32]. An extensive survey has been written by Kapovich and Miasnikov [24].


Stallings' construction for H ≤f.g. FA consists in taking a finite set of generators for H in reduced form, building the so-called flower automaton, and then making this automaton deterministic through the operation known as Stallings foldings. This turns out to be a terminating procedure, but the key fact is that the construction is independent of both the given finite generating set and the chosen folding sequence. A short, simple automata-theoretic proof of this claim will be given. The finite inverse automaton S(H) thus obtained is usually called the Stallings automaton of H. Over the years, Stallings automata became the standard representation for finitely generated subgroups of free groups and are involved in many of the algorithmic results presently obtained. Several of these algorithms are implemented in computer software, see, e.g., CRAG [2], or the packages Automata and FGA in GAP [17].

3.1. Inverse automata. An automaton A is trim if every vertex lies on some successful path (i.e. a path starting at some initial vertex and ending at some final vertex). An automaton A over an involutive alphabet Ã is involutive if, whenever (p, a, q) is an edge of A, so is (q, a⁻¹, p). Therefore it suffices to depict only the positively labelled edges (those with label in A) in graphical representations.

Definition 3.1. An involutive automaton is inverse if it is deterministic, trim and has a single final vertex. If the latter happens to be the initial vertex, it is called the basepoint.

It easily follows from the computation of the Nerode equivalence (see Chapter 10, § 2) that every inverse automaton is a minimal automaton. Finite inverse automata capture the idea of an action (of a finite inverse monoid, their transition monoid) on a finite set (the vertex set) through partial bijections. We recall that a monoid M is inverse if, for every x ∈ M, there exists a unique y ∈ M such that xyx = x and y = yxy; then M acts by partial bijections on itself. The next result is easily proven, but is quite useful.

Proposition 3.1. Let A be an inverse automaton and let p --uvv⁻¹w--> q be a path in A. Then there also exists a path p --uw--> q in A.

Another important property relates languages to morphisms. For us, a morphism between deterministic automata A and A′ is a mapping φ between their respective vertex sets which preserves the initial vertex, final vertices, and edges, in the sense that (φ(p), a, φ(q)) is an edge of A′ whenever (p, a, q) is an edge of A.

Proposition 3.2. Given inverse automata A and A′, we have L(A) ⊆ L(A′) if and only if there exists a morphism φ : A → A′. Moreover, such a morphism is unique.

Proof. (⟹) Given a vertex q of A, take a successful path
  q0 --u--> q --v--> t
in A, for some u, v ∈ Ã*. Since L(A) ⊆ L(A′), there exists a successful path
  q0′ --u--> q′ --v--> t′
in A′. We take φ(q) = q′.


To show that φ is well defined, suppose that
  q0 --u′--> q --v′--> t
is an alternative successful path in A. Since u′v ∈ L(A) ⊆ L(A′), there exists a successful path
  q0′ --u′--> q″ --v--> t′
in A′, and it follows that q′ = q″ since A′ is inverse. Thus φ is well defined. It is now routine to check that φ is a morphism from A to A′ and that it is unique.
(⟸) It is immediate from the definition of morphism.

3.2. Stallings' construction. Let X be a finite subset of RA. We build an involutive automaton F(X) by fixing a basepoint q0 and gluing to it a petal labelled by every word in X, as follows: if x = a1 ⋯ ak ∈ X, with ai ∈ Ã, the petal consists of a closed path of the form
  q0 --a1--> • --a2--> ⋯ --ak--> q0
and the respective inverse edges. All such intermediate vertices • are assumed to be distinct in the automaton. For obvious reasons, F(X) is called the flower automaton of X.
The automaton F(X) is almost an inverse automaton – except that it need not be deterministic. We can fix it by performing a sequence of so-called Stallings foldings. Assume that A is a trim involutive automaton with a basepoint, possessing two distinct edges of the form
  p --a--> q,  p --a--> r   (2)
for a ∈ Ã. The folding is performed by identifying these two edges, as well as the two respective inverse edges. In particular, the vertices q and r are also identified (if they were distinct). The number of edges is certain to decrease through foldings. Therefore, if we perform enough of them, we are sure to turn F(X) into a finite inverse automaton.

Definition 3.2. The Stallings automaton of X is the finite inverse automaton S(X) obtained through folding F(X).

We shall see that S(X) depends only on the finitely generated subgroup ⟨X⟩ of FA generated by X, being in particular independent of the choice of foldings taken to reach it. Since inverse automata are minimal, to prove uniqueness (up to isomorphism) it suffices to characterise L(S(X)) in terms of H:

Proposition 3.3. Fix H ≤f.g. FA and let X ⊆ RA be a finite generating set for H. Then
  L(S(X)) = ⋂ {L ⊆ Ã* | L is recognised by a finite inverse automaton with a basepoint and H̄ ⊆ L},
where H̄ = {h̄ | h ∈ H} denotes the set of reduced forms of the elements of H.


Proof. (⊇) Clearly, S(X) is a finite inverse automaton with a basepoint. Since X ∪ X⁻¹ ⊆ L(F(X)) ⊆ L(S(X)), it easily follows from Proposition 3.1 that
  H̄ ⊆ L(S(X)).   (3)
(⊆) Let L ⊆ Ã* be recognised by a finite inverse automaton A with a basepoint, with H̄ ⊆ L. Since X ⊆ H̄, we have an automaton morphism from F(X) to A, hence L(F(X)) ⊆ L. To prove that L(S(X)) ⊆ L, it suffices to show that inclusion in L is preserved through foldings.
Indeed, assume that L(B) ⊆ L and B′ is obtained from B by folding the two edges in (2). It is immediate that every successful path q0 --u--> t in B′ can be lifted to a successful path q0 --v--> t in B by successively inserting the word a⁻¹a into u. Now v ∈ L = L(A) implies u ∈ L in view of Proposition 3.1.

Now, given H ≤ FA finitely generated, we take a finite set X of generators. Without loss of generality, we may assume that X consists of reduced words, and we may define S(H) = S(X) to be the Stallings automaton of H.
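As an illustration, the whole construction fits in a few lines of code. The sketch below is ours (it is not the authors' implementation and makes no attempt at the O(n log n) bound mentioned below); generators are strings in the lowercase/uppercase encoding of the earlier snippets, and vertex 0 plays the role of the basepoint q0.

def stallings_automaton(generators):
    """Return (delta, basepoint) with delta[p][x] = q describing S(X) for X = generators."""
    edges, fresh = set(), 1                         # vertex 0 is the basepoint
    for w in generators:                            # flower automaton F(X): one petal per word
        p = 0
        for i, x in enumerate(w):
            q = 0 if i == len(w) - 1 else fresh
            if q != 0:
                fresh += 1
            edges.add((p, x, q))
            edges.add((q, x.swapcase(), p))         # involutive: also add the inverse edge
            p = q
    folded = False
    while not folded:                               # Stallings foldings
        folded, seen = True, {}
        for (p, x, q) in sorted(edges):
            if (p, x) in seen and seen[(p, x)] != q:
                keep, drop = sorted((q, seen[(p, x)]))     # identify the two target vertices
                edges = {(keep if s == drop else s, y, keep if t == drop else t)
                         for (s, y, t) in edges}
                folded = False
                break
            seen[(p, x)] = q
    delta = {}
    for (p, x, q) in edges:                         # now deterministic: store as delta[p][x] = q
        delta.setdefault(p, {})[x] = q
    return delta, 0

delta, q0 = stallings_automaton(["Aba", "baa"])     # X = {a^-1 b a, b a^2}, as in Example 3.1 below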

Example 3.1. Stallings' construction for X = {a⁻¹ba, ba²}, where the next edges to be identified are depicted by dotted lines, is shown below.
[Figure: the flower automaton F(X), two intermediate folding steps, and the resulting Stallings automaton S(X).]

A simple, yet important example is given by applying the construction to Fn itself, when we obtain the so-called bouquet of n circles:
[Figure: the bouquet automata S(F1), S(F2) and S(F3), each consisting of a single basepoint q0 carrying one loop per generator.]


In terms of complexity, the best known algorithm for the construction of S(X) is due to Touikan [63]. Its time complexity is O(n log n), where n is the sum of the lengths of the elements of X.

3.3. Basic applications. The most fundamental application of Stallings' construction is an elegant and efficient solution to the generalised word problem:

Theorem 3.4. The generalised word problem in FA is solvable.

We will see many groups in Chapter 24 that have solvable word problem; however, few of them have a solvable generalised word problem. The proof of Theorem 3.4 relies on

Proposition 3.5. Consider H ≤f.g. FA and u ∈ FA. Then u ∈ H if and only if ū ∈ L(S(H)).

Proof. (⟹) Follows from (3).
(⟸) It easily follows from the last paragraph of the proof of Proposition 3.3 that, if B′ is obtained from B by performing Stallings foldings, then L(B′) and L(B) have the same set of reduced forms. Hence, if H = ⟨X⟩, we get
  {w̄ | w ∈ L(S(H))} = {w̄ | w ∈ L(F(X))} = {w̄ | w ∈ (X ∪ X⁻¹)*} = H̄
and the implication follows.

It follows from our previous remark that the complexity of the generalised word problem is O(n log n + m), where n is the sum of the lengths of the elements of X and m is the length of the input word. In particular, once the generating set X has been fixed, the complexity is linear in m.

Example 3.2. We may use the Stallings automaton constructed in Example 3.1 to check that baba⁻¹b⁻¹ ∈ H = ⟨a⁻¹ba, ba²⟩ but ab ∉ H.
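A minimal membership test in the spirit of Proposition 3.5, using our reduce_word and stallings_automaton sketches from above (again our own code, not the chapter's): read the reduced form of the input in S(H) from the basepoint and check that the run ends back at the basepoint.

def in_subgroup(word, delta, q0=0):
    """u is in H iff the reduced form of u labels a loop at the basepoint of S(H)."""
    state = q0
    for x in reduce_word(word):                    # free reduction, as in the earlier sketch
        state = delta.get(state, {}).get(x)
        if state is None:                          # the word cannot be read in S(H)
            return False
    return state == q0

delta, q0 = stallings_automaton(["Aba", "baa"])
assert in_subgroup("babAB", delta)                 # b a b a^-1 b^-1 lies in H, as in Example 3.2
assert not in_subgroup("ab", delta)                # ab does not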

Stallings automata also provide an effective construction for bases of finitely generated subgroups. Consider H ≤f.g. FA, and let m be the number of vertices of S(H). A spanning tree T for S(H) consists of m − 1 edges and their inverses which, together, connect all the vertices of S(H). Given a vertex p of S(H), we denote by g_p the T-geodesic connecting the basepoint q0 to p, that is, q0 --g_p--> p is the shortest path contained in T connecting q0 to p.

Proposition 3.6. Let H ≤f.g. FA and let T be a spanning tree for S(H). Let E+ be the set of positively labelled edges of S(H). Then H is free with basis
  Y = {g_p a g_q⁻¹ | (p, a, q) ∈ E+ \ T}.

Proof. It follows from Proposition 3.5 that L(S(H)) ⊆ H (viewing each word as the element of FA it represents), hence Y ⊆ H. To show that H = ⟨Y⟩, take h = a1 ⋯ ak ∈ H in reduced form (ai ∈ Ã). By Proposition 3.5, there exists a successful path
  q0 --a1--> q1 --a2--> ⋯ --ak--> qk = q0


in S(H). For i = 1, …, k, we have either g_{q_{i−1}} a_i g_{q_i}⁻¹ ∈ Y ∪ Y⁻¹ or g_{q_{i−1}} a_i g_{q_i}⁻¹ = 1, the latter occurring if (q_{i−1}, a_i, q_i) ∈ T. In any case, we get
  h = a1 ⋯ ak = (g_{q_0} a1 g_{q_1}⁻¹)(g_{q_1} a2 g_{q_2}⁻¹) ⋯ (g_{q_{k−1}} ak g_{q_0}⁻¹) ∈ ⟨Y⟩
and so H = ⟨Y⟩.
It remains to show that the elements of Y satisfy no nontrivial relations. Let y1, …, yk ∈ Y ∪ Y⁻¹ with y_i ≠ y_{i−1}⁻¹ for i = 2, …, k. Write y_i = g_{p_i} a_i g_{r_i}⁻¹, where a_i ∈ Ã labels the edge not in T. It easily follows from y_i ≠ y_{i−1}⁻¹ and the definition of spanning tree that
  y1 ⋯ yk = g_{p_1} a1 g_{r_1}⁻¹ g_{p_2} a2 ⋯ a_{k−1} g_{r_{k−1}}⁻¹ g_{p_k} ak g_{r_k}⁻¹
reduces to a nonempty word whenever k ≥ 1, since the letters a1, …, ak, which label edges outside T, cannot be cancelled. Therefore Y is a basis of H as claimed.
In the process, we also obtain a proof of the Nielsen–Schreier Theorem, in the case of finitely generated subgroups. A simple topological proof may be found in [42]:

Theorem 3.7 (Nielsen and Schreier). Every subgroup of a free group is itself free.

Example 3.3. We use the Stallings automaton constructed in Example 3.1 to construct a basis of H = ⟨a⁻¹ba, ba²⟩. If we take the spanning tree T defined by the dotted lines in
[Figure: S(H) with a spanning tree marked by dotted edges]
then Card(E+ \ T) = 2 and the corresponding basis is {ba², baba⁻¹b⁻¹}. Another choice of spanning tree actually proves that the original generating set is also a basis.
We remark that Proposition 3.6 can be extended to the case of infinitely generated subgroups, proving the general case of Theorem 3.7. However, in this case there is no effective construction such as Stallings', and the (infinite) inverse automaton S(H) remains a theoretical object, using appropriate cosets as vertices.

Example 3.4. For H ≤f.i. FA, the inverse automaton S(H) coincides with the Schreier graph (see § 2 of Chapter 24) of H\FA, namely the graph with vertex set H\FA = {Hg | g ∈ FA}, with an edge from Hg to Hga for each vertex Hg and each generator a ∈ Ã. Since this graph is finite, this proves that H is finitely generated, and therefore that finite-index subgroups of finitely generated groups are finitely generated.

Another classical application of Stallings' construction regards the identification of finite-index subgroups.
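Proposition 3.6 is just as easy to mechanise. The sketch below (ours, on top of the stallings_automaton representation used earlier) picks a breadth-first spanning tree, so the basis it returns may differ from the one of Example 3.3, but it always has the correct cardinality.

def basis_from_stallings(delta, q0=0):
    """Read a free basis of H off S(H) and a spanning tree T, as in Proposition 3.6."""
    geodesic, tree, queue = {q0: ""}, set(), [q0]
    while queue:                                           # breadth-first spanning tree
        p = queue.pop(0)
        for x, q in sorted(delta.get(p, {}).items()):
            if q not in geodesic:
                geodesic[q] = geodesic[p] + x              # label of the tree path q0 -> q
                tree.add((p, x, q))
                tree.add((q, x.swapcase(), p))
                queue.append(q)
    basis = []
    for p in delta:
        for x, q in delta[p].items():
            if x.islower() and (p, x, q) not in tree:      # positive edges outside T
                g_q_inv = "".join(c.swapcase() for c in reversed(geodesic[q]))
                basis.append(geodesic[p] + x + g_q_inv)    # the word g_p a g_q^-1, already reduced
    return basis

delta, q0 = stallings_automaton(["Aba", "baa"])
assert len(basis_from_stallings(delta)) == 2               # H has rank 2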


Proposition 3.8. Consider H ≤f.g. FA.
i. H is a finite-index subgroup of FA if and only if S(H) is a complete automaton.
ii. If H is a finite-index subgroup of FA, then its index is the number of vertices of S(H).

Proof. (i, ⟹) Suppose that S(H) is not complete. Then there exist some vertex q and some a ∈ Ã such that q · a is undefined. Let g be a geodesic connecting the basepoint q0 to q in S(H). We claim that
  Hga^m ≠ Hga^n  if m − n > |g|.   (4)
Indeed, Hga^m = Hga^n implies ga^{m−n}g⁻¹ ∈ H, and so the reduced form of ga^{m−n}g⁻¹ belongs to L(S(H)) by Proposition 3.5. Since ga is reduced (due to S(H) being inverse), it follows from m − n > |g| that this reduced form begins with ga: indeed, g⁻¹ is not long enough to erase all the a's. Since S(H) is deterministic, q · a must then be defined, a contradiction. Therefore (4) holds and so H has infinite index.

(i, ⟸) Let Q be the vertex set of S(H) and fix a geodesic q0 --g_q--> q for each q ∈ Q. Take u ∈ FA. Since S(H) is complete, we have a path q0 --u--> q for some q ∈ Q. Hence u g_q⁻¹ ∈ H and so u = u g_q⁻¹ g_q ∈ Hg_q. Therefore FA = ⋃_{q∈Q} Hg_q and so H ≤f.i. FA.
(ii) In view of FA = ⋃_{q∈Q} Hg_q, it suffices to show that the cosets Hg_q are all distinct. Indeed, assume that Hg_p = Hg_q for some vertices p, q ∈ Q. Then g_p g_q⁻¹ ∈ H and so its reduced form belongs to L(S(H)) by Proposition 3.5. On the other hand, since S(H) is complete, we have a path
  q0 --g_p g_q⁻¹--> r
for some r ∈ Q. In view of Proposition 3.1, and by determinism, we get r = q0. Hence we have paths
  p --g_q⁻¹--> q0  and  q --g_q⁻¹--> q0.
Since S(H) is inverse, we get p = q as required.

Example 3.5. Since the Stallings automaton constructed in Example 3.1 is not complete, it follows that ⟨a⁻¹ba, ba²⟩ is not a finite-index subgroup of F2.

Corollary 3.9. If H ≤ FA has index n, then rk H = 1 + n(Card A − 1).

Proof. By Proposition 3.8, the automaton S(H) has n vertices and n·Card A positive edges. A spanning tree has n − 1 positive edges, so rk H = n·Card A − (n − 1) = 1 + n(Card A − 1) by Proposition 3.6.

Beautiful connections between finite-index subgroups and certain classes of bifix codes – sets of words none of which is a prefix or a suffix of another – have recently been unveiled by Berstel, De Felice, Perrin, Reutenauer and Rindone [6].
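Proposition 3.8 and Corollary 3.9 lend themselves to the same treatment. The sketch below (ours, again building on the stallings_automaton snippet, with the alphabet {a, b} hard-wired for simplicity) returns the index when it is finite and None otherwise.

def index_in_F(delta, alphabet="ab"):
    """[FA : H] equals the number of vertices of S(H) if S(H) is complete, and is infinite otherwise."""
    letters = set(alphabet) | {x.upper() for x in alphabet}
    vertices = set(delta) | {q for d in delta.values() for q in d.values()}
    complete = all(x in delta.get(p, {}) for p in vertices for x in letters)
    return len(vertices) if complete else None              # None encodes infinite index

assert index_in_F(stallings_automaton(["Aba", "baa"])[0]) is None     # Example 3.5: infinite index
# H = <a, b a b^-1, b^2> is the kernel of F2 -> Z/2Z (a -> 0, b -> 1); it has index 2,
# and Corollary 3.9 then gives rk H = 1 + 2(2 - 1) = 3.
assert index_in_F(stallings_automaton(["a", "baB", "bb"])[0]) == 2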


3.4. Conjugacy. We start now a brief discussion of conjugacy. Recall that the outdegree of a vertex q is the number of edges starting at q; for inverse automata, it equals the indegree. The geodesic distance in a connected graph is the length of the shortest undirected path connecting two vertices.
Since the original generating set is always taken in reduced form, it easily follows that there is at most one vertex in a Stallings automaton having outdegree < 2: the basepoint q0. Assuming that H is nontrivial, S(H) must always be of the form
  q0 --u--> q1 --> [core]
where q1 is either q0 or the closest vertex to q0 (in terms of geodesic distance) having outdegree > 2. Note that q1 = q0 if q0 has outdegree > 2 itself. We call q0 --u--> q1 the tail (which is empty if q1 = q0) and the remaining subgraph the core of S(H). The Schreier graph of H\FA and S(H) are related as follows: the Schreier graph consists of finitely many trees attached to the core of S(H).

Theorem 3.10. There is an algorithm that decides whether or not two finitely generated subgroups of FA are conjugate.

Proof. Finitely generated subgroups G, H are conjugate if and only if the cores of S(G) and S(H) are equal (up to change of basepoint). The Stallings automata of the conjugates of H can be obtained in the following alternative ways: (1) declaring a vertex in the core of S(H) to be the basepoint; (2) gluing a tail to some vertex in the core and taking its other endpoint to be the basepoint. Note that the tail must be glued in some way that keeps the automaton inverse, so in particular this second type of operation can only be performed if the automaton is not complete, or equivalently, if H has infinite index.

An immediate consequence is the following classical result.

Proposition 3.11. A finitely generated normal subgroup of a free group is either trivial or has finite index. Moreover, a finite-index subgroup H is normal if and only if its Stallings automaton is vertex-transitive, that is, if all choices of basepoint yield isomorphic automata.
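Extracting the core is also easy to automate: peel off the tail by repeatedly deleting the unique vertex of outdegree 1, if any. The sketch below is ours (same delta representation as before); an actual conjugacy test would then compare the two cores up to an isomorphism that is allowed to move the basepoint, which we do not implement here.

def core(delta):
    """Strip the tail of S(H) by deleting vertices of outdegree 1 until none remains."""
    delta = {p: dict(d) for p, d in delta.items()}          # work on a copy
    while True:
        hanging = [p for p, d in delta.items() if len(d) == 1]
        if not hanging:
            return delta
        p = hanging[0]
        (x, q), = delta[p].items()
        del delta[p]                                         # drop the hanging vertex ...
        delta[q].pop(x.swapcase(), None)                     # ... together with the inverse edge into it

d, _ = stallings_automaton(["abA"])                          # H = <a b a^-1>: a tail plus a b-loop
assert len(d) == 2 and len(core(d)) == 1                     # the tail is stripped, the loop remains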


Example 3.6. Stallings automata of some conjugates of H = ⟨a⁻¹ba, ba²⟩:
[Figure: the Stallings automata S(H), S(b⁻¹Hb), and S(b⁻²Hb²).]

We can also use the previous discussion on the structure of (finite) Stallings automata to provide them with an abstract characterisation.

Proposition 3.12. A finite inverse automaton with a basepoint is a Stallings automaton if and only if it has at most one vertex of outdegree 1: the basepoint.

Proof. Indeed, for any such automaton we can take a spanning tree and use it to construct a basis for the subgroup as in the proof of Proposition 3.6.

3.5. Further algebraic properties. The study of intersections of finitely generated subgroups of FA provides further applications of Stallings automata. Howson's classical theorem admits a simple proof using the direct product of two Stallings automata (see the sketch below); it is also an immediate consequence of Theorem 4.1 and Corollary 4.4 (ii).

Theorem 3.13 (Howson). If H, K ≤f.g. FA, then also H ∩ K ≤f.g. FA.

Stallings automata are also naturally related to the famous Hanna Neumann Conjecture, recently proved by Mineyev [38] and Friedman [16]: given H, K ≤f.g. FA, then rk(H ∩ K) − 1 ≤ (rk H − 1)(rk K − 1). The conjecture arose in a paper of Hanna Neumann [40], where the inequality rk(H ∩ K) − 1 ≤ 2(rk H − 1)(rk K − 1) was also proved. In one of the early applications of Stallings' approach, Gersten provided an alternative geometric proof of Hanna Neumann's inequality [18].
A free factor of a free group FA can be defined as a subgroup H generated by a subset of a basis of FA. This is equivalent to saying that there exists a free product decomposition FA = H ∗ K for some K ≤ FA.
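Returning to Theorem 3.13, the standard argument runs the two Stallings automata in parallel. The sketch below (ours) computes the accessible part of the product automaton; it has at most |S(H)| · |S(K)| vertices, which is the heart of Howson's theorem, and membership of a reduced word in H ∩ K can be tested on it exactly as before.

def intersection_automaton(dH, dK, q0=0):
    """Accessible part of the product of S(H) and S(K), with basepoint (q0, q0)."""
    start = (q0, q0)
    delta, todo = {start: {}}, [start]
    while todo:
        p1, p2 = todo.pop()
        for x, q1 in dH.get(p1, {}).items():
            q2 = dK.get(p2, {}).get(x)
            if q2 is not None:                               # the letter can be read in both automata
                delta[(p1, p2)][x] = (q1, q2)
                if (q1, q2) not in delta:
                    delta[(q1, q2)] = {}
                    todo.append((q1, q2))
    return delta, start

dH, _ = stallings_automaton(["Aba", "baa"])
dK, _ = stallings_automaton(["a", "bb"])
dHK, bp = intersection_automaton(dH, dK)
# Restricting to the basepoint's component and taking the core yields, essentially, S(H ∩ K).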


Since the rank of a free factor never exceeds the rank of the ambient free group, it is easy to construct examples of subgroups which are not free factors: it easily follows from Proposition 3.6 that any free group of rank ≥ 2 has subgroups of arbitrary finite rank (and even of countably infinite rank). The problem of identifying free factors has a simple solution based on Stallings automata [57]: one must check whether or not a prescribed number of vertex identifications in the Stallings automaton can lead to a bouquet. However, the most efficient solution, due to Roig, Ventura and Weil [46], involves Whitehead automorphisms and will therefore be postponed to § 3.7.

Given a morphism φ: A → B of inverse automata, let the morphic image φ(A) be the subautomaton of B induced by the image under φ of all the successful paths of A. The following classical result characterises the extensions of H ≤f.g. F_A contained in F_A. We present the proof from [37].

Theorem 3.14 (Takahasi [62]). Given H ≤f.g. F_A, one can effectively compute finitely many extensions K₁, …, K_m ≤f.g. F_A of H such that the following conditions are equivalent for every K ≤f.g. F_A:

i. H ≤ K;
ii. K_i is a free factor of K for some i ∈ {1, …, m}.

Proof. Let A₁, …, A_m denote all the morphic images of S(H), up to isomorphism. Since a morphic image cannot have more vertices than the original automaton, there are only finitely many isomorphism classes. Moreover, it follows from Proposition 3.12 that, for i = 1, …, m, A_i = S(K_i) for some K_i ≤f.g. F_A. But then, since we have L(S(H)) ⊆ L(A_i) = L(S(K_i)), it follows from Proposition 3.5 that H ≤ K_i. Clearly, we can construct all A_i and therefore all K_i.

(i) ⟹ (ii). If H ≤ K, it follows from Stallings' construction that L(S(H)) ⊆ L(S(K)) and so there is a morphism φ: S(H) → S(K) by Proposition 3.2. Let A_i be, up to isomorphism, the morphic image of S(H) through φ. Since A_i = S(K_i) is a subautomaton of S(K), it easily follows from Proposition 3.6 that K_i is a free factor of K: it suffices to take a spanning tree for S(K_i), extend it to a spanning tree for S(K), and the induced basis of K_i will be contained in the induced basis of K.

(ii) ⟹ (i) is immediate.

An interesting research line related to this result is built on the concept of algebraic extension, introduced by Kapovich and Miasnikov [24] and inspired by the homonymous classical notion from field theory. Given H ≤ K ≤ F_A, we say that K is an algebraic extension of H if no proper free factor of K contains H. Miasnikov, Ventura and Weil [37] proved that the set of algebraic extensions of H is finite and effectively computable, and that it constitutes the minimum set of extensions K₁, …, K_m satisfying the conditions of Theorem 3.14.

Consider a subgroup H of a group G. The commensurator of H in G is

    Comm_G(H) = { g ∈ G | H ∩ H^g has finite index in H and in H^g }.   (5)

For example, the commensurator of GL_n(Z) in GL_n(R) is GL_n(Q).


The special case of finite-index extensions, H ≤f.i. K ≤ F_A, is of particular interest, and can be interpreted in terms of commensurators. It can be proved (see Lemma 8.7 in [24] and [59]) that every H ≤f.g. F_A has a maximum finite-index extension inside F_A, denoted by H_f.i., and that H_f.i. = Comm_{F_A}(H). Silva and Weil [59] proved that S(H_f.i.) can be constructed from S(H) using a simple automata-theoretic algorithm:

1. the standard minimisation algorithm is applied to the core of S(H), taking all vertices as final;
2. the original tail of S(H) is subsequently reinstated in this new automaton, at the appropriate vertex.

We present now an application of a different type, involving transition monoids. It easily follows from the definitions that the transition monoid of a finite inverse automaton is always a finite inverse monoid. Given a group G, we say that a subgroup H ≤ G is pure if the implication

    gⁿ ∈ H ⟹ g ∈ H   (6)

holds for all g ∈ G and n ≥ 1. If p is a prime, we say that H is p-pure if (6) holds whenever (n, p) = 1. The next result is due to Birget, Margolis, Meakin, and Weil, and relates these properties of subgroups to those of the transition monoid of their Stallings automaton [8]; they also show that these problems are PSPACE-complete.

Proposition 3.15. For every H ≤f.g. F_A,

i. H is pure if and only if the transition monoid of S(H) is aperiodic;

ii. H is p-pure if and only if the transition monoid of S(H) has no subgroups of order p.

Proof. Both conditions in (i) are easily proved to be equivalent to the nonexistence in S(H) of a cycle of the form

    q₁ --u--> q₂ --u^k--> q₁   (k ≥ 1, q₁ ≠ q₂),

where u can be assumed to be cyclically reduced. The proof of (ii) runs similarly.

3.6. Topological properties. We require for this subsection some basic topological concepts, which the reader can recover from Chapter 17. For all u, v ∈ F_A, written in reduced form as elements of R_A, let u ∧ v denote the longest common prefix of u and v. The prefix metric d on F_A is defined, for all u, v ∈ F_A, by

    d(u, v) = 2^(−|u ∧ v|)  if u ≠ v,
    d(u, v) = 0             if u = v.


It easily follows from the definition that d is an ultrametric on F_A, satisfying in particular the axiom

    d(u, v) ≤ max{d(u, w), d(w, v)}.

The completion of this metric space is compact; its extra elements are the infinite reduced words a₁a₂a₃⋯, with all a_i ∈ Ã, and constitute the hyperbolic boundary ∂F_A of F_A; see Chapter 24, § 2.5. Extending the operator ∧ to F_A ∪ ∂F_A in the obvious way, it easily follows from the definitions that, for every infinite reduced word α and every sequence (u_n)_n in F_A,

    α = lim_{n→+∞} u_n  ⟺  lim_{n→+∞} |α ∧ u_n| = +∞.   (7)
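For concreteness, a small Python sketch of |u ∧ v| and of the prefix metric; words are lists of letters in any fixed encoding of Ã (plain strings below, our choice) and are assumed to be reduced.

```python
def common_prefix_length(u, v):
    """|u ^ v|: length of the longest common prefix of two reduced words."""
    n = 0
    for x, y in zip(u, v):
        if x != y:
            break
        n += 1
    return n

def prefix_distance(u, v):
    """Prefix metric: d(u, v) = 2^(-|u ^ v|) if u != v, and 0 otherwise."""
    return 0.0 if u == v else 2.0 ** (-common_prefix_length(u, v))

u, v, w = ["a", "b", "a"], ["a", "b", "b"], ["a", "a"]
# spot-check of the ultrametric inequality d(u, v) <= max(d(u, w), d(w, v))
print(prefix_distance(u, v), prefix_distance(u, w), prefix_distance(w, v))  # 0.25 0.5 0.5
```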

In the next result, Stallings automata are given a new role, in connection with the prefix metric. We let cl H denote the closure of H in the completion of F_A.

Proposition 3.16. If H ≤f.g. F_A, then cl H is the union of H with the set of all α ∈ ∂F_A that label paths in S(H) out of the basepoint.

Proof. Since the topology of F_A is discrete, we have cl H ∩ F_A = H.

(⊆) If α ∈ ∂F_A does not label a path in S(H) out of the basepoint, then {|α ∧ h| : h ∈ H} is finite and so no sequence of H can converge to α by (7).

(⊇) Let α = a₁a₂a₃⋯ ∈ ∂F_A, with a_i ∈ Ã, label a path in S(H) out of the basepoint. Let m be the number of vertices of S(H). For every n ≥ 1, there exists some word w_n of length < m such that a₁⋯a_n w_n ∈ H. Now α = lim_{n→+∞} a₁⋯a_n w_n by (7) and so α ∈ cl H.

The profinite topology on F_A is defined in Chapter 17: for every u ∈ F_A, the collection {Ku | K ≤f.i. F_A} constitutes a basis of clopen neighbourhoods of u. In his seminal 1983 paper [61], Stallings gave an alternative proof of Marshall Hall's theorem:

Theorem 3.17 (M. Hall). Every finitely generated subgroup of F_A is closed for the profinite topology.

Proof. Fix H ≤f.g. F_A and let u ∈ F_A ∖ H be written in reduced form as an element of R_A. In view of Proposition 3.5, u does not label a loop at the basepoint q₀ of S(H). If there is no path q₀ --u--> ⋯ in S(H), we add new edges to S(H) to get a finite inverse automaton A having a path q₀ --u--> q ≠ q₀. Otherwise just take A = S(H). Next add new edges to A to get a finite complete inverse automaton B. In view of Propositions 3.8 and 3.12, we have B = S(K) for some K ≤f.i. F_A. Hence Ku is open and contains u. Since H ∩ Ku ≠ ∅ would yield u ∈ K⁻¹H = K, contradicting Proposition 3.5, it follows that H ∩ Ku = ∅ and so H is closed as claimed.


Example 3.7. We consider the above construction for H = ⟨a⁻¹ba, ba²⟩ and u = b².

[Diagrams of S(H), of the inverse automaton A obtained by adding a path labelled u out of the basepoint, and of a complete inverse automaton B extending A, with a spanning tree marked by dotted lines, omitted.]

If we take the spanning tree defined by the dotted lines in B, it follows from Proposition 3.6 that

    K = ⟨ba⁻¹, b³, b²ab⁻², ba², baba⁻¹b⁻¹⟩

is a finite-index subgroup of F₂ such that H ∩ Kb² = ∅.

We recall that a group G is residually finite if its set of finite-index subgroups has trivial intersection. Considering the trivial subgroup in Theorem 3.17, we deduce:

Corollary 3.18. F_A is residually finite.

We remark that Ribes and Zalesskiĭ extended Theorem 3.17 to products of finitely many finitely generated subgroups of F_A; see [44]. This result is deeply connected to the solution of Rhodes' Type II conjecture; see Chapter 4 of [43]. If V denotes a pseudovariety of finite groups (see Chapter 16), the pro-V topology on F_A is defined by declaring that each u ∈ F_A has

    {Ku | K ⊴f.i. F_A, F_A/K ∈ V}

as a basis of clopen neighbourhoods. The closure for the pro-V topology of H ≤f.g. F_A can be related to an extension property of S(H), and Margolis, Sapir and Weil used automata to prove that efficient computation can be achieved for the pseudovarieties of finite p-groups and of finite nilpotent groups [34]. The original computability proof for the p-group case is due to Ribes and Zalesskiĭ [45].

3.7. Dynamical properties. We briefly mention some examples of applications of Stallings automata to the study of endomorphism dynamics, starting with Gersten's solution of the subgroup orbit problem [19].


The subgroup orbit problem consists of finding an algorithm to decide, for given H, K ≤f.g. F_A, whether or not K = φ(H) for some automorphism φ of F_A. Equivalently, this can be described as deciding whether or not the automorphic orbit of a finitely generated subgroup is recursive. Gersten's solution adapts to the context of Stallings automata the strategy designed by Whitehead in [66] for solving the orbit problem for words. Whitehead's proof relies on a suitable decomposition of automorphisms as products of elementary factors (which became known as Whitehead automorphisms), and uses these as a tool to compute the elements of minimum length in the automorphic orbit of the word. In the subgroup case, word length is replaced by the number of vertices of the Stallings automaton. The most efficient solution to the problem of identifying free factors [46], mentioned in § 3.5, also relies on this approach: H ≤f.g. F_A is a free factor if and only if the Stallings automaton of some automorphic image of H has a single vertex (that is, a bouquet).

Another very nice application is given by the following theorem of Goldstein and Turner [20], which holds more generally for homomorphisms H → F_A with H ≤f.i. F_A.

Theorem 3.19. The fixed point subgroup of an endomorphism of F_A is finitely generated.

Proof. Let φ be an endomorphism of F_A. For every u ∈ F_A, define Q(u) = φ(u)⁻¹u. We define a potentially infinite automaton A by taking

    {Q(u) | u ∈ F_A} ⊆ F_A

as the vertex set, all edges of the form Q(u) --a--> Q(ua) with u ∈ F_A and a ∈ Ã, and fixing 1 as the basepoint. Then A is a well-defined inverse automaton, with initial state Q(1) = 1. Next we take B to be the subautomaton of A obtained by retaining only those vertices and edges that lie in successful paths labelled by reduced words. Clearly, B is still an inverse automaton, and it is easy to check that it must be the Stallings automaton of the fixed point subgroup of φ. It remains to be proved that B is finite.

We define a subautomaton C of B by removing exactly one edge among each inverse pair

    Q(u) --a--> Q(ua),   Q(ua) --a⁻¹--> Q(u)

with a ∈ A, as follows: if a is the last letter of Q(ua), we remove Q(u) --a--> Q(ua); otherwise, we remove Q(ua) --a⁻¹--> Q(u).

Let M denote the maximum length of the image of a letter under φ. We claim that, whenever |Q(u)| ≥ M + 1, the vertex Q(u) has outdegree at most 1 in C. Indeed, consider an edge Q(u) --a--> Q(ua) in C with a ∈ A. Then the final letter cancels in φ(a)⁻¹Q(u)a, and φ(a)⁻¹Q(u) ends with the last letter of Q(u), so this letter must be a⁻¹. Similarly, the edge Q(u) --a⁻¹--> Q(ua⁻¹) belongs to C only if a is the last letter of Q(u).


Therefore, C consists of a bounded "kernel" (the finitely many vertices Q(u) with |Q(u)| ≤ M + 1) together with finitely many paths from the kernel to the kernel. All in all, C is a finite automaton, and so the fixed point subgroup of φ is finitely generated.

Note that this proof is not constructive by any means. Bogopolski and Maslakova give in [9] an algorithm that computes the fixed point subgroup of a free group automorphism; it relies on the sophisticated train track theory of Bestvina and Handel [7] and other algebraic geometry tools. The general endomorphism case remains open.

Stallings automata were also used by Ventura in the study of various properties of fixed subgroups, considering in particular arbitrary families of endomorphisms; see [64] and [36] (also see [65]). Automata also play a part in the study of infinite fixed points. In [55], these are considered for the continuous extension of a virtually injective endomorphism to the hyperbolic boundary of a virtually free group.

4. Rational and recognisable subsets

Rational subsets generalise the notion of "finitely generated" from subgroups to arbitrary subsets of a group, and can be quite useful in establishing inductive procedures that need to go beyond the territory of subgroups. Similarly, recognisable subsets extend the notion of finite-index subgroups. Basic properties and results can be found in [5] or [49].

We consider a finitely generated group G = ⟨A⟩, with the canonical map π: F_A → G; we also write π for the induced morphism Ã* → G. A subset of G is rational if it is the image under π of a rational subset of Ã*, and is recognisable if its full preimage under π is rational in Ã*. For every group G, the classes Rat G and Rec G satisfy the following closure properties:

• Rat G is (effectively) closed under union, product, star, morphisms, inversion, and subgroup generating;

• Rec G is (effectively) closed under boolean operations, translation, product, star, inverse morphisms, inversion, and subgroup generating.

Kleene's Theorem is not valid for groups: Rat G = Rec G if and only if G is finite. However, if the class of rational subsets of G possesses some extra algorithmic properties, then many decidability/constructibility results can be deduced for G. Two properties are particularly coveted for Rat G:

• (effective) closure under complement (yielding closure under all the boolean operations);

• a decidable membership problem for arbitrary rational subsets.

In these cases, one may often solve problems (e.g., equations, or systems of equations) whose statement lies far out of the rational universe, by proving that the solution is a rational set.


4.1. Rational and recognisable subgroups. We start with some basic, general facts. The following result is essential to connect language theory to group theory.

Theorem 4.1 (Anisimov and Seifert). A subgroup H of a group G is rational if and only if H is finitely generated.

Proof. (⟹) Let H be a rational subgroup of G and let π: F_A → G denote a morphism. Then there exists a finite Ã-automaton A such that H = π(L(A)). Assume that A has m vertices and let X consist of all the words in π⁻¹(H) of length < 2m. Since A is finite, so is X. We claim that H = ⟨π(X)⟩. To prove it, it suffices to show that

    u ∈ L(A) ⟹ π(u) ∈ ⟨π(X)⟩   (8)

holds for every u ∈ Ã*. We use induction on |u|. By the definition of X, implication (8) holds for words of length < 2m. Now assume that |u| ≥ 2m and that (8) holds for shorter words. Write u = vw with |w| = m. Then there exists a path

    q₀ --v--> q --z--> t

in A with |z| < m, where q₀ is an initial and t a terminal vertex. Thus vz ∈ L(A) and by the induction hypothesis π(vz) ∈ ⟨π(X)⟩. On the other hand, |z⁻¹w| < 2m and π(z⁻¹w) = π(z⁻¹v⁻¹)π(vw) ∈ H; hence z⁻¹w ∈ X and so π(u) = π(vz)π(z⁻¹w) ∈ ⟨π(X)⟩, proving (8) as required.

(⟸) It is trivial.

It is an easier task to characterise the smaller class of recognisable subgroups:

Proposition 4.2. A subgroup H of a group G is recognisable if and only if it has finite index.

Proof. (⟹) In general, a recognisable subset of G is of the form NX, where N ⊴f.i. G and X ⊆ G is finite. If H = NX is a subgroup of G, then N ⊆ H and so H has finite index as well.

(⟸) This follows from the well-known fact that every finite-index subgroup H of G contains a finite-index normal subgroup N of G, namely N = ⋂_{g∈G} gHg⁻¹. Since N has finite index, H must be of the form NX for some finite X ⊆ G.

4.2. Benois' theorem. The central result in this subsection is Benois' theorem, the cornerstone of the whole theory of rational subsets of free groups.

Theorem 4.3 (Benois). The following facts hold.

i. If L ⊆ Ã* is rational, then L̄ is also rational, and can be effectively constructed from L.

ii. A subset of R_A is a rational language as a subset of Ã* if and only if it is rational as a subset of F_A.

We illustrate this in the case of finitely generated subgroups: temporarily calling the automata recognising rational subsets of R_A "Benois automata," we may convert them into Stallings automata by adding inverse edges, identifying initial and terminal vertices to enforce a basepoint, and folding this new automaton. Conversely, given a Stallings automaton, one intersects its language with R_A to obtain a Benois automaton.


Proof. (i) Let A = (Q, Ã, E, I, T) be a finite automaton recognising L. We define a sequence (A_n)_n of finite automata with ε-transitions as follows. Let A₀ = A. Assuming that A_n = (Q, Ã, E_n, I, T) is defined, we consider all instances of ordered pairs (p, q) ∈ Q × Q such that

    there exists a path p --aa⁻¹--> q in A_n for some a ∈ Ã, but no path p --1--> q.   (P)

Clearly, there are only finitely many instances of (P) in A_n. We define E_{n+1} to be the union of E_n with all the new edges (p, 1, q), where (p, q) ∈ Q × Q is an instance of (P). Finally, we define A_{n+1} = (Q, Ã, E_{n+1}, I, T). In particular, note that A_n = A_{n+k} for every k ≥ 1 if there are no instances of (P) in A_n. Since Q is finite, the sequence (A_n)_n is ultimately constant, say after reaching A_m. We claim that

    L̄ = L(A_m) ∩ R_A.   (9)

Indeed, take u ∈ L. There exists a sequence of words u = u₀, u₁, …, u_{k−1}, u_k = ū where each term is obtained from the preceding one by erasing a factor of the form aa⁻¹ for some a ∈ Ã. A straightforward induction shows that u_i ∈ L(A_i) for i = 0, …, k, since the existence of a path p --aa⁻¹--> q in A_i implies the existence of a path p --1--> q in A_{i+1}. Hence ū = u_k ∈ L(A_k) ⊆ L(A_m) and it follows that L̄ ⊆ L(A_m) ∩ R_A.

For the opposite inclusion, we start by noting that any path p --u--> q in A_{i+1} can be lifted to a path p --v--> q in A_i, where v is obtained from u by inserting finitely many factors of the form aa⁻¹. It follows that the languages L(A_m), L(A_{m−1}), …, L(A₀) = L all have the same set of reduced forms, namely L̄, and so L(A_m) ∩ R_A ⊆ L̄. Thus (9) holds.

Since

    R_A = Ã* ∖ ⋃_{a∈Ã} Ã* aa⁻¹ Ã*

is obviously rational, and the class of rational languages is closed under intersection, it follows that L̄ is rational. Moreover, we can effectively compute the automaton A_m and a finite automaton recognising R_A, hence the direct product construction can be used to construct a finite automaton recognising the intersection L̄ = L(A_m) ∩ R_A.

(ii) Consider X ⊆ R_A. If X ∈ Rat Ã*, then π(X) ∈ Rat F_A and so X is rational as a subset of F_A. Conversely, if X is rational as a subset of F_A, then X = π(L) for some L ∈ Rat Ã*. Since X ⊆ R_A, we get X = L̄. Now part (i) yields L̄ ∈ Rat Ã* and so X ∈ Rat Ã* as required.
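A naive Python sketch of the saturation procedure in the proof of Theorem 4.3 (i): starting from a finite automaton over Ã, it keeps adding ε-edges (p, None, q) as long as some word aa⁻¹ labels a path from p to q not yet matched by an ε-edge. The encoding (edges as triples, a trailing "-" marking a formal inverse, None standing for ε) is ours and purely illustrative; intersecting the language of the resulting automaton with the reduced words R_A then recognises L̄, as in equation (9).

```python
def inv(letter):
    """Formal inverse of a letter: 'a' <-> 'a-'."""
    return letter[:-1] if letter.endswith("-") else letter + "-"

def epsilon_pairs(states, edges):
    """Reflexive-transitive closure of the epsilon-edges, as a set of pairs."""
    reach = {(s, s) for s in states} | {(p, q) for (p, x, q) in edges if x is None}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(reach):
            for (r, t) in list(reach):
                if q == r and (p, t) not in reach:
                    reach.add((p, t))
                    changed = True
    return reach

def saturate(states, edges):
    """Benois saturation: compute the fixed point A_m of the sequence A_0, A_1, ..."""
    edges = set(edges)
    while True:
        eps = epsilon_pairs(states, edges)
        letter_edges = [e for e in edges if e[1] is not None]
        added = False
        for (x, a, y) in letter_edges:
            for (z, b, w) in letter_edges:
                if b != inv(a) or (y, z) not in eps:
                    continue
                # a path labelled a a^-1 joins every p with p =eps=> x
                # to every q with w =eps=> q
                for p in states:
                    for q in states:
                        if (p, x) in eps and (w, q) in eps and (p, None, q) not in edges:
                            edges.add((p, None, q))
                            added = True
        if not added:
            return edges
```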


Example 4.1. Let A = A₀ be depicted by

[diagram of A₀ omitted]

We get

[diagrams of A₁ and of A₂ = A₃ omitted: each step adds the ε-edges, labelled 1, produced by the construction in the proof of Theorem 4.3]

and we can then proceed to compute L̄ = L(A₂) ∩ R₂.

The following result summarises some of the most direct consequences of Benois' theorem.

Corollary 4.4. The following facts hold.

i. F_A has decidable rational subset membership problem.

ii. Rat F_A is closed under the boolean operations.

Proof. (i) Given X ∈ Rat F_A and u ∈ F_A, write X = π(L) for some L ∈ Rat Ã*. Then u ∈ X if and only if ū ∈ X̄ = L̄. By Theorem 4.3 (i), we may construct a finite automaton recognising L̄ and therefore decide whether or not ū ∈ L̄.

(ii) Given X ∈ Rat F_A, we have F_A ∖ X = R_A ∖ X̄ and so F_A ∖ X ∈ Rat F_A by Theorem 4.3. Therefore Rat F_A is closed under complement. Since Rat F_A is trivially closed under union, it follows from De Morgan's laws that it is closed under intersection as well.

Note that we can associate algorithms with these boolean closure properties of Rat FA in a constructive way. We remark also that the proof of Theorem 4.3 can be clearly adapted to more general classes of rewriting systems (see [10]). Theorem 4.3 and Corollary 4.4 have been generalised several times by Benois herself [4] and by Sénizergues, who obtained the most general versions. Sénizergues’ results [50] hold for rational length-reducing left basic confluent rewriting systems and remain valid for the more general notion of controlled rewriting system.


4.3. Rational versus recognisable. Since F_A is a finitely generated monoid, every recognisable subset of F_A is rational; see Proposition III.2.4 in [5]. We turn to the problem of deciding which rational subsets of F_A are recognisable. The first proof, using rewriting systems, is due to Sénizergues [51], but we follow the shorter alternative proof from [54], where a third proof, of a more combinatorial nature, was also given.

Given a subset X of a group G, we define the right stabiliser of X to be the submonoid of G defined by

    R(X) = {g ∈ G | Xg ⊆ X}.

Next, let

    K(X) = R(X) ∩ (R(X))⁻¹ = {g ∈ G | Xg = X}

be the largest subgroup of G contained in R(X), and let

    N(X) = ⋂_{g∈G} gK(X)g⁻¹

be the largest normal subgroup of G contained in K(X), and therefore in R(X).

Lemma 4.5 ([53]). A subset X of a group G is recognisable if and only if K(X) is a finite-index subgroup of G. In fact, the Schreier graph (see § 2 of Chapter 24) of K(X)\G is the underlying graph of an automaton recognising X, and G/N(X) is the syntactic monoid of X.

Proof. (⟹) If X ⊆ G is recognisable, then X = NF for some N ⊴f.i. G and some finite F ⊆ G. Hence N ⊆ R(X) and so N ⊆ K(X) since N ≤ G. Since N has finite index in G, so does K(X).

(⟸) If K(X) is a finite-index subgroup of G, so is N = N(X). Indeed, a finite-index subgroup has only finitely many conjugates (all of finite index), and a finite intersection of finite-index subgroups is easily checked to have finite index itself. Therefore it suffices to show that X = FN for some finite subset F of G. Since N has finite index, the claim follows from XN = X, in turn an immediate consequence of N ⊆ R(X).

Proposition 4.6. It is decidable whether or not a rational subset of F_A is recognisable.

Proof. Take X ∈ Rat F_A. In view of Lemma 4.5 and Proposition 3.8, it suffices to show that K(X) is finitely generated and effectively computable. Given u ∈ F_A,

    u ∉ R(X) ⟺ Xu ⊈ X ⟺ Xu ∩ (F_A ∖ X) ≠ ∅ ⟺ u ∈ X⁻¹(F_A ∖ X),

and hence

    R(X) = F_A ∖ (X⁻¹(F_A ∖ X)).

It easily follows from the fact that the class of rational languages is closed under reversal and morphisms, combined with Theorem 4.3 (ii), that X⁻¹ ∈ Rat F_A. Since Rat F_A is trivially closed under product, it follows from Corollary 4.4 that R(X) is rational and


effectively computable, and so is K(X) = R(X) ∩ (R(X))⁻¹. By Theorem 4.1, the subgroup K(X) is finitely generated and the proof is complete.

These results are related to the Sakarovitch conjecture [48], which states that every rational subset of F_A must be either recognisable or disjunctive: a subset X of a monoid M is disjunctive if it has trivial syntactic congruence or, equivalently, if any morphism φ: M → M′ recognising X is necessarily injective. In the group case, it easily follows from the proof of the direct implication of Lemma 4.5 that the projection G → G/N recognises X ⊆ G if and only if N ⊆ N(X). Thus X is disjunctive if and only if N(X) is the trivial subgroup. The Sakarovitch conjecture was first proved in [51], but once again we follow the shorter alternative proof from [54].

Theorem 4.7 (Sénizergues). A rational subset of F_A is either recognisable or disjunctive.

Proof. Since the only subgroups of Z are the trivial subgroup and the finite-index subgroups, we may assume that Card A > 1. Take X ∈ Rat F_A. By the proof of Proposition 4.6, the subgroup K(X) is finitely generated. In view of Lemma 4.5, we may assume that K(X) is not a finite-index subgroup. Thus S(K(X)) is not complete by Proposition 3.8. Let q₀ denote the basepoint of S(K(X)). Since S(K(X)) is not complete, q₀ · u is undefined for some reduced word u.

Let w be an arbitrary nonempty reduced word. We must show that w ∉ N(X). Suppose otherwise. Since u, w are reduced and Card A > 1, there exist enough letters to ensure that there is some word v ∈ R_A such that uvwv⁻¹u⁻¹ is reduced. Now w ∈ N(X), hence uvwv⁻¹u⁻¹ ∈ N(X) ⊆ K(X) by normality. Since uvwv⁻¹u⁻¹ is reduced, it follows from Proposition 3.5 that uvwv⁻¹u⁻¹ labels a loop at q₀ in S(K(X)), contradicting q₀ · u being undefined. Thus w ∉ N(X) and so N(X) = 1. Therefore X is disjunctive as required.

4.4. Beyond free groups. Let π: F_A ↠ G be a morphism onto a group G. We consider the word problem submonoid of G, defined as

    W(G) = π⁻¹(1) ⊆ Ã*.   (10)

Proposition 4.8 (Anisimov). The language W(G) is rational if and only if G is finite.

Proof. If G is finite, it is easy to check that W(G) is rational, by viewing the Cayley graph of G (see § 2 of Chapter 24) as an automaton. Conversely, if W(G) is rational, then π⁻¹(1) is a finitely generated normal subgroup of F_A, either of finite index or trivial by the proof of Theorem 4.7. It is well known that the Dyck language D_A, consisting of the words of Ã* that reduce to the empty word, is not rational if Card A ≥ 1; thus it easily follows that π⁻¹(1) has finite index and therefore G must be finite.
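The easy direction of Proposition 4.8 can be made concrete: for a finite group, the Cayley graph itself is a finite automaton whose states are the group elements, with the identity as both initial and final state. A minimal Python sketch, instantiated here on the symmetric group S₃ with permutations encoded as tuples (our encoding, purely illustrative):

```python
def cayley_word_acceptor(identity, multiply):
    """The Cayley graph of a finite group as an automaton: states are group
    elements, reading a letter s moves from g to g*s, and a word is accepted
    iff it leads from the identity back to the identity, i.e. iff it
    represents 1 in the group."""
    def accepts(word):
        state = identity
        for s in word:
            state = multiply(state, s)
        return state == identity
    return accepts

def compose(p, q):
    """Composition of permutations given as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

# S3, with a transposition and a 3-cycle as letters
s, c, e = (1, 0, 2), (1, 2, 0), (0, 1, 2)
accepts = cayley_word_acceptor(e, compose)
print(accepts([s, s]))        # True: s has order 2
print(accepts([c, c, c]))     # True: c has order 3
print(accepts([s, c]))        # False
```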


What about finitely generated groups with a context-free word problem W(G)? A celebrated result by Muller and Schupp [39], with a contribution by Dunwoody [15], relates them to virtually free groups: these are the groups with a free subgroup of finite index.

Theorem 4.9 (Muller and Schupp). The language W(G) is context-free if and only if G is virtually free.

Sketch of proof. First assume that G is finitely generated and virtually free. We claim that G has a normal free subgroup F_A of finite index, with A finite. Indeed, letting F be a finite-index free subgroup of G, it suffices to take F′ = ⋂_{g∈G} gFg⁻¹. Since F has finite index, so does F′; see the proof of Lemma 4.5. Taking a morphism π: F_B → G with B finite, we get from Corollary 3.9 that π⁻¹(F′) ≤f.i. F_B is finitely generated, so F′ is itself finitely generated. Finally, F′ is a subgroup of F, so F′ is still free by Theorem 3.7, and we can write F′ = F_A. We may therefore decompose G as a finite disjoint union of the form

    G = F_A b₀ ∪ F_A b₁ ∪ ⋯ ∪ F_A b_m,   with b₀ = 1.   (11)

We view this decomposition as a rewriting system for G, providing a rational transduction between W(G) and D_A. The converse implication can be proved by an argument on geometric properties of the Cayley graph of G, as in Chapter 24; briefly said, one deduces from the context-freeness of W(G) that the Cayley graph of G is close (more precisely, quasi-isometric) to a tree.

It follows that virtually free groups have decidable word problem. In Chapter 24, we shall discuss the word problem for more general classes of groups using other techniques.

Grunschlag proved that every rational (respectively recognisable) subset of a virtually free group G decomposed as in (11) admits a decomposition as a finite union X₀b₀ ∪ ⋯ ∪ X_m b_m, where the X_i are rational (respectively recognisable) subsets of F_A; see [21]. Thus basic results such as Corollary 4.4 or Proposition 4.6 can be extended to virtually free groups (see [21] and [53]), and Theorem 3.19 also holds for them; see [55]. Similar generalisations can be obtained for free abelian groups of finite rank [53].

The fact that the strong properties of Corollary 4.4 hold for both free groups and free abelian groups suggests considering the case of graph groups (also known as free partially abelian groups or right-angled Artin groups), where we admit partial commutation between letters. An independence graph is a finite undirected graph (A, I) with no loops, that is, I is a symmetric anti-reflexive relation on A. The graph group G(A, I) is the quotient of F_A by the congruence generated by the relation

    {(ab, ba) | (a, b) ∈ I}.

On both extremes, we have F_A = G(A, ∅) and the free abelian group on A, which corresponds to the complete graph on A. These turn out to be particular cases of transitive forests.


We can say that (A, I) is a transitive forest if it has no induced subgraph of either of the following forms: a cycle C₄ on four vertices, or a path P₄ on four vertices (diagrams omitted).

We recall that an induced subgraph of (A, I) is formed by a subset of vertices A′ ⊆ A together with all the edges in I connecting vertices from A′. The following difficult theorem, a group-theoretic version of a result on trace monoids by Aalbersberg and Hoogeboom [1], was proved in [28]:

Theorem 4.10 (Lohrey and Steinberg). Let (A, I) be an independence graph. Then G(A, I) has decidable rational subset membership problem if and only if (A, I) is a transitive forest.

They also proved that these conditions are equivalent to decidability of the membership problem for finitely generated submonoids. Such a "bad" G(A, I) gives an example of a finitely presented group with a decidable generalised word problem that does not have a decidable membership problem for finitely generated submonoids. It follows from Theorem 4.10 that any group containing a direct product of two free monoids has undecidable rational subset membership problem, a fact that can also be deduced directly from the undecidability of the Post correspondence problem.

Other positive results on rational subsets have been obtained for graphs of groups, HNN extensions and amalgamated free products by Kambites, Silva and Steinberg [23], and by Lohrey and Sénizergues [27]. Lohrey and Steinberg proved recently that the rational subset membership problem is recursively equivalent to the finitely generated submonoid membership problem for groups with two or more ends [29]. With respect to closure under complement, Lohrey and Sénizergues [27] proved that the class of groups for which the rational subsets form a boolean algebra is closed under HNN extensions and amalgamated products over finite groups. On the negative side, Bazhenova proved that the rational subsets of a finitely generated nilpotent group do not form a boolean algebra unless the group is virtually abelian [3]. Moreover, Roman'kov proved in [47], via a reduction from Hilbert's 10th problem, that the rational subset membership problem is undecidable for free nilpotent groups of any class ≥ 2 and of sufficiently large rank.

Last but not least, we should mention that Stallings' construction has been successfully generalised in various directions: to fundamental groups of certain classes of graphs of groups (by Kapovich, Miasnikov and Weidmann [25]); to amalgamated free products of finite groups (by Markus-Epstein [35]); to virtually free groups (by Silva, Soler-Escrivà and Ventura [56]); and to quasi-convex subgroups of automatic groups (by Kharlampovich, Miasnikov, and Weil [26]).
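Returning to Theorem 4.10, the transitive-forest condition is easy to test naively: it suffices to look for an induced P₄ or C₄ among all four-element vertex sets. A short Python sketch (our encoding of the independence graph as a set of frozensets), quartic in the number of vertices:

```python
from itertools import combinations, permutations

def is_transitive_forest(vertices, edges):
    """Decide whether the graph (vertices, edges) is a transitive forest,
    i.e. has no induced C4 and no induced P4; `edges` is a set of frozensets."""
    def adj(u, v):
        return frozenset((u, v)) in edges
    for quad in combinations(vertices, 4):
        for a, b, c, d in permutations(quad):
            # a-b-c-d a path with the chords a-c and b-d absent: the four
            # vertices induce a P4 (if a-d is absent) or a C4 (if it is
            # present); both are forbidden
            if adj(a, b) and adj(b, c) and adj(c, d) and not adj(a, c) and not adj(b, d):
                return False
    return True

# the independence graph of F2 x F2 is a 4-cycle, hence not a transitive
# forest, matching the undecidability of its rational subset membership problem
C4 = {frozenset(e) for e in [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")]}
print(is_transitive_forest({"a", "b", "c", "d"}, C4))   # False
```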


4.5. Rational solution sets and rational constraints. In this final subsection we make a brief incursion into the brave new world of rational constraints. Rational subsets provide group theorists with two main assets:

• a concept that generalises finite generation for subgroups and is much better suited to withstand most induction procedures;

• a systematic way of looking for solutions of the right type in the context of equations of many sorts.

This second feature leads us to the notion of rational constraint, where we restrict the set of potential solutions to some rational subset. There is a particular combination of circumstances that can ensure the success of this strategy: if Rat G is closed under intersection and we can prove that the solution set of a problem P is an effectively computable rational subset of G, then we can solve P under any rational constraint.

An early example is the adaptation by Margolis and Meakin of Rabin's language and Rabin's tree theorem to free groups, where first-order formulae provide rational solution sets [33]. The logic language considered here is meant to be applied to words, seen as models, and consists basically of unary predicates that associate letters with positions in each word, together with a binary predicate for position ordering. Margolis and Meakin used this construction to solve problems in combinatorial inverse semigroup theory [33].

Diekert, Gutierrez and Hagenah proved that the existential theory of systems of equations with rational constraints is decidable over a free group [12]. Working essentially with a free monoid with involution, and adapting Plandowski's approach [41] in the process, they extended the classical result of Makanin [31] to include rational constraints, with much lower complexity as well. The complexity of their results has recently been improved in a paper by Diekert, Jeż and Plandowski [13], using the compression techniques developed by Jeż [22] for word equations. The proof of this deep result is well out of scope here, but its potential applications are immense; group theorists are only starting to discover its full strength.

The results in [27] can be used to extend the existential theory of equations with rational constraints to virtually free groups, a result that also follows from Dahmani and Guirardel's recent paper on equations over hyperbolic groups with quasi-convex rational constraints [11]. Equations over graph groups with a restricted class of rational constraints were also successfully considered by Diekert and Lohrey [14].

A somewhat exotic example of the computation of a rational solution set arises in the problem of determining which automorphisms of F₂ (if any) carry a given word into a given finitely generated subgroup. The full solution set is recognised by a finite automaton whose vertices are themselves structures named "finite truncated automata" [58].

Acknowledgement. Pedro V. Silva acknowledges support by Project ASA (PTDC/MAT/65481/2006) and C.M.U.P., financed by F.C.T. (Portugal) through the programmes POCTI and POSI, with national and E.U. structural funds.


References [1] I. J. Aalbersberg and H. J. Hoogeboom, Characterizations of the decidability of some problems for regular trace languages. Math. Systems Theory 22 (1989), no. 1, 1–19. MR 0992783 Zbl 0679.68132 q.v. 864 [2] Algebraic Cryptography Center, CRAG – The Cryptography and Groups Software Library, 2010. https://web.stevens.edu/algebraic/downloads.php q.v. 844 [3] G. A. Bazhenova, Rational sets in finitely generated nilpotent groups. Algebra Log. 39 (2000), no. 4, 379–394, 507. In Russian. English translation, Algebra and Logic 39 (2000), no. 4, 215–223. MR 1803582 Zbl 1054.20015 q.v. 864 [4] M. Benois, Descendants of regular language in a class of rewriting systems: algorithm and complexity of an automata construction. In Rewriting techniques and applications (P. Lescanne, ed.). Proceedings of the second international conference held in Bordeaux, May 25–27, 1987. Lecture Notes in Computer Science, 256. Springer, Berlin, 1987, 121–132. MR 0903667 Zbl 0643.68119 q.v. 860 [5] J. Berstel, Transductions and context-free languages. Leitfäden der angewandten Mathematik und Mechanik, 38. B. G. Teubner, Stuttgart, 1979. MR 0549481 Zbl 0424.68040 q.v. 857, 861 [6] J. Berstel, C. De Felice, D. Perrin, C. Reutenauer, and G. Rindone, Bifix codes and Sturmian words. J. Algebra 369 (2012), 146–202. MR 2959791 Zbl 1263.68121 q.v. 849 [7] M. Bestvina and M. Handel, Train tracks and automorphisms of free groups. Ann. of Math. (2) 135 (1992), no. 1, 1–51. MR 1147956 Zbl 0757.57004 q.v. 857 [8] J.-C. Birget, S. W. Margolis, J. C. Meakin, and P. Weil, PSPACE-complete problems for subgroups of free groups and inverse finite automata. Theoret. Comput. Sci. 242 (2000), no. 1–2, 247–281. MR 1769781 q.v. 853 [9] O. Bogopolski and O. Maslakova, An algorithm for finding a basis of the fixed point subgroup of an automorphism of a free group. Internat. J. Algebra Comput. 26 (2016), no. 1, 29–67. MR 3463201 Zbl 1348.20030 q.v. 857 [10] R. V. Book and F. Otto, String-rewriting systems. Texts and Monographs in Computer Science. Springer, New York, 1993. MR 1215932 Zbl 0832.68061 q.v. 843, 860 [11] F. Dahmani and V. Guirardel, Foliations for solving equations in groups: free, virtually free, and hyperbolic groups. J. Topol. 3 (2010), no. 2, 343–404. MR 2651364 Zbl 1217.20021 q.v. 865 [12] V. Diekert, C. Gutierrez, and C. Hagenah, The existential theory of equations with rational constraints in free groups is PSPACE-complete. Inform. and Comput. 202 (2005), no. 2, 105–140. MR 2172984 Zbl 1101.68649 q.v. 865 [13] V. Diekert, A. Jeż, and W. Plandowski, Finding all solutions of equations in free groups and monoids with involution. In Computer science—theory and applications (E. A. Hirsch, S. O. Kuznetsov, J.-É. Pin, and N. K. Vereshchagin, eds.). Proceedings of the 9th International Computer Science Symposium in Russia (CSR 2014) held in Moscow, June 7–11, 2014. Lecture Notes in Computer Science, 8476. Springer, Cham, 2014, 1–15. MR 3218535 Zbl 1382.68348 q.v. 865 [14] V. Diekert and M. Lohrey, Word equations over graph products. Internat. J. Algebra Comput. 18 (2008), no. 3, 493–533. MR 2422071 Zbl 1186.20041 q.v. 865 [15] M. J. Dunwoody, The accessibility of finitely presented groups. Invent. Math. 81 (1985), no. 3, 449–457. MR 0807066 Zbl 0572.20025 q.v. 862


[16] J. Friedman, Sheaves on graphs, their homological invariants, and a proof of the Hanna Neumann conjecture: with an appendix by Warren Dicks. Mem. Amer. Math. Soc., 233(1100):xii+106, 2015. With an appendix by Warren Dicks. q.v. 851 [17] The GAP Group, GAP–Groups, Algorithms, and Programming. Version 4.4.12, 2008. https://www.gap-system.org/ q.v. 844 [18] S. M. Gersten, Intersections of finitely generated subgroups of free groups and resolutions of graphs. Invent. Math. 71 (1983), no. 3, 567–591. MR 0695907 Zbl 0521.20014 q.v. 851 [19] S. M. Gersten, On Whitehead’s algorithm. Bull. Amer. Math. Soc. (N.S.) 10 (1984), no. 2, 281–284. MR 0733696 Zbl 0537.20015 q.v. 855 [20] R. Z. Goldstein and E. C. Turner, Fixed subgroups of homomorphisms of free groups. Bull. London Math. Soc. 18 (1986), no. 5, 468–470. MR 0847985 Zbl 0576.20016 q.v. 856 [21] Z. Grunschlag, Algorithms in geometric group theory. Ph.D. thesis, University of California, Berkeley, 1999. MR 2699134 q.v. 863 [22] A. Jeż, Recompression: a simple and powerful technique for word equations. J. ACM 63 (2016), no. 1, art. 4, 51 pp. MR 3490231 Zbl 1403.68374 q.v. 865 [23] M. Kambites, P. V. Silva, and B. Steinberg, On the rational subset problem for groups. J. Algebra 309 (2007), no. 2, 622–639. MR 2303197 Zbl 1123.20047 q.v. 864 [24] I. Kapovich and A. Myasnikov, Stallings foldings and subgroups of free groups. J. Algebra 248 (2002), no. 2, 608–668. MR 1882114 Zbl 1001.20015 q.v. 843, 852, 853 [25] I. Kapovich, R. Weidmann, and A. Miasnikov, Foldings, graphs of groups and the membership problem. Internat. J. Algebra Comput. 15 (2005), no. 1, 95–128. MR 2130178 Zbl 1089.20018 q.v. 864 [26] O. Kharlampovich, A, Miasnikov, and P. Weil, Stallings graphs for quasi-convex subgroups. J. Algebra 488 (2017), 442–483. MR 3680926 Zbl 06768984 q.v. 864 [27] M. Lohrey and G. Sénizergues, Rational subsets in HNN-extensions and amalgamated products. Internat. J. Algebra Comput. 18 (2008), no. 1, 111–163. MR 2394724 Zbl 1190.20019 q.v. 864, 865 [28] M. Lohrey and B. Steinberg, The submonoid and rational subset membership problems for graph groups. J. Algebra 320 (2008), no. 2, 728–755. MR 2422314 Zbl 1156.20052 q.v. 864 [29] M. Lohrey and B. Steinberg, Submonoids and rational subsets of groups with infinitely many ends. J. Algebra 324 (2010), no. 5, 970–983. MR 2659208 Zbl 1239.20038 q.v. 864 [30] W. Magnus, A. Karrass, and D. Solitar, Combinatorial group theory. Presentations of groups in terms of generators and relations. Second revised edition. Dover Publications, New York, 1976. MR 0422434 Zbl 0362.20023 q.v. 842, 843 [31] G. S. Makanin, Equations in a free group. Izv. Akad. Nauk. SSR, Ser. Math. 46 (1983), 1199–1273. In Russian. English translation, Math. USSR Izv. 21 (1983), 483–546. Zbl 0527.20018 q.v. 865 [32] S. W. Margolis and J. C. Meakin, Free inverse monoids and graph immersions. Internat. J. Algebra Comput. 3 (1993), no. 1, 79–99. MR 1214007 Zbl 0798.20056 q.v. 843 [33] S. W. Margolis and J. C. Meakin, Inverse monoids, trees and context-free languages. Trans. Amer. Math. Soc. 335 (1993), no. 1, 259–276. MR 1073775 Zbl 0795.20043 q.v. 865 [34] S. W. Margolis, M. V. Sapir, and P. Weil, Closed subgroups in pro-V topologies and the extension problem for inverse automata. Internat. J. Algebra Comput. 11 (2001), no. 4, 405–445. MR 1850210 Zbl 1027.20036 q.v. 855


[35] L. Markus-Epstein, Stallings foldings and subgroups of amalgams of finite groups. Internat. J. Algebra Comput. 17 (2007), no. 8, 1493–1535. MR 2378049 Zbl 1185.20027 q.v. 864 [36] A. Martino and E. Ventura, Fixed subgroups are compressed in free groups. Comm. Algebra 32 (2004), no. 10, 3921–3935. MR 2097438 Zbl 1069.20015 q.v. 857 [37] A. Miasnikov, E. Ventura, and P. Weil, Algebraic extensions in free groups. In Geometric group theory (G. N. Arzhantseva, L. Bartholdi, J. Burillo and E. Ventura, eds.). Papers from the Conference “Asymptotic and Probabilistic Methods in Geometric Group Theory”, held at the University of Geneva, Geneva, June 20–25, 2005, and the Conference in Group Theory, held in Barcelona, June 28–July 2, 2005. Trends in Mathematics. Birkhäuser Verlag, Basel, 2007, 225–253. MR 2395796 Zbl 1160.20022 q.v. 852 [38] I. Mineyev, Groups, graphs and the Hanna Neumann conjecture. J. Topol. Anal. 4 (2012), no. 1, 1–12. MR 2914871 Zbl 1257.20034 q.v. 851 [39] D. E. Muller and P. E. Schupp, Groups, the theory of ends, and context-free languages. J. Comput. System Sci. 26 (1983), no. 3, 295–310. MR 0710250 Zbl 0537.20011 q.v. 862 [40] H. Neumann, On the intersection of finitely generated free groups. Addendum. Publ. Math. Debrecen 5 (1957), 128. MR 0093537 Zbl 0078.01402 q.v. 851 [41] W. Plandowski, Satisfiability of word equations with constants is in PSPACE. J. ACM 51 (2004), no. 3, 483–496. MR 2145862 Zbl 1192.68372 q.v. 865 [42] K. Reidemeister, Fundamentalgruppe und Überlagerungsräume. J. Nachrichten Göttingen 1928, 69–76. JFM 54.0603.01 q.v. 848 [43] J. Rhodes and B. Steinberg, The q -theory of finite semigroups. Springer Monographs in Mathematics. Springer, New York, 2009. MR 2472427 Zbl 1186.20043 q.v. 855 [44] L. Ribes and P. A. Zalesski˘ı, On the profinite topology on a free group. Bull. London Math. Soc. 25 (1993), no. 1, 37–43. MR 1190361 Zbl 0811.20026 q.v. 855 [45] L. Ribes and P. A. Zalesski˘ı, The pro-p topology of a free group and algorithmic problems in semigroups. Internat. J. Algebra Comput. 4 (1994), no. 3, 359–374. MR 1297146 Zbl 0839.20041 q.v. 855 [46] A. Roig, E. Ventura, and P. Weil, On the complexity of the Whitehead minimization problem. Internat. J. Algebra Comput. 17 (2007), no. 8, 1611–1634. MR 2378055 Zbl 1149.20024 q.v. 852, 856 [47] V. Roman’kov, On the occurrence problem for rational subsets of a group. In International Conference on Combinatorial and Computational Methods in Mathematics (V. Roman’kov, ed.), Omsk, 1999, 76–81. q.v. 864 [48] J. Sakarovitch, Syntaxe des langages de Chomsky, essai sur le déterminisme. Ph.D. thesis. Université Paris VII, Paris, 1979. q.v. 862 [49] J. Sakarovitch, Éléments de théorie des automates. Vuibert Informatique, Paris, 2003. English translation, Elements of automata theory. Cambridge University Press, 2009. Translated by R. Thomas. Cambridge University Press, Cambridge, 2009. MR 2567276 Zbl 1188.68177 (English ed.) Zbl 1178.68002 (French ed.) q.v. 857 [50] G. Sénizergues, Some decision problems about controlled rewriting systems. Theoret. Comput. Sci. 71 (1990), no. 3, 281–346. MR 1057769 Zbl 0695.68056 q.v. 860 [51] G. Sénizergues, On the rational subsets of the free group. Acta Inform. 33 (1996), no. 3, 281–296. MR 1393764 Zbl 0858.68044 q.v. 861, 862 [52] J.-P. Serre, Arbres, amalgames, SL2 . Société Mathématique de France, 1977. Rédigé avec la collaboration de H. Bass. Astérisque, 46. Société Mathématique de France, Paris, 1977. MR 0476875 Zbl 0369.20013 q.v. 843


[53] P. V. Silva, Recognizable subsets of a group: finite extensions and the abelian case. Bull. Eur. Assoc. Theor. Comput. Sci. 77 (2002), 195–215. MR 1920336 Zbl 1015.20049 q.v. 861, 863 [54] P. V. Silva, Free group languages: rational versus recognizable. Theor. Inform. Appl. 38 (2004), no. 1, 49–67. MR 2059028 Zbl 1082.68071 q.v. 861, 862 [55] P. V. Silva, Fixed points of endomorphisms of virtually free groups. Pacific J. Math. 263 (2013), no. 1, 207–240. MR 3069081 Zbl 1280.20035 q.v. 857, 863 [56] P. V. Silva, X. Soler-Escrivà, and E. Ventura, Finite automata for Schreier graphs of virtually free groups. J. Group Theory 19 (2016), no. 1, 25–54. MR 3441128 Zbl 06532858 q.v. 864 [57] P. V. Silva and P. Weil, On an algorithm to decide whether a free group is a free factor of another. Theor. Inform. Appl. 42 (2008), no. 2, 395–414. MR 2401269 Zbl 1146.20021 q.v. 852 [58] P. V. Silva and P. Weil, Automorphic orbits in free groups: words versus subgroups. Internat. J. Algebra Comput. 20 (2010), no. 4, 561–590. MR 2665777 Zbl 1206.20026 q.v. 865 [59] P. V. Silva and P. Weil, On finite-index extensions of subgroups of free groups. J. Group Theory 13 (2010), no. 3, 365–381. MR 2653525 Zbl 1203.20025 q.v. 853 [60] C. C. Sims, Computation with finitely presented groups. Encyclopedia of Mathematics and its Applications, 48. Cambridge University Press, Cambridge, 1994. MR 1267733 Zbl 0828.20001 q.v. 841, 843 [61] J. R. Stallings, Topology of finite graphs. Invent. Math. 71 (1983), no. 3, 551–565. MR 0695906 Zbl 0521.20013 q.v. 843, 854 [62] M. Takahasi, Note on chain conditions in free groups. Osaka Math. J. 3 (1951), 221–225. MR 0046362 Zbl 0044.01106 q.v. 852 [63] W. M. Touikan, A fast algorithm for Stallings’ folding process. Internat. J. Algebra Comput. 16 (2006), no. 6, 1031–1045. MR 2286421 Zbl 1111.20032 q.v. 847 [64] E. Ventura, On fixed subgroups of maximal rank. Comm. Algebra 25 (1997), no. 10, 3361–3375. MR 1465119 Zbl 0893.20025 q.v. 857 [65] E. Ventura, Fixed subgroups in free groups: a survey. In Combinatorial and geometric group theory (S. Cleary, R. Gilman, A. G. Myasnikov, and V. Shpilrain, eds.). Papers from the AMS Special Sessions on Combinatorial Group Theory and on Computational Group Theory, held in New York, November 4–5, 2000, and in Hoboken, N.J., April 28–29, 2001. Contemporary Mathematics, 296. American Mathematical Society, Providence, R.I., 2002, 231–255. MR 1922276 Zbl 1025.20012 q.v. 857 [66] J. H. C. Whitehead, On equivalent sets of elements in a free group. Ann. of Math. (2) 37 (1936), no. 4, 782–800. MR 1503309 Zbl 0015.24804 q.v. 856

Chapter 24

Groups defined by automata

Laurent Bartholdi and Pedro V. Silva

Contents

1. Introduction   871
2. The geometry of the Cayley graph   871
3. Groups generated by automata   885
References   903

1. Introduction

Finite automata have been used effectively in recent years to define infinite groups. The two main lines of research have, as their most representative objects, the class of automatic groups (including "word-hyperbolic groups" as a particular case) and automata groups (singled out among the more general "self-similar groups").

The first approach is studied in § 2, and implements in the language of automata some tight constraints on the geometry of the group's Cayley graph. Automata are used to define a normal form for group elements and execute the fundamental group operations. The second approach is developed in § 3, and focuses on groups acting in a finitely constrained manner on a regular rooted tree. The automata define sequential permutations of the tree, and represent the group elements themselves.

The authors are grateful to Martin R. Bridson, François Dahmani, Rostislav I. Grigorchuk, Luc Guyot, and Mark V. Sapir for their remarks on a preliminary version of this text.

2. The geometry of the Cayley graph

Since its inception at the beginning of the 19th century, group theory has been recognised as a powerful language to capture symmetries of mathematical objects: crystals in the early 19th century, for Hessel and Frankenheim ([57], p. 120); roots of a polynomial, for Galois and Abel; solutions of a differential equation, for Lie, Painlevé, etc. It was only later, mainly through the work of Klein and Poincaré, that the tight connections between group theory and geometry were brought to light.

Topology and group theory are related as follows. Consider a space X on which a group G acts freely: for every g ≠ 1 in G and x ∈ X, we have x · g ≠ x. If the quotient


space Z = X/G is compact, then G "looks very much like" X, in the following sense: choose any x ∈ X, and consider the orbit x · G. This identifies G with a roughly evenly distributed subset of X. Conversely, consider a "nice" compact space Z with fundamental group G: then X = Z̃, the universal cover of Z, admits a free G-action. In conclusion, properties of the fundamental group of a compact space Z reflect geometric properties of the space's universal cover, and conversely.

We recall that finitely generated groups were defined in § 2 of Chapter 23: they are the groups G admitting a surjective map π: F_A ↠ G, where F_A is the free group on a finite set A.

Definition 2.1. A group G is finitely presented if it is finitely generated, say by π: F_A ↠ G, and if there exists a finite subset R ⊆ F_A such that the kernel ker(π) is generated by the F_A-conjugates of R, that is, ker(π) = ⟨⟨R⟩⟩; one then has G = F_A/⟨⟨R⟩⟩. These r ∈ R are called relators of the presentation, and one writes

    G = ⟨A | R⟩.

Sometimes it is convenient to write relations in the form "a = b" in the presentation; such a relation stands for the corresponding relator "ab⁻¹".

Let G be a finitely generated group, with generating set A. Its Cayley graph C(G, A), introduced by Cayley [48], is the graph with vertex set G and edge set G × A; the edge (g, s) starts at vertex g and ends at vertex gs. In particular, the group G acts freely on C(G, A) by left translation; the quotient G\C(G, A) is a graph with one vertex and Card A loops.

Assume moreover that G is finitely presented, with relator set R. For each r = r₁⋯r_n ∈ R and each g ∈ G, the word r traces a closed path in C(G, A), starting at g and passing successively through gr₁, gr₁r₂, …, gr₁r₂⋯r_n = g. If one glues a closed disc to C(G, A) for each such r, g by identifying the disc's boundary with that path, one obtains a 2-dimensional cell complex in which each loop is contractible; this is a direct translation of the fact that the normal closure of R is the kernel of the presentation homomorphism F_A → G. It is called the Cayley 2-complex of G.

For example, consider G = Z², with generating set A = {(0, 1), (1, 0)}. Its Cayley graph is the standard square grid, and its Cayley 2-complex is the plane R². The Cayley graph of a free group F_A, generated by A, is a tree.


More generally, consider a right G-set X, for instance the coset space H\G for a subgroup H ≤ G. The Schreier graph C(G, X, A) of X is then the graph with vertex set X and edge set X × A; the edge (x, s) starts in x and ends in xs.

2.1. History of geometric group theory. In a remarkable series of papers [52], [53], and [54] (also see [55]), Dehn initiated the geometric study of infinite groups, by trying to relate algorithmic questions on a group G with geometric questions on its Cayley graph. These problems were described in Definition 2.1 of Chapter 23, to which we refer. For instance, the word problem asks to determine whether a path in the Cayley graph of G is closed, knowing only the path's labels. It is striking that Dehn used, for the Cayley graph, the German Gruppenbild, literally "group picture." We must solve the word problem in a group G to be able to draw bounded portions of its Cayley graph; and some algebraic properties of G are tightly bound to the algorithmic complexity of the word problem; see § 4.4 of Chapter 23. For example, Muller and Schupp proved (see Theorem 4.9 of Chapter 23) that a push-down automaton recognises precisely the trivial elements of G if and only if G admits a free subgroup of finite index.

We consider now a more complicated example. Let S_g be an oriented surface of genus g ≥ 2, and let J_g denote its fundamental group. Recall that, in a group, [x, y] denotes the commutator x⁻¹y⁻¹xy. We have a presentation

    J_g = ⟨a₁, b₁, …, a_g, b_g | [a₁, b₁] ⋯ [a_g, b_g]⟩.   (1)

Let r = [a₁, b₁] ⋯ [a_g, b_g] denote the relator, and let R denote the set of cyclic permutations of r^{±1}. The word problem in J_g is solvable in polynomial time by the following algorithm: let u be a given word. Freely reduce u by removing all aa⁻¹ subwords. Then, if u contains a subword v₁ such that v₁v₂ ∈ R and v₁ is longer than v₂, replace v₁ by v₂⁻¹ in u and repeat. Then u represents 1 ∈ G if and only if it eventually reduces to the empty word. The validity of this algorithm relies on a lemma by Dehn: every nontrivial word representing the identity contains more than half of the relator as a subword. Incidentally, the Cayley graph of J_g is a tiling of the hyperbolic plane by 4g-gons, with 4g meeting at each vertex, and its Cayley 2-complex is homeomorphic to the hyperbolic plane.

Tartakovskiĭ [126], Greendlinger [72] and [71], and Lyndon [104] and [103] then devised "small cancellation" conditions on a group presentation that guarantee that Dehn's algorithm will succeed. Briefly said, they require the relators to have small enough overlaps. These conditions are purely combinatorial, and are described in § 2.3.
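A minimal Python sketch of Dehn's algorithm as described above (free reduction followed by greedy replacement of relator pieces longer than half of the relator); the encoding of letters as strings with a trailing "-" for inverses, and the construction of R from a single relator, are ours and purely illustrative. It is only claimed to be correct for presentations satisfying Dehn's lemma, such as the surface groups J_g.

```python
def inv(letter):
    return letter[:-1] if letter.endswith("-") else letter + "-"

def free_reduce(word):
    """Remove all factors x x^-1, by a stack-based scan."""
    out = []
    for x in word:
        if out and out[-1] == inv(x):
            out.pop()
        else:
            out.append(x)
    return out

def dehn_trivial(word, relator):
    """Dehn's algorithm: `word` represents 1 iff it reduces to the empty word."""
    # R: all cyclic permutations of the relator and of its inverse
    r_inv = [inv(x) for x in reversed(relator)]
    R = [r[i:] + r[:i] for r in (relator, r_inv) for i in range(len(r))]
    u = free_reduce(word)
    changed = True
    while changed:
        changed = False
        for r in R:
            for k in range(len(r), len(r) // 2, -1):   # pieces longer than half of r
                v1, v2 = r[:k], r[k:]
                for i in range(len(u) - k + 1):
                    if u[i:i + k] == v1:
                        # replace v1 by the inverse of v2 and freely reduce again
                        u = free_reduce(u[:i] + [inv(x) for x in reversed(v2)] + u[i + k:])
                        changed = True
                        break
                if changed:
                    break
            if changed:
                break
    return not u

# genus-2 surface group relator [a1,b1][a2,b2], with [x,y] = x^-1 y^-1 x y
r = ["a1-", "b1-", "a1", "b1", "a2-", "b2-", "a2", "b2"]
print(dehn_trivial(r, r))        # True
print(dehn_trivial(["a1"], r))   # False
```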


Cannon and Thurston, on the other hand, sought a formalism that would encode the "periodicity of pictures" of a group's Cayley graph. Treating the graph as a metric space with geodesic distance d, already seen in Chapter 23 (§ 3.4), they made the following definition: the cone type of g ∈ G is

    C_g = {h ∈ G | d(1, gh) = d(1, g) + d(g, gh)};   (2)

the translate gC_g is the set of vertices that may be connected to 1 by a geodesic passing through g. Their intuition is that the cone type of a vertex v remembers, for points near v, whether they are closer or further from the origin than v; for example, Z² with its standard generators has 9 cone types: the cone type of the origin (the whole plane), those of vertices on the axes (half-planes), and those of other vertices (quadrants). Thurston's motivation was to get a good, algorithmic understanding of fundamental groups of 3-dimensional manifolds. These should be made of nilpotent (or, more generally, solvable) groups on the one hand, and "automatic" groups on the other hand.

Definition 2.2. Let G = ⟨A⟩ be a finitely generated group, and recall that Ã denotes the disjoint union A ⊔ A⁻¹ of two copies of A. The word metric on G is the geodesic distance in G's Cayley graph C(G, A). It may be defined directly as

    d(g, h) = min{n | g = hs₁⋯s_n with all s_i ∈ Ã},

and is left-invariant: d(kg, kh) = d(g, h) for all k ∈ G. The ball of radius n is the set

    B_{G,A}(n) = {g ∈ G | d(1, g) ≤ n}.

The growth function of G is the function

    γ_{G,A}(n) = Card B_{G,A}(n).

The growth series of G is the power series

    Γ_{G,A}(z) = Σ_{g∈G} z^{d(1,g)} = Σ_{n≥0} (γ_{G,A}(n) − γ_{G,A}(n−1)) z^n = Σ_{n≥0} γ_{G,A}(n) z^n (1 − z).
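As an illustration of these definitions, a short Python sketch computing the growth function of Z² (with its standard generators) by breadth-first search on the Cayley graph; the group and generators are hard-coded for concreteness, and the interface is our own choice.

```python
from collections import deque

def growth_function(generators, identity, multiply, radius):
    """Return [gamma(0), ..., gamma(radius)] by BFS on the Cayley graph:
    gamma(n) = number of group elements at word-metric distance <= n from 1."""
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        if dist[g] == radius:
            continue
        for s in generators:
            h = multiply(g, s)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    gamma = [0] * (radius + 1)
    for d in dist.values():
        for n in range(d, radius + 1):
            gamma[n] += 1
    return gamma

# Z^2 with generators (+-1, 0), (0, +-1): gamma(n) = 2n^2 + 2n + 1
gens = [(1, 0), (-1, 0), (0, 1), (0, -1)]
add = lambda g, s: (g[0] + s[0], g[1] + s[1])
print(growth_function(gens, (0, 0), add, 5))   # [1, 5, 13, 25, 41, 61]
```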

Growth functions are usually compared as follows: γ ≼ δ if there is a constant C ∈ N such that γ(n) ≤ δ(Cn) for all n ∈ N; and γ ∼ δ if γ ≼ δ ≼ γ. The equivalence class of γ_{G,A} is independent of A.

Cannon observed (in an unpublished 1981 manuscript; also see [44]) that, if a group has finitely many cone types, then its growth series satisfies a finite linear system and is therefore a rational function of z. For J_g, for instance, he computes

    Γ_{J_g,A} = (1 + 2z + ⋯ + 2z^{2g−1} + z^{2g}) / (1 + (2 − 4g)z + ⋯ + (2 − 4g)z^{2g−1} + z^{2g}).

This notion was formalised by Thurston in 1984 using automata, and is largely the topic of the next section. We will return to growth of groups in § 3.5; however, see [30] for a good example of growth series of groups computed thanks to a description of the Cayley graph by automata. Gromov emphasised the relevance to group theory of the following definition attributed to Margulis.

24. Groups defined by automata

875

Definition 2.3 ([86]). A map f W X ! Y between two metric spaces is a C -quasiisometry, for a constant C > 0, if one has C

1

d.x; x 0 /

C 6 d.f .x/; f .x 0 // 6 Cd.x; x 0 / C C

for all x; x 0 2 X;

and d.f .X /; y/ 6 C for all y 2 Y . A quasi-isometry is a C -quasi-isometry for some C > 0. Two spaces are quasi-isometric if there exists a quasi-isometry between them; this is an equivalence relation. A property of finitely generated groups is geometric if it only depends on the quasiisometry class of its Cayley graph. Thus, for instance, the inclusion Z ! R and the map R ! Z; x 7! bxc are quasiisometries. Being finite, having a finite-index subgroup isomorphic to Z, and being finitely presented are geometric properties. The asymptotics of the growth function is also a geometric invariant; thus, for instance, having growth function - n2 is a geometric property. 2.2. Automatic groups. Let G D hAi be a finitely generated group. We will consider the formal alphabet Ay D A t A 1 t ¹1º, where the identity 1 is treated as a “padding” symbol. Following the main reference [58] by Epstein et al.: Definition 2.4 ([58], [59], and [26]). The group G is automatic if there are finitestate automata L; M, the language and multiplication automata, with the following properties:

i. L is an automaton with alphabet Az; ii. M has alphabet Ay  Ay, and has an accepting subset Ts of states for each s 2 Ay; call Ms the automaton with accepting states Ts ;  iii. the language of L surjects onto G by the natural map W Az ! FA ! G ; words in L.L/ are called normal forms; iv. for any two normal forms u; v 2 L.L/, consider the word y ; w D .u1 ; v1 /.u2 ; v2 /    .un ; vn / 2 .Ay  A/

where n D max¹juj; jvjº and ui ; vj D 1 if i > juj; j > jvj. Then Ms accepts w if and only if .u/ D .vs/. In words, G is automatic if the automaton L singles out sufficiently many words which may be used to represent all group elements; and the automaton Ms recognises when two such singled-out words represent group elements differing by a generator. The pair .L; M/ is an automatic structure for G . We will give numerous examples of automatic groups in § 2.3. Here is a simple one that contains the main features: the group G D Z2 , with standard generators x; y .

Laurent Bartholdi and Pedro V. Silva

876

The language accepted by L is .x  [ .x

1 

/ /.y  [ .y

1 

/ /:

y x

1

x

y

y

y x

1

x

1

y

y

y

1 1

y

1

The multiplication automaton, in which states in Ts are labelled s , is y

.s; s/

.s; s/

.y 1/

.y; 1/

1

/

1 ; 1/

.1; y/

1

.y

;y

.s; s/

;x

.x

1

;y

.x

.1; y/

.y

/

.1; y

/

1 ; 1/

1

;x

.y

/

x

1 ; 1/

.x; 1/

1

.1; x/

.1; x

1/

.1; y

/ ;y

/

1

1 ; 1/

.y; 1/

;x

.1; y/

1

.x

1

;x

/

.y

.y

.y

1/

;y

.y; 1/

x

1/

.x

.1; y

.x

1

1

y

1

/

.s; s/

.s; s/

The definition we gave is purely automata-theoretic. It does, however, have a  more geometric counterpart. A word w 2 Az represents a path in the Cayley graph C.G; A/ starting at 1 and ending at .w/, in a natural way. If w D w1    wn , we write w.j / D w1    wj for the vertex of C.G; A/ reached after j steps; if j > n  then w.j / D w . For two paths u; v 2 Az , we say that they k -fellow-travel if d.u.j /; v.j // 6 k for all j 2 ¹1; : : : ; max¹juj; jvjºº. Proposition 2.1. A group G is automatic if and only if there exist a rational lan guage L  Az mapping onto G and a constant k such that for any u; v 2 L with d..u/; .v// 6 1 the paths u; v k -fellow-travel.

24. Groups defined by automata

877

Sketch of proof. First, assume that G has automatic structure .L; M/, and let c denote the number of states of M. If u; v 2 L.L/ satisfy .u/ D .vs/, let sj denote the state M is in after having read .u1 ; v1 /    .uj ; vj /. There is a path of length < c , in M, from sj to an accepting state (labelled s ); let its label be .p; q/. Then .u.j /p/ D .v.j /qs/, so u.j / and v.j / are at distance at most 2c 1 in C.G; A/. Conversely, assume that paths k -fellow-travel and that an automaton L with state set Q is given, with language surjecting onto G . Denote by B.k/ the set of group elements at distance 6 k from 1 in C.G; A/. Consider the automaton with state set Q  Q  B.k/. Its initial state is .; ; 1/, where  is the initial state of L; its alphabet is Ay  Ay, and its transitions are given by .p; q; g/  .s; t/ D .p  s; q  t; s 1 gt/ whenever these are defined. Its accepting set of states, for s 2 Ay, is Ts D Q  Q  ¹sº.

Corollary 2.2. If the finitely generated group G D hAi is automatic, and if B is another finite generating set for G , then there also exists an automatic structure for G using the alphabet B .

Sketch of proof. First, note that a trivial generator may be added or removed from A or B , using an appropriate finite transducer for the latter. There exists then M 2 N such that every a 2 Az can be written as a word wa 2 Bz  of length precisely M . Accept as normal forms all wa1    wan such that a1    an is a normal form in the original automatic structure L. The new normal forms constitute a homomorphic image of L and therefore define a rational language. If paths in L.L/ k -fellow-travel, then their images in the new structure will kM -fellow-travel. Note that the language of normal forms is only required to contain “enough” expressions; namely that the evaluation map L.L/ ! G is onto. We may assume that it is bijective, by the following lemma. The language L.L/ is then called a “rational crosssection” by Gilman [66]; and .L; M/ is called an automatic structure with uniqueness. Lemma 2.3. Let G be an automatic group. Then G admits an automatic structure with uniqueness. Sketch of proof. Consider .L0 ; M/ an automatic structure. Recall the “short-lex” ordering on words: u 6 v if juj < jvj, or if juj D jvj and u comes lexicographically before v . The language ¹.u; v/ 2 Ay   Ay  j u 6 vº is rational. The language  L D L.L0 / \ ¹u 2 Az j for all v 2 Ay  ; if .u; v/ 2 L.M1 / then u 6 vº

is then also rational, of the form L.L/. The automaton M need not be changed. Various notions related to automaticity have emerged, some stronger, some weaker.  One may require the words accepted by L to be representatives of minimal length; the automatic structure is then called geodesic. If the automatic structure also has uniqueness, then the growth series €G;A .z/ of G , which is the growth series of L, is a rational function. For all automatic groups, there is a constant K such that, for the language produced by Lemma 2.3, all words u 2 L.L/ satisfy juj 6 Kd.1; .u//.

878

Laurent Bartholdi and Pedro V. Silva

 The definition is asymmetric; more precisely, we have defined a right automatic group, in that the automaton M recognises multiplication on the right. One could similarly define left automatic groups; then a group is right automatic if and only if it is left automatic. Indeed, let .L; M/ be an automatic structure in which L recognises a rational cross section. Then the language L0 D ¹u 1 j u 2 L.L/º is again rational, and so is the language M 0 D ¹.v 1 ; u 1 / j .u; v/ 2 L.M/º. Indeed, since rational languages are closed under reversal and morphisms, it follows easily that L0 is rational. On the other hand, using the pumping lemma and the fact that group elements admit unique representatives in L.L/, the amount of padding at the end of word-pairs in L.M/ is bounded, and can be moved from the beginning to the end of the word-pairs in M 0 by a finite transducer. Therefore, L0 ; M 0 are the languages of a right automatic structure. However, one could require both properties simultaneously, namely, on top of an automatic structure, a third automaton N with subsets Us of its stateset for all s 2 Ay that accepts in a state in Us all pairs of normal forms .u; v/ with .u/ D .sv/. Such groups are called biautomatic. No example is known of a group that is automatic but not biautomatic.  One might also only keep the geometric notion of “combing”: a combing on a  group is a choice, for every g 2 G , of a word wg 2 Az evaluating to g, such that the words wg and wgs fellow-travel for all g 2 G , s 2 Az. In that sense, a group is automatic if and only if it admits a combing whose words form a rational language; see [33] for details. One may again require the combing lines to be geodesics, i.e., words of minimal length; see Hermiller’s work [91], [92], and [93]. One may also put weaker constraints on the combing; for example, require it to be an indexed language. Bridson and Gilman [34] proved that all geometries of threefolds, in particular the Nil (3) and Sol geometry, which are not automatic, fall in this framework.  Another relaxation is to allow the automaton M to read at will letters from the first or the second word; groups admitting such a structure are called asynchronously automatic. Among fundamental groups of threefolds, there is no difference between these definitions [34], but for more general groups there is.  Finally, Definition 2.4 can be adapted to define automatic semigroups. Properties from automatic groups that can be proved within the automata-theoretic framework can often be generalised to automatic semigroups, or at least monoids [43]. However, establishing an alternative geometric approach has proved to be a tough task, successful only in restricted cases [124] and [95].

2.3. Main examples of automatic groups. From the very definition, it is clear that finite groups are automatic: one chooses a word representing each group element, and these necessarily form a fellow-travelling rational language.

24. Groups defined by automata

879

It is also clear that Z is automatic: write t for the canonical generator of Z; the language t  [.t 1 / maps bijectively to Z; and the corresponding paths 1-fellow-travel. The automata are t

1

.s; s/

t

t

LW t

;

1

MW t

1

.1; t/ .t

1

1

; 1/

.t; 1/ .1; t

1

/

t

Simple constructions show that the direct and free products of automatic groups are again automatic. Finite extensions and finite-index subgroups of automatic groups are automatic. It is however still an open problem whether a direct factor of an automatic group is automatic ([62], Problem 6). Recall that we glued discs, one for each g 2 G and each r 2 R, to the Cayley graph of a finitely presented group G D hA j Ri, so as to obtain the Cayley 2-complex K. The small cancellation conditions express a combinatorial form of non-positive curvature of K: roughly, C.p/ means that every proper edge cycle in K has length > p , and T .q/ means that every proper edge cycle in the dual K_ has length > q ; see Chapter V of [104] for details. If G satisfies C.p/ and T .q/ with p 1 C q 1 6 21 then G is automatic. Consider the configuration space of n strings in R2  Œ0; 1, with string i starting at .i; 0; 0/ and ending at .i; 0; 1/; these configurations are viewed up to isotopy preserving the endpoints. They can be multiplied (by stacking them above each other) and inverted (by flipping them upside-down), yielding a group, the pure braid group; if the strings are allowed to end in an arbitrary permutation, one obtains the braid group. This group Bn is generated by elementary half-twists i of strings i; i C 1 around each other, and admits the presentation Bn D h1 ; : : : ; n

1

j i i C1 i D i C1 i i C1 ; and i j D j i whenever ji

j j > 2i:

More generally, consider a surface S of genus g, with n punctures and b boundary components. The mapping class group Mg;n;b is the group of homeomorphisms S ! S modulo isotopy, and Bn is the special case M0;n;1 of mapping classes of the n-punctured disc. All mapping class groups Mg;n;b are automatic groups [110]. As another generalisation of braid groups, consider Artin groups. Let m D .mij / be a symmetric n  n-matrix with entries in N [ ¹1º. The Artin group of type .mij / is the group with presentation A.m/ D hs1 ; : : : ; sn j si sj si    D sj si sj    (mij terms) whenever mij < 1i:

The corresponding Coxeter group has presentation C.m/ D hs1 ; : : : ; sn j si2 ; .si sj /mij whenever mij < 1i:

Laurent Bartholdi and Pedro V. Silva

880

An Artin group A.m/ has finite type if C.m/ is finite. Artin groups of finite type are biautomatic [49]. Coxeter groups are automatic [36]. Fundamental groups of threefolds, except those with a piece modelled on Nil or Sol geometry (See Chapter 12 of [58]), are automatic. 2.4. Properties of automatic groups. The property of being an automatic group has a variety of interesting consequences. First, automatic groups are finitely presented; more generally, combable groups are finitely presented: Proposition 2.4 ([4]). Let G be a combable group. Then G has type F1 . Namely, there exists a contractible cellular complex with free G -action and finitely many G -orbits of cells in each dimension. (Finite presentation is equivalent to “finitely many G -orbits of cells in dimension 6 2”). Sketch of proof. By assumption, G is finitely generated. Therefore, the Cayley graph contains one G -orbit of 0-cells (vertices), and Card A orbits of 1-cells (edges). Consider all pairs of paths u; v in the combing that have neighbouring extremities. They k -fellowtravel by hypothesis; so there are for all j paths w.j / of length 6 k connecting u.j / to v.j /. The closed paths u.j / v.j / v.j C 1/ u.j C 1/ u.j / have length 6 2k C 2, so they trace finitely many words in FA . Taking them as relators defines a finite presentation for G . The process may be continued with higher-dimensional cells. Proposition 2.5. Automatic groups satisfy a quadratic isoperimetric inequality; that is, for any finite presentation G D hA j Ri there is a constant k such that, if w 2 FA is a word evaluating to 1 in G , then wD

` Y

i D1

wi

ri

for some ri 2 R˙1 ; wi 2 FA and ` 6 kjwj2 :

Sketch of proof. Write n D jwj, and draw the combing lines between 1 and w.j /. There are n combing lines, which have length O.n/; so the gap between neighbouring combing lines can be filled by O.n/ relators. This gives O.n2 / relators in total. Note that being finitely presented is usually of little value as far as algorithmic questions are concerned: there are finitely presented groups whose word problem cannot be solved by a Turing machine [114] and [28]. By contrast: Proposition 2.6. The word problem in a group given by an automatic structure is solvable in quadratic time. A word may even be put into canonical form in quadratic time. Sketch of proof. We may assume, by Lemma 2.3, that every g 2 G admits a unique normal form. Now, given a word u D a1    an 2 Ay  , construct the following words: w0 2 L.L/ is the representative of 1. Treating Ma as a non-deterministic automaton in its second variable, find for i D 1; : : : ; n a word wi 2 Ay  such that the padding of .wi 1 ; wi / is accepted by Mai . Then .u/ D 1 2 G if and only if wn D w0 . Clearly the wi have linear length in i , so the total running time is quadratic in n.

24. Groups defined by automata

881

In general, finitely generated subgroups and quotients of automatic groups need not be automatic – they need not even be finitely presented. A subgroup H of a finitely generated group G D hAi is L-quasi-convex if there exists a constant ı such that every h 2 H is connected to 1 2 G by a geodesic in L that remains at distance 6 ı from H . Finite-index subgroups are L-quasi-convex; and passing to a L-quasi-convex subgroup preserves (bi)automaticity. On the other hand, a subgroup H of an automatic group G with language L.L/ is L-rational if the full preimage of H in L.L/ is rational. The following is easy but fundamental: Lemma 2.7 ([64]). A subgroup H of an automatic group is quasi-convex if and only if it is L-rational. It is still unknown whether automatic groups have solvable conjugacy problem; however, there are asynchronously automatic groups with unsolvable conjugacy problem, for instance appropriate amalgamated products of two free groups over finitely generated subgroups. These groups are asynchronously automatic (Theorem E in [26]), and have unsolvable conjugacy problem [107]. Theorem 2.8 (Gersten and Short). Biautomatic groups have solvable conjugacy problem.  Sketch of proof (see [63]). Consider two words x; y 2 Az . Using the biautomatic structure, the language

C.x; y/ D ¹.u; v/ 2 Ay   Ay  j u; v 2 L.L/ and .u/ D .xvy/º

is rational. Now x; y are conjugate if and only if C.x 1 ; y/ \ ¹.w; w/ j w 2 Lº is nonempty. The problem of deciding whether a rational language is empty is algorithmically solvable. In fact, the centraliser of any finite subset of a biautomatic group is itself biautomatic. It follows [111] that direct factors of biautomatic groups are biautomatic, while as we remarked above ([62], Problem 6) this is not known for automatic groups. 2.5. Word-hyperbolic groups. Gromov [87] introduced the fundamental concept of “negative curvature” to group theory. This goes further in the direction of viewing groups as metric spaces, through the geodesic distance on their Cayley graph. The definition is given for geodesic metric spaces, i.e., metric spaces in which any two points can be joined by a geodesic segment. Definition 2.5 ([65], [5], and [50]). Let X be a geodesic metric space, and let ı > 0 be given. The space X is ı -hyperbolic if, for any three points A; B; C 2 X and geodesic arcs a; b; c joining them, every P 2 a is at distance at most ı from b [ c . The space X is hyperbolic if it is ı -hyperbolic for some ı . The finitely generated group G D hAi is word-hyperbolic if it acts by isometries on a hyperbolic metric space X with discrete orbits, finite point stabilisers, and compact quotient X=G . Equivalently, G is word-hyperbolic if and only if C.G; A/ is hyperbolic.

882

Laurent Bartholdi and Pedro V. Silva

Gilman [67] gave a purely automata-theoretic definition of word-hyperbolic groups:  G is word-hyperbolic if and only if, for some regular combing M  Az , the language ¹u1v 1w j u; v; w 2 M; .uvw/ D 1º  Ay  is context-free. Using the geometric definition, we note immediately the following examples: first, the hyperbolic plane H2 is hyperbolic (with ı D log 3); so is Hn . Any discrete, cocompact group of isometries of Hn is word-hyperbolic. This applies in particular to the surface group Jg from (1), if g > 2. Note however that some word-hyperbolic groups are not small cancellation groups, for instance because for small cancellation groups the complex in Proposition 2.4 has trivial homology in dimension > 3; yet the complex associated with a cocompact group acting on Hn has infinite cyclic homology in degree n (see [61] for applications of topology to group theory). It is also possible to define ı -hyperbolicity for spaces X that are not geodesic (such as, e.g., a discrete group). The following is equivalent to Definition 2.5: Definition 2.6. Let X be a metric space, and let ı 0 > 0 be given. The space X is ı 0 -hyperbolic if, for any four points A; B; C; D 2 X , the numbers ¹d.A; B/ C d.C; D/; d.A; C / C d.B; D/; d.A; D/ C d.B; C /º

are such that the largest two differ by at most ı 0 .

Word-hyperbolic groups arise naturally in geometry, in the following way: let M be a compact Riemannian manifold with negative (not necessarily constant) sectional curvature. Then 1 .M/ is a word-hyperbolic group. Word-hyperbolic groups are also “generic” among finitely-presented groups, in the following sense: fix a number k of generators, and a constant  2 Œ0; 1. For large N , there are  .2k 1/N words of length 6 N in Fk ; choose a subset R of size  .2k 1/N of them uniformly at random, and consider the group G with presentation hA j Ri. Then, with probability ! 1 as N ! 1, the group G is word-hyperbolic. Furthermore, if  < 12 , then with probability ! 1 the group G is infinite, while if  > 21 , then with probability ! 1 the group G is trivial [115]. Word-hyperbolic groups provide us with a large number of examples of automatic groups. They actually enjoy stronger geometric and algorithmic properties: Theorem 2.9 (Gersten and Short; Gromov). Let G be a word-hyperbolic group. Then G is biautomatic. Moreover, the normal form L may be chosen to consist of geodesics. Sketch of proof. In a ı -word-hyperbolic group G , geodesics .2ı C 1/-fellow-travel. On the other hand, G has a finite number of cone types (2), so the language of geodesics is rational, recognised by an automaton with as many states as there are cone types. In fact, the automatic structure is, in some precise sense, unique [32]. Calegari and Fujiwara [42] show that, given two finite generating sets A; B for a word-hyperbolic group G , there exist an algebraic number  and a rational transduction f W Ay  ! By  that converts a geodesic p normal form for A into a geodesic normal form for B , such that jf .w/j jwj D O. jwj/.

24. Groups defined by automata

883

Hyperbolic spaces X have a natural hyperbolic boundary @X : fix a point x0 2 X , and consider quasi-geodesics at x0 , namely quasi-isometric embeddings W N ! X starting at x0 . Declare two such quasi-geodesics ; ı to be equivalent if ¹d. .n/; ı.n// j n 2 Nº is bounded. The set of equivalence classes, with its natural topology, is the boundary @X of X . The fundamental tool in studying hyperbolic spaces is the following: Lemma 2.10 (Morse). Let X be a hyperbolic space and let C be a constant. There is a constant D such that all C -quasi-geodesics between two points x; y 2 X are at distance at most D from one another. The hyperbolic boundary @X is compact, under appropriate conditions satisfied, e.g., by X D C.G; A/, and X [ @X is a compactification of X . Now, in that case, the automaton L provides a symbolic coding of @X as a finitely presented shift space (where the shift action is the “geodesic flow,” following one step along infinite paths in Ay ! representing quasi-geodesics). We note that, for word-hyperbolic groups, the word and conjugacy problem admit extremely efficient solutions: they are both solvable in linear time by a Turing machine. The word problem is actually solvable in real time, namely with a bounded amount of calculation allowed between inputs [96]. The isomorphism problem is decidable for word-hyperbolic groups, say given by a finite presentation [51]. Word-hyperbolic groups also satisfy a linear isoperimetric inequality, in the sense that every w 2 FA that evaluates to 1 in G is a product of O.jwj/ conjugates of relators. Even better, we have the following: Proposition 2.11. A finitely presented group is word-hyperbolic if and only if it satisfies a linear isoperimetric inequality, if and only if it satisfies a subquadratic isoperimetric inequality. The generalised word problem is sometimes unsolvable [117], but the order problem is solvable in word-hyperbolic groups [29]. It follows that the generalised word problem is unsolvable for automatic groups as well. There are important weakenings of the definition of word-hyperbolic groups; we mention two. A bicombing is a choice, for every pair of vertices g; h 2 C.G; A/, of a path `g;h from g to h. Since G acts by left-translation on C.G; A/, it also acts on bicombings. A bicombing satisfies the k -fellow-traveller property if for any neighbours x 0 of x and y 0 of y , the paths `x;y and `x 0 ;y 0 k -fellow-travel. A semi-hyperbolic group is a group admitting an invariant bicombing by fellowtravelling words. See [35], or the older paper [6]. In particular, biautomatic, and therefore word-hyperbolic, groups are semi-hyperbolic. Semi-hyperbolic groups are finitely presented and have solvable word and conjugacy problems. In fact, they even have the “monotone conjugation property,” namely, if g and h are conjugate, then there exists a word w with g.w/ D h and jg .w.i // j 6 max¹jgj; jhjº for all i 2 ¹0; : : : ; jwjº. A group G is relatively hyperbolic [60] if it acts properly discontinuously on a hyperbolic space X , preserving a family H of separated horoballs, such that .X nH/=G

884

Laurent Bartholdi and Pedro V. Silva

is compact. All fundamental groups of finite-volume negatively curved manifolds are relatively hyperbolic. A non-closed manifold has “cusps,” going off to infinity, whose interpretation in the fundamental group are conjugacy classes of loops that may be homotoped arbitrarily far into the cusp. Farb [60] captures combinatorially the notion of relative hyperbolicity as follows: let H be a family of subgroups of a finitely generated group G D hAi. Modify the Cayley graph of G as follows: for each coset gH of a subgroup H 2 H, add a vertex gH , and connect it by an edge to every gh 2 C.G; A/, for all h 2 H . In addition, require that every edge in C.G; A/ belong to only finitely many simple loops of any given length. The group G is weakly relatively hyperbolic, relative to the family H, if that modified Cayley graph C.G; A/ is a hyperbolic metric space. By virtue of its geometric characterisation, being word-hyperbolic is a geometric property in the sense of Definition 2.3 (though beware that being a hyperbolic metric space is preserved under quasi-isometry only among geodesic metric spaces). Being combable and being bicombable are also geometric. We finally remark that a notion of word-hyperbolicity has been defined for semigroups [94] and [56]; the definition uses context-free languages. As for automatic (semi)groups, the theory does not seem uniform enough to warrant a simultaneous treatment of groups and semigroups; again, there is no clear geometric counterpart to the definition of word-hyperbolic semigroups – except in particular cases, such as monoids defined through special confluent rewriting systems [47].

2 2

2.6. Non-automatic groups. All known examples of non-automatic groups arise as groups violating some interesting consequence of automaticity. First, infinitely presented groups cannot be automatic. There are uncountably many finitely generated groups, and only countably many finitely presented groups; therefore automatic groups are as rare as the rationals among the real numbers. Groups with unsolvable word problem cannot be automatic. If a nilpotent group is automatic, then it contains an abelian subgroup of finite index [68]; therefore, for instance, the discrete Heisenberg group 0 1 1 Z Z G D @0 1 ZA D hx; y j Œx; Œx; y; Œy; Œx; yi (3) 0 0 1

is not automatic. Note also that G satisfies a cubic, but no quadratic, isoperimetric inequality. Many solvable groups have larger-than-quadratic isoperimetric functions; they therefore cannot be automatic [88]. This applies in particular to the Baumslag–Solitar groups BS1;n D ha; t j an D at i: (4)

Similarly, SLn .Z/, for n > 3, or SLn .O/ for n > 2, where O are the integers in an imaginary number field, are not automatic.

24. Groups defined by automata

885

Infinite, finitely generated torsion groups cannot be automatic: they cannot admit a rational normal form, because of the pumping lemma. We will see examples, due to Grigorchuk and Gupta–Sidki, in § 3.1. There are combable groups that are not automatic [31], for instance G D hai ; bi ; ti ; s j t1 a1 D t2 a2 ; Œai ; s D Œai ; ti  D Œbi ; s D Œbi ; ti  D 1 .i D 1; 2/i;

which satisfies only a cubic isoperimetric inequality. Finitely presented subgroups of automatic groups need not be automatic [25]. The following group is asynchronously automatic, but is not automatic: it does not satisfy a quadratic isoperimetric inequality, see §11 of [26]: G D ha; b; t; u j at D ab; b t D a; au D ab; b u D ai:

3. Groups generated by automata We now turn to another important class of groups related to finite-state automata. These groups act by permutations on a set A of words, and these permutations are represented by Mealy automata. These are deterministic, initial finite-state transducers M with input and output alphabet A, that are complete with respect to input; in other words, at every state and for each a 2 A, (5) there is a unique outgoing edge with input a. The automaton M defines a transformation of A , which extends to a transformation of A! , as follows. Given w D a1 a2    2 A [ A! , there is by (5) a unique path in M starting at the initial state and with input labels w . The image of w under the transformation is the output label along that same path. Definition 3.1. A map f W A ! A is automatic if f is produced by a finite-state automaton as above. One may forget the initial state of M, and consider the set of all transformations corresponding to all choices of initial state of M; the semigroup of the automaton S.M/ is the semigroup generated by all these transformations. It is closely connected to Krohn-Rhodes theory [101]. Its relevance to group theory was seen during Glushkov’s seminar on automata [70]. The automaton M is invertible if furthermore it is complete with respect to output; namely, at every state and for each a 2 A, (6) there is a unique outgoing edge with output a; the corresponding transformation of A [ A! is then invertible. The set of such permutations, for all choices of initial state, generate the group of the automaton G.M/. Note that S.M/ may be a proper subsemigroup of G.M/, even if M is invertible. General references on groups generated by automata are [112], [79], and [17].

Laurent Bartholdi and Pedro V. Silva

886

As our first, fundamental example, consider the automaton with alphabet A D ¹0; 1º 0j0 TW 1j0

t

0j1

(7)

1

1j1

in which the input i and output o of an edge are represented as ‘i jo’. The transformation associated with state 1 is clearly the identity transformation, since any path starting from 1 is a loop with same input and output. Now consider the transformation t . One has, for instance, t  111001 D 000101, with the path consisting of three loops at t , the edge to 1, and two loops at 1. We have G.T/ D hti, and we will see in § 3.7 that it is infinite cyclic. Lemma 3.1. The product of two automatic transformations is automatic. The inverse of an invertible automatic transformation is automatic. The proof becomes transparent once we introduce some good notation. If in an automaton M there is a transition from state q to state r , with input i and output o, we write q  i D o  r: (8) In effect, if the state set of M is Q, we are encoding M by a function W Q  A ! A  Q given by .q; i / D .o;r/. It then follows from (5) that, given q 2 Q and v D a1    an 2 A, there are unique w D b1    bn 2 A ; r 2 Q such that q  a1    an D b1    bn  r . The image of v under the transformation q is w . We have in fact naturally extended the function  to a function W Q  A ! A  Q.

Proof of Lemma 3.1. Given M; N initial automata with state sets Q; R respectively, consider the automaton MN with state set Q  R and transitions defined by .q; r/  i D q  .r  i / D q  .o0  r 0 / D .q  o0 /  r 0 D .o  q 0 /  r 0 D o  .q 0 ; r 0 /:

If q0 ; r0 are the initial states of M; N, then the transformation q0 ır0 is the transformation corresponding to state .q0 ; r0 / in MN. Similarly, if q0 induces an invertible transformation, consider the automaton M 1 with state set ¹q 1 j q 2 Qº and transitions defined by q 1  o D i  r 1 whenever (8) holds. The transformation induced by q0 1 is the inverse of q0 . This construction applies naturally to any composition of finitely many automatic transformations. In case they all arise from the same machine M, we de facto extend the function  describing M to a function W Q A ! A Q , and (if M is invertible) to a function W FQ A ! A FQ . It projects to a function W S.M/A ! A S.M/ and, if M is invertible, to a function W G.M/  A ! A  G.M/. Note that a function W G.M/  A ! A  G.M/ as above naturally gives a function, still written W G.M/ ! G.M/A Ì Sym.A/; this is the semidirect product of functions

24. Groups defined by automata

887

A ! G.M/ by the symmetric group of A (acting by permutation of coordinates), and is commonly called the wreath product G.M/ o Sym.A/; also see Chapter 16. This wreath product decomposition also inspires a convenient description of the function  by a matrix embedding; the size and shape of the matrix is determined by the permutation of A, and the nonzero entries by the elements in G.M/A ; more precisely, assume A D ¹1; : : : ; d º, and, for .q/ D ..s1 ; : : : ; sd /; / 2 G.M/A Ì Sym.A/, write  0 .q/ D the permutation matrix with si at position .i; i /. Then these matrices multiply as wreath product elements. More algebraically, we have defined a homomorphism  0 W kG ! Md .kG/, where kG is the group ring of G D G.M/ over the field k. Such an embedding defines an algebra acting on the linear span of A ; this algebra has important properties, studied in [120] for Gupta–Sidki’s example and in [12] for Grigorchuk’s example. The action of g 2 G.M/ may be described as follows: given a sequence u D a1 : : : an , compute .g; u/ D .w; h/. Then g.u/ D w ; and for all v 2 A we have g.uv/ D w h.v/; that is, the action of g on sequences starting by u is defined by an element h 2 G.M/ acting on the tail of the sequence. More geometrically, we can picture A as an infinite tree. The action of g carries the subtree uA to wA , and, within uA naturally identified with A , acts by the element h. For that reason, G.M/ is called a self-similar group. The formalism expressing a Mealy machine as a map W Q  A ! A  Q is completely symmetric with respect to A and Q; the dual of the automaton M is the automaton M_ with state set A, alphabet Q, and transitions given by i  q D r  o whenever (8) holds. For example, the dual of (7) is tj1 T _ W 1j1

0

1

1j1

(9)

tjt

In case the dual M_ of the automaton M is itself an invertible automaton, M is called reversible. If M, M_ and .M 1 /_ are all invertible, then M is bireversible; it then has eight associated automata, obtained through all combinations of . / 1 and . /_ . Note that M_ naturally encodes the action of S.M/ on A: it is a graph with vertex set A, and an edge, with (input) label q , from a to q.a/. More generally, .Mn /_ encodes the action of S.M/ on the set An of words of length n. More generally, we will consider subgroups of G.M/, namely subgroups generated by a subset of the states of an automaton; we call these groups automata groups. This is a large class of groups, which contains in particular finitely generated linear groups, see Theorem 3.2 below or [37]. The elements of automata groups are, strictly speaking, automatic permutations of A . It is often convenient to identify them with a corresponding automaton, for instance constructed as a power of the original Mealy automaton (keeping in mind the construction for the composition of automatic transformations), with appropriate initial state.

888

Laurent Bartholdi and Pedro V. Silva

Theorem 3.2 (Brunner and Sidki). The affine group Zn Ì GLn .Z/ is an automata group for all n 2 N. This will be proven in more generality in § 3.7. We mention some closure properties of automata groups. Clearly a direct product of automata groups is an automata group (take the direct product of the alphabets). A more subtle operation, called tree-wreathing in [38] and [123], gives wreath products with Z. A more general class of groups has also been considered, and is relevant to § 3.6: functionally recursive groups. Let A denote a finite alphabet, Q a finite set, and F D FQ the free group on Q. The “automaton” now is given by a set of rules of the form qa Dbr

for all q 2 Q; a 2 A, where we now require b 2 A and r 2 F . In effect, in the dual M_ we are allowing arbitrary words over Q as output symbols. 3.1. Main examples. Automata groups gained significance when elementary examples of finitely generated, infinite torsion groups, and groups of intermediate wordgrowth, were discovered. Alëshin [3] studied the automaton (11), and showed that hA; Bi is an infinite torsion group. Another of his examples is described in § 3.8. Grigorchuk in [73], [77], [75], [76], and [74] simplified Alëshin’s example as follows: let A be obtained from the Alëshin automaton by removing the gray states; the state set of A is ¹1; a; b; c; d º. He showed that G.A/, which is known as the Grigorchuk group, is an infinite torsion group; see Theorem 3.9. In fact, G.A/ and hA; Bi have isomorphic finite-index subgroups. For all prime p , Gupta and Sidki in [90] and [89] constructed an infinite p -torsion group; their construction, for p D 3, is the automata group G.G/ generated by the automaton (12). All invertible automata with at most three states and two alphabet letters have been listed in [27]; here are some important examples. The affine group BS1;3 D ¹z 7 ! 3p z C q=3r j p; q; r 2 Zº;

introduced in (4), is a linear group, and an automata group by Theorem 3.16; also see [23]. It is generated by the automaton (13). As another important example, consider the lamplighter group n

G D .Z=2/.Z/ Ì Z D ha; t j a2 ; Œa; at  for all n 2 Zi:

It is an automata group [82], embedded as the set of maps ¹z 7! .t C 1/p z C q j p 2 Z; q 2 F2 Œt C 1; .t C 1/

1

º

(10)

24. Groups defined by automata

889

in the affine group of F2 ŒŒt. It is generated by the automaton L in (14). 1j1 1j1

1j1

c

b 0j0

d A

0j0

0j0

0j1; 1j0

a

(11)

i ji 1j1

1

0j0 0j1 1j0

0j0 B

A

1j1

a 1j1

0j0 i ji C 1 GW 2j2

t

i ji

1j1 a

1

2j2

(12)

1j1

(13)

1 0j0

1

3z C 1

3z

t

1j0

1j1 0j0

i ji

1

0j1

3z C 2 0j0

1j1 LW 0j0

.t C 1/z

.t C 1/z C 1 0j1

1j0

(14)

Laurent Bartholdi and Pedro V. Silva

890

The Basilica group (see [83] and [24]) will appear again in § 3.6. It is generated by the automaton (15). a 0j1 BW 1j1

1j0

1

0j0; 1j1

(15)

0j0 b

There are (unpublished) lists by Sushchansky et al. of all (not necessarily invertible) automata with 6 3 states, on a binary alphabet; there are more than 2000 such automata; the invertible ones are listed in [27]. How about groups that are not automata groups? Groups with unsolvable word problem (or more generally whose word problem cannot be solved in linear space; see § 3.2), and groups that are not residually finite – or more generally that are not residually (finite with composition factors of bounded order) – are among the simplest examples. In fact, it is difficult to come up with any other ones. 3.2. Decision problems. One virtue of automata groups is that elements may easily be compared, since (Mealy) automata admit a unique minimised form, which furthermore may efficiently be computed in time O.Card A Card Q log Card Q/; see [97] and [100]. Proposition 3.3. Let G be an automata group. Then the word problem is solvable in G , in linear space, and therefore in at worst exponential time. Proof. We content ourselves to prove that the word problem is solvable in exponential time. Let Q be a generating set for G , and for each q 2 Q compute the minimal automaton Mq representing q . Let C be an upper bound for the number of states of any Mq . Now given a word w D q1    qn 2 .Q t Q 1 / , multiply the automata Mq1 ; : : : ; Mqn . The result is an automaton with 6 C n states. Then w is trivial if and only if all states to which the initial state leads have identical input and output symbols. It is unknown if the conjugacy or generalised word problem are solvable in general; though this is known in particular cases, such as the Grigorchuk group G.A/; see [118], [102], and [80]. The conjugacy problem is solvable as soon as G.A/ is conjugacy separable, namely, for g; h non-conjugate in G.A/ there exists a finite quotient of G.A/ in which their images are non-conjugate. Indeed automata groups are recursively presented and residually finite. It is also unknown whether the order problem is solvable in arbitrary automata groups; but this is known for particular cases, such as bounded automata groups; see § 3.3.

24. Groups defined by automata

891

Nekrashevych’s limit space (see Theorem 3.15) may sometimes be used to prove that two contracting, self-similar groups are non-isomorphic: by [81], some groups admit essentially only one weakly branch self-similar action (see Definition 3.4); if the group is also contracting, then its limit space is an isomorphism invariant. On the other hand, in the more general class of functionally recursive groups, the wold problem may be unsolvable [19]. 3.3. Bounded and contracting automata. As we saw in § 3.2, it may be useful to note, and use, additional properties of automata groups. Definition 3.2. An automaton M is bounded if the function which to n 2 N associates the number of paths of length n in M that do not end at the identity state is a bounded function. A group is bounded if its elements are bounded automata; or, equivalently, if it is generated by bounded automata. More generally, Sidki considered automata for which the above function is bounded by a polynomial; see [121]. He showed in [122] that such groups cannot contain nonabelian free subgroups. Definition 3.3. An automaton M is nuclear if the set of recurrent states of MM spans an automaton isomorphic to M; and, for invertible M, if additionally M D M 1 . Recall that a state is recurrent if it is the endpoint of arbitrarily long paths. An invertible automaton M is contracting if G.M/ D G.N/ for a (necessarily unique) nuclear automaton N. The nucleus of G.M/ is then N. For example, the automata (11,12) are nuclear; the automata (7,15) are contracting, with nucleus ¹1; t; t 1 º and ¹1; a˙1 ; b ˙1 ; b 1 a; a 1 bº; the automaton (14) is not contracting. If M is contracting then for every g 2 G.M/ there is a constant K such that (in the automaton describing g) all paths of length > K end at a state in the nucleus of M. This also implies that there are constants L; m and  < 1 such that, for the word metric k  k on G.M/, whenever one has g  a1    am D b1    bm  h with h; g 2 G.M/, one has khk 6 kgk C L.

Proposition 3.4 (Theorem 3.9.12 in [112]). Finitely generated bounded groups are contracting.

Consider the following graph X.M/: its vertex set is A . It has two kinds of edges, vertical and horizontal. There is a vertical edge .u; ua/ for all u 2 A ; a 2 A, and a horizontal edge .u; q.u// for every u 2 A ; q 2 Q. Note that the horizontal and vertical edges form squares labelled as in (8), and that the horizontal edges form the Schreier graphs of the action of G.M/ on An . Proposition 3.5 (Theorem 3.8.6 in [112]). If G.M/ is contracting then X.M/ is a hyperbolic graph in the sense of Definition 2.5. Discrete groups may be broadly separated in two classes: amenable and nonamenable groups. A group G is amenable if it admits a normalised, invariant mean,

892

Laurent Bartholdi and Pedro V. Silva

that is, a map mW P.G/ ! Œ0; 1 with m.A t B/ D m.A/ C m.B/, m.G/ D 1 and m.gA/ D m.A/ for all g 2 G and A; B  G . All finite and abelian groups are amenable; so are groups of subexponential word-growth (see § 3.5). Extensions, quotients, subgroups, and directed unions of amenable groups are amenable. On the other hand, non-abelian free groups are non-amenable. In understanding the frontier between amenable and non-amenable groups, the Basilica group G.B/ stands out as an important example: Bartholdi and Virág proved that it is amenable [24], but its amenability cannot be decided by the criteria of the previous paragraph. We now briefly indicate the core of the argument. The matrix embedding  0 W kG ! Md .kG/ associated with a self-similar group (see page 887) extends to a map  0 W `1 .G/ ! Md .`1 .G// on measures on G . A measure  gives rise to a random walk on G , with one-step transition probability p1 .x; y/ D .xy 1 /. On the other hand,  0 ./ naturally defines a random walk on G  X ; treating the second variable as an “internal degree of freedom,” one may sample the random walk on G  X each time it hits G  ¹x0 º for a fixed x0 2 X . In favourable cases, the corresponding random walk on G is self-similar: it is a convex combination of 1 and . One may then deduce that its “asymptotic entropy” vanishes, and therefore that G is amenable. This strategy works in the following cases: Theorem 3.6 (Bartholdi, Kaimanovich, and Nekrashevych [18]). Bounded groups are amenable. Theorem 3.7 (Amir, Angel, and Virág [7]). Automata of linear growth generate amenable groups. Nekrashevych conjectures that contracting automata always generate amenable groups, and proves: Proposition 3.8 (Nekrashevych [113]). A contracting self-similar group cannot contain a non-abelian free subgroup. We turn to the original claim to fame of automata groups: Theorem 3.9 (Alëshin and Grigorchuk [3] and [73]; Gupta and Sidki [90]). The Grigorchuk group G.A/ and the Gupta–Sidki group G.G/ are infinite, finitely generated torsion groups. Sketch of proof. To see that these groups G are infinite, consider their action on A , the stabiliser H of 0 2 A  A , and the restriction  of the action of H to 0A . This defines a homomorphism W H ! Sym.0A / Š Sym.A /, which is in fact onto G . Therefore G possesses a proper subgroup mapping onto G , so is infinite. To see that these groups are torsion, proceed by induction on the word-length of an element g 2 G . The initial cases a2 D b 2 D c 2 D d 2 D 1, respectively a3 D t 3 D 1, are easily checked. Now consider again the action of g on A  A . If g fixes A, then its actions on the subsets iA are again defined by elements of G , which are shorter by the contraction property; so have finite order. It follows that g itself has finite order.

24. Groups defined by automata

893

If, on the other hand, g does not fix A, then gCard A fixes A; the action of gCard A on iA is defined by an element of G , of length at most the length of g; and (by an argument that we skip) smaller in the induction order than g; so gCard A is torsion and so is g. Contracting groups have recursive presentations (meaning the relators R of the presentation form a recursive subset of FQ ); in favourable cases, such as branch groups [10], the set of relators is the set of iterates, under an endomorphism of FQ , of a finite subset of FQ . For example [105], Grigorchuk’s group satisfies G.A/ D ha; b; c; d j  n .bcd /;  n .a2 /;  n .Œd; d a /;  n .Œd; d Œa;ca / for all n 2 Ni;

where  is the endomorphism of F¹a;b;c;d º W a 7 ! aca;

b 7 ! d 7 ! c 7 ! b:

(16)

A similar statement holds for the Basilica group (15): p

2p

G.B/ D ha; b j Œap ; .ap /b ; Œb p ; .b p /a  for all p D 2n iI

here the endomorphism is W a 7! b 7! a2 .

3.4. Branch groups. Some of the most-studied examples of automata groups are branch groups; see [78] or the survey [17]. We will define for convenience a strictly smaller class: Definition 3.4. An automata group G.M/ is regular weakly branch if it acts transitively on An for all n, and if there exists a nontrivial subgroup K of G.M/ such that, for all u 2 A and all k 2 K , the permutation ´ u k.v/ if w D uv; w7 ! w otherwise belongs to G.M/. The group G.M/ is regular branch if furthermore K has finite index in G.M/. If we view A as an infinite tree, a regular branch group G contains a rich supply of tree automorphisms in two manners: enough automorphisms to map any vertex to any other of the same depth; and, for any disjoint subtrees u1 A ; : : : ; un A of A and for (up to finite index) any elements g1 ; : : : ; gn of G , an automorphism of A acting as gi under each ui A . In particular, if G is a regular branch group, then G and G      G , with Card A factors, have isomorphic finite-index subgroups (they are commensurable; see (5)). Proposition 3.10. The Grigorchuk group G.A/ and the Gupta–Sidki group G.G/ are regular branch; the Basilica group G.B/ is regular weakly branch. Sketch of proof. For G D G.A/, first note that G acts transitively on A; since the stabiliser of 0 acts as G on 0A , by induction G acts transitively on An for all n 2 N.

894

Laurent Bartholdi and Pedro V. Silva

Now define x D Œa; b and K D hhxii. Consider the endomorphism  defined in (16), and note that .x/ D Œaca; d  D Œx 1 ; d  2 K using the relation .ad /4 D 1, so  restricts to an endomorphism K ! K , such that .k/ acts as k on 1A and fixes 0A . Similarly,  n .k/ acts as k on 1n A , so Definition 3.4 is fulfilled for u D 1n . Since G acts transitively on An , the definition is also fulfilled for other u 2 An . Finally, a direct computation shows that K has index 16 in G . The other groups G.G/ and G.B/ are handled similarly; for them, one takes K D ŒG; G. Various consequences may be derived from a group being a branch group; in particular, Theorem 3.11 (Abért [1]). A weakly branch group satisfies no identity; that is, if G is a weakly branch group, then for every nontrivial word w D w.x1 ; : : : ; xk / 2 Fk , there are a1 ; : : : ; ak 2 G such that w.a1 ; : : : ; ak / ¤ 1. 3.5. Growth of groups. An important geometric invariant of a finitely generated group is the asymptotic behaviour of its growth function G;A .n/. Finite groups, of course, have a bounded growth function. If G has a finite-index nilpotent subgroup, then

G;A .n/ is bounded by a polynomial, and one says G has polynomial growth; the converse is true [84]. On the other hand, if G contains a free subgroup, for example if G is wordhyperbolic and is not a finite extension of Z, then G;A is bounded from above and below by exponential functions, and one says that G has exponential growth. By a result of Milnor and Wolf in [133] and [108], if G has a solvable subgroup of finite index then G has either polynomial or exponential growth. The same conclusion holds, by Tits’ alternative [127], if G is linear, and by the remark after Lemma 2.3 if G is automatic. Milnor [109] asked whether there exist groups with growth strictly between polynomial and exponential. Theorem 3.12 (Grigorchuk [74]). The Grigorchuk group G.A/ has intermediate growth. In fact, its growth function satisfies the following estimates: ˛

ˇ

e n - G;S .n/ - e n ;

with ˛ D 0:515 and ˇ D log.2/= log.2=/  0:767, for   0:811 the real root of the polynomial X 3 C X 2 C X 2. Sketch of proof (see [8] and [9]). We content ourselves with proving the lower bound 0:5 e n . Recall from (16) that G admits an endomorphism  such that .g/ acts as g on 1A and as an element of the finite dihedral group D4 D ha; d i on 0A . Given g0 ; g1 2 G of length 6 N , the element g D a.g0 /a.g1 / has length 6 4N , and acts (up to an element of D4 ) as gi on iA for i D 0; 1. It follows that g essentially (i.e., up to 8 choices) determines g0 ; g1 , and therefore that G;S .4N / > . G;S .N /=8/2 . The lower bound follows easily.

24. Groups defined by automata

895

On the other hand, the Grigorchuk group G satisfies a stronger property than contraction; namely, for a well-chosen metric (which is equivalent to the word metric), one has that if g 2 G acts as gi 2 G on iA , then kg0 k C kg1 k 6 .kgk C 1/;

(17)

with  the constant above. Then, with every g 2 G one associates a description by a finite, labelled binary tree .g/. If kgk 6 1=.1 /, its description is a one-vertex tree with g at its unique leaf. Otherwise, let i 2 ¹0; 1º be such that gai fixes A, and write g0 ; g1 the elements of G defined by the actions of gai on 0A ; 1A respectively. Construct recursively the descriptions .g0 /; .g1 /. Then the description of g is a tree with i at its root, and two descendants .g0 /; .g1 /. By (17), the tree .g/ has at most kgkˇ leaves, and .g/ determines g. There are exponentially many trees with a given number of leaves, and the upper bound follows. Among groups of exponential growth, Gromov asked the following question [85]: is there a group G of exponential growth, namely such that lim G;S .n/1=n > 1 for all finite generating sets S but such that inf hSiDG lim G;S .n/1=n D 1? Such examples, called groups of non-uniform exponential growth, were first found by Wilson [131]; see [11] and [15] for a simplification. Both constructions are heavily based on groups generated by automata. It is known that essentially any function growing faster than n2 may be, asymptotically, the growth function p of a semigroup. Notably, very small automata generate semigroups of growth  e n , and of polynomial growth of irrational degree [21] and [22]. However, it is not known whether there exist groups whose growth function p n is strictly between polynomial and e . There is a gap in the spectrum of growth func1=100 tions: no group with growth strictly between polynomials and n.log n/ exists [119]. The largest class of known growth functions is the following: Theorem 3.13 (Bartholdi and Erschler [13] and [14]). Let  Š 0:811 be the positive root of X 3 C X 2 C X 2 as above. Let f W N ! N satisfy f .2n/ 6 f .n/2 6 f .b2n=c/

for all n  0:

Then there exists a group with growth function  f .

3.6. Dynamics and subdivision rules. In this subsection, we show how automata naturally arise from geometric or topological situations. As a first step, we will obtain a functionally recursive action; in favourable cases it will be encoded by an automaton. We must first adopt a slightly more abstract point of view on functionally recursive groups: Definition 3.5. A group G is self-similar if it is endowed with a self-similarity biset, that is, a set B with commuting left and right actions such that B is free qua right G -set.

Laurent Bartholdi and Pedro V. Silva

896

The fundamental example is G D G.M/ and B D A  G , with actions g  .a; h/ D .b; kh/

if .g; a/ D .b; k/;

.a; g/  h D .a; gh/:

Conversely, given a self-similar group G , choose a basis A of its biset, i.e., choose an isomorphism of right G -sets B D A  G ; then define .g; a/ D .b; k/ whenever we have g  .a; 1/ D .b; k/ in B. This vindicates the notation (8). Two bisets B; B0 are isomorphic if there is a map 'W B ! B0 with g'.b/h D '.gbh/ for all g; h 2 G; b 2 B. They are equivalent if there is a map 'W B ! B0 and an automorphism W G ! G with .g/'.b/.h/ D '.gbh/. Now consider a topological space X and a branched covering f W X ! X ; this means that there is an open dense subspace X0  X such that f W f 1 .X S0 / ! X0 is a covering. The subset C D X n f 1 .X0 / is the branch locus, and P D n>1 f n .C/ is the post-critical locus. Write  D X n P, and choose a basepoint  2 . Two coverings .f; Pf / and .g; Pg / are combinatorially equivalent if there exists a path gt through branched coverings, with g0 D f; g1 D g, such that the post-critical locus of gt varies continuously along the path. We define a self-similarity biset for G D 1 .; / as Bf D ¹homotopy classes of paths W Œ0; 1 !  j .0/ D f . .1// D º:

The right action of G prepends a loop at  to ; the left action appends the unique f -lift of the loop that starts at .1/ to . A choice of basis for B amounts to choosing, for each x 2 f 1 ./, a path ax   from  to x . Set A D ¹ax j x 2 f 1 ./º. Now, for g 2 G , and ax 2 A, consider a path starting at x such that f ı D g; such a path is unique up to homotopy, by the covering property of f . The path ends at some y 2 f 1 ./. Then set .g; ax / D .ay ; ay 1 ax /;

where we write concatenation of paths in reverse order, that is, ı is first ı , then . y , with branched covering f .z/ D z 2 1. For example, consider the sphere X D C Its post-critical locus is P D ¹0; 1; 1º. A direct calculation (see, e.g., [16]) gives that its biset is the automaton (15); the relevant paths are shown here: f f

1

.a/

1

.b/

ax0 x1 

1

0

a

x0 f

b f

1

.b/

1

.a/

24. Groups defined by automata

897

Branched self-coverings are encoded by self-similar groups in the following sense: Theorem 3.14 (Nekrashevych [112], Theorem 6.5.2, and Kameyama [99]). Let f; g be branched coverings. Then f; g are combinatorially equivalent if and only if the bisets Bf ; Bg are equivalent. This result has been used to answer a long-standing open problem in complex dynamics [20], the “twisted rabbit problem.” If, furthermore, G is finitely generated and the map f expands a length metric, then the associated biset may be defined by a contracting automaton. This is, in particular, y. the case for all rational maps acting on the sphere C Definition 3.6. Let f W X ! X be a branched self-covering. The iterated monodromy group of f is the automata group G.f / D G.M/, where M is an automaton describing the biset Bf . If G D G.M/ is a contracting self-similar group, consider the hyperbolic boundary J D @X.M/, called the limit space of G . It admits an expanding self-covering map sW J ! J, induced on vertices by the shift map s.au/ D u. Theorem 3.15 (Theorems 5.2.6 and 5.4.3 in [112]). The groups G.s/ and G.M/ are isomorphic. Conversely, suppose f is an expanding self-covering, with Julia set J D the accumulation points of iterated preimages of a generic point. Then .J; f / and .J; s/ are homeomorphic and topologically conjugate. For instance, the Julia set of the Basilica map f .z/ D z 2 1 is depicted above. Appropriately scaled and metrised, the Schreier graphs of the action of G.M/ on X n converge to J. The first appearance of encodings of branched coverings by automata seems to be the “finite subdivision rules” by Cannon, Floyd and Parry [45]; they wanted to know when a branched covering of the sphere may be realised as a conformal map. In their work, a finite subdivision rule is given by a finite subdivision of the sphere, a refinement of it, and a covering map from the refinement to the original subdivision; by iteration, one obtains finer and finer subdivisions of the sphere. The combinatorial information involved is essentially equivalent to a self-similarity biset. Contraction of G.M/ and combinatorial versions of expansion have been related in [46]. 3.7. Reversible actions. Recall that an automaton M is reversible if its dual M_ is invertible. In other words, if g 2 G.M/, the action of g is determined by the action on any subset uA , for u 2 A . We have already seen some examples of reversible automata, notably (13,14). This last example generalises as follows: consider a finite group G , and set A D Q D G . Define an automaton CG , the “Cayley automaton” of G , by .q; a/ D .qa; qa/. This automaton seems to have first been considered in [101], p. 358. The automaton L


The automaton $L$ in (14) is the special case $G = \mathbb{Z}/2\mathbb{Z}$. The inverse of the automaton $C_G$ is a reset machine, in that the target of a transition depends only on the input, not on the source state. Silva and Steinberg [125] proved that, if $G$ is abelian, then $G(C_G) = G \wr \mathbb{Z}$.

A large class of reversible automata is covered by the following construction. Let $R$ be a ring, let $M$ be an $R$-module, and let $N$ be a submodule of $M$, with $M/N$ finite. Let $\varphi\colon N \to M$ be an $R$-module homomorphism. Define a decreasing sequence of submodules $M_i$ of $M$ by $M_0 = M$ and $M_{n+1} = \varphi^{-1}(M_n)$, and let $\mathrm{End}_R(M, \varphi)$ denote the algebra of $R$-endomorphisms of $M$ that map $M_n$ into $M_n$ for all $n$. Finally, assume that there is an algebra homomorphism $\hat\varphi\colon \mathrm{End}_R(M, \varphi) \to \mathrm{End}_R(M, \varphi)$ such that $\varphi(an) = \hat\varphi(a)\varphi(n)$ for all $a \in \mathrm{End}_R(M, \varphi)$ and $n \in N$. Consider
$$T_M = \{\, z \mapsto az + m \mid a \in \mathrm{End}_R(M, \varphi),\ m \in M \,\},$$
the affine semigroup of $M$.

Theorem 3.16. Let $A$ be a set of coset representatives of $N$ in $M$. Then the semigroup $T_M$ acts self-similarly on $A^*$ by
$$(az + b,\ x) = \bigl(y,\ \hat\varphi(a)z + \varphi(ax + b - y)\bigr)$$
for the unique $y \in A$ with $ax + b - y \in N$. This action is
1. faithful if and only if $\bigcap_n M_n = 0$;
2. reversible if and only if $\varphi$ is injective;
3. defined by a finite-state automaton if the following hold: $\hat\varphi$ is an automorphism of finite order, and there exist a norm $\|\cdot\|\colon M \to \mathbb{N}$ and a constant $\lambda < 1$ such that $\|a + b\| \le \|a\| + \|b\|$, for all $K \in \mathbb{N}$ the ball $\{m \in M \mid K \ge \|m\|\}$ is finite, and $\|\varphi(n)\| \le \lambda\|n\|$ for all $n \in N$.

We already saw some examples of this construction: the lamplighter automaton $L$ is obtained by taking $R = M = \mathbb{F}_2[t]$, $N = tM$, $\varphi(tm) = m$, $\hat\varphi = 1$, and $\|f\| = 2^{\deg f}$ with $\lambda = \tfrac12$. The semigroup $S(L)$ is contained in $T_M$, and the group $G(L)$ is contained in the affine group of $\mathbb{F}_2[[t]]$. More generally, the Cayley automaton of a finite group $G$ is obtained by taking $R = G[[t]]$ with $G$ viewed as a ring with product $xy = 0$ unless $x = 1$ or $y = 1$. The adding machine (7) generates the subgroup of translations in the affine group of $M$ with $R = M = \mathbb{Z}$, $N = 2M$, $\varphi(2m) = m$, and $\|m\| = |m|$. The same ring-theoretic data produce the Baumslag–Solitar group (13); as above, we use $R = \mathbb{Z}$ to obtain a semigroup, and $R = \mathbb{Z}_2$ (or any ring in which $3$ is invertible) to obtain a group. Consider, more generally, $R = \mathbb{Z}$, $M = \mathbb{Z}^n$, $N = 2M$, and $\varphi(2m) = m$. These data produce the affine group $\mathbb{Z}^n \rtimes \mathrm{GL}_n(\mathbb{Z})$, proving Theorem 3.2. A finer construction, giving an action on the binary tree, is to again take $M = \mathbb{Z}^n$ and $N = \varphi^{-1}(M)$ with $\varphi^{-1}(x_1, \ldots, x_n) = (2x_n, x_1, \ldots, x_{n-1})$; here $\hat\varphi(a) = \varphi \circ a \circ \varphi^{-1}$. This gives a faithful action, on the binary tree, of
$$\mathbb{Z}^n \rtimes \{\, a \in \mathrm{GL}_n(\mathbb{Z}) \mid a \bmod 2 \text{ is lower triangular} \,\}.$$


Sketch of proof. (1) The action is faithful if and only if the translation part $\{z \mapsto z + m\}$ acts faithfully; and $z \mapsto z + m$ acts trivially on $A^*$ if and only if $m \in M_n$ for all $n \in \mathbb{N}$. (2) For any $x \in A$, there is a map (not a homomorphism!) $T_M \to T_M$ which associates, with $g \in T_M$, the permutation of $A^*$ given by
$$A^* \longrightarrow xA^* \stackrel{g}{\longrightarrow} g(x)A^* \longrightarrow A^*;$$
and this map is injective precisely when $\varphi$ is injective. (3) Without loss of generality, suppose $\hat\varphi = 1$. Consider $g = (z \mapsto az + m) \in T_M$. Let $K$ be larger than the norms of $ax + y$ for all $x, y \in A$. Then the states of an automaton describing $g$ are all of the form $z \mapsto az + m'$, with $\|m'\| \le (\|m\| + K)/(1 - \lambda)$; there are finitely many possibilities for such $m'$.

Note that the transversal $A$ amounts to a choice of "digits": the analogy is clear in the case of the adding machine (7), which has digits $\{0, 1\}$ and "counts" in base 2. For more general radix representations and their association with automata, see, e.g., [128].

3.8. Bireversible actions. Recall that an automaton $M$ is bireversible if $M$, $M^\vee$, $(M^{-1})^\vee$, $((M^\vee)^{-1})^\vee$, etc., are all invertible; equivalently, the map $Q \times A \to A \times Q$ is a bijection, for $Q$ the state set of $M \sqcup M^{-1}$. Bireversible automata are interpreted in [106] in terms of commensurators of free groups, defined in (5) of Chapter 23. Consider a free group $F_A$ on a set $A$. Its Cayley graph $\mathcal{C}$ is a tree, and $F_A$ acts by isometries on $\mathcal{C}$, so we have $F_A \le \mathrm{Isom}(\mathcal{C})$. Furthermore, $\mathcal{C}$ is oriented: its edges are labelled by $A \sqcup A^{-1}$, and we choose as orientation the edges labelled $A$. In this way, $F_A$ is contained in the orientation-preserving subgroup of $\mathrm{Isom}(\mathcal{C})$, denoted by $\overrightarrow{\mathrm{Isom}}(\mathcal{C})$.

Proposition 3.17. The stabiliser of $1$ in $\mathrm{Comm}_{\overrightarrow{\mathrm{Isom}}(\mathcal{C})}(F_A)$ is the set of bireversible automata with alphabet $A$.

Sketch of proof. The proof relies on an interpretation of finite-index subgroups of $F_A$ as complete automata; see Chapter 23 (§ 3.2). Let $M$ be a bireversible automaton with alphabet $A$. First, erase the output labels from $M$; this defines the Stallings automaton of a finite-index subgroup $H_1$ (of index $\mathrm{Card}\, Q$) of $F_A$. Then erase the input labels from $M$; this defines an isomorphic subgroup $H_2$ of $F_A$. The automaton $M$ itself defines an isomorphism between these two subgroups, which changes letters in elements of $F_A$ in a regular manner and therefore defines an isometry of $\mathcal{C}$. Conversely, given an isometry $g$ of the Cayley graph of $F_A$ which restricts to an isomorphism $G \to H$ between finite-index subgroups of $F_A$, consider the common underlying graph of both Stallings graphs of $G$ and $H$ and put their labels together, as input and output, to construct a bireversible automaton.
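These invertibility conditions are easy to test mechanically. The sketch below is only an illustration (the helper names are ours, and it uses the standard reformulation that an automaton is bireversible exactly when it is invertible, reversible, and the map $(q, a) \mapsto (\lambda(q, a), \delta(q, a))$ is a bijection from $Q \times A$ to $A \times Q$); it checks the Cayley automaton of $\mathbb{Z}/2\mathbb{Z}$ – the lamplighter automaton from (14), here written additively – and confirms that it is reversible but not bireversible.

```python
from itertools import product

def is_invertible(delta, lam, Q, A):
    # For each state q, the output map a -> lam[q, a] must be a permutation of A.
    return all(set(lam[q, a] for a in A) == set(A) for q in Q)

def is_reversible(delta, lam, Q, A):
    # Dual invertible: for each letter a, q -> delta[q, a] must be a permutation of Q.
    return all(set(delta[q, a] for q in Q) == set(Q) for a in A)

def is_bireversible(delta, lam, Q, A):
    # Invertible, reversible, and (q, a) -> (lam, delta) a bijection Q x A -> A x Q.
    cross = {(lam[q, a], delta[q, a]) for q, a in product(Q, A)}
    return (is_invertible(delta, lam, Q, A)
            and is_reversible(delta, lam, Q, A)
            and len(cross) == len(Q) * len(A))

# Cayley automaton of G = Z/2Z (the lamplighter automaton): next state and output are both q + a mod 2.
Q = A = (0, 1)
delta = {(q, a): (q + a) % 2 for q, a in product(Q, A)}
lam = dict(delta)

print(is_invertible(delta, lam, Q, A))    # True
print(is_reversible(delta, lam, Q, A))    # True
print(is_bireversible(delta, lam, Q, A))  # False: the cross map only hits the diagonal
```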


There are, up to isomorphism, precisely two minimised bireversible automata with three states and two alphabet letters:

[Figure: the two minimised bireversible automata $E_1$ (left) and $F_1$ (right), each with states $a$, $b$, $c$ and transitions labelled $0|0$, $0|1$, $1|0$, $1|1$.]

These automata belong to certain families, whose general terms $E_n$, $F_n$ have $2n + 1$ states. We only describe $F_n$:

[Figure: the automaton $F_n$, with transitions labelled $0|0,\ 1|1$, $0|1$ and $1|0$.]

Alëshin [2] proved that the group generated by the states $b_1, b_2$ in $F_1, F_2$ respectively is a free group on its two generators; but his argument (especially Lemma 8) has been considered incomplete, and a detailed proof appears in Theorem 1.2 in [130]. Alëshin's method is to prove by induction that, for any reduced word $w \in \{b_1^{\pm 1}, b_2^{\pm 1}\}^*$, the syntactic monoid of the corresponding automaton acts transitively on its state set. Sidki conjectured that, in fact, $G(F_1)$ is a free group on its three generators; this has been proven in [129]. On the other hand, $G(E_1)$ is a free product of three cyclic groups of order 2. Both proofs illustrate some techniques used to compute with bireversible automata. They rely on the following result:

Lemma 3.18. Let $L \subseteq Q^*$ be a subset mapping to $G(M)$ through the evaluation map. If $L$ is $G(M^\vee)$-invariant, and every $G(M^\vee)$-orbit contains a word mapping to a nontrivial element of $G(M)$, then $L$ maps injectively onto $G(M)$.

To derive the structure of a bireversible group, we therefore seek a $G(M^\vee)$-invariant subset $L \subseteq Q^*$ that maps onto $G(M) \setminus \{1\}$, and show that every $G(M^\vee)$-orbit contains a word mapping to a non-trivial element of $G(M)$.


Theorem 3.19 (Muntyan and Savchuk). $G(E_1) = \langle a, b, c \mid a^2, b^2, c^2 \rangle$.

Note that this result generalises: $G(E_n)$ is a free product of $2n + 1$ order-two groups.

Proof. Write $Q = \{a, b, c\}$. We first check the relations $a^2 = b^2 = c^2 = 1$ in $G = G(E_1)$. Let $L \subseteq Q^*$ denote those sequences $s_1 \cdots s_n$ with $s_i \ne s_{i+1}$ for all $i$. Consider the group $G(E_1^\vee)$, with generators $0, 1$. It acts on $L$, and acts transitively on $L \cap Q^n$ for all $n$; indeed already $0$ acts transitively on $Q = L \cap Q^1$, and $1$ acts on $\{a, c\}Q^{n-1} \cap L$ as a $2^n$-cycle, conjugate to the action (7) in the sense that there is an identification of $\{a, c\}Q^{n-1} \cap L$ with $\{0, 1\}^n$ interleaving these actions. It follows that the $3 \cdot 2^{n-1}$ elements of $L \cap Q^n$ are in the same orbit.

[Figure: the dual automaton $E_1^\vee$, with states $0$ and $1$ and transitions labelled $c|a$, $a|b$, $b|c$, $a|c$, $b|b$.]

It remains to note that $L \cap Q^n$ contains a word mapping to a nontrivial element of $G$; for example, $c(ab)^{(n-1)/2}$ or $c(ab)^{n/2 - 1}a$ depending on the parity of $n$; and to apply Lemma 3.18.

Theorem 3.20 (Mariya and Yaroslav Vorobets). We have $G(F_2) = \langle a, b, c \mid \varnothing \rangle \cong F_3$.

Note that this result generalises: $G(F_n)$ is a free group of rank $2n + 1$.

Sketch of proof. Again the key is to control the orbits of $G^\vee = G(F_2^\vee) = \langle 0, 1 \rangle$ on the reduced words over $Q = \{a, b, c\}$ of any given length. Let $s \in \{\pm 1\}^n$ be a sequence of signs, and consider
$$L_s = \{\, w = w_1^{s_1} \cdots w_n^{s_n} \in (Q \sqcup Q^{-1})^* \mid w_i^{s_i} \ne w_{i+1}^{-s_{i+1}} \text{ for all } i \,\}.$$

We show that $G^\vee$ acts transitively on $L_s$ for all $s$, and that $L_s$ contains a word mapping to a nontrivial element of $G$. Consider four elements $\alpha$, $\beta$, $\gamma$, $\delta$ of $G^\vee$, given by explicit products of $0^{\pm 1}$ and $1^{\pm 1}$ (the products are computed left-to-right); they are described by the automata

[Figure: the automata describing $\alpha$ (transitions $a|a$, $b|b$, $a^{-1}|b^{-1}$, $b^{-1}|a^{-1}$, $c^{\pm}|c^{\pm}$), $\beta$ (transitions $a^{-1}|a^{-1}$, $b^{-1}|b^{-1}$, $a|b$, $b|a$, $c^{\pm}|c^{\pm}$), $\gamma$ (transitions $a^{\pm}|a^{\pm}$, $b^{\pm}|c^{\pm}$, $c^{\pm}|b^{\pm}$) and $\delta$ (transitions $a^{\pm}|b^{\pm}$, $b^{\pm}|a^{\pm}$, $c^{\pm}|c^{\pm}$).]


The elements $\gamma, \delta$ generate a copy of $\mathrm{Sym}(3)$, allowing arbitrary permutations of $Q$ or $Q^{-1}$. In particular, $G^\vee$ acts transitively on $L_s$ whenever $|s| \le 1$, so we may proceed by induction on $|s|$. The elements $\alpha, \beta$, on the other hand, fix a large set of sequences (following the bold edges in the automata).

Consider now $s = s_1 \cdots s_n$, and $s' = s_1 \cdots s_{n-1}$. If $s_{n-1} \ne s_n$, so that $\mathrm{Card}\, L_s = 2\, \mathrm{Card}\, L_{s'}$, then there exists $w = w_1^{s_1} \cdots w_n^{s_n} \in L_s$, moved by $\alpha$ or $\beta$, and such that $w_1^{s_1} \cdots w_{n-1}^{s_{n-1}} \in L_{s'}$ is fixed by $\alpha$ and $\beta$; so $G^\vee$ acts transitively on $L_s$. If $s_1 \ne s_2$, apply the same argument to $L_{s_n^{-1} \cdots s_1^{-1}}$ and $L_{s_n^{-1} \cdots s_2^{-1}}$. Finally, if $s_1 = s_2$ and $s_{n-1} = s_n$, consider a typical $w \in L_{s_2 \cdots s_{n-1}}$, and all $w_{qr} = q^{s_1} w r^{s_n}$, for $q, r \in Q$. Using the action of $\alpha$ and $\beta$, the words $w_{qa}$ and $w_{qb}$ are in the same $G^\vee$-orbit for all $q \in Q$, and similarly $w_{ar}$ and $w_{br}$ are in the same $G^\vee$-orbit for all $r \in Q$. For all $r \in Q$, finally, $w_{ar}, w_{br'}, w_{cr''}$ are in the same $G^\vee$-orbit for some $r', r'' \in Q$, and similarly $w_{qa}, w_{q'b}, w_{q''c}$ are in the same $G^\vee$-orbit. It follows that all $w_{qr}$ are in the same $G^\vee$-orbit, so by induction $L_s$ is a single orbit.

It remains to check that every $L_s$ contains a word $w$ mapping to a nontrivial group element. If $n$ is odd, set $w_i = a$ if $s_i = 1$ and $w_i = b$ if $s_i = -1$; then $\overline{w}$ acts nontrivially on $A^*$. If $n$ is even, change $w_n$ to $c^{s_n}$; again $\overline{w}$ acts nontrivially on $A^*$. We are done by Lemma 3.18.

Burger and Mozes in [39], [40], and [41] have constructed some infinite, finitely presented simple groups; also see [116]. From this chapter's point of view, these groups are obtained as follows: one first constructs an "appropriate" bireversible automaton $M$ with state set $Q$ and alphabet $A$, defines
$$G_0 = \langle A \cup Q \mid aq = rb \text{ whenever that relation holds in } M \rangle,$$
and finally considers $G$ a finite-index subgroup of $G_0$. We will not explicitly give the conditions required on $M$ for their construction to succeed; but note that automata groups can be understood as a byproduct of their work. Wise constructed finitely presented groups with non-residual finiteness properties that are also related to automata [132].

One step in Burger and Mozes' construction gives the following bireversible automata: consider two primes $p, \ell \equiv 1 \pmod 4$. Let $A$ (respectively $Q$) denote those integral quaternions, up to a unit $\pm 1, \pm i, \pm j, \pm k$, of norm $p$ (respectively $\ell$). By a result of Hurwitz, $\mathrm{Card}\, A = p + 1$ and $\mathrm{Card}\, Q = \ell + 1$. Furthermore [98], for every $q \in Q$ and $a \in A$ there are unique (again up to units) $b \in A$ and $r \in Q$ with $qa = br$. Use these relations to define an automaton $M_{p,\ell}$. Clearly $M_{p,\ell}$ is bireversible, with dual $M_{p,\ell}^\vee = M_{\ell,p}$. Again thanks to unique factorisation of integral quaternions of odd norm, we get the following result.

Proposition 3.21. $G(M_{p,\ell}) = F_{(\ell+1)/2}$.

Glasner and Mozes [69] constructed an example of a bireversible automata group with Kazhdan's property (T).


Acknowledgement. Pedro V. Silva acknowledges support by Project ASA (PTDC/MAT/65481/2006) and C.M.U.P., financed by F.C.T. (Portugal) through the programmes POCTI and POSI, with national and E.U. structural funds.

References [1] M. Abért, Group laws and free subgroups in topological groups. Bull. London Math. Soc. 37 (2005), no. 4, 525–534. MR 2143732 Zbl 1095.20001 q.v. 894 [2] S. V. Alëshin, A free group of finite automata. Vestnik Moskov. Univ. Ser. I Mat. Mekh. 1983, no. 4, 12–14. In Russian. English translation, Moscow Univ. Math. Bull. 38 (1983), no. 4, 10–13. MR 0713968 Zbl 0513.68044 Zbl 0522.68054 (translation) q.v. 900 [3] S. V. Alëshin, Finite automata and the Burnside problem for periodic groups. Mat. Zametki 11 (1972), 319–328. English translation, Math. Notes 11 (1972), 199–203 MR 0301107 Zbl 0253.20049 q.v. 888, 892 [4] J. M. Alonso, Combings of groups. In Algorithms and classification in combinatorial group theory (G. Baumslag and C. F. Miller, III, eds.). Papers from the Workshop on Algorithms, Word Problems and Classification in Combinatorial Group Theory held in Berkeley, California, January 1989. Mathematical Sciences Research Institute Publications, 23. Springer, New York, 1992, 165–178. MR 1230633 Zbl 0763.57001 q.v. 880 [5] J. M. Alonso, T. Brady, D. Cooper, V. Ferlini, M. Lustig, M. Mihalik, M. Shapiro, and H. Short, Notes on word hyperbolic groups. Edited by H. Short. In Group theory from a geometrical viewpoint (É. Ghys, A. Haefliger, and A. Verjovsky, eds.). Proceedings of the workshop held in Trieste, March 26–April 6, 1990. World Scientific, River Edge, N.J., 1991, 3–63. MR 1170363 Zbl 0849.20023 q.v. 881 [6] J. M. Alonso and M. R. Bridson, Semihyperbolic groups. Proc. London Math. Soc. (3) 70 (1995), no. 1, 56–114. MR 1300841 Zbl 0823.20035 q.v. 883 [7] G. Amir, O. Angel, and B. Virág, Amenability of linear-activity automaton groups. J. Eur. Math. Soc. (JEMS) 15 (2013), no. 3, 705–730. MR 3085088 Zbl 1277.37019 q.v. 892 [8] L. Bartholdi, The growth of Grigorchuk’s torsion group. Internat. Math. Res. Notices 1998, no. 20, 1049–1054. MR 1656258 Zbl 0942.20027 q.v. 894 [9] L. Bartholdi, Lower bounds on the growth of a group acting on the binary rooted tree. Internat. J. Algebra Comput. 11 (2001), no. 1, 73–88. MR 1818662 Zbl 1028.20025 q.v. 894 [10] L. Bartholdi, Endomorphic presentations of branch groups. J. Algebra 268 (2003), no. 2, 419–443. MR 2009317 Zbl 1044.20015 q.v. 893 [11] L. Bartholdi, A Wilson group of non-uniformly exponential growth. C. R. Math. Acad. Sci. Paris 336 (2003), no. 7, 549–554. MR 1981466 Zbl 1050.20018 q.v. 895 [12] L. Bartholdi, Branch rings, thinned rings, tree enveloping rings. Israel J. Math. 154 (2006), 93–139. MR 2254535 Zbl 1173.16303 q.v. 887 [13] L. Bartholdi and A. Erschler, Growth of permutational extensions. Invent. Math. 189 (2012), no. 2, 431–455. MR 2947548 Zbl 1286.20025 q.v. 895 [14] L. Bartholdi and A. Erschler, Groups of given intermediate word growth. Ann. Inst. Fourier (Grenoble) 64 (2014), no. 5, 2003–2036. MR 3330929 Zbl 1317.20043 q.v. 895


[15] L. Bartholdi and A. Erschler, Ordering the space of finitely generated groups. Ann. Inst. Fourier (Grenoble) 65 (2015), no. 5, 2091–2144. MR 3449208 Zbl 1372.20030 q.v. 895 [16] L. Bartholdi, R. I. Grigorchuk, and V. Nekrashevych, From fractal groups to fractal sets. In Fractals in Graz 2001 (P. Grabner and W. Woess, eds.). Analysis – dynamics – geometry – stochastics. Proceedings of the conference held at Graz University of Technology, Graz, June 2001. Trends in Mathematics. Birkhäuser, Basel, 2003, 25–118. MR 2091700 Zbl 1037.20040 q.v. 896 ´ Branch groups. In Handbook of algebra [17] L. Bartholdi, R. I. Grigorchuk, and Z. Šunik, (M. Hazewinkel, ed.). Vol. 3. Elsevier/North-Holland, Amsterdam, 2003, 989–1112. MR 2035113 Zbl 1140.20306 q.v. 885, 893 [18] L. Bartholdi, V. A. Kaimanovich, and V. V. Nekrashevych, On amenability of automata groups. Duke Math. J. 154 (2010), no. 3, 575–598. MR 2730578 Zbl 1268.20026 q.v. 892 [19] L. Bartholdi and I. Mitrofanov, The word and order problems for self-similar and automata groups. Groups Geom. Dyn. 14 (2020), no. 2, 705–728. MR 4118634 Zbl 07227243 q.v. 891 [20] L. Bartholdi and V. Nekrashevych, Thurston equivalence of topological polynomials. Acta Math. 197 (2006), no. 1, 1–51. MR 2285317 Zbl 1176.37020 q.v. 897 [21] L. Bartholdi and I. I. Reznykov, A Mealy machine with polynomial growth of irrational degree. Internat. J. Algebra Comput. 18 (2008), no. 1, 59–82. MR 2394721 Zbl 1185.68430 q.v. 895 [22] L. Bartholdi, I. I. Reznykov, and V. I. Sushchansky, The smallest Mealy automaton of intermediate growth. J. Algebra 295 (2006), no. 2, 387–414. MR 2194959 Zbl 1095.20045 q.v. 895 ´ Some solvable automaton groups. In Topological and asymp[23] L. Bartholdi and Z. Šunik, ´ eds.) totic aspects of group theory (R. Grigorchuk, M. Mihalik, M. Sapir, and Z. Šunik, Proceedings of the AMS Special Sessions on Probabilistic and Asymptotic Aspects of Group Theory held in Athens, OH, March 26–27, 2004, and Topological Aspects of Group Theory held in Nashville, TN, October 16–17, 2004. Contemporary Mathematics, 394. American Mathematical Society, Providence, R.I., 2006, 11–2. MR 2216703 Zbl 1106.200211106.20021 q.v. 888 [24] L. Bartholdi and B. Virág, Amenability via random walks. Duke Math. J. 130 (2005), no. 1, 39–56. MR 2176547 Zbl 1104.43002 q.v. 890, 892 [25] G. Baumslag, M. R. Bridson, C. F. Miller, III, and H. Short, Finitely presented subgroups of automatic groups and their isoperimetric functions. J. London Math. Soc. (2) 56 (1997), no. 2, 292–304. MR 1489138 Zbl 0910.20023 q.v. 885 [26] G. Baumslag, S. M. Gersten, M. Shapiro, and H. Short, Automatic groups and amalgams. J. Pure Appl. Algebra 76 (1991), no. 3, 229–316. MR 1147304 Zbl 0749.20006 q.v. 875, 881, 885 [27] I. Bondarenko, R. Grigorchuk, R. Kravchenko, Y. Muntyan, V. Nekrashevych, D. Savchuk, and Z. Šunić, On classification of groups generated by 3-state automata over a 2-letter alphabet. Algebra Discrete Math. 2008, no. 1, 1–163. MR 2432182 Zbl 1164.20004 q.v. 888, 890 [28] W. W. Boone, The word problem. Proc. Nat. Acad. Sci. U.S.A. 44 (1958), 1061–1065. MR 0101267 Zbl 0086.24701 q.v. 880 [29] N. Brady, Finite subgroups of hyperbolic groups. Internat. J. Algebra Comput. 10 (2000), no. 4, 399–405. MR 1776048 Zbl 1010.20030 q.v. 883


[30] M. Brazil, Calculating growth functions for groups using automata. In Computational algebra and number theory (W. Bosma and A. van der Poorten, eds.). Papers from the CANT2 Meeting held at Sydney University, Sydney, November 1992. Mathematics and its Applications, 325. Kluwer Academic Publishers, Dordrecht, 1995, 1–18. MR 1344918 Zbl 0833.20042 q.v. 874 [31] M. R. Bridson, Combings of groups and the grammar of reparameterization. Comment. Math. Helv. 78 (2003), no. 4, 752–771. MR 2016694 Zbl 1044.20018 q.v. 885 [32] M. R. Bridson, A note on the grammar of combings. Internat. J. Algebra Comput. 15 (2005), no. 3, 529–535. MR 2151425 Zbl 1120.20041 q.v. 882 [33] M. R. Bridson, Non-positive curvature and complexity for finitely presented groups. In International Congress of Mathematicians (M. Sanz-Solé, J. Soria, J. L. Varona, and J. Verdera, eds.). Vol. II. Invited lectures. Proceedings of the congress held in Madrid, August 22–30, 2006. European Mathematical Society (EMS), Zürich, 2006, 961–987. MR 2275631 Zbl 1108.20041 q.v. 878 [34] M. R. Bridson and R. H. Gilman, Formal language theory and the geometry of 3-manifolds. Comment. Math. Helv. 71 (1996), no. 4, 525–555. MR 1420509 Zbl 0873.20026 q.v. 878 [35] M. R. Bridson and A. Haefliger, Metric spaces of non-positive curvature. Grundlehren der Mathematischen Wissenschaften, 319. Springer, Berlin, 1999. MR 1744486 Zbl 0988.53001 q.v. 883 [36] B. Brink and R. B. Howlett, A finiteness property and an automatic structure for Coxeter groups. Math. Ann. 296 (1993), no. 1, 179–190. MR 1213378 Zbl 0793.20036 q.v. 880 [37] A. M. Brunner and S. Sidki, The generation of GL.n; Z / by finite state automata. Internat. J. Algebra Comput. 8 (1998), no. 1, 127–139. MR 1492064 Zbl 0923.20023 q.v. 887 [38] A. M. Brunner and S. Sidki, Wreath operations in the group of automorphisms of the binary tree. J. Algebra 257 (2002), no. 1, 51–64. MR 1942271 Zbl 1027.20018 q.v. 888 [39] M. Burger and S. Mozes, Finitely presented simple groups and products of trees. C. R. Acad. Sci. Paris Sér. I Math. 324 (1997), no. 7, 747–752. MR 1446574 Zbl 0966.20013 q.v. 902 [40] M. Burger and S. Mozes, Groups acting on trees: from local to global structure. Inst. Hautes Études Sci. Publ. Math. 92 (2000), 113–150. MR 1839488 Zbl 1007.22012 q.v. 902 [41] M. Burger and S. Mozes, Lattices in product of trees. Inst. Hautes Études Sci. Publ. Math. 92 (2000), 151–194. MR 1839489 Zbl 1007.22013 q.v. 902 [42] D. Calegari and K. Fujiwara, Combable functions, quasimorphisms, and the central limit theorem. Ergodic Theory Dynam. Systems 30 (2010), no. 5, 1343–1369. MR 2718897 Zbl 1217.37025 q.v. 882 [43] C. M. Campbell, E. F. Robertson, N. Ruškuc, and R. M. Thomas, Automatic semigroups. Theoret. Comput. Sci. 250 (2001), no. 1–2, 365–391. MR 1795250 Zbl 0987.20033 q.v. 878 [44] J. W. Cannon, The combinatorial structure of cocompact discrete hyperbolic groups. Geom. Dedicata 16 (1984), no. 2, 123–148. MR 0758901 Zbl 0606.57003 q.v. 874 [45] J. W. Cannon, W. J. Floyd, and W. R. Parry, Finite subdivision rules. Conform. Geom. Dyn. 5 (2001), 153–196. MR 1875951 Zbl 1060.20037 q.v. 897 [46] J. W. Cannon, W. J. Floyd, W. R. Parry, and K. M. Pilgrim, Subdivision rules and virtual endomorphisms. Geom. Dedicata 141 (2009), 181–195. MR 2520071 Zbl 1364.37100 q.v. 897


[47] J. Cassaigne and P. V. Silva, Infinite words and confluent rewriting systems: endomorphism extensions. Internat. J. Algebra Comput. 19 (2009), no. 4, 443–490. MR 2536187 Zbl 1213.68477 q.v. 884 [48] A. Cayley, Desiderata and suggestions: No. 2. The theory of groups: graphical representation. Amer. J. Math. 1 (1878), no. 2, 174–176. MR 1505159 JFM 10.0105.02 q.v. 872 [49] R. Charney, Artin groups of finite type are biautomatic. Math. Ann. 292 (1992), no. 4, 671–683. MR 1157320 Zbl 0736.57001 q.v. 880 [50] M. Coornaert, T. Delzant, and A. Papadopoulos, Géométrie et théorie des groupes. Les groupes hyperboliques de Gromov. Lecture Notes in Mathematics, 1441. Springer, Berlin, 1990. MR 1075994 Zbl 0727.20018 q.v. 881 [51] F. Dahmani and V. Guirardel, The isomorphism problem for all hyperbolic groups. Geom. Funct. Anal. 21 (2011), no. 2, 223–300. MR 2795509 Zbl 1258.20034 q.v. 883 [52] M. Dehn, Über die Topologie des dreidimensionalen Raumes. Math. Ann. 69 (1910), no. 1, 137–168. MR 1511580 JFM 41.0543.01 q.v. 873 [53] M. Dehn, Über unendliche diskontinuierliche Gruppen. Math. Ann. 71 (1911), no. 1, 116–144. JFM 42.0508.03 q.v. 873 [54] M. Dehn, Transformation der Kurven auf zweiseitigen Flächen. Math. Ann. 72 (1912), no. 3, 413–421. MR 1511705 JFM 43.0571.03 q.v. 873 [55] M. Dehn, Papers on group theory and topology. Translated from the German and with introductions and an appendix by J. Stillwell. With an appendix by O. Schreier. Springer, New York, 1987. MR 0881797 Zbl 1264.01046 q.v. 873 [56] A. Duncan and R. H. Gilman, Word hyperbolic semigroups. Math. Proc. Cambridge Philos. Soc. 136 (2004), no. 3, 513–524. MR 2055042 Zbl 1064.20055 q.v. 884 [57] P. Engel, Geometric crystallography. In Handbook of convex geometry (P. M. Gruber and J. M. Wills, eds.). Vol. B. North-Holland Publishing Co., Amsterdam, 1993, 989–1041. MR 1243001 Zbl 0804.51032 q.v. 871 [58] D. B. A. Epstein, J. W. Cannon, D. F. Holt, S. V. F. Levy, M. S. Paterson, and W. P. Thurston, Word processing in groups. Jones and Bartlett Publishers, Boston, MA, 1992. MR 1161694 Zbl 0764.20017 q.v. 875, 880 [59] B. Farb, Automatic groups: a guided tour. Enseign. Math. (2) 38 (1992), no. 3–4, 291–313. MR 1189009 Zbl 0811.20038 q.v. 875 [60] B. Farb, Relatively hyperbolic groups. Geom. Funct. Anal. 8 (1998), no. 5, 810–840. MR 1650094 Zbl 0985.20027 q.v. 883, 884 [61] R. Geoghegan, Topological methods in group theory. Graduate Texts in Mathematics, 243. Springer, New York, 2008. MR 2365352 Zbl 1141.57001 q.v. 882 [62] S. M. Gersten, Problems on automatic groups. In G. Baumslag and C. F. Miller III (eds.), Algorithms and classification in combinatorial group theory. Papers from the Workshop on Algorithms, Word Problems and Classification in Combinatorial Group Theory. Held in Berkeley, CA, January 1989. Mathematical Sciences Research Institute Publications, 23. Springer, New York, 1992, 225–232. MR 1230636 Zbl 0781.20022 q.v. 879, 881 [63] S. M. Gersten and H. B. Short, Small cancellation theory and automatic groups. II. Invent. Math. 105 (1991), no. 3, 641–662. MR 1117155 Zbl 0734.20014 q.v. 881 [64] S. M. Gersten and H. B. Short, Rational subgroups of biautomatic groups. Ann. of Math. (2) 134 (1991), no. 1, 125–158. MR 1114609 Zbl 0744.20035 q.v. 881


[65] É. Ghys and P. de la Harpe (eds.), Sur les groupes hyperboliques d’après Mikhael Gromov. Papers from the Swiss Seminar on Hyperbolic Groups held in Bern, 1988. Progress in Mathematics, 83. Birkhäuser Boston, Boston, MA, 1990. MR 1086648 Zbl 0731.20025 q.v. 881 [66] R. H. Gilman, Groups with a rational cross-section. In Combinatorial group theory and topology (S. M. Gersten and J. R. Stallings, eds.). Annals of Mathematics Studies, 111. Princeton University Press, Princeton, N.J., 1987, 175–183. MR 0895616 Zbl 0623.20021 q.v. 877 [67] R. H. Gilman, On the definition of word hyperbolic groups. Math. Z. 242 (2002), no. 3, 529–541. MR 1985464 Zbl 1047.20033 q.v. 882 [68] R. H. Gilman, D. F. Holt, and S. Rees, Combing nilpotent and polycyclic groups. Internat. J. Algebra Comput. 9 (1999), no. 2, 135–155. MR 1703070 Zbl 1028.20032 q.v. 884 [69] Y. Glasner and S. Mozes, Automata and square complexes. Geom. Dedicata 111 (2005), 43–64. MR 2155175 Zbl 1088.20037 q.v. 902 [70] V. M. Glushkov, Abstract theory of automata. Uspehi Mat. Nauk 16 (1961), no. 5(101), 3–62. In Russian. English translation, Russ. Math. Surv. 16 (1961), no. 5, 1–53. MR 0138529 Zbl 0104.35404 q.v. 885 [71] M. Greendlinger, Dehn’s algorithm for the word problem. Comm. Pure Appl. Math. 13 (1960), 67–83. MR 0124381 Zbl 0104.01903 q.v. 873 [72] M. Greendlinger, On Dehn’s algorithms for the conjugacy and word problems, with applications. Comm. Pure Appl. Math. 13 (1960), 641–677. MR 0125020 Zbl 0156.01303 q.v. 873 [73] R. I. Grigorchuk, On Burnside’s problem on periodic groups. Funktsional. Anal. i Prilozhen. 14 (1980), no. 1, 53–54. MR 0565099 Zbl 0595.20029 q.v. 888, 892 [74] R. I. Grigorchuk, On Milnor’s problem of group growth. Dokl. Akad. Nauk SSSR 271 (1983), no. 1, 30–33. In Russian. English translation, Soviet Math. Dokl. 28 (1983), no. 1, 23–26. MR 0712546 Zbl 0547.20025 q.v. 888, 894 [75] R. I. Grigorchuk, Construction of p -groups of intermediate growth that have a continuum of quotient groups. Algebra i Logika 23 (1984), no. 4, 383–394, 478. In Russian. English translation, Algebra and Logic 23 (1984), no. 4, 265–273. MR 0781246 Zbl 0573.20037 q.v. 888 [76] R. I. Grigorchuk, Degrees of growth of finitely generated groups and the theory of invariant means. Izv. Akad. Nauk SSSR Ser. Mat. 48 (1984), no. 5, 939–985. In Russian. English translation, Math. USSR-Izv. 25 (1985), no. 2, 259–300. MR 0764305 Zbl 0583.20023 q.v. 888 [77] R. I. Grigorchuk, Degrees of growth of p -groups and torsion-free groups. Mat. Sb. (N.S.) 126(168) (1985), no. 2, 194–214, 286. In Russian. English translation, Math. USSRSb. 54 (1986), no. 1, 185–205. MR 0784354 Zbl 0568.20033 q.v. 888 [78] R. I. Grigorchuk, Just infinite branch groups. In New horizons in pro-p groups (M. du Sautoy, D. Segal, and A. Shalev, eds.). Birkhäuser Boston, Inc., Boston, MA, 2000, 121–179. MR 1765119 Zbl 0982.20024 q.v. 893 [79] R. I. Grigorchuk, V. V. Nekrashevich, and V. I. Sushchanski˘ı, Automata, dynamical systems, and groups. Tr. Mat. Inst. Steklova 231 (2000), Din. Sist., Avtom. i Beskon. Gruppy, 134–214. In Russian. English translation, Proc. Steklov Inst. Math. 2000, no. 4(231), 128–203. MR 1841755 Zbl 1155.37311 q.v. 885


[80] R. I. Grigorchuk and J. S. Wilson, A structural property concerning abstract commensurability of subgroups. J. London Math. Soc. (2) 68 (2003), no. 3, 671–682. MR 2009443 Zbl 1063.20033 q.v. 890 [81] R. I. Grigorchuk and J. S. Wilson, The uniqueness of the actions of certain branch groups on rooted trees. Geom. Dedicata 100 (2003), 103–116. MR 2011117 Zbl 1052.20023 q.v. 891 [82] R. I. Grigorchuk and A. Żuk, The lamplighter group as a group generated by a 2-state automaton, and its spectrum. Geom. Dedicata 87 (2001), no. 1–3, 209–244. MR 1866850 Zbl 0990.60049 q.v. 888 [83] R. I. Grigorchuk and A. Żuk, On a torsion-free weakly branch group defined by a three state automaton. Internat. J. Algebra Comput. 12 (2002), no. 1–2, 223–246. International Conference on Geometric and Combinatorial Methods in Group Theory and Semigroup Theory (Lincoln, NE, 2000). MR 1902367 Zbl 1070.20031 q.v. 890 [84] M. Gromov, Groups of polynomial growth and expanding maps. Inst. Hautes Études Sci. Publ. Math. 53 (1981), 53–73. MR 0623534 Zbl 0474.20018 q.v. 894 [85] M. Gromov, Structures métriques pour les variétés riemanniennes. Edited by J. Lafontaine and P. Pansu. Textes Mathématiques, 1. CEDIC, Paris, 1981. MR 0682063 Zbl 0509.53034 q.v. 895 [86] M. Gromov, Infinite groups as geometric objects. In Proceedings of the international congress of mathematicians (Z. Ciesielski and C. Olech, eds.). Vol. 1. PWN—Polish Scientific Publishers, Warsaw, and North-Holland Publishing Co., Amsterdam, 1984, 385–392. MR 0804694 Zbl 0586.20016 q.v. 875 [87] M. Gromov, Hyperbolic groups. In Essays in group theory (S. M. Gersten, ed.). Mathematical Sciences Research Institute Publications, 8. Springer, New York, 1987, 75–263. MR 0919829 Zbl 0634.20015 q.v. 881 [88] J. R. J. Groves and S. M. Hermiller, Isoperimetric inequalities for soluble groups. Geom. Dedicata 88 (2001), no. 1–3, 239–254. MR 1877218 Zbl 0994.20035 q.v. 884 [89] N. Gupta and S. Sidki, On the Burnside problem for periodic groups. Math. Z. 182 (1983), no. 3, 385–388. MR 0696534 Zbl 0513.20024 q.v. 888 [90] N. Gupta and S. Sidki, Some infinite p -groups. Algebra i Logika 22 (1983), no. 5, 584–589. Reprinted in Algebra and Logic 22 (1983), no. 5, 421–424. MR 0759409 Zbl 0546.20026 q.v. 888, 892 [91] S. Hermiller, D. F. Holt, and S. Rees, Star-free geodesic languages for groups. Internat. J. Algebra Comput. 17 (2007), no. 2, 329–345. MR 2310150 Zbl 1170.20023 q.v. 878 [92] S. Hermiller, D. F. Holt, and S. Rees, Groups whose geodesics are locally testable. Internat. J. Algebra Comput. 18 (2008), no. 5, 911–923. MR 2440717 Zbl 1178.20032 q.v. 878 [93] S. M. Hermiller and J. Meier, Tame combings, almost convexity and rewriting systems for groups. Math. Z. 225 (1997), no. 2, 263–276. MR 1464930 Zbl 0873.20023 q.v. 878 [94] M. Hoffmann, D. Kuske, F. Otto, and R. M. Thomas, Some relatives of automatic and hyperbolic groups. In Semigroups, algorithms, automata and languages (G. M. S. Gomes, J.-É. Pin and P. V. Silva, eds.). Papers from the Thematic Term held in Coimbra, May–July 2001. World Scientific, River Edge, N.J., 2002, 379–406. MR 2023798 Zbl 1031.20047 q.v. 884 [95] M. Hoffmann and R. M. Thomas, A geometric characterization of automatic semigroups. Theoret. Comput. Sci. 369 (2006), no. 1–3, 300–313. MR 2277577 Zbl 1155.68039 q.v. 878


[96] D. F. Holt, Word-hyperbolic groups have real-time word problem. Internat. J. Algebra Comput. 10 (2000), no. 2, 221–227. MR 1758286 Zbl 1083.20507 q.v. 883 [97] J. Hopcroft, An n log n algorithm for minimizing states in a finite automaton. In Theory of machines and computations (Z. Kohavi and A. Paz, eds.). Proceedings of an International Symposium on the Theory of Machines and Computations held at Technion in Haifa, Israel, on August 16–19, 1971. Academic Press, New York and London, 1971, 189–196. MR 0403320 Zbl 0293.94022 q.v. 890 [98] A. Hurwitz, Vorlesungen über die Zahlentheorie der Quaternionen. J. Springer, Berlin, 1919. JFM 47.0106.01 q.v. 902 [99] A. Kameyama, The Thurston equivalence for postcritically finite branched coverings. Osaka J. Math. 38 (2001), no. 3, 565–610. MR 1860841 Zbl 1079.57502 q.v. 897 [100] T. Knuutila, Re-describing an algorithm by Hopcroft. Theoret. Comput. Sci. 250 (2001), no. 1–2, 333–363. MR 1795249 Zbl 0952.68077 q.v. 890 [101] K. B. Krohn and J. L. Rhodes, Algebraic theory of machines. In Proceedings of the Symposium on Mathematical Theory of Automata. Held in New York, April 24–26, 1962. Microwave Research Institute Symposia Series, XII. Polytechnic Press of Polytechnic Institute of Brooklyn, Brooklyn, N.Y., 1963, 341–384. MR 0175718 Zbl 0138.00808 q.v. 885, 897 [102] Y. G. Leonov, Conjugacy problem in a class of 2-groups. Mat. Zametki 64 (1998), no. 4, 573–583. In Russian. English translation, Math. Notes 64 (1998), no. 3–4, 496–505. MR 1687212 Zbl 0942.20011 q.v. 890 [103] R. C. Lyndon, On Dehn’s algorithm. Math. Ann. 166 (1966), 208–228. MR 0214650 Zbl 0138.25702 q.v. 873 [104] R. C. Lyndon and P. E. Schupp, Combinatorial group theory. Ergebnisse der Mathematik und ihrer Grenzgebiete, 89. Springer, Berlin, 1977. MR 0577064 Zbl 0368.20023 q.v. 873, 879 [105] I. G. Lysënok, A system of defining relations for a Grigorchuk group. Mat. Zametki 38 (1985), no. 4, 503–516, 634. In Russian. English translation, Math. Notes 38 (1985), no. 3–4, 784–792. MR 0819415 Zbl 0595.20030 q.v. 893 [106] O. Macedońska, V. Nekrashevych, and V. Sushchansky, Commensurators of groups and reversible automata. Dopov. Nats. Akad. Nauk Ukr. Mat. Prirodozn. Tekh. Nauki 2000, no. 12, 36–39. MR 1841119 Zbl 0977.20022 q.v. 899 [107] C. F. Miller, III, On group-theoretic decision problems and their classification. Annals of Mathematics Studies, 68. Princeton University Press, Princeton, N.J., and University of Tokyo Press, Tokyo, 1971. MR 0310044 Zbl 0277.20054 q.v. 881 [108] J. W. Milnor, Growth of finitely generated solvable groups. J. Differential Geometry 2 (1968), 447–449. MR 0244899 Zbl 0176.29803 q.v. 894 [109] J. W. Milnor, Problem 5603. Amer. Math. Monthly 75 (1968), 685–686. q.v. 894 [110] L. Mosher, Mapping class groups are automatic. Ann. of Math. (2) 142 (1995), no. 2, 303–384. MR 1343324 Zbl 0867.57004 q.v. 879 [111] L. Mosher, Central quotients of biautomatic groups. Comment. Math. Helv. 72 (1997), no. 1, 16–29. MR 1456313 Zbl 0888.20019 q.v. 881 [112] V. Nekrashevych, Self-similar groups. volume 117 of Math. Surveys Monogr. Mathematical Surveys and Monographs, 117. American Mathematical Society, Providence, R.I., 2005. MR 2162164 Zbl 1087.20032 q.v. 885, 891, 897 [113] V. Nekrashevych, Free subgroups in groups acting on rooted trees. Groups Geom. Dyn. 4 (2010), no. 4, 847–862. MR 2727668 Zbl 1267.37009 q.v. 892


[114] P. S. Novikov, Об алгоритмической неразрешимости проблемы тождества слов в теории групп (On the algorithmic unsolvability of the word problem in group theory). Trudy Mat. Inst. im. Steklov. 44. Izdat. Akad. Nauk SSSR, Moscow, 1955. In Russian. http://www.mathnet.ru/links/b4a5081a02caa1c912d959b86890ff61/tm1180.pdf MR 0075197 Zbl 0068.013010068.01301 q.v. 880 [115] A. Y. Ol’shanski˘ı, Almost every group is hyperbolic. Internat. J. Algebra Comput. 2 (1992), no. 1, 1–17. MR 1167524 Zbl 0779.20016 q.v. 882 [116] D. Rattaggi, A finitely presented torsion-free simple group. J. Group Theory 10 (2007), no. 3, 363–371. MR 2320973 Zbl 1136.20026 q.v. 902 [117] E. Rips, Subgroups of small cancellation groups. Bull. London Math. Soc. 14 (1982), no. 1, 45–47. MR 0642423 Zbl 0481.20020 q.v. 883 [118] A. V. Rozhkov, Conjugacy problem in an automorphism group of an infinite tree. Mat. Zametki 64 (1998), no. 4, 592–597. In Russian. English translation, Math. Notes 64 (1998), no. 3–4, 513–517. MR 1687204 Zbl 0949.20025 q.v. 890 [119] Y. Shalom and T. Tao, A finitary version of Gromov’s polynomial growth theorem. Geom. Funct. Anal. 20 (2010), no. 6, 1502–1547. MR 2739001 Zbl 1262.20044 q.v. 895 [120] S. Sidki, A primitive ring associated to a Burnside 3-group. J. London Math. Soc. (2) 55 (1997), no. 1, 55–64. MR 1423285 Zbl 0874.20028 q.v. 887 [121] S. Sidki, Automorphisms of one-rooted trees: growth, circuit structure, and acyclicity. J. Math. Sci. (New York) 100 (2000), no. 1, 1925–1943. Algebra. 12. Dedicated to the 70 th birthday of Professor Alexei Ivanovich Kostrikin. A translation of Алгебра 12 (Yu. A. Bahturin and E. S. Golod, eds.). Itogi Nauki Tekh. Ser. Sovrem. Mat. Prilozh. Temat. Obz., 58, Vseross. Inst. Nauchn. i Tekhn. Inform. (VINITI), Moscow, 1998. MR 1774362 Zbl 1069.20504 q.v. 891 [122] S. Sidki, Finite automata of polynomial growth do not generate a free group. Geom. Dedicata 108 (2004), 193–204. MR 2112674 Zbl 1075.20011 q.v. 891 [123] S. Sidki, Tree-wreathing applied to generation of groups by finite automata. Internat. J. Algebra Comput. 15 (2005), no. 5–6, 1205–1212. MR 2197828 Zbl 1108.20025 q.v. 888 [124] P. V. Silva and B. Steinberg, A geometric characterization of automatic monoids. Q. J. Math. 55 (2004), no. 3, 333–356. MR 2082097 Zbl 1076.20041 q.v. 878 [125] P. V. Silva and B. Steinberg, On a class of automata groups generalizing lamplighter groups. Internat. J. Algebra Comput. 15 (2005), no. 5–6, 1213–1234. MR 2197829 Zbl 1106.20028 q.v. 898 [126] V. A. Tartakovski˘ı, Решение проблемы тождества для групп с k -сократимым базисом при k > 6 (Solution of the word problem for groups with a k reduced basis for k > 6). Izvestiya Akad. Nauk SSSR. Ser. Mat. 13 (1949). 483–494. In Russian. MR 0033816 q.v. 873 [127] J. Tits, Free subgroups in linear groups. J. Algebra 20 (1972), 250–270. MR 0286898 Zbl 0236.20032 q.v. 894 [128] A. Vince, Radix representation and rep-tiling. Congr. Numer. 98 (1993), 199–212. Proceedings of the Twenty-fourth Southeastern International Conference on Combinatorics, Graph Theory, and Computing. Held at Florida Atlantic University, Boca Raton, FL, February 22–26, 1993. MR 1267355 Zbl 0801.05023 q.v. 899 [129] M. Vorobets and Y. Vorobets, On a free group of transformations defined by an automaton. Geom. Dedicata 124 (2007), 237–249. MR 2318547 Zbl 1183.20024 q.v. 900


[130] M. Vorobets and Y. Vorobets, On a series of finite automata defining free transformation groups. Groups Geom. Dyn. 4 (2010), no. 2, 377–405. MR 2595096 Zbl 1227.20027 q.v. 900 [131] J. S. Wilson, On exponential growth and uniformly exponential growth for groups. Invent. Math. 155 (2004), no. 2, 287–303. MR 2031429 Zbl 1065.20054 q.v. 895 [132] D. T. Wise, Complete square complexes. Comment. Math. Helv. 82 (2007), no. 4, 683–724. MR 2341837 Zbl 1142.20025 q.v. 902 [133] J. A. Wolf, Growth of finitely generated solvable groups and curvature of Riemannian manifolds. J. Differential Geometry 2 (1968), 421–446. MR 0248688 Zbl 0207.51803 q.v. 894

Chapter 25

Automata in number theory

Boris Adamczewski and Jason Bell

Contents

1. Introduction 913
2. Automatic sequences and automatic sets of integers 914
3. Prime numbers and finite automata 919
4. Expansions of algebraic numbers in integer bases 921
5. The Skolem–Mahler–Lech theorem in positive characteristic 929
6. The algebraic closure of $\mathbb{F}_p(t)$ 937
7. Update 941
References 942

1. Introduction

Among infinite sequences or infinite sets of integers, some are well behaved, such as periodic sequences and arithmetic progressions, whereas others, such as random sequences and random sets, are completely chaotic and cannot be described in a simple way. Finite automata are one of the most basic models of computation and thus lie at the bottom of the hierarchy associated with Turing machines. Such machines can naturally be used to generate sequences with values over a finite set, and also as devices to recognise certain subsets of the integers. One of the main interests of these automatic sequences/sets arises from the fact that they are in many ways very well behaved without necessarily being trivial. One can thus consider that they lie somewhere between order and chaos, even if, in many respects, they are well behaved.

We survey some of the connections between automatic sequences/sets and number theory. Several substantial advances have recently been made in this area and we give an overview of some of these new results. This includes discussions about prime numbers, the decimal expansion of algebraic numbers, the search for an analogue of the Skolem–Mahler–Lech theorem in positive characteristic, and the description of the algebraic closure of the field $\mathbb{F}_p(t)$.


2. Automatic sequences and automatic sets of integers

In this section, we recall some basic facts about automatic sequences and automatic sets of integers. The main reference on this topic is the book of Allouche and Shallit [7]. An older reference is Eilenberg [20]. In [20], $k$-automatic sets are called $k$-recognisable.

2.1. Automatic sequences. Let $k \ge 2$ be an integer. An infinite sequence $(a_n)_{n \ge 0}$ is said to be $k$-automatic if $a_n$ is a finite-state function of the base-$k$ representation of $n$. This means that there exists a deterministic finite automaton with output (DFAO) taking the base-$k$ expansion of $n$ as input and producing the term $a_n$ as output. We say that a sequence is generated by a finite automaton, or for short is automatic, if it is $k$-automatic for some $k \ge 2$.

A more concrete definition of $k$-automatic sequences can be given as follows. Let $A_k$ denote the alphabet $\{0, 1, \ldots, k-1\}$. By definition, a $k$-automaton is a 6-tuple
$$\mathcal{A} = (Q, A_k, \delta, q_0, \Delta, \tau),$$
where $Q$ is a finite set of states, $\delta\colon Q \times A_k \to Q$ is the transition function, $q_0$ is the initial state, $\Delta$ is the output alphabet and $\tau\colon Q \to \Delta$ is the output function. For a state $q$ in $Q$ and for a finite word $w = w_1 w_2 \cdots w_n$ on the alphabet $A_k$, we define $\delta(q, w)$ recursively by $\delta(q, w) = \delta(\delta(q, w_1 w_2 \cdots w_{n-1}), w_n)$. Let $n \ge 0$ be an integer and let $w_r w_{r-1} \cdots w_1 w_0$ in $(A_k)^{r+1}$ be the base-$k$ expansion of $n$, starting with the most significant digit. Thus $n = \sum_{i=0}^{r} w_i k^i =: [w_r w_{r-1} \cdots w_0]_k$. We let $w(n)$ denote the word $w_r w_{r-1} \cdots w_0$. Then a sequence $(a_n)_{n \ge 0}$ is said to be $k$-automatic if there exists a $k$-automaton $\mathcal{A}$ such that $a_n = \tau(\delta(q_0, w(n)))$ for all $n \ge 0$.

Example 2.1. The Thue–Morse sequence $t := (t_n)_{n \ge 0}$ is probably the most famous example of an automatic sequence. It is defined as follows: $t_n = 0$ if the sum of the binary digits of $n$ is even, and $t_n = 1$ otherwise. We thus have
$$t = 01101001100101\ldots$$
It is easy to check that the Thue–Morse sequence can be generated by the following finite 2-automaton: $\mathcal{A} = (\{A, B\}, \{0, 1\}, \delta, A, \{0, 1\}, \tau)$, where $\delta(A, 0) = \delta(B, 1) = A$, $\delta(A, 1) = \delta(B, 0) = B$, $\tau(A) = 0$ and $\tau(B) = 1$.

It is easy to check that the Thue–Morse sequence can be generated by the following finite 2-automaton: A D .¹A; Bº; ¹0; 1º; ı; A; ¹0; 1º; /, where ı.A; 0/ D ı.B; 1/ D A, ı.A; 1/ D ı.B; 0/ D B , .A/ D 0 and .B/ D 1. 0

0 1

A=0

B=1 1

Figure 1. A DFAO generating Thue–Morse word
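For a concrete feel of the DFAO formalism, here is a small Python sketch (the function names are ours; the transition and output tables are those of Example 2.1) that feeds the binary expansion of $n$, most significant digit first, to the automaton of Figure 1 and recovers the first terms of the Thue–Morse sequence.

```python
# Transitions and outputs of the 2-automaton of Example 2.1 / Figure 1.
delta = {('A', 0): 'A', ('A', 1): 'B', ('B', 0): 'B', ('B', 1): 'A'}
tau = {'A': 0, 'B': 1}

def thue_morse_term(n):
    """Return t_n by feeding the base-2 expansion of n (MSB first) to the DFAO."""
    state = 'A'
    for digit in map(int, format(n, 'b')):
        state = delta[state, digit]
    return tau[state]

print(''.join(str(thue_morse_term(n)) for n in range(14)))  # 01101001100101
```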

Example 2.2. Let $w = (w_n)_{n \ge 0}$ be the 3-automatic sequence generated by the DFAO represented in Figure 2. Note that though this 3-automaton has only two states, it seems to be non-trivial to give a simple expression of $w_n$ as a function of the ternary expansion of $n$.

Figure 2. A DFAO generating the sequence $w$

2.1.1. Morphisms of free monoids. For a finite set $A$, we let $A^*$ denote the free monoid generated by $A$. The empty word $\varepsilon$ is the identity element of $A^*$. Let $A$ and $B$ be two finite sets. A map from $A$ to $B^*$ extends uniquely to a homomorphism between the free monoids $A^*$ and $B^*$. We call such a homomorphism from $A^*$ to $B^*$ a morphism. If there is a positive integer $k$ such that each element of $A$ is mapped to a word of length $k$, then the morphism is called $k$-uniform or simply uniform. A coding is a 1-uniform morphism.

A morphism $\sigma$ from $A^*$ to itself is said to be prolongable if there exists a letter $a$ such that $\sigma(a) = aw$, where the word $w$ is such that $\sigma^n(w)$ is a nonempty word for every $n \ge 0$. In that case, the sequence of finite words $(\sigma^n(a))_{n \ge 0}$ converges in $A^\omega = A^* \cup A^{\mathbb{N}}$, endowed with its usual topology, to an infinite word denoted $\sigma^\omega(a)$. This infinite word is clearly a fixed point for $\sigma$ (extended by continuity to infinite words) and we say that $\sigma^\omega(a)$ is generated by the morphism $\sigma$. For instance, the morphism $\sigma$ defined over the alphabet $\{0, 1\}$ by $\sigma(0) = 01$ and $\sigma(1) = 10$ is a 2-uniform morphism that generates the Thue–Morse sequence
$$t = \sigma^\omega(0) = 01101001100101\ldots$$

Uniform morphisms and automatic sequences are strongly connected, as the following classical result of Cobham shows [15]. A notable consequence of Theorem 2.1 is that finite automata are Turing machines that produce sequences in linear time.

Theorem 2.1 (Cobham). An infinite word is $k$-automatic if and only if it is the image by a coding of a word that is generated by a $k$-uniform morphism.

Example 2.3. Let us consider the 3-uniform morphism $\varphi$ defined over the monoid $\{0, 1, 2\}^*$ by $\varphi(0) = 012$, $\varphi(1) = 020$, and $\varphi(2) = 021$. This morphism has a unique fixed point
$$x = \varphi^\omega(0) = 012020021012021012012\ldots$$
Letting $\pi$ denote the coding that maps 0 and 1 to 0, and 2 to 1, we thus obtain that
$$y := y_1 y_2 \cdots := \pi(x) = 001010010001010001001\ldots$$
is a 3-automatic word.

Example 2.4. The word $w$ defined in Example 2.2 is the unique fixed point generated by the binary morphism mapping $0 \mapsto 001$ and $1 \mapsto 010$.
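Cobham's characterisation is easy to experiment with. The following sketch (an illustration of ours; the function and variable names are not from the text) iterates the 3-uniform morphism of Example 2.3 from the letter 0 and then applies the coding $0, 1 \mapsto 0$, $2 \mapsto 1$; replacing the morphism by $0 \mapsto 001$, $1 \mapsto 010$ would produce the word $w$ of Examples 2.2 and 2.4 instead.

```python
def iterate_morphism(rules, start, length):
    """Iterate a uniform morphism from `start` until the prefix has the requested length."""
    word = start
    while len(word) < length:
        word = ''.join(rules[c] for c in word)
    return word[:length]

# Example 2.3: 3-uniform morphism and coding 0,1 -> 0, 2 -> 1.
phi = {'0': '012', '1': '020', '2': '021'}
coding = {'0': '0', '1': '0', '2': '1'}

x = iterate_morphism(phi, '0', 21)
y = ''.join(coding[c] for c in x)
print(x)  # 012020021012021012012
print(y)  # 001010010001010001001
```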


2.1.2. Kernels. An important notion in the study of $k$-automatic sequences is the notion of $k$-kernel. The $k$-kernel of a sequence $a = (a_n)_{n \ge 0}$ is defined as the set of subsequences
$$N_k(a) = \{\, (a_{k^i n + j})_{n \ge 0} \mid i \ge 0,\ 0 \le j < k^i \,\}.$$
This notion gives rise to another useful characterisation of $k$-automatic sequences, which was first proved by Eilenberg in [20].
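The finiteness of the $k$-kernel can be explored numerically. The sketch below (ours; it only compares subsequences on a finite window, so it suggests rather than proves finiteness) enumerates prefixes of the subsequences $(a_{2^i n + j})_{n \ge 0}$ for the Thue–Morse sequence and finds exactly two distinct ones, anticipating Example 2.5 below.

```python
def s2_parity(n):
    return bin(n).count('1') % 2  # Thue-Morse term t_n

def kernel_prefixes(term, k, max_i, window):
    """Prefixes of the subsequences (a_{k^i n + j})_{n>=0} for i <= max_i, 0 <= j < k^i."""
    prefixes = set()
    for i in range(max_i + 1):
        for j in range(k ** i):
            prefixes.add(tuple(term(k ** i * n + j) for n in range(window)))
    return prefixes

kernel = kernel_prefixes(s2_parity, k=2, max_i=6, window=64)
print(len(kernel))  # 2: the Thue-Morse sequence and its complement
```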

Theorem 2.2 (Eilenberg). A sequence is $k$-automatic if and only if its $k$-kernel is finite.

Example 2.5. The 2-kernel of the Thue–Morse sequence $t$ has only two elements: $t$ and the sequence $\bar{t}$ obtained by exchanging the symbols 0 and 1 in $t$.

2.2. Automatic sets of integers. Another important aspect of finite automata is that they can naturally be used as a device to recognise sets of integers.

2.2.1. Automatic subsets of $\mathbb{N}$. A set $\mathcal{N} \subseteq \mathbb{N}$ is said to be recognisable by a finite $k$-automaton, or for short $k$-automatic, if the characteristic sequence of $\mathcal{N}$, defined by $a_n = 1$ if $n \in \mathcal{N}$ and $a_n = 0$ otherwise, is a $k$-automatic sequence. This means that there exists a finite $k$-automaton that reads as input the base-$k$ expansion of $n$ and accepts this integer (producing as output the symbol 1) if $n$ belongs to $\mathcal{N}$; otherwise this automaton rejects the integer $n$, producing as output the symbol 0.

Example 2.6. The simplest automatic sets are arithmetic progressions. Moreover, arithmetic progressions have the very special property of being $k$-automatic sets for every integer $k \ge 2$ (see Cobham's theorem in Chapter 26).

Figure 3. A 2-DFAO recognising the arithmetic progression $5\mathbb{N} + 3$

Example 2.7. The set $\{1, 2, 4, 8, 16, \ldots\}$ formed by the powers of 2 is also a typical example of a 2-automatic set. (See Figure 4.)

Example 2.8. In the same spirit, the set formed by taking all integers that can be expressed as the sum of at most two powers of 3 is 3-automatic (see Figure 5).

Figure 4. A 2-DFAO recognising the powers of 2
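The automaton of Figure 4 can be simulated directly. In the sketch below (the transition table is one plausible reading of the figure, written out for illustration: $B$ is the accepting state reached after a leading 1 followed only by 0's, and $C$ is a rejecting sink), the integers accepted up to 64 are exactly the powers of 2.

```python
# A 2-DFAO recognising powers of 2 (transitions reconstructed for illustration).
delta = {('A', 1): 'B', ('A', 0): 'A',
         ('B', 0): 'B', ('B', 1): 'C',
         ('C', 0): 'C', ('C', 1): 'C'}
accepting = {'B'}

def accepts(n):
    state = 'A'
    for digit in map(int, format(n, 'b')):
        state = delta[state, digit]
    return state in accepting

print([n for n in range(1, 65) if accepts(n)])  # [1, 2, 4, 8, 16, 32, 64]
```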

Figure 5. A 3-DFAO recognising those integers that are the sum of at most two powers of 3

There are also much stranger automatic sets. For instance, the set of integers whose binary expansion has an odd number of digits, does not contain three consecutive 1's, and contains an even number of occurrences of two consecutive 0's is a 2-automatic set. Furthermore, the class of $k$-automatic sets is closed under various natural operations such as intersection, union and complement. On the other hand, some classical sets of integers, such as the set of prime numbers and the set of perfect squares, cannot be recognised by a finite automaton (see Theorem 3.1, [56], and [49]).

2.2.2. Automatic subsets of $\mathbb{N}^d$ and multidimensional automatic sequences. Salon [58] extended the notion of automatic sets to include subsets of $\mathbb{N}^d$, where $d \ge 1$. To describe Salon's construction, we let $A_k$ denote the alphabet $\{0, 1, \ldots, k-1\}$. We then consider an automaton
$$\mathcal{A} = (Q, A_k^d, \delta, q_0, \Delta, \tau),$$
where $Q$ is a finite set of states, $\delta\colon Q \times A_k^d \to Q$ is the transition function, $q_0$ is the initial state, $\Delta$ is the output alphabet and $\tau\colon Q \to \Delta$ is the output function. Just as in the one-dimensional case, for a state $q$ in $Q$ and for a finite word $w = w_1 w_2 \cdots w_n$ on the alphabet $A_k^d$, we recursively define $\delta(q, w)$ by $\delta(q, w) = \delta(\delta(q, w_1 w_2 \cdots w_{n-1}), w_n)$. We call such an automaton a $d$-dimensional $k$-automaton.


We identify $(A_k^d)^*$ with the subset of $(A_k^*)^d$ consisting of all $d$-tuples $(u_1, \ldots, u_d)$ such that $u_1, \ldots, u_d$ all have the same length. Each nonnegative integer $n$ can be written uniquely as
$$n = \sum_{j=0}^{\infty} e_j(n) k^j,$$
in which $e_j(n) \in \{0, \ldots, k-1\}$ and $e_j(n) = 0$ for all sufficiently large $j$. Let $(n_1, \ldots, n_d)$ be a $d$-tuple of nonnegative integers and let
$$h := \max(\lfloor \log n_1 / \log k \rfloor, \ldots, \lfloor \log n_d / \log k \rfloor),$$
that is, if $a_i$ represents the number of digits in the base-$k$ expansion of $n_i$, then $h + 1$ is the maximum of $a_1, \ldots, a_d$. We can then produce an element
$$w_k(n_1, \ldots, n_d) := (w_1, \ldots, w_d) \in (A_k^d)^*$$
corresponding to $(n_1, \ldots, n_d)$ by defining
$$w_i := e_h(n_i) e_{h-1}(n_i) \cdots e_0(n_i).$$
In other words, we are taking the base-$k$ expansions of $n_1, \ldots, n_d$ and then "padding" the expansions of each $n_i$ at the beginning with 0's if necessary to ensure that each expansion has the same length. We say that a map $f\colon \mathbb{N}^d \to \Delta$ is $k$-automatic if there is a $d$-dimensional $k$-automaton $\mathcal{A} = (Q, A_k^d, \delta, q_0, \Delta, \tau)$ such that
$$f(n_1, \ldots, n_d) = \tau(\delta(q_0, w_k(n_1, \ldots, n_d))).$$
Similarly, we define a $k$-automatic subset of $\mathbb{N}^d$ to be a subset $S$ such that the characteristic function of $S$, $f\colon \mathbb{N}^d \to \{0, 1\}$, defined by $f(n_1, \ldots, n_d) = 1$ if $(n_1, \ldots, n_d) \in S$ and $f(n_1, \ldots, n_d) = 0$ otherwise, is $k$-automatic.

Example 2.9. Let $f\colon \mathbb{N}^2 \to \{0, 1\}$ be defined by $f(n, m) = 1$ if the sum of the binary digits of $n$ added to the sum of the binary digits of $m$ is even, and $f(n, m) = 0$ otherwise. Then $f(m, n)$ is a 2-automatic map. One can check that $f$ can be generated by the following 2-dimensional 2-automaton: $\mathcal{A} = (\{A, B\}, \{0, 1\}^2, \delta, A, \{0, 1\}, \tau)$, where $\delta(A, (0, 0)) = \delta(A, (1, 1)) = \delta(B, (1, 0)) = \delta(B, (0, 1)) = A$, $\delta(A, (1, 0)) = \delta(A, (0, 1)) = \delta(B, (0, 0)) = \delta(B, (1, 1)) = B$, $\tau(A) = 1$ and $\tau(B) = 0$.

Figure 6. A DFAO generating the map $f$ defined in Example 2.9
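The padding construction and the automaton of Example 2.9 combine as in the following sketch (ours; only the transition and output tables are taken from the example). It checks that the two-dimensional automaton outputs 1 exactly when the total number of 1's in the binary expansions of $n$ and $m$ is even.

```python
from itertools import product

delta = {('A', (0, 0)): 'A', ('A', (1, 1)): 'A', ('B', (1, 0)): 'A', ('B', (0, 1)): 'A',
         ('A', (1, 0)): 'B', ('A', (0, 1)): 'B', ('B', (0, 0)): 'B', ('B', (1, 1)): 'B'}
tau = {'A': 1, 'B': 0}

def f(n, m):
    # Pad the two binary expansions to a common length, then run the 2-dimensional automaton.
    width = max(len(format(n, 'b')), len(format(m, 'b')))
    u = format(n, 'b').zfill(width)
    v = format(m, 'b').zfill(width)
    state = 'A'
    for a, b in zip(u, v):
        state = delta[state, (int(a), int(b))]
    return tau[state]

# Compare with the definition: f(n, m) = 1 iff the total number of binary 1's is even.
assert all(f(n, m) == ((bin(n).count('1') + bin(m).count('1')) % 2 == 0)
           for n, m in product(range(64), repeat=2))
print(f(3, 5), f(3, 4))  # 1 0
```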

Just as $k$-automatic sequences can be characterised by the finiteness of the $k$-kernel, multidimensional $k$-automatic sequences have a similar characterisation.


Definition 2.1. Let $d$ be a positive integer and let $\Delta$ be a finite set. We define the $k$-kernel of a map $f\colon \mathbb{N}^d \to \Delta$ to be the collection of all maps of the form
$$g(n_1, \ldots, n_d) := f(k^a n_1 + b_1, \ldots, k^a n_d + b_d)$$

where $a \ge 0$ and $0 \le b_1, \ldots, b_d < k^a$. For example, if $f\colon \mathbb{N}^2 \to \{0, 1\}$ is the map defined in Example 2.9, then the 2-kernel of $f$ consists of the two maps $f_1(m, n) := f(m, n)$ and $f_2(m, n) := f(2m + 1, 2n)$.

Just as Eilenberg [20] showed that being $k$-automatic is equivalent to having a finite $k$-kernel for $k$-automatic sequences, Salon in Theorem 1 of [58] showed that a similar characterisation of multidimensional $k$-automatic maps holds.

Theorem 2.3 (Salon). Let $d$ be a positive integer and let $\Delta$ be a finite set. A map $f\colon \mathbb{N}^d \to \Delta$ is $k$-automatic if and only if its $k$-kernel is finite.
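Continuing the example, the following informal check (ours, restricted to a finite window of values) confirms that every map $f(2^a m + b_1, 2^a n + b_2)$ agrees on that window with one of the two maps $f_1 = f$ and $f_2(m, n) = f(2m + 1, 2n)$, in line with Theorem 2.3.

```python
def f(m, n):
    return (bin(m).count('1') + bin(n).count('1') + 1) % 2  # the map of Example 2.9

def window(g, size=16):
    return tuple(g(m, n) for m in range(size) for n in range(size))

kernel = set()
for a in range(5):
    for b1 in range(2 ** a):
        for b2 in range(2 ** a):
            kernel.add(window(lambda m, n, a=a, b1=b1, b2=b2: f(2 ** a * m + b1, 2 ** a * n + b2)))
print(len(kernel))  # 2
```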

3. Prime numbers and finite automata

In this section, we briefly discuss some results concerning primes and finite automata.

3.1. Primes and randomness. An efficient way to produce conjectures about prime numbers comes from the so-called Cramér probabilistic model (see [16], [64], and [65]). It is based on the principle that the set $\mathcal{P}$ of prime numbers behaves roughly like a random sequence, in which an integer of size about $n$ has – as suggested by the prime number theorem – a 1 in $\log n$ chance of being prime. Of course, this probabilistic model has some limitations: for instance, prime numbers are all odd with only one exception (see [53] for more about such limitations). Thus the set of prime numbers should be thought of as being a hybrid set rather than as a pseudorandom set (see the discussion in [66]). However, the Cramér model allows one to predict precise answers concerning occurrences of large gaps between consecutive prime numbers, of small gaps between primes (twin prime conjecture), and of some special patterns in $\mathcal{P}$ such as arithmetic progressions (Hardy–Littlewood conjectures). Some spectacular breakthroughs were made recently on the two latter topics; see in particular [26] and [25].

A consequence of this probabilistic way of thinking is that the set $\mathcal{P}$ should be sufficiently random that it cannot be recognised by a finite automaton. This result was in fact proved to be true by Minsky and Papert [49] in 1966.

Theorem 3.1 (Minsky and Papert). The set of prime numbers cannot be recognised by a finite automaton.

Schützenberger [60] (also see [29]) even proved the stronger result that an infinite automatic set always contains infinitely many composite numbers.

Theorem 3.2 (Schützenberger). No infinite subset of the set of prime numbers can be recognised by a finite automaton.


The intriguing question we are now left with is: how can we prove that a set of integers is not automatic? There are actually several different approaches: one can use the $k$-kernel and show that it is infinite, or one can use some density properties (the logarithmic frequency of an automatic set exists; also, if an automatic set has a positive density then this density is rational). Another very efficient tool is the so-called pumping lemma, which is recalled below. For more details about the different ways of proving that a sequence is not automatic, we refer the reader to [7].

Lemma 3.3 (pumping lemma). Let $\mathcal{N} \subseteq \mathbb{N}$ be a $k$-automatic set. Then for every sufficiently large integer $n$ in $\mathcal{N}$, there exist finite words $w_1$, $w_2$ and $w_3$, with $|w_2| \ge 1$, such that $n = [w_1 w_2 w_3]_k$ and $[w_1 w_2^i w_3]_k$ belongs to $\mathcal{N}$ for all $i \ge 1$.

Sketch of proof. Let $n = [a_r a_{r-1} \cdots a_0]_k$ be an element of $\mathcal{N}$ and assume that $r$ is larger than the number of states in the underlying automaton. By the pigeonhole principle, there is a state that is encountered twice when reading the input $a_r a_{r-1} \cdots a_0$, say just after reading $a_{j+1}$ and $a_{i+1}$, with $i < j$. Then setting $w_1 = a_r \cdots a_{j+1}$, $w_2 = a_j \cdots a_{i+1}$ and $w_3 = a_i \cdots a_0$ gives the result.

Proof of Theorem 3.2. Let us assume that $\mathcal{N}$ is an infinite $k$-automatic set consisting only of prime numbers. Let $p$ be an element of $\mathcal{N}$ that is sufficiently large to apply the pumping lemma. By the pumping lemma, there exist finite words $w_1$, $w_2$, and $w_3$, with $|w_2| \ge 1$, such that $p = [w_1 w_2 w_3]_k$ and such that all integers of the form $[w_1 w_2^i w_3]_k$, with $i \ge 1$, belong to $\mathcal{N}$. However, it is not difficult to see, by using Fermat's little theorem, that $[w_1 w_2^p w_3]_k \equiv [w_1 w_2 w_3]_k \pmod p$ and thus that $[w_1 w_2^p w_3]_k \equiv 0 \pmod p$. It follows that the integer $[w_1 w_2^p w_3]_k$ belongs to $\mathcal{N}$ but is not a prime number. Hence we obtain a contradiction.

3.2. Primes in automatic sets. We have just seen that the set of all prime numbers is not automatic. However, it is believed that many automatic sets should contain infinitely many prime numbers. The most basic example of such a result is the famous Dirichlet theorem.

Theorem 3.4 (Dirichlet). Let $a$ and $b$ be two relatively prime positive integers. Then the arithmetic progression $a\mathbb{N} + b$ contains infinitely many primes.

Note that the special case of the arithmetic progression $2\mathbb{N} + 1$ was already known to Euclid, by his famous proof that there are infinitely many prime numbers. A more complete discussion about Dirichlet's theorem can be found in [54]. Beyond Dirichlet's theorem, the more general result concerning automatic sets and prime numbers is Theorem 3.5 from [23]. Recall that an automaton is irreducible if for all pairs of states $(A, B)$ there is a path from $A$ to $B$. Recall also that a positive integer is an $r$-almost prime if it is the product of at most $r$ prime numbers. It is well known that results about almost-primes are much easier to prove than those concerning primes (compare for instance Chen's theorem [12] with known results about the twin prime conjecture and the Goldbach conjecture).

25. Automata in number theory

921

Theorem 3.5 (Fouvry and Mauduit). Given an automatic set N  N associated with an irreducible automaton, there exists a positive integer r such that N contains infinitely many r -almost primes. Theorem 3.5 is not too difficult to prove using results similar to Chen’s theorem. In contrast, to prove that there are infinitely many primes in sparse automatic sets such as ¹2n 1; n > 1º and ¹2n C 1; n > 1º appears to be extremely difficult. This would solve two long-standing conjectures about the existence of infinitely many Fermat primes and Mersenne primes. 3.3. A problem of Gelfond: the sum of digits of prime numbers. Given a natural number n and a base b , we let sb .n/ denote the sum of the digits of n in base b . Given two natural numbers a and m with 0 6 a < m and gcd.m; b 1/ D 1, one can then look at the set of positive integers n such that sb .n/  a . mod m/. This set is known to be recognisable by a finite b -automaton. In 1968, Gelfond [24] asked about the collection of prime numbers that belong to this set. Theorem 3.5 implies that such a set contains infinitely r -almost primes for some r , but until recently it was still not known whether it contains infinitely many primes. Remarkably, Mauduit and Rivat [46] proved a much stronger result that gives the exact proportion of primes that belong to this automatic set. As usual with analytic number theory, the proof of their result – which relies on strong estimates of exponential sums – is long and difficult. As an example, an immediate corollary of the work of Mauduit and Rivat is that half of prime numbers belong to the Thue–Morse set ¹1; 2; 4; 7; 8; 11; 13; : : :º. Theorem 3.6 (Mauduit and Rivat). One has ¹0 6 n 6 N; n 2 P and s2 .n/  1 .mod 2/º 1 lim D  N !1 ¹0 6 n 6 N; n 2 Pº 2

4. Expansions of algebraic numbers in integer bases

p The decimal expansions of classical constants like 2,  and e appear to be very mysterious and have baffled mathematicians for a long time. Numerical observations suggest that a complex underlying structure exists and several famous mathematicians have suggested possible rigorous definitions to try to formalise what “complex structure” actually means (see, for instance, [10], [50], and [30]). These mathematicians were mainly influenced by notions from probability theory, dynamical systems, or theoretical computer science. These pioneering works lead us to a cluster of interesting conjectures concerning expansions of irrational periods in integer bases. However, even some of the simplest questions one can ask about the decimal expansions of classical irrational constants are still far out of reach.

The seminal work of Turing [67] gives rise to a rough classification of real numbers. On one side we find computable real numbers; that is, real numbers whose binary (or more generally base-b ) expansion can be produced by a Turing machine, while on the other side lie uncomputable real numbers which, in some sense, “evade computers.”

922

Boris Adamczewski and Jason Bell

Though most real numbers belong to the second class (the first one being countable), classical mathematical constants are usually computable. Following the pioneering ideas of Turing, Hartmanis and Stearns [30] proposed the emphasis of the quantitative aspect of the notion of computability, and to take into account the number T .n/ of operations needed by a (multitape) Turing machine to produce the first n digits of the expansion. In this regard, a real number is considered to be simple if its base-b expansion can be produced quickly by a Turing machine. A general problem is then to determine where our mathematical constants take place in such a classification. It is a source of challenging open questions such as the Hartmanis–Stearns problem which asks whether there exists an irrational algebraic number computable in linear time; that is, with T .n/ D O.n/. In 1968, Cobham [14] suggested to restrict this problem to a particular class of Turing machines, namely to the case of finite automata. Several attempts at a resolution to this problem are due to Cobham in 1968 (see [14]) and to Loxton and van der Poorten (see [39] and [40]) during the 1980s. Both of these works are based on the so-called Mahler transcendence method, see [41]. The aim of this section is to give a proof, due to Adamczewski and Bugeaud [2], of Cobham’s conjecture following a completely different approach based on a deep Diophantine result known as the Schmidt subspace theorem. Theorem 4.1 (Adamczewski and Bugeaud). The base-b expansion of an algebraic irrational number cannot be generated by a finite automaton. 4.1. Rational approximations and transcendence of some automatic numbers. Given an integer k > 2, a real number is said to be k -automatic if there exists an integer b > 2 such that its base-b expansion is a k -automatic sequence. 4.1.1. Liouville’s inequality. In 1844, Liouville [38] proved that transcendental numbers exist. Moreover, he constructed explicit examples of such numbers. His approach relies on the famous Liouville inequality recalled below. Proposition 4.2 (Liouville’s inequality). Let  be an algebraic number of degree d > 2. Then there exists a positive real number c such that ˇ c p ˇˇ ˇ ˇ> d ˇ q q for every rational number p=q with q > 1. Proof. Let P denote the minimal polynomial of  , let P 0 denote its derivative, and set c WD 1=.1 C max jP 0 .x/j/: j xj 1, then our choice of c ensures that j p=qj > c =q d . Let us now assume that j p=qj < 1. Since P is the minimal polynomial of  , it does not vanish at p=q and q d P .p=q/ is a nonzero integer. Consequently, ˇ  p ˇ 1 ˇ ˇ (1) ˇP ˇ> d q q

25. Automata in number theory

Since j in .p=q

923

p=qj < 1, the mean value theorem implies the existence of a real number t 1; p=q C 1/ such that ˇ  p ˇ ˇ p ˇˇ ˇ ˇ ˇ jP .p=q/j D ˇP ./ P ˇ  jP 0 .t/j; ˇ D ˇ q q

which ends the proof in view of inequality (1) and the definition of c .

Liouville’s inequality can be used to easily construct transcendental numbers. Indeed, if  is an irrational real number such that for every integer d > 2 there exists a rational number p=q satisfying j p=qj < q d , then  is transcendental. Real numbers enjoying this property are termed Liouville numbers. The number L below is a typical example of Liouville number, often considered as the first example of a transcendental number. Theorem 4.3 (Liouville). The real number L WD

is transcendental.

C1 X

1 10nŠ nD1

Proof of Theorem 4.3. Let j > d > 2 be two integers. Then, there exists an integer pj such that j X 1 pj D  j Š 10 10nŠ nD1

Observe that

ˇ ˇ ˇL

pj ˇˇ X 1 2 1 < .j C1/Š <  ˇD 10j Š 10nŠ 10 .10j Š/d n>j

It then follows from Proposition 4.2 that L cannot be algebraic of degree less than d . Since d is arbitrary, L is transcendental. Adamczewski and Cassaigne [3] confirmed a conjecture of Shallit by proving that no Liouville number can be generated by a finite automaton. In other words, there is no automatic real number that can be proved to be transcendental by the elementary approach described above. However, we will see in the sequel how some deep improvements of Liouville’s inequality can be used in a similar way to prove the transcendence of irrational automatic numbers. 4.1.2. Roth’s theorem. The following famous improvement of Liouville’s inequality was established by Roth [57] in 1955. This result is the best possible in the sense that the exponent 2 C " in (2) cannot be lowered.

Theorem 4.4 (Roth). Let  be a real algebraic number and let " be a positive real number. Then there are only a finite number of rational numbers p=q such that q > 1 and ˇ 1 p ˇˇ ˇ (2) ˇ < 2C"  ˇ q q

924

Boris Adamczewski and Jason Bell

We give an immediate application of Roth’s theorem to the transcendence of automatic real numbers. Corollary 4.5. For every integer k > 3, the k -automatic real number C1 X

1 10k n nD1

is transcendental.

Proof. Use the same argument as in the proof of Theorem 4.3. However, Roth’s theorem gives no information on the arithmetical nature of the 2-automatic real number C1 X 1 ; 102n nD1

and indeed this number has bounded partial quotients. Let us now consider the word w defined in Example 2.2. We associate with w the real number X wn D 0:001 001 010 001 001 010 001 : : : w WD 10nC1 n>0

A characteristic of the number L and the numbers defined in Corollary 4.5 is that large blocks of zeros appear in their decimal expansion much more frequently than one would expect if the numbers we were dealing with were randomly selected. In contrast, the decimal expansion of w contains no occurrence of more than three consecutive zeros. However, the combinatorial structure of w can be used to reveal more hidden good rational approximations to w that imply the following result. Theorem 4.6. The 2-automatic real number w is transcendental. Proof. Let be the binary morphism defined in Example 2.4. For every positive integer j , set uj WD j .0/, sj WD juj j and let us consider the rational number j defined by j WD 0:uj! :

An easy computation shows that there exists an integer pj such that pj  (3) j D s 10 j 1 The rational number j turns out to be a very good approximation to w . Indeed, by definition of w , the decimal expansion of w begins with j .0/ j .0/ j 1 .0/, which is also a prefix of uj! . Consequently, the first .2 C1=3/sj D 7  3j 1 digits in the decimal expansion of w and of j are the same. We thus obtain that jw

j j < 10

.2C1=3/sj

:

Consequently, we infer from (4) and (3) that ˇ 1 pj ˇˇ ˇ  ˇ< ˇw s 10 j 1 .10sj 1/2:3

(4)

25. Automata in number theory

925

Furthermore, the rational numbers j are all different since n .0/ is not a prefix of the infinite word . m .0//! when n > m. It thus follows from Roth’s theorem that w is transcendental. 4.1.3. A p-adic version of Roth’s theorem. The following non-Archimedean extension of Roth’s theorem was proved in 1957 by Ridout [55]. For every prime number `, we let j  j` denote the `-adic absolute value, which is normalised such that j`j` D ` 1 . Thus given an integer n, we have jnj` D ` j where j denotes the largest integer for which `j divides n. Theorem 4.7. Let  be an algebraic number and " be a positive real number. Let S be a finite set of distinct prime numbers. Then there are only a finite number of rational numbers p=q such that q > 1 and Y  ˇ 1 p ˇˇ ˇ jpj`  jqj`  ˇ ˇ < 2C"  q q `2S

We point out a first classical consequence of Ridout’s theorem.

Corollary 4.8. The real number K WD

is transcendental.

C1 X

1 2n 10 nD1

Proof. Let j be a positive integer and let us consider the rational number j WD

j X

10

2n

:

nD1 j

There exists an integer pj such that j D pj =qj with qj WD 102 . Observe that X 1 2 2 D < ; jK j j D 2j C1 102n .qj /2 10 n>j

and set S WD ¹2; 5º. An easy computation gives that Y  Y 1 jqj j` D jqj j`  jpj j` 6 qj `2S

and thus

Y `2S

 jqj j`  jpj j`  jK

`2S

pj =qj j
1

Boris Adamczewski and Jason Bell

926

Unfortunately, the word y does not have sufficiently large initial repetitive patterns to prove the transcendence of y by means of Roth’s theorem as we did in Theorem 4.6. To overcome this difficulty we use a trick based on Ridout’s theorem that was first introduced by Ferenczi and Mauduit [22]. Theorem 4.9. The 3-automatic real number y is transcendental. Proof. For every integer j > 0, set uj WD . j .012020//, vj WD . j .021012//, rj WD juj j and sj WD jvj j. Let us also consider the rational number j defined by j WD 0:uj vj! :

An easy computation shows that there exists an integer pj such that j D

pj 10rj .10sj

1/

(5)



On the other hand, one can check that y begins with the word . j .0120200210120210120// D uj vj vj . j .0//:

Since . j .0// is a prefix of vj , we obtain that the first rj C 2sj C j. j .0//j D 19  3j digits in the decimal expansion of y and of j are the same. We thus have jy

j j
1º is infinite. It thus follows from Theorem 4.7 that y is transcendental, concluding the proof.

25. Automata in number theory

927

4.2. The Schmidt subspace theorem and a proof of Cobham’s conjecture. A wonderful multidimensional generalisation of Roth’s theorem was obtained by Schmidt in the early 1970s (see [59]). It is now referred to as the Schmidt subspace theorem or, for short, as the subspace theorem. We state below a heavily simplified p -adic version of this theorem. However, Theorem 4.10 turns out to be strong enough for our purpose. Theorem 4.10. Let m > 2 be an integer and " be a positive real number. Let S be a finite set of distinct prime numbers. Let L1 ; : : : ; Lm be m linearly independent (over the field of algebraic numbers) linear forms with real algebraic coefficients. Then the set of solutions x D .x1 ; : : : ; xm / in Zm to the inequality m Y m Y  Y jxi j`  jLi .x/j 6 .max¹jx1 j; : : : ; jxm jº/ " i D1 `2S

i D1

lies in finitely many proper vector subspaces of Qm .

Let us note that Roth’s theorem easily follows from Theorem 4.10. Let 0 <  < 1 be a real algebraic number and let " be a positive real number. Consider the two independent linear forms X Y and X . Choosing S D ¹;º, Theorem 4.10 implies that all the integer solutions .p; q/ to jqj  jq

pj < jqj

"

(9)

are contained in a finite union of proper vector subspaces of Q2 . There thus is a finite set of equations x1 X C y1 Y D 0; : : : ; x t X C y t Y D 0 such that, for every solution .p; q/ to (9), there exists an integer k with xk p C yk q D 0. This means that there are only finitely many rational solutions to j p=qj < jqj 2 " , which immediately gives Roth’s theorem. Proof of Theorem 4.1. Let 0 <  < 1 be an automatic irrational real number. Then there is an integer base b > 2 such that the base-b expansion of  is a k -automatic word for some integer k > 2. Let a denote the base-b expansion of  . By Theorem 2.1, there exist a coding ' from an alphabet A D ¹1; 2; : : : ; rº to the alphabet ¹0; 1; : : : ; b 1º and a k -uniform morphism  from A into itself such that a D '.u/;

where u is a fixed point of  . By the pigeonhole principle, the prefix of length r C 1 of u can be written in the form w1 cw2 cw3 , where c is a letter and w1 , w2 , w3 are (possibly empty) finite words. For every integer j > 1, set uj D '. j .w1 //, vj D '. j .cw2 //, and vj0 D '. j .c//. Since  is a k -uniform morphism and ' is a coding, we get that juj j D s  k j ;

jvj j D t  k j ;

jvj0 j D k j ;

where s WD ju1 j and t WD jv1 j. Thus the base-b expansion of  begins with the word uj vj vj0 , that is,  D 0:uj vj vj0    :

Boris Adamczewski and Jason Bell

928

Let j be the rational number whose base-b expansion is the infinite word uj vj! , that is, j D 0:uj vj! : A simple computation shows that there exists an integer pj such that pj j D  j sk b .b t k j 1/ Since j and  have the same first .s C t C 1/  k j digits, we have 1 j j j <  .sCt C1/k j b Henceforth, we assume that  is an algebraic number, and we will reach a contradiction. Consider the three linearly independent linear forms with real algebraic coefficients: L1 .X1 ; X2 ; X3 / D X1 X2 X3 ; L2 .X1 ; X2 ; X3 / D X1 ; L3 .X1 ; X2 ; X3 / D X2 :

For j > 1, evaluating them on the integer triple

j

we obtain that

j

xj WD .x1.j / ; x2.j / ; x3.j / / WD .b .sCt /k ; b sk ; pj /; 3 Y

i D1

jLi .xj /j 6 b .2sCt

1/k j

(10)

:

On the other hand, letting S be the set of prime divisors of b , we get that 3 Y Y

i D1 `2S

jxi.j / j` 6

Y

`2S

j

jb .sCt /k j` 

Y

`2S

j

jb sk j` D b

Combining (10) and (11), we get that 3 Y 3 Y  Y jxi.j / j`  jLi .xj /j 6 b i D1 `2S

kj

.2sCt /k j

:

(11)

:

i D1

Set " D 1=.s C t/. We thus obtain 3 Y 3 Y  Y j j .j / jxi j`  jLi .xj /j 6 .max¹b .sCt /k ; b sk ; pj º/ " ; i D1 `2S

i D1

for every positive integer j . We then infer from Theorem 4.10 that all integer points xj lie in a finite number of proper vector subspaces of Q3 . Thus there exist a nonzero integer triple .z1 ; z2 ; z3 / and an infinite set of distinct positive integers J such that j

j

z1 b .sCt /k C z2 b sk C z3 pj D 0;

(12)

25. Automata in number theory

929

j

for every j in J. Recall that pj =b .sCt /k tends to  when j tends to infinity. Dividj ing (12) by b .sCt /k and letting j tend to infinity along J, we get that  is a rational number since .z1 ; z2 ; z3 / is a nonzero triple. This provides a contradiction.

5. The Skolem–Mahler–Lech theorem in positive characteristic 5.1. Zeros of linear recurrences over fields of characteristic zero. The Skolem– Mahler–Lech theorem is a celebrated result which describes the set of solutions in n to the equation a.n/ D 0, where a.n/ is a sequence satisfying a linear recurrence over a field of characteristic 0. We recall that if K is a field and a.n/ is a K-valued sequence, then a.n/ satisfies a linear recurrence over K if there exists a natural number d and values c1 ; : : : ; cd 2 K such that a.n/ D

d X

ci a.n

i/

i D1

for all sufficiently large values of n. The zero set of the linear recurrence a is defined by Z.a/ WD ¹n 2 NW a.n/ D 0º:

Theorem 5.1 (Skolem, Mahler, and Lech). Let a be a linear recurrence over a field of characteristic 0. Then Z.a/ is a union of a finite set and a finite number of arithmetic progressions. This theorem was first proved for linear recurrences over the rational numbers by Skolem [63]. It was next proved for linear recurrences over the algebraic numbers by Mahler [42]. The version above was proven first by Lech [35] and later by Mahler in [43] and [44]. This history of this theorem can be found in the book by Everest van der Poorten, Shparlinski, and Ward [21]. The techniques used by Lech to prove the Skolem–Mahler–Lech theorem are a modification of a method first used by Skolem [63]. The idea of the proof is to first note that it is no loss of generality to assume that K is a finitely generated extension of Q. We can then embed K in a p -adic field Qp for some prime p . One can then show that there exists a natural number a such that for each i D 0; : : : ; a 1, there is a p -adic analytic map i on Zp such that i .n/ D f .an C i / for all sufficiently large positive integers n 2 N. If f .an C i / is zero for infinitely many natural numbers n, then the map i has infinitely many zeros in Zp . Since an analytic function cannot have infinitely many zeros in a compact subset of its domain of convergence unless that function is identically zero, this implies that either f .an C i / D 0 for all n sufficiently large, or there are only finitely many n for which f .an C i / D 0, which gives the result. There are many different proofs and extensions of the Skolem–Mahler–Lech theorem in the literature, see [9], [28], [68], and [21]. These proofs all use p -adic methods in

Boris Adamczewski and Jason Bell

930

some way, although the result is valid in any field of characteristic 0. A well-known aspect of theorem SML is that it is an ineffective result. Indeed, it is still an open problem whether the set Z.a/ can always be determined for a given linear recurrence a.n/ defined over a field of characteristic 0 (see the discussions in [21] and [66]). In particular, it is still unknown whether the fact that Z.a/ is empty or not is a decidable question. 5.2. Zeros of linear recurrences over fields of positive characteristic 5.2.1. Pathological examples over fields of positive characteristic. It is interesting to note that the Skolem–Mahler–Lech theorem does not hold for fields K of positive characteristic. The simplest counter-example was given by Lech [35]. Let p be a prime and let K D Fp .t/ be the field of rational functions in one variable over Fp . Let a.n/ WD .1 C t/n

tn

1:

It is easy to check that a.n/ satisfies the recurrence a.n/

1/ C .1 C 3t C t 2 /a.n

.2 C 2t/a.n

2/

for n > 3. On the other hand, we have

a.p j / D .1 C t/p

j

tp

j

and a.n/ 6D 0 if n is not a power of p , and so we have

.t C t 2 /a.n

3/ D 0

1D0

Z.a/ D ¹1; p; p 2 ; p 3 ; : : :º:

In fact, there are even more pathological examples, which show that the correct analogue of the Skolem–Mahler–Lech theorem in positive characteristic is much more subtle. For example, consider the sequence a.n/ in F2 .x; y; z/ defined by a.n/ WD .x C y C z/n

.x C y/n

.x C z/n

.y C z/n C x n C y n C z n :

We note that if V denotes the K-vector space consisting of all K-valued sequences and S W V ! V is the “shift” linear operator that sends a sequence a.1/; a.2/; : : : to the sequence 0; a.1/; a.2/; : : : , then a.n/ satisfies a linear recurrence if and only if there is a nonzero polynomial P .t/ with coefficients in K such that when P .S / is applied to the sequence a.n/ we obtain a sequence whose terms are eventually zero. Then one can see that the operator .1

.x C y C z/S /.1

.x C y/S /.1

.y C z/S /.1

xS /.1

yS /.1

zS /

sends the sequence a.n/ to a sequence whose terms are eventually zero. We claim that the zero set of a.n/ is precisely all natural numbers n of the form 2i C 2j or of the form 2i . To see this, observe that a.2i / D 0 follows simply from the i i i fact that .b C c/2 D b 2 C c 2 for elements b and c in a field of characteristic 2. To check that a.2i C 2j / D 0 we note that G.x1 ; y1 ; z1 I x2 ; y2 ; z2 / WD .x1 C y1 C z1 /.x2 C y2 C z2 / .x1 C y1 /.x2 C y2 /

.x1 C z1 /.x2 C z2 /

.y1 C z1 /.y2 C z2 / C x1 x2 C y1 y2 C z1 z2

25. Automata in number theory

931

is identically zero in every field. Notice that if c1 ; c2 ; c3 2 F2 then i C2j

.c1 x C c2 y C c3 z/2

Hence

i

i

i

j

j

j

D .c1 x 2 C c2 y 2 C c3 z 2 /.c1 x 2 C c2 y 2 C c3 z 2 /: i

i

i

j

j

j

a.2i C 2j / D G.x 2 ; y 2 ; z 2 I x 2 ; y 2 ; z 2 / D 0:

On the other hand, if n is not a power of 2 or of the form 2i C 2j , then we can write n D 2i C 2j C 2k m where i > j , 2j > 2k m and m is an odd positive integer. Note that i

j

k

.x C y C z/n D .x C y C z/2 .x C y C z/2 ..x C y C z/2 /m i

i

i

j

j

j

k

k

k

D .x 2 C y 2 C z 2 /.x 2 C y 2 C z 2 /.x 2 C y 2 C z 2 /m : i

j

k

Consider the coefficient of x 2 y 2 z 2 m in .x C y C z/n . The only way to get this term i j k is to take x 2 from the first term in the product, y 2 from the second term, and z 2 m from the third term. Hence the coefficient is 1. Since .x C y C z/n is the only term in a.n/ that has monomials of the form x b y c z d with b; c; d > 0 appearing, we see that a.n/ is nonzero if the binary expansion of n has more than two 1’s. 5.2.2. Derksen’s theorem. We now give a remarkable result due to Derksen [17]. We have seen that the zero set of a linear recurrence in a field of characteristic p > 0 is often more pathological than in characteristic zero. At the same time, in our pathological examples, the base-p expansion of a number n gives insight into whether the n-th term of our linearly recurrent sequence vanishes. In fact, Derksen [17] shows that the zero set of a linearly recurrent sequence can always be described in terms of automata. Theorem 5.2 (Derksen). Let a be a linear recurrence over a field of characteristic p . Then the set Z.a/ is a p -automatic set. Derksen gives a further refinement of this result; however the main ingredient of his proof is the fact that the zero set is p -automatic. Furthermore, each step in Derksen’s proof can be made effective! We prove an extension of Derksen’s result for algebraic power series in several variables in the next section. To explain the connection between Derksen’s result and power series, we recall the following classical result. Proposition 5.3. Let K be a field and let a.n/ be a K-valued sequence. The following conditions are equivalent: i. the sequence a.n/ satisfies a linear recurrence over KI ii. there is a natural number d , a matrix A 2 Md .K/, and vectors v and w in Kd such D w T An vI P that a.n/ n iii. n>0 a.n/t is the power series expansion of a rational function in K.t/. Proof. (i) H) (ii). Suppose that a.n/ satisfies a linear recurrence a.n/ WD

d X

j D1

cj a.n

j/

Boris Adamczewski and Jason Bell

932

for all n > d . We let v.i / WD Œa.i /a.i C 1/    a.i C d

1/T

and w WD Œ1 0 0 : : : 0T :

Finally, we let

0

B B B A WD B B @

0 0 :: : 0 cd

1 0 :: : 0 cd

0 1 :: : 0 1

cd

2

0 0 :: : 

cd

3

 

0 0 :    :: 0 1    c1

1

C C C C: C A

Then one easily sees that v.i C 1/ D Av.i / and so w T An v D a.n/, where v D v.0/.

(ii) H) (iii). Set

f .t/ WD

1 X

.w T An v/t n :

nD0

By the Cayley–Hamilton theorem, A satisfies a polynomial d

A C

d X1

cj Aj D 0

d X1

cj w T Aj Cn v D 0

j D0

and hence w T AnCd v C Pd

j D0

1 d j / j D0 cj t

for all n. It follows that f .t/.1 C is a polynomial in t and so f .t/ is the power series expansion of a rational function. P n (iii) H) (i). Suppose that f .t/ D 1 nD0 a.n/t is the power series expansion of a rational function P .t/=Q.t/ with P .t/ and Q.t/ polynomials and Q.t/ nonzero. We P may assume that Q.0/ D 1. We write Q.t/ D 1 C jdD1 cj t j . Then P .t/ D f .t/Q.t/ P and so a.n/ C jdD1 cj a.n j / D 0 for all n larger than the degree of P .t/. It follows that a.n/ satisfies a linear recurrence. 5.3. Vanishing coefficients of algebraic power series. In light of Proposition 5.3, we may interpret Derksen’s result as a statement about the zero coefficients of the power series expansion of a rational power series over a field of characteristic p > 0. In this section, we show that this interpretation gives rise to a far-reaching generalisation of Derksen’s result. We first note that rational power series are a subset of algebraic power series (choosing m D 1 in the definition below).

25. Automata in number theory

933

Definition 5.1. Let K be a field. We say that a power series f .t/ D

1 X

nD0

a.n/t n 2 KŒŒt

is algebraic if it is algebraic over the field of rational functions K.t/, that is, if there exists a natural number m and polynomials A0 .t/; : : : ; Am .t/ 2 KŒt, with Am .t/ nonzero, such that m X Aj .t/f .t/j D 0: j D0

More generally, we say that f .t1 ; : : : ; td / 2 KŒŒt1 ; : : : ; td  is algebraic if there exist polynomials A0 ; : : : ; Am 2 KŒt1 ; : : : ; td , not all zero, such that m X

j D0

Aj .t1 ; : : : ; td /f .t1 ; : : : ; td /j D 0:

Given a multivariate power series X n f .t1 ; : : : ; td / D an1 ;:::;nd t1n1    td d 2 KŒŒt1 ; : : : ; td ; n1 ;:::;nd

we let Z.f / denote the set of vanishing coefficients, that is,

Z.f / D ¹.n1 ; : : : ; nd / 2 Nd W an1 ;:::;nd D 0º:

It is interesting to note that the Skolem–Mahler–Lech theorem in characteristic 0 has no analogue for multivariate rational functions. For instance, X f .t1 ; t2 / D .2m n2 /t1m t2n m;n

is a bivariate rational power series in QŒŒt1 ; t2  with

Z.f / D ¹.m; n/ j m  0 . mod 2/; n D 2m=2 º:

Thus we cannot expect the zero set to be given in terms of arithmetic progressions or even in terms of finite automata. To see some of the complexities that can occur in the multivariate case, consider the power series X f .t1 ; t2 / D .3m 2n 1/t1m t2n : m;n>0

We see that

f .t1 ; t2 / D .1

3t1 /

1

.1

t2 /

1

.1

t1 /

1

.1

2t2 /

1

.1

t1 /

1

.1

t2 /

1

;

and so it is a rational power series. On the other hand, the coefficient of t1m t2n is zero if and only if 3m D 2n C 1. It is now known that this occurs only when .m; n/ is .2; 3/ or .1; 1/, due to Mihăilescu’s solution to Catalan’s conjecture [48]. In general, finding the zero set often involves difficult diophantine problems.

934

Boris Adamczewski and Jason Bell

Remarkably, in positive characteristic an analogue of Derksen’s result holds for multivariate rational power series, as shown in [1] – in fact it even holds for multivariate algebraic power series! In the sequel of this chapter, we will use n and j to represent, respectively, the d -tuple of natural numbers .n1 ; : : : ; nd / and .j1 ; : : : ; jd /. We will also let t n denote the monomial t1n1    tdnd . Theorem 5.4 (Adamczewski and Bell). Let K be a field of characteristic p > 0 and let f .t/ 2 KŒŒt be the power series expansion of an algebraic function over K.t/. Then Z.f / is a p -automatic subset of Nd . We note that this immediately implies Theorem 5.2 by taking d D 1 and taking f .t/ to be a rational function. On the other hand, by taking K to be a finite field, Theorem 5.4 reduces to the difficult part of the multivariate version of Christol’s theorem (see Theorem 6.2). As with Derksen’s proof, it seems that Theorem 5.4 can be made effective. Furthermore, given any p -automatic set N in Nd , N is the zero set P n of the power series n62N t 2 Fp ..t// which is known to be algebraic over Fp .t/ by Theorem 6.2. At this level of generality, we thus have a nice correspondence between p -automatic sets and the zero set of algebraic multivariate functions over fields of characteristic p . 5.3.1. Proof of Theorem 5.4. In order to prove this result we need to introduce some notation. Let p be a prime number and let d be a natural number. For j D .j1 ; : : : ; jd / 2 ¹0; 1; : : : ; p 1ºd , we define ej W Nd ! Nd by ej .n1 ; : : : ; nd / WD .pn1 C j1 ; : : : ; pnd C jd /:

(13)

We let † denote the semigroup generated by the collection of all ej under composition. Remark 5.5. Note that if  is a finite set, then f W Nd !  is p -automatic if and only if the set of functions ¹f ı e j e 2 †º is a finite set. We also recall that a field K of characteristic p > 0 is perfect if the map x 7! x p is surjective on K. Let p be a prime P number and let K be a perfect field of characteristic p . For a power series f .t/ WD n2Nd a.n/t n 2 KŒŒt, we define X Ej .f .t// WD .a ı ej .n//1=p t n (14) n2Nd

for j 2 ¹0; 1; : : : ; p 1ºd . We let  denote the semigroup generated by the collection of Ej under composition. We let .f / denote the K-vector space spanned by all power series of the form E ı f with E 2 . We note that if g 2 .F / then E ı g 2 .f / for all E 2 . A theorem of Sharif and Woodcock [62] gives a concrete characterisation of the algebraic power series over a perfect field of positive characteristic.

25. Automata in number theory

935

Theorem 5.6 (Sharif and Woodcock). Let p be a prime number and let K be a perfect field of characteristic p . A power series f .t/ 2 KŒŒt is algebraic if and only if .f / is a finite-dimensional K-vector space. One can rephrase the theorem of Sharif and Woodcock in terms of the coefficients of an algebraic power series. Lemma 5.7. Let p be a prime number, let K be a perfect field of characteristic p , and let aW Nd ! K be a sequence with the property that X f .t/ WD a.n/t n 2 KŒŒt n2Nd

is a nonzero algebraic function over K.t/. Then there exists a natural number m and there exist maps a1 ; : : : ; am W Nd ! K with the following properties: P i. the formal power series fi .t/ WD n2Nd ai .n/t n , 1 6 i 6 m, form a basis of .f / as a K-vector space; ii. f1 D f ; P iii. if bW Nd ! K has the property that g.t/ WD n2Nd b.n/t n belongs to .f /, p p then b ı ej 2 Ka1 C    C Kam for every j 2 ¹0; : : : ; p 1ºd : Proof. Since f .t/ is algebraic, dimK ..f // is finite by Theorem 5.6. P We can thus pick maps a1 ; : : : ; am W Nd ! K such that the m power series fi .t/ WD n2Nd ai .n/t n form P a basis of .f /, and with f1 D f . Let bW Nd ! K be such that g.t/ WD n2Nd b.n/t n belongs to .f /. Observe that the power series g can be decomposed as X t j Ej .g.t//p : (15) g.t/ D j 2¹0;:::;p 1ºd

By assumption, Ej .g.t// 2 Kf1 .t/C  C Kfm .t/ and hence Ej .g.t//p 2 Kf1 .t/p C    C Kfm .t/p . Let j 2 ¹0; 1; : : : ; p 1ºd . Considering the coefficient of t pnCj in equation (15), we see that b ı ej .n/ is equal to the coefficient of t pn in Ej .g.t//p , which belongs to Ka1 .n/p C    C Kam .n/p . Before proving Theorem 5.4, we first fix a few notions. Given a finitely generated field extension K0 of Fp , we let Khpi 0 denote the subfield consisting of all elements of the form x p with x 2 K0 . Given Fp -vector subspaces V and W of K0 we let V W denote the Fp -subspace of K0 spanned by all products of the form vw with v 2 V; w 2 W . We let V hpi denote the Fp -vector subspace consisting of all elements of the form v p with v 2 V . We note that since K0 is a finitely generated field extension of Fp , K0 is a finite-dimensional Khpi 0 -vector space. If we fix a basis K0 D

r M i D1

Khpi 0 hi

936

Boris Adamczewski and Jason Bell

then we have projections 1 ; : : : ; r W K0 ! K0 defined by r X xD i .x/p hi :

(16)

i D1

Remark 5.8. For 1 6 i 6 r and a; b; c 2 K0 we have

i .c p a C b/ D ci .a/ C i .b/:

The last ingredient of the proof is a technical (but very useful) result due to Derksen, which we state here without proof. Proposition 5.9 (Derksen). Let K0 be a finitely generated field extension of Fp and let 1 ; : : : ; r W K0 ! K0 be as in equation (16). Let V be a finite-dimensional Fp -vector subspace of K0 . Then there exists a finite-dimensional Fp -vector subspace W of K0 containing V such that i .W V /  W for 1 6 i 6 r .

Proof of Theorem 5.4. By enlarging K if necessary, we may assume that K is perfect. By Lemma 5.7 we can find maps a1 ; : : : ; am W Nd ! K with the following properties: P 1. the power series fi .t/ WD n2Nd ai .n/t n , 1 6 i 6 m, form a basis for .f /; 2. f1 D f ; P 3. if bW Nd ! K has the property that g.t/ WD n2Nd b.n/t n belongs to .f /, p then b ı ej 2 Ka1p C    C Kam for every j 2 ¹0; : : : ; p 1ºd . In particular, given 1 6 i 6 m and j 2 ¹0; 1; : : : ; p 1ºd , there are elements .i; j ; k/, 1 6 k 6 m, such that m X ai ı ej D .i; j ; k/akp : (17) kD1

Since f1 ; : : : ; fm are algebraic power series, there exists a finitely generated field extension of Fp such that all coefficients of f1 ; : : : ; fm are contained in this field extension. It follows that the subfield K0 of K generated by the coefficients of f1 .t/; : : : ; fm .t/ and all the elements .i; j ; k/ is a finitely generated field extension of Fp . Since K0 is a finite-dimensional Khpi 0 -vector space, we can fix a basis ¹h1 ; : : : ; hr º of K0 , that is, r M K0 D Khpi 0 hi : i D1

Then we have projections 1 ; : : : ; r W K0 ! K0 defined by r X cD i .c/p hi : i D1

(18)

We let V denote the finite-dimensional Fp -vector subspace of K0 spanned by the elements .i; j ; k/, 1 6 i; k 6 m and j 2 ¹0; 1; : : : ; p 1ºd , and by 1. By equation (17), we have m X p ai ı ej 2 Vak ; (19) kD1

25. Automata in number theory

937

for 1 6 i 6 m and j 2 ¹0; 1; : : : ; p 1ºd . By Proposition 5.9 there exists a finitedimensional Fp -vector subspace W of K0 containing V such that i .W V /  W for 1 6 i 6 r . Set U WD W a1 C    C W am  ¹b j bW Nd ! K0 º: We note that Card U 6 .Card W /m < 1. Note also that if we take ` 2 ¹1; : : : ; rº, i 2 ¹1; : : : ; mº, and j 2 ¹0; 1; : : : ; p 1ºd then by equation (19) and Remark 5.8 we have p ` .W ai ı ej /  ` .W Va1p C    C W Vam /

 

m X

kD1 m X

kD1

` .W V /ak W ak D U:

Thus by Remark 5.8, if b 2 U and j 2 ¹0; 1; : : : ; p 1ºd , then b` WD ` .b ı ej / 2 U for 1 6 ` 6 r . In particular, b.pn C j / D 0 if and only if b1 .n/ D b2 .n/ D    D br .n/ D 0. Given bW Nd ! K0 , we let b W Nd ! ¹0; 1º be defined by ´ 0 if b.n/ 6D 0; b .n/ D 1 if b.n/ D 0: Set

S WD ¹b1    b t j t > 0; b1 ; : : : ; b t 2 U º: We note that since 2b D b for all b 2 U and U is finite, S is finite. Note that if b 2 U and j 2 ¹0; 1; : : : ; p 1ºd , then b` WD ` .b ı ej / 2 U for 1 6 ` 6 r . By the above remarks, r Y .b ı ej /.n/ D b` .n/; `D1

and so we see that if  2 S then  ı e 2 S for all e 2 †. Since S is finite, this proves that W Nd ! ¹0; 1º is p -automatic. In particular, since a.n/ D a1 .n/ 2 U , we obtain that a is p -automatic. In other words, the set of n 2 Nd such that a.n/ D 0 is a p -automatic set. This ends the proof.

6. The algebraic closure of Fp .t/ 6.1. Christol’s theorem. One of the most beautiful results in the theory of automatic sequences is Christol’s theorem, which characterises those formal power series with coefficients in a finite field that are algebraic over the field of rational functions. Theorem Let K be a finite field of characteristic p > 0. Then P 6.1 (Christol). n f .t/ D 1 n>0 a.n/t 2 KŒŒt is algebraic over K.t/ if and only if the sequence a.n/ is p -automatic.

938

Boris Adamczewski and Jason Bell

Christol’s theorem consists of two parts: the “easy” direction in which one shows that if the sequence of coefficients of a Laurent series is p -automatic, then the Laurent series is algebraic, and the “hard” direction in which one must show that the coefficients of an algebraic Laurent series form a p -automatic sequence. The hard direction is generally proved using Ore’s lemma, which is the observation that if f .t/ is algebraic over 2 a field K.t/, then the set ¹f; f p ; f p ; : : :º is linearly dependent over K.t/. Christol’s theorem was generalised to multivariate formal power series by Salon [58]. Theorem 6.2 (Salon). Let K be a finite field of characteristic p > 0. Then f .t/ D P n n2Nd a.n/t 2 KŒŒt is algebraic if and only if the sequence a.n/ is p -automatic. Salon’s theorem turns out to be a special case of Theorems 5.6 and 5.4.

Proof of Theorem 6.2. We suppose first that aW Nd ! K is p -automatic and we consider the power series X f .t/ WD a.n/t n : n2Nd

Using the notation of equations 13 and 14, we infer from Remark 5.5 that there are only finitely many distinct functions of the form a ı e where e runs over †. Consequently, there are only finitely many functions of the form E ı f where E runs over . Thus .f / is finite-dimensional and Theorem 5.6 implies that f .t/ is algebraic. We next suppose that f .t/ is algebraic and let c 2 K. Since f .t/ is algebraic, then so is f .t/ c and by Theorem 5.4 the set Sc of d -tuples of natural numbers n such that a.n/ c D 0 is p -automatic. It follows that the sequence ac W Nd ! K given byPac .n/ D 1 if n 2 Sc and ac .n/ D 0 otherwise is p -automatic. Thus a.n/ D c2K cac .n/ is also p -automatic, as p -automatic sequences taking values in a field are closed under the taking of finite sums and scalar multiplication. While Christol’s theorem can be extended to give a concrete description of the elements of Fq ..t// that are algebraic over Fq .t/, it does not give the whole picture. As Kedlaya [32] points out, the field Fq ..t// is far from being algebraically closed. Indeed, for an algebraically closed field K of characteristic 0, a classical result of Puiseux is that the field 1 [ K..t 1= i // i D1

is itself algebraically closed and contains, in particular, the algebraic closure of K.t/. However, over field of positive characteristic, the situation is more subtle. In particular, the algebraic closure of Fq ..t// is much more complicated to describe, due to the existence of wildly ramified field extensions. For instance, Chevalley remarked [13] that the Artin–Schreier polynomial x p x 1=t does not split in the Puiseux field SC1 1=n //. nD1 Fq ..t 6.2. Generalised power series. It turns out that the appropriate framework to describe the algebraic closure of Fp .t/ is provided by the fields of generalised power series Fq ..t Q // introduced by Hahn [27]. We briefly describe this construction.

25. Automata in number theory

939

We recall that a subset S of a totally ordered group is said to be well ordered if every nonempty subset of S has a minimal element or, equivalently, if there is no infinite decreasing sequence within S . Given a commutative ring R and a totally ordered Abelian group G we construct a commutative ring, which we denote R..t G //, which is defined to be the collection of all elements of the form X f .t/ WD r˛ t ˛ ˛2G

which satisfy the following conditions:

i. r˛ 2 R for all ˛ 2 G ; ii. the support of f .t/ is well ordered, that is, the subset ¹˛W r˛ ¤0R º is a wellordered set. Addition and multiplication are defined via the rules X X X .r˛ C s˛ /t ˛ s˛ t ˛ D r˛ t ˛ C

and

X

˛2G

˛2G

˛2G

˛2G

r˛ t ˛

 X

˛2G

 XX .rˇ s˛ s˛ t ˛ D

ˇ /t

˛

:

˛2G ˇ 2G

We note that the fact that the supports of valid series expansions are well ordered means that no problems with possible infinite sums appearing in the expression for the coefficients in a product of two generalised power series will occur. We call the ring R..t G // the ring of generalised power series over R with exponents in G . We recall that a group is divisible if for every g 2 G and n > 1, there exists some h 2 G such that hn D g . For an algebraically closed field K and a totally ordered divisible Abelian group G , the field K..t G // is known to be algebraically closed [31] (see also [32] and [61]). In what follows, we will only consider the particular case of the divisible group Q and of a finite field Fq (q being a power of a prime p ). We then have the series of containments Fq .t/  Fq ..t//  Fq ..t Q //:

Though Fq ..t Q // is not algebraically closed, it is sufficient for our purpose to S consider such fields. Indeed, taking n>1 FpnŠ as an algebraic closure of Fp , it follows  Q S from the remark above that the field n>1 Fp nŠ ..t // is algebraically closed. For example, the Artin–Schreier polynomial x p x 1=t does split in Fp ..t Q //. Indeed, we can check that the generalised power series cC

are the roots of this polynomial.

1 X i D1

t

1=p i

;

c 2 Fp ;

Boris Adamczewski and Jason Bell

940

6.3. Kedlaya’s theorem. Kedlaya [32] considered whether one can, as in Christol’s theorem, give an automaton-theoretic characterisation of the elements of Fq ..t Q // that are algebraic over Fq .t/. The work of Kedlaya [33] is thus precisely devoted to a description of the algebraic closure of Fp .t/ as a subfield of generalised power series. For this purpose, Kedlaya introduces the notion of a p -quasi-automatic function over the rational numbers. Kedlaya uses automata to produce power series whose exponents take values in the rational numbers. Hence it is necessary to create automata which accept rational numbers as opposed to just accepting integers. We now explain how Kedlaya does this. Let k > 1 be a positive integer. We set †0k D ¹0; 1; : : : ; k

1; º

and we let L.k/ denote the language on the alphabet †0k consisting of all words on †0k with exactly one occurrence of the letter ‘’ (the radix point) and whose first and last letters are not equal to 0. This is a regular language (see Lemma 2.3.3 in [33]. We let Sk denote the set of nonnegative k -adic rationals, that is, Sk D ¹a=k b j a; b 2 Z; a > 0º:

We note that there is a bijection Œk W L.k/ ! Sk given by s1    si

1 si C1

   sn 2 L.k/ 7 !

i 1 X

j D1

sj k i

1 j

C

n X

sj k i

j

;

j Di C1

where s1 ; : : : ; si 1 ; si C1 ; : : : ; sn 2 ¹0; 1; : : : ; k 1º. So, for example, we have Œ110324 D Œ2087510 D 167=8. We also note that the fact that we exclude strings whose initial and terminal letters are 0 means that we have the awkward looking expression Œk D 0.

Definition 6.1. We say that a map hW Sk !  is k -automatic if there is a finite state machine which takes words on †0k as input such that for each W 2 Lk , h.ŒW k / is generated by the machine using the word W as input.

Since the support of a generalised power series is well ordered, we need a more general notion of automatic functions defined over the set of rationals. For this purpose, we always implicitly consider sets  containing a special element called zero, which we let 0 denote (of course, when  is a subset of R or N, or if it denotes a finite field, zero will preserve its usual meaning). Then we will talk about functions hW Q !  as being k -automatic if their support is contained in Sk and the restriction of h to Sk is k -automatic (the support of such a function being defined as S D ¹˛ 2 QW h.˛/ ¤ 0º). Example 6.1. For w 2 L.2/, define ´ 0 if there are an even number of 10 s in wI h.Œw2 / D 1 otherwise. Then hW S2 ! ¹0; 1º is K2 -automatic.

25. Automata in number theory 0;

941

0; 1

A=0

B=1

1 Figure 7. The DFAO associated with the function h of Example 6.1

Definition 6.2. Let k be a positive integer, let  be a finite set containing a special element 0, and let hW Q ! . We say that h is k -quasi-automatic if it satisfies the following conditions:

i. the support S of h is well ordered; ii. there exist a positive integer a and an integer b such that the set aS Cb consists of nonnegative k -adic rationals and the map h..x b/=a/ is a k -automatic function from Sk to . We are now ready to state Kedlaya’s theorem. Theorem P 6.3 (Kedlaya). Let p be a prime, let q be a power of p , and let aW Q ! Fq . Then ˛2Q a.˛/t ˛ is algebraic over Fq .t/ if and only if the function aW Q ! Fq is p -quasi-automatic. In light of Salon’s result [58], Kedlaya asked whether his theorem has an extension Q to multivariate generalised power series Fq ..t1Q ; : : : ; tm //. As far as we know, this problem has not yet been solved.

7. Update Since the writing of this chapter in 2010 there has been additional work related to the topics we just discussed. Here we point out a few such references. Concerning § 2.3, we mention the papers of Mauduit and Rivat [47], Martin, Mauduit, and Rivat [45], and Müllner [51]. Concerning § 4, we mention an extension of Theorem 4.1 to deterministic pushdown automata due to Adamczewski, Cassaigne and Le Gonidec [4]. Also, a new proof of Theorem 4.1 and some generalisations have been obtained recently by using the so-called “Mahler method” (see Philippon [52], Adamczewski and Faverjon [5] and [6]). Concerning § 5, we mention the work of Derksen and Masser [18] and [19], Leitnik [36] and [37], and Bell and Moosa [8]. Concerning § 6, we mention the papers of Kedlaya [34] and Bridy [11]. Acknowledgement. The work of this article was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under the Grant Agreement № 648132.

942

Boris Adamczewski and Jason Bell

References [1] B. Adamczewski and J. Bell, On vanishing coefficients of algebraic power series over fields of positive characteristic. Invent. Math. 187 (2012), no. 2, 343–393. MR 2885622 Zbl 1257.11027 q.v. 934 [2] B. Adamczewski and Y. Bugeaud, On the complexity of algebraic numbers. I. Expansions in integer bases. Ann. of Math. (2) 165 (2007), no. 2, 547–565. MR 2299740 Zbl 1195.11094 q.v. 922 [3] B. Adamczewski and J. Cassaigne, Diophantine properties of real numbers generated by finite automata. Compos. Math. 142 (2006), no. 6, 1351–1372. MR 2278750 Zbl 1134.11011 q.v. 923 [4] B. Adamczewski, J. Cassaigne, and M. Le Gonidec, On the computational complexity of algebraic numbers: the Hartmanis–Stearns problem revisited. Trans. Amer. Math. Soc. 373 (2020), no. 5, 3085–3115. MR 4082234 Zbl 1441.11048 q.v. 941 [5] B. Adamczewski and C. Faverjon, Méthode de Mahler: relations linéaires, transcendance et applications aux nombres automatiques. Proc. Lond. Math. Soc. (3) 115 (2017), no. 1, 55–90. MR 3669933 Zbl 06774679 q.v. 941 [6] B. Adamczewski and C. Faverjon, Méthode de Mahler, transcendance et relations linéaires : aspects effectifs. J. Théor. Nombres Bordeaux 30 (2018), no. 2, 557–573. MR 3891327 Zbl 07081562 q.v. 941 [7] J.-P. Allouche and J. O. Shallit, Automatic sequences. Theory, applications, generalizations. Cambridge University Press, Cambridge, 2003. MR 1997038 Zbl 1086.11015 q.v. 914, 920 [8] J. Bell and R. Moosa, F -sets and finite automata. J. Théor. Nombres Bordeaux 31 (2019), no. 1, 101–130. MR 3994721 q.v. 941 [9] J.-P. Bézivin, Une généralisation du théorème de Skolem-Mahler-Lech. Quart. J. Math. Oxford Ser. (2) 40 (1989), no. 158, 133–138. MR 0997644 Zbl 0678.10040 q.v. 929 [10] E. Borel, Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo (2) 27 (1909), 247–271. JFM 40.0283.01 q.v. 921 [11] A. Bridy, Automatic sequences and curves over finite fields. Algebra Number Theory 11 (2017), no. 3, 685–712. MR 3649365 Zbl 06722479 q.v. 941 [12] J. R. Chen, On the representation of a large even integer as the sum of a prime and the product of at most two primes. Kexue Tongbao (Foreign Lang. Ed.) 17 (1966), 385–386. MR 0207668 q.v. 920 [13] C. Chevalley, Introduction to the theory of algebraic functions of one variable. Mathematical Surveys, VI. American Mathematical Society, New York, N.Y., 1951. MR 0042164 Zbl 0045.32301 q.v. 938 [14] A. Cobham, On the Hartmanis–Stearns problem for a class of tag machines. In 9 th Annual Symposium on Switching and Automata Theory (SWAT 1968). Held in Schenedtady, N.Y., USA, October 15–18, 1968. IEEE Computer Society, Los Alamitos, CA, 1968, 51–60. Also appeared as IBM Research Technical Report RC-2178, August 23, 1968. IEEEXplore 4569556 q.v. 922 [15] A. Cobham, Uniform tag sequences. Math. Systems Theory 6 (1972), 164–192. MR 0457011 Zbl 0253.02029 q.v. 915 [16] H. Cramér, On the order of magnitude of the difference between consecutive prime numbers. Acta Arith. 2 (1936), 23–46. JFM 62.1147.01 Zbl 0015.19702 q.v. 919 [17] H. Derksen, A Skolem–Mahler–Lech theorem in positive characteristic and finite automata. Invent. Math. 168 (2007), no. 1, 175–224. MR 2285750 Zbl 1205.11030 q.v. 931

25. Automata in number theory

943

[18] H. Derksen and D. Masser, Linear equations over multiplicative groups, recurrences, and mixing I. Proc. Lond. Math. Soc. (3) 104 (2012), no. 5, 1045–1083. MR 2928336 Zbl 1269.11062 q.v. 941 [19] H. Derksen and D. Masser, Linear equations over multiplicative groups, recurrences, and mixing II. Indag. Math. (N.S.) 26 (2015), no. 1, 113–136. MR 3281694 Zbl 1314.11070 q.v. 941 [20] S. Eilenberg, Automata, languages, and machines. Vol. A. Pure and Applied Mathematics, 58. Academic Press, New York and London, 1974. MR 0530382 Zbl 0317.94045 q.v. 914, 916, 919 [21] G. Everest, A. Van Der Poorten, I. Shparlinski, and T. Ward, Recurrence sequences. Mathematical Surveys and Monographs, 104. American Mathematical Society, Providence, R.I., 2003. MR 1990179 Zbl 1033.11006 q.v. 929, 930 [22] S. Ferenczi and C. Mauduit, Transcendence of numbers with a low complexity expansion. J. Number Theory 67 (1997), no. 2, 146–161. MR 1486494 Zbl 0895.11029 q.v. 926 [23] E. Fouvry and C. Mauduit, Sommes des chiffres et nombres presque premiers. Math. Ann. 305 (1996), no. 3, 571–599. MR 1397437 Zbl 0858.11050 q.v. 920 [24] A. O. Gelfond, Sur les nombres qui ont des propriétés additives et multiplicatives données. Acta Arith. 13 (1967/68), 259–265. MR 0220693 Zbl 0155.09003 q.v. 921 [25] D. A. Goldston, J. Pintz, and C. Y. Yıldırım, Primes in tuples. I. Ann. of Math. (2) 170 (2009), no. 2, 819–862. MR 2552109 Zbl 1207.11096 q.v. 919 [26] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions. Ann. of Math. (2) 167 (2008), no. 2, 481–547. MR 2415379 Zbl 1191.11025 q.v. 919 [27] H. Hahn, Gesammelte Abhandlungen/Collected works. Band 1/Vol. 1. With biographical sketches by K. Popper and by L. Schmetterer, and K. Sigmund, and commentaries on Hahn’s work by H. Heuser, H. Sagan, and L. Fuchs. Edited by Schmetterer and Sigmund and with a foreword by Popper. Springer, Vienna, 1995. MR 1361405 q.v. 938 [28] G. Hansel, Une démonstration simple du théorème de Skolem–Mahler–Lech. Theoret. Comput. Sci. 43 (1986), no. 1, 91–98. MR 0847905 Zbl 0605.10007 q.v. 929 [29] J. Hartmanis and H. Shank, On the recognition of primes by automata. J. Assoc. Comput. Mach. 15 (1968), 382–389. MR 0235916 Zbl 0164.05201 q.v. 919 [30] J. Hartmanis and R. E. Stearns, On the computational complexity of algorithms. Trans. Amer. Math. Soc. 117 (1965), 285–306. MR 0170805 Zbl 0131.15404 q.v. 921, 922 [31] I. Kaplansky, Maximal fields with valuations. Duke Math. J. 9 (1942), 303–321. MR 0006161 Zbl 0061.05506 Zbl 0063.03135 q.v. 939 [32] K. S. Kedlaya, The algebraic closure of the power series field in positive characteristic. Proc. Amer. Math. Soc. 129 (2001), no. 12, 3461–3470. MR 1860477 Zbl 1012.12007 q.v. 938, 939, 940 [33] K. S. Kedlaya, Finite automata and algebraic extensions of function fields. J. Théor. Nombres Bordeaux 18 (2006), no. 2, 379–420. MR 2289431 Zbl 1161.11317 q.v. 940 [34] K. S. Kedlaya, On the algebraicity of generalized power series. Beitr. Algebra Geom. 58 (2017), no. 3, 499–527. MR 3683025 Zbl 06774993 q.v. 941 [35] C. Lech, A note on recurring series. Ark. Mat. 2 (1953), 417–421. MR 0056634 Zbl 0051.27801 q.v. 929, 930 [36] D. J. Leitner, Linear equations over multiplicative groups in positive characteristic. Acta Arith. 153 (2012), no. 4, 325–347. MR 2925376 Zbl 1292.11053 q.v. 941 [37] D. J. Leitner, Linear equations over multiplicative groups in positive characteristic II. J. Number Theory 180 (2017), 169–194. MR 3679792 Zbl 1406.11025 q.v. 941

944

Boris Adamczewski and Jason Bell

[38] J. Liouville, Sur des classes très étendues de quantités dont la valeur n’est ni algébrique, ni même reductible à des irrationelles algébriques. C. R. Acad. Sci. Paris 18 (1844), 883–885, 910–911. q.v. 922 [39] J. H. Loxton and A. J. Van Der Poorten, Arithmetic properties of the solutions of a class of functional equations. J. Reine Angew. Math. 330 (1982), 159–172. MR 0641817 Zbl 0468.10019 q.v. 922 [40] J. H. Loxton and A. J. Van Der Poorten, Arithmetic properties of automata: regular sequences. J. Reine Angew. Math. 392 (1988), 57–69. MR 0965057 Zbl 0656.10033 q.v. 922 [41] K. Mahler, Arithmetische Eigenschaften der Lösungen einer Klasse von Funktionalgleichungen. Math. Ann. 101 (1929), no. 1, 342–366. Corrigendum, ibid. 103 (1930), no. 1, 532. MR 1512537 MR 1512635 (corrigendum) JFM 55.0115.01 JFM 56.0185.02 (corrigendum) q.v. 922 [42] K. Mahler, Eine arithmetische Eigenshaft der Taylor-Koeffizienten rationaler Funktionen. Proc. Akad. Wet. Amsterdam 38 (1935), 50–60. JFM 61.0176.02 Zbl 0010.39006 q.v. 929 [43] K. Mahler, On the Taylor coefficients of rational functions. Proc. Cambridge Philos. Soc. 52 (1956), 39–48. MR 0074503 Zbl 0070.04004 q.v. 929 [44] K. Mahler, Addendum to the paper “On the Taylor coefficients of rational functions.” Proc. Cambridge Philos. Soc. 53 (1957), 544. MR 0089255 Zbl 0077.05205 q.v. 929 [45] B. Martin, C. Mauduit, and J. Rivat, Fonctions digitales le long des nombres premiers. Acta Arith. 170 (2015), no. 2, 175–197. MR 3383644 Zbl 1395.11024 q.v. 941 [46] C. Mauduit and J. Rivat, Sur un problème de Gelfond: la somme des chiffres des nombres premiers. Acta Arith. 170 (2015), no. 2, 175–197. MR 3383644 Zbl 1213.11025 q.v. 921 [47] C. Mauduit and J. Rivat, Prime numbers along Rudin–Shapiro sequences. J. Eur. Math. Soc. (JEMS) 17 (2015), no. 10, 2595–2642. MR 3420517 Zbl 1398.11121 q.v. 941 [48] P. Mihăilescu, Primary cyclotomic units and a proof of Catalan’s conjecture. J. Reine Angew. Math. 572 (2004), 167–195. MR 2076124 Zbl 1067.11017 q.v. 933 [49] M. Minsky and S. Papert, Unrecognizable sets of numbers. J. Assoc. Comput. Mach. 13 (1966), 281–286. MR 0207481 Zbl 0166.00601 q.v. 917, 919 [50] M. Morse and G. A. Hedlund, Symbolic Dynamics. Amer. J. Math. 60 (1938), no. 4, 815–866. MR 1507944 JFM 64.0798.04 Zbl 0019.33502 q.v. 921 [51] C. Müllner, Automatic sequences fulfill the Sarnak conjecture. Duke Math. J. 166 (2017), no. 17, 3219–3290. MR 3724218 Zbl 06825580 q.v. 941 [52] P. Philippon, Groupes de Galois et nombres automatiques. J. Lond. Math. Soc. (2) 92 (2015), no. 3, 596–614. MR 3431652 Zbl 1391.11087 q.v. 941 [53] J. Pintz, Cramér vs. Cramér. On Cramér’s probabilistic model for primes. Funct. Approx. Comment. Math. 37 (2007), part 2, 361–376. MR 2363833 Zbl 1226.11096 q.v. 919 [54] P. Ribenboim, The new book of prime number records. Springer, New York, 1996. q.v. 920 [55] D. Ridout, Rational approximations to algebraic numbers. Mathematika 4 (1957), 125–131. MR 0093508 Zbl 0079.27401 q.v. 925 [56] R. W. Ritchie, Finite automata and the set of squares. J. Assoc. Comput. Mach. 10 (1963), 528–531. MR 0167374 Zbl 0118.12601 q.v. 917 [57] K. F. Roth, Rational approximations to algebraic numbers. Mathematika 2 (1955), 160–167. Corrigendum idib., 169. MR 0072182 Zbl 0064.28501 q.v. 923 [58] O. Salon, Suites automatiques à multi-indices et algébricité. C. R. Acad. Sci. Paris Sér. I Math. 305 (1987), no. 12, 501–504. MR 0916320 Zbl 0628.10007 q.v. 917, 919, 938, 941

25. Automata in number theory

945

[59] W. M. Schmidt, Diophantine approximation. Lecture Notes in Mathematics, 785. Springer, Berlin, 1980. MR 0568710 Zbl 0421.10019 q.v. 927 [60] M.-P. Schützenberger, A remark on acceptable sets of numbers. J. Assoc. Comput. Mach. 15 (1968), 300–303. MR 0238634 Zbl 0165.02204 q.v. 919 [61] J.-P. Serre, Local fields. Translated from the French by M. J. Greenberg. Graduate Texts in Mathematics, 67. Springer, Berlin, 1979. MR 0554237 Zbl 0423.12016 q.v. 939 [62] H. Sharif and C. F. Woodcock, Algebraic functions over a field of positive characteristic and Hadamard products. J. London Math. Soc. (2) 37 (1988), no. 3, 395–403. MR 0939116 Zbl 0612.12018 q.v. 934 [63] T. Skolem, Ein Verfahren zur Behandlung gewisser exponentialer Gleichungen und diophantischer Gleichungen. In 8. Skand. Mat. Kongr., Stockholm, 1934, 163–188. JFM 61.1080.01 Zbl 0011.39201 q.v. 929 [64] K. Soundararajan, The distribution of prime numbers. In Equidistribution in number theory, an introduction (A. Granville and Z. Rudnick, eds.). Proceedings of the NATO Advanced Study Institute (the 44th Séminaire de Mathématiques Supérieures, SMS) held at the Université de Montréal, Montréal, QC, July 11–22, 2005. NATO Science Series II: Mathematics, Physics and Chemistry, 237. Springer, Dordrecht, 2007, 59–83. MR 2290494 Zbl 1141.11043 q.v. 919 [65] K. Soundararajan, Small gaps between prime numbers: the work of Goldston–Pintz– Yıldırım. Bull. Amer. Math. Soc. (N.S.) 44 (2007), no. 1, 1–18. MR 2265008 Zbl 1193.11086 q.v. 919 [66] T. Tao, Structure and randomness. Pages from year one of a mathematical blog. American Mathematical Society, Providence, R.I., 2008. MR 2459552 Zbl 1245.00024 q.v. 919, 930 [67] A. M. Turing, On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. (2) 42 (1936), no. 3, 230–265. MR 1577030 JFM 62.1059.03 Zbl 0016.09701 q.v. 921 [68] A. J. Van Der Poorten, Some facts that should be better known, especially about rational functions. In Number theory and applications (R. A. Mollin, ed.). NATO Advanced Science Institutes Series C: Mathematical and Physical Sciences, 265. Kluwer Academic Publishers Group, Dordrecht, 1989, 497–528. MR 1123092 Zbl 0687.10007 q.v. 929

Chapter 26

On Cobham's theorem

Fabien Durand and Michel Rigo

Contents
1. Introduction
2. Numeration basis
3. Automatic sequences
4. Multidimensional extension and first-order logic
5. Numeration systems and substitutions
6. Cobham's theorem in various contexts
7. Decidability issues
References

1. Introduction

In this chapter we essentially focus on the representation of non-negative integers in a given numeration system. The main role of such a system – like the usual integer base-k numeration system – is to replace numbers, or more generally sets of numbers, by their corresponding representations, i.e., by words or languages. First we consider integer base numeration systems to present the main concepts, but we will rapidly introduce non-standard systems and their relationship with substitutions.

Let k ∈ ℕ≥2 be an integer, where ℕ≥2 denotes the set of integers greater than or equal to 2. The set {0, …, k} is denoted by [[0, k]]. If we do not allow leading zeroes when representing numbers, the function mapping a non-negative integer n onto its k-ary representation rep_k(n) ∈ [[0, k−1]]* is a one-to-one correspondence. In particular, 0 is assumed to be represented by the empty word ε. In the literature, one also finds notations such as ⟨n⟩_k or (n)_k instead of rep_k(n). Hence every subset X ⊆ ℕ is associated with the language rep_k(X) consisting of the k-ary representations of the elements of X.

It is natural to study the relation between the arithmetic or number-theoretic properties of integers and the syntactical properties of the corresponding representations in a given numeration system. We focus on those sets X ⊆ ℕ for which a finite automaton can be used to decide, for any given word w over [[0, k−1]], whether or not w belongs to rep_k(X). Sets having the property that rep_k(X) is regular¹ are called k-recognisable sets. Such a set can be considered as a particularly simple set, because using the k-ary numeration system it has a somehow elementary algorithmic description. In the framework of infinite-state system verification, one also finds the terminology of number decision diagram or NDD [129].

The essence of Cobham's theorem is to express that the property of being recognisable by a finite automaton strongly depends on the choice of the base and more generally on the considered numeration system. Naturally, this fact leads to and motivates the introduction and the study of recognisable sets in non-standard numeration systems. Considering alternative numeration systems may provide new recognisable sets and these non-standard systems also have applications in computer arithmetic [61]. Last but not least, the proof of Cobham's theorem is non-trivial and relies on quite elaborate arguments. Now let us state Cobham's celebrated result from 1969 and give all the needed details and definitions. Several surveys have been written on the same subject; see [25], [26], [28], and [102].

¹ We use the terminology of regular language, instead of rational language.

Definition 1.1. Let α, β > 1 be two real numbers. If the equation α^m = β^n with m, n ∈ ℕ has only the trivial integer solution m = n = 0, then α and β are said to be multiplicatively independent. Otherwise, α and β are said to be multiplicatively dependent.

Definition 1.2. A subset of ℕ is ultimately periodic if it is the union of a finite set and a finite number of infinite arithmetic progressions. In particular, X is ultimately periodic if and only if there exist N ≥ 0 and p ≥ 1 such that, for all n ≥ N, n ∈ X ⟺ n + p ∈ X. Recall that an arithmetic progression is a set of the form aℕ + b := {an + b | n ≥ 0}.

Theorem 1.1 (Cobham's theorem [36]). Let k, ℓ ≥ 2 be two multiplicatively independent integers. A set X ⊆ ℕ is both k-recognisable and ℓ-recognisable if and only if it is ultimately periodic.
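The following short Python sketch (not part of the chapter; the function name is ours) tests multiplicative dependence of two integer bases by a bounded search for a relation k^m = ℓ^n, and confirms the examples discussed below: 6 and 18 are independent, while 4 and 8 are not.

def multiplicatively_dependent(k: int, l: int, bound: int = 64) -> bool:
    """Return True if k**m == l**n for some m, n >= 1 (searched up to `bound`)."""
    # Dependent pairs are both powers of a common q >= 2, so a small witness exists.
    for m in range(1, bound + 1):
        for n in range(1, bound + 1):
            if k ** m == l ** n:
                return True
            if k ** m < l ** n:
                break                      # l**n only grows from here
    return False

assert multiplicatively_dependent(4, 8)        # 4^3 = 8^2 = 64
assert not multiplicatively_dependent(6, 18)   # same prime factors, yet independent
assert not multiplicatively_dependent(2, 3)    # coprime bases are independent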

In the various contexts that we will describe, showing that an ultimately periodic set is recognisable is always the easy direction to prove (see Remark 1.3). So we focus on the other direction. Let k, ℓ ≥ 2 be two integers. Notice that k and ℓ are multiplicatively independent if and only if log k / log ℓ is irrational. Note that for k and ℓ to be multiplicatively dependent, it is not enough that k and ℓ share exactly the same prime factors occurring in their decomposition. For instance, 6 and 18 are multiplicatively independent. But coprime integers are multiplicatively independent. The irrationality of log k / log ℓ is a crucial point in the proof of Cobham's theorem (see § 5.3). Recall that if θ > 0 is irrational, then the set {{nθ} | n ≥ 0} of fractional parts of the multiples of θ is dense in [0, 1]. For a proof of the so-called Kronecker theorem, see [68].

Remark 1.2. Multiplicative dependence is an equivalence relation, denoted ∼_M, over ℕ≥2. If k and ℓ are multiplicatively dependent, then there exist a minimal q ≥ 2 and two positive integers m, n such that k = q^m and ℓ = q^n. Let us give the first (with respect to their minimal element) few equivalence classes for ∼_M partitioning ℕ≥2: [2]_M, [3]_M, [5]_M, [6]_M, [7]_M, [10]_M, [11]_M, [12]_M, …

Remark 1.3. We show that if a set X ⊆ ℕ is ultimately periodic then, for all k ≥ 2, X is k-recognisable. In the literature, one also finds the terminology of a recognisable set X (without any mention of a base), meaning that X is k-recognisable for all k ≥ 2. Note that a finite union of regular languages is again a regular language. Hence it is enough to check that rep_k(aℕ + b) is regular with 0 ≤ b < a. We can indeed assume that b < a because if we add a finite number of words to a regular language, or if we remove a finite number of words from a regular language, we still have a regular language. Consider a DFA having Q = [[0, a−1]] as its set of states. For all states i ∈ Q and d ∈ [[0, k−1]], the transitions are given by

  i --d--> (k i + d) mod a.

The initial state is 0 and the unique final state is b. As an example, a DFA accepting exactly the binary representations of the integers congruent to 3 mod 4 is given in Figure 1.

Figure 1. A finite automaton accepting rep_2(4ℕ + 3) (transition diagram omitted)

A study of the minimal automaton recognising such divisibility criteria expressed in an integer base is given in [3]. Also see the discussion in [116] (Prologue). The fact that a divisibility criterion exists in every base for any fixed divisor was already observed by Pascal in [101], pp. 84–89.
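The divisibility DFA of Remark 1.3 is easy to make concrete. The following Python sketch (function names are ours) builds the automaton on states [[0, a−1]] with transitions i --d--> (ki + d) mod a and checks it on the example of Figure 1.

def rep(n: int, k: int) -> str:
    """k-ary representation without leading zeroes; rep(0, k) is the empty word."""
    digits = ""
    while n > 0:
        digits = str(n % k) + digits
        n //= k
    return digits

def accepts_congruence(word: str, k: int, a: int, b: int) -> bool:
    """Run the DFA of Remark 1.3: states [[0, a-1]], transition i --d--> (k*i + d) mod a."""
    state = 0                          # initial state
    for d in map(int, word):
        state = (k * state + d) % a
    return state == b % a              # unique final state b

# The DFA accepts exactly the k-ary representations of a*N + b, e.g. rep_2(4N + 3).
k, a, b = 2, 4, 3
for n in range(200):
    assert accepts_congruence(rep(n, k), k, a, b) == (n % a == b)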

2. Numeration basis

It is remarkable that the recognisability of ultimately periodic sets extends to wider contexts (see Proposition 2.6 and Theorem 5.1). Let us introduce our first generalisation of the integer base numeration system.

Definition 2.1. A numeration basis is a sequence U = (U_n)_{n≥0} of integers such that U is increasing, U_0 = 1 and the set {U_{i+1}/U_i | i ≥ 0} is bounded. This latter condition ensures the finiteness of the alphabet of digits used to represent integers. If w = w_ℓ ⋯ w_0 is a word over a finite alphabet A ⊆ ℤ, then the numerical value of w is

  val_{A,U}(w) = ∑_{i=0}^{ℓ} w_i U_i.


Using the greedy algorithm [59], any integer n has a unique (normal) U-representation rep_U(n) = w_ℓ ⋯ w_0, which is a finite word over a minimal finite alphabet called the canonical alphabet of U and denoted by A_U. The normal U-representation satisfies

  val_{A_U,U}(rep_U(n)) = n   and   val_{A_U,U}(w_i ⋯ w_0) < U_{i+1}   for all i ∈ [[0, ℓ−1]].

Again, rep_U(0) = ε. See Chapter 7 of [88] or Ch. Frougny and J. Sakarovitch's chapter in [12] (Chapter 2). A subset X ⊆ ℕ is U-recognisable if rep_U(X) is accepted by a finite automaton. Let B ⊆ ℤ be a finite alphabet. If w ∈ B* is such that val_{B,U}(w) ≥ 0, then the function mapping w onto rep_U(val_{B,U}(w)) is called normalisation.

Definition 2.2. A numeration basis U is said to be linear if there exist k ∈ ℕ ∖ {0} and d_1, …, d_k ∈ ℤ with d_k ≠ 0 such that, for all n ≥ k,

  U_n = d_1 U_{n−1} + ⋯ + d_k U_{n−k}.

The polynomial

  P_U(X) = X^k − d_1 X^{k−1} − ⋯ − d_{k−1} X − d_k

is called the characteristic polynomial of U.

Definition 2.3. Recall that a Pisot–Vijayaraghavan number is an algebraic integer β > 1 whose Galois conjugates have modulus strictly less than one. We say that U = (U_n)_{n≥0} is a Pisot numeration system if the numeration basis U is linear and P_U(X) is the minimal polynomial of a Pisot number β. Integer base numeration systems are particular cases of Pisot systems. For instance, see [27] where it is shown that most properties related to k-recognisable sets, k ∈ ℕ≥2, can be extended to Pisot systems. In such a case, there exists some c > 0 such that |U_n − c β^n| → 0 as n tends to infinity.

Example 2.1. Consider the Fibonacci sequence defined by U_0 = 1, U_1 = 2 and U_{n+2} = U_{n+1} + U_n for all n ≥ 0. A word over {0, 1} is a U-representation if and only if it belongs to the language L = 1{0, 01}* ∪ {ε}. For instance, the word 10110 is not a U-representation. Since val_{A_U,U}(10110) = 13, the normalisation maps 10110 to rep_U(13) = 100000. The characteristic polynomial of this linear numeration basis is the minimal polynomial of the Pisot number (1 + √5)/2. This Pisot numeration system is presented in [130]. The following result is an easy exercise, but it can also be carried out in a wider context.

Theorem 2.1 ([122]). Let U be a numeration basis. If ℕ is U-recognisable, then U is linear.
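The greedy algorithm behind rep_U can be sketched as follows (a Python sketch; function names are ours), using the basis of Example 2.1. It reproduces the normalisation of 10110 to rep_U(13) = 100000.

def basis(n_terms: int):
    """Numeration basis of Example 2.1: U0 = 1, U1 = 2, U_{n+2} = U_{n+1} + U_n."""
    U = [1, 2]
    while len(U) < n_terms:
        U.append(U[-1] + U[-2])
    return U

def rep_U(n: int, U) -> str:
    """Greedy (normal) U-representation; rep_U(0) is the empty word."""
    if n == 0:
        return ""
    i = max(j for j in range(len(U)) if U[j] <= n)
    digits = []
    for j in range(i, -1, -1):
        d, n = divmod(n, U[j])
        digits.append(str(d))
    return "".join(digits)

def val_U(word: str, U) -> int:
    """Numerical value of a word w_l ... w_0 over a digit alphabet."""
    return sum(int(c) * U[i] for i, c in enumerate(reversed(word)))

U = basis(20)
assert val_U("10110", U) == 13        # 8 + 3 + 2
assert rep_U(13, U) == "100000"       # normalisation of 10110
assert all(val_U(rep_U(n, U), U) == n for n in range(500))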


Definition 2.4 ([13]). A Bertrand numeration basis U is a numeration basis satisfying the following property: w ∈ rep_U(ℕ) if and only if, for all n ∈ ℕ, w 0^n ∈ rep_U(ℕ). It is a natural condition satisfied by all integer base k ≥ 2 systems. For instance, the sequence defined by U_0 = 1, U_1 = 3 and, for all n ≥ 0, U_{n+2} = U_{n+1} + U_n is not a Bertrand numeration basis because rep_U(2) = 2, but val_{A_U,U}(20) = 6 and rep_U(6) = 102.

Let α > 1 be a real number. The notion of α-expansion was introduced by Parry in [100] (also see Rényi's paper [110]), or again see Chapter 7 of [88]. All x ∈ [0, 1] can be uniquely written in the following way:

  x = ∑_{n≥1} a_n α^{−n},   (1)

with x_1 = x and, for all n ≥ 1, a_n = ⌊α x_n⌋ and x_{n+1} = {α x_n}, where ⌊·⌋ stands for the integer part. The sequence d_α(x) = (a_n)_{n≥1} is the α-expansion of x and L(α) denotes the set of finite words having an occurrence in some sequence d_α(x), x ∈ [0, 1]. Let d_α(1) = (t_n)_{n≥1}. If there exist N ≥ 0 and p > 0 such that, for all n ≥ N, we have t_{n+p} = t_n, then α is said to be a Parry number, sometimes called a β-number (for more details about these numbers, see [100] or [60]). Observe that integers greater than or equal to 2 are Parry numbers. The following result relates Bertrand numeration systems to languages defined by some real number.

Theorem 2.2 (Bertrand and Mathis [14]). Let U be a numeration basis. It is a Bertrand numeration basis if and only if there exists a real number α > 1 such that rep_U(ℕ) = L(α). In this case, if U is linear then α is a root of the characteristic polynomial of U.

Theorem 2.3 (Bertrand and Mathis [13]). Let α > 1 be a real number. The language L(α) is regular if and only if α is a Parry number.

Associated with a Parry number β, one can define the notion of beta-polynomial. For details, see [70] or Chapter 2 of [12]. First we define the canonical beta-polynomial. If d_β(1) is eventually constant and equal to 0, i.e., d_β(1) = t_1 ⋯ t_m 0^ω with t_m ≠ 0, then we set G_β(X) = X^m − ∑_{i=1}^{m} t_i X^{m−i} and r = m. Otherwise, d_β(1) is eventually periodic: d_β(1) = t_1 ⋯ t_m (t_{m+1} ⋯ t_{m+p})^ω, with m and p being minimal. Then we set G_β(X) = X^{m+p} − ∑_{i=1}^{m+p} t_i X^{m+p−i} − X^m + ∑_{i=1}^{m} t_i X^{m−i} and r = p. Let β be a Parry number. An extended beta-polynomial is a polynomial of the form H_β(X) = G_β(X)(1 + X^r + ⋯ + X^{rk}) X^n for k, n ∈ ℕ.
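The digits of the greedy α-expansion (1) can be computed numerically as below (a Python sketch with names of our choosing; plain floating point, so only the first digits should be trusted). For the golden ratio φ the factor 11 never occurs in the expansions, in line with the regularity of L(φ) asserted by Theorem 2.3 (indeed d_φ(1) = 11).

def alpha_expansion(x: float, alpha: float, n_digits: int):
    """First digits of the greedy alpha-expansion of x in [0, 1]:
    a_n = floor(alpha * x_n), x_{n+1} = {alpha * x_n}."""
    digits = []
    for _ in range(n_digits):
        y = alpha * x
        a = int(y)          # floor, since y >= 0
        digits.append(a)
        x = y - a           # fractional part
    return digits

phi = (1 + 5 ** 0.5) / 2
d = alpha_expansion(0.5, phi, 20)
print(d)                                   # 0, 1, 0, 0, 1, 0, ... for x = 1/2
assert "11" not in "".join(map(str, d))    # no factor 11: the golden-mean shift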

Proposition 2.4 ([70]). Let U be a linear numeration basis with dominant root β, i.e., lim_{n→∞} U_{n+1}/U_n = β for some β > 1. If rep_U(ℕ) is regular, then β is a Parry number.

Theorem 2.5 (M. Hollander [70]). Let U be a linear numeration basis whose dominant root β is a Parry number.


• If d_β(1) is infinite and eventually periodic, then rep_U(ℕ) is regular if and only if U satisfies an extended beta-polynomial for β.
• If d_β(1) is finite of length m, then if U satisfies an extended beta-polynomial for β, then rep_U(ℕ) is regular; and conversely, if rep_U(ℕ) is regular, then U satisfies either an extended beta-polynomial H_β(X) for β, or a polynomial of the form (X^m − 1) H_β(X).

Ultimately periodic sets are recognisable for any linear numeration basis.

Proposition 2.6 (folklore, [12] and [88]). Let a, b ≥ 0. If U = (U_n)_{n≥0} is a linear numeration basis, then

  val^{−1}_{A_U,U}(aℕ + b) = { c_ℓ ⋯ c_0 ∈ A_U* | ∑_{k=0}^{ℓ} c_k U_k ∈ aℕ + b }

is accepted by a DFA that can be effectively constructed. In particular, if ℕ is U-recognisable, then any ultimately periodic set is U-recognisable.

To conclude this section, consider again the integer base numeration systems.

Example 2.2. The set P_2 = {2^n | n ≥ 0} of powers of two is trivially 2-recognisable because rep_2(P_2) = 10*. Since the difference between any two consecutive elements in P_2 is of the form 2^n, the set P_2 is not ultimately periodic. As a consequence of Cobham's theorem, the set P_2 is, for instance, neither 3-recognisable nor 5-recognisable.

One could also consider the case when the two bases k and ℓ are multiplicatively dependent. This case is much easier and can be considered as an exercise.

Proposition 2.7. Let k, ℓ ≥ 2 be two multiplicatively dependent integers. A set X ⊆ ℕ is k-recognisable if and only if it is ℓ-recognisable.

The theorem of Cobham implies that ultimately periodic sets are the only infinite sets that are k-recognisable for every k ≥ 2. We have seen so far that there exist sets (like the set P_2 of powers of two) that are only recognisable for some specific bases: exactly all bases belonging to a unique equivalence class for the equivalence relation ∼_M over ℕ≥2. To see that a given infinite ordered set X = {x_0 < x_1 < x_2 < ⋯} is k-recognisable for no base k ≥ 2 at all, we can use results like the following one, where the behaviour of the ratio (resp., difference) of any two consecutive elements in X is studied through the quantities

  R_X = lim sup_{i→∞} x_{i+1}/x_i   and   D_X = lim sup_{i→∞} (x_{i+1} − x_i).

Theorem 2.8 (gap theorem [37]). Let k ≥ 2. If X ⊆ ℕ is a k-recognisable infinite subset of ℕ, then either R_X > 1 or D_X < +∞.

Corollary 2.9. Let a ∈ ℕ≥2. The set of primes and the set {n^a | n ≥ 0} are not k-recognisable for any integer base k ≥ 2.
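The quantities R_X and D_X of the gap theorem can be estimated on initial segments, as in the rough Python sketch below (names are ours; the lim sup is only approximated by a maximum over the tail of the listed segment).

def ratio_and_gap(xs):
    """Approximate R_X = limsup x_{i+1}/x_i and D_X = limsup (x_{i+1} - x_i)
    by maxima over the tail of the given increasing initial segment of X."""
    ratios = [b / a for a, b in zip(xs, xs[1:])]
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return max(ratios[-50:]), max(gaps[-50:])

powers_of_two = [2 ** n for n in range(60)]
squares = [n * n for n in range(1, 2000)]

print(ratio_and_gap(powers_of_two))   # R stays at 2 > 1: compatible with 2-recognisability
print(ratio_and_gap(squares))         # R tends to 1 and D grows: not k-recognisable for any k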


Proofs of the gap theorem and its corollary can also be found in [54]. For more results on primes, also see Chapter 25.

Definition 2.5. An infinite ordered set X = {x_0 < x_1 < x_2 < ⋯} such that D_X < +∞ is said to be syndetic or with bounded gaps: there exists C > 0 such that, for all n ≥ 0, x_{n+1} − x_n < C. In particular, any ultimately periodic set is syndetic. The converse does not hold; see, for instance, Example 3.1.

Remark 2.10. Note that syndeticity occurs in various contexts, such as ergodic theory. As an example, a subset of an Abelian group G is said to be syndetic if finitely many translates of it cover G. The term "syndetic" was first used in [64]. Note that in [66] the following result is proved. Let α, β > 1 be multiplicatively independent real numbers. If a set X ⊆ ℕ is α-recognisable and β-recognisable, for the Bertrand numeration systems based, respectively, on the real numbers α and β in the sense of [14] and Theorem 2.2, then X is syndetic.

Cobham's original proof of Theorem 1.1 appeared in [36] and we quote [54]: "the proof is correct, long and hard. It is a challenge to find a more reasonable proof of this fine theorem." Then G. Hansel provided a simpler presentation in [65], and one can see [102] or the dedicated chapter in [9] for an expository presentation. Prior to these last two references, one should read [115]. Usually the first step to prove Cobham's theorem is to show the syndeticity of the considered set. See § 5.3. T. Krebs recently presented a short proof of Cobham's theorem without using Kronecker's theorem [80].

3. Automatic sequences

As explained in Corollary 3.3 presented in this section, the formalism of k-recognisable sets is equivalent to that of k-automatic sequences.² Let us recall briefly what they are. An infinite word x = (x_n)_{n≥0} ∈ B^ℕ over an alphabet B is said to be k-automatic if there exists a DFAO (deterministic finite automaton with output) over the alphabet [[0, k−1]], (Q, [[0, k−1]], δ, q_0, B, τ), such that, for all n ≥ 0,

  x_n = τ(q_0 · rep_k(n)).

The transition function is δ: Q × [[0, k−1]] → Q and can easily be extended to Q × [[0, k−1]]* by q · ε = q and q · wa = (q · w) · a. The output function is τ: Q → B. Roughly speaking, the n-th term of the sequence is obtained by feeding a DFAO with the k-ary representation of n. For a complete and comprehensive exposition of k-automatic sequences and their applications, see the book [9]. We equally use the terms sequences or (right-)infinite words. For more information about combinatorics on words, see [87], [88], or also J. Cassaigne and F. Nicolas' chapter in [12] (Chapter 4).

² We indifferently use the terms sequence and infinite word.


Definition 3.1. Let σ: A* → A* be a morphism, i.e., σ(uv) = σ(u)σ(v) for all u, v ∈ A*. Naturally such a map can be defined on A^ω. A finite or infinite word x such that σ(x) = x is said to be a fixed point of σ. A morphism σ: A* → A* is completely determined by the images of the letters in A. In particular, if there exists k > 0 such that |σ(a)| = k for all a ∈ A, then σ is said to be k-uniform or simply uniform. A 1-uniform morphism is called a coding. If there exist a letter a ∈ A and a word u ∈ A⁺ such that σ(a) = au and, moreover, lim_{n→+∞} |σⁿ(a)| = +∞, then σ is said to be prolongable on a or to be a substitution.

Let σ: A* → A* be a morphism prolongable on a. We have

  σ(a) = a u,   σ²(a) = a u σ(u),   σ³(a) = a u σ(u) σ²(u),   …

Since for all n ∈ ℕ, σⁿ(a) is a prefix of σ^{n+1}(a) and because |σⁿ(a)| tends to infinity when n → +∞, the sequence (σⁿ(a))_{n≥0} converges (for the usual product topology on words – see, for instance, (6)) to an infinite word denoted by σ^∞(a) and given by

  σ^∞(a) := lim_{n→+∞} σⁿ(a) = a u σ(u) σ²(u) σ³(u) ⋯

This infinite word is a fixed point of σ. An infinite word obtained by iterating a prolongable morphism in this way is said to be purely substitutive (or pure morphic). If φ: A* → B* is a non-erasing morphism, it can be extended to a map from A^ℕ to B^ℕ as follows. If x = x_0 x_1 ⋯ is an infinite word over A, then the sequence of words (φ(x_0 ⋯ x_{n−1}))_{n≥0} is easily seen to converge to an infinite word over B. Its limit is denoted by φ(x) = φ(x_0) φ(x_1) φ(x_2) ⋯. If x ∈ A^ℕ is purely substitutive and if φ: A → B is a coding, then the word y = φ(x) is said to be substitutive.

Another result due to A. Cobham is the following; see [37]. The idea is to associate a DFA over [[0, k−1]] with every k-uniform morphism.

Theorem 3.1. Let k ≥ 2. A sequence x = (x_n)_{n≥0} ∈ B^ℕ is k-automatic if and only if there exist a k-uniform morphism σ: A* → A* prolongable on a letter a ∈ A and a coding φ: A → B such that x = φ(σ^∞(a)).

Theorem 3.2 (Eilenberg [54]). A sequence x = (x_n)_{n≥0} is k-automatic if and only if its k-kernel N_k(x) = {(x_{k^e n + d})_{n≥0} | e ≥ 0, 0 ≤ d < k^e} is finite.

Definition 3.2. The characteristic sequence 1_X ∈ {0, 1}^ℕ of a set X ⊆ ℕ is defined by 1_X(n) = 1 if and only if n ∈ X. An infinite word x ∈ A^ω is ultimately periodic if there exist two finite words u ∈ A* and v ∈ A⁺ such that x = u v^ω. If u = ε, then x is periodic. Obviously, a set X ⊆ ℕ is ultimately periodic if and only if 1_X is an ultimately periodic word over {0, 1}. In that case, there exist two finite words u ∈ {0, 1}* and v ∈ {0, 1}⁺ such that 1_X = u v^ω. In particular, |v| is a period of X. If u and v are chosen of minimal length, then |u| (resp., |v|) is said to be the preperiod or index of X (resp., the period of X). If u = ε, then X is (purely) periodic. Periodic sets are, in particular, ultimately periodic.


Corollary 3.3. Let k ≥ 2. If x = (x_n)_{n≥0} ∈ B^ℕ is a k-automatic sequence, then the set {n ≥ 0 | x_n = b} is k-recognisable for all b ∈ B. Conversely, if a set X ⊆ ℕ is k-recognisable, then its characteristic sequence is k-automatic.

Theorem 3.4 (Cobham's theorem, version 2). Let k, ℓ ≥ 2 be two multiplicatively independent integers. An infinite word x = (x_n)_{n≥0} ∈ B^ℕ is both k-automatic and ℓ-automatic if and only if it is ultimately periodic.

Remark 3.5. Using the framework of k-automatic sequences instead of the formalism of k-recognisable sets turns out to be useful. For instance, consider the complexity function of an infinite word x, which maps n ∈ ℕ onto the number p_x(n) of distinct factors of length n occurring in x. The Morse–Hedlund theorem states that x is ultimately periodic if and only if p_x is bounded by some constant. This result appeared first in [94]. Proofs can be found in classical textbooks such as [9] and [87]. It is also well known that for a k-automatic sequence x, p_x ∈ O(n); again see the seminal paper [37]. This latter result can be used to show that particular sets are not k-recognisable for any k ≥ 2: for instance, those sets whose characteristic sequence 1_X has a complexity function such that lim_{n→+∞} p_{1_X}(n)/n = +∞. For the behaviour of p_x in the substitutive case, see the survey [4] or Chapter 4 of [12].

Example 3.1. Iterating the morphism μ: 0 ↦ 01, 1 ↦ 10, we get the Thue–Morse word (t_n)_{n≥0} = μ^∞(0) = 0110100110010110100101100110…. For an account of this celebrated word, see [8] and Chapter 2 of [107]. It is a 2-automatic word; the n-th letter in the word is 0 if and only if rep_2(n) contains an even number of 1's. This word is generated by the DFAO represented in Figure 2. In particular, the set

  X_2 = { n ∈ ℕ | rep_2(n) = c_t ⋯ c_0 and ∑_{i=0}^{t} c_i ≡ 0 (mod 2) }

is 2-recognisable. The Thue–Morse word is not ultimately periodic (see, for instance, [23] or [40] where the complexity function of this word is studied carefully) and therefore X_2 is k-recognisable only for those k of the form 2^m, m ∈ ℕ≥1. Nevertheless, one can notice that X_2 is syndetic.

Figure 2. A DFAO generating the Thue–Morse word (transition diagram omitted)
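A small Python sketch (names are ours) of the two characterisations just used: the DFAO of Figure 2 computing t_n from the binary expansion of n, and Eilenberg's 2-kernel of Theorem 3.2, which is finite for the Thue–Morse word.

def t(n: int) -> int:
    """Thue-Morse word via the DFAO of Figure 2: parity of the number of 1's in binary."""
    state = 0
    for bit in format(n, "b"):
        state ^= int(bit)           # flip on reading a 1
    return state                    # the output coding is the identity on {0, 1}

word = [t(n) for n in range(16)]
assert word == [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]

# Eilenberg's 2-kernel: the subsequences (t_{2^e n + d})_{n >= 0}, truncated for comparison.
def kernel_element(e: int, d: int, length: int = 64):
    return tuple(t((1 << e) * n + d) for n in range(length))

kernel = {kernel_element(e, d) for e in range(6) for d in range(1 << e)}
print(len(kernel))   # 2 distinct subsequences: t itself and its complement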

4. Multidimensional extension and first-order logic

4.1. Subsets of ℕ^d. To extend the concept of k-recognisability to subsets of ℕ^d, d ≥ 2, it is natural to consider d-tuples of k-ary representations. To be self-contained, we repeat the discussions of Chapter 25. To get d words of the same length that have to be read simultaneously by an automaton, the shortest ones are padded with leading zeroes. We extend the definition of rep_k to a map of domain ℕ^d as follows. If n_1, …, n_d are non-negative integers, then we consider the word

  rep_k(n_1, …, n_d) := (0^{m−|rep_k(n_1)|} rep_k(n_1), …, 0^{m−|rep_k(n_d)|} rep_k(n_d)) ∈ ([[0, k−1]]^d)*,

where m = max{|rep_k(n_1)|, …, |rep_k(n_d)|}. A subset X of ℕ^d is k-recognisable if the corresponding language rep_k(X) is accepted by a finite automaton over the alphabet [[0, k−1]]^d, which is the Cartesian product of d copies of [[0, k−1]]. This automaton reads d digits at a time (one for each component): this is why we need d words of the same length.

Example 4.1. Consider the automaton depicted in Figure 3 (the sink is not represented). It accepts (ε, ε) and all pairs of words of the form (u0, 0u) where u ∈ 1{0, 1}*. This means that the set {(2n, n) | n ≥ 0} is 2-recognisable.

Figure 3. A DFA recognising {(2n, n) | n ≥ 0} (transition diagram omitted)
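The padded pair representation and the language accepted by the automaton of Figure 3 can be checked directly, as in this Python sketch (function names are ours).

def rep2(n: int) -> str:
    return format(n, "b") if n > 0 else ""

def padded_pair(m: int, n: int):
    """rep_2(m, n): pad the shorter representation with leading zeroes."""
    u, v = rep2(m), rep2(n)
    L = max(len(u), len(v))
    return u.rjust(L, "0"), v.rjust(L, "0")

def accepts_double(u: str, v: str) -> bool:
    """Accept pairs of the form (w0, 0w), which is how rep_2(2n, n) looks for n > 0."""
    return (u, v) == ("", "") or (u.endswith("0") and v.startswith("0") and u[:-1] == v[1:])

for n in range(100):
    assert accepts_double(*padded_pair(2 * n, n))
assert not accepts_double(*padded_pair(5, 2))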

Note that the notion of k-automatic sequence and Theorem 3.1 have been extended accordingly in [118] and [119], where the images of letters under a morphism are d-dimensional cubes of size k. Extending the concept of ultimately periodic sets to subsets of ℕ^d, with d ≥ 2, is at first glance not so easy. We use bold face letters to represent elements in ℕ^d. For instance, one could take the following definition of a (purely) periodic subset X ⊆ ℕ^d: there exists a non-zero element p ∈ ℕ^d such that x ∈ X if and only if x + p ∈ X. As we will see (Remark 4.2, Proposition 6.9, and Theorem 6.11), it turns out that this definition is not compatible with the extension of Cobham's theorem in d dimensions. Therefore we will consider sets definable in ⟨ℕ, +⟩. Let us mention Nivat's conjecture connecting such a notion of periodicity in higher dimensions with the notion of block complexity as introduced in Remark 3.5: let X ⊆ ℤ². If there exist positive integers n_1, n_2 such that p_X(n_1, n_2) ≤ n_1 n_2, then X is periodic, where p_X(n_1, n_2) counts the number of distinct blocks of size n_1 × n_2 occurring in X. See [96] and, in particular, [108] for details and pointers to the existing bibliography. The reference [52] establishes a connection with the next section.


4.2. Logic and k-definable sets. The formalism of first-order logic is probably the best suited to present a natural extension (in the sense of Cobham's theorem) of the definition of ultimately periodic sets in d dimensions. See [105], [106], or the survey [16]. For a textbook presentation, see [113]. In Presburger arithmetic ⟨ℕ, +⟩, the variables range over ℕ and we have at our disposal the connectives ∧, ∨, ¬, →, ↔, the equality symbol = and the quantifiers ∀ and ∃ that can only be applied to variables. This is the reason we speak of first-order logic; in second-order logic, quantifiers can be applied to relations, and in monadic second-order logic, only variables and unary relations, i.e., sets, may be quantified. If a variable is not within the scope of any quantifier, then this variable is said to be free. Formulas are built inductively from terms and atomic formulas. Here details have been omitted; see, for instance, § 3.1 in [28]. For example, the order relations < and > can be added to the language by noticing that x ≤ y is equivalent to

  (∃z)(y = x + z).   (2)

In the same way, constants can also be added. For instance, x = 0 is equivalent to (∀y)(x ≤ y) and x = 1 is equivalent to ¬(x = 0) ∧ (∀y)(¬(y = 0) → (x ≤ y)). In general, the successor function S(x) = y of x is defined by

  (x < y) ∧ (∀z)((x < z) → (y ≤ z)).

For a complete account of the interactions between first-order logic and k-recognisable sets, see the excellent survey [28].

Remark 4.1. We mainly discuss the case ⟨ℕ, +⟩, but similar results are obtained for ⟨ℤ, +, ≤⟩. Note that if the variables belong to ℤ, then it is no longer possible to define ≤ as in (2). So this order relation has to be added to the structure. The constant 0 can be defined by x + x = x.

Let φ(x_1, …, x_d) be a formula with d free variables x_1, …, x_d. Interpreting φ in ⟨ℕ, +⟩ permits one to define the set of d-tuples of non-negative integers for which the formula holds:

  {(r_1, …, r_d) | ⟨ℕ, +⟩ ⊨ φ[r_1, …, r_d]}.

We write ⟨ℕ, +⟩ ⊨ φ[r_1, …, r_d] whenever φ(x_1, …, x_d) is satisfied in ⟨ℕ, +⟩ when interpreting x_i by r_i for all i ∈ {1, …, d}. For the reader having no background in logic and model theory, the first chapters of [53] are worth reading.

Remark 4.2. The ultimately periodic sets of ℕ are exactly the sets that are definable in Presburger arithmetic. It is obvious that ultimately periodic sets of ℕ are definable. For instance, the set of even integers can be defined by φ(x) ≡ (∃y)(x = y + y). Since constants can easily be defined, it is easy to write a formula for any arithmetic progression. As an example, the formula φ(x) ≡ (∃y)(x = S(S(y + y + y))) defines the progression 3ℕ + 2. In particular, multiplication by a fixed constant is definable in ⟨ℕ, +⟩. Note that it is a classical result that the theory of ⟨ℕ, +, ·⟩ is undecidable; see, for instance, [15].


Adding congruences modulo any integer m permits quantifier elimination, which means that any formula expressed in Presburger arithmetic is equivalent to a formula using only ∧, ∨, =, < and congruences; see [105] and [106]. Presentations can also be found in [55] and [83].

Theorem 4.3 (Presburger). The structure ⟨ℕ, +, <, 0, 1, (≡_m)_{m≥2}⟩ admits elimination of quantifiers.

This result can be used to prove that the theory of ⟨ℕ, +⟩ is decidable. This can be done using the formalism of automata; see, for instance, [28].

Corollary 4.4. Any formula φ(x) in Presburger arithmetic ⟨ℕ, +⟩ defines an ultimately periodic set of ℕ.

Let k ≥ 2. We add to the structure ⟨ℕ, +⟩ a function V_k defined by V_k(0) = 1 and, for all x > 0, V_k(x) is the greatest power of k dividing x. As an example, we have V_2(6) = 2, V_2(20) = 4 and V_2(2^n) = 2^n for all n ≥ 0. Again the theory of ⟨ℕ, +, V_k⟩ can be shown to be decidable [28]. The next result shows that, as for the k-automatic sequences, the logical framework within the richer structure ⟨ℕ, +, V_k⟩ gives an equivalent presentation of the k-recognisable sets in any dimension. Proofs of the next three theorems can again be found in [28], where a full account of the different approaches used to prove Theorem 4.5 is presented. For Büchi's original paper, see [29].

Theorem 4.5 (Büchi's theorem). Let k ≥ 2 and d ≥ 1. A set X ⊆ ℕ^d is k-recognisable if and only if it can be defined by a first-order formula φ(x_1, …, x_d) of ⟨ℕ, +, V_k⟩.

For instance, the set P_2 introduced in Example 2.2 can be defined by the formula φ(x) ≡ V_2(x) = x. Note that Theorem 4.5 holds for Pisot numeration systems given in Definition 2.3; see [27] where the function V_k is modified accordingly. This is partially based on the fact that in a Pisot numeration system the normalisation function is realised by a finite automaton (see [60]), which allows one to consider addition of integers: first perform addition digit-wise without any carry, then normalise the result.
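A Python sketch of the valuation V_k and of the defining formula V_2(x) = x for P_2 (names are ours; the convention V_k(0) = 1 follows the text).

def V(k: int, x: int) -> int:
    """V_k(x): greatest power of k dividing x, with V_k(0) = 1 by convention."""
    if x == 0:
        return 1
    p = 1
    while x % (p * k) == 0:
        p *= k
    return p

assert (V(2, 6), V(2, 20), V(2, 2 ** 10)) == (2, 4, 2 ** 10)

# P_2 = { x : V_2(x) = x } is the set of powers of two (Example 2.2 and Theorem 4.5).
P2 = [x for x in range(1, 200) if V(2, x) == x]
assert P2 == [2 ** n for n in range(8)]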

Theorem 4.6 (Cobham's theorem, version 3). Let k, ℓ ≥ 2 be two multiplicatively independent integers. A set X ⊆ ℕ can be defined by a first-order formula in ⟨ℕ, +, V_k⟩ and by a first-order formula in ⟨ℕ, +, V_ℓ⟩ if and only if it can be defined by a first-order formula in ⟨ℕ, +⟩.

This theorem still holds in higher dimensions, where it is called the Cobham–Semenov theorem. In this respect, the notion of a subset of ℕ^d definable in Presburger arithmetic ⟨ℕ, +⟩ is the right extension of periodicity in a multidimensional setting. For Semenov's original paper, see [120].

Theorem 4.7 (Cobham–Semenov theorem). Let k, ℓ ≥ 2 be two multiplicatively independent integers. A set X ⊆ ℕ^d can be defined by a first-order formula in ⟨ℕ, +, V_k⟩ and by a first-order formula in ⟨ℕ, +, V_ℓ⟩ if and only if it can be defined by a first-order formula in ⟨ℕ, +⟩.


Subsets of ℕ^d defined by a first-order formula in ⟨ℕ, +⟩ are characterised in [63]. The nice criterion of Muchnik appeared first in 1991 and is given in [95]. See Proposition 6.9 for its precise statement. Using this latter characterisation, a proof of Theorem 4.7 is presented in [28]. The logical framework has given rise to several works. Let us mention chronologically Villemaire [125] and [126] and Michaux and Villemaire [91] and [92]. In § 5 of [92] the authors interestingly show how to reduce Semenov's theorem to Cobham's theorem: "nothing new in higher dimensions." Also extensions to non-standard numeration systems are considered in [103] and [15]. In this latter paper, the Cobham–Semenov theorem is proved for two Pisot numeration systems.

5. Numeration systems and substitutions 5.1. Substitutive sets and abstract numeration systems. In § 4.1 and § chcob:ss32, we have mainly extended the notion of recognisability to subsets of Nd . Now we consider another extension of recognisability. In Corollary 3.3, we have seen that a k recognisable set has a characteristic sequence generated by a uniform substitution and the application of an extra coding. It is rather easy to define sets of integers encoded by a characteristic sequence generated by an arbitrary substitution and an extra coding; that is, those for which the characteristic sequence is morphic. This generalisation permits one to obtain a larger class of infinite words, and hence a larger class of sets of integers. Example 5.1. Consider the morphism W ¹a; b; cº ! ¹a; b; cº given by .a/ D abcc; .b/ D bcc , .c/ D c and the coding W a; b 7! 1; c 7! 0. We get  1 .a/ D abccbccccbccccccbccccccccbccccccccccbcc : : :

and . 1 .a// D 010010000100000010000000010000000000100 : : : Using the special form of the images by  of b and c , it is not difficult to see that the difference between the position of the n-th and .n C 1/st b in  1 .a/ is 2n C 1. Hence . 1 .a// is the characteristic sequence of the set of squares and it is substitutive. From Corollary 2.9 the set of squares is never k -recognisable for any integer base k . Definition 5.1. As a natural extension of the concept of recognisability, we may consider sets X  N having a characteristic sequence 1X which is (purely) substitutive. Such a set is said to be a (purely) substitutive set. In particular, k -recognisable sets are substitutive. With Theorem 5.2 it will turn out that the formalism of substitutive sets is equivalent to the one of abstract numeration systems. Definition 5.2 ([84]). An abstract numeration system (or ANS) is a triple S D .L; A; 0. In particular, the set ¹n2 j n > 0º is S-recognisable because a is regular. It is well known that in a regular language L, the set of the lexicographically first words of each length in the genealogically ordered language L is regular; see [122]. Pisot numeration systems are special cases of ANS. Indeed, if the numeration basis U D .Un /n>0 defines a Pisot numeration system, then repU .N/ is regular.
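Example 5.1 can be replayed directly, as in the following Python sketch (names are ours, with the morphism and coding of the example called sigma and tau): iterate the morphism, apply the coding, and compare with the characteristic sequence of the squares.

sigma = {"a": "abcc", "b": "bcc", "c": "c"}
tau = {"a": "1", "b": "1", "c": "0"}

def iterate(word: str, n: int) -> str:
    for _ in range(n):
        word = "".join(sigma[c] for c in word)
    return word

x = iterate("a", 12)                         # a prefix of the fixed point of length (12+1)^2
coded = "".join(tau[c] for c in x)

squares = {n * n for n in range(40)}
expected = "".join("1" if i in squares else "0" for i in range(len(coded)))
assert coded == expected                     # characteristic sequence of the squares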

Example 5.3. Consider the Fibonacci sequence and the language L D 1¹0; 01º [ ¹"º defined in Example 2.1. To get the representation of an integer n, one can either decompose n using the greedy algorithm or, order the words in L genealogically and take the .n C 1/-th element. Theorem 5.1 ([84]). Let S D .L; A; 0 be an infinite word over an alphabet B . This word is substitutive if and only if there exists an abstract numeration system S D .L; A; 0, xn D .q0  repS .n//.

A proof of this result is given in [111] and [114] and a comprehensive treatment is given in Chapter 3 of [12]. In that context, we also obtain an extension of Corollary 3.3.

Corollary 5.3. Let x D .xn /n>0 be an infinite substitutive word over an alphabet B . There exists an ANS S such that for all b 2 B , ¹n > 0 j xn D bº is S-recognisable. Conversely, if a set X  N is S-recognisable, then its characteristic sequence is Sautomatic. Corollary 5.4. A set X  N is substitutive if and only if there exists an ANS S such that X is S-recognisable. 5.2. Cobham’s theorem for substitutive sets. In the context of substitutive sets of integers, how could a Cobham-like theorem be expressed, i.e., what is playing the role of a base? Assume that there exist two purely substitutive infinite words x 2 A!

26. On Cobham’s theorem

961

and y 2 B ! , respectively, generated by the morphisms W A ! A prolongable on a 2 A and W B  ! B  prolongable on b 2 B , i.e.,  1 .a/ D x and  1 .b/ D y . Consider two codings W A ! ¹0; 1º and W B ! ¹0; 1º such that .x/ D .y/. This situation corresponds to the case where a set (here, given by its characteristic word) is recognisable in two a priori different numeration systems. If A D B and  D  m for some m > 1, then nothing particular can be said about the infinite word .x/: iterating  or  m from the same prolongable letter leads to the same fixed point. So we must introduce a notion analogous to the one of multiplicatively independent bases related to the substitutions  and . Definition 5.3. Let W A ! A be a substitution over an alphabet A. The matrix M 2 NAA associated with  is called the incidence matrix of  and is defined as follows: .M /a;b D j.b/ja for all a; b 2 A:

A square matrix M 2 Rnn with entries in R>0 is irreducible if, for all i; j , there exists k such that .M k /i;j > 0. A square matrix M 2 Rnn with entries in R>0 is primitive if there exists k such that, for all i; j , we have .M k /i;j > 0. Similarly, a substitution over the alphabet A is irreducible (resp., primitive) if its incidence matrix is irreducible (resp., primitive). Otherwise stated, a substitution W A ! A is primitive if there exists an integer n > 1 such that, for all a 2 A, all the letters in A appear in the image of  n .a/. Let us denote by P the abelianisation map (or Parikh map) that maps a word w over A D ¹a1 ; : : : ; ar º to the r -tuple t .jwja1 ; : : : ; jwjar /. The matrix M can be defined by its columns: M D .P..a1 //    P..ar ///; and it satisfies the condition

P..w// D M P.w/

for all w 2 A :

Remark 5.5. If a matrix M is primitive, the celebrated theorem of Perron can be used; see standard textbooks [74] or [62] and [121]. A presentation is also given in [86]. To recap some of the key points, M has a unique dominant real eigenvalue ˇ > 0 and there exists an eigenvector with positive entries associated with ˇ . Also, for all i; j , there exists ci;j such that .M n /i;j D ci;j ˇ n C o.ˇ n /. For instance, primitivity of M implies the existence of the frequency of any factor occurring in any fixed point of  . Note that r X if P.w/ D t .p1 ; : : : ; pr /; then jwj D pi : (3) i D1

Hence, the value j .aj /j is obtained by summing up the entries in the j -th column of Mn for all n > 0. If  is primitive, then there exists some Cj such that j n .aj /j D Cj ˇ n C o.ˇ n /. In particular, if  is prolongable on a, then j n .a/j  Cˇ n , for some C > 0. n

962

Fabien Durand and Michel Rigo

In the general case of a matrix M with non-negative entries, one can use the Perron– Frobenius theorem for each of the irreducible components of M (they correspond to the strongly connected components of the associated graph and are also called communicating classes). Thus any non-negative matrix M has a real eigenvalue ˛ which is greater or equal to the modulus of any other eigenvalue. We call ˛ the dominant eigenvalue of M . Moreover, if we exclude the case where ˛ D 1, then there exists a positive integer p such that M p has a dominant eigenvalue ˛ p which is a Perron number; see [86], p. 369. A Perron number is an algebraic integer ˛ > 1 such that all its algebraic conjugates have modulus less than ˛ . In particular, if we replace a prolongable substitution  such that M has a dominant eigenvalue ˛ > 1, with a convenient power  p of  , then we can assume that the dominant eigenvalue of  is a Perron number. Definition 5.4. Let W A ! A be a substitution prolongable on a 2 A such that all letters of A have an occurrence in  1 .a/. Let ˛ > 1 be the dominant eigenvalue of the incidence matrix of  . Let W A ! B  be a coding. We say . 1 .a// is an ˛ substitutive infinite word (with respect to  ). In view of Definition 5.1, this notion can be applied to subsets of N. If, moreover,  is primitive, then . 1 .a// is said to be a primitive ˛ -substitutive infinite word (with respect to  ). Observe that k -automatic infinite words are k -substitutive infinite words. Example 5.4. Consider the substitution  defined by .a/ D aa0a, .0/ D 01 and .1/ D 10. Its dominant eigenvalue is 3. It is prolongable on a, 0, and 1. The fixed point x of  starting with 0 is the Thue–Morse sequence (see Example 3.1). Definition 5.4 does not imply that x is 3-substitutive because a does not appear in x . But the fixed point y of  starting with a is 3-substitutive. Example 5.5. Consider the so-called Tribonacci word, which is the unique fixed point of W a 7! ab; b 7! ac; c 7! a. See [124] and [107]. The incidence matrix of  is 0 1 1 1 1 M  D @1 0 0 A : 0 1 0 One can check that M3 contains only positive entries. So the matrix is primitive. Let ˛T ' 1:839 be the unique real root of the characteristic polynomial X 3 C X 2 C X C 1 of M . The Tribonacci word T D abacabaab    is primitive ˛T -substitutive. Let W a 7! 1; b; c 7! 0 be a coding. The word .T / is the characteristic sequence of a primitive ˛T -substitutive set of integers ¹0; 2; 4; 6; 7; : : :º.

To explain the substitutive extension of Cobham’s theorem we need the following definition. Definition 5.5. Let S be a set of prolongable substitutions and x be an infinite word. If x is an ˛ -substitutive infinite word with respect to a substitution  belonging to S, then x is said to be ˛ -substitutive with respect to S.
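Before turning to the Cobham-like statement below, here is a Python sketch (names are ours) of Example 5.5: iterating the Tribonacci substitution and reading off, via the coding of that example (called nu in the code), the set {0, 2, 4, 6, 7, …}.

tri = {"a": "ab", "b": "ac", "c": "a"}
nu = {"a": "1", "b": "0", "c": "0"}

def fixed_point_prefix(n_iter: int) -> str:
    w = "a"
    for _ in range(n_iter):
        w = "".join(tri[c] for c in w)
    return w

T = fixed_point_prefix(12)
assert T.startswith("abacabaab")             # the Tribonacci word of Example 5.5

# The set whose characteristic sequence is nu(T): positions of the letter a.
X = [i for i, c in enumerate(T) if nu[c] == "1"]
print(X[:8])                                 # [0, 2, 4, 6, 7, ...]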


Let us consider the following Cobham-like statement depending on two sets S and S0 of prolongable substitutions. It is useful to chronologically describe known results generalising Cobham’s theorem in terms of substitutions leading to the most general statement for all substitutions. Statement (S; S0 ). Let S and S0 be two sets of prolongable substitutions. Let ˛ and ˇ be two multiplicatively independent Perron numbers. Let x 2 A! where A is a finite alphabet. Then the following are equivalent: 1. the infinite word x is both ˛ -substitutive with respect to S and ˇ -substitutive with respect to S0 ; 2. the infinite word x is ultimately periodic. Note that this statement excludes 1-substitutions, i.e., substitutions with a dominant eigenvalue equal to 1, because Perron numbers are larger than 1. The case of 1-substitutive infinite words will be mentioned in § 5.6. Also notice that the substitutions we are dealing with can be erasing, i.e., at least one letter is sent onto the empty word. But from a result in [35], [9], and [73], we can assume that the substitutions are non-erasing. Note that ˛ and ˛ k are multiplicatively dependent. Proposition 5.6 ([51]). Let x be an ˛ -substitutive infinite word. Then there exists an integer k > 1 such that x is ˛ k -substitutive with respect to a non-erasing substitution. The implication (2) H) (1) in the above general statement is not difficult to obtain, as mentioned in Remark 1.3 for the uniform situation. Proposition 5.7 ([47]). Let x be an infinite word over a finite alphabet and ˛ be a Perron number. If x is periodic (resp., ultimately periodic), then x is primitive ˛ -substitutive (resp., ˛ -substitutive). Definition 5.6. Let W A ! A and W B  ! B  be two substitutions. We say that  projects on  if there exists a coding W A ! B such that  ı  D  ı :

(4)

The implication (1) H) (2) in Statement (S; S0 ) is known in many cases described below.

i. When S D S0 is the set of uniform substitutions, this is the classical theorem of Cobham. ii. In [56] S. Fabre proves the statement when S is the set of uniform substitutions and S0 is a set of non-uniform substitutions related to some non-standard numeration systems. iii. When S D S0 is the set of primitive substitutions, the statement is proved in [44]. The proof is based on a characterisation of primitive substitutive sequences using the notion of return word [43]. A word w is a return word to u if wu 2 L.x/, u is a prefix of wu and u has exactly two occurrences in wu.

964

Fabien Durand and Michel Rigo

iv. When S D S0 is the set of substitutions projecting on primitive substitutions, the statement is proved in [45]. This result is applied to generalise (ii). Using a characterisation of U -recognisable sets of integers for a Bertrand numeration basis U , see [57], the main result of [45] extends Cobham’s theorem for a large family of non-standard numeration systems. This latter result includes a result obtained previously in [15] for Pisot numeration systems. v. Definition 5.8 and Theorem 5.17 describe the situation where S D S0 D Sgood (defined later). It includes all known and previously described situations for substitutions. vi. In [49], Statement (S; S0 ) is proven for the most general case that is S and S0 are both the set of all substitutions. The final argument is based on a careful study of return words for non-primitive substitutive sequences. Example 5.6. The Tribonacci word T is purely substitutive, but is k -automatic for no integer k > 2. Proceed by contradiction. Assume that there exists an integer k > 2 such that T is k -automatic. Then T is both k -substitutive and primitive ˛T -substitutive. By Theorem 5.17, T must be ultimately periodic, but it is not the case. The factor complexity of T is pT .n/ D 2n C 1. By the Morse–Hedlund theorem (see Remark 3.5), T is not ultimately periodic. Let L.x/ be the set of all factors of the infinite word x . In [58], the following generalisation of Cobham’s theorem is proved. Theorem 5.8. Let k; ` > 2 be two multiplicatively independent integers. Let x be a k -automatic infinite word and y be a `-automatic infinite word. If L.x/  L.y/, then x is ultimately periodic. The same result is valid in the primitive case. Theorem 5.9 ([44]). Let x and y be, respectively, a primitive ˛ -substitutive infinite word and a primitive ˇ -substitutive infinite word such that L.x/ D L.y/. If ˛ and ˇ are multiplicatively independent, then x and y are periodic. Note that under the hypothesis of Theorem 5.9, x and y are primitive substitutive infinite words. Thus L.x/ D L.y/ whenever L.x/  L.y/. Observe that if y is the fixed point starting with a, and x the fixed point starting with 0 of the substitution  defined in Example 5.4, then L.x/  L.y/, but x is not ultimately periodic. In § 5.3 and § 5.4 we give the main arguments to prove Statement (Sgood ; Sgood ). 5.3. Density, syndeticity and bounded gaps. The proofs of most of the generalisations of Cobham’s theorem are divided into two parts.

i. Dealing with a subset X of integers, we have to prove that X is syndetic. Equivalently, dealing with an infinite word x , we have to prove that the letters occurring infinitely many times in x appear with bounded gaps. ii. In the second part, the proof of the ultimate periodicity of X or x has to be carried out.


This section is devoted to the description of the main arguments that lead to the complete treatment of (i). In the original proof of Cobham’s theorem one of the main arguments is that as k and ` are multiplicatively independent (we refer to Theorem 1.1) the set ¹k n =`m j n; m 2 Nº is dense in Œ0; C1/. In the uniform case, these powers refer to the length of the iterates of the substitutions. Indeed, suppose W A ! A is a k -uniform substitution. Then for every a 2 A we have j n .a/j D k n . Unsurprisingly, to be able to treat the non-uniform case, it is important to know that the set ° j  n .a/ j ˇ ± ˇ n; m 2 N ˇ j  m .b/ j is dense in Œ0; C1/, for some a; b 2 A. We explain below that j n .a/j and j m .b/j are governed by the dominant eigenvalue of their incidence matrices. First we focus on part (i) and consider infinite words.

5.3.1. The length of the iterates. The length of the iterates are described in the following lemma. Note that it includes erasing substitutions and substitutions with a dominant eigenvalue equal to 1. Observe that for the substitution  defined by 0 7! 001 and 1 7! 11 we have j n .0/j D .n C 2/2n 1 and j n .1/j D 2n , showing that the situation is different from the uniform case. It can easily be described using the Jordan normal form of the incidence matrix M . Discussion of the following result can be found in § 4.7.3 in [12]. Lemma 5.10 (Chapter III.7 in [117]). Let W A ! A be a substitution. For all a 2 A one of the two following situations occurs 1. there exists N 2 N such that for all n > N , j n .a/j D 0, or, 2. there exist d.a/ 2 N and real numbers c.a/; .a/ such that j n .a/j D 1: n!C1 c.a/ nd.a/ .a/n

lim

Moreover, in the case (2), for all i 2 ¹0; : : : ; d.a/º there exists a letter b 2 A appearing in  j .a/ for some j 2 N and such that j n .b/j D 1: n!C1 c.b/ ni .a/n

lim

Definition 5.7. Let  be a non-erasing substitution. For all a 2 A, the pair .d.a/; .a// defined in Lemma 5.10 is called the growth type of a. If .d; / and .e; ˇ/ are two growth types, then we say that .d; / is less than .e; ˇ/ (or .d; / < .e; ˇ/) whenever  < ˇ or,  D ˇ and d < e . Consequently, if the growth type of a 2 A is less than the growth type of b 2 A, then limn!C1 j n .a/j=j n .b/j D 0. We say that a 2 A is a growing letter if .d.a/; .a// > .0; 1/ or equivalently, if limn!C1 j n .a/j D C1.

966

Fabien Durand and Michel Rigo

We set ‚ WD max¹.a/ j a 2 Aº, D WD max¹d.a/ j 8a 2 AW .a/ D ‚º and Amax WD ¹a 2 A j .a/ D ‚; d.a/ D Dº. The dominant eigenvalue of M is ‚. We say that the letters of Amax are of maximal growth and that .D; ‚/ is the growth type of  . Consequently, we say that a substitutive infinite word y is .D; ‚/-substitutive if the underlying substitution is of growth type .D; ‚/. Observe that, due to Lemma 5.10, any substitutive sequence is .D; ‚/-substitutive for some pair .D; ‚/. Observe that if ‚ D 1, then in view of the last part of Lemma 5.10, there exists at least one non-growing letter of growth type .0; 1/. Otherwise stated, if a letter has polynomial growth, then there exists at least one non-growing letter. Consequently  is growing (i.e., all its letters are growing) if and only if .a/ > 1 for all a 2 A. We define n 1 X   W A ! R; u0    un 1 7 ! c.ui / 1Amax .ui /; i D0

where cW A ! RC is defined in Lemma 5.10. From Lemma 5.10 we deduce the following lemma. Lemma 5.11. For all u 2 A , we have limn!C1 j n .u/j=nD ‚n D  .u/. We say that the word u 2 A is of maximal growth if  .u/ 6D 0.

Corollary 5.12. Let  be a substitution of growth type .D; ‚/. For all k > 1, the growth type of  k is .D; ‚k /. 5.3.2. Letters and words appear with bounded gaps. Recall that the first step in the proof of Cobham’s theorem is to prove that the letters occurring infinitely many times appear with bounded gaps. In our context, this implies the same property for words. Moreover, we can relax the multiplicative independence hypothesis in order to include 1-substitutions. Note that 1 and ˛ > 1 are multiplicatively dependent. Theorem 5.13 ([51]). Let d; e 2 N n ¹0º and ˛; ˇ 2 Œ1; C1/ such that .d; ˛/ 6D .e; ˇ/ and satisfying one of the following three conditions: i. ˛ and ˇ are multiplicatively independent; ii. ˛; ˇ > 1 and d 6D e ; iii. .˛; ˇ/ ¤ .1; 1/ and, ˇ D 1 and e 6D 0, or, ˛ D 1 and d 6D 0. Let C be a finite alphabet. If x 2 C ! is both .d; ˛/-substitutive and .e; ˇ/-substitutive, then the words occurring infinitely many times in x appear with bounded gaps. The main argument used to prove this in [51] is the following. Theorem 5.14. Let d; e 2 N and ˛; ˇ 2 Œ1; C1/. The set ° ˛ n nd ˇ ± ˇ D n; m 2 N ˇ ˇ m me is dense in Œ0; C1/ if and only if one of the following three conditions holds:

i. ˛ and ˇ are multiplicatively independent; ii. ˛; ˇ > 1 and d 6D e ; iii. ˇ D 1 and e 6D 0, or, ˛ D 1 and d 6D 0.


Sketch of the proof of Theorem 5.13. We only consider the case where ˛ and ˇ are multiplicatively independent. Let W A ! A be a substitution prolongable on a letter a0 having growth type .d; ˛/. Let W B  ! B  be a substitution prolongable on a letter b 0 having growth type .e; ˇ/. Let W A ! C and W B ! C be two codings such that . 1 .a0 // D . 1 .b 0 // D x . Using Proposition 5.6 we may assume that  and  are non-erasing. Suppose there is a letter a having infinitely many occurrences in x , but that appears with unbounded gaps. Then the letters in  1 .¹aº/ appear with unbounded gaps. To avoid extra technicalities (a complete treatment is considered in [51]), we assume that there is a letter in  1 .¹aº/ having maximal growth. Then it is quite easy to construct for all n 2 N, a word wn of length c1 nd ˛ n , appearing in y at the index c2 nd ˛ n , that does not contain any letter of  1 .¹aº/. On the other hand, using a kind of pumping lemma for 1 substitutions, one can show that there is a letter of .¹aº/ in z at the index c3 ne ˇ n . Therefore, using Theorem 5.14, the letter a appears in a word .wn / for some n. This is not possible. Now let us explain how to extend this result for a single letter to words. It uses what is called in [109] the substitutions of the words of length n. Let u be a word of length n occurring infinitely often in x . To prove that u appears with bounded gaps in x , it suffices to prove that the letter 1 appears with bounded gaps in the infinite word t 2 ¹0; 1ºN defined by ´ 1 if xi    xi Cn 1 D uI ti D 0 otherwise. Let An be the set of words of length n over A. The infinite word y .n/ D .yi    yi Cn 1 /i >0 over the alphabet An is a fixed point of the substitution n W .An / ! .An / defined, for all .a1    an / in An , by n ..a1    an // D .b1    bn /.b2    bnC1 /    .bj.a1 /j    bj.a1 /jCn

1/

where .a1    an / D b1    bk . For details; see § V.4 in [109]. Let W An ! A be the coding defined by ..b1    bn // D b1 for all .b1    bn / 2 An . We have  ı n D  ı , and then  ı nk D  k ı . Hence, if  is of growth type .d; ˛/, then y .n/ is .d; ˛/-substitutive. Let f W An ! ¹0; 1º be the coding defined by ´ 1 if b1    bn D uI f ..b1    bn // D 0 otherwise. It is easy to see that f .y .n/ / D t , and hence t is .d; ˛/-substitutive. Then one proceeds in the same way with  and uses the result for letters to conclude the proof. 5.4. Ultimate periodicity Definition 5.8. Let W A ! A be a substitution. If there exists a sub-alphabet B  A such that for all b 2 B , .b/ 2 B  , then the substitution W B  ! B  defined by the restriction .b/ D .b/, for all b 2 B , is a sub-substitution of  . Note that  is, in particular, a sub-substitution of itself.

968

Fabien Durand and Michel Rigo

The substitution  having ˛ as dominant eigenvalue is a “good” substitution if it has a primitive sub-substitution whose dominant eigenvalue is ˛ . So let us stress the fact that to be a “good” substitution, the sub-substitution has to be primitive and have the same dominant eigenvalue as the original substitution. We let Sgood denote the set of good substitutions. Remark 5.15. For all growing substitutions  , there exists an integer k such that  k has a primitive sub-substitution. Hence by taking a convenient power of  , the substitution can always be assumed to have a primitive sub-substitution. Note that primitive substitutions and uniform substitutions are good substitutions. Now consider the substitution W ¹a; 0; 1º ! ¹a; 0; 1º given by W a 7! aa0; 0 7! 01; 1 7! 0. Its dominant eigenvalue is 2 and it has onlypone primitive subsubstitution (0 7! 01, 1 7! 0) whose dominant eigenvalue is .1 C 5/=2, and hence it is not a good substitution. Remark 5.16. Let W A ! A and W B  ! B  be two substitutions such that  projects on  ; recall (4) for the definition of projection. There exists a coding W A ! B such that  ı  D  ı  . Note that  ı  n D  n ı  . If  is primitive, then it follows that  belongs to Sgood . Theorem 5.17. Let ˛ and ˇ be two multiplicatively independent Perron numbers. Let x 2 A! where A is a finite alphabet. Then the following are equivalent:

i. the infinite word x is both ˛ -substitutive with respect to Sgood and ˇ -substitutive with respect to Sgood ; ii. the infinite word x is ultimately periodic.

Proof. Let W B  ! B  (resp., W C  ! C  ) be a substitution in Sgood having ˛ (resp., ˇ ) as its dominant eigenvalue and  (resp., ) be a coding such that x D . 1 .b// for some b 2 B (resp., x D . 1 .c// for some c 2 C ). Let us first suppose that both substitutions are growing. In this way, taking a power if needed, we can suppose that they have primitive sub-substitutions. By Theorem 5.13, the factors occurring infinitely many times in x appear with bounded gaps. Hence for any primitive and growing sub-substitutions N and N of  and of  respectively, we have .L.// N D .L.// N D L. Using Theorem 5.9 it follows that L is periodic, i.e., there exists a shortest word u, appearing infinitely many times in x , such that L D L.u! /. Thus u appears with bounded gaps. Let Ru be the set of return words to u. We recall that a word w is a return word to u if wu belongs to L.x/, u is a prefix of wu and u has exactly two occurrences in wu. Since u appears with bounded gaps, the set Ru is finite. There exists an integer N such that all words wu 2 L.xN xN C1    / appear infinitely many times in x for all w 2 Ru . Hence these words appear with bounded gaps in x . We set t D xN xN C1    and we will prove that t is periodic. Consequently x would be ultimately periodic. We can suppose that u is a prefix of t . Then t is a concatenation of return words to u. Let w be a return word to u. It appears with bounded gaps; hence it appears in some .N n .a//, where N is a primitive


and growing sub-substitution, and there exist two words, p and q , and an integer i such that wu D pui q . As juj is the least period of L, it must be that wu D ui . It follows that t D u! . If, for example,  is non-growing, then a result of J.-J. Pansiot [98] asserts that either by modifying  and  in a suitable way (in that case ˛ could be replaced by a power of ˛ ) we can suppose  is growing or L. 1 .b/) contains the language of a periodic infinite word. We have treated the first case before. For the second case it suffices to use Theorem 5.13. Suppose ˛ and ˇ are multiplicatively independent real numbers and that x is a ˛ -substitutive infinite word with respect to Sgood and y is a ˇ -substitutive infinite word with respect to Sgood satisfying L.x/  L.y/. Then the conclusion of Theorem 5.8 is far from true. It suffices to look at Example 5.4 and the observation made after Theorem 5.9. Remark 5.18. The Statement (S; S0 ) remains open when S is the set of substitutions which are not good. Nevertheless there are cases where we can say more. For example, if x is both ˛ -substitutive and ˇ -substitutive (with ˛ and ˇ being multiplicatively independent), and, L.x/ contains the language of a periodic sequence, then, from Theorem 5.13, we deduce that x is ultimately periodic. Moreover, as we will see in the next section, this statement holds in the purely substitutive context. 5.5. The case of fixed points. Now let restrict ourselves to the purely substitutive case. In this setting Cobham’s theorem holds. Note that in the statement of the following result, ˛ and ˇ are necessarily Perron numbers. Moreover, since the substitutions are growing, then ˛ and ˇ must be larger than one. Theorem 5.19. Let W A ! A and W A ! A be two non-erasing growing substitutions prolongable on a 2 A with respective dominant eigenvalues ˛ and ˇ . Suppose that all letters of A appear in  1 .a/ and in  1 .a/ and that ˛ and ˇ are multiplicatively independent. If x D  1 .a/ D  1 .a/, then x is ultimately periodic. Proof. Thanks to Remark 5.15, we may assume that  has a primitive sub-substitution. Using Theorem 5.13, the letters appearing infinitely often in x appear with bounded gaps. Let W N Ax ! Ax be a primitive sub-substitution of  . Let c 2 Ax. Suppose that there exists a letter b , appearing infinitely many times in x , which does not belong to Ax. Then the word  n .c/ D N n .c/ does not contain b and b could not appear with bounded gaps. Consequently all letters (and, in particular, a letter of maximal growth) appearing infinitely often in x belong to Ax. Hence N also has ˛ as dominant eigenvalue and  is a “good” substitution. In the same way  is a “good” substitution. Theorem 5.17 concludes the proof. 5.6. Back to numeration systems. Let S be an abstract numeration system. There is no reason for the substitutions describing characteristic words of S-recognisable sets (see Corollary 5.4) to be primitive. To obtain a Cobham-type theorem for families of abstract numeration systems, one has to interpret Theorem 5.17 in this formalism.
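The notions above all hinge on the dominant (Perron) eigenvalue of a substitution's incidence matrix and on primitivity. The following rough Python sketch (ours, with a chosen incidence-matrix convention and Wielandt's primitivity bound; not the chapter's method) recomputes the data of the example in Remark 5.15: the substitution a ↦ aa0, 0 ↦ 01, 1 ↦ 0 has dominant eigenvalue 2, while its only primitive sub-substitution has dominant eigenvalue (1+√5)/2, so the substitution is not good.

```python
# Sketch: incidence matrix, dominant (Perron) eigenvalue and a primitivity test.
def incidence(sigma, letters):
    # M[i][j] = number of occurrences of letters[i] in sigma(letters[j])
    return [[sigma[c].count(l) for c in letters] for l in letters]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_primitive(M):
    # Wielandt's bound: M is primitive iff M^(n^2 - 2n + 2) has no zero entry.
    n = len(M)
    P = [row[:] for row in M]
    for _ in range(n * n - 2 * n + 1):
        P = mat_mul(P, M)
    return all(e > 0 for row in P for e in row)

def dominant_eigenvalue(M, it=200):
    # Plain power iteration; adequate for these small non-negative matrices.
    v = [1.0] * len(M)
    lam = 1.0
    for _ in range(it):
        w = [sum(M[i][j] * v[j] for j in range(len(M))) for i in range(len(M))]
        lam = max(w)
        v = [x / lam for x in w]
    return lam

sigma = {"a": "aa0", "0": "01", "1": "0"}     # the substitution of Remark 5.15
tau   = {"0": "01", "1": "0"}                 # its primitive sub-substitution
Ms = incidence(sigma, ["a", "0", "1"])
Mt = incidence(tau, ["0", "1"])
print(is_primitive(Ms), round(dominant_eigenvalue(Ms), 4))   # False 2.0
print(is_primitive(Mt), round(dominant_eigenvalue(Mt), 4))   # True 1.618
```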


5.6.1. Polynomially-growing abstract numeration systems. Here we only mention the following result. The paper [41] is also of interest. It is well-known that the growth function counting the number of words of length n in a regular language is either polynomial, i.e., in O.nk / for some integer k or exponential, i.e., in . n / for some  > 1. Proposition 5.20 ([51]). Let S D .L; A; 2. Consider a sequence x D .xn /n>0 taking values in


some R-module. If the R-module generated by all sequences in the k-kernel N_k(x) is finitely generated (recall Theorem 3.2), then the sequence x is said to be (R,k)-regular.

Theorem 6.1 (Cobham–Bell theorem [10]). Let R be a commutative ring (in [6] the ground ring R is assumed to be Noetherian, i.e., every ideal of R is finitely generated, but this extra assumption is not needed in this statement). Let k, ℓ be two multiplicatively independent integers. If a sequence x ∈ R^N is both (R,k)-regular and (R,ℓ)-regular, then it satisfies a linear recurrence over R.

6.2. Algebraic setting and quasi-automatic functions. In [33] G. Christol characterised p-recognisable sets in terms of formal power series.

Theorem 6.2. Let p be a prime number and F_p be the field with p elements. A subset A ⊆ N is p-recognisable if and only if

f(X) = Σ_{n∈A} X^n ∈ F_p[[X]]

is algebraic over F_p(X).

This was applied to Cobham's theorem in [34] to obtain an algebraic version.

Theorem 6.3. Let A be a finite alphabet, x ∈ A^N, and K_1 and K_2 be two finite fields with different characteristics. Let α_1: A → K_1 and α_2: A → K_2 be two one-to-one maps. If f_1(X) = Σ_{n∈N} α_1(x_n) X^n ∈ K_1[[X]] is algebraic over K_1(X) and f_2(X) = Σ_{n∈N} α_2(x_n) X^n ∈ K_2[[X]] is algebraic over K_2(X), then f_1(X) and f_2(X) are rational.
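To see Theorem 6.2 at work on a concrete series: the Thue–Morse word t is 2-recognisable, and its generating series T(X) = Σ t_n X^n over F_2 satisfies the classical relation (1+X)^3 T^2 + (1+X)^2 T + X = 0. The following small check is our own sketch (the bitmask encoding of F_2[X] and the truncation degree are implementation choices); it verifies the relation up to that degree.

```python
# Verify the classical algebraic relation of the Thue-Morse series over F_2.
N = 2000  # truncation degree

def clmul(a, b):
    """Carry-less (GF(2)) product of bitmask-encoded polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def trunc(p, n=N):
    """Keep only coefficients of degree < n."""
    return p & ((1 << n) - 1)

# Thue-Morse coefficients: t_n = parity of the binary digit sum of n.
T = 0
for n in range(N):
    if bin(n).count("1") % 2 == 1:
        T |= 1 << n

one_plus_x = 0b11                                   # 1 + X
lhs = trunc(clmul(clmul(clmul(one_plus_x, one_plus_x), one_plus_x),
                  clmul(T, T)))                     # (1+X)^3 * T^2
lhs ^= trunc(clmul(clmul(one_plus_x, one_plus_x), T))   # + (1+X)^2 * T
lhs ^= 0b10                                         # + X
print(lhs == 0)   # expected: True (all coefficients of degree < N vanish)
```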

Quasi-automatic functions were introduced by Kedlaya in [76]. Also see [77], where Christol's theorem is generalised to Hahn's generalised power series. In this algebraic setting, an extension of Cobham's theorem is proved by Adamczewski and Bell in [1]. Details are given in the chapter "Automata in number theory" of this handbook.

6.3. Real numbers and verification of infinite-state systems. Sets of numbers recognised by finite automata arise when analysing systems with unbounded mixed variables taking integer or real values. Therefore systems such as timed or hybrid automata are considered [17]. One needs to develop data structures representing the sets manipulated during the exploration of infinite-state systems. For instance, it is often needed to compute the set of reachable configurations of such a system. Let k ≥ 2 be an integer. Considering separately integer and fractional parts, a real number x ≥ 0 can be decomposed as

x = Σ_{i=0}^{d} c_i k^i + Σ_{i=1}^{+∞} c_{−i} k^{−i},   c_i ∈ [[0, k−1]], i ≤ d,   (5)

and gives rise to the infinite word c_d ⋯ c_0 ⋆ c_{−1} c_{−2} ⋯ over [[0, k−1]] ∪ {⋆}, which is a k-ary representation of x. Note that rational numbers of the form p/k^n have two k-ary representations, one ending with 0^ω and one with (k−1)^ω. For the representation of negative elements, one can consider base k-complements or signed number representations [79], the sign being determined by the most significant digit, which is thus 0 or k−1 (and this digit may be repeated an arbitrary number of times). For the definition of Büchi and Muller automata, see the first part of this handbook.
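A minimal sketch (ours, not the chapter's) of how formula (5) turns a non-negative real into the word c_d ⋯ c_0 ⋆ c_{−1} c_{−2} ⋯; the greedy loop produces one of the (at most two) k-ary representations, and the function name and digit bound are illustrative choices.

```python
# Produce a prefix of the k-ary representation  c_d ... c_0 * c_{-1} c_{-2} ...
def k_ary_word(x, k=3, frac_digits=12):
    assert x >= 0 and k >= 2
    n = int(x)                      # integer part
    frac = x - n
    int_digits = []
    while n > 0:
        n, r = divmod(n, k)
        int_digits.append(str(r))
    int_digits = int_digits[::-1] or ["0"]
    out = int_digits + ["*"]
    for _ in range(frac_digits):    # greedy expansion of the fractional part
        frac *= k
        d = int(frac)
        out.append(str(d))
        frac -= d
    return "".join(out)

print(k_ary_word(5.5, k=2))   # 101*100000000000
```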


Definition 6.1. A set X  R is k -recognisable if there exists a Büchi automaton accepting all the k -ary representations of the elements in X . Such an automaton is called a real number automaton (RNA). These notions extend naturally to subsets of Rd and to real vector automata (RVA). Also the Büchi theorem 4.5 holds for a suitable structure hR; Z; C; 2 be two multiplicatively independent integers. If X  R is both k - and `-recognisable by two weak deterministic RVA, then it is definable in hR; Z; C; 2 share the same prime factors, then there exists a subset of R that is both k - and `-recognisable, but not definable in hR; Z; C; 0 and y D .yn /n>0 are two elements of A! . A subshift on A is a pair .X; TjX / where X is a closed T -invariant subset of A! and T is the shift transformation T W A! ! A! ; .xn /n>0 7! .xnC1 /n>0 . Let u be a word over A. The set ŒuX D ¹x 2 X j x0    xjuj 1 D uº is a cylinder. The family of these sets is a base of the induced topology on X . When there is no misunderstanding, we write Œu and T instead of ŒuX and TjX . Let x 2 A! . The set ¹y 2 A! j L.y/  L.x/º is denoted .x/. It is clear that ..x/; T / is a subshift. We say that ..x/; T / is the subshift generated by x . When x is a sequence, we have .x/ D ¹T n x j n 2 Nº. Observe that ..x/; T / is minimal if and only if x is uniformly recurrent, i.e., all its factors occur infinitely often in x and for each factor u of x , there exists a constant K such that the distance between two consecutive occurrences of u in x is bounded by K . Let  be a factor map from the subshift .X; T / on the alphabet A onto the subshift .Y; T / on the alphabet B . Here xŒi;j  denotes the word xi    xj , i 6 j . The Curtis– Hedlund–Lyndon theorem (see [86], Theorem 6.2.9) asserts that  is a sliding block code: there exists an r -block map f W Ar ! B such that ..x//i D f .xŒi;i Cr 1 / for all i 2 N and x 2 X . We shall say that f is a block map associated to  and that f defines  . If u D u0 u1    un 1 is a word of length n > r , then we define f .u/ by .f .u//i D f .uŒi;i Cr 1 /, i 2 ¹0; 1; : : : ; n r C 1º. Let C denote the alphabet Ar and Z D ¹.xŒi;rCi 1 /i >0 j .xn /n>0 2 X º. It is easy to check that the subshift .Z; T / is isomorphic to .X; T / and that f induces a 1-block map (a coding) from C onto B which defines a factor map from .Z; T / onto .Y; T /. We can now state a Cobham-type theorem for subshifts generated by substitutive sequences. Observe that it implies Theorem 5.9 and Statement (S; S0 ) when S D S0 is the set of primitive substitutions. Theorem 6.6. Let .X; T / and .Y; T / be two subshifts generated, respectively, by a primitive ˛ -substitutive sequence x and by a primitive ˇ -substitutive sequence y . Suppose .X; T / and .Y; T / both factorise to the subshift .Z; T /. If ˛ and ˇ are multiplicatively independent, then .Z; T / is periodic. Below we give a sketch of the proof, which involves the concept of an ergodic measure. An invariant measure for the dynamical system .X; S / is a probability measure , on the  -algebra B.X / of Borel sets, with .S 1 B/ D .B/ for all B 2 B.X /; the measure is ergodic if every S -invariant Borel set has measure 0 or 1. The set of invariant measures for .X; S / is denoted by M.X; S /. The system .X; S / is uniquely ergodic if #.M.X; S // D 1. For expository books on subshifts and/or ergodic theory, see [38], [78], [86], [109], and [82]. It is well known that the subshifts generated by primitive substitutive sequences are uniquely ergodic [109]. Let W X ! Z and W Y ! 
Z be two factor maps; denote them φ and ψ respectively. Suppose that (Z, T) is not periodic. We will prove that α and β are multiplicatively dependent. Let μ and ν be the unique ergodic measures of (X, T) and (Y, T) respectively. It is not difficult to see that (Z, T) is also generated by a primitive substitutive sequence and consequently is uniquely ergodic. Let δ be its unique ergodic measure. We notice that the measure A ↦ μ(φ^{−1}(A)), for all Borel sets A of Z, and the measure A ↦ ν(ψ^{−1}(A)), for all Borel sets A of Z, are invariant measures for (Z, T). Hence μ ∘ φ^{−1} = δ = ν ∘ ψ^{−1}. Let us give more details about both measures in order to conclude the proof.

Theorem 6.7 ([71]). Let (Ω, T) be a subshift generated by a primitive purely α-substitutive sequence and m be its unique ergodic measure. Then the measures of cylinders in Ω lie in a finite union of geometric progressions: there exists a finite set F of positive real numbers such that

{m(C) | C cylinder of Ω} ⊆ ⋃_{n∈N} α^{−n} F.

In conjunction with the next result and using the pigeonhole principle we will conclude the proof.

Proposition 6.8 ([46]). Let (Ω, T) be a subshift generated by a primitive substitutive sequence on the alphabet A. There exists a constant K such that for any block map f: A^{2r+1} → B, we have #(f^{−1}({u})) ≤ K for all u appearing in some sequence of f(Ω).

From these last two results we deduce that there exist two finite sets of numbers F_X and F_Y such that

{δ(C) | C cylinder of Z} = {μ(φ^{−1}(C)) | C cylinder of Z} = {ν(ψ^{−1}(C)) | C cylinder of Z}
⊆ ⋃_{n∈N} α^{−n} F_X ∩ ⋃_{n∈N} β^{−n} F_Y.

The sets F_X and F_Y being finite, there exist two cylinder sets U and V of Z, a ∈ F_X, b ∈ F_Y and n, m, r, s four distinct positive integers, such that

a α^{−n} = δ(U) = b β^{−m}   and   a α^{−r} = δ(V) = b β^{−s}.

Consequently α and β are multiplicatively dependent.

6.5. Tilings

6.5.1. From definable sets. Let A be a finite alphabet. An array in N^d is a map T: N^d → A. It can be viewed as a tiling of R^d_+. The collection of all these arrays is A^{N^d}. For all x ∈ N^d, let |x| denote the sum of the coordinates of x and B(x, r) be the set {(y_1, ..., y_d) ∈ N^d | 0 ≤ y_i − x_i < r, 1 ≤ i ≤ d}. We say T is periodic (resp., ultimately periodic) if there exists p ∈ N^d such that T(x + p) = T(x) for all x ∈ N^d (resp., for all large enough x). We also need another notion of periodicity. We say that Z ⊆ N^d is p-periodic inside X ⊆ N^d if for any x ∈ X with x + p ∈ X we have

x ∈ Z ⟺ x + p ∈ Z.

We say that Z is locally periodic if there exists a non-empty finite set V ⊆ N^d of non-zero vectors such that for some K > max{|v| | v ∈ V} and L > 0 one has

(∀x ∈ N^d, |x| ≥ L) (∃v ∈ V) (Z is v-periodic inside B(x, K)).

Observe that for d = 1, local periodicity is equivalent to ultimate periodicity. We say T is pseudo-periodic if for all a ∈ A, T^{−1}(a) is locally periodic and every (d−1)-section of T^{−1}(a), say S(i, n) = {x ∈ T^{−1}(a) | x_i = n}, 1 ≤ i ≤ d and n ∈ N, is pseudo-periodic (ultimately periodic when d − 1 = 1). The following criterion is due to Muchnik; see [95] for the proof.

Proposition 6.9. Let E ⊆ N^d and T: N^d → {0,1} be its characteristic function. The following are equivalent:

i. E is definable in Presburger arithmetic;
ii. T is pseudo-periodic;
iii. for all a ∈ {0,1}, there exist n ∈ N, v_i ∈ N^d and finite sets V_i ⊆ N^d, 0 ≤ i ≤ n, such that

T^{−1}(a) = V_0 ∪ ⋃_{1≤i≤n} ( v_i + Σ_{v∈V_i} N v ).

Let p be a positive integer and A be a finite alphabet. A p-substitution (or substitution if we do not need to specify p) is a map S: A → A^{B_p} where B_p = B(0, p) = Π_{i=1}^{d} {0, ..., p−1}. The substitution S can be considered as a function from A^{N^d} into itself by setting

S(T)(x) = [S(T(y))](z),   for all T ∈ A^{N^d},

where y ∈ N^d and z ∈ B_p are the unique vectors satisfying x = p y + z. In the same way, we can define S: A^{B_{p^n}} → A^{B_{p^{n+1}}}. We remark that S^n(a) = S(S^{n−1}(a)) for all a ∈ A and n ≥ 1. We say T is generated by a p-substitution if there exist a coding φ and a fixed point T_0 of a p-substitution such that T = φ ∘ T_0. In [30] the authors proved the following theorem, which is analogous to Theorem 3.1.

Theorem 6.10. Let p ≥ 2 and d ≥ 1. A set E ⊆ N^d is p-recognisable if and only if the characteristic function of E is generated by a p-substitution.

Hence we can reformulate the Cobham–Semenov theorem as follows [120].

Theorem 6.11 (Cobham–Semenov theorem, Version 2). Let p and q be two multiplicatively independent integers greater than or equal to 2. Then the array T is generated by both a p-substitution and a q-substitution if and only if T is pseudo-periodic.

A dynamical proof of this can be given as for the unidimensional case; see [48] for the primitive case.
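As a small worked illustration of Theorem 6.10 (ours, not taken from the chapter): the set E = {(i, j) : binomial(i+j, i) is odd}, which equals {(i, j) : i AND j = 0} by Kummer's theorem, is 2-recognisable, and its characteristic array is the fixed point of the 2-substitution S(1) = [[1,1],[1,0]], S(0) = [[0,0],[0,0]]. The sketch below iterates S and checks the identification on a 32 × 32 corner.

```python
# Characteristic array of the "odd binomial coefficients" set, generated
# by a 2-substitution mapping each letter to a 2 x 2 block.
S = {1: [[1, 1], [1, 0]], 0: [[0, 0], [0, 0]]}

def iterate(n):
    """Return the 2^n x 2^n top-left corner of the fixed point S^n(1)."""
    block = [[1]]
    for _ in range(n):
        size = len(block)
        new = [[0] * (2 * size) for _ in range(2 * size)]
        for i in range(size):
            for j in range(size):
                b = S[block[i][j]]
                for di in range(2):
                    for dj in range(2):
                        new[2 * i + di][2 * j + dj] = b[di][dj]
        block = new
    return block

A = iterate(5)
ok = all(A[i][j] == (1 if (i & j) == 0 else 0)
         for i in range(32) for j in range(32))
print(ok)   # expected: True
```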


6.5.2. Self-similar tilings. In [39], a Cobham-like theorem is expressed in terms of self-similar tilings of R^d, with a proof using ergodic measures; see [123] for more about self-similar tilings. From the point of view of dynamical systems, the main result in [97] is also a Cobham-like theorem for self-similar tilings.

6.6. Toward Cobham's theorem for the Gaussian integers. I. Kátai and J. Szabó proved [75] that the sequences ((−p+i)^n)_{n≥0} and ((−p−i)^n)_{n≥0} give rise to numeration systems whose set of digits is {0, 1, ..., p²}, p ∈ N \ {0}. It is an exercise to check that when p ∈ N \ {0} and q ∈ N \ {0} are different, then −p+i and −q+i are multiplicatively independent. Therefore one could expect a Cobham-type theorem for the set of Gaussian integers G = {a + ib | a, b ∈ Z}. A subset S ⊆ G is periodic if there exists h ∈ G such that, for all g ∈ G, s ∈ S if and only if s + gh ∈ S. G. Hansel and T. Safer conjectured the following [67]:

Conjecture 6.12. Let p and q be two different positive integers and S ⊆ G. Then the following are equivalent:

i. the set S is (−p+i)-recognisable and (−q+i)-recognisable;
ii. there exists a periodic set P such that the symmetric difference S △ P is finite.

The proof that (ii) implies (i) is easy. They tried to prove the other implication using the following (classical) steps:

1. D_{p,q} = {(−p+i)^n / (−q+i)^m | n, m ∈ Z} is dense in C;
2. S is syndetic;
3. S is periodic up to some finite set.

They succeeded in proving step 2, as given by the next result.

Theorem 6.13. Let p and q be two positive integers such that the set D_{p,q} is dense in C. Let S ⊆ G be (−p+i)-recognisable and (−q+i)-recognisable. Then S is syndetic.

Let us make some observations about the density of the set D_{p,q}. Let −p+i = a e^{iθ} and −q+i = b e^{iζ}.

Proposition 6.14. The following are equivalent:

i. the set D_{p,q} is dense in C;
ii. the set D_{p,q} is dense on the circle: {e^{iη} | η ∈ R} is contained in the closure of D_{p,q};
iii. the following numbers are rationally independent (i.e., linearly independent over Q):

ln b / ln a,   (θ ln b)/(2π ln a) − ζ/(2π),   1.

The equivalence between (i) and (iii) is proven in [67] from an easy computation. The equivalence between (i) and (ii) comes from the fact that p²+1 and q²+1 are multiplicatively independent; see Proposition 2 in [67]. As an example, take p = 1 and q = 2. Then a = √2, b = √5, θ = 3π/4 and ζ = π − arctan(1/2). Proving the density of D_{1,2} is equivalent to proving that ln 5/ln 2, arctan(1/2)/π, and 1 are rationally independent. In [67] the authors observe that the four exponentials conjecture (see [127]) would imply that D_{p,q} is dense in C.

Conjecture 6.15 ("four exponentials conjecture"). Let {λ_1, λ_2} and {x_1, x_2} be two pairs of rationally independent complex numbers. Then one of the numbers e^{λ_1 x_1}, e^{λ_1 x_2}, e^{λ_2 x_1}, e^{λ_2 x_2} is transcendental.

6.7. Recognisability over F_q[X]. Using the analogy between Z and the ring of polynomials over a finite field F_q of positive characteristic, one can easily define B-recognisable sets of polynomials [112]. In [128] and [104] a characterisation of these sets in a convenient logical structure, analogous to Theorem 4.5, is given. A family of sets of polynomials recognisable in all polynomial bases is described in [112] and [128]. Again, we can conjecture a Cobham-like theorem.

7. Decidability issues

So far we have seen that ultimately periodic sets have a very special status in the context of numeration systems (recall Proposition 2.6, Theorem 5.1, or Theorems 5.17 and 5.19). They can be described using a finite amount of data (two finite words for the preperiodic and the periodic parts). Let us return once more to the usual integer base numeration system. Let X ⊆ N be a k-recognisable set of integers given by a DFA accepting rep_k(X). Is there an algorithmic decision procedure which permits one to decide, for any such set X, whether or not X is ultimately periodic? For an integer base, the problem was solved positively in [72]. The main ideas are the following ones. Given a DFA A accepting a k-recognisable set X ⊆ N, the number of states of A gives an upper bound on the possible index and period for X. Consequently, there are finitely many candidates to check. For each such pair (i, p) of candidates, produce a DFA for all possible corresponding ultimately periodic sets and compare it with A. Using non-deterministic finite automata, the same problem was solved in [5]. With the formalism of first-order logic the problem becomes trivial. If a set X ⊆ N is k-recognisable, then using Theorem 4.5 it is definable by a formula φ(x) in ⟨N, +, V_k⟩, and X is ultimately periodic if and only if

(∃p)(∃N)(∀x)((x ≥ N) → (φ(x) ↔ φ(x + p))).

Since we have a decidable theory, it is decidable whether this latter sentence is true (see [28], Proposition 8.2). The problem can be extended to Z^d and was discussed in [95]. It is solved in polynomial time in [85]. In view of Theorem 5.1 the question extends to any abstract numeration system. Let S be an abstract numeration system. Given a DFA accepting an S-recognisable set X ⊆ N, decide whether or not X is ultimately periodic. Some special cases have been solved positively in [32] and [11]. Using Corollary 5.3, the same question can be asked in terms of morphisms. Given a morphism σ: A* → A* prolongable on a letter a and a coding τ: A* → B*, decide whether or not τ(σ^∞(a)) is ultimately periodic. This is the HD0L (ultimate) periodicity problem. The purely


substitutive case was solved independently in [99] and [69]. The general substitutive case is solved positively in [50] and [93]. Also see [89] and [90], where decidability questions about almost-periodicity are considered. A word is almost periodic if factors occurring infinitely often have a bounded distance between occurrences (but some factors may occur only finitely often). Acknowledgement. We would like to warmly thank Valérie Berthé, Alexis Bès, Véronique Bruyère, Christiane Frougny, Julien Leroy, Narad Rampersad, and Jeffrey Shallit for their useful comments after reading a preliminary version of this work.

References [1] B. Adamczewski and J. Bell, Function fields in positive characteristic: expansions and Cobham’s theorem. J. Algebra 319 (2008), no. 6, 2337–2350. MR 2388308 Zbl 1151.11060 q.v. 971 [2] B. Adamczewski and J. Bell, An analogue of Cobham’s theorem for fractals. Trans. Amer. Math. Soc. 363 (2011), no. 8, 4421–4442. MR 2792994 Zbl 1229.28007 q.v. 972 [3] B. Alexeev, Minimal DFAs for testing divisibility. J. Comput. System Sci. 69 (2004), no. 2, 235–243. MR 2077381 q.v. 949 [4] J.-P. Allouche, Sur la complexité des suites infinies. Bull. Belg. Math. Soc. Simon Stevin 1 (1994), no. 2, 133–143. Journées Montoises (Mons, 1992). MR 1318964 Zbl 0803.68094 q.v. 955 [5] J.-P. Allouche, N. Rampersad, and J. O. Shallit, Periodicity, repetitions, and orbits of an automatic sequence. Theoret. Comput. Sci. 410 (2009), no. 30–32, 2795–2803. MR 2543333 Zbl 1173.68044 q.v. 977 [6] J.-P. Allouche and J. O. Shallit, The ring of k -regular sequences. Theoret. Comput. Sci. 98 (1992), no. 2, 163–197. MR 1166363 Zbl 0774.68072 q.v. 970, 971 [7] J.-P. Allouche and J. O. Shallit, The ring of k -regular sequences. II. Theoret. Comput. Sci. 307 (2003), no. 1, 3–29. MR 2014728 Zbl 1058.68066 q.v. 970 [8] J.-P. Allouche and J. O. Shallit, The ubiquitous Prouhet-Thue-Morse sequence. In Sequences and their applications (C. Ding, T. Helleseth, and H. Niederreiter, eds.). Proceedings of the International Conference (SETA ’98) held in Singapore, December 14–17, 1998. Springer London, London, 1999, 1–16. MR 1843077 Zbl 1005.11005 q.v. 955 [9] J.-P. Allouche and J. O. Shallit, Automatic sequences. Theory, applications, generalizations. Cambridge University Press, Cambridge, 2003. MR 1997038 Zbl 1086.11015 q.v. 953, 955, 963, 970 [10] J. P. Bell, A generalization of Cobham’s theorem for regular sequences. Sém. Lothar. Combin. 54A (2005/07), Art. B54Ap. 15 pp. MR 2223028 Zbl 1194.11032 q.v. 971 [11] J. P. Bell, É. Charlier, A. S. Fraenkel, and M. Rigo, A decision problem for ultimately periodic sets in non-standard numeration systems. Internat. J. Algebra Comput. 19 (2009), no. 6, 809–839. MR 2572876 Zbl 1173.68548 q.v. 977 [12] V. Berthé and M. Rigo, (eds.), Combinatorics, automata and number theory. Encyclopedia of Mathematics and its Applications, 135. Cambridge University Press, Cambridge, 2010. MR 2742574 Zbl 1197.68006 q.v. 950, 951, 952, 953, 955, 960, 965


[13] A. Bertrand-Mathis, Développement en base  ; répartition modulo un de la suite .x n /n>0 ; langages codés et  -shift. Bull. Soc. Math. France 114 (1986), no. 3, 271–323. MR 0878240 Zbl 0628.58024 q.v. 951 [14] A. Bertrand-Mathis, Comment écrire les nombres entiers dans une base que n’est pas entière. Acta Math. Hungar. 54 (1989), no. 3–4, 237–241. MR 1029085 Zbl 0695.10005 q.v. 951, 953 [15] A. Bès, An extension of the Cobham–Semënov theorem. J. Symbolic Logic 65 (2000), no. 1, 201–211. MR 1782115 Zbl 0958.03025 q.v. 957, 959, 964 [16] A. Bès, A survey of arithmetical definability. Bull. Belg. Math. Soc. Simon Stevin 2001, suppl., 1–54. A tribute to M. Boffa. Zbl 1900397 MR 1013.03071 q.v. 957 [17] B. Boigelot, L. Bronne, and S. Rassart, An improved reachability analysis method for strongly linear hybrid systems. In Computer Aided Verification O. Grumberg (ed.). Proceedings of the 9th International Conference. Lecture Notes in Computer Science book series LNCS, 1254. Springer, Berlin, 1997, 167–178. q.v. 971 [18] B. Boigelot and J. Brusten, A generalization of Cobham’s theorem to automata over real numbers. Theoret. Comput. Sci. 410 (2009), no. 18, 1694–1703. MR 2508527 Zbl 1172.68029 q.v. 972 [19] B. Boigelot, J. Brusten, and V. Bruyère, On the sets of real numbers recognized by finite automata in multiple bases. In Automata, languages and programming. Part II. (L. Aceto, I. Damgård, L. A. Goldberg, M. M. Halldórsson, A. Ingólfsdóttir, and I. Walukiewicz, eds.). Proceedings of the 35th International Colloquium (ICALP 2008) held in Reykjavik, July 7–11, 2008. Lecture Notes in Computer Science, 5126. Springer, Berlin, 2008, 112–123. MR 2503581 Zbl 1155.03308 q.v. 972 [20] B. Boigelot, J. Brusten, and J. Leroux, A generalization of Semenov’s theorem to automata over real numbers. In Automated deduction—CADE-22 (R. A. Schmidt, ed.). Proceedings of the 22nd International Conference held at McGill University, Montreal, QC, August 2–7, 2009. Lecture Notes in Computer Science, 5663. Lecture Notes in Artificial Intelligence. Springer, Berlin, 2009, 469–484. MR 2550354 Zbl 1250.03061 q.v. 972 [21] B. Boigelot, S. Jodogne, and P. Wolper, An effective decision procedure for linear arithmetic over the integers and reals. ACM Trans. Comput. Log. 6 (2005), no. 3, 614–633. MR 2147298 Zbl 1407.03052 q.v. 972 [22] B. Boigelot, S. Rassart, and P. Wolper, On the expressiveness of real and integer arithmetic automata. In Automata, languages and programming (K. G. Larsen, S. Skyum, and G. Winskel, eds.). Proceedings of the 25th international colloquium, ICALP ’98. Aalborg, Denmark, July 13–17, 1998. Lecture Notes in Computer Science. 1443. Springer, Berlin, 1998, 152–163. Zbl 0910.68149 q.v. 972 [23] S. Brlek, Enumeration of factors in the Thue–Morse word. Discrete Appl. Math. 24 (1989), no. 1–3, 83–96. MR 1011264 Zbl 0683.20045 q.v. 955 [24] J. Brusten, On the sets of real vectors recognized by finite automata in multiple bases. Ph.D. thesis. University of Liège, Liège, 2011. q.v. 972 [25] V. Bruyère, Automata and numeration systems. Sém. Lothar. Combin. 35 (1995), Art. B35b, 19 pp. MR 1399506 Zbl 0856.68102 q.v. 948 [26] V. Bruyère, On Cobham’s theorem. Thematic term on semigroups, algorithms, automata and languages. School on automata and languages, Coimbra, Portugal, 2001. q.v. 948 [27] V. Bruyère and G. Hansel, Bertrand numeration systems and recognizability. Theoret. Comput. Sci. 181 (1997), no. 1, 17–43. MR 1463527 Zbl 0957.11015 q.v. 950, 958


[28] V. Bruyère, G. Hansel, C. Michaux, and R. Villemaire, Logic and p -recognizable sets of integers. Bull. Belg. Math. Soc. Simon Stevin 1 (1994), no. 2, 191–238. Correction, ibid. 1 (1994), no. 4, 577. MR 1318968 MR 1315840 (correction) Zbl 0804.11024 Zbl 0812.11019 (correction) q.v. 948, 957, 958, 959, 977 [29] J. R. Büchi, Weak secord-order arithmetic and finite automata. Z. Math. Logik Grundlagen Math. 6 (1960), 66–92. Reprinted in The collected works of J. Richard Büchi (S. Mac Lane and D. Siefkes, eds.). Springer, New York, 398–424. MR 0125010 Zbl 0103.24705 q.v. 958 [30] A. Černý and J. Gruska, Modular trellises. In The book of L (G. Rozenberg and A. Salomaa, eds.). Dedicated to A. Lindenmayer on the occasion of his 60th birthday. Springer, Berlin, 1986, 45–61. Zbl 0586.68049 q.v. 975 [31] É. Charlier, J. Leroy, and M. Rigo, An analogue of Cobham’s theorem for graph directed iterated function systems. Adv. Math. 280 (2015), 86–120. MR 3350214 Zbl 1332.28013 q.v. 972 [32] É. Charlier and M. Rigo, A decision problem for ultimately periodic sets in nonstandard numeration systems. In Mathematical foundations of computer science 2008 (E. Ochmański and J. Tyszkiewicz, eds.). Proceedings of the 33rd International Symposium (MFCS 2008) held in Toruń, August 25–29, 2008. Lecture Notes in Computer Science, 5162. Springer, Berlin, 2008, 241–252. MR 2539374 Zbl 1173.68548 q.v. 977 [33] G. Christol, Ensembles presque périodiques k -reconnaissables. Theoret. Comput. Sci. 9 (1979), no. 1, 141–145. MR 0535129 Zbl 0402.68044 q.v. 971 [34] G. Christol, T. Kamae, M. Mendès France, and G. Rauzy, Suites algébriques, automates et substitutions. Bull. Soc. Math. France 108 (1980), no. 4, 401–419. MR 0614317 Zbl 0472.10035 q.v. 971 [35] A. Cobham, On the Hartmanis–Stearns problem for a class of tag machines. In 9 th Annual Symposium on Switching and Automata Theory (SWAT 1968). Held in Schenedtady, N.Y., USA, October 15–18, 1968. IEEE Computer Society, Los Alamitos, CA, 1968, 51–60. Also appeared as IBM Research Technical Report RC-2178, August 23, 1968. IEEEXplore 4569556 q.v. 963 [36] A. Cobham, On the base-dependence of sets of numbers recognizable by finite automata. Math. Systems Theory 3 (1969), 186–192. MR 0250789 Zbl 0179.02501 q.v. 948, 953 [37] A. Cobham, Uniform tag sequences. Math. Systems Theory 6 (1972), 164–192. MR 0457011 Zbl 0253.02029 q.v. 952, 954, 955 [38] I. P. Cornfeld, S. V. Fomin, and Y. G. Sina˘ı, Ergodic theory. Translated from the Russian by A. B. Sosinski˘ı. Grundlehren der Mathematischen Wissenschaften, 245. Springer, New York, 1982. MR 0832433 Zbl 0493.28007 q.v. 973 [39] M. I. Cortez and F. Durand, Self-similar tiling systems, topological factors and stretching factors. Discrete Comput. Geom. 40 (2008), no. 4, 622–640. MR 2453331 Zbl 1168.52016 q.v. 976 [40] A. de Luca and S. Varricchio, Some combinatorial properties of the Thue–Morse sequence and a problem in semigroups. Theoret. Comput. Sci. 63 (1989), no. 3, 333–348. MR 0993769 Zbl 0671.10050 q.v. 955 [41] V. Diekert and D. Krieger, Some remarks about stabilizers. Theoret. Comput. Sci. 410 (2009), no. 30–32, 2935–2946. MR 2543346 Zbl 1173.68053 q.v. 970 [42] A. W. M. Dress and F. von Haeseler, A semigroup approach to automaticity. Ann. Comb. 7 (2003), no. 2, 171–190. MR 1994574 Zbl 1053.11018 q.v. 970


[43] F. Durand, A characterization of substitutive sequences using return words. Discrete Math. 179 (1998), no. 1–3, 89–101. MR 1489074 Zbl 0895.68087 q.v. 963 [44] F. Durand, A generalization of Cobham’s theorem. Theory Comput. Syst. 31 (1998), no. 2, 169–185. MR 1491657 Zbl 0895.68081 q.v. 963, 964 [45] F. Durand, Sur les ensembles d’entiers reconnaissables. J. Théor. Nombres Bordeaux 10 (1998), no. 1, 65–84. MR 1827286 Zbl 1046.11500 q.v. 964, 970 [46] F. Durand, Linearly recurrent subshifts have a finite number of non-periodic subshift factors. Ergodic Theory Dynam. Systems 20 (2000), no. 4, 1061–1078. MR 1779393 Zbl 0965.37013 q.v. 974 [47] F. Durand, A theorem of Cobham for non primitive substitutions. Acta Arith. 104 (2002), no. 3, 225–241. MR 1914721 Zbl 1014.11016 q.v. 963 [48] F. Durand, Cobham–Semenov theorem and Nd -subshifts. Theoret. Comput. Sci. 391 (2008), no. 1–2, 20–38. MR 2381349 Zbl 1133.68036 q.v. 975 [49] F. Durand, Cobham’s theorem for substitutions. J. Eur. Math. Soc. (JEMS) 13 (2011), no. 6, 1799–1814. MR 2835330 Zbl 1246.11073 q.v. 964 [50] F. Durand, Decidability of the HD0L ultimate periodicity problem. RAIRO Theor. Inform. Appl. 47 (2013), no. 2, 201–214. MR 3072319 Zbl 1361.68112 q.v. 978 [51] F. Durand and M. Rigo, Syndeticity and independent substitutions. Adv. in Appl. Math. 42 (2009), no. 1, 1–22. MR 2475310 Zbl 1160.68028 q.v. 963, 966, 967, 970 [52] F. Durand and M. Rigo, Multidimensional extension of the Morse–Hedlund theorem. European J. Combin. 34 (2013), no. 2, 391–409. MR 2994406 Zbl 1338.68227 q.v. 956 [53] H.-D. Ebbinghaus, J. Flum, and W. Thomas, Mathematical logic. Second edition. Translated from the German by M. Meßmer. Undergraduate Texts in Mathematics. Springer, New York, 1994. MR 1278260 Zbl 0795.03001 q.v. 957 [54] S. Eilenberg, Automata, languages, and machines. Vol. A. Pure and Applied Mathematics, 58. Academic Press, New York, 1974. MR 0530382 Zbl 0317.94045 q.v. 953, 954 [55] H. B. Enderton, A mathematical introduction to logic. Academic Press, New York and London, 1972. MR 0337470 Zbl 0298.02002 q.v. 958 [56] S. Fabre, Une généralisation du théorème de Cobham. Acta Arith. 67 (1994), no. 3, 197–208. MR 1292734 Zbl 0814.11015 q.v. 963 [57] S. Fabre, Substitutions et ˇ -systèmes de numération. Theoret. Comput. Sci. 137 (1995), no. 2, 219–236. MR 1311222 Zbl 0872.11017 q.v. 964, 970 [58] I. Fagnot, Sur les facteurs des mots automatiques. Theoret. Comput. Sci. 172 (1997), no. 1–2, 67–89. MR 1432857 Zbl 0983.68102 q.v. 964 [59] A. S. Fraenkel, Systems of numeration. Amer. Math. Monthly 92 (1985), no. 2, 105–114. MR 0777556 Zbl 0568.10005 q.v. 950 [60] C. Frougny, Representations of numbers and finite automata. Math. Systems Theory 25 (1992), no. 1, 37–60. MR 1139094 Zbl 0776.11005 q.v. 951, 958 [61] C. Frougny, Non-standard number representation: computer arithmetic, beta-numeration and quasicrystals. In Physics and theoretical computer science (J.-P. Gazeau, J. Nešetřil, and B. Rovan, eds.). From numbers and languages to (quantum) cryptography. Proceedings of the NATO Advanced Study Institute (ASI) School on Emerging Computer Security Technologies held in Cargese, October 17–29, 2005. NATO Security through Science Series D: Information and Communication Security, 7. IOS Press, Amsterdam, 2007, 155–169. MR 2504335 q.v. 948 [62] F. R. Gantmacher, The theory of matrices. Chelsea Publishing Co., New York, 1959. MR 0107649 q.v. 961


[63] S. Ginsburg and E. H. Spanier, Semigroups, Presburger formulas, and languages. Pacific J. Math. 16 (1966), 285–296. MR 0191770 Zbl 0143.01602 q.v. 959 [64] W. H. Gottschalk and G. A. Hedlund, Topological dynamics. American Mathematical Society Colloquium Publications, 36. American Mathematical Society, Providence, R.I., 1955. MR 0074810 Zbl 0067.15204 q.v. 953 [65] G. Hansel, A propos d’un théorème de Cobham. In Actes de la fête des mots (D. Perrin, ed.). Greco de Programmation, CNRS, Rouen, 1982, 55–59. q.v. 953 [66] G. Hansel, Systèmes de numération indépendants et syndéticité. Theoret. Comput. Sci. 204 (1998), no. 1–2, 119–130. MR 1637516 Zbl 0952.68073 q.v. 953 [67] G. Hansel and T. Safer, Vers un théorème de Cobham pour les entiers de Gauss. Bull. Belg. Math. Soc. Simon Stevin 10 (2003), suppl., 723–735. MR 2073023 Zbl 1071.68089 q.v. 976, 977 [68] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers. Fifth edition. The Clarendon Press, Oxford University Press, New York, 1979. MR 0568909 Zbl 0423.10001 q.v. 948 [69] T. Harju and M. Linna, On the periodicity of morphisms on free monoids. RAIRO Inform. Théor. Appl. 20 (1986), no. 1, 47–54. MR 0849965 Zbl 0608.68065 q.v. 978 [70] M. Hollander, Greedy numeration systems and regularity. Theory Comput. Syst. 31 (1998), no. 2, 111–133. MR 1491655 Zbl 0895.68088 q.v. 951 [71] C. Holton and L. Q. Zamboni, Directed graphs and substitutions. Theory Comput. Syst. 34 (2001), no. 6, 545–564. MR 1865811 Zbl 0993.68075 q.v. 974 [72] J. Honkala, A decision method for the recognizability of sets defined by number systems. RAIRO Inform. Théor. Appl. 20 (1986), no. 4, 395–403. MR 0880843 Zbl 0639.68074 q.v. 977 [73] J. Honkala, On the simplification of infinite morphic words. Theoret. Comput. Sci. 410 (2009), no. 8–10, 997–1000. MR 2492043 Zbl 1162.68031 q.v. 963 [74] R. A. Horn and C. R. Johnson, Matrix analysis. Corrected reprint of the 1985 original. Cambridge University Press, Cambridge, 1990. MR 1084815 Zbl 0704.15002 q.v. 961 [75] I. Kátai and J. Szabó, Canonical number systems for complex integers. Acta Sci. Math. (Szeged) 37 (1975), no. 3–4, 255–260. MR 0389759 Zbl 0297.12003 q.v. 976 [76] K. S. Kedlaya, The algebraic closure of the power series field in positive characteristic. Proc. Amer. Math. Soc. 129 (2001), no. 12, 3461–3470. MR 1860477 Zbl 1012.12007 q.v. 971 [77] K. S. Kedlaya, Finite automata and algebraic extensions of function fields. J. Théor. Nombres Bordeaux 18 (2006), no. 2, 379–420. MR 2289431 Zbl 1161.11317 q.v. 971 [78] B. P. Kitchens, Symbolic dynamics. One-sided, two-sided and countable state Markov shifts. Universitext. Springer, Berlin, 1998. MR 1484730 Zbl 0892.58020 q.v. 973 [79] D. E. Knuth, The art of computer programming. Vol. 2. Seminumerical algorithms. Second edition. Addison-Wesley Series in Computer Science and Information Processing. Addison-Wesley Publishing Co., Reading, MA, 1981. MR 0633878 Zbl 0477.65002 q.v. 971 [80] T. J. P. Krebs, A more reasonable proof of Cobham’s theorem. Preprint, 2018. arXiv:1801.06704 [cs.FL] q.v. 953 [81] D. Krieger, A. Miller, N. Rampersad, B. Ravikumar, and J. Shallit, Decimations of languages and state complexity. Theoret. Comput. Sci. 410 (2009), no. 24–25, 2401–2409. MR 2522444 Zbl 1168.68026 q.v. 960


[82] P. Kůrka, Topological and symbolic dynamics. Cours Spécialisés, 11. Société Mathématique de France, Paris, 2003. MR 2041676 Zbl 1038.37011 q.v. 973 [83] L. Latour, Presburger arithmetic: from automata to formulas. Ph.D. thesis. University of Liège, Liège, 2006. q.v. 958 [84] P. B. A. Lecomte and M. Rigo, Numeration systems on a regular language. Theory Comput. Syst. 34 (2001), no. 1, 27–44. MR 1799066 Zbl 0969.68095 q.v. 959, 960 [85] J. Leroux, A polynomial time Presburger criterion and synthesis for number decision diagrams. In 20 th Annual IEEE Symposium on Logic in Computer Science. LICS’ 05. Chicago, IL, June 26–29, 2005. IEEE Computer Society, Los Alamitos, CA, 2005, 147–156. IEEEXplore 1509219 q.v. 977 [86] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge, 1995. MR 1369092 Zbl 1106.37301 q.v. 961, 962, 973 [87] M. Lothaire, Combinatorics on words. A collective work by D. Perrin, J. Berstel, C. Choffrut, R. Cori, D. Foata, J.-É. Pin, G. Pirillo, C. Reutenauer, M. P. Schützenberger, J. Sakarovitch, and I. Simon. With a foreword by R. Lyndon. Edited and with a preface by D. Perrin. Encyclopedia of Mathematics and its Applications, 17. Addison-Wesley, Reading, MA, 1983. MR 0675953 Zbl 0514.20045 q.v. 953, 955 [88] M. Lothaire, Algebraic combinatorics on words. A collective work by J. Berstel, D. Perrin, P. Seebold, J. Cassaigne, A. De Luca, S. Varricchio, A. Lascoux, B. Leclerc, J.-Y. Thibon, V. Bruyère, C. Frougny, F. Mignosi, A. Restivo, C. Reutenauer, D. Foata, G.-N. Han, J. Désarménien, V. Diekert, T. Harju, J. Karhumäki and W. Plandowski. With a preface by Berstel and Perrin. Encyclopedia of Mathematics and its Applications, 90. Cambridge University Press, Cambridge, 2002. MR 905123 Zbl 1001.68093 q.v. 950, 951, 952, 953 [89] A. Maes, Morphisms and almost-periodicity. Discrete Appl. Math. 86 (1998), no. 2–3, 233–248. MR 1636500 Zbl 0906.68111 q.v. 978 [90] A. Maes, More on morphisms and almost-periodicity. Theoret. Comput. Sci. 231 (2000), no. 2, 205–215. MR 1739891 Zbl 0951.68118 q.v. 978 [91] C. Michaux and R. Villemaire, Cobham’s theorem seen through Büchi’s theorem. In Automata, languages and programming (A. Lingas, R. G. Karlsson, and S. Carlsson, eds.). Proceedings of the Twentieth International Colloquium (ICALP 93) held at Lund University, Lund, July 5–9, 1993. Lecture Notes in Computer Science, 700. Springer, Berlin, 1993, 325–334. MR 1252419 Zbl 1422.68156 q.v. 959 [92] C. Michaux and R. Villemaire, Presburger arithmetic and recognizability of sets of natural numbers by automata: new proofs of Cobham’s and Semenov’s theorems. Ann. Pure Appl. Logic 77 (1996), no. 3, 251–277. MR 1370990 Zbl 0857.03003 q.v. 959 [93] I. V. Mitrofanov, Periodicity of morphic words. Fundam. Prikl. Mat. 18 (2013), no. 4, 107–119. In Russian. English translation, J. Math. Sci. (N.Y.) 206 (2015), no. 6, 679–687 MR 3431835 Zbl 1385.68028 q.v. 978 [94] M. Morse and G. A. Hedlund, Symbolic dynamics. Amer. J. Math. 60 (1938), no. 4, 815–866. MR 1507944 JFM 64.0798.04 Zbl 0019.33502 q.v. 955 [95] A. A. Muchnik, The definable criterion for definability in Presburger arithmetic and its applications. Theoret. Comput. Sci. 290 (2003), no. 3, 1433–1444. MR 1937730 Zbl 1052.68079 q.v. 959, 975, 977 [96] M. Nivat, invited talk at ICALP. Bologna, 1997. q.v. 956 [97] N. Ormes, C. Radin, and L. Sadun, A homeomorphism invariant for substitution tiling spaces. Geom. Dedicata 90 (2002), 153–182. MR 1898159 Zbl 0997.37006 q.v. 976


[98] J.-J. Pansiot, Complexité des facteurs des mots infinis engendrés par morphismes itérés. In Automata, languages and programming (in J. Paredaens, ed.). Lecture Notes in Computer Science, 172. Springer, Berlin, 1984, 380–389. MR 0784265 Zbl 0554.68053 q.v. 969 [99] J.-J. Pansiot, Decidability of periodicity for infinite words. RAIRO Inform. Théor. Appl. 20 (1986), no. 1, 43–46. MR 0849964 Zbl 0617.68063 q.v. 978 [100] W. Parry, On the ˇ -expansions of real numbers. Acta Math. Acad. Sci. Hungar. 11 (1960), 401–416. MR 0142719 Zbl 0099.28103 q.v. 951 [101] B. Pascal, Œuvres complètes. Seuil, 1963. The treatise De numeris multiplicibus. Written with the other arithmetical treatises before 1654. Guillaume Desprez, Paris, 1665. q.v. 949 [102] D. Perrin, Finite automata. In J. van Leeuwen (ed.), Handbook of theoretical computer science. Vol. B. Formal models and semantics. Elsevier Science Publishers, B.V., Amsterdam, and MIT Press, Cambridge, MA, 1990, 1–57. MR 1127186 Zbl 1127185 q.v. 948, 953 [103] F. Point and V. Bruyère, On the Cobham–Semenov theorem. Theory Comput. Syst. 30 (1997), no. 2, 197–220. MR 1424937 Zbl 0900.68312 q.v. 959 [104] F. Point, M. Rigo, and L. Waxweiler, Defining multiplication in some additive expansions of polynomial rings. Comm. Algebra 44 (2016), no. 5, 2075–2099. MR 3490667 Zbl 1403.03015 q.v. 977 [105] M. Presburger, Über die Volständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. C. R. Congrès Math. Pays slaves 1930, 92–101. JFM 56.0825.04 q.v. 957, 958 [106] M. Presburger, On the completeness of a certain system of arithmetic of whole numbers in which addition occurs as the only operation. Hist. Philos. Logic 12 (1991), no. 2, 225–233. Translated from the German and with commentaries by D. Jacquette. MR 1111343 Zbl 0741.03027 q.v. 957, 958 [107] N. Pytheas Fogg, Substitutions in dynamics, arithmetics and combinatorics (V. Berthé, S. Ferenczi, C. Mauduit and A. Siegel, eds.). Lecture Notes in Mathematics, 1794. Springer, Berlin, 2002. MR 1970385 Zbl 1014.11015 q.v. 955, 962 [108] A. Quas and L. Zamboni, Periodicity and local complexity. Theoret. Comput. Sci. 319 (2004), no. 1–3, 229–240. MR 2074955 Zbl 1068.68117 q.v. 956 [109] M. Queffélec, Substitution dynamical systems – spectral analysis. Lecture Notes in Mathematics, 1294. Springer, Berlin, 1987. MR 0924156 Zbl 0642.28013 q.v. 967, 973 [110] A. Rényi, Representations for real numbers and their ergodic properties. Acta Math. Acad. Sci. Hungar. 8 (1957), 477–493. MR 0097374 Zbl 0079.08901 q.v. 951 [111] M. Rigo, Generalization of automatic sequences for numeration systems on a regular language. Theoret. Comput. Sci. 244 (2000), no. 1–2, 271–281. MR 1774400 Zbl 0945.68105 q.v. 960 [112] M. Rigo, Syntactical and automatic properties of sets of polynomials over finite fields. Finite Fields Appl. 14 (2008), no. 1, 258–276. MR 2381492 Zbl 1140.11059 q.v. 977 [113] M. Rigo, Formal languages, automata and numeration systems. 2. Applications to recognizability and decidability. With a foreword by V. Berthé. Networks and Telecommunications Series. ISTE, London, and John Wiley & Sons, Hoboken, N.J., 2014. MR 3526118 Zbl 1326.68003 q.v. 957 [114] M. Rigo and A. Maes, More on generalized automatic sequences. J. Autom. Lang. Comb. 7 (2002), no. 3, 351–376. MR 1957696 Zbl 1033.68069 q.v. 960


[115] M. Rigo and L. Waxweiler, A note on syndeticity, recognizable sets and Cobham’s theorem. Bull. Eur. Assoc. Theor. Comput. Sci. 88 (2006), 169–173. MR 2222340 Zbl 1169.68490 q.v. 953 [116] J. Sakarovitch, Éléments de théorie des automates. Vuibert, Paris, 2003. English corrected edition, Elements of automata theory. Translated by R. Thomas. Cambridge University Press, Cambridge, 2009. MR 2567276 Zbl 1178.68002 (French ed.) Zbl 1188.68177 q.v. 949 [117] A. Salomaa and M. Soittola, Automata-theoretic aspects of formal power series. Texts and Monographs in Computer Science. Springer, Berlin, 1978. MR 0483721 Zbl 0377.68039 q.v. 965 [118] O. Salon, Suites automatiques à multi-indices. With an appendix by J. O. Shallit. Sémin. Théor. Nombres, Univ. Bordeaux I 1986-1987, Exp. No. 4, 27 p. (1987); Appendix 29A–36A (1987). Zbl 0653.10049 q.v. 956 [119] O. Salon, Suites automatiques à multi-indices et algébricité. C. R. Acad. Sci. Paris Sér. I Math. 305 (1987), no. 12, 501–504. MR 0916320 Zbl 0628.10007 q.v. 956 [120] A. L. Semenov, The Presburger nature of predicates that are regular in two number systems. Sibirsk. Mat. Ž. 18 (1977), no. 2, 403–418, 479. In Russian. English translation, Siberian Math. J. 18 (1977), no. 2, 289–299. MR 0450050 Zbl 0411.03054 q.v. 958, 975 [121] E. Seneta, Non-negative matrices and Markov chains. Revised reprint of the second (1981) edition. Springer Series in Statistics. Springer, New York, 2006. MR 2209438 Zbl 1099.60004 q.v. 961 [122] J. O. Shallit, Numeration systems, linear recurrences, and regular sets. Inform. and Comput. 113 (1994), no. 2, 331–347. MR 1285236 Zbl 0810.11006 q.v. 950, 960 [123] B. Solomyak, Dynamics of self-similar tilings. Ergodic Theory Dynam. Systems 17 (1997), no. 3, 695–738. MR 1452190 Zbl 0884.58062 q.v. 976 [124] B. Tan and Z.-Y. Wen, Some properties of the Tribonacci sequence. European J. Combin. 28 (2007), no. 6, 1703–1719. MR 2339496 Zbl 1120.11009 q.v. 962 [125] R. Villemaire, Joining k - and l -recognizable sets of natural numbers. In STACS 92 (A. Finkel and M. Jantzen, eds.). Proceedings of the Ninth Annual Symposium on Theoretical Aspects of Computer Science held in Cachan, February 13–15, 1992. Lecture Notes in Computer Science, 577. Springer, Berlin, 1992, 83–94. MR 1255595 Zbl 0744.03044 q.v. 959 [126] R. Villemaire, The theory of hN ; C; Vk ; Vl i is undecidable. Theoret. Comput. Sci. 106 (1992), no. 2, 337–349. MR 1192774 Zbl 0773.03008 q.v. 959 [127] M. Waldschmidt, Diophantine approximation on linear algebraic groups. Transcendence properties of the exponential function in several variables. Grundlehren der Mathematischen Wissenschaften, 326. Springer, Berlin, 2000. MR 1756786 Zbl 0944.11024 q.v. 977 [128] L. Waxweiler, Caractère reconnaissable d’ensembles de polynômes à coefficients dans un corps fini. Ph.D. thesis. University of Liège, Liège, 2009. https://hdl.handle.net/2268/11381 q.v. 977 [129] P. Wolper and B. Boigelot, Verifying systems with infinite but regular state spaces. In Computer aided verification (A. J. Hu and M. Y. Vardi, eds.). Proceedings of the 10 th International Conference (CAV’98) held at the University of British Columbia, Vancouver, BC, June 28–July 2, 1998. Lecture Notes in Computer Science, 1427. Springer, Berlin, 1998, 88–97. Springer. MR 1729031 q.v. 948


[130] É. Zeckendorf, Représentation des nombres naturels par une somme de nombres de Fibonacci ou de nombres Lucas. Bull. Soc. Roy. Sci. Liège 41 (1972), 179–182. MR 0308032 Zbl 0252.10011 q.v. 950

Chapter 27

Symbolic dynamics

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  987
2. Shift spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  988
3. Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  997
4. Minimal automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1004
5. Symbolic conjugacy . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010
6. Special families of automata . . . . . . . . . . . . . . . . . . . . . . 1017
7. Syntactic invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . 1023
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1028

1. Introduction Symbolic dynamics is part of dynamical systems theory. It studies discrete dynamical systems called shift spaces and their relations under appropriately defined morphisms, in particular isomorphisms called conjugacies. A special emphasis has been put on the classification of shift spaces up to conjugacy or flow equivalence. There is a considerable overlap between symbolic dynamics and automata theory. Actually, one of the basic objects of symbolic dynamics, the sofic systems, are essentially the same as finite automata. In addition, the morphisms of shift spaces are a particular case of rational transductions, that is, functions defined by finite automata with output. The difference is that symbolic dynamics mostly considers infinite words and that all states of the automata are both initial and final. Also, the morphisms are particular kinds of transductions that are given by local maps. This chapter presents some of the links between automata theory and symbolic dynamics. The emphasis is on two particular points. The first one is the interplay between some particular classes of automata, such as local automata, and results on embeddings of shifts of finite type. The second one is the connection between syntactic semigroups and the classification of sofic shifts up to conjugacy. Note that symbolic dynamics also appears in Chapter 17. The chapter is organised as follows. In § 2 we introduce the basic notions of symbolic dynamics: shift spaces, conjugacy and flow equivalence. We state without proof two important results: the decomposition theorem and the classification theorem.


In § 3 we introduce automata in relation to sofic shifts. In § 4 we define two kinds of minimal automata for shift spaces: the Krieger automaton and the Fischer automaton. We also relate these automata to the syntactic semigroup of a shift space. In § 5 we state and prove an analogue due to Nasu of the decomposition theorem and the classification theorem. In § 6 we consider two special families of automata: local automata and automata with finite delay. We show that they are related to shifts of finite type and almost finite type, respectively. We prove an embedding theorem (Theorem 6.4) that is a counterpart, for automata, of a result known as Nasu’s masking lemma. In § 7 we study syntactic invariants of sofic shifts. We introduce the syntactic graph of an automaton. We show that the syntactic graph of an automaton is invariant under conjugacy (Theorem 7.4) and also under flow equivalence. Finally, we state some results concerning the shift spaces corresponding to pseudovarieties of ordered semigroups. We follow the notation of the book of Doug Lind and Brian Marcus [21]. In general, we have not reproduced the proofs of the results that can be found there. We thank Mike Boyle and Alfredo Costa for their help.

2.1. Shift spaces. Let A be a finite alphabet. We let A* denote the set of words on A and A^+ denote the set of nonempty words. A word v is a factor of a word t if t = uvw for some words u, w. We denote by A^Z the set of bi-infinite sequences of symbols from A. This set is a topological space in the product topology of the discrete topology on A. The shift transformation on A^Z is the map σ_A from A^Z onto itself defined by y = σ_A(x) if y_n = x_{n+1} for n ∈ Z. A set X ⊆ A^Z is shift invariant if σ(X) = X. A shift space on the alphabet A is a shift-invariant subset of A^Z which is closed in the topology. The set A^Z itself is a shift space called the full shift. For a set W ⊆ A* of words (whose elements are called the forbidden factors), we let X_(W) denote the set of x ∈ A^Z such that no w ∈ W is a factor of x.

Proposition 2.1. The shift spaces on the alphabet A are the sets X_(W) for W ⊆ A*.

A shift space X is of finite type if there is a finite set W ⊆ A* such that X = X_(W).


Example 2.1. Let A D ¹a; bº, and let W D ¹bbº. The shift X .W / is composed of the sequences without two consecutive b ’s. It is a shift of finite type, called the golden mean shift. Recall that a set W  A is recognisable if it can be recognised by a finite automaton or, equivalently, defined by a regular expression. A shift space X is sofic if there is a recognisable set W such that X D X .W / . Since a finite set is recognisable, any shift of finite type is sofic. Example 2.2. Let A D ¹a; bº, and let W D a.bb/ ba. The shift X .W / is composed of the sequences where two consecutive occurrences of the symbol a are separated by an even number of b ’s. It is a sofic shift called the even shift. It is not a shift of finite type. Indeed, assume that X D X .V / for a finite set V  A . Let n be the maximal length of the words of V . A bi-infinite repetition of the word ab n has the same blocks of length at most n as a bi-infinite repetition of the word ab nC1 . However, one is in X if and only if the other is not in X , a contradiction. Example 2.3. Let A D ¹a; bº and let W D ¹ban b m a j n; m > 1, n ¤ mº. The shift X .W / is composed of infinite sequences of the form    ani b ni ani C1 b ni C1    . The set W is not recognisable and it can be shown that X is not sofic. Edge shifts. In this chapter, a graph G D .Q; E/ is a pair composed of a finite set Q of vertices (or states), and a finite set E of edges. The graph is equipped with two maps i; tW E ! Q which associate, with an edge e , its initial and terminal vertex. 1 We say that e starts in i.e/ and ends in t.e/. Sometimes, i.e/ is called the source and t.e/ is called the target of e . We also say that e is an incoming edge for t.e/, and an outgoing edge for i.e/. Two edges e; e 0 2 E are consecutive if t.e/ D i.e 0 /. For p; q 2 Q, we denote by Epq the set of edges of a graph G D .Q; E/ starting in state p and ending in state q . The adjacency matrix of a graph G D .Q; E/ is the Q  Q-matrix M.G/ with elements in N defined by M.G/pq D Card.Epq /:

A (finite or bi-infinite) path is a (finite or bi-infinite) sequence of consecutive edges. The edge shift on the graph G is the set of bi-infinite paths in G . It is denoted by XG and is a shift of finite type on the alphabet of edges. Indeed, it can be defined by taking the set of non-consecutive edges for the set of forbidden factors. The converse does not hold, since the golden mean shift is not an edge shift. However, we shall see below (Proposition 2.5) that every shift of finite type is conjugate to an edge shift. A graph is essential if every state has at least one incoming and one outgoing edge. This implies that every edge is on a bi-infinite path. The essential part of a graph G is the subgraph obtained by restricting to the set of vertices and edges that are on a bi-infinite path. 1 We avoid the use of the terms “initial state” or “terminal state” of an edge to avoid confusion with the initial or terminal states of an automaton
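A brief sketch (ours, not from the chapter) of how the essential part can be computed: repeatedly delete vertices with no incoming or no outgoing edge; what remains is exactly the set of vertices and edges lying on bi-infinite paths. The edge encoding as (name, source, target) triples is an assumption made for the example.

```python
# Compute the essential part of a graph by iterative pruning.
def essential_part(vertices, edges):
    V, E = set(vertices), list(edges)
    changed = True
    while changed:
        out_deg = {v: 0 for v in V}
        in_deg = {v: 0 for v in V}
        for _, s, t in E:
            out_deg[s] += 1
            in_deg[t] += 1
        dead = {v for v in V if out_deg[v] == 0 or in_deg[v] == 0}
        changed = bool(dead)
        V -= dead
        E = [(n, s, t) for (n, s, t) in E if s in V and t in V]
    return V, E

# Vertex 'c' is reachable from the cycle on {a, b} but has no outgoing edge,
# so it lies on no bi-infinite path and gets pruned (set order may vary).
V = {"a", "b", "c"}
E = [("e1", "a", "b"), ("e2", "b", "a"), ("e3", "b", "c")]
print(essential_part(V, E))   # ({'a', 'b'}, [('e1','a','b'), ('e2','b','a')])
```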


2.2. Conjugacy

Morphisms. Let X be a shift space on an alphabet A, and let Y be a shift space on an alphabet B. A morphism φ from X into Y is a continuous map from X into Y that commutes with the shift. This means that φ ∘ σ_A = σ_B ∘ φ. Let k be a positive integer. A k-block of X is a factor of length k of an element of X. We denote by B(X) the set of all blocks of X and by B_k(X) the set of k-blocks of X. A function f: B_k(X) → B is called a k-block substitution. Now let m, n be fixed nonnegative integers with k = m + 1 + n. Then the function f defines a map φ called a sliding block map with memory m and anticipation n as follows. The image of x ∈ X is the element y = φ(x) ∈ B^Z given by

y_i = f(x_{i−m} ⋯ x_i ⋯ x_{i+n}).

We write φ = f_∞^[m,n]. It is a sliding block map from X into Y if y is in Y for all x in X. We also say that φ is a k-block map from X into Y. The simplest case occurs when m = n = 0. In this case, φ is a 1-block map. The following result is Theorem 6.2.9 in [21].

Theorem 2.2 (Curtis, Lyndon, and Hedlund). A map from a shift space X into a shift space Y is a morphism if and only if it is a sliding block map.

Conjugacies of shifts. A morphism from a shift X onto a shift Y is called a conjugacy if it is one-to-one from X onto Y. Note that in this case, using standard topological arguments, one can show that the inverse mapping is also a morphism, and thus a conjugacy.

We define the n-th higher block shift X^[n] of a shift X over the alphabet A as follows. The alphabet of X^[n] is the set B = B_n(X) of blocks of length n of X.

Proposition 2.3. The shifts X and X^[n] for n ≥ 1 are conjugate.

Proof. Let f: B_n(X) → B be the n-block substitution that maps the factor x_1 ⋯ x_n to itself, viewed as a symbol of the alphabet B. By construction, the shift X^[n] is the image of X by the map f_∞^{[n−1,0]}. This map is a conjugacy since it is bijective, and its inverse is the 1-block map g_∞ corresponding to the 1-block substitution which associates the symbol x_n of A with the symbol x_1 ⋯ x_n of B.

Let G = (Q, E) be a graph. For an integer n ≥ 1, let G^[n] denote the following graph, called the n-th higher edge graph of G. For n = 1, one has G^[1] = G. For n > 1, the set of states of G^[n] is the set of paths of length n − 1 in G. The edges of G^[n] are the paths of length n of G. The start state of an edge (e_1, e_2, …, e_n) is (e_1, e_2, …, e_{n−1}) and its end state is (e_2, e_3, …, e_n). The following result shows that the higher block shifts of an edge shift are again edge shifts.


Proposition 2.4. Let G be a graph. For n ≥ 1, one has (X_G)^[n] = X_{G^[n]}.

A shift of finite type need not be an edge shift. For example, the golden mean shift of Example 2.1 is not an edge shift. However, any shift of finite type comes from an edge shift in the following sense.

Proposition 2.5. Every shift of finite type is conjugate to an edge shift.

Proof. We show that for every shift of finite type X there is an integer n such that X^[n] is an edge shift. Let W ⊆ A* be a finite set of words such that X = X_{(W)}, and let n be the maximal length of the words of W. If n = 0, X is the full shift. Thus we assume n ≥ 1. Define a graph G whose vertices are the blocks of length n − 1 of X, and whose edges are the blocks of length n of X. For w ∈ B_n(X), the initial (resp. terminal) vertex of w is the prefix (resp. suffix) of length n − 1 of w. We show that X_G = X^[n]. An element of X^[n] is always an infinite path in G. To show the other inclusion, consider an infinite path y in G. It is the sequence of n-blocks of an element x of A^ℤ that does not contain any block in W. Since X = X_{(W)}, we get that x is in X. Consequently, y is in X^[n]. This proves the equality.

Proposition 2.6. A shift space that is conjugate to a shift of finite type is itself of finite type.

Proof. Let φ: X → Y be a conjugacy from a shift of finite type X onto a shift space Y. By Proposition 2.5, we may assume that X = X_G for some graph G. Changing G into some higher edge graph, we may assume that φ is 1-block. We may consider G as a graph labelled by φ. Suppose that φ^{−1} has memory m and anticipation n. Set φ^{−1} = f_∞^{[m,n]}. Let W be the set of words of length m + n + 2 that are not the label of a path in G. We show that Y = X_{(W)}, which implies that Y is of finite type. Indeed, the inclusion Y ⊆ X_{(W)} is clear. Conversely, consider y in X_{(W)}. For each i ∈ ℤ, set x_i = f(y_{i−m} ⋯ y_i ⋯ y_{i+n}). Since y_{i−m} ⋯ y_i ⋯ y_{i+n} y_{i+n+1} is the label of a path in G, the edges x_i and x_{i+1} are consecutive. Thus x = (x_i)_{i∈ℤ} is in X and y = φ(x) is in Y.

Conjugacy invariants. No effective characterisation of conjugate shift spaces is known, even for shifts of finite type. There are however several quantities that are known to be invariant under conjugacy. The entropy of a shift space X is defined by
$$ h(X) = \lim_{n\to\infty} \frac{1}{n} \log s_n, $$
where s_n = Card(B_n(X)). The limit exists because the sequence (log s_n) is sub-additive (see Lemma 4.1.7 in [21]). Note that since Card(B_n(X)) ≤ Card(A)^n, we have h(X) ≤ log Card(A). If X is nonempty, then 0 ≤ h(X). The following statement shows that the entropy is invariant under conjugacy (see Corollary 4.1.10 in [21]).

Theorem 2.7. If X, Y are conjugate shift spaces, then h(X) = h(Y).
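The higher edge graph used in the proof above is easy to build explicitly. The following short Python sketch (ours, under the graph representation of the earlier sketch) constructs G^[n]: its states are the paths of length n − 1 and its edges the paths of length n.

```python
def paths_of_length(edges, n):
    """All paths (tuples of edge names) of length n; edges[e] = (source, target)."""
    paths = [(e,) for e in edges]
    for _ in range(n - 1):
        paths = [p + (e,) for p in paths for e in edges
                 if edges[p[-1]][1] == edges[e][0]]
    return paths

def higher_edge_graph(edges, n):
    """G^[n] for n >= 2: each path p of length n is an edge from p[:-1] to p[1:]."""
    return {p: (p[:-1], p[1:]) for p in paths_of_length(edges, n)}

# The graph with one vertex and two loops has edge shift the full 2-shift;
# G^[2] has two states (the loops) and four edges (all pairs of loops).
E = {"x": (0, 0), "y": (0, 0)}
print(higher_edge_graph(E, 2))
# {('x','x'): (('x',), ('x',)), ('x','y'): (('x',), ('y',)), ...}
```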


Example 2.4. Let X be the golden mean shift of Example 2.1. Then a block of length n + 1 is either a block of length n − 1 followed by ab or a block of length n followed by a. Thus s_{n+1} = s_n + s_{n−1}, and we get the classical result that h(X) = log φ, where φ = (1 + √5)/2 is the golden mean.
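A quick numerical check of this example (a sketch of ours, not from the text): counting the n-blocks of the golden mean shift by brute force reproduces the Fibonacci recursion, and (1/n) log s_n slowly approaches log((1 + √5)/2) ≈ 0.4812.

```python
import math

def golden_mean_blocks(n):
    """Words of length n over {a, b} without the factor 'bb' (the n-blocks of the golden mean shift)."""
    words = [""]
    for _ in range(n):
        words = [w + c for w in words for c in "ab" if not (w + c).endswith("bb")]
    return words

s = [len(golden_mean_blocks(n)) for n in range(1, 21)]
print(s[:6])                               # [2, 3, 5, 8, 13, 21]: s_{n+1} = s_n + s_{n-1}
print(math.log(s[-1]) / 20)                # ~0.489, approaching the entropy from above
print(math.log((1 + math.sqrt(5)) / 2))    # ~0.4812
```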

An element x of a shift space X over the alphabet A has period n if σ_A^n(x) = x. If φ: X → Y is a conjugacy, then an element x of X has period n if and only if φ(x) has period n. The zeta function of a shift space X is the power series
$$ \zeta_X(z) = \exp \sum_{n \ge 1} \frac{p_n}{n} z^n, $$
where p_n is the number of elements x of X of period n. It follows from the definition that the sequence (p_n)_{n∈ℕ} is invariant under conjugacy, and thus the zeta function of a shift space is invariant under conjugacy.

Several other conjugacy invariants are known. One of them is the Bowen–Franks group of a matrix, which defines an invariant of the associated shift space. This will be defined below.

Example 2.5. Let X = A^ℤ. Then ζ_X(z) = 1/(1 − kz), where k = Card(A). Indeed, one has p_n = k^n, since an element x of A^ℤ has period n if and only if it is a bi-infinite repetition of a word of length n over A.

State splitting. Let G = (Q, E) and H = (R, F) be graphs. A pair (h, k) of surjective maps k: R → Q and h: F → E is called a graph morphism from H onto G if the two diagrams in Figure 1 are commutative, that is, if i(h(f)) = k(i(f)) and t(h(f)) = k(t(f)) for every edge f ∈ F.

Figure 1. Graph morphism

A graph morphism (h, k) from H onto G is an in-merge from H onto G if for each p, q ∈ Q there is a partition (E_p^q(t))_{t ∈ k^{−1}(q)} of the set E_p^q such that for each r ∈ k^{−1}(p) and t ∈ k^{−1}(q), the map h is a bijection from F_r^t onto E_p^q(t). If this holds, then G is called an in-merge of H, and H is an in-split² of G. Thus an in-split H is obtained from a graph G as follows: each state q ∈ Q is split into copies which are the states of H in the set k^{−1}(q). Each of these states t receives a copy of E_p^q(t) starting in r and ending in t for each r in k^{−1}(p). Each r in k^{−1}(p) has the same number of edges going out of r and coming in s, for any s ∈ R.

² In this chapter, a partition of a set X is a family (X_i)_{i∈I} of pairwise disjoint, possibly empty subsets of X, indexed by a set I, such that X is the union of the sets X_i for i ∈ I.


Moreover, for any p, q ∈ Q and e ∈ E_p^q, all edges in h^{−1}(e) have the same terminal vertex, namely the state t such that e ∈ E_p^q(t).

Example 2.6. Let G and H be the graphs represented in Figure 2. Here Q = {1, 2} and R = {3, 4, 5}. The graph H is an in-split of the graph G. The graph morphism (h, k) is defined by k(3) = k(4) = 1 and k(5) = 2. Thus the state 1 of G is split into two states 3 and 4 of H, and the map h is associated with the partition obtained as follows: the edges from 2 to 1 are partitioned into two classes, indexed by 3 and 4 respectively, each containing one edge from 2 to 1.

Figure 2. An in-split from G (on the left) onto H (on the right)

The following result is well known (see [21]). It shows that if H is an in-split of a graph G, then X_G and X_H are conjugate.

Proposition 2.8 (Theorem 2.4.10 in [21]). If (h, k) is an in-merge of a graph H onto a graph G, then h_∞ is a 1-block conjugacy from X_H onto X_G and its inverse is 2-block.

The map h_∞ from X_H to X_G is called an edge in-merging map and its inverse an edge in-splitting map.

A column division matrix over two sets R, Q is an R × Q-matrix D with elements in {0, 1} such that each column has at least one 1 and each row has exactly one 1. Thus, the columns of such a matrix represent a partition of R into Card(Q) sets. The following result is Theorem 2.4.14 of [21].

Proposition 2.9. Let G and H be essential graphs. The graph H is an in-split of the graph G if and only if there is an R × Q-column division matrix D and a Q × R-matrix E with nonnegative integer entries such that
$$ M(G) = ED, \qquad M(H) = DE. \tag{1} $$

Example 2.7. For the graphs G, H of Example 2.6, one has M(G) = ED and M(H) = DE, where D is the column division matrix determined by k, with rows indexed by 3, 4, 5,
$$ D = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0 & 1 \end{pmatrix}, $$
and E is the Q × R-matrix whose entry E_{pt} counts the edges of E_p^{k(t)} placed in the class indexed by t.

Observe that a particular case of a column division matrix is a permutation matrix. The corresponding in-split (or merge) is a renaming of the states of a graph.
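The factorisation in Proposition 2.9 is easy to check mechanically. The sketch below (a hypothetical example of ours, not the graphs of Example 2.6) builds D from a partition of the states of H and verifies M(G) = ED and M(H) = DE with plain integer matrices.

```python
def mat_mul(A, B):
    """Product of integer matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# A hypothetical in-split: H has states 0, 1, 2 merged onto G's states 0, 1
# by k(0) = k(1) = 0, k(2) = 1, so D is a column division matrix.
D = [[1, 0],
     [1, 0],
     [0, 1]]
# E[p][t] = number of edges of G leaving p that are assigned to the class t.
E = [[0, 1, 1],
     [1, 1, 0]]
M_G = mat_mul(E, D)   # [[1, 1], [2, 0]]
M_H = mat_mul(D, E)   # [[0, 1, 1], [0, 1, 1], [1, 1, 0]]
print(M_G, M_H)
```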


The notion of an out-merge is defined symmetrically. A graph morphism (h, k) from H onto G is an out-merge from H onto G if for each p, q ∈ Q there is a partition (E_p^q(r))_{r ∈ k^{−1}(p)} of the set E_p^q such that for each r ∈ k^{−1}(p) and t ∈ k^{−1}(q), the map h is a bijection from F_r^t onto E_p^q(r). If this holds, then G is called an out-merge of H, and H is an out-split of G.

Proposition 2.8 also has a symmetrical version. Thus if (h, k) is an out-merge from H onto G, then h_∞ is a 1-block conjugacy from X_H onto X_G whose inverse is 2-block. The conjugacy h_∞ is called an edge out-merging map and its inverse an edge out-splitting map.

Symmetrically, a row division matrix is a matrix with elements in the set {0, 1} such that each row has at least one 1 and each column has exactly one 1. The following statement is symmetrical to Proposition 2.9.

Proposition 2.10. Let G and H be essential graphs. The graph H is an out-split of the graph G if and only if there is a row division matrix D and a matrix E with nonnegative integer entries such that
$$ M(G) = DE, \qquad M(H) = ED. \tag{2} $$

Example 2.8. Let G and H be the graphs represented in Figure 3. Here Q = {1, 2} and R = {3, 4, 5}. The graph H is an out-split of the graph G. The graph morphism (h, k) is defined by k(3) = k(4) = 1 and k(5) = 2. The map h is associated with the partition indicated by the colours. One has M(G) = DE and M(H) = ED, where D is the row division matrix determined by k,
$$ D = \begin{pmatrix} 1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}, $$
and E is the R × Q-matrix whose entry E_{rq} counts the edges of E_{k(r)}^q placed in the class indexed by r.

Figure 3. The graphs G and H

We use the term split to mean either an in-split or an out-split. The same convention holds for a merge.

Proposition 2.11. For n ≥ 2, the graph G^[n−1] is an in-merge of the graph G^[n].

Proof. For n ≥ 2, consider the equivalence on the states of G^[n] that relates two paths of length n − 1 which differ only by the first edge. It is clear that two equivalent states have the same outgoing edges. Thus G^[n−1] is an in-merge of G^[n].


The decomposition theorem. The following result is known as the decomposition theorem (Theorem 7.1.2 in [21]).

Theorem 2.12. Every conjugacy from an edge shift onto another is the composition of a sequence of edge splitting maps followed by a sequence of edge merging maps.

The statement of Theorem 2.12 given in [21] is less precise, since it does not specify the order of splitting and merging maps. The proof relies on the following statement (Lemma 7.1.3 in [21]).

Lemma 2.13. Let G, H be graphs and let φ: X_G → X_H be a 1-block conjugacy whose inverse has memory m ≥ 1 and anticipation n ≥ 0. There are in-splittings Ḡ, H̄ of the graphs G, H and a 1-block conjugacy φ̄: X_Ḡ → X_H̄ with memory m − 1 and anticipation n such that the following diagram commutes:

    X_G ────> X_Ḡ
     │ φ        │ φ̄
     v          v
    X_H ────> X_H̄

The horizontal maps in the above diagram are the edge in-splitting maps from X_G to X_Ḡ and from X_H to X_H̄ respectively.

The classification theorem. Two nonnegative integral square matrices M, N are elementary equivalent if there exists a pair R, S of nonnegative integral matrices such that
$$ M = RS, \qquad N = SR. $$
Thus if a graph H is a split of a graph G, then, by Proposition 2.9, the matrices M(G) and M(H) are elementary equivalent. The matrices M and N are strong shift equivalent if there is a sequence (M_0, M_1, …, M_n) of nonnegative integral matrices such that M_i and M_{i+1} are elementary equivalent for 0 ≤ i < n, with M_0 = M and M_n = N. The following theorem is Williams' classification theorem (Theorem 7.2.7 in [21]).

Theorem 2.14. Let G and H be two graphs. The edge shifts X_G and X_H are conjugate if and only if the matrices M(G) and M(H) are strong shift equivalent.

Note that one direction of this theorem is contained in the decomposition theorem. Indeed, if X_G and X_H are conjugate, there is a sequence of edge splitting and edge merging maps from X_G to X_H. And if G is a split or a merge of H, then M(G) and M(H) are elementary equivalent, whence the result in one direction follows. Note also that, in spite of the easy definition of strong shift equivalence, it is not even known whether there exists a decision procedure for determining when two nonnegative integral matrices are strong shift equivalent.

2.3. Flow equivalence. In this section, we give basic definitions and properties concerning flow equivalence of shift spaces. The notion comes from the notion of equivalence of continuous flows; see § 13.6 of [21].


A characterisation of flow equivalence for shift spaces (which we will take below as our definition of flow equivalence for shift spaces) is due to Parry and Sullivan [25]. It is noticeable that the flow equivalence of irreducible shifts of finite type has an effective characterisation, by Franks' theorem (Theorem 2.16).

Let A be an alphabet and a be a letter in A. Let ω be a letter that does not belong to A. Set B = A ∪ {ω}. The symbol expansion of a set W ⊆ A⁺ relative to a is the image of W by the semigroup morphism φ: A⁺ → B⁺ such that φ(a) = aω and φ(b) = b for all b ∈ A \ {a}. Recall that a semigroup morphism f: A⁺ → B⁺ is a map satisfying f(xy) = f(x)f(y) for all words x, y. It should not be confused with the morphisms of shift spaces defined earlier. The semigroup morphism φ is also called a symbol expansion. Let X be a shift space on the alphabet A. The symbol expansion of X relative to a is the least shift space X′ on the alphabet B = A ∪ {ω} which contains the symbol expansion of B(X). Note that if φ is a symbol expansion, it defines a bijection from B(X) onto B(X′). The inverse of a symbol expansion is called a symbol contraction.

Two shift spaces X, Y are flow equivalent if there is a sequence X_0, …, X_n of shift spaces such that X_0 = X, X_n = Y and, for 0 ≤ i ≤ n − 1, X_{i+1} is the image of X_i by a conjugacy, a symbol expansion or a symbol contraction.

Example 2.9. Let A = {a, b}. The symbol expansion of the full shift A^ℤ relative to b is conjugate to the golden mean shift. Thus the full shift on two symbols and the golden mean shift are flow equivalent.

For edge shifts, symbol expansion can be replaced by another operation. Let G be a graph and let p be a vertex of G. The graph expansion of G relative to p is the graph G′ obtained by replacing p by an edge from a new vertex p′ to p, and replacing all edges coming in p by edges coming in p′ (see Figure 4). The inverse of a graph expansion is called a graph contraction. Note that graph expansion (relative to vertex 1) changes the adjacency matrix of a graph as indicated below:
$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\longrightarrow
\begin{pmatrix}
0 & a_{11} & a_{12} & \cdots & a_{1n}\\
1 & 0 & 0 & \cdots & 0\\
0 & a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & & & & \vdots\\
0 & a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
$$

Proposition 2.15. The flow equivalence relation on edge shifts is generated by conjugacies and graph expansions.

Figure 4. Graph expansion


Proof. Let G = (Q, E) be a graph and let p be a vertex of G. The graph expansion of G relative to p can be obtained by a symbol expansion of each of the edges coming into p, followed by a conjugacy that merges all the new symbols into one new symbol. Conversely, let e be an edge of G. The symbol expansion of X_G relative to e can be obtained by an in-split that makes e the only edge going into its end vertex q, followed by a graph expansion relative to q.

The Bowen–Franks group of a square n × n-matrix M with integer elements is the Abelian group
$$ \mathrm{BF}(M) = \mathbb{Z}^n / \mathbb{Z}^n (I - M), $$
where ℤ^n(I − M) is the image of ℤ^n under the matrix I − M acting on the right. In other terms, ℤ^n(I − M) is the Abelian group generated by the rows of the matrix I − M. This notion is due to Bowen and Franks [5], who showed that it is an invariant for flow equivalence. The following result is due to Franks [16]. We say that a graph is trivial if it is reduced to one cycle.

Theorem 2.16. Let G, G′ be two strongly connected nontrivial graphs and let M, M′ be their adjacency matrices. The edge shifts X_G, X_{G′} are flow equivalent if and only if det(I − M) = det(I − M′) and the groups BF(M), BF(M′) are isomorphic.

In the case of trivial graphs, the theorem is false. Indeed, any two edge shifts on strongly connected trivial graphs are flow equivalent, and are not flow equivalent to any edge shift on a nontrivial irreducible graph. For any trivial graph G with adjacency matrix M, one has det(I − M) = 0 and BF(M) ≅ ℤ. However there are nontrivial strongly connected graphs such that det(I − M) = 0 and BF(M) ≅ ℤ. The case of arbitrary shifts of finite type has been solved by Huang (see [6] and [9]). A similar characterisation for sofic shifts is not known (see [7]).

Example 2.10. Let
$$ M = \begin{pmatrix} 4 & 1\\ 1 & 0 \end{pmatrix}, \qquad M' = \begin{pmatrix} 3 & 2\\ 1 & 0 \end{pmatrix}. $$
One has det(I − M) = det(I − M′) = −4. Moreover BF(M) ≅ ℤ/4ℤ. Indeed, the rows of the matrix I − M are (−3, −1) and (−1, 1). They generate the same group as (4, 0) and (−1, 1). Thus BF(M) ≅ ℤ/4ℤ. In the same way, BF(M′) ≅ ℤ/4ℤ. Thus, according to Theorem 2.16, the edge shifts X_G and X_{G′} are flow equivalent. Actually, X_G and X_{G′} are both flow equivalent to the full shift on 5 symbols.
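As a sanity check of Example 2.10 (our own sketch; it assumes SymPy and its smith_normal_form helper are available), one can compute det(I − M) and read the Bowen–Franks group off the Smith normal form of I − M.

```python
from sympy import Matrix, ZZ, eye
from sympy.matrices.normalforms import smith_normal_form

def bowen_franks_data(M):
    """Return det(I - M) and the Smith normal form of I - M.

    BF(M) = Z^n / Z^n (I - M) is the direct sum of Z/dZ over the diagonal
    entries d of the Smith normal form (d = 0 contributing a copy of Z)."""
    A = eye(M.rows) - M
    return A.det(), smith_normal_form(A, domain=ZZ)

M  = Matrix([[4, 1], [1, 0]])
M2 = Matrix([[3, 2], [1, 0]])
print(bowen_franks_data(M))   # determinant -4, diagonal (1, 4)  -> BF(M)  = Z/4Z
print(bowen_franks_data(M2))  # determinant -4, diagonal (1, 4)  -> BF(M') = Z/4Z
```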

3. Automata

In this section, we start with the definition and notation for automata recognising shifts, and we show that sofic shifts are precisely the shifts recognised by finite automata (Proposition 3.3).


We introduce the notion of labelled conjugacy; it is a conjugacy preserving the labelling. We extend the decomposition theorem and the classification theorem to labelled conjugacies (Theorems 3.8 and 3.9).

3.1. Automata and sofic shifts. The automata considered in this section are finite automata. We do not mention the initial and final states in the notation when all states are both initial and final. Thus, an automaton is denoted by A = (Q, E) where Q is the finite set of states and E ⊆ Q × A × Q is the set of edges. The edge (p, a, q) has initial state p, label a and terminal state q. The underlying graph of A is the same as A except that the labels of the edges are not used. An automaton is essential if its underlying graph is essential. The essential part of an automaton is its restriction to the essential part of its underlying graph.

We let X_A denote the set of bi-infinite paths in A. It is the edge shift of the underlying graph of A. Note that since the automaton is finite by assumption, the shift space X_A is over a finite alphabet, as required for a shift space. We let L_A denote the set of labels of bi-infinite paths in A. We let λ_A denote the 1-block map from X_A into the full shift A^ℤ taking the value of the label of a path. Thus L_A = λ_A(X_A). If X = L_A, we say that X is the shift space recognised by A.

The following propositions describe how this notion of recognition is related to that for finite words. In the context of finite words, we let A = (Q, I, E, T) denote an automaton with distinguished subsets I (resp. T) of initial (resp. terminal) states. A word w is recognised by A if there is a path from a state in I to a state in T labelled w. Recall that a set is recognisable if it is the set of words recognised by a finite automaton. An automaton A = (Q, I, E, T) is trim if, for every state p in Q, there is a path from a state in I to p and a path from p to a state in T.

Proposition 3.1. Let W ⊆ A* be a recognisable set and let A = (Q, I, E, T) be a trim finite automaton recognising the set A* \ A*WA*. Then L_A = X_{(W)}.

Proof. The label of a bi-infinite path in the automaton A does not contain a factor w in W. Otherwise, there is a finite path p →w q that is a segment of this infinite path. The path p →w q can be extended to a path i →u p →w q →v t for some i ∈ I, t ∈ T, and uwv is accepted by A, which is a contradiction. Next, consider a bi-infinite word x = (x_i)_{i∈ℤ} in X_{(W)}. For every n ≥ 0, there is a path π_n in the automaton A labelled w_n = x_{−n} ⋯ x_0 ⋯ x_n, because the word w_n has no factor in W. By compactness (König's lemma), there is a bi-infinite path in A labelled x. Thus x is in L_A.

The following proposition states in some sense the converse.

Proposition 3.2. Let X be a sofic shift over A, and let A = (Q, I, E, T) be a trim finite automaton recognising the set B(X) of blocks of X. Then L_A = X.

Proof. Set W = A* \ B(X). Then one easily checks that X = X_{(W)}. Next, A recognises A* \ A*WA*. By Proposition 3.1, one has L_A = X.


Proposition 3.3. A shift X over A is sofic if and only if there is a finite automaton A such that X = L_A.

Proof. The forward implication results from Proposition 3.1. Conversely, assume that X = L_A for some finite automaton A. Let W be the set of finite words that are not labels of paths in A. Clearly X ⊆ X_{(W)}. Conversely, if x ∈ X_{(W)}, then all its factors are labels of paths in A. Again by compactness, x itself is the label of a bi-infinite path in A.

Example 3.1. The golden mean shift of Example 2.1 is recognised by the automaton of Figure 5 on the left, while the even shift of Example 2.2 is recognised by the automaton of Figure 5 on the right.

Figure 5. Automata recognising the golden mean and the even shift
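The membership test behind these examples is a standard nondeterministic simulation. The sketch below is ours; the two edge lists transcribe Figure 5 as we read it, so treat them as assumptions. It checks whether a finite word is the label of some path, i.e. a block of the recognised sofic shift when the automaton is essential.

```python
def is_block(word, edges, states):
    """True iff the word labels at least one path in the automaton (any start state allowed)."""
    current = set(states)
    for letter in word:
        current = {q for (p, a, q) in edges if p in current and a == letter}
        if not current:
            return False
    return True

golden_mean = [(1, "a", 1), (1, "b", 2), (2, "a", 1)]
even_shift  = [(1, "a", 1), (1, "b", 2), (2, "b", 1)]

print(is_block("abab", golden_mean, {1, 2}))   # True
print(is_block("abba", golden_mean, {1, 2}))   # False: contains 'bb'
print(is_block("abba", even_shift, {1, 2}))    # True: two b's between the a's
print(is_block("aba",  even_shift, {1, 2}))    # False: odd number of b's between a's
```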

The adjacency matrix of the automaton A = (Q, E) is the Q × Q-matrix M(A) with elements in ℕ⟨A⟩ defined by (M(A)_{pq}, a) = 1 if (p, a, q) ∈ E, and 0 otherwise. We write M for M(A) when the automaton is understood. The entries in the matrix M^n, for n ≥ 0, have an easy combinatorial interpretation: for each word w of length n, the coefficient (M^n_{pq}, w) is the number of distinct paths from p to q carrying the label w.

A matrix M is called alphabetic over the alphabet A if its elements are homogeneous polynomials of degree 1 over A with nonnegative coefficients. Adjacency matrices are special cases of alphabetic matrices. Indeed, their elements are homogeneous polynomials of degree 1 with coefficients 0 or 1.

3.2. Labelled conjugacy. Let A and B be two automata on the alphabet A. A labelled conjugacy from X_A onto X_B is a conjugacy φ such that λ_A = λ_B ∘ φ, that is, such that the following diagram is commutative:

    X_A ──φ──> X_B
       λ_A \   / λ_B
            v v
            A^ℤ

We say that A and B are conjugate if there exists a labelled conjugacy from X_A to X_B. The aim of this paragraph is to give two characterisations of labelled conjugacy.


Labelled split and merge. Let A = (Q, E) and B = (R, F) be two automata. Let G, H be the underlying graphs of A and B, respectively. A labelled in-merge from B onto A is an in-merge (h, k) from H onto G such that for each f ∈ F the labels of f and h(f) are equal. We say that B is a labelled in-split of A, or that A is a labelled in-merge of B. The following statement is the analogue of Proposition 2.8 for automata.

Proposition 3.4. If (h, k) is a labelled in-merge from the automaton B onto the automaton A, then the map h_∞ is a labelled conjugacy from X_B onto X_A.

Proof. Let (h, k) be a labelled in-merge from B onto A. By Proposition 2.8, the map h_∞ is a 1-block conjugacy from X_B onto X_A. Since the labels of f and h(f) are equal for each edge f of B, this map is a labelled conjugacy.

The next statement is the analogue of Proposition 2.9 for automata.

Proposition 3.5. An automaton B = (R, F) is a labelled in-split of the automaton A = (Q, E) if and only if there is an R × Q-column division matrix D and an alphabetic Q × R-matrix N such that
$$ M(A) = ND, \qquad M(B) = DN. \tag{3} $$

Proof. First, suppose that D and N are as described in the statement, and define a map k: R → Q by k(r) = q if D_{rq} = 1. We define h: F → E as follows. Consider an edge (r, a, s) ∈ F. Set p = k(r) and q = k(s). Since M(B) = DN, we have (N_{ps}, a) = 1. Since M(A) = ND, this implies that (M(A)_{pq}, a) = 1 or, equivalently, that (p, a, q) ∈ E. We set h(r, a, s) = (p, a, q). Then (h, k) is a labelled in-merge. Indeed, h is associated with the partitions defined by
$$ E_p^q(t) = \{(p, a, q) \in E \mid (N_{pt}, a) = 1 \text{ and } k(t) = q\}. $$
Suppose conversely that (h, k) is a labelled in-merge from B onto A. Let D be the R × Q-column division matrix defined by D_{rq} = 1 if k(r) = q, and D_{rq} = 0 otherwise. For p ∈ Q and t ∈ R, we define N_{pt} as follows. Set q = k(t). By definition of an in-merge, there is a partition (E_p^q(t))_{t ∈ k^{−1}(q)} of E_p^q such that h is a bijection from F_r^t onto E_p^q(t). For a ∈ A, set (N_{pt}, a) = 1 if (p, a, q) ∈ E_p^q(t), and (N_{pt}, a) = 0 otherwise. Then M(A) = ND and M(B) = DN.


Example 3.2. Let A and B be the automata represented in Figure 6. Here Q = {1, 2} and R = {3, 4, 5}. One has M(A) = ND and M(B) = DN with
$$ N = \begin{pmatrix} a + c & 0 & b\\ 0 & a & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0 & 1 \end{pmatrix}. $$

Figure 6. An in-split from A to B
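Equation (3) can be checked symbolically. The sketch below (ours; it assumes SymPy, and the matrices are the ones we reconstructed for Example 3.2, so they should be read as an assumption) multiplies the alphabetic matrix N by the division matrix D in both orders.

```python
from sympy import Matrix, symbols

a, b, c = symbols("a b c", commutative=False)  # letters, kept noncommutative for safety

N = Matrix([[a + c, 0, b],
            [0,     a, 0]])
D = Matrix([[1, 0],
            [1, 0],
            [0, 1]])

print(N * D)   # [[a + c, b], [a, 0]]                         -> M(A)
print(D * N)   # [[a + c, 0, b], [a + c, 0, b], [0, a, 0]]    -> M(B)
```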

A labelled out-merge from B onto A is an out-merge (h, k) from H onto G such that for each f ∈ F the labels of f and h(f) are equal. We say that B is a labelled out-split of A, or that A is a labelled out-merge of B. Thus if B is a labelled out-split of A, there is a labelled conjugacy from X_B onto X_A.

Proposition 3.6. The automaton B = (R, F) is a labelled out-split of the automaton A = (Q, E) if and only if there is a Q × R-row division matrix D and an alphabetic R × Q-matrix N such that
$$ M(A) = DN, \qquad M(B) = ND. \tag{4} $$

Example 3.3. Let A and B be the automata represented in Figure 7. Here Q = {1, 2} and R = {3, 4, 5}. One has M(A) = DN and M(B) = ND with
$$ N = \begin{pmatrix} a & b\\ c & 0\\ a & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}. $$

Figure 7. An out-split from A to B


Let A = (Q, E) be an automaton. For a pair of integers m, n ≥ 0, let A^[m,n] denote the following automaton, called the (m, n)-th extension of A. The underlying graph of A^[m,n] is the higher edge graph G^[k] for k = m + n + 1. The label of an edge
$$ p_0 \xrightarrow{a_1} p_1 \xrightarrow{a_2} \cdots \xrightarrow{a_m} p_m \xrightarrow{a_{m+1}} p_{m+1} \xrightarrow{a_{m+2}} \cdots \xrightarrow{a_{m+n}} p_{m+n} \xrightarrow{a_{m+n+1}} p_{m+n+1} $$
is the letter a_{m+1}. Observe that A^[0,0] = A. By this construction, each graph G^[k] produces k extensions according to the choice of the labelling.

Proposition 3.7. For m ≥ 1, n ≥ 0, the automaton A^[m−1,n] is a labelled in-merge of the automaton A^[m,n], and for m ≥ 0, n ≥ 1, the automaton A^[m,n−1] is a labelled out-merge of the automaton A^[m,n].

Proof. Suppose that m ≥ 1, n ≥ 0. Let k be the map from the paths of length m + n in A onto the paths of length m + n − 1 that erases the first edge of the path. Let h be the map from the set of edges of A^[m,n] to the set of edges of A^[m−1,n] defined by h(π, a, ρ) = (k(π), a, k(ρ)). Then (h, k) is a labelled in-merge from A^[m,n] onto A^[m−1,n]. The proof that, for m ≥ 0, n ≥ 1, the automaton A^[m,n−1] is an out-merge of the automaton A^[m,n] is symmetrical.

The following result is the analogue, for automata, of the decomposition theorem.

Theorem 3.8. Every conjugacy of automata is a composition of labelled splits and merges.

Proof. Let A and B be two conjugate automata. Let φ be a labelled conjugacy from A onto B. Let G_0 and H_0 be the underlying graphs of A and B, respectively. By the decomposition theorem 2.12, there are sequences (G_1, …, G_n) and (H_1, …, H_m) of graphs with G_n = H_m and such that G_{i+1} is a split of G_i for 0 ≤ i < n and H_{j+1} is a split of H_j for 0 ≤ j < m. Moreover, φ is the composition of the sequence of edge splitting maps from G_i onto G_{i+1}, followed by the sequence of edge merging maps from H_{j+1} onto H_j. Let (h_i, k_i), for 1 ≤ i ≤ n, be a merge from G_i onto G_{i−1} and (u_j, v_j), for 1 ≤ j ≤ m, be a merge from H_j onto H_{j−1}. Then we may define labels on the edges of G_1, …, G_n in such a way that G_i becomes the underlying graph of an automaton A_i and (h_i, k_i) is a labelled merge from A_i onto A_{i−1}. In the same way, we may define labels on the edges of H_j in such a way that H_j becomes the underlying graph of an automaton B_j and (u_j, v_j) is a labelled merge from B_j onto B_{j−1}.
$$ G_0 \xleftarrow{(h_1, k_1)} G_1 \cdots \xleftarrow{(h_n, k_n)} G_n = H_m \xrightarrow{(u_m, v_m)} \cdots H_1 \xrightarrow{(u_1, v_1)} H_0. $$
Let h = h_1 ⋯ h_n and u = u_1 u_2 ⋯ u_m. Since φ = u_∞ h_∞^{−1} and φ is a labelled conjugacy, we have λ_A h_∞ = λ_B u_∞. This shows that the automata A_n and B_m are equal. Thus there is a sequence of labelled splitting maps followed by a sequence of labelled merging maps which is equal to φ.


Let M and M′ be two alphabetic square matrices over the same alphabet A. We say that M and M′ are elementary equivalent if there exists a nonnegative integral matrix D and an alphabetic matrix N such that
$$ M = DN, \qquad M' = ND, $$
or vice versa. By Proposition 3.5, if B is an in-split of A, then M(B) and M(A) are elementary equivalent. We say that M, M′ are strong shift equivalent if there is a sequence (M_0, M_1, …, M_n) of alphabetic matrices such that M_i and M_{i+1} are elementary equivalent for 0 ≤ i < n, with M_0 = M and M_n = M′. The following result is the version, for automata, of the classification theorem.

Theorem 3.9. Two automata are conjugate if and only if their adjacency matrices are strong shift equivalent.

Note that when D is a column division matrix, the statement results from Propositions 3.4 and 2.9. The following statement proves the theorem in one direction.

Proposition 3.10. Let A and B be two automata. If M(A) is elementary equivalent to M(B), then A and B are conjugate.

Proof. Let A = (Q, E) and B = (R, F). Let D be an R × Q nonnegative integral matrix and let N be an alphabetic Q × R matrix such that
$$ M(A) = ND, \qquad M(B) = DN. $$

Consider the map f from the set of paths of length 2 in A into F defined as follows (see Figure 9 on the left). Let $p \xrightarrow{a} q \xrightarrow{b} r$ be a path of length 2 in A. Since (M(A)_{pq}, a) = 1 and M(A) = ND, there is a unique t ∈ R such that (N_{pt}, a) = D_{tq} = 1. In the same way, since (M(A)_{qr}, b) = 1, there is a unique u ∈ R such that (N_{qu}, b) = D_{ur} = 1. Since M(B) = DN, we have (M(B)_{tu}, b) ≥ D_{tq}(N_{qu}, b) = 1, and thus (t, b, u) is an edge of B. We set
$$ f(p \xrightarrow{a} q \xrightarrow{b} r) = t \xrightarrow{b} u. $$
Similarly, we may define a map g from the set of paths of length 2 in B into E by
$$ g(s \xrightarrow{a} t \xrightarrow{b} u) = p \xrightarrow{a} q $$
if D_{sp} = (N_{pt}, a) = D_{tq} = 1. Let φ = f_∞^{[1,0]} and ψ = g_∞^{[0,1]} (see Figure 9 on the right). We verify that ψ ∘ φ = Id_E and φ ∘ ψ = Id_F, where Id_E and Id_F are the identities on E^ℤ and F^ℤ. Indeed, let π be a path in X_A and let ρ = φ(π). Set π_i = (p_i, a_i, p_{i+1}) and ρ_i = (r_i, b_i, r_{i+1}) (see Figure 8). Then, by definition of φ, we have, for all i ∈ ℤ, b_i = a_i and (N_{p_i r_{i+1}}, a_i) = D_{r_i p_i} = 1. Let σ = ψ(ρ) and σ_i = (s_i, c_i, s_{i+1}). By definition of ψ, we have c_i = b_i and D_{r_i s_i} = (N_{s_i r_{i+1}}, b_i) = 1. Thus we have simultaneously D_{r_i p_i} = (N_{p_i r_{i+1}}, a_i) = 1 and D_{r_i s_i} = (N_{s_i r_{i+1}}, a_i) = 1. Since M(A) = ND, this forces p_i = s_i. Thus σ = π, and this shows that ψ ∘ φ = Id_E. The fact that φ ∘ ψ = Id_F is proved in the same way.

Figure 8. Conjugacy of automata

Figure 9. The maps f and g

Proof of Theorem 3.9. In one direction, the statement is a direct consequence of the decomposition theorem for automata (Theorem 3.8). Indeed, if A and B are conjugate, there is a sequence A_0, A_1, …, A_n of automata such that A_i is a split or a merge of A_{i+1} for 0 ≤ i < n, with A_0 = A and A_n = B. The other direction follows from Proposition 3.10.

4. Minimal automata

In this section, we define two notions of minimal automaton for sofic shifts: the Krieger automaton and the Fischer automaton. The first is defined for any sofic shift, and the second for irreducible ones. The main result is that the Fischer automaton has the minimal number of states among all deterministic automata recognising a given sofic shift (Proposition 4.6). We then define the syntactic semigroup of a sofic shift, as an ordered semigroup. We show that this semigroup is isomorphic to the transition semigroup of the Krieger automaton and, for irreducible shifts, to the transition semigroup of the Fischer automaton (Proposition 4.8).

Minimal automata of sets of finite words. Recall that an automaton A = (Q, E) recognises a shift X if X = L_A. There should be no confusion with the notion of acceptance for sets of finite words in the usual sense: if A has an initial state i and a set of terminal states T, the set of finite words recognised by A is the set of labels of finite paths from i to a terminal state t in T. In this chapter,³ an automaton is called deterministic if, for each state p and each letter a, there is at most one edge starting in p and carrying the label a. We write, as usual, p · u for the unique end state, provided it exists, of a path starting in p and labelled u.

³ This contrasts with the more traditional definition which assumes, in addition, that there is a unique initial state.


For a set W ⊆ A*, there exists a unique deterministic minimal automaton (this time with a unique initial state) recognising W. Its states are the nonempty sets u^{−1}W for u ∈ A*, called the right contexts of u or also the residuals of W. Its edges are the triples (u^{−1}W, a, (ua)^{−1}W), for a ∈ A (see Chapter 1).

Let A = (Q, E) be a finite automaton. For a state p ∈ Q, we denote by L_p(A), or simply L_p, the set of labels of finite paths starting from p. The automaton A is reduced if p ≠ q implies L_p ≠ L_q.

A word w is synchronising for a deterministic automaton A if the set of paths labelled w is nonempty and all paths labelled w end in the same state. An automaton is synchronised if there is a synchronising word. The following result holds because all states are terminal.

Proposition 4.1. A reduced deterministic automaton is synchronised.

Proof. Let A = (Q, E) be a reduced deterministic automaton. Given any word x, we denote by Q · x the set Q · x = {q · x | q ∈ Q}. Let x be a word such that Q · x has minimal nonzero cardinality. Let p, q be two elements of the set Q · x. If u is a word such that p · u is nonempty, then q · u is also nonempty, since otherwise Q · xu would be of nonzero cardinality less than that of Q · x. This implies that L_p = L_q and thus p = q since A is reduced. Thus x is synchronising.

4.1. Krieger automata and Fischer automata

Krieger automata. We let A^{−ℕ} denote the set of left-infinite words x = ⋯ x_{−1} x_0. For y = ⋯ y_{−1} y_0 ∈ A^{−ℕ} and z = z_0 z_1 ⋯ ∈ A^ℕ, we let y · z = (w_i)_{i∈ℤ} denote the bi-infinite word defined by w_i = y_{i+1} for i < 0 and w_i = z_i for i ≥ 0. Let X be a shift space. For y ∈ A^{−ℕ}, the set of right contexts of y is the set C_X(y) = {z ∈ A^ℕ | y · z ∈ X}. For u ∈ A⁺, we write u^ω = uu ⋯ and ^ωu = ⋯ uu.

The Krieger automaton of a shift space X is the deterministic automaton whose states are the nonempty sets of the form C_X(y) for y ∈ A^{−ℕ}, and whose edges are the triples (p, a, q) where p = C_X(y) for some left-infinite word y, a ∈ A and q = C_X(ya).

The definition of the Krieger automaton uses infinite words. One could use, instead of the sets C_X(y) for y ∈ A^{−ℕ}, the sets
$$ D_X(y) = \{u \in A^* \mid \exists z \in A^{\mathbb{N}}\colon yuz \in X\}. $$
Indeed, C_X(y) = C_X(y′) if and only if D_X(y) = D_X(y′). However, one cannot dispense completely with infinite words (see Proposition 4.2).

Example 4.1. Let A = {a, b}, and let X = X_{(ba)}. The Krieger automaton of X is represented in Figure 10. The states are the sets 1 = C_X(^ωa) = a^ω ∪ a^*b^ω and 2 = C_X(^ωa b) = b^ω.

Proposition 4.2. The Krieger automaton of a shift space X is reduced and recognises X. It is finite if and only if X is sofic.


Figure 10. The Krieger automaton of X_{(ba)}

Proof. Let A = (Q, E) be the Krieger automaton of X. Let p, q ∈ Q and let y, z ∈ A^{−ℕ} be such that p = C_X(y), q = C_X(z). If L_p = L_q, then the labels of infinite paths starting from p and q are the same. Thus p = q. This shows that A is reduced. If A is finite, then X is sofic by Proposition 3.3. Conversely, if X is sofic, let A be a finite automaton recognising X. The set of right contexts of a left-infinite word y only depends on the set of states p such that there is a path in the automaton A labelled y ending in state p. Thus the family of sets of right contexts is finite.

We say that a deterministic automaton A = (Q, E) over the alphabet A is a subautomaton of a deterministic automaton A′ = (Q′, E′) if Q ⊆ Q′ and, for each edge (p, a, q) ∈ E′ such that p ∈ Q, one has q ∈ Q and (p, a, q) ∈ E. The following proposition appears in [24] and [13], where an algorithm to compute the states of the minimal automaton which are in the Krieger automaton is described.

Proposition 4.3. The Krieger automaton of a sofic shift X is, up to an isomorphism, a subautomaton of the minimal automaton of the set of blocks of X.

Proof. Let X be a sofic shift. Let y ∈ A^{−ℕ} and set y = ⋯ y_{−1} y_0 with y_i ∈ A for i ≤ 0. Set u_i = y_{−i} ⋯ y_0 and U_i = u_i^{−1} B(X). Since B(X) is regular, the chain
$$ \cdots \subseteq U_i \subseteq \cdots \subseteq U_1 \subseteq U_0 $$
is stationary. Thus there is an integer n ≥ 0 such that U_{n+i} = U_n for all i ≥ 0. We define s(y) = U_n. We show that the map C_X(y) ↦ s(y) is well defined and injective. Suppose first that C_X(y) = C_X(y′) for some y, y′ ∈ A^{−ℕ}. Let u ∈ A* be such that y_{−m} ⋯ y_0 u ∈ B(X) for all m ≥ n. By compactness, there exists a z ∈ A^ℕ such that y · uz ∈ X. Then y′ · uz ∈ X, which implies u ∈ s(y′). Symmetrically, u ∈ s(y′) implies u ∈ s(y). This shows that the map is well defined. To show that it is injective, consider y, y′ ∈ A^{−ℕ} such that s(y) = s(y′). Let z ∈ C_X(y). For each integer m ≥ 0, we have z_0 ⋯ z_m ∈ s(y) and thus z_0 ⋯ z_m ∈ s(y′). Since X is closed, this implies that y′ · z ∈ X and thus z ∈ C_X(y′). The converse implication is proved in the same way.

Example 4.2. Consider the automaton A on 7 states i, j, k, 1, 2, 3, 4 given in Figure 11. It can be obtained, starting with the subautomaton over the states 1, 2, 3, 4, by the subset construction computing the accessible nonempty sets of states, starting from the state i = {1, 2, 3, 4}, with j = {3, 4} and k = {1, 3, 4}.


Figure 11. An example of a Krieger automaton

The subautomaton with dark shaded states 1, 2, 3, 4 is strongly connected and recognises an irreducible sofic shift denoted by X. The whole automaton is the minimal automaton (with initial state i) of the set B(X). We identify each state of the minimal automaton with one of the 7 states of A. Thus, for example, i = B(X) and k = a^{−1}B(X). The Krieger automaton of X is the automaton on the five shaded states. Indeed, with the map s defined in the proof of Proposition 4.3, there is no left-infinite word y such that s(y) = i or s(y) = j. On the contrary, since i · a = k and k · a = k, one has s(^ωa) = k.
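Example 4.2 mentions the subset construction. A minimal sketch of it (ours, in Python; the input automaton is a stand-in, not the one of Figure 11) determinises a presentation by computing the accessible nonempty sets of states, which yields a deterministic automaton recognising the same sofic shift.

```python
def subset_construction(edges, states, alphabet):
    """Determinise: the states of the result are the accessible nonempty subsets of `states`."""
    start = frozenset(states)
    todo, seen, new_edges = [start], {start}, {}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(q for (p, x, q) in edges if p in S and x == a)
            if T:
                new_edges[(S, a)] = T
                if T not in seen:
                    seen.add(T)
                    todo.append(T)
    return seen, new_edges

# A two-state nondeterministic presentation (hypothetical, for illustration only).
edges = [(1, "a", 1), (1, "a", 2), (2, "b", 1)]
subsets, det_edges = subset_construction(edges, {1, 2}, "ab")
print(sorted(tuple(sorted(S)) for S in subsets))   # [(1,), (1, 2)]
print({(tuple(sorted(S)), a): tuple(sorted(T)) for (S, a), T in det_edges.items()})
```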

Fischer automata of irreducible shift spaces. A shift space X ⊆ A^ℤ is called irreducible if for any u, v ∈ B(X) there exists a w ∈ B(X) such that uwv ∈ B(X). An automaton is strongly connected if its underlying graph is strongly connected. Clearly a shift recognised by a strongly connected automaton is irreducible. A strongly connected component of an automaton A is minimal if all successors of vertices of the component are themselves in the component. One may verify that a minimal strongly connected component is the same as a strongly connected subautomaton. The following result is due to Fischer [15] (see also § 3 in [21]). It implies, in particular, that an irreducible sofic shift can be recognised by a strongly connected automaton.

Proposition 4.4. The Krieger automaton of an irreducible sofic shift X is synchronised and has a unique minimal strongly connected component.

Proof. Let A = (Q, E) be the Krieger automaton of X. By Proposition 4.2, A is reduced, and by Proposition 4.1 it follows that it is synchronised. Let x be a synchronising word. Let R be the set of states reachable from the state q = Q · x. The set R is a minimal strongly connected component of A. Indeed, for any r ∈ R there is a path q →y r. Since X is irreducible, there is a word z such that


yzx ∈ B(X). Since q · yzx = q, r belongs to the same strongly connected component as q. Next, if p belongs to a minimal strongly connected component S of A, since X is irreducible, there is a word y such that p · yx is not empty. Thus q is in S, which implies S = R. Thus R is the only minimal strongly connected component of A.

Example 4.3. Let X be the even shift. The Krieger and Fischer automata of X are represented in Figure 12. The word a is synchronising.

Figure 12. The Krieger and Fischer automata of X

Example 4.4. The Fischer automaton of the irreducible shift of Example 4.2 is the subautomaton on states 1, 2, 3, 4 represented with dark shaded states in Figure 11.

Let X be an irreducible sofic shift. The minimal strongly connected component of the Krieger automaton of X is called its Fischer automaton.

Proposition 4.5. The Fischer automaton of an irreducible sofic shift X recognises X.

Proof. The Fischer automaton F of X is a subautomaton of the Krieger automaton of X, which in turn is a subautomaton of the minimal automaton A of the set B(X). Let i be the initial state of A. Since A is trim, there is a word w such that i · w is a state of F. Let v be any block of X. Since X is irreducible, there is a word u such that wuv is a block of X. This shows that v is the label of a path in F. Thus every block of X is the label of a path in F, and conversely. In view of Proposition 3.2, the automaton F recognises X.

Let A = (Q, E) and B = (R, F) be two deterministic automata. A reduction from A onto B is a map h from Q onto R such that for any letter a ∈ A, one has (p, a, q) ∈ E if and only if (h(p), a, h(q)) ∈ F. Thus any labelled in- or out-merge is a reduction. However, the converse is not true, since a reduction is not, in general, a conjugacy. For any automaton A = (Q, E), there is a reduction from A onto a reduced automaton B. It is obtained by identifying the pairs of states p, q ∈ Q such that L_p = L_q. The following statement is Corollary 3.3.20 of [21].

Proposition 4.6. Let X be an irreducible shift space. For any strongly connected deterministic automaton A recognising X there is a reduction from A onto the Fischer automaton of X.


Proof. Let A = (Q, E) be a strongly connected automaton recognising X. Let B = (R, F) be the reduced automaton obtained from A by identifying the pairs p, q ∈ Q such that L_p = L_q. By Proposition 4.1, B is synchronised. We now show that B can be identified with the Fischer automaton of X. Let w be a synchronising word for B. Set s = Q · w. Let r be a state such that r · w = s, and let y ∈ A^{−ℕ} be the label of a left-infinite path ending in the state s. For any state t in R, let u be a word such that s · u = t. The set C_X(ywu) depends only on the state t, and not on the word u such that s · u = t. Indeed, for each right-infinite word z, one has ywuz in X if and only if there is a path labelled z starting at t. This holds because w is synchronising. Thus the map t ↦ C_X(ywu) is well defined and defines a reduction from B onto the Fischer automaton of X.

This statement shows that the Fischer automaton of an irreducible shift X is minimal, in the sense that it has the minimal number of states among all deterministic strongly connected automata recognising X. The statement also gives the following practical method to compute the Fischer automaton of an irreducible shift: start with a strongly connected deterministic automaton recognising X and merge the pairs of states p, q such that L_p = L_q. By the above result, the resulting automaton is the Fischer automaton of X.

4.2. Syntactic semigroup. Recall that a preorder on a set is a relation that is reflexive and transitive. The equivalence associated with a preorder ≤ is the equivalence relation defined by u ∼ v if and only if u ≤ v and v ≤ u. Let S be a semigroup. A preorder ≤ on S is stable if s ≤ s′ implies us ≤ us′ and su ≤ s′u for all s, s′, u ∈ S. An ordered semigroup S is a semigroup equipped with a stable preorder. Any semigroup can be considered as an ordered semigroup equipped with the equality order. A congruence in an ordered semigroup S is the equivalence associated with a stable preorder that is coarser than the preorder of S. The quotient of an ordered semigroup by a congruence is the ordered semigroup formed by the classes of the congruence.

The set of contexts of a word u with respect to a set W ⊆ A⁺ is the set Γ_W(u) of pairs of words defined by Γ_W(u) = {(ℓ, r) ∈ A* × A* | ℓur ∈ W}. The preorder on A⁺ defined by u ≤_W v if Γ_W(u) ⊆ Γ_W(v) is stable and thus defines a congruence of the semigroup A⁺ equipped with the equality order, called the syntactic congruence. The syntactic semigroup of a set W ⊆ A⁺ is the quotient of the semigroup A⁺ by the syntactic congruence.

Let A = (Q, E) be a deterministic automaton on the alphabet A. Recall that for p ∈ Q and u ∈ A⁺, there is at most one path π labelled u starting in p. We set p · u = q if q is the end of π, and p · u = ∅ if π does not exist. The preorder defined on A⁺ by u ≤_A v if p · u ⊆ p · v for all p ∈ Q is stable. The quotient of A⁺ by the congruence associated with this preorder is the transition semigroup of A. The following property is standard; see Chapter 1.


Proposition 4.7. The syntactic semigroup of a set W ⊆ A⁺ is isomorphic to the transition semigroup of the minimal automaton of W.

The syntactic semigroup of a shift space X is by definition the syntactic semigroup of B(X).

Proposition 4.8. Let X be a sofic shift and let S be its syntactic semigroup. The transition semigroup of the Krieger automaton of X is isomorphic to S. Moreover, if X is irreducible, then S is isomorphic to the transition semigroup of its Fischer automaton.

Proof. Let A be the minimal automaton of B(X), and let K be the Krieger automaton of X. We have to show that, for any u, v ∈ A⁺, one has u ≤_A v if and only if u ≤_K v. Since, by Proposition 4.3, K is isomorphic to a subautomaton of A, the direct implication is clear. Indeed, if p is a state of K, then L_p(K) is equal to the set L_p(A). Consequently, if u ≤_A v then u ≤_K v. Conversely, suppose that u ≤_K v. We prove that u ≤_{B(X)} v. For this, let (ℓ, r) ∈ Γ_{B(X)}(u). Then ℓur ∈ B(X). Then y · ℓurz ∈ X for some y ∈ A^{−ℕ} and z ∈ A^ℕ. But since C_X(yℓu) ⊆ C_X(yℓv), this implies rz ∈ C_X(yℓv) and thus ℓvr ∈ B(X). Thus u ≤_{B(X)} v, which implies u ≤_A v.

Next, suppose that X is irreducible. We have to show that u ≤_A v if and only if u ≤_{F(X)} v. Since F(X) is a subautomaton of K(X) and K(X) is a subautomaton of A, the direct implication is clear. Conversely, assume that u ≤_{F(X)} v. Suppose that ℓur ∈ B(X). Let i be the initial state of A and let w be such that i · w is a state of F(X). Since X is irreducible, there is a word s such that wsℓur ∈ B(X). But then i · wsℓur ≠ ∅ implies i · wsℓvr ≠ ∅. Thus ℓvr ∈ B(X). This shows that u ≤_{B(X)} v and thus u ≤_A v.
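For finite examples the transition semigroup can be enumerated directly. The sketch below (our own illustration) represents each word by its action on the state set (with None standing for the undefined value ∅) and closes the set of actions under composition.

```python
from itertools import product

def transition_semigroup(states, delta, alphabet):
    """delta: dict (state, letter) -> state, possibly partial. Returns the set of distinct
    actions of nonempty words, each action a tuple of images (None if undefined)."""
    def action_of_letter(a):
        return tuple(delta.get((p, a)) for p in states)

    def compose(f, g):  # first f, then g
        return tuple(None if f[i] is None else g[states.index(f[i])] for i in range(len(states)))

    elements = {action_of_letter(a) for a in alphabet}
    changed = True
    while changed:
        changed = False
        for f, a in product(list(elements), alphabet):
            h = compose(f, action_of_letter(a))
            if h not in elements:
                elements.add(h)
                changed = True
    return elements

# The Fischer automaton of the even shift (Figure 12, right), as we read it.
states = [1, 2]
delta = {(1, "a"): 1, (1, "b"): 2, (2, "b"): 1}
print(len(transition_semigroup(states, delta, "ab")))   # number of elements of the semigroup
```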

5. Symbolic conjugacy

This section is concerned with a new notion of conjugacy between automata called symbolic conjugacy. It extends the notion of labelled conjugacy and captures the fact that the automata may be over different alphabets. The table below summarises the various notions.

    object type               isomorphism                         elementary transformation
    shift spaces              conjugacy                           split/merge
    edge shifts               conjugacy                           edge split/merge
    integer matrices          strong shift equivalence            elementary equivalence
    automata (same alphabet)  labelled conjugacy                  labelled split/merge
    automata                  symbolic conjugacy                  split/merge
    alphabetic matrices       symbolic strong shift equivalence   symbolic elementary equivalence

There are two main results in this section. Theorem 5.7, due to Nasu, is a version of the classification theorem for sofic shifts. It implies, in particular, that conjugate sofic shifts have symbolic conjugate Krieger or Fischer automata. The proof uses the notion of bipartite automaton, which corresponds to the symbolic elementary equivalence of adjacency matrices. Theorem 5.8 is due to Hamachi and Nasu; it characterises symbolic conjugate automata by means of their adjacency matrices.


In this section, we will use for convenience automata in which several edges with the same source and target can have the same label. Formally, such an automaton is a pair A = (G, λ) of a graph G = (Q, E) and a map λ assigning to each edge e ∈ E a label λ(e) ∈ A. The adjacency matrix of A is the Q × Q-matrix M(A) with elements in ℕ⟨A⟩ defined by
$$ (M(A)_{pq}, a) = \operatorname{Card}\{e \in E_{pq} \mid \lambda(e) = a\}. \tag{5} $$
Note that M(A) is alphabetic but may have arbitrary nonnegative coefficients. The advantage of this version of automata is that for any alphabetic Q × Q-matrix M there is an automaton A such that M(A) = M. We still let X_A denote the edge shift X_G and L_A denote the set of labels of bi-infinite paths in G.

Symbolic conjugate automata. Let A, B be two automata. A symbolic conjugacy from A onto B is a pair (φ, ψ) of conjugacies φ: X_A → X_B and ψ: L_A → L_B such that the following diagram is commutative:

    X_A ──φ──> X_B
     │ λ_A       │ λ_B
     v           v
    L_A ──ψ──> L_B

5.1. Splitting and merging maps. Let A, B be two alphabets and let f: A → B be a map from A onto B. Let X be a shift space on the alphabet A. We consider the set of words A′ = {f(a_1)a_2 | a_1 a_2 ∈ B_2(X)} as a new alphabet. Let g: B_2(X) → A′ be the 2-block substitution defined by g(a_1 a_2) = f(a_1)a_2. The in-splitting map defined on X and relative to f, or to g, is the sliding block map g_∞^{[1,0]} corresponding to g. It is a conjugacy from X onto its image X′ = g_∞^{[1,0]}(X), since its inverse is 1-block. The shift space X′ is called the in-splitting of X relative to f or g. The inverse of an in-splitting map is called an in-merging map. In addition, any renaming of the alphabet of a shift space is also considered to be an in-splitting map (and an in-merging map).

Example 5.1. Let A = B and let f be the identity on A. The in-splitting of a shift X relative to f is the second higher block shift of X.

The following proposition relates splitting maps to edge splittings as defined in § 2.2.

Proposition 5.1. An in-splitting map on an edge shift is an edge in-splitting map, and conversely.

Proof. Let first G = (Q, E) be a graph, and let f: E → I be a map from E onto a set I. Set E′ = {f(e_1)e_2 | e_1 e_2 ∈ B_2(X_G)}. Let g: B_2(X_G) → E′ be the 2-block substitution defined by g(e_1 e_2) = f(e_1)e_2. Let G′ = (Q′, E′) be the graph on the


set of states Q′ = I × Q defined, for e′ = f(e_1)e_2, by i(e′) = (f(e_1), i(e_2)) and t(e′) = (f(e_2), t(e_2)). Define h: E′ → E and k: Q′ → Q by h(f(e_1)e_2) = e_2 for e_1 e_2 ∈ B_2(X_G) and k(i, q) = q for (i, q) ∈ I × Q. Then the pair (h, k) is an in-merge from G′ onto G and h_∞ is the inverse of g_∞^{[1,0]}. Indeed, one may verify that (h, k) is a graph morphism from G′ onto G. Next, it is an in-merge because, for each p, q ∈ Q, the partition (E_p^q(t))_{t ∈ k^{−1}(q)} of E_p^q is defined by E_p^q(i, q) = E_p^q ∩ f^{−1}(i).

Conversely, set G = (Q, E) and G′ = (Q′, E′), and let (h, k) be an in-merge from G′ onto G. Consider the map f: E → Q′ defined by f(e) = r if r is the common end of the edges in h^{−1}(e). The map α from E′ to Q′ × E defined by α(e′) = (r, h(e′)), where r is the origin of e′, is a bijection by definition of an in-merge. Let us show that, up to the bijection α, the in-splitting map relative to f is the inverse of the map h_∞. For e_1, e_2 ∈ E, let r = f(e_1) and e′ = α^{−1}(r, e_2). Then h(e′) = e_2, and thus h_∞ is the inverse of the map g_∞^{[1,0]} corresponding to the 2-block substitution g(e_1 e_2) = (r, e_2).

Symmetrically, an out-splitting map is defined by the substitution g(ab) = af(b). Its inverse is an out-merging map. We use the term splitting to mean either an in-splitting or an out-splitting. The same convention holds for a merging. The following result, from [23], is a generalisation of the decomposition theorem (Theorem 2.12) to arbitrary shift spaces.

Theorem 5.2. Any conjugacy between shift spaces is a composition of splitting and merging maps.

The proof is similar to the proof of Theorem 2.12. It relies on the following lemma, similar to Lemma 2.13.

Lemma 5.3. Let φ: X → Y be a 1-block conjugacy whose inverse has memory m ≥ 1 and anticipation n ≥ 0. There are in-splitting maps from X, Y to shifts X̃, Ỹ, respectively, such that the 1-block conjugacy φ̃ making the diagram below commutative has an inverse with memory m − 1 and anticipation n:

    X ────> X̃
    │ φ       │ φ̃
    v         v
    Y ────> Ỹ

Proof. Let A, B be the alphabets of X and Y respectively. Let h: A → B be the 1-block substitution such that φ = h_∞. Let X̃ be the in-splitting of X relative to the map h. Set A′ = {h(a_1)a_2 | a_1 a_2 ∈ B_2(X)}. Let Ỹ = Y^[2] be the second higher block shift of Y and let B′ = B_2(Y). Let h̃: A′ → B′ be the 1-block substitution defined by h̃(h(a_1)a_2) = h(a_1)h(a_2). Then the 1-block map φ̃ = h̃_∞ has the required properties.

Lemma 5.3 has a dual where φ is a 1-block map whose inverse has memory m ≥ 0 and anticipation n ≥ 1, and where in-splits are replaced by out-splits.


Proof of Theorem 5.2. Let φ: X → Y be a conjugacy from X onto Y. Replacing X by a higher block shift, we may assume that φ is a 1-block map. Using Lemma 5.3 iteratively, we can replace φ by a 1-block map whose inverse has memory 0. Then, using the dual of Lemma 5.3 iteratively, we finally obtain a 1-block map whose inverse is also 1-block and is thus just a renaming of the symbols.

Symbolic strong shift equivalence. Let M and M′ be two alphabetic Q × Q-matrices over the alphabets A and B, respectively. We say that M and M′ are similar if they are equal up to a bijection of A onto B. We write M ↔ M′ when M and M′ are similar. We say that two alphabetic square matrices M and M′ over the alphabets A and B respectively are symbolic elementary equivalent if there exist two alphabetic matrices R, S over the alphabets C and D respectively such that
$$ M \leftrightarrow RS, \qquad M' \leftrightarrow SR. $$
In this definition, the sets CD and DC of two-letter words are identified with alphabets in bijection with A and B, respectively. We say that two matrices M, M′ are symbolic strong shift equivalent if there is a sequence (M_0, M_1, …, M_n) of alphabetic matrices such that M_i and M_{i+1} are symbolic elementary equivalent for 0 ≤ i < n, with M_0 = M and M_n = M′.

We introduce the following notion. An automaton A on the alphabet A is bipartite if there are partitions Q = Q_1 ∪ Q_2 of the set of states and A = A_1 ∪ A_2 of the alphabet such that all edges labelled in A_1 go from Q_1 to Q_2 and all edges labelled in A_2 go from Q_2 to Q_1. Let A be a bipartite automaton. Its adjacency matrix has the form
$$ M(A) = \begin{pmatrix} 0 & M_1\\ M_2 & 0 \end{pmatrix}, $$
where M_1 is a Q_1 × Q_2-matrix with elements in ℕ⟨A_1⟩ and M_2 is a Q_2 × Q_1-matrix with elements in ℕ⟨A_2⟩. The automata A_1 and A_2 which have M_1 M_2 and M_2 M_1, respectively, as adjacency matrices are called the components of A, and the pair A_1, A_2 is a decomposition of A. We let A = (A_1, A_2) denote a bipartite automaton A with components A_1, A_2. Note that A_1, A_2 are automata on the alphabets A_1 A_2 and A_2 A_1, respectively.

Proposition 5.4. Let A = (Q, E) be a bipartite deterministic essential automaton. Its components A_1, A_2 are deterministic essential automata which are symbolic conjugate. If moreover A is strongly connected (resp. reduced, synchronised), then A_1, A_2 are strongly connected (resp. reduced, synchronised).

Proof. Let Q = Q_1 ∪ Q_2 and A = A_1 ∪ A_2 be the partitions of the set Q and of the alphabet A corresponding to the decomposition A = (A_1, A_2). It is clear that A_1, A_2 are deterministic, and that they are strongly connected if A is strongly connected. Let φ: X_{A_1} → X_{A_2} be the conjugacy defined as follows. For any y = (y_n)_{n∈ℤ} in X_{A_1} there is an x = (x_n)_{n∈ℤ} in X_A such that y_n = x_{2n} x_{2n+1}. Then z = (z_n)_{n∈ℤ} with z_n = x_{2n+1} x_{2n+2} is an element of X_{A_2}. We define φ(y) = z. The analogous map ψ: L_{A_1} → L_{A_2} is such that (φ, ψ) is a symbolic conjugacy from A_1 onto A_2.

1014

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

Assume that A is reduced. For p; q 2 Q1 , there is a word w such that w 2 Lp .A/ and w … Lq .A/ (or conversely). Set w D a1 a2    an with ai 2 A. If n is even, then .a1 a2 /    .an 1 an / is in Lp .A1 / but not in Lq .A1 /. Otherwise, since A is essential, there is a letter anC1 such that wanC1 is in Lp .A/. Then .a1 a2 /    .an anC1 / is in Lp .A1 / but not in Lq .A1 /. Thus A1 is reduced. One proves in the same way that A2 is reduced. Finally, suppose that A is synchronised. Let x be a synchronising word and set x D a1 a2    an with ai 2 A. Suppose that all paths labelled x end in q 2 Q1 . Let anC1 be a letter such that q  anC1 ¤ ; and let a0 be a letter such that a0 x is the label of at least one path. If n is even, then .a1 a2 /    .an 1 an / is synchronising for A1 and .a0 a1 /    .an anC1 / is synchronising for A2 . Otherwise, .a0 a1 /    .an 1 an / is synchronising for A1 and .a1 a2 /    .an anC1 / is synchronising for A2 . Proposition 5.5. Let A; B be two automata such that M.A/ and M.B/ are symbolic elementary equivalent. Then there is a bipartite automaton C D .C1 ; C2 / such that M.C1 /; M.C2 / are similar to M.A/; M.B/ respectively. Proof. Let R; S be alphabetic matrices over alphabets C and D , respectively, such that M.A/ $ RS and M.B/ $ SR. Let C be the bipartite automaton on the alphabet C [D defined by the adjacency matrix   0 R M.C/ D : S 0 Then M.A/ is similar to M.C1 / and M.B/ is similar to M.C2 /. Proposition 5.6. If the adjacency matrices of two automata are symbolic strong shift equivalent, the automata are symbolic conjugate. Proof. Since a composition of conjugacies is a conjugacy, it is enough to consider the case where the adjacency matrices are symbolic elementary equivalent. Let A; B be such that M.A/; M.B/ are symbolic elementary equivalent. By Proposition 5.5, there is a bipartite automaton C D .C1 ; C2 / such that M.C1 /; M.C2 / are similar to M.A/ and M.B/ respectively. By Proposition 5.4, the automata C1 ; C2 are symbolic conjugate. Since automata with similar adjacency matrices are obviously symbolic conjugate, the result follows. Example 5.2. Let A; B be the automata represented in Figure 13. The matrices M.A/ and M.B/ are symbolic elementary equivalent. Indeed, we have M.A/ $ RS and M.B/ $ SR for     x y z t RD ; SD : 0 x t 0 Indeed, one has

RS D



xz C yt xt

 xt ; 0

SR D

 zx tx

 zy C tx : ty

27. Symbolic dynamics

1015

Thus the following tables give two bijections between the alphabets: a xz

b yt

c ; xt

d zx

e zy

f tx

c a; b

1

e; f 2

c

g : ty

d

1

2

g

f

Figure 13. Two symbolic conjugate automata

The following result is due to Nasu [23]. The equivalence between conditions (i) and (ii) is a version, for sofic shifts, of the classification theorem (Theorem 7.2.12 in [21]). The equivalence between conditions (i) and (iii) is due to Krieger [20]. Theorem 5.7. Let X; X 0 be two sofic shifts (resp. irreducible sofic shifts) and let A; A0 be their Krieger (resp. Fischer) automata. The following conditions are equivalent:

i. X; X 0 are conjugate; ii. the adjacency matrices of A; A0 are symbolic strong shift equivalent; iii. A; A0 are symbolic conjugate. Proof. We prove the result for irreducible shifts. The proof of the general case is in [23]. Assume that X; X 0 are conjugate. By the decomposition theorem (Theorem 5.2), it is enough to consider the case where X 0 is an in-splitting of X . Let f W A ! B be a map and let A0 D ¹f .a1 /a2 j a1 a2 2 B2 .X /º in such a way that X 0 is the in-splitting of X relative to f . Let C D A [ B and let Z be the shift space composed of all biinfinite sequences    ai f .ai /ai C1 f .ai C1 /    such that    ai ai C1    is in X . Then Z is an irreducible sofic shift. Let A be the Fischer automaton of Z . Then A is bipartite and its components recognise, up to a bijection of the alphabets, X and X 0 respectively. By Proposition 5.4 the components are the Fischer automata of X and X 0 respectively. Since the components of a bipartite automaton have symbolic elementary equivalent adjacency matrices, this proves that (i) implies (ii). That (ii) implies (iii) is Proposition 5.6. Finally, (iii) implies (i) by definition of symbolic conjugacy. 5.2. Symbolic conjugate automata. The following result is due to Hamachi and Nasu [18]. It shows that, in Theorem 5.7, the equivalence between conditions (ii) and (iii) holds for automata that are not reduced. Theorem 5.8. Two essential automata are symbolic conjugate if and only if their adjacency matrices are symbolic strong shift equivalent. The first element of the proof is a version of the decomposition theorem for automata.

1016

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

Let A; A0 be two automata. An in-split from A onto A0 is a symbolic conjugacy .'; / such that 'W XA ! XA0 and W LA ! LA0 are in-splitting maps. A similar definition holds for out-splits. Theorem 5.9. Any symbolic conjugacy between automata is a composition of splits and merges. The proof relies on the following variant of Lemma 5.3. Lemma 5.10. Let ˛; ˇ be 1-block maps and '; be 1-block conjugacies such that the diagram below on the left is commutative. If the inverses of '; have memory m > 1 and anticipation n > 0, there z Y; z Z; z Tz of X; Y; Z; T respectively and 1-block maps ˛W z, exist in-splits X; Q Xz ! Z Q z z z ˇW Y ! T such that the 1-block conjugacies '; Q making the diagram below on the right commutative have inverses with memory m 1 and anticipation n: '

X

X

˛

! T

ˇQ

˛Q

!

z Z

! Yz

ˇ

Q

! Tz ! ! T

!

! Z

'Q

!

ˇ

Xz

!

Z

!

! Y !

!

˛

'

! Y !

Proof. Let A; B; C; D be the alphabets of X; Y; Z and T respectively. Let hW A ! B and kW C ! D be the 1-block substitutions such that ' D h1 and D k1 . Set z) Az D ¹h.a1 /a2 j a1 a2 2 B2 .X /º and Cz D ¹k.c1 /c2 j c1 c2 2 B2 .Z/º. Let Xz (resp. Z be the image of X (resp. of Z ) under the in-splitting map relative to h (resp. k ). Set z D B2 .T /. Define ˛Q and ˇQ by Yz D Y Œ2 , Bz D B2 .Y /, Tz D T Œ2 and D ˛.h.a Q 1 /a2 / D k˛.a1 /˛.a2 /;

Q Az ! Bz , kW Q Cz ! D z by and hW

Q h.h.a 1 /a2 / D h.a1 /h.a2 /;

Q 1 b2 / D ˇ.b1 /ˇ.b2 / ˇ.b

Q k.k.c 1 /c2 / D k.c1 /k.c2 /

Then the 1-block conjugacies 'Q D hQ 1 and z D kQ1 satisfy the conditions of the statement. Proof of Theorem 5.9. Let A D .G; / and A0 D .G 0 ; 0 / be two automata with G D .Q; E/ and G 0 D .Q0 ; E0 /. Let .'; / be a symbolic conjugacy from A onto A0 . Replacing A and B by some extension AŒm;n and BŒm;n , we may reduce to the case where '; are 1-block conjugacies. By repeatedly using Lemma 5.10, we may reduce to the case where the inverses of '; have memory 0. Repeatedly using the dual

27. Symbolic dynamics

1017

version of Lemma 5.10, we are reduced to the case where '; alphabets.

are renaming of the

The second step for the proof of Theorem 5.8 is the following statement. Proposition 5.11. Let A; A0 be two essential automata. If A0 is an in-split of A, the matrices M.A/ and M.A0 / are symbolic elementary equivalent. Proof. Set A D .G; / and A0 D .G 0 ; 0 /. Let A0 D ¹f .a/b j ab 2 B2 .LA /º be the alphabet of A0 for a map f W A ! B . By Proposition 5.1, the symbolic in-splitting map from XG onto XG 0 is also an in-splitting map. Thus there is an in-merge .h; k/ from G 0 onto G such that the in-split from A onto A0 has the form .h11 ; /. We define an alphabetic Q0  Q-matrix R and a Q  Q0 -matrix S as follows. Let r; t 2 Q0 and let p D k.r/, q D k.t/. Let e be an edge of A0 ending in r , and set a D .h.e//). Then the label of any edge going out of r is of the form f .a/b for some b 2 A. Thus f .a/ does not depend on e but only on r . We define a map W Q0 ! B by .r/ D f .a/. Then, we set ´ .r/ if k.r/ D p , Rrp D Spt D M.A/pq : 0 otherwise, Let us verify that M.A0 / D RS and M.A/ $ SR. We first have for r; t 2 Q0 X .RS /rt D Rrp Spt D .r/Mk.r/k.q/ D M.A0 /rt ; p2Q

and thus RS D M.A0 /. Next, for p; q 2 Q we have X X X .SR/pq D Rrp Spt D M.A/pq .t/ D .M.A/pq ; a/af .a/; p2Q

t 2k

1 .q/

a2A

and thus SR $ M.A/ using the bijection a ! af .a/ between A and AB . Proof of Theorem 5.8. The condition is sufficient by Proposition 5.6. Conversely, let A; A0 be two symbolic conjugate essential automata. By Theorem 5.9, we may assume that A0 is a split of A. We assume that A0 is an in-split of A. By Proposition 5.11, the adjacency matrices of A and A0 are symbolic elementary equivalent.

6. Special families of automata In this section, we consider two particular families of automata: local automata and automata with finite delay. Local automata are closely related to shifts of finite type. The main result is an embedding theorem (Theorem 6.4) related to Nasu’s masking lemma (Proposition 6.5). Automata with finite left and right delay are related to a class of shifts called shifts of almost finite type (Proposition 6.10).

1018

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

6.1. Local automata. Let m; n > 0. An automaton A D .Q; E/ is .m; n/-local if u v u v whenever p ! q ! r and p 0 ! q 0 ! r 0 are two paths with juj D m and jvj D n, then q D q 0 . It is local if it is .m; n/-local for some m; n.

Example 6.1. The automaton represented in Figure 14 is .3; 0/-local. Indeed, a simple inspection shows that each of the six words of length 3 which are labels of paths uniquely determines its terminal vertex. It is also .0; 3/-local. It is not .2; 0/-local (check the word ab ), but it is .2; 1/-local and also .1; 2/-local. b 2

3

a; b

a 1

Figure 14. A local automaton

We say that an automaton A D .Q; E/ is contained in an automaton A0 D .Q0 ; E 0 / if Q  Q0 and E  E 0 . We note that if A is contained in A0 and if A0 is local, then A is local. Proposition 6.1. An essential automaton A is local if and only if A W XA ! LA is a conjugacy from XA onto LA . Proof. First, suppose that A is .m; n/-local. Consider an m C 1 C n-block w D uav of LA , with juj D m, jvj D n. All finite paths of A labelled w have the form u a v a r ! p ! q ! s and share the same edge p ! q . This shows that A is injective and that A1 is a map with memory m and anticipation n. Conversely, assume that A1 exists, and that it has memory m and anticipation n. We show that A is .m C 1; n/-local. Let u

a

v

r !p !q !s

u

a

v

and r 0 ! p 0 ! q 0 ! s 0

be two paths of length m C 1 C n, with juj D m, jvj D n and a a letter. Since A is essential, there exist two bi-infinite paths which contain these finite paths, respectively. Since A1 has memory m and anticipation n, the blocks uav of the bi-infinite words a

a

carried by these paths are mapped by A1 onto the edges p ! q and p 0 ! q 0 respectively. This shows that p D p 0 and q D q 0 . The next statement is Proposition 10.3.10 in [4].

Proposition 6.2. The following conditions are equivalent for a strongly connected finite automaton A: i. A is local; ii. distinct cycles have distinct labels.

27. Symbolic dynamics

1019

Two cycles in this statement are considered to be distinct if, viewed as paths, they are distinct. The following result shows the strong connection between shifts of finite type and local automata. It gives an effective method to verify whether or not a shift space is of finite type. Proposition 6.3. A shift space (resp. an irreducible shift space) is of finite type if and only if its Krieger automaton (resp. its Fischer automaton) is local. Proof. Let X D X .W / for a finite set W  A . We may assume that all words of W have the same length n. Let A D .Q; i; Q/ be the .n; 0/-local deterministic automaton defined as follows. The set of states is Q D An n W and there is an edge .u; a; v/ for every u; v 2 Q and a 2 A such that ua 2 Av . Then A recognises the set B.X /. Since the reduction of a local automaton is local, the minimal automaton of B.X / is local. Since the Krieger automaton of X is contained in the minimal automaton of B.X /, it is local. If X is irreducible, then its Fischer automaton is also local since it is contained in the Krieger automaton. Conversely, Proposition 6.1 implies that a shift space recognised by a local automaton is conjugate to a shift of finite type and thus is of finite type. Example 6.2. Let X be the shift of finite type on the alphabet A D ¹a; bº defined by the forbidden factor ba. The Krieger automaton of X is represented in Figure 15. It is .1; 0/-local. b

a

1

b

2

Figure 15. The Krieger automaton of a reducible shift of finite type

For m; n > 0, the standard .m; n/-local automaton is the automaton with states the set of words of length m C n and edges the triples .uv; a; u0 v 0 / for u; u0 2 Am , a 2 A and v; v 0 2 An such that for some letters b; c 2 A, one has uvc D bu0 v 0 and a is the first letter of vc . The standard .m; 0/-local automaton is also called the De Bruijn automaton of order m. Example 6.3. The standard .1; 1/-local automaton on the alphabet ¹a; bº is represented in Figure 16. Complete automata. An automaton A on the alphabet A is called complete if any word over A is the label of some path in A. As an example, the standard .m; n/-local automaton is complete. The following result is from [3].

1020

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin ab

a a

aa

a a

b b

ba

bb

b

b

Figure 16. The standard .1; 1/-local automaton

Theorem 6.4. Every local automaton is contained in a complete local automaton. The proof relies on the following version of the masking lemma. Proposition 6.5 (masking lemma). Let A and B be two automata and assume that M.A/ and M.B/ are elementary equivalent. If B is contained in an automaton B0 , then A is contained in some automaton A0 which is conjugate to B0 . Proof. Let A D .Q; E/, B D .R; F / and B0 D .R0 ; F 0 /. Let D be an R  Q nonnegative integral matrix and N be an alphabetic QR matrix such that M.A/ D ND and M.B/ D DN . Set Q0 D Q [ .F 0 n F /. Let D 0 be the R0  Q0 nonnegative integral matrix defined for r 2 R0 and u 2 Q0 by 8 ˆ 0. The following statement is Proposition 5.1.11 in [21].

Proposition 6.7. An automaton has finite right delay if and only if it is conjugate to a deterministic automaton. In the same way the automaton has left delay d > 0 if for any pair of paths a z a p ! q ! r and p 0 ! q 0 ! r with a 2 A, if jzj D d , then q D q 0 . z

Corollary 6.8. If two automata are conjugate, and if one has finite right (left) delay, then the other also has. Proposition 6.9. An essential .m; n/-local automaton has right delay n and left delay m. a

z

a

z

Proof. Let p ! q ! r and p ! q 0 ! r 0 be two paths with a 2 A and jzj D n. Since y A is essential there is a path u ! p of length m in A. Since A is .m; n/-local, we have q D q 0 . Thus A has right delay n. The proof for the left delay m is symmetrical.

A shift space has almost finite type if it can be recognised by a strongly connected automaton with both finite left and finite right delay. An irreducible shift of finite type is also of almost finite type, since a local automaton has finite right and left delay by Proposition 6.9.

Example 6.6. The even shift has almost finite type. Indeed, the automaton of Figure 5 on the right has right and left delay 0. The following result is from [22]. Proposition 6.10. An irreducible shift space is of almost finite type if and only if its Fischer automaton has finite left delay. Proof. The condition is obviously sufficient. Conversely, let X be a shift of almost finite type. Assume the Fischer automaton A D .Q; E/ of X does not have finite left delay. In view of Proposition 6.6, let u; v 2 A and p; q; q 0 2 Q with q ¤ q 0 be such

27. Symbolic dynamics

1023

that q  u D q , q 0  u D q 0 and p D q  v D q 0  v . Since A is strongly connected, there is a word w such that p  w D q . Let B D .R; F / be an automaton with finite right and left delay that recognises X . By Proposition 6.7, we may assume that B is deterministic. Let 'W R ! Q be a reduction from B onto A. Since R is finite, there is an x 2 uC such that r  x D r  x 2 for all r 2 R (this means that the map r 7! r  x is idempotent; such a word exists since each element in the finite transition semigroup of the automaton B has a power which is an idempotent). Set S D R  x;

T D'

1

.q/ \ S;

T0 D '

1

.q 0 / \ S:

Since q ¤ q 0 , we have T \ T 0 D ;. For any t 2 T , we have '.t  vw/ D q and thus t  vwx 2 T . For t; t 0 2 T with t ¤ t 0 , we cannot have t  vwx D t 0  vwx , since otherwise B would have infinite left delay. Thus the map t 7! t  vwx is a bijection of T . Let t 0 2 T 0 . Since '.t 0  vw/ D q , we have t 0  vwx 2 T . Since the action of vwx induces a permutation on T , there exists t 2 T such that t  vwx D t 0  vwx . This contradicts the fact that B has finite left delay. Example 6.7. The deterministic automaton represented in Figure 20 has infinite left b

a

b

a

delay. Indeed, there are paths    1 ! 1 ! 1 and    2 ! 2 ! 1. Since this automaton cannot be reduced, X D LA is not of almost finite type. a; b

c 1

2

b

a

Figure 20. An automaton with infinite left delay

7. Syntactic invariants In this section, we introduce the syntactic graph of an automaton. It uses the Green relations in the transition semigroup of the automaton. We show that the syntactic graph is an invariant for symbolic conjugacy (Theorem 7.4). The proof uses bipartite automata. The final subsection considers the characterisation of sofic shifts with respect to the families of ordered semigroups known as pseudovarieties. 7.1. The syntactic graph. Let A D .Q; E/ be a deterministic automaton over the alphabet A. Each word w 2 A defines a partial map denoted by 'A .w/ from Q to Q that maps p 2 Q to q 2 Q if p  w D q . The transition semigroup of A, already defined in § 4.2, is the image of AC by the morphism 'A (in this subsection, we will not use the order on the transition semigroup).

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

1024

We give a short summary of Green relations in a semigroup (see [19], for example). Let S be a semigroup and let S 1 D S [ 1 be the monoid obtained by adding an identity to S . Two elements s; t of S are R-equivalent if sS 1 D tS 1 . They are L-equivalent if S 1 s D S 1 t . It is a classical result (see [19]) that LR D RL. Thus LR D RL is an equivalence on the semigroup S called the D-equivalence. A class of the R; L or D-equivalence is called an R; L or D-class. An idempotent of S is an element e such that e 2 D e . A D-class is regular if it contains an idempotent. The equivalence H is defined as H D R \ L. It is classical result that the H-class of an idempotent is a group. The H-class of idempotents in the same D-class are isomorphic groups. The structure group of a regular D-class is any of the H-classes of an idempotent of the D-class. When S is a semigroup of partial maps from a set Q into itself, each element of S has a rank which is the cardinality of its image. The elements of a D-class all have the same rank, which is called the rank of the D-class. There is at most one element of rank 0 which is the zero of the semigroup S and is denoted 0. A fixpoint of a partial map s from Q into itself is an element q such that the image of q by s is q . The rank of an idempotent is equal to the number of its fixpoints. Indeed, in this case, every element in the image is a fixpoint. The preorder 6J on S is defined by s 6J t if S 1 sS 1  S 1 tS 1 . Two elements s; t 2 S are J-equivalent if S 1 sS 1 D S 1 tS 1 . One has D  J, and it is a classical result that in a finite semigroup we have D D J. The preorder 6J induces a partial order on the D-classes, still denoted 6J . We associate with A a labelled graph G.A/ called its syntactic graph. The vertices of G.A/ are the regular D-classes of the transition semigroup of A. Each vertex is labelled by the rank of the D-class and its structure group. There is an edge from the vertex associated with a D-class D to the vertex associated with a D-class D 0 if and only if D >J D 0 . Example 7.1. The automaton A of Figure 21 on the left is the Fischer automaton of the even shift (Example 4.3). The semigroup of transitions of A has 3 regular D-classes of ranks 2 (containing 'A .b/), 1 (containing 'A .a/), and 0 (containing 'A .aba/). Its syntactic graph is represented on the right.

b a

1

2

2; Z=2Z

1; Z=Z

0; Z=Z

b

Figure 21. The syntactic graph of the even shift

The following result shows that one may reduce to the case of essential automata. Proposition 7.1. The syntactic graphs of an automaton and of its essential part are isomorphic.

27. Symbolic dynamics

1025

Proof. Let A D .Q; E/ be a deterministic automaton on the alphabet A and let A0 D .Q0 ; E 0 / be its essential part. Let w 2 AC be such that e D 'A .w/ is an idempotent. Then any fixpoint of e is in Q0 and thus e 0 D 'A0 .w/ an idempotent of the same rank as e . This shows that G.A/ and G.A0 / are isomorphic. The following result shows that the syntactic graph characterises irreducible shifts of finite type. Proposition 7.2. A sofic shift (resp. an irreducible sofic shift) is of finite type if and only if the syntactic graph of its Krieger automaton (resp. its Fischer automaton) has nodes of rank at most 1. In the proof, we use the following classical property of finite semigroups. Proposition 7.3. Let S be a finite semigroup and let J be an ideal of S . The following conditions are equivalent:

i. all idempotents of S are in J ; ii. there exists an integer n > 1 such that S n  J . Proof. Assume that (i) holds. Let n D Card.S / C 1 and let s D s1 s2    sn with si 2 S . Then there exist i; j with 1 6 i < j 6 n such that s1 s2    si D s1 s2    si    sj . Let t; u 2 S 1 be defined by t D s1    si and u D si C1    sj . Since tu D t , we have tuk D t for all k > 1. Since S is finite, there is a k > 1 such that uk is idempotent and thus uk 2 J . This implies that t 2 J and thus s 2 J . Thus (ii) holds. It is clear that (ii) implies (i). Proof of Proposition 7.2. Let X be a shift space (resp. an irreducible shift space), let A be its Krieger automaton (resp. its Fischer automaton), and let S be the transition semigroup of A. If X is of finite type, by Proposition 6.3, the automaton A is local. Any idempotent in S has rank 1 and thus the condition is satisfied. Conversely, assume that the graph G.A/ has nodes of rank at most 1. Let J be the ideal of S formed of the elements of rank at most 1. Since all idempotents of S belong to J , by Proposition 7.3, the semigroup S satisfies S n D J for some n > 1. This shows that for any sufficiently long word x , the map 'A .x/ has rank at most 1. Thus for p; q; r; s 2 Q, if p  x D r and q  x D s then r D s . This implies that A is .n; 0/-local. The following result is from [2]. Theorem 7.4. Two symbolic conjugate automata have isomorphic syntactic graphs. We use the following intermediary result. Proposition 7.5. Let A D .A1 ; A2 / be a bipartite automaton. The syntactic graphs of A; A1 and A2 are isomorphic.

1026

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

Proof. Let Q D Q1 [ Q2 and A D A1 [ A2 be the partitions of the set of states and of the alphabet of A corresponding to the decomposition .A1 ; A2 /. Set B1 D A1 A2 and B2 D A2 A1 . The semigroups S1 D 'A1 .B1C / and S2 D 'A2 .B2C / are included in the semigroup S D 'A .AC /. Thus the Green relations of S are refinements of the corresponding Green relations in S1 or in S2 . Any idempotent e of S belongs either to S1 or to S2 . Indeed, if e D 0 then e is in S1 \ S2 . Otherwise, it has at least one fixpoint p 2 Q1 [ Q2 . If p 2 Q1 , then e is in 'A .B1C / and thus e 2 S1 . Similarly if p 2 Q2 then e 2 S2 . Let e be an idempotent in S1 and let e D 'A .u/. Since u 2 B1C , we have u D au0 with a 2 A1 and u0 2 B2 A2 . Let v D u0 a. Then f D 'A .v/2 is idempotent. Indeed, we have 'A .v 3 / D 'A .u0 au0 au0 a/ D 'A .u0 uua/ D 'A .u0 ua/ D 'A .v 2 /:

Moreover e; f belong the same D-class. Similarly, if e 2 S2 , there is an idempotent in S1 which is D equivalent to e . This shows that a regular D-class of 'A .AC / contains idempotents in S1 and in S2 . Finally, two elements of S1 which are D-equivalent in S are D-equivalent in S1 . Indeed, let s; t 2 S1 be such that sRLt . Let u; u0 ; v; v 0 2 S be such that suu0 D s;

v 0 vt D t;

su D tv

in such a way that sRsu and vtLt . Then su D vt implies that u; v are both in S1 . Similarly suu0 D s and v 0 vt D t imply that u0 v 0 2 S1 . Thus sDt in S1 . This shows that a regular D class D of S contains exactly one D-class D1 of S1 (resp. D2 of S2 ). Moreover, an H-class of D1 is also an H-class of D . Thus the three syntactic graphs are isomorphic. Proof of Theorem 7.4. Let A D .Q; E/ and B D .R; F / be two symbolic conjugate automata on the alphabets A and B , respectively. By the decomposition theorem (Theorem 5.9), we may assume that the symbolic conjugacy is a split or a merge. Assume that A0 is an in-split of A. By Proposition 7.1, we may assume that A and A0 are essential. By Proposition 5.11, the adjacency matrices of A and A0 are symbolic elementary equivalent. By Proposition 5.5, there is a bipartite automaton C D .C1 ; C2 / such that M.C1 / and M.C2 / are similar to M.A/ and M.B/ respectively. By Proposition 7.5, the syntactic graphs of C1 ; C2 are isomorphic. Since automata with similar adjacency matrices have obviously isomorphic syntactic graphs, the result follows. A refinement of the syntactic graph which is also invariant by flow equivalence has been introduced in [11]. The vertices of the graph are the idempotent-bound D classes, where an element s of a semigroup S is called idempotent-bound if there exist idempotents e; f 2 S such that s D esf . The elements of a regular D-class are idempotent-bound. Flow equivalent automata. Let A be an automaton on the alphabet A and let G be its underlying graph. An expansion of A is a pair .'; / of a graph expansion of G and a

27. Symbolic dynamics

1027

symbol expansion of LA such that the diagram below is commutative: '

XA

!

B

!

A

! XB

LA

! LB

The inverse of an automaton expansion is called a contraction. Example 7.2. Let A and B be the automata represented in Figure 22. The second automaton is an expansion of the first one.

a 1

2 a

b

!

5

a

a

4

!

3

6

b

Figure 22. An automaton expansion

The flow equivalence of automata is the equivalence generated by symbolic conjugacies, expansions and contractions. Theorem 7.4 has been generalised by Costa and Steinberg [14] to flow equivalence. Theorem 7.6. Two flow equivalent automata have isomorphic syntactic graphs. Example 7.3. The syntactic graphs of the automata A, B of Example 5.2 are isomorphic to the syntactic graph of the Fischer automaton C of the even shift. Note that the automata A; B are not flow equivalent to C . Indeed, the edge shifts XA , XB on the underlying graphs of the automata A, B are flow equivalent to the full shift on 3 symbols while the edge shift XC is flow equivalent to the full shift on 2 symbols. Thus the converse of Theorem 7.6 is false. 7.2. Pseudovarieties. In this subsection, we will see how one can formulate characterisations of some classes of sofic shifts by means of properties of their syntactic semigroup. In order to formulate these syntactic characterisations of sofic shifts, we introduce the notion of pseudovariety of ordered semigroups. For a systematic exposition, see the original articles [27] and [29], or the surveys in [28] or [26]. A morphism of ordered semigroups ' from S into T is an order-compatible semigroup morphism, such that s 6 s 0 implies '.s/ 6 '.s 0 /. An ordered subsemigroup of S is a subsemigroup equipped with the restriction of the preorder. A pseudovariety of finite ordered semigroups is a class of ordered semigroups closed under taking ordered subsemigroups, finite direct products and image under morphisms of ordered semigroups. Let V be a pseudovariety of ordered semigroups. We say that a semigroup S is locally in V if all the submonoids of S are in V . The class of these semigroups is a pseudovariety of ordered semigroups.

1028

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

The following result is due to Costa [12]. Theorem 7.7. Let V be a pseudovariety of finite ordered semigroups containing the class of commutative ordered monoids such that every element is idempotent and greater than the identity. The class of shifts whose syntactic semigroup is locally in V is invariant under conjugacy. The following statements give examples of pseudovarieties satisfying the above condition. Proposition 7.8. An irreducible shift space is of finite type if and only if its syntactic semigroup is locally commutative. An inverse semigroup is a semigroup which can be represented as a semigroup of partial one-to-one maps from a finite set Q into itself. The family of inverse semigroups does not form a variety (it is not closed under homomorphic image. However, according to Ash’s theorem [1], the variety generated by inverse semigroups is characterised by the property that the idempotents commute. Using this result, the following result is proved in [12]. Theorem 7.9. An irreducible shift space is of almost finite type if and only if its syntactic semigroup is locally in the pseudovariety generated by inverse semigroups. The fact that shifts of almost finite type satisfy this condition was proved in [2]. The converse was conjectured in the same paper. In [14] it is shown that this result implies that the class of shifts of almost finite type is invariant under flow equivalence. This is originally from [17]. 7.3. Flow equivalence of sofic shifts. By contrast to Theorem 2.16, determining whether two irreducible sofic shifts are flow equivalent is very difficult, even in the AFT case, since this problem in a sense contains the covariant flow equivalence problem for reducible shifts of finite type (cf. [9] and [10]). Under added hypotheses, however, some classification results may be obtained [8].

References [1] C. J. Ash, Finite semigroups with commuting idempotents. J. Austral. Math. Soc. Ser. A 43 (1987), no. 1, 81–90. MR 0886805 Zbl 0634.20032 q.v. 1028 [2] M.-P. Béal, F. Fiorenzi, and D. Perrin, The syntactic graph of a sofic shift is invariant under shift equivalence. Internat. J. Algebra Comput. 16 (2006), no. 3, 443–460. MR 2241616 Zbl 1098.68062 q.v. 1025, 1028 [3] M.-P. Béal, S. Lombardy, and D. Perrin, Embeddings of local automata. Illinois J. Math. 54 (2010), no. 1, 155–174. MR 2776990 Zbl 1257.68095 q.v. 1019 [4] J. Berstel, D. Perrin, and C. Reutenauer, Codes and automata. Encyclopedia of Mathematics and its Applications, 129. Cambridge University Press, Cambridge, 2010. MR 2567477 Zbl 1187.94001 q.v. 1018

27. Symbolic dynamics

1029

[5] R. Bowen and J. Franks, Homology for zero-dimensional nonwandering sets. Ann. of Math. (2) 106 (1977), no. 1, 73–92. MR 0458492 Zbl 0375.58018 q.v. 997 [6] M. Boyle, Flow equivalence of shifts of finite type via positive factorizations. Pacific J. Math. 204 (2002), no. 2, 273–317. MR 1907894 Zbl 1056.37008 q.v. 997 [7] M. Boyle, Open problems in symbolic dynamics. In Geometric and probabilistic structures in dynamics (K. Burns, D. Dolgopyat, and Y. Pesin, eds.). Papers from the Workshop on Dynamical Systems and Related Topics in honor of Michael Brin on the occasion of his 60 th birthday held at the University of Maryland, College Park, MD, March 15–18, 2008. Contemporary Mathematics, 469. American Mathematical Society, Providence, R.I., 2008, 69–118. MR 2478466 Zbl 1158.37007 q.v. 997 [8] M. Boyle, T. M. Carlsen, and S. Eilers, Flow equivalence of sofic shifts. Israel J. Math. 225 (2018), no. 1, 111–146. MR 3805644 Zbl 1400.37020 q.v. 1028 [9] M. Boyle and D. Huang, Poset block equivalence of integral matrices. Trans. Amer. Math. Soc. 355 (2003), no. 10, 3861–3886. MR 1990568 Zbl 1028.15006 q.v. 997, 1028 [10] M. Boyle and M. C. Sullivan, Equivariant flow equivalence for shifts of finite type, by matrix equivalence over group rings. Proc. London Math. Soc. (3) 91 (2005), no. 1, 184–214. MR 2149534 Zbl 1114.37010 q.v. 1028 [11] A. Costa, Conjugacy invariants of subshifts: an approach from profinite semigroup theory. Internat. J. Algebra Comput. 16 (2006), no. 4, 629–655. MR 2258833 Zbl 1121.37013 q.v. 1026 [12] A. Costa, Pseudovarieties defining classes of sofic subshifts closed under taking equivalent subshifts. J. Pure Appl. Algebra 209 (2007), no. 2, 517–530. MR 2293324 Zbl 1130.20043 q.v. 1028 [13] A. Costa, Semigroupos profinitos e dinâmica simbólica. Ph.D. thesis. Universidade do Porto, Porto, 2007. q.v. 1006 [14] A. Costa and B. Steinberg, A categorical invariant of flow equivalence of shifts. Ergodic Theory Dynam. Systems 36 (2016), no. 2, 470–513. MR 3503033 Zbl 1355.37021 q.v. 1027, 1028 [15] R. Fischer, Sofic systems and graphs. Monatsh. Math. 80 (1975), no. 3, 179–186. MR 0407235 Zbl 0314.54043 q.v. 1007 [16] J. Franks, Flow equivalence of subshifts of finite type. Ergodic Theory Dynam. Systems 4 (1984), no. 1, 53–66. MR 0758893 Zbl 0555.54026 q.v. 997 [17] M. Fujiwara and M. Osikawa, Sofic systems and flow equivalence. Math. Rep. Kyushu Univ. 16 (1987), no. 1, 17–27. MR 0953693 Zbl 0649.54021 q.v. 1028 [18] T. Hamachi and M. Nasu, Topological conjugacy for 1-block factor maps of subshifts and sofic covers. In Dynamical systems (J. C. Alexander, ed.). Proceedings of the Special Year on Ergodic Theory and Dynamics held at the University of Maryland, College Park, Maryland, 1986–1987. Lecture Notes in Mathematics, 1342. Springer, Berlin, 1988, 251–260. MR 0970559 Zbl 0681.54019 q.v. 1015 [19] J. M. Howie, An introduction to semigroup theory. Academic Press [Harcourt Brace Jovanovich Publishers], London, 1976. L.M.S. Monographs, 7. Academic Press, New York and London, 1976. MR 0466355 Zbl 0355.20056 q.v. 1024 [20] W. Krieger, On sofic systems. I. Israel J. Math. 48 (1984), no. 4, 305–330. MR 0776312 Zbl 0573.54032 q.v. 1015 [21] D. A. Lind and B. H. Marcus, An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge, 1995. MR 1369092 Zbl 1106.37301 q.v. 988, 990, 991, 993, 995, 1007, 1008, 1015, 1022

1030

Marie-Pierre Béal, Jean Berstel, Søren Eilers, and Dominique Perrin

[22] M. Nasu, An invariant for bounded-to-one factor maps between transitive sofic subshifts. Ergodic Theory Dynam. Systems 5 (1985), no. 1, 89–105. MR 0782790 Zbl 0603.54042 q.v. 1022 [23] M. Nasu, Topological conjugacy for sofic systems. Ergodic Theory Dynam. Systems 6 (1986), no. 2, 265–280. MR 0857201 Zbl 0607.54026 q.v. 1012, 1015 [24] M. Nasu, Topological conjugacy for sofic systems and extensions of automorphisms of finite subsystems of topological Markov shifts. In Dynamical systems (J. C. Alexander, ed.). Proceedings of the Special Year on Ergodic Theory and Dynamics held at the University of Maryland, College Park, Maryland, 1986–1987. Lecture Notes in Mathematics, 1342. Springer, Berlin, 1988, 564–607. MR 0970572 Zbl 0664.28008 q.v. 1006 [25] B. Parry and D. Sullivan, A topological invariant of flows on 1-dimensional spaces. Topology 14 (1975), no. 4, 297–299. MR 0405385 Zbl 0314.54045 q.v. 996 [26] D. Perrin and J.-É. Pin, Infinite words. Automata, semigroups, logic and games. Pure and Applied Mathematics (Amsterdam), 141. Elsevier/Academic Press, 2004. Zbl 1094.68052 q.v. 1027 [27] J.-É. Pin, Eilenberg’s theorem for positive varieties of languages. Izv. Vyssh. Uchebn. Zaved. Mat. 1995, no. 1, 80–90. In Russian. English translation, Russian Math. (Iz. VUZ) 39 (1995), no. 1, 74–83. MR 1391325 Zbl 0852.20059 q.v. 1027 [28] J.-É. Pin, Syntactic semigroups. In Handbook of formal languages (G. Rozenberg and A. Salomaa, eds.). Word, language, grammar. Springer, Berlin, 1997, 679–746. MR 1470002 q.v. 1027 [29] J.-É. Pin, A. Pinguet, and P. Weil, Ordered categories and ordered semigroups. Comm. Algebra 30 (2002), no. 12, 5651–5675. MR 1941917 Zbl 1017.06007 q.v. 1027

Chapter 28

Automatic structures Sasha Rubin

Contents 1. 2. 3. 4. 5. 6. 7. 8.

Introduction . . . . . . . . . . . . . . . . . . . Automatic Structures . . . . . . . . . . . . . . The connection with MSOL . . . . . . . . . . . Operations on automatic structures . . . . . . . Proving a structure has no automatic presentation Equivalent automatic presentations . . . . . . . Automatic-like structures . . . . . . . . . . . . Outlook . . . . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

1031 1036 1041 1049 1052 1057 1060 1064

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1064

1. Introduction Automatic structures are structures that can be represented by automata. Intuitively, a structure is called automatic if its domain can be coded as a regular set in such a way that its relations, under this coding, are recognised by synchronous multi-tape automata. This chapter is about using such representations to answer algorithmic and modeltheoretic problems about interesting mathematical structures A. A typical algorithmic problem is to find algorithms for answering queries about A expressed in a given logical language, and a typical model-theoretic problem is to characterise the sets and relations in A that are definable in a given logical language. What are interesting structures A? Let’s start with arithmetics. Much elementary algebra and geometry can be expressed in the language of firstorder logic (FOL) of the structure of real arithmetic, i.e., .R; C; ; C1; 2, the ternary relation C consists of all triples .a; b; c/ such that a C b D c , and the binary relation jk consists of all pairs .n; m/ such that (i) n is a power of k and (ii) n divides m.

Moreover, R  .† /r is regular if and only if R is FOL definable in .† ; .a /a2† ; pref ; Dlen /. The following says that MSOL definability in the structure A is the same as FOL definability in the power structure PŒA. Proposition 3.7. Fix a signature. For every FOL formula .x1 ; : : : ; xk / there is an MSOL formula ˆ.X1 ; : : : ; Xk / (and vice versa) such that for all structures A and all Ui 2 P.A/: PŒA ˆ .Ux / () A ˆ ˆ.Ux /: A similar result holds replacing MSOL by WMSOL and P by Pf .

The following theorem connects the two worlds. It says that automatic presentations are equivalent to FOL interpretations in the appropriate power structure.

28. Automatic structures

1045

Theorem 3.8 (connecting interpretations and automatic presentations, [13] and [18]). The following properties hold. 1. A structure is FOL interpretable in PŒT2  if and only if it is ! -tree automatic. 2. A structure is FOL interpretable in PŒT1  if and only if it is ! -string automatic. 3. A structure is FOL interpretable in Pf ŒT2  if and only if it is finite-tree automatic. 4. A structure is FOL interpretable in Pf ŒT1  if and only if it is finite-string automatic. Moreover, translations between interpretations and automatic presentations are effective. Proof. We illustrate the first item. For the forward direction use the fact that PŒT2  is Rabin-automatic and that Rabin-automatic structures are effectively closed under interpretations (Proposition 3.6). For the reverse direction let J be an ! -tree automatic presentation of C. By Lemma 2.1 we can assume that the trees in L.M / are ¹0; 1º-labelled. By Rabin’s theorem (Theorem 3.1) each Rabin-tree automaton in J can be translated into an MSOL formula of T2 . By viewing MSOL formulas over T2 as FOL formulas over PŒT2  (Proposition 3.7), the resulting tuple of FOL formulas is an FOL interpretation of C in PŒT2 . The items also hold if we replace the structure on the LHS by any structures that are mutually FOL interpretable, e.g., item (4) holds replacing Pf ŒT1  by any of the structures in Example 3.2. These structures are therefore called universal automatic structures. See [12], [9], and [8] for universal structures for the other classes. Remark 3.9. There is an alternative way to think about FOL interpretations in the power structure PŒA of A. Indeed, sets interpretations in A are interpretations I that consist of MSOL formulas with free set variables (instead of FOL formulas), and code elements in the domain of the interpreted structure I.A/ by subsets of the domain of the interpreting structure A, see [24], [80], and [18]. Taking A to be T1 (resp. T2 ) we get an analogue of Theorem 3.8: a structure is Büchi-automatic (resp. Rabin-automatic) if and only if it is sets-interpretable in T1 (resp. T2 ). Similarly, finite-sets interpretations consist of WMSOL formulas, code elements of I.A/ by finite subsets of the domain of A, and taking A to be T1 (resp. T2 ) allow one to get the finite-string automatic (resp. finite-tree automatic) structures. 3.4. Extending the fundamental theorem. Motivated by the tight connection between FOL on the power structure of T2 , MSOL on T2 , and Rabin-automatic structures (Theorem 3.8), we end this section with a short survey of important properties that are MSOL definable in T2 . In particular, we study how to enrich MSOL by the quantifiers that talk about the number of sets satisfying a property. This allows us to extend the fundamental theorem for automatic structures. For a cardinal  let 9> denote the quantifier “there exists at least  many sets X .” Thus 9>@0 means “there exist infinitely many sets X ” and 9>@1 means “there exist uncountably many sets X .” Write MSO.9> / for MSOL enriched by the quantifier 9> .

Sasha Rubin

1046

Lemma 3.10. The property “X is finite” is MSOL definable in T2 . Proof. The lexicographic ordering @1 in MSOL over T2 . Theorem 3.11 ([6]). For every MSO.9>@0 ; 9>@1 / formula ˆ.Xx / one can construct an MSOL formula ˆ0 .Xx / equivalent over T2 to ˆ.Xx/. Proof. For the 9>@0 case, note that the following are equivalent: 1. there are only finitely many Y satisfying ˆ.Y; Xx /;

2. there is a finite set Z such that every pair of different sets Y1 ; Y2 which both satisfy ˆ.Yi ; Xx/ differ on Z . The second condition can be expressed in MSOL using Lemma 3.10. The proof of the 9>@1 case, which is out of our scope, uses the composition method and basic notions from descriptive set theory (we will prove a similar theorem for T1 below). The results above are stated for T2 . One can deduce similar results for T1 , or prove them directly using automata operating on infinite strings. A direct proof of the following theorem follows the same schema as that of Rabin’s Theorem (Theorem 3.1); also see Theorem 5.1 in Chapter 8. The main work is to show that Büchi-string automata are effectively closed under complementation. Recall that a Büchi-string automaton is like an ordinary nondeterministic finite-string automaton except that it reads ! -strings and produces infinite runs; a run of a Büchi-string automaton is declared successful if some final state appears infinitely often in the run. For X  N write X for the ¹0; 1º-labelled ! -string such that X Œn D 1 if and only if n 2 X . Theorem 3.12 (Büchi’s Theorem). For each MSOL formula ˆ.X1 ; : : : ; Xk / in the signature of T1 there is a Büchi-string automaton over alphabet ¹0; 1ºk , and vice-versa, such that the language accepted by the automaton is ¹˝.X1 ; : : : ; Xk /W T1 ˆ ˆ.X1 ; : : : ; Xk /º:

The translations are effective. Here is the corresponding result for MSOL over T1 . Theorem 3.13 ([13] and [67]). For every MSO.9>@0 ; 9>@1 / formula ˆ.Xx / one can construct an MSOL formula ˆ0 .Xx / equivalent over T1 to ˆ.Xx /. Proof. Eliminating 9>@0 is done as in Theorem 3.11. Eliminating 9>@1 is done as follows. Say that two subsets of N have the same end if their symmetric difference

28. Automatic structures

1047

is finite. We will show that there is a constant K (that depends only on ˆ) such that the following are equivalent for all Yx : 1. there are uncountably many sets Y satisfying ˆ.Y; Xx/; 2. there are K many sets Y , each satisfying ˆ.Y; Xx /, and that pairwise have different ends (note that this condition can be easily expressed in MSOL). Note that (1) implies (2) for every K since every set has the same end as only countably many other sets. For the reverse the idea is that if there are enough sets with different ends then we can find two that “confuse” (an automaton for) ˆ, and so shuffle these to get uncountably many. Fix, by Büchi’s Theorem (Theorem 3.12), a Büchi-string automaton for ˆ and let K be larger than the number of its states. For ease of writing assume that Xx is a singleton and so write X instead. For every set A satisfying ˆ.A; X / write A for some accepting run of the automaton on ˝.A ; X /. For two sets A; B , define D.A; B/ WD ¹n 2 NW A Œn ¤ B Œnº. There are at least two sets, say A and B , with different ends and such that D.A; B/ is co-infinite. Otherwise, for every A; A0 with different ends the set D.A; A0 / is co-finite. Then, T for every finite set S  P.N/ consisting of sets with pairwise different ends, the set A¤A0 2S D.A; A0 / is co-finite and thus non-empty. But this is impossible if jSj > K , since the automaton has < K states. To continue, let H denote the infinite set N n D.A; B/. List H as h1 < h2 < h3 <    . Without loss of generality we can assume, by passing to an infinite subset of H if required, that for all n, both A Œhn ; hnC1  and B Œhn ; hnC1  mention final states and A \ Œhn ; hnC1 1 ¤ B \ Œhn ; hnC1 1. By knitting segments of the runs we see that the automaton accepts every string of the form ˝.Y ; X / where  S 1 and for every n the set Zn is one Y D .A \ Œ0; h1 1/ [ n Zn \ Œhn ; hnC1 of A or B . This gives uncountably many (actually continuum many) distinct sets Y satisfying ˆ.Y; X /. Write MSO.9mod / for MSOL enriched by the quantifiers 9k;m stating “there exists exactly k modulo m many sets X ” (0 6 m < k 2 N). The following says that these modulo quantifiers can be eliminated from MSOL formulas over T1 (it is not known if the same holds for MSOL over T2 ). Theorem 3.14 ([67]). For every MSO.9mod / formula ˆ.Xx/ one can construct an MSOL formula ˆ0 .Xx / equivalent over T1 to ˆ.Xx/. Proof. Eliminating 9mod is done by building an automaton that counts (modulo m) the number of accepting runs in a deterministic automaton (say with Muller condition) for ˆ on inputs of the form .Y; Xx / where Y is a set. We now state the corresponding results for WMSOL: Theorem 3.15. Let A be T1 or T2 . For every WMSO.9>@0 ; 9mod / formula ˆ.Xx / one x equivalent over A to ˆ.Xx /. can construct a WMSOL formula ˆ0 .X/

1048

Sasha Rubin

Proof. The only case that cannot be deduced from the results above is that of eliminating 9mod in T2 . For ease of writing, assume Xx is a singleton X . Build a finite-tree automaton that on input X counts, modulo m, the number of accepting runs of a frontierto-root deterministic finite-tree automaton M for ˆ on inputs of the form .Y; X / where Y is a finite set. Let N.q; w/ denote the number, modulo m, over all finite sets Y , of runs of M with root labelled q whose input is the subtree of ˝.Y ; X / at position w . The automaton for 9k;m will store, in its state, while a position w of the input tree X , the tuple .N.q; w//q of numbers; and it will accept if the sum, over the final states f of M , of N.f; /, is equal to k modulo m. Note that if w is a leaf of X then N.q; w/ can be computed offline and depends only on M and X .w/. If w is an internal node then N.q; w/ is equal, modulo m, to the sum over all transitions .q0 ; q1 ; ˝.Y ; X /.w// ! q of M of the terms N.q0 ; w0/  N.q1 ; w1/. In other words, suppose there are ni modulo m runs with root qi on the subtree at position wi (for i D 0; 1), and there is a transition that, at w , sends the left-subtree in state q0 and the right-subtree in state q1 to state q . Then there are n0 n1 modulo m many runs with root q on the subtree at position w .

We now rephrase these in terms of FOL on automatic structures. We overload the quantifier notation and, e.g., let 9> denote the quantifier “there exists at least  many elements x .” Theorem 3.16 (extension of fundamental theorem; see [13] and [56], and [67]). Let ı be one of “finite-string,” “finite-tree,” “! -string,” or “! -tree.” Let 8 >@ >@ ˆ if ı is “! -tree,” @0 ; 9>@1 ; 9mod / if ı is “! -string,” ˆ : >@0 mod .9 ;9 / if ı is “finite-string” or “finite-tree.” Let A be ı-automatic via the co-ordinate map . Then,

1. for every FOL.Q/-definable relation R in A, the set code .R/ is accepted by an ı-automaton; 2. the FOL.Q/ theory of A is decidable.

How far can we push this? First we need a rigorous definition of quantifier. This is neatly provided by Lindström’s definition of “generalised quantifier,” see [71]. We don’t have a clear picture of those generalised quantifiers that can be added to FOL and still get the definability and definability properties as in the fundamental theorem. However, here is a special case. Define the unary cardinality quantifier QC parameterised by C , for C a class of cardinals, as “there exists exactly ˛ 2 C many elements x such that . . . ” Examples include “there exists an element x ” (take C to be all cardinals except 0) and “there exist a prime number of elements” (take C to be the set of prime numbers). Write FOL.QC / for the extension of FOL by the unary cardinality quantifier QC . It turns out that, on string-automatic structures, the only unary cardinality quantifiers we can add to FOL and still get the fundamental theorem are (or are easily definable using) the ones mentioned in Theorem 3.16, see [81] and [67].

28. Automatic structures

1049

What about other extensions of FOL? Unfortunately set quantification is too much to hope for, as are other standard extensions of FOL such as fixed point logics [12]. Of course one can hope for more if one restricts the class of automatic structures under consideration. For instance, in the finite-string and ! -string cases one can extend FOL by the binary Ramsey quantifier which says “there is an infinite set X of elements such that for all distinct x; y 2 X we have that .x; y; z/ N ,” see [81] and [67] (this allows one to express, for instance, that an automatic tree with ancestor relation has an infinite path). Interestingly, one can add any computable cardinality quantifiers (of arbitrary arity) to ! -string automatic structures of bounded-degree and retain decidability [67].

4. Operations on automatic structures The goal of this section is to answer the following question. Which operations on automatic structures preserve automaticity? Let ı stand for “finite-string,” “! -string,” “finite-tree,” or “! -tree.” The fact that ı-automatic structures are effectively closed under FOL interpretations (Proposition 3.6) is very useful. For instance, this immediately implies that ı-automatic structures are effectively closed under FOL definable expansions. There is a more general notion of interpretation called a FOL interpretation of dimension d . Here ı has d 2 N free variables and each i has d  ri free variables. In this case, elements of I.C/ are d -tuples of elements of C. If we don’t specify d we simply talk about many-dimensional FOL interpretations. Proposition 4.1 ([13]). The ı-automatic structures are effectively closed under manydimensional FOL interpretations. Proof. We illustrate the idea. Suppose B 2 S-AutStr and I is an FOL interpretation of dimension d in the signature of B. It is enough to show that I.B/ 2 S-AutStr. An element a of I.B/ is a d -tuple of elements .b1 ; : : : ; bd / of B each of which is coded by a finite string code.bi /. Coding the element a by the string ˝.code.b1 /; : : : ; code.bd // we get a finite-string automatic presentation of I.B/ over alphabet ¹0; 1; ºd .

For instance, if A and B are FOL interpretable in C then the disjoint union of A and B is 2-dimensionally interpretable in C. Similarly for their direct product. Thus, ı-automatic structures are effectively closed under disjoint union and direct product. The (weak) direct power of A is a structure with the same signature as A, its domain consists of (finite) sequences of A, and the interpretation of a relation symbol R is the set of sequences  such that A ˆ RA ..i // for all i . For example, .N; / is isomorphic to the weak Q direct power of .N; C/; the isomorphism sends n to the finite sequence .ei /i where piei is the prime-power decomposition of n.

Proposition 4.2 ([12]). Each of T-AutStr and !T-AutStr is effectively closed under weak direct power. The class !T-AutStr is effectively closed under direct power.

1050

Sasha Rubin

Proof. We illustrate the second statement. Let A be a Rabin-automatic structure with relation symbol R. Let  D .an /n be an element of the direct power of A. Code the sequence  by the tree t whose subtree at 0n 1 is the tree code.an /. The interpretation of R in the direct power is accepted by a tree automaton: it processes t by checking, for every n, that the convolution of the subtrees rooted at 0n 1 is accepted by the automaton for code.R/. On the other hand, S-AutStr is not closed under weak direct power since, as we will see later, .N; / is not in S-AutStr (Theorem 5.1). Also, !S-AutStr is not closed under weak direct power or direct power. Indeed, for the weak direct power use the fact that every countable !S-AutStr structure is in S-AutStr (Proposition 2.6); for the direct power use the fact that .N; / is FOL.9>@0 /-definable in the direct power of .N; C; @0 /-definitions (Theorem 3.16). A generalisation of direct product, weak direct power, and disjoint union, called finitary generalised product, has the property that if a finite sequence .Ai / of structures are many-dimensional FOL interpretable in B then all their finitary generalised products are many-dimensional FOL interpretable in B, see [13]. Thus, if each Ai is ı-automatic, then so are all their finitary generalised products. 4.1. Closure under quotients. Let A D .A; R1 ; : : : ; RN / be a structure. An equivalence relation  on the domain A is called a congruence for A if each relation Ri , say of arity ri , satisfies the following property: for every pair of ri -tuples a; N bN of elements N of A, if aj  bj for all 1 6 j 6 ri then aN 2 Ri if and only if b 2 Ri . The quotient of A by , written A= is the structure whose domain is the set of equivalence classes of  and whose i -th relation is ¹.Œa1  ; : : : ; Œari  /W .a1 ; : : : ; ari / 2 Ri º. Let ı stand for “finite-string,” “! -string,” “finite-tree,” or “! -tree.” We consider the following natural question for each class in turn. If .A; / is ı-automatic, is A= also ı-automatic? 3

Theorem 4.3 ([13]). If .A; / 2 S-AutStr then also A= 2 S-AutStr.

Proof. There is a regular well-ordering of the set of finite strings, for instance the length-lexicographic ordering @0 ; 9>@1 ; 9mod / theory if .A; / is ı-automatic?

If a class is known to be closed under quotient (i.e., ı is “finite-string”; ı is “finitetree”; ı is “! -string” and A is countable) the answer is “yes” by the extension of the fundamental theorem (Theorem 3.16). It turns out that the answer is also “yes” for the case that ı is “! -string” (and no restriction on the cardinality of A), see [7].

5. Proving a structure has no automatic presentation In the previous sections we have seen examples of automatic structures as well as how to define new automatic structures from old ones. The goal of this section is to answer the following question. How does one prove that a given structure has no automatic presentation? Observe that by the fundamental theorem (Theorem 2.2), if the FOL theory of A is undecidable, then A has no automatic presentation. This applies, for instance, to the integer arithmetics .N; C; j/ and .N; C; /. Indeed, multiplication is FOL definable in terms of C and j (see [79]) and the FOL theory of .N; C; / is undecidable. In the next sections we will see arguments that allow one to get finer results, including some of the following: Theorem 5.1. The following structures are not in S-AutStr: 1. the free group on two or more generators, see [52]; 2. the random graph, see [20] and [54]; 3. any countably infinite integral domain, see [54]; 4. the countable atomless Boolean algebra, see [54]; 5. the arithmetics .N; / and .N; j/, see [13]; 6. the ordinal .! ! ; 1 note that Gn .D/ contains the primes ¹p1 ; p2 ; : : : ; pn º. Moreover, it also contains the following two sets: An WD ¹p1k1 W k1 6 2n º and Bn WD ¹p2k2 : : : pnkn W p1k2 : : : pnkn 1 2 Gn 1 .D/º. The set Bn is included because if p1k2 : : : pnkn 1 is generated in the first n 1 steps then a symmetric generation process also generates p2k2 : : : pnkn in the last n 1 steps of an n step process. Take the product of An

Sasha Rubin

1054

and Bn to see that GnC1 .D/ contains all elements of the form p1k1 p2k2 : : : pnkn where k1 6 2n and p1k2 : : : pnkn 1 2PGn 1 .D/. Thus, jGnC1 .D/j > 2n  jGn 1 .D/j, and so Q n2 n 2i jGnC1 .D/j > i 6 n 22i D 2 i 6 2 > 2 4 . 2

The next technique limits the growth of certain projections of relations definable with parameters. Consider structures of the form $(A; R)$ where $R$ is a relation of arity greater than 1. For a tuple $\bar{u}$ of elements from $A$ define $R(\cdot, \bar{u}) := \{a \in A : (a, \bar{u}) \in R\}$. Definition 5.2. Fix $R$. For finite $E \subseteq A$, the shadow cast by $\bar{u}$ on $E$ is the set $R(\cdot, \bar{u}) \cap E$, and the shadow count of $E$ is the number of distinct shadows cast on $E$ as $\bar{u}$ varies over tuples of elements of $A$.

Every subset of $E$ is a potential shadow, and thus the shadow count of $E$ is between 0 and $2^{|E|}$. In a finite-string/finite-tree automatic structure there are arbitrarily large subsets whose shadow counts are linear/polynomial. Proposition 5.5. Suppose $(A; R)$ is finite-string (resp. finite-tree) automatic. Then there is a constant $k$, depending on the automata for the domain $A$ and the relation $R$, and there are arbitrarily large finite subsets $E \subseteq A$ such that the shadow count of $E$ is at most $k|E|$ (resp. $|E|^k$).

Proof. We prove the finite-string case. To simplify readability, we suppose that $R$ has arity 2, that the co-ordinate map is the identity, and that the domain $A$ consists of binary strings. Let $A_n$ be the set of strings in $A$ of length at most $n$. The conclusion follows from two facts. First, there is a constant $c$ such that for all $n$ and all $x \in A$ there is a $y \in A_{n+c}$ such that $x$ and $y$ cast the same shadow on $A_n$; thus the shadow count of $A_n$ is at most $|A_{n+c}|$. Second, for every $k$ there is a constant $k'$ such that for all $n$ we have $|A_{n+k}| \le k' |A_n|$. We now discuss how to establish these facts. The second fact can be proved using a simple pumping argument. For the first fact, let $Q$ be the state set of the automaton for $R$ and consider the sequence of functions $f_i\colon Q \to Q$ with $f_i(q)$ defined to be the state reached when the automaton for $R$ starts in $q$ and reads the string $\otimes(x[n+1, n+i], \diamond)$. If $\mathrm{len}(x) > n + |Q|^{|Q|}$ then there are two positions $k < l$ such that $f_k = f_l$. If we remove the segment $x[k, l-1]$ from $x$ we get a shorter string $x'$ that casts the same shadow on $A_n$ as $x$ does. Repeat until the string is short enough.
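
To make Definition 5.2 concrete, here is a small brute-force sketch (not from the chapter; all names are illustrative) that computes the shadow count of a finite set E for a binary relation R, approximating "all tuples" by a finite pool of candidate parameters.

```python
def shadow_count(E, R, parameters):
    """Number of distinct shadows R(., u) ∩ E as u ranges over `parameters` (cf. Definition 5.2)."""
    E = list(E)
    shadows = {frozenset(a for a in E if R(a, u)) for u in parameters}
    return len(shadows)

# Toy check with divisibility on an initial segment of the natural numbers:
# the shadow of u on E = {1, ..., 16} is the set of multiples of u lying in E.
count = shadow_count(range(1, 17), lambda a, u: a % u == 0, range(1, 100))
```

For the divisibility relation the count stays below |E| + 2, in line with the linear bound of Proposition 5.5; for the edge relation of the random graph every one of the 2^{|E|} subsets occurs as a shadow, which is exactly what the next example exploits.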

We illustrate by showing that the random graph is not finite-tree automatic [20] (the finite-string case is also due, independently, to Frank Stephan, as reported in [54]). Example 5.3 (random graph). The random graph $(A; R)$ is a countably infinite graph that is characterised by the following extension property: for every pair of disjoint finite sets $E, F \subseteq A$ there is a point $u \in A$ such that (i) $(x, u) \in R$ for all $x \in E$, and (ii) $(x, u) \notin R$ for all $x \in F$; that is, there is an edge from $u$ to every element of $E$ and to no element of $F$. (It is called the random graph because it is almost surely the result of the following process on infinitely many vertices: independently flip a fair coin for every pair of vertices to decide whether there is an edge between them.)


In particular, taking $R$ to be the edge relation of the random graph, the shadow count of every finite $E \subseteq A$ is the largest possible, namely $2^{|E|}$.

5.2. Sum- and product-decompositions. All definitions and results in this section are due to Delhommé [20]. In these definitions all structures are over the same signature.

Definition 5.3. Say that a structure B is sum-decomposable using a set of structures C if there is a finite partition of B D B1 [    [ Bn such that for each i the substructure B  Bi is isomorphic to some structure in C . Theorem 5.6. Suppose .A; R/ 2 S-AutStr. There is a finite set of structures C so that for every tuple of elements uN from A, the substructure A  R.; u/ N is sum-decomposable using C . Proof. To simplify readability, suppose .A; S; R/ is finite-string automatic via the identity co-ordinate map. For T 2 ¹S; Rº fix a deterministic automaton .QT ; T ; T ; FT / recognising T . Naturally extend T to all strings and so write T .q; w/. Given a tuple of strings uN write len.u/ N WD maxi len.ui /. Observe that we can partition the set R.; u/ N into the finitely many sets: the singletons ¹cº such that R.c; u/ N and len.c/ < len.u/ N , and the sets Ra .; u/ N WD ¹aw 2 AW .aw; u/ N 2 R; w 2 ¹0; 1ºº where len.a/ D len.u/ N . There are finitely many isomorphism types amongst substructures of the form A  ¹cº, for c 2 A. So, it is sufficient to show that as we vary the tuple .a; u/ N subject to len.a/ D len.u/ N , there are finitely many isomorphism types amongst substructures of the form A  Ra .; u/ N . We do this by bounding the number of isomorphism types in terms of the number of states of the automata. To this end, define a function f as follows. Its domain consists of tuples .a; u/ N satisfying len.a/ D len.u/ N ; and f sends .a; u/ N to the pair of states .R .R ; ˝.a; u//; N S .S ; ˝.a; a/// :

The range of f is finite (it is bounded by jQR j  jQS j). To finish, we argue that the isomorphism type of the substructure A  Ra .; u/ N depends only on the value f .a; u/ N . This follows from the fact that if f .a; u/ N D f .a0 ; u0 /, then the corresponding substructures are isomorphic via the mapping aw 7! a0 w (w 2 ¹0; 1º). For instance .aw; u/ N 2 R if and only if the automaton for R starting in state R .R ; ˝.a; u// N and reading ˝.w; ; : : : ; / reaches a final state if and only if starting in R .R ; ˝.a0 ; u0 // and reading ˝.w; ; : : : ; / it reaches a final state if and only if .a0 w; u0 / 2 R. This 0 establishes that f is a bijection between the domains Ra .; u/ N and Ra .; u0 /. A similar 0 0 argument shows that .aw1 ; aw2 / 2 S if and only if .a w1 ; a w2 / 2 S . Corollary 5.7. The ordinal .! ! ; jg.x1 /j. Write LD for the regular relation consisting of pairs .x; y/ 2 D  D such that jxj > jyj. We proceed by a series of claims. The first claim says that the translation f has constant delay.

C la i m 1 . There is a constant ı such that f has ı -delay. To see this, since f 1 .LC / WD ¹.a; b/W jf .a/j > jf .b/jº  B  B is regular (by assumption of equivalence) and locally


finite (every a is related to finitely many b s) there is a ı (by a pumping argument as in Proposition 5.2) such that .a; b/ 2 f 1 .LC / implies jbj 6 jaj C ı . The next claim says that the strings no longer than x are translated into strings that are no more than a constant longer than the string x is translated into.

C la i m 2 . There is a constant K with F .jxj/ jf .x/j 6 K , where F is the growth of f . To see this, since f .LB / WD ¹.f .a/; f .b//W jaj > jbjº  C  C is regular (by assumption of equivalence) and locally finite, there is a constant K such that jxj > jyj implies jf .y/j 6 jf .x/j C K . Thus F .jxj/ WD maxjaj6jxj jf .a/j is at most jf .x/j C K . We now proceed to the padding. Let \ be a new symbol. Define a translation f 0 that maps x 2 B to f .x/\F .jxj/ jf .x/j . In words, f 0 .x/ pads f .x/ by \ symbols so that f 0 is length-monotonic. Since the property ¹xW F .jxj/ jf .x/j D i º is regular for fixed i the graph of f 0 is regular (by the previous claim). Write C0 for f 0 .B/. Write F 0 for the growth of f 0 .

C la i m 3 . The translation f 0 W B ! C0 is length-monotonic and has ı -delay. To see this, use the fact that jf 0 .x/j D F .jxj/ D F 0 .jxj/. C la i m 4 . There exists p; s 2 N such that F 0 .n C p/ F 0 .n/ D s for almost all n. To see this, let l0 < l1 <    be the sequence of integers l for which there is a string in B of length l . Let un 2 C0 denote the length-lexicographically smallest element amongst ¹f 0 .x/W jxj D ln ; x 2 Bº. The set L of all such un is regular. Note that jun j D F 0 .ln / and so jun j 6 junC1 j (length-monotonic) and jun j < junCı j (ı -delay). Thus L has at most ı many strings of any given length. So partition L into regular sets Lk for k 6 ı , i.e., x 2 Lk if there are exactly k strings of length jxj in L. The length-preserving morphism of these onto 0 results in unary presentations of Lk . These are ultimately periodic. For simplicity assume the previous claim holds for all n (in general one has to deal with finitely many outliers). Now, for x 2 B of length n write f 0 .x/ as v1 v2 : : : vn where jvi j D s (if jf 0 .x/j is not a multiple of s , append a (new) blank symbol until it is). For a string w of length s write w b for a new alphabet symbol. Define f 00 W x 7! vb1 : : : vbn . Write 00 00 C for f .B/. Clearly the translation f 00 W B ! C00 is length-preserving and preserves all regular relations.

C la i m 5 . The graph of f 00 is regular. The idea is that we can use lengths of elements of B (and C00 ) as pointers to simultaneously identify the symbols in x and f 00 .x/. For simplicity, suppose that for every n 2 N there is an element in B of length n (in general the gap between lengths is bounded). For a symbol  define the regular relation S  B  B consisting of pairs .p; z/ such that  occurs in z at position jpj. Write R  C00  C00 for the image of S under f 00 . It is also regular. Then f 00 .x/ D y if and only if jxj D jyj and for all p 2 B and q 2 C00 with jpj D jqj and each symbol  we have S .p; x/ () R .q; y/. This latter condition is regular.


Finally, write W C ! C0 for the map sending f .x/ 7! f 0 .x/, and ˇW C0 ! C00 for the map sending f 0 .x/ 7! f 00 .x/. Since  1 is a projection its graph is regular. Finally, ˇ 1 is semi-synchronous sending blocks of size 1 to blocks of size s . Equivalence for presentations using other types of automata can be defined in a similar way to Definition 6.1. The following says that there is only one way to present, using finite-string/finite-tree automata, the universal finite-string automatic structures. The study of equivalent ! -string/! -tree automatic presentations is open. Theorem 6.3. Let A be one of the structures in Example 3.2, e.g., .N; C; jk / for k > 2.

1. A has exactly one finite-string automatic presentation up to equivalence, see [2] and [3]. 2. A has exactly one finite-tree automatic presentation up to equivalence, see [18].

7. Automatic-like structures We now look at other ways to use automata to establish that a structure has a decidable FOL theory. 7.1. Countable substructures. We will see a way of producing, from an uncountable Büchi- or Rabin-automatic structure B, a countable substructure A with the same theory. Thus, although A may itself not be automatic, one can still conclude it has decidable theory. Example 7.1 ([19], p. 106). Code the real numbers in base 2 to get an ! -string automatic presentation of A WD .R; C/. The substructure of A consisting of those reals coded by ultimately periodic ! -strings is isomorphic to B WD .Q; C/. Since the FOL theory of .R; C/ and .Q; C/ are identical (e.g., see below) we conclude that .Q; C/ has decidable FOL theory. This is in spite of the fact that .Q; C/ is not in S-AutStr (Theorem 5.1) and thus also not in !S-AutStr (Proposition 2.6). Let A; B have the same signature. The following is a sufficient condition to establish that A and B have the same FOL.9>@0 ; 9mod / theory (the lemma with no additional quantifiers is called the Tarski–Vaught test [33]): Lemma 7.1. Fix QC1 ; QC2 ; : : : unary cardinality quantifiers, and suppose A is a N , substructure of B. Further, suppose that for every FOL..QCi /i / formula .x; y/ N B j 2 Ci if and only if every quantifier QCi , and all aN from A, we have that j.; a/ j.; a/ N A j 2 Ci . Then A and B have the same FOL..QCi /i /-theory. N and all aN Proof. We prove a stronger statement: for all FOL..QCi /i / formulas .x/ from A we have that A ˆ .a/ N if and only if B ˆ .a/ N . To see this proceed by induction on  . The case that  is atomic holds by definition of being a substructure. The Boolean operations are immediate by the inductive hypothesis. For the quantifiers, B ˆ QC x.x; a/ N means (by definition) that j.; a/ N B j 2 C . Now apply the assumption.


Say A is ! -string (resp. ! -tree) automatic presentable via the co-ordinate mapping W L.M / ! A, and write Aup for the countable substructure of A restricted to the set .U / where U  L.M / is the set of ultimately periodic strings (resp. regular trees) in L.M /. Proposition 7.2 (cf. [7] and [32]). 1. If A 2 !S-AutStr then Aup has the same FOL.9>@0 ; 9mod /-theory as A. 2. If A 2 !T-AutStr then Areg has the same FOL.9>@0 ; 9mod /-theory as A.

Proof. We sketch the proof for trees. To apply Lemma 7.1 let  be a FOL.9>@0 ; 9mod / formula and convert it to an automaton (use Theorem 3.16). Then repeatedly call Rabin’s Basis Theorem (Theorem 3.2) with regular parameters, and remove the resulting regular trees from the language of the automaton. That is, if the automaton accepts finitely many inputs of the form ˝.; tN / then this procedure shows that if ˝.s; tN / is accepted by the automaton then s is a regular tree. Similarly, if the automaton accepts infinitely many inputs of the form ˝.; tN / then this procedure shows that there are infinitely many regular trees s such that the automaton accepts ˝.s; tN /. 7.2. Automatic with advice. If A has decidable MSOL (resp. WMSOL) theory then every structure FOL interpretable in PŒA (resp. Pf ŒA) has decidable FOL theory. In this section, we take A to be expansions of T1 and T2 by unary predicates P (for simplicity we restrict to a single unary predicate, although one could take finitely many Px without changing much). This justifies the following definition. Definition 7.1 ([18]). A structure is Rabin-automatic with advice P  ¹0; 1º if it is FOL interpretable in PŒ.T2 ; P /. A structure is Büchi-automatic with advice P  ¹0º if it is FOL interpretable in PŒ.T1 ; P /. A structure is finite-tree automatic with advice P  ¹0; 1º if it is FOL interpretable in Pf Œ.T2 ; P /. A structure is finite-string automatic with advice P  ¹0º if it is FOL interpretable in Pf Œ.T1 ; P /. There is an automata-theoretic characterisation of such structures. A Rabin-automaton with advice P  ¹0; 1º is one that, while in position u 2 ¹0; 1º, can decide on its next state using the additional information of whether or not u 2 P . In other words, the advice 6 P is simply read as part of the input. Similar machines exist for ! -strings, finite-strings and finite-trees. Such automata with advice were studied (sometimes implicitly) in [24], [16], [77], [4], [18], [60], and [29]. Note that since the automata have to read the advice P , which is typically infinite, the acceptance conditions of the finitestring/finite-tree automata with advice are also infinitary. So, a structure has a presentation by ı-automata with advice P if and only if it is ı-automatic with advice P . 6 The word “advice” is meant to connote that we can ask for a bit of information based on the current state and the current symbol being read. The other term found in the literature is “oracle” which we choose not to use because in computability theory it means that the machine can ask if the whole content written on a tape is in the oracle language.


Example 7.2. The rational group .Q; C/, although not finite-string automatic [84], is finite-string automatic with advice (this result was discovered independently by Frank Stephan and Joe Miller, and reported in [73]). To simplify the exposition we give a presentation of .Œ0; 1/ \ Q; C/ by finite strings over the alphabet ¹0; 1; #º where the automata have access to the advice string 10#11#100#101#110#111#1000# : : : :

This string, a version of the Champernowne–Smarandache string, is the concatenation of the binary representations of the integers greater than 1, most significant digit first, separated by #s. For every rational in $[0,1)$ there is a unique finite sequence of integers $a_1, \ldots, a_n$ such that $0 \le a_i < i$, the rational equals $\sum_{i=2}^{n} a_i/i!$, and $n$ is minimal. The presentation codes this rational as $f(a_2)\#f(a_3)\#f(a_4)\ldots\#f(a_n)$ where $f$ sends $a_i$ to the binary string of length $\lceil \log_2 i \rceil + 1$ representing $a_i$. Addition $a + b$ is performed least significant digit last, based on the fact that
$\frac{a_i + b_i + c}{i!} = \frac{1}{(i-1)!} + \frac{a_i + b_i + c - i}{i!},$
where $c \in \{0,1\}$ is the carry in. In other words, if $a_i + b_i + c \ge i$ then write $a_i + b_i + c - i$ in the $i$-th segment and carry a 1 into the $(i-1)$st segment; and if $a_i + b_i + c < i$ then write this under the $i$-th segment and carry a 0 into the $(i-1)$st segment. These comparisons and additions can be performed since the advice tape stores $i$ in the same segment as $a_i$ and $b_i$. Of course, since the automaton reads the input and the advice from left to right, it should non-deterministically guess the carry bits and verify the addition. A similar coding shows that $(\mathbb{Q}; +)$ is finite-string automatic with (the same) advice, by coding the integer part on a separate but parallel "track."

Which expansions have decidable MSOL? In what follows, we say that a predicate P has decidable MSOL theory if the corresponding expanded structure does, e.g., $P \subseteq \{0,1\}^*$ has decidable MSOL theory if $(T_2; P)$ does. The advice string in Example 7.2 is known to have decidable MSOL theory [3]. (In this case, since the alphabet is $\{0,1,\#\}$ and not $\{0,1\}$, we actually expand $T_1$ by more than one predicate, e.g., $P_0$, $P_1$ and $P_\#$.) This allows us to conclude that the FOL theory of $(\mathbb{Q}; +)$ is decidable. We now discuss the problem of establishing which expansions of $T_1$ and $T_2$ have decidable MSOL. Elgot and Rabin [24] use automata-theoretic arguments to show that certain expansions of $T_1$ by unary predicates have decidable MSOL theories. For instance, they showed that $(T_1; \mathrm{Fact})$ with $\mathrm{Fact} := \{0^{n!} : n \in \mathbb{N}\}$ has decidable MSOL theory. Their technique, called the contraction method, was generalised in [16], and this led to a characterisation of the $(T_1; P)$ having decidable MSOL theories, see [77] and [75]. Explicit predicates with decidable MSOL theories, including the morphic predicates, are discussed in [16] and [3]. What about expansions of $T_2$? Fratani [28] shows that if $\varphi\colon \{0,1\}^* \to \{0\}^*$ is a semigroup morphism (the operation in both is concatenation) and $(T_1; P)$ has


decidable MSOL theory then so does .T2 ;  1 .P //. As a concrete application, consider  that sends u to the unary string whose length is the number of 1s in u; then  1 .Fact/ consists of all strings with a factorial number of 1s. Finally, the pushdown/Caucal hierarchy is a well studied collection of trees (and graphs) with decidable MSOL theory, see [17] and [83]. Why do we restrict to expansions by unary predicates? The reason is that expansions by non-trivial binary relations, such as ¹.n; 2n/W n 2 Nº have undecidable MSOL theory [24]. Fundamental theorem for structures that are automatic with advice. There are also corresponding fundamental theorems. Theorem 7.3. Let ı be one of “finite-string,” “! -string,” “finite-tree,” or “! -tree.” Let C be an ı-automatic structure with advice P via co-ordinate map . 1. For every first-order definable relation R in C, the set code .R/ is accepted by an ı-automaton with advice P . 2. If the MSOL theory of P is decidable then the first-order theory of C is decidable. Since we can eliminate quantifiers in the presence of parameters (Theorems 3.11, 3.13, and 3.14) we have corresponding extensions of the fundamental theorem. Closure properties. The automatic structures with advice P are easily seen to be closed under disjoint union and direct product. Here are some non-trivial closure properties. Theorem 7.4. 1. If .A; / is ! -string automatic with advice P and  is a congruence on A of countable index then A= is also ! -string automatic with advice P , see [86]. 2. If .A; / is finite-tree automatic with advice P and  is a congruence on A then A= is also finite-tree automatic with advice P , see [18]. The first item generalises the result without advice (Theorem 4.4). The proof of the second item is an extension of the proof of the result without advice (Theorem 4.5). Proving that a structure is not automatic with advice. We turn to the problem of proving that a structure is not automatic with advice. Note that pumping arguments (see § 5.1) no longer hold in the presence of advice. However, decomposition arguments (cf. § 5.2) and confusion arguments (cf. Theorem 3.13) do. Such arguments can be used to prove the following: Theorem 7.5. None of the following structures is isomorphic to a structure of the form A= where .A; / is a finite-tree automatic structure with advice: 1. the free monoid on two or more generators, see [18]; 2. the random graph, see [18]. None of the following structures is isomorphic to a structure of the form A= where .A; / is an ! -string automatic structure with advice:


1. the ordinal $\omega^\omega$, see [76] and [78];
2. the random graph, see [86];
3. the free semigroup with two or more generators, see [86];
4. the arithmetics $(\mathbb{N}; \cdot)$ and $(\mathbb{Q}; \cdot)$, see [86];
5. the countable atomless Boolean algebra, see [86].

Finally, we return to real-arithmetic, mentioned in the introduction. We highlight the following important result that says that real-arithmetic is not the regular quotient of an ! -string automatic structure, even with advice. It is not known if this result also holds if one replaces “! -string” by “! -tree.” Theorem 7.6 ([85], [1], and [86]). Real arithmetic .R; C; ; C1; , and make it recognise all A ;  direct all transitions leading to states of the form .?p ; sk / to >;  remove all states .sp ; sk / from which a state .sp0 ; ?k / is reachable by a (possibly empty) sequence of uncontrollable actions. Let AC be the result of these operations and let C D L.Ac /. We claim that C constructed this way is the solution announced by the theorem. The first observation gives a useful characterisation of the behaviour of the controlled plant. Lemma 3.2. w 2 P \ C if and only if w 2 P \ K and wAunc \ P  wAunc \ K .

To see why this characterisation holds, consider a word w 2 P \ C . By construction, after reading this word Ac ends up in a state .sp ; sk / with none of its components being a sink state. This means that w 2 P \ K . To check the second condition, take u 2 Aunc and suppose wu 2 P . This means that for the state .sp0 ; sk0 / reached on this word in Ac the first component is not ?p . Observe that .sp0 ; sk0 / is also the state reached on u from .sp ; sk / on u. Since .sp ; sk / is not removed, sk0 is not ?k . Hence wu 2 K . For the other direction of the Lemma 3.2, take w 2 P \ K . Automaton A reading w reaches a state .sp ; sk / with neither of the two components being a sink state. It is sufficient to check that this state is not removed in the second step of the construction. Using Lemma 3.2 we can check that C D L.Ac / is a solution to the problem. The lemma implies that P \ C  K . By construction C is prefix closed. It remains to check the control condition. For this we take w 2 C and a 2 Aunc . We want to show wa 2 C . There are two cases.  If wa 62 P then wa D vbu, where v is the longest prefix included in P . We have that in Ac word v leads to a state .sp ; sk / with sp not a sink state. By the construction a transition on b from this state leads to >c . Hence vbA  C , and in particular, wa 2 C .  The second case is when wa 2 P . Then w 2 P \ C , so using Lemma 3.2 we get wa 2 K . Moreover for every u 2 Aunc we have that if wau 2 P then wau 2 K . Hence wa 2 C again by Lemma 3.2. Finally, we check that C is the largest solution with respect to language inclusion. For this, consider some other controller C 0 and take w 2 C 0 . We need to show w 2 C . If w 2 P then w 2 P \ C 0  K , since C 0 is supposed to be a controller for the specification K . Analogously, for every u 2 .Aunc / , if wu 2 P then wu 2 K , since wu 2 C 0 by the control condition. From Lemma 3.2 we get that w 2 C . The remaining


case is when w 62 P . Let v be the longest prefix of w that is in P , so w D vbu for some letter b and word u. From the previous case we have v 2 P \ C . Since vb 62 P , we have that in Ac the word vb leads to >c . This means that vbA  C , and in particular w D vbu 2 C . The interesting point of the above theorem is that it guarantees the existence of a maximal controller implementable as a finite automaton. The price to pay is rather severe limits on the form of a specification. In particular, the formulation does not allow expressing deadlock or liveness constraints. For example, the controlled plant from Figure 5 has deadlock states from which no transition is possible. In this example, a controller avoiding deadlocks should permit nothing but a actions. The simplest approach to handle deadlock is to explicitly require that the resulting controlled plant does not have deadlock states: Blocking. Every state of the controlled plant has an outgoing transition. Observe that this time the condition refers to a controlled plant and not on a controller alone. Of course there are many different variants of the blocking condition. We take this one as a representative example. It is not difficult to solve the control problem with a blocking requirement. We take the automaton Ac as constructed above. We call a state .p; k/ blocked if there is no action enabled from it. Clearly, in order to satisfy the blocking condition we should remove all blocked states. We call a state .p; k/ unstable if there is there is some uncontrollable action enabled in p and not enabled in .p; k/. In Ac there are no unstable states, but once we remove blocked states we can get unstable states. Removing unstable states can produce new blocked or unstable states that should be removed. We repeat this process until there are no states to remove. The automaton we obtain at the end is the solution to the synthesis problem with blocking condition and it is the largest solution with respect to language inclusion. Another important example concerns unobservable actions. These are actions whose execution by a plant should not be visible to a controller. In other words, transition labelled with an unobservable action should not change the state of the controller – it should be a self-loop. Formally, the observability condition with respect to a set Auno  A of unobservable actions is as follows: Observability. Every transition on a 2 Auno is a self-loop. Example. Again consider the plant from Figure 5. Suppose that additionally c is unobservable; that is Auno D ¹cº, and as before Aunc D ¹eº. If the specification is to avoid doing the action e then the largest controller for this plant has just one state with actions a; c; e being self-loops on this state. Put differently, the language of the controller permits all the sequences of actions not containing b . In general, in order to solve the control problem with observability constraints, one first needs to perform a kind of powerset construction on the plant with respect to unobservable actions. Then the rest of the argument is the same as in the previous cases.
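
The removal loop described above is a simple fixpoint computation. The following sketch (an illustration under ad hoc assumptions, not code from the chapter) prunes blocked and unstable states of the product automaton until nothing changes; the callbacks `enabled` and `plant_enabled` are assumed to be supplied by the caller.

```python
def prune(states, enabled, plant_enabled, uncontrollable):
    """Fixpoint removal of blocked/unstable states, as in the blocking construction.

    states: the product states (p, k).
    enabled(s, live): actions of s whose product transition stays inside `live`.
    plant_enabled(s): actions enabled in the plant component p of s.
    All parameter names are illustrative placeholders.
    """
    live = set(states)
    while True:
        bad = set()
        for s in live:
            acts = enabled(s, live)
            if not acts:                                   # blocked: no outgoing transition left
                bad.add(s)
            elif any(a in uncontrollable and a not in acts
                     for a in plant_enabled(s)):           # unstable: plant can do an
                bad.add(s)                                 # uncontrollable action we cannot
        if not bad:
            return live
        live -= bad
```

The loop terminates because each round removes at least one state, and what remains is the largest sub-automaton satisfying both conditions, matching the claim that the construction yields the largest solution with respect to language inclusion.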


While conceptually easy in a centralised setting, the observability condition increases the algorithmic complexity of the problem. The constructions we have presented before were all in polynomial time. In particular, a controller has been always a subautomaton of the product of the plant and the specification automata. A controller under observability constraints can be exponentially bigger than this product. 3.2. General specifications. In the previous subsection we have seen three types of constraints: controllability, blocking, and observability. One can, of course, very well imagine variations on these properties, as well as completely different properties. For example, liveness properties of the form: some action appears infinitely often on every execution. Or branching properties like: from every state it is possible to reach a reset state. In this subsection we will present a way to handle such extensions. A specification in the centralised control problem talks about some properties of the product Ap  Ac of the plant and the controller. The most general way to specify such properties would be to talk directly about the product as a graph with labelled edges: states are nodes, and edges are given by the transition function. More precisely, an automaton hQ; q0 ; eW Q  A ! Ai can be seen as a graph with labelled edges and a distinguished node hQ; q0 ; ¹Ra ºa2A i where .q; q 0 / 2 Ra when e.q; a/ D q 0 . To connect this to standard terminology we will call such graphs transition systems. Observe that these transition systems are deterministic: for every node and label there is at most one outgoing edge with that label. Example. The transition system view of an automaton encourages formulation of properties not expressible in terms of language inclusion. For example, we may require that from every node one can reach a node where a transition on a reset action r is possible. In terms of properties of the language this means that every word in the language has a prolongation ending with r . We are looking for a logic capable of describing properties of graphs with labelled edges. This time, though, we cannot just take monadic second-order logic as we have done for sequences, since the logic is undecidable over graphs. Fortunately, there exists a well-determined fragment of the logic that is decidable and can express most of the properties we are interested in. Mu-calculus with loop testing. Mu-calculus is a modal logic with fixpoints. A formula of this logic describes a set of states of a transition system, see [4], [11], [71], and [12]. To define the syntax we fix an alphabet A of actions, that is, labels of edges of a transition system, and a countable set of variables, whose meanings will be sets of states of the transition system. The set of formulas of the logic is the smallest set containing variables, the constant true, closed under Boolean connectives, and two additional constructs: modalities: if ˛ is a formula and a an action, then hai˛ is a formula; fixpoint:

if α is a formula and X a variable all of whose occurrences in α are positive (under an even number of negations), then μX.α is a formula.


The meaning of a formula in a transition system is a set of states satisfying the formula. Since a formula may have free variables, its meaning depends on the meanings of the free variables. More formally, given a transition system $M = \langle S, \{R_a\}_{a \in \mathit{Act}}\rangle$ and a valuation $V\colon \mathit{Var} \to \mathcal{P}(S)$, we define the meaning $[\![\alpha]\!]^M_V$ of a formula $\alpha$ by induction on its structure. The meaning of a variable is given by the valuation. The meaning of true is the set of all states of the transition system. The meaning of Boolean connectives is standard. The meaning of modalities is determined by transitions: the formula $\langle a\rangle\alpha$ holds in all states from which there is a transition on $a$ to a state satisfying $\alpha$,
$[\![\langle a\rangle\alpha]\!]^M_V = \{ s \in S : \exists s'.\ R_a(s, s') \wedge s' \in [\![\alpha]\!]^M_V \}.$

Finally, the $\mu$ construct is interpreted as the least fixpoint of an operator determined by $\alpha$. A formula $\alpha(X)$ containing a free variable $X$ can be seen as an operator on sets of states, mapping a set $S'$ to the semantics of $\alpha$ when $X$ is interpreted as $S'$; in symbols $S' \mapsto [\![\alpha]\!]^M_{V[S'/X]}$. As all occurrences of $X$ in $\alpha$ are positive, this operator is monotonic. Its least fixpoint is given by
$[\![\mu X.\alpha]\!]^M_V = \bigcap \{ S' \subseteq S : [\![\alpha]\!]^M_{V[S'/X]} \subseteq S' \}.$
We will often write $M, s, V \models \alpha$ instead of $s \in [\![\alpha]\!]^M_V$; moreover we will omit $V$ or $M$ when it is not important or is clear from the context. Before giving some examples, let us introduce some useful abbreviations. We write $[a]\alpha$ for $\neg\langle a\rangle\neg\alpha$; this formula holds in a state if $\alpha$ holds in all states reachable from it by a transition on $a$. We write $\nu X.\alpha(X)$ for $\neg\mu X.\neg\alpha(\neg X)$. It can be checked that $\nu X.\alpha(X)$ is the greatest fixpoint of the operator defined by $\alpha(X)$.
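
As a concrete illustration of these semantic clauses, here is a minimal evaluator for a finite transition system (an ad hoc sketch; the representation and names are not from the chapter). On finite systems the least fixpoint can be computed by iteration from the empty set, which coincides with the intersection above.

```python
def diamond(a, target, edges):
    """[[<a>alpha]]: states with an a-transition into `target`."""
    return {s for (s, t) in edges.get(a, set()) if t in target}

def lfp(op):
    """Least fixpoint of a monotone operator on finite sets of states, by iteration from the empty set."""
    x = set()
    while True:
        y = op(x)
        if y == x:
            return x
        x = y

# Example: mu X. (<b>true \/ <a>X) -- states from which some a-path reaches a b-transition.
states = {0, 1, 2}
edges = {"a": {(0, 1), (1, 2)}, "b": {(2, 2)}}
phi = lfp(lambda X: diamond("b", states, edges) | diamond("a", X, edges))
# phi == {0, 1, 2}
```

Greatest fixpoints are computed dually, by iterating the same operator starting from the full state set.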

Example. The formula $\langle a\rangle \mathit{true}$ means "there is a transition labelled by $a$." With one fixpoint, we can talk about termination properties of paths in a transition system. The formula $\nu X.\langle a\rangle X$ means that there is an infinite sequence of $a$-transitions. The formula $\mu X.[a]X$ means that all sequences of $a$-transitions are finite. Observe the crucial role of the fixpoints in the last two formulas; indeed, changing $\mu$ to $\nu$ in the last formula gives $\nu X.[a]X$, which is always true since $[a]\mathit{true}$ is always true. With two fixpoints, we can write fairness formulas, such as $\nu Y.\mu Z.(\langle a\rangle Z \vee \langle b\rangle Y)$, meaning "there is a path of $a$'s and $b$'s with infinitely many occurrences of $b$'s." A very useful formula $\nu X.\bigl(\langle b\rangle \mathit{true} \wedge \bigwedge_{a \in A} [a]X\bigr)$ says that action $b$ is possible from every node reachable by a sequence of actions from $A$. Since $A$ is the set of all actions, it means that action $b$ is possible from every node reachable from the initial node. We write $\mathrm{Everywhere}(\psi)$ for the same formula with $\psi$ replacing $\langle b\rangle\mathit{true}$. So $\mathrm{Everywhere}(\psi)$ says that $\psi$ is true in every node reachable from the initial node. The relation between MSOL and the mu-calculus is expressed in terms of bisimulation invariance. Two states are bisimilar if the computations from them behave in the same way. More formally, a bisimulation is a symmetric relation on the states of a transition system such that for every $(s_1, s_2)$ in the relation and letter $b$ the following holds: if there is a transition on $b$ from $s_1$ to $s_1'$, then there is a transition on $b$ from $s_2$ to a state in


the relation with $s_1'$. A set of states of a transition system is bisimulation invariant if for every pair of states in some bisimulation relation, the two states are either both in the set or both outside the set. For every transition system, a mu-calculus sentence defines the set of states where the sentence holds. This set is always bisimulation invariant. In short, every sentence of the mu-calculus defines a bisimulation-invariant property. An MSOL formula with one free variable also defines a set of states of a given transition system; this set need not be bisimulation invariant. The mu-calculus can be translated to MSOL in the sense that for every mu-calculus sentence one can construct an MSOL formula with one free variable such that in every transition system the two formulas define the same set of states. The translation is syntax directed, and follows from the fact that least and greatest fixpoints are definable in MSOL. The following characterisation shows under which condition the inverse translation is possible.

Theorem 3.3 ([38]). The mu-calculus is expressively equivalent to the bisimulation-invariant fragment of MSOL: if an MSOL formula $\varphi(x)$ defines a bisimulation-invariant property, then it is equivalent to a mu-calculus sentence.

We will use this theorem to avoid writing complicated formulas. Formulations of a synthesis problem use specifications of the form "the label of every path is in a regular language K." Since this condition is expressible in MSOL and bisimulation invariant, it is also expressible in the mu-calculus. Going in the opposite direction, we introduce a useful construct that does not preserve bisimulation invariance. We consider a loop testing predicate $\circlearrowleft_a$. This predicate holds in a state if there is a transition on $a$ that is a self-loop: $s \models \circlearrowleft_a$ if $(s, s) \in R_a$. This construct allows one to express observability conditions, and at the same time does not increase the complexity of the satisfiability problem for the mu-calculus.

Theorem 3.4 ([5]). The satisfiability problem for the modal mu-calculus with loop testing predicates is decidable in Exptime.

Generalised specifications. The first advantage of the mu-calculus with loop testing is that it is expressive enough to state the control, blocking, and observability constraints, as well as their many possible variations:

• control: every accessible state has an outgoing transition on every uncontrollable action: $\mathrm{Everywhere}\bigl(\bigwedge_{a \in A_{\mathrm{unc}}} \langle a\rangle \mathit{true}\bigr)$;

• blocking: every accessible state has at least one outgoing transition: $\mathrm{Everywhere}\bigl(\bigvee_{a \in A} \langle a\rangle \mathit{true}\bigr)$;

• observability: for every accessible state, all transitions on unobservable actions are self-loops: $\mathrm{Everywhere}\bigl(\bigwedge_{a \in A_{\mathrm{uno}}} \circlearrowleft_a\bigr)$.

Note that for the last of the above properties we really need a loop testing predicate; the property is not expressible in the standard mu-calculus, since it is not bisimulation invariant.


Let us see some more examples of new conditions we can express in the mucalculus. Suppose that A contains two actions c1 ; c2 and we want to say that at each moment at most one of the two is controllable: Everywhere.hc1 itrue _ hc2 itrue/. For another example take the alphabet A D ¹a; f º. Suppose the failure action f is uncontrollable, and we want to say that action a becomes uncontrollable after f occurs: X:.ŒaX ^ hf i Everywhere.haitrue//. Finally, our example from the beginning of the section “fromW every state a reset action is reachable” is expressible by Everywhere.X: .hritrue/ _ a2A haiX /. These examples justify the following definition.

Definition 3.2 (generalised centralised controller synthesis). Given an automaton $A_p$ and formulas $\alpha$, $\beta$ of the mu-calculus with loop testing, decide if there is an automaton $A_c$ such that $A_c \models \beta$ and $A_p \times A_c \models \alpha$.
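
The product $A_p \times A_c$ used in this definition is the usual synchronous product of deterministic transition systems. A minimal sketch (with an ad hoc representation, not taken from the chapter) looks as follows.

```python
def product(ts1, ts2, alphabet):
    """Synchronous product of two deterministic transition systems.

    Each ts is (initial_state, delta) with delta a dict (state, action) -> state;
    a joint transition exists only when both components can move.  Illustrative only.
    """
    (i1, d1), (i2, d2) = ts1, ts2
    delta, todo, seen = {}, [(i1, i2)], {(i1, i2)}
    while todo:
        s1, s2 = todo.pop()
        for a in alphabet:
            if (s1, a) in d1 and (s2, a) in d2:
                t = (d1[(s1, a)], d2[(s2, a)])
                delta[((s1, s2), a)] = t
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
    return ((i1, i2), delta)
```

The synthesis question then asks for a component $A_c$ such that $A_c$ alone satisfies $\beta$ while this product satisfies $\alpha$.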

Observe that we can use ˇ to state controllability or observability constraints. The blocking constraint can be expressed using ˛ . On the other hand, in this approach we have no way to express maximality of a controller. In its original formulation, the maximality constraint was introduced to avoid trivial solutions. Here we can avoid trivial solutions using specifications. The price to pay for the richer specification language is that we cannot expect to always have a maximal controller. For example, if the specification ˛ says that every sequence of b actions should be finite, then we can have controllers permitting longer and longer sequences of b actions, but there is no maximal controller for this specification since there is no bound on the length of these sequences. Theorem 3.5 ([5]). The generalised controller synthesis problem is decidable.

It turns out that it is not a restriction to require that a controller is a finite automaton: it can be shown that whenever a potentially infinite controller exists then a finite one exists too. The proof of the theorem uses an operation called division, see [3] and [5]. It can be shown that for a transition system $P$ and a formula $\alpha$ of the calculus with loop predicates there is a formula $\alpha/P$ of the same logic such that for every transition system $C$:
$P \times C \models \alpha \iff C \models \alpha/P.$

With the help of this operation, we have that

$A_c \models \beta \wedge (\alpha/P) \iff A_c \models \beta \text{ and } A_c \times P \models \alpha.$

This means that the synthesis problem is reduced to checking satisfiability of the formula ˇ ^ .˛=P / of the mu-calculus with loop predicates. By Theorem 3.4 the satisfiability problem is decidable, and, in case the answer is positive, a finite automaton can be effectively constructed. As a final remark about the generalised Ramadge and Wonham problem, we sketch the encoding of the Church problem. Recall that Church problem is given by an MSOL formula '.X; Y / specifying the relation between input and output sequences of bits. To


make the distinction between input and output explicit we take two alphabets Ain and Aout of input and output bits. For the plant we take a two-state automaton Ap accepting the language .Ain  Aout / ; that is, all the words with interleaved input and output bits. We declare all the letters of Ain uncontrollable. This way the controller cannot influence what letters appear in the input. The constraint ˇ that we put on controllers is the conjunction of the controllability condition together with the requirement that there should be no deadlock states in a controller. Finally, the specification ˛ for the controlled plant says that all the labels of all the infinite paths considered as infinite sequences of pairs of bits satisfy the Church specification '.X; Y /. Since this is a bisimulation-invariant property, it can be written as a mu-calculus formula. If Ac is a solution to a problem formulated this way, we can directly transform Ap  Ac to a device in the Church sense satisfying the specification. 3.3. Notes. The Ramadge and Wonham formulation of the synthesis problem has been intensively investigated, see [8], [40], [18], and [73]. The problem for generalised specifications has been extended to nondeterministic automata, see [6]. Some extensions not covered by this line of research concern the case when a plant is a pushdown automaton. In this case there are some non-regular specifications talking about stack properties for which the problem is decidable, see [16], [9], and [63]. There are also other formulations of the synthesis problem with devices given beforehand. For example, one can ask to construct a system from a given set of I/O devices, that is, finite-state transducers, subject to some precise restrictions for composing them, see [42] and [61]. Other types of general devices and composition methods are considered in the field of web-services orchestration, see [2] and [7].

4. Distributed synthesis: synchronous architectures Recall that the Church synthesis problem requires constructing a device that interacts with an environment by reading input signals and sending output signals. We have represented this schematically in Figure 1 as a box with an ingoing arrow for the input and an outgoing arrow for the output. The distributed synthesis problem formulated by Pnueli and Rosner [54] asks to construct several devices that communicate with the environment and between themselves. This is represented as a graph, with boxes being place-holders for the devices to construct, and edges being communication channels (see Figure 6). As in the Church problem, in every box we put an input/output automaton that reads a letter from every input channel going into the box, and outputs a letter to every output channel going out of the box. The behaviour of the whole system is totally synchronous: in one cycle every device reads its input letter and then produces its output letter. Observe that the output of one device can be the input of another device. In this case the letter output by the first device is read in the same cycle by the second device. This kind of semantics is, of course, problematic if there are loops in the architecture graph. For this reason we will restrict our discussion to architectures without loops. Adding loops complicates the


semantics, but does not add new insights to the problem, at least from the perspective of this chapter. After formulating the problem precisely, we will show that the problem is undecidable for essentially all architectures except pipelines. Our objective in this section is to present a selection of results highlighting phenomena that can appear in the distributed synthesis problem. Concerning undecidability, we will discuss a couple of representative architectures in detail. Concerning the pipeline architecture, we will not only present the decidability proof, but also a detailed sketch of the nonelementary lower bound for the synthesis problem. Even though this architecture is of limited use in itself, the tools used in the analysis may be of broader interest.

Figure 6. A distributed architecture (processes P1, P2, P3 connected by channels c1, ..., c5).

Formally, an architecture over a channel alphabet A is a tuple $\langle A, P, C, \mathrm{src}\colon C \to \mathcal{P}(P \cup \{\epsilon\}), \mathrm{tgt}\colon C \to \mathcal{P}(P \cup \{\epsilon\})\rangle$

where P is the set of processes, the nodes of the graph; C is the set of channels, the edges of the graph; and src, tgt are the functions defining the incidence relation. Here $\epsilon$ is a special label standing for the environment. So if $\mathrm{src}(c) = \epsilon$ then c goes from the environment, in other words c is an input channel of the system. Similarly, if $\mathrm{tgt}(c) = \epsilon$ then c is an output channel. As will be clear from the semantics below, channels c with $\mathrm{src}(c) = \mathrm{tgt}(c) = \epsilon$ do not make much sense, since they do not influence the behaviour of the architecture. We write $\mathrm{In}_p = \mathrm{tgt}^{-1}(p)$ for the set of channels that arrive at the process p, and similarly $\mathrm{Out}_p = \mathrm{src}^{-1}(p)$. Going back to the architecture from Figure 6, we have $P = \{p_1, p_2, p_3\}$ and $C = \{c_1, \ldots, c_5\}$. For the edges we have, for example, $\mathrm{src}(c_1) = \epsilon$ and $\mathrm{tgt}(c_1) = p_1$. At every moment each channel contains one letter, so the content of the channels is described by a function $\lambda\colon C \to A$. We write $\mathrm{In}_p(\lambda)$ for the restriction of $\lambda$ to $\mathrm{In}_p$, and analogously for $\mathrm{Out}_p(\lambda)$. A device for a process $p \in P$ is a function $f_p\colon (A^{\mathrm{In}_p})^+ \to A^{\mathrm{Out}_p}$. Given a device for each process, a behaviour of the system is a sequence $\lambda_0, \lambda_1, \ldots$ such that for every $i = 0, 1, \ldots$ and every $p \in P$ we have

$\mathrm{Out}_p(\lambda_i) = f_p(\mathrm{In}_p(\lambda_0) \cdots \mathrm{In}_p(\lambda_i)).$


This means that the output of p in cycle i depends on the contents of all its input channels during all the previous cycles, including cycle i. Observe that a system may have many behaviours, as the contents of the channels coming from the environment are not constrained. Let us go back to the example in Figure 6. We take $f_1$ to be the identity function. We then take $f_2(\vec{v}) = 1$ if and only if $\vec{v}$ contains a 1; this means that $f_2$ emits constantly 1 after the first appearance of 1 on its input. For $f_3$ we take Boolean conjunction: a function such that $f_3(\vec{v}) = 1$ if and only if $\vec{v} \in (\{0,1\}^2)^*$ ends with the letter $(1,1)$. Representing a channel content $\lambda$ as a vector of five bits $(c_1, \ldots, c_5)$, the following is a possible behaviour:

$(1, 0, 1, 0, 0),\quad (0, 1, 0, 1, 0),\quad (1, 0, 1, 1, 1),\quad \ldots$
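
The displayed behaviour can be reproduced by a direct simulation. The sketch below is illustrative only: it hard-codes the wiring suggested by the text (c1 feeds P1, which writes c3; c2 feeds P2, which writes c4; and P3 reads c3, c4 and writes c5) and represents each device by a Python function on its input history.

```python
def f1(hist):            # identity: copy the last letter read on c1
    return hist[-1]

def f2(hist):            # emit 1 from the first occurrence of 1 onwards
    return 1 if 1 in hist else 0

def f3(hist):            # Boolean conjunction of the current pair (c3, c4)
    a, b = hist[-1]
    return a & b

def run(env, n):
    """Compute the behaviour lambda_0, ..., lambda_{n-1} as bit vectors (c1, ..., c5)."""
    h1, h2, h3, behaviour = [], [], [], []
    for i in range(n):
        c1, c2 = env(i)          # letters chosen by the environment in cycle i
        h1.append(c1); h2.append(c2)
        c3, c4 = f1(h1), f2(h2)  # outputs of P1 and P2 in the same cycle
        h3.append((c3, c4))
        c5 = f3(h3)              # output of P3, reading c3 and c4 of this cycle
        behaviour.append((c1, c2, c3, c4, c5))
    return behaviour

# run(lambda i: [(1, 0), (0, 1), (1, 0)][i], 3)
# == [(1, 0, 1, 0, 0), (0, 1, 0, 1, 0), (1, 0, 1, 1, 1)]
```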

In the second cycle device f2 receives 1 on its input, so from that moment on the value of its output channel, c4, is always 1. Hence, from that moment on the architecture copies the contents of channel c1 to c5.

The distributed synthesis problem for a fixed architecture asks whether, for a given specification, there exist devices such that when they are put into the boxes the behaviour of the resulting system satisfies the specification.

Definition 4.1 (distributed synthesis problem). Given an architecture $\langle A, P, C, \mathrm{src}, \mathrm{tgt}\rangle$ with p processes and k channels, and a specification in the form of a regular language K of infinite trees over $A^k$, decide if there exist devices $f_1, \ldots, f_p$ such that the tree of all the behaviours of the resulting system is in K.

In this formulation we have permitted branching regular specifications, since this is the most general case we treat in this chapter. The undecidability results of the next subsection hold also for much weaker specifications, as in the formulation of the original Ramadge and Wonham problem, namely when we require that all finite prefixes of all the behaviours are in a given language of finite words.

4.1. Undecidability: global and local specifications. It turns out that for most architectures the synthesis problem is undecidable. We first discuss a general undecidability result that uses the power of specifications to talk about all the processes at the same time. This leads to the notion of a local specification, which is a conjunction of requirements on each process separately. We show that, somewhat surprisingly, this restriction does not help much: the class of decidable architectures does not increase substantially. Consider the architecture presented in Figure 7, consisting of two independent processes, each having its own input and its own output.

Figure 7. A simple undecidable architecture (process P1 with input In1 and output Out1, and process P2 with input In2 and output Out2).


Theorem 4.1 ([54]). The synthesis problem for the architecture from Figure 7 is undecidable. Proof. We reduce the halting problem for deterministic Turing machines. Given a Turing machine we construct a specification that is realisable if and only if the run of the machine from the empty configuration is infinite. We fix a deterministic Turing machine M . We assume some encoding of configurations of M by infinite words; say that there is a blank symbol to make the encoding infinite. Let AM be the alphabet used to write configurations. For two infinite words v; w 2 .AM /! we write v `M w to say that w is a successor configuration of v in a computation of M ; in particular v must be a configuration too. The alphabet of the architecture will contain the alphabet of configurations, AM , together with two special letters # and $. The specification will only consider what happens when the input to the two processes is of the form #i $! , namely a sequence of # symbols followed by infinitely many $ symbols. The first requirement is that each process on the input #i $! produces an output #i v with v 2 .AM /! . We will also require that on the input $! , namely when there is no # symbol, the word v on the output is the encoding of the initial configuration of M . The remaining two requirements will talk about behaviour of the two processes at the same time. The first is pictorially expressed by if

    In1:   #  ⋯  #  $   $   ⋯
    Out1:  #  ⋯  #  a1  a2  ⋯
    In2:   #  ⋯  #  $   $   ⋯
    Out2:  #  ⋯  #  b1  b2  ⋯

then $a_i = b_i$ for all $i$,    (1)

where we have represented channel contents at consecutive cycles as vertical tuples. The specification says that if the first $ sign arrives at the same time in In1 and in In2 then the outputs of the two processes should be the same, namely, ai D bi for all i . The second requirement is schematically represented by if

    In1:   #  ⋯  #  $   $   $   ⋯
    Out1:  #  ⋯  #  a1  a2  a3  ⋯
    In2:   #  ⋯  #  #   $   $   ⋯
    Out2:  #  ⋯  #  #   b1  b2  ⋯

then $(a_1 a_2 \cdots) \vdash_M (b_1 b_2 \cdots)$.    (2)

It says that when in the second input the first $ sign arrives one cycle later than in the first input, the word .b1 b2    / should represent the successor configuration of the configuration represented by the word .a1 a2    /. We claim that this specification is realisable if and only if the run of M from the initial configuration is infinite. Suppose that we have devices f and g that realise the specification. The crucial observation is that the output in Out1 is independent from the input in In2 . This means that on the input #i $! device f outputs the same #i v independently of what is the input to the other device. Let vi denote the word that is output by f when reading #i $! . Similarly wi for g. For the proof in one direction suppose that the computation of M from the initial state is infinite. For f , we can take the device outputting the i -th configuration of M as vi . We take same device for g. Clearly this strategy realises the specification.


For the proof in the other direction, the first part of the specification tells us that $v_0$ is the initial configuration of M. Requirement (1) tells us that $v_i = w_i$ for all i. Requirement (2) enforces $v_i \vdash_M w_{i+1}$ for all i. This way we have
$v_0 \vdash_M w_1 = v_1 \vdash_M w_2 = v_2 \vdash_M w_3 = \cdots.$

So the sequence $v_1, v_2, \ldots$ is an infinite computation of M. Hence, the specification is realisable if and only if there is an infinite computation of M on the empty input. Looking at the proof, one is tempted to "blame" the specification for undecidability. Indeed, the specification links the behaviours of the two processes while they have no means to communicate with each other. One can say that the specification is global, i.e., it describes the behaviour of the system from the outside, while the visibility of each process is local, i.e., each process sees only its own input and output channels. This observation suggests considering only local specifications [44]: specifications that are conjunctions of requirements on the input and output channels of each process. Restricting to local specifications remedies our immediate trouble because it makes the synthesis problem decidable for the architecture in Figure 7: in this case we just need to solve two independent instances of Church synthesis. Surprisingly, the restriction to local specifications does not enlarge the class of decidable architectures substantially.


Figure 8. Undecidable architectures for local specifications. The argument for the architecture in the middle of the figure relies on a specification “fixing the input.”

Theorem 4.2 ([44]). The synthesis problem for local specifications is undecidable for the architectures presented in Figure 8. The consequence of this theorem is that the synthesis problem is undecidable for all architectures in which we can find one of the patterns from Figure 8. This theorem leaves us with not much more than the pipeline architecture that we will discuss in the next section. Let us explain the reasons for the undecidability results from Theorem 4.2. For the first architecture of Figure 8 it is not difficult to imagine that a local specification on the process at the bottom can simulate a global specification on the two processes


above it. This gives undecidability since these two processes form exactly undecidable architecture from Figure 7. The reasons for undecidability for the two other architectures are a bit different. We will briefly describe the argument for the second architecture, the one for the third being similar. Consider the following specification. For every i : on the input of the form #i $v , the output should be #i v . This means that after reading $ on the input the process should output the first letter of v . In the next cycle the same first letter should appear on the input and the process outputs the second letter of v , etc. This strange specification fixes one arbitrary v in a sense that after reading #i $ only the fixed v can be send to the input without violating the specification. As in the proof of Theorem 4.1, we want to write a specification which is realisable if and only if a given deterministic Turing machine has an infinite run on the empty word. For this we impose the strange specification from the previous paragraph on both P2 and P3 . We will be interested in inputs to P1 of two forms: #i .eq/$! or #i .succ/$! ; where .eq/ and .succ/ are new symbols. On an input #i .eq/$! process P1 should send #i $v both to P2 and P3 . The only restriction on v is that if i D 0 then v should be encoding of the initial configuration of M . On an input #i .succ/$! process P1 should send #i $v to P2 and #i C1 $w to P3 , with a condition that v and w are successive configurations of our fixed Turing machine M . This property can be checked using a finite automaton since v and w are produced letter by letter in parallel. Now, because of the specifications we have imposed on P2 and P3 , for every i there is only one vi such that #i $vi can be an input to P2 . Similarly wi for P3 . Because of the conditions on P1 we have that: v0 is the initial configuration, vi D wi as well as wi C1 is the successor configuration of vi ; for all i D 0; 1; : : : . Hence, the specification is realisable if and only if the computation of M from the initial configuration is infinite. 4.2. How to solve pipeline. A pipeline is a sequence of processes, each reading the output of the preceding one (cf. Figure 9). For unrestricted specifications, the undecidability results from the previous subsection leave pipelines as essentially the only candidates for architectures with decidable distributed synthesis problem. We give a decidability proof, and examine the computational complexity of the problem. Theorem 4.3 ([54]). For every pipeline architecture the synthesis problem is decidable. Before presenting the proof let us mention that for local specifications the problem is decidable also for a pipeline with additional input at process P1 , see [44] and [43]. Observe that Theorem 4.2 implies that the problem is undecidable if we add an input at some process Pi for n > i > 1. Proof. Consider a pipeline as in Figure 9. Suppose that for z D n; : : : ; 1, we have a device fz W A ! A for the process Pz . A behaviour of the pipeline with these devices is a sequence 0 ; 1 ; : : : of channel contents i W ¹cn ; : : : ; c0 º ! A. The constraints coming from the architecture tell us that the output on the channel cz 1 is the result of

Figure 9. A pipeline architecture. The input on $c_n$ is processed in sequence by $P_n, \ldots, P_1$ and the result is output on $c_0$.

applying device $f_z$ to the input on the channel $c_z$:
$\lambda_i(c_{z-1}) = f_z(\lambda_0(c_z) \cdots \lambda_i(c_z)), \quad \text{for } i = 0, 1, \ldots \text{ and } z = 1, \ldots, n.$

In particular, once the devices are fixed, the input on channel c_n determines the behaviour of the pipeline. For w ∈ A^ω, let σ^w_0, σ^w_1, … denote the behaviour of the pipeline on the input w, namely the unique sequence as above satisfying σ^w_i(c_n) = w_i for i = 0, 1, ….

The solution we will present uses a reformulation of the notion of the behaviour of the pipeline. We will need an operation of composition: for functions h_1 : A_1^* → A_2 and h_2 : A_2^* → A_3, the composition comp(h_1, h_2) is a function A_1^* → (A_2 × A_3) such that for all v ∈ A_1^*

comp(h_1, h_2)(v) = (h_1(v), h_2(h̄_1(v))),        (3)

where h̄_1(v) is the sequence of results of h_1 on the prefixes of v, namely the sequence h_1(v_0) h_1(v_0 v_1) ⋯ h_1(v_0 ⋯ v_i) where v = v_0 ⋯ v_i. Using the operation of composition we can put the devices together, starting from the rightmost one:

g_1 = f_1   and   g_i = comp(f_i, g_{i−1})   for i = 2, …, n.        (4)
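The following small Python sketch is purely illustrative: the devices f_1, f_2 and the input word are invented for the example, and devices are modelled as functions taking the finite history of their input channel (a list of letters) and returning the next output letter. It shows how the comp operation of (3) and the recursion (4) package a pipeline into a single function whose value on an input prefix coincides with the channel contents computed cycle by cycle.

```python
def run_pipeline(devices, w):
    """devices[z-1] plays the role of f_z : A* -> A; w is a finite prefix of the
    input on channel c_n.  Returns the channel contents sigma_i(c_z) for every i."""
    n = len(devices)
    sigma = []                       # sigma[i][z] stands for sigma_i(c_z)
    for i, letter in enumerate(w):
        row = {n: letter}
        for z in range(n, 0, -1):
            history = [sigma[j][z] for j in range(i)] + [row[z]]
            # sigma_i(c_{z-1}) = f_z(sigma_0(c_z) ... sigma_i(c_z))
            row[z - 1] = devices[z - 1](history)
        sigma.append(row)
    return sigma

def comp(h1, h2):
    """comp(h1, h2)(v) = (h1(v), h2(h1 applied to all non-empty prefixes of v))."""
    def h(v):
        hbar = [h1(v[: j + 1]) for j in range(len(v))]
        return (h1(v), h2(hbar))
    return h

# Toy pipeline with n = 2: f2 copies the last letter it has seen, f1 flips it.
f2 = lambda history: history[-1]
f1 = lambda history: {"a": "b", "b": "a"}[history[-1]]
g1 = f1
g2 = comp(f2, g1)                               # as in equation (4)
sigma = run_pipeline([f1, f2], list("abba"))
print(sigma[3])                                 # {2: 'a', 1: 'a', 0: 'b'}
print(g2(list("abba")))                         # ('a', 'b') = (sigma_3(c_1), sigma_3(c_0))
```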

It may be useful at this point to recall the types of the objects in these formulas: f_i : A^* → A, and g_i : A^* → A^i. In particular g_n : A^* → A^n. We get that g_n describes the semantics of the pipeline.

Lemma 4.4. For every infinite sequence w ∈ A^ω and every prefix w_1 ⋯ w_i of it, we have g_n(w_1 ⋯ w_i) = (σ^w_i(c_{n−1}), …, σ^w_i(c_0)).

Recall that the semantics of a distributed system is a tree of all its behaviours. Using the above lemma we can consider that the semantics of the pipeline is rather given as a function h : A^* → A^n. In particular, we can assume that the specification is given as a regular language L of functions h : A^* → A^n. Indeed, it is straightforward to translate an MSOL specification on the tree of behaviours into an MSOL specification on such functions. Hence, the pipeline problem can be stated as: given a regular language L of functions h : A^* → A^n, decide whether there exist devices f_n, …, f_1 such that g_n as defined in (4) is in L. To solve this problem we consider an automaton construction that allows us to deal with the comp operation used in (4) to define the semantics of the pipeline. Given L, a set of functions h : A_1^* → (A_2 × A_3), we define the set

shape(L) = { g : A_2^* → A_3 : there exists f : A_1^* → A_2 with comp(f, g) ∈ L }.

So shape(L) is the set of all g for which it is possible to find f such that the result of the composition of the two is in L.
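To make the shape operation concrete, here is a brute-force, finite-domain sketch. Everything in it is an assumption made for the illustration: functions are restricted to input words of length at most 2 and represented as dictionaries, so the existential quantification over f can be done by exhaustive enumeration. It is emphatically not the automaton construction of Theorem 4.5 below.

```python
from itertools import product

A1, A2, A3 = ("x", "y"), ("0", "1"), ("p", "q")

def domain(alphabet, max_len=2):
    """All non-empty words (as tuples) of length <= max_len."""
    return [w for n in range(1, max_len + 1) for w in product(alphabet, repeat=n)]

def all_functions(src, tgt):
    """All functions from the finite word domain over src into tgt, as dicts."""
    dom = domain(src)
    for values in product(tgt, repeat=len(dom)):
        yield dict(zip(dom, values))

def comp(h1, h2):
    """Finite-domain version of comp(h1, h2)(v) = (h1(v), h2(h1 on prefixes of v))."""
    result = {}
    for v in domain(A1):
        hbar = tuple(h1[v[: i + 1]] for i in range(len(v)))
        result[v] = (h1[v], h2[hbar])
    return result

def shape(L):
    """shape(L) = { g | there exists f with comp(f, g) in L }, by enumeration."""
    return [g for g in all_functions(A2, A3)
            if any(comp(f, g) in L for f in all_functions(A1, A2))]

# If L contains exactly one composed function, shape(L) recovers its g-part.
f0 = dict(zip(domain(A1), ["0", "1", "0", "1", "0", "1"]))
g0 = dict(zip(domain(A2), ["p", "q", "p", "q", "p", "q"]))
print(g0 in shape([comp(f0, g0)]))    # True
```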


Theorem 4.5 ([41]). If L is a regular tree language of functions h : A_1^* → (A_2 × A_3), then shape(L) is a regular tree language of functions h′ : A_2^* → A_3. The parity automaton for shape(L) can be effectively constructed from the parity automaton for L.

The construction stated in the theorem is based on the equivalence of non-deterministic and alternating tree automata. This theorem gives us a tool to solve the pipeline problem. Taking our specification L we define a sequence of languages

L_n = L   and   L_i = shape(L_{i+1})   for i = n−1, …, 1.

Let us verify that the pipeline problem has a solution if and only if L_1 is not empty. If f_n, …, f_1 is a solution to the pipeline problem then we take the functions g_i as defined in (4). Since the pipeline with devices f_n, …, f_1 satisfies the specification, we have g_n ∈ L_n. By definition g_n = comp(f_n, g_{n−1}), hence g_{n−1} ∈ shape(L_n) = L_{n−1}. By a straightforward induction g_i ∈ L_i for all i = n, …, 1. For the opposite direction, suppose that L_1 ≠ ∅. Take f_1 : A^* → A from L_1. Since L_1 = shape(L_2), by definition there exists f_2 : A^* → A such that comp(f_2, f_1) ∈ L_2. Let us use h_2 to denote comp(f_2, f_1). By induction on i we show that there exists f_i : A^* → A such that h_i = comp(f_i, h_{i−1}) ∈ L_i. Hence h_n ∈ L_n. The function h_n describes the semantics of the pipeline as in (4). We get that f_n, …, f_1 is a solution to the pipeline problem.

4.3. A lower bound for the pipeline architecture. Even though the synthesis problem for pipelines is decidable, it turns out to be algorithmically difficult. The complexity grows by one exponential with every new element of the pipeline. We will show that every algorithm solving the synthesis problem for a pipeline of n elements needs time of the order of Tower_{n−2}(k), where k is the size of the specification given as a finite automaton. We denote by Tower_n(k) the tower of exponentials function, namely Tower_0(k) = k and Tower_{i+1}(k) = 2^{Tower_i(k)}.

Theorem 4.6 ([54]). The complexity of the synthesis problem for pipeline architectures is nonelementary in the number of components.
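For orientation, here is a two-line Python rendering of the Tower function just defined, only to make the growth rate concrete; nothing in it goes beyond the definition above.

```python
def tower(i, k):
    """Tower_0(k) = k and Tower_{i+1}(k) = 2^{Tower_i(k)}."""
    return k if i == 0 else 2 ** tower(i - 1, k)

print([tower(i, 3) for i in range(4)])   # [3, 8, 256, 2**256]; the last value has 78 digits
```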

This subsection is devoted to a rather detailed sketch of this result, since the lower bounds stated in the literature refer to results on multi-player Turing machines [53]. The argument presented below gives an opportunity to show some peculiar specifications one can write in this framework. Among others, we will see once again the strange specifications fixing the input that we used to show undecidability of the architectures from Figure 8. The specifications we present below can be made local in the sense of [44], so the lower bound applies also to local specifications. It is a challenging problem to find an interesting subclass of specifications for which the pipeline synthesis problem has lower complexity.

The proof of the lower bound will use tools similar to those in the proof from [66] of the nonelementary complexity of the satisfiability problem for first-order logic over ⟨ℕ, ≤⟩. We will simulate an alternation of quantifiers of the form ∀x_1 ∃x_2>x_1 ∀x_3>x_2 ⋯, where the variables range over positions in an infinite word. Universal quantifiers will be


simulated by the input from the environment, existential quantifiers by guessing. The nesting of quantifiers will be simulated by visibility restrictions.

Let us fix n. We will be interested in counters counting to Tower_n(n). We suppose that we have alphabets Σ_k = {a_k, b_k, ⊢_k, ⊣_k} for k = 1, …, n. Additionally we will have a blank symbol B.

Definition 4.2. A 1-counter is a sequence of n letters from Σ_1 prefixed with ⊢_1 and finished with ⊣_1. Such a sequence represents a number between 0 and 2^n − 1 by interpreting a_1 as 0 and b_1 as 1, and assuming that the most significant bit is on the right. A k-counter, for k > 1, is a sequence of the form ⊢_k c_0 σ_0 ⋯ c_i σ_i ⊣_k, where all c_j are (k−1)-counters and all σ_j ∈ Σ_k. Moreover we require that c_0 represents 0; that c_{j+1} represents the successor of the number represented by c_j, for all j = 0, …, i−1; and that c_i represents the maximal possible value (that is, Tower_{k−1}(n) − 1). The value of the counter is given by the sequence σ_0 ⋯ σ_i interpreted as a binary number with the most significant bit to the right (as before we consider that a_k stands for 0 and b_k for 1). For example, for n = 2 the 1-counters, in increasing order, are ⊢_1 a_1 a_1 ⊣_1, ⊢_1 b_1 a_1 ⊣_1, ⊢_1 a_1 b_1 ⊣_1 and ⊢_1 b_1 b_1 ⊣_1, representing 0, 1, 2 and 3.

In what follows we will construct a specification forcing a controller for a pipeline architecture to output an (n+1)-counter. As soon as we achieve this, it will be easy to modify the construction so that this word is not an (n+1)-counter but a computation of a Tower_n(n)-space bounded Turing machine. The architecture will have n+2 processes. We will start by describing what we mean by forcing a controller to output a word; then we will describe a quite complicated specification that forces this word to be an n-counter.

Fixing a word. Suppose that we ask that the letter output by process P_{−1} should appear on its input in the next cycle. In other words, in the first cycle the input to P_{−1} should be the blank symbol B, and the output some letter a_0. Then in a cycle i the input should be a_{i−1}, the letter on the output from cycle i−1, and the output some letter a_i. This curious requirement implies that in some sense process P_{−1} controls its input. We have already used this kind of specification to prove undecidability for the architectures from Figure 8. We use this requirement in the following context. Suppose that for all processes P_0, …, P_{n−1} we just demand that they copy their input to their output. We do not put any requirement on process P_n. The accumulated effect of these requirements is that the processes have to agree on the word that they will output: they should output this word independently of what the input to P_n is. This situation is schematically presented in Figure 10. For a fixed infinite word w, process P_{−1} outputs w, and the other processes output Bw. In particular process P_n disregards its input u.

Marking question and answer positions. Being able to fix a word gives us a great deal of control. Our objective is to force this word to be an n-counter. In what follows we will forget about process P_{−1}, whose unique role is to fix a word as described above.
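Before turning to the pointer machinery, the following small sketch spells out Definition 4.2 for a tiny n. The strings "L1", "R1", "L2", "R2" are an assumed ASCII stand-in for the markers ⊢_1, ⊣_1, ⊢_2, ⊣_2, and the helper names are invented for this illustration; it only serves to make the nesting of counters concrete.

```python
def one_counters(n):
    """All 1-counters, listed in increasing order of their value (bits are written
    least significant first, so the most significant bit ends up on the right)."""
    result = []
    for value in range(2 ** n):
        bits = ["b1" if (value >> j) & 1 else "a1" for j in range(n)]
        result.append(["L1"] + bits + ["R1"])
    return result

def two_counter(n, value):
    """The 2-counter of the given value: every 1-counter in increasing order,
    each followed by one bit (a2/b2) of `value`."""
    word = ["L2"]
    for j, c in enumerate(one_counters(n)):
        word += c + ["b2" if (value >> j) & 1 else "a2"]
    return word + ["R2"]

# For n = 2 there are 4 one-counters, so a 2-counter carries a 4-bit value (0..15);
# two_counter(2, 5) spells out the counters 0,1,2,3 decorated with the bits of 5.
```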


Figure 10. Fixing a word

Of course the construction cannot ignore the input completely. The actual mechanics will be slightly more complicated, since we will allow processes to output letters decorated by pointers. We will have two sets of pointers:

Questions = {↑_0, …, ↑_n} ∪ {↑^s_0, …, ↑^s_n}   and   Answers = {↓_0, …, ↓_n}.

We have two kinds of question pointers and one kind of answer pointer. One should think of pointers as accents: they come at the same time as a letter. So the alphabet of the pipeline is really

({B} ∪ ⋃_{i=1,…,n} Σ_i) × (Questions ∪ Answers).

We will consider only inputs that are sequences of blanks possibly decorated with pointers. The only interesting input sequences will be those that have n+1 question pointers, starting from index n and going down to 0, followed by n+1 answer pointers with the same order of indices. The pointers will be marking beginnings of counters, in the sense that a pointer with index k will mark the beginning of a k-counter. So, in the majority of cases, only the symbols ⊢_k will have pointers attached.

We will now describe formally the requirements on the appearances of pointers. These requirements are intended to simulate the quantifier alternation. As we will see, for an n-counter we will need n quantifier alternations. The descriptions of the encodings of 1-counters and 2-counters, presented later, give an example of how these requirements work. A schema of the desired behaviour of pointers is presented in Figure 11.

We require that when the pointer ↑_j, or ↑^s_j (for j = 0, …, n), appears at the input of a process P_i, for i = n, …, n−j+1, then the process should immediately copy it to the output. The process P_{n−j} is forbidden to copy the pointer, so the other processes are not aware of its position. In particular ↑_0 is not copied, so only P_n knows its placement. The only interesting case will be when all the pointers are placed in the n-counter pointed to by ↑_n. After this, process P_0 can mark one of the following n-counters with ↓_n. Hence ↓_n should mark an occurrence of the ⊢_n symbol. When P_0 emits ↓_n all other processes should be informed about it, but this information has to flow against the sense of the arrows. We will use once again the “fixing the input” trick to accomplish this. When P_0 emits ↓_n, we require that in the next cycle ↓_n appears also on the input of P_n and that it is copied immediately by all other processes. The specification is instantly satisfied if it is not the case that ↓_n appears on the input of P_0 one cycle after it appeared on the output of P_0. Observe that P_0 emits ↓_n before it learns that there is one on the input. With this mechanism we “discard” all the inputs but the one that points to the position following the one chosen by P_0. Next process P_1


emits ↓_{n−1} at the beginning of some (n−1)-counter inside the n-counter pointed to by ↓_n. All processes are informed about this by the same mechanism as before. This procedure continues until P_n emits ↓_0. At that moment some properties of the structure of pointers, as described in the following paragraphs, will be checked. A schema of this desired behaviour is presented in Figure 11. For clarity we do not show the pointers ↓_i used to inform the other processes about the answer. From now on we assume the behaviour of the pipeline as described above.

Figure 11. The behaviour of question and answer pointers: process P_0 receives ↑_n and emits ↓_n, P_1 receives ↑_n and ↑_{n−1} and emits ↓_{n−1}, P_2 receives ↑_n, ↑_{n−1} and ↑_{n−2} and emits ↓_{n−2}, and so on, until P_n, which receives ↑_n, …, ↑_0 and emits ↓_0.

Specifications eq_1 and struct_1. This first part of the specification will force process P_{n−1} to place the ↓_1 pointer at the beginning of a 1-counter with the same value as the 1-counter pointed to by ↑_1. For this specification we are interested only in processes P_n and P_{n−1}. Figure 12 describes their behaviour according to the general mechanics described above.

Figure 12. Process P_{n−1} receives Bw together with ↑_1 and emits ↓_1; process P_n receives Bw together with ↑_1 and ↑_0 and emits ↓_0.

We will say that ↑_1 and ↑_0 are well placed if ↑_1 arrives at the beginning of a 1-counter and ↑_0 follows at distance at most n from ↑_1. By the distance we mean the number of letters in between the two pointers in our fixed word, or equivalently, the number of cycles between the appearances of the two pointers. The specification eq_1 requires that if ↑_1 and ↑_0 are well placed, then ↑_0 points to the same position in the counter pointed to by ↑_1 as the position pointed to by ↓_0 in the counter pointed to by ↓_1. Moreover the bits pointed to by ↑_0 and ↓_0 should be the same.

We claim that if there is a strategy satisfying eq_1, and the general constraints on the mechanics described above are satisfied, then after receiving the ↑_1 pointer marking a 1-counter, process P_{n−1} should at some later moment emit ↓_1 marking a 1-counter with the same value. Indeed, placing ↓_1 by P_{n−1} depends only on the position of ↑_1, as P_{n−1} should emit ↓_1 before seeing ↓_0. Now, P_{n−1} should put the pointer in such a way that P_n will be able to put its pointer ↓_0, no matter when the ↑_0 pointer arrives on the input. As P_{n−1} does not know the position of ↑_0, the only way it can make it possible for P_n to satisfy this condition is to put ↓_1 at a sequence of n letters that is the same as the one after ↑_1.


The specification struct_1 will force the output word w to be of the form c_1 v_1 c_2 v_2 ⋯ where the c_i are 2-counters and the v_i are some words of size ≤ n over the alphabet Σ_2 ∪ ⋯ ∪ Σ_n. In principle, these will be the parts used for bigger counters. The specification will be very similar to eq_1. One difference will be that the ↑_1 pointer is replaced by the ↑^s_1 pointer, in order to signal that a modified behaviour is required.

Figure 13. As in Figure 12, but with ↑^s_1 in place of ↑_1: process P_{n−1} receives ↑^s_1 and emits ↓_1; process P_n receives ↑^s_1 and ↑_0 and emits ↓_0.

The other changes to eq_1 are as follows. We demand that ↓_1 points at the 1-counter immediately following the 1-counter pointed to by ↑^s_1. If the value of the ↑^s_1 counter is not maximal (not a sequence of n letters b_1) then the bits pointed to by ↑_0 and ↓_0 should satisfy the dependencies required by the successor relation. Moreover, there must be precisely one letter from Σ_2 in between the two counters. If the value is maximal then we can have up to n letters before the next 2-counter, and the value of the counter pointed to by ↓_1 should be 0 (the sequence of n letters a_1). To sum up, the specification struct_1 enforces that each 1-counter in w is followed by a 1-counter representing the successor number, and after the maximal number is reached the counter restarts at 0. This forces the fixed word to be a sequence of 2-counters.

Specifications eq_2 and struct_2. Before giving an inductive construction we examine the constructions for 2-counters. We want to write a specification that permits the input to choose a 2-counter, after which the only way for the controllers to win will be to choose another 2-counter with the same value. For this we will use pointers with indices 0, 1, 2. In Figure 14 we present the relevant part of the behaviour of the pipeline that satisfies our general requirements.

Figure 14. Process P_{n−2} receives ↑_2 and emits ↓_2; P_{n−1} receives ↑_2 and ↑_1 and emits ↓_1; P_n receives ↑_2, ↑_1 and ↑_0 and emits ↓_0.

To give an intuition we repeat the description of the general mechanics in this particular case. On the input we have three pointers arriving in the order ↑_2, ↑_1, ↑_0. When ↑_2 arrives, both process P_n and P_{n−1} have to copy it to their respective output channels, while P_{n−2} does not copy the pointer. When ↑_1 arrives, only P_n copies the pointer. The pointer ↑_0 is not copied. At some later moment, process P_{n−2} should output a ↓_2 pointer. If one cycle later the ↓_2 pointer does not appear on the input of P_n then the specification is instantaneously satisfied. So the only interesting case is when ↓_2 appears at the right moment and is copied by all the processes. In consequence, P_n and P_{n−1} get to know that ↓_2 has been emitted. At some later point P_{n−1} outputs a pointer ↓_1, and P_n is informed about it by the same mechanism. Finally, P_n outputs ↓_0.


We will require that struct_1 holds, so we can think of w as a sequence of 2-counters with some other letters in between. Similarly to the previous case, we will say that ↑_2, ↑_1 are well placed if ↑_2 is at the beginning of a 2-counter (at the symbol ⊢_2 followed by the 1-counter representing 0) and ↑_1 is inside this 2-counter (i.e., before the next letter ⊣_2). We will also talk about well-placed ↓_2, ↓_1. The specification eq_2 requires four things:
• the specification struct_1 should be satisfied;
• the policy of pointer placement as described above should be followed;
• if ↑_2, ↑_1 are well placed, then so should be ↓_2, ↓_1;
• the specification eq_1 should hold, and the letter from Σ_2 just after the 1-counter pointed to by ↑_1 should be the same as the letter after the 1-counter pointed to by ↓_1.

We claim that if there is a strategy satisfying eq_2 then, in reply to a ↑_2 pointer placed at the beginning of some 2-counter, the strategy must put ↓_2 at the beginning of some other 2-counter with the same value. First, as struct_1 is satisfied, we know that w is a sequence of 2-counters separated by some control letters. The mechanics of placing pointers is set up so that P_{n−2} has to place the ↓_2 pointer knowing only the placement of the ↑_2 pointer. It should do this so that P_{n−1} then has a chance to put the ↓_1 pointer without violating the specification. As ↓_1 must be well placed, this means that the 1-counter pointed to by ↓_1 should be inside the 2-counter pointed to by ↓_2. Because eq_1 must hold, the values of the counters pointed to by ↑_1 and ↓_1 should be the same. This means that the position of the ↓_1 pointer is uniquely determined once the position of ↓_2 is chosen. Because P_{n−2} does not know the position of ↓_1, the only way to satisfy the specification is to choose a 2-counter that has the same value as the one pointed to by ↑_2.

Let us briefly describe a specification struct_2 that will force the output to be of the form c_1 v_1 c_2 v_2 ⋯, where the c_i are 3-counters and the v_i are some words of size ≤ n−1. It is very similar to eq_2, but for the following modifications. The pointer ↑_2 is replaced by ↑^s_2 to signal that a different type of check is needed. We additionally require that the ↓_2 pointer should point to the 2-counter immediately after the one pointed to by ↑^s_2. We still require that the eq_1 condition holds, but now the letters from Σ_2 just after the 1-counters pointed to by ↑_1 and ↓_1 should not be compared for equality but rather should follow the rules for successor. There is one exception: when the counter pointed to by ↑^s_2 is maximal (consists only of b_2's), then the counter pointed to by ↓_2 should represent 0 (consist only of a_2's). These conditions mean that ↓_2 should point to the counter just following the one pointed to by ↑^s_2. The pointers ↑_1 and ↓_1 should point at the same positions in their respective counters, and the bits at these positions should respect the successor rules. So the value of the 2-counter pointed to by ↓_2 is the successor of the value of the counter pointed to by ↑^s_2.

Specifications eq_k and struct_k. We can now present a generalisation of eq_2 to arbitrary k ≤ n. We want to enforce that the ↓_k pointer is put at the beginning of a k-counter with the same value as the k-counter pointed to by ↑_k. The construction is by induction, so we assume that we already have specifications eq_{k−1} and struct_{k−1}. The behaviour implied by the general rules of pointer placement is depicted in Figure 15.


Figure 15. The general case: process P_{n−k} receives ↑_k and emits ↓_k; P_{n−(k−1)} receives ↑_k and ↑_{k−1} and emits ↓_{k−1}; …; P_n receives ↑_k, …, ↑_0 and emits ↓_0.

As in the previous cases we say that ↑_k, ↑_{k−1} are well placed when (i) ↑_k points at the beginning of some k-counter, and (ii) ↑_{k−1} points inside this k-counter. Condition (i) means that ↑_k marks the symbol ⊢_k and that the value of the (k−1)-counter that follows is 0. This is easily verified, as it amounts to checking that there are no b_{k−1} letters before a letter from Σ_k. Condition (ii) amounts to checking that ↑_{k−1} is before the next Σ_{k+1} letter. We will also talk about well-placed ↓_k and ↓_{k−1}. The specification eq_k requires four things:
• the specification struct_{k−1} should be satisfied;
• the policy of pointer placement should be followed;
• if ↑_k, ↑_{k−1} are well placed, then ↓_k, ↓_{k−1} should be too;
• the specification eq_{k−1} should be satisfied, and the letter from Σ_k appearing after the (k−1)-counter pointed to by ↑_{k−1} should be the same as the letter appearing after the (k−1)-counter pointed to by ↓_{k−1}.

Since struct_{k−1} holds, we know that the fixed word is a sequence of k-counters. The placement of ↓_k depends only on the position of ↑_k. It should be placed in such a way that process P_{n−(k−1)} can place the ↓_{k−1} pointer without violating the specification. The specification asks that the positions of ↓_{k−1} and ↑_{k−1} in their respective counters should be the same. The specification also says that the bits of the k-counters at these positions should be the same. Since process P_{n−k} does not know the position of the ↑_{k−1} pointer, the only way for it to permit satisfaction of the specification is to put the ↓_k pointer at the counter whose value is equal to the one pointed to by ↑_k.

The specification struct_k should say that the fixed word is of the form c_1 v_1 c_2 v_2 ⋯, where the c_i are (k+1)-counters and the v_i are some words of size ≤ n−1. This specification is a modification of the eq_k specification in the same way as struct_2 is of eq_2. The pointer ↑^s_k is used in place of ↑_k. It is required that ↓_k points to the k-counter immediately following the one pointed to by ↑^s_k. Finally, the dependencies of bits should follow the rules for successor.

Summing up. Our encoding of long computations of a Turing machine uses the technique of “fixing an input word” that is done by process P_{−1} in Figure 10. Then with struct_{n−1} we can force the fixed word to be of the form c_1 v_1 c_2 v_2 ⋯ where c_1, c_2, … are n-counters. An n-counter is a sequence ⊢_n c′_0 σ_0 ⋯ c′_i σ_i ⊣_n where c′_0, …, c′_i are all the (n−1)-counters listed in increasing order. The sequence σ_0, …, σ_i can then encode a configuration of a Turing machine of size Tower_n(n). We can subsequently modify struct_n so that it does not force the fixed word to be an (n+1)-counter, but rather a sequence


d_1 $ d_2 $ d_3 ⋯ where d_1, d_2, … are successive configurations of size Tower_n(n) of some given Turing machine. In this way, for a pipeline with n+2 processes, we can write a specification that is realisable if and only if the given Tower_n(n)-space bounded Turing machine accepts a given input.

4.4. Notes. The idea of synthesising a distributed system is of course very attractive. Since [25] it has reappeared in many contexts, see [48], [19], and [65]. One can encode the distributed synthesis problem of Pnueli and Rosner into the Ramadge and Wonham setting using visibility restrictions, see [60], [74], and [5]. Unfortunately, this does not give distinctively new decidable classes, see [58], [70], [5], [68], and [6].

The decidability proof for pipelines presented here is based on [41]. Generalisation to architectures with loops requires considering a semantics with a delay: a process reads its input in one cycle and reacts with an output in the following cycle. This complicates notation considerably but does not add anything substantially new to the problem. In op. cit. it is shown that the synthesis problem is decidable for doubly flanked pipelines. An extension of the model with broadcast has also been studied in [32] and [62]. The notion of local specifications has been introduced and studied in [44] and [43]. Some more decidability results have been obtained by further restricting specifications to talk only about external inputs and outputs, see [34] and [67].

One promising attempt to get a decidable framework for distributed synthesis is to change the way information is distributed in the system. In the setting presented in this chapter, every controller sees only its inputs and its outputs. In order to deduce some information about the global state of the system, a controller can use only its knowledge of the architecture and the initial state of the system. In particular, controllers are not permitted to pass additional information during communication. It is clear, though, that when we allow some transfer of information during communication, we give more power to the controllers. Pushing the idea of sharing information to the limit, we obtain a model where two processes involved in a communication share all the information they have about the global state of the system [33]. This point of view is not as unrealistic as it may seem at first glance. It is rooted in the theory of traces, which studies finite communicating automata with this kind of information transfer. A fundamental result of Zielonka [75] and [29] implies that in fact there is a bound on the size of the additional information that needs to be transferred during communication. In our terms, the theory of traces considers the case of distributed synthesis for closed systems, i.e., systems without environment. For distributed synthesis with environment, decidability results for some special cases are known, see [33], [45], [52], [20], [36], [51], and [31]. Moreover, similarly to Zielonka's theorem, these results give a bound on the additional information that needs to be transferred. The decidability of the general case is open. Interestingly, the general case can be formulated as an extension of the Ramadge and Wonham setting from words, that is linear orders, to special partial orders called Mazurkiewicz traces. We describe this approach in the next section.


5. Distributed synthesis: Zielonka automata

The synthesis problem for synchronous architectures from the previous section is not constrained enough. We have seen that, suitably using the interplay between specifications and an architecture, one can get undecidability results for most architectures. Yet the kinds of specifications that lead to these results are rather artificial: using a constraint linking two disconnected parts of the system, or using an output channel to single out one input of unbounded length. These observations motivate a search for other formulations of the distributed synthesis problem that would eliminate some of these undesirable phenomena, and would be decidable for some larger classes of systems.

A Zielonka automaton is a very simple parallel device. It is a parallel composition of several finite automata synchronising on common actions. Every component has its own alphabet of actions, but these alphabets may have letters in common. A Zielonka automaton accepts a regular language respecting the parallelism implied by the distribution of actions over the component automata: if a is followed by b and the two letters do not share a component, i.e., do not appear together in the alphabet of some component, then it may just as well be that b is followed by a. Languages of this kind are called trace languages. The theory of trace languages offers many results and tools. In particular, many fundamental results of the theory of regular languages have their equivalent trace versions [29].

In this section we present an adaptation of the Ramadge and Wonham formulation of the control problem to Zielonka automata. We obtain this way a setting for distributed synthesis, since the devices we construct are distributed by design. In this formulation specifications cannot constrain the flow of information between the components. In consequence, we avoid many pathological behaviours of the Pnueli–Rosner formulation from the previous section. Still, as we will see, the setting is far from trivial. There are more architectures for which the problem is known to be decidable. It is even possible that the synthesis problem for Zielonka automata is decidable for all architectures.

5.1. Zielonka automata and Zielonka's theorem. Take a finite set ℙ of processes; these are names for the components of a Zielonka automaton. An alphabet A is distributed over these components: there is a function dom : A → (2^ℙ ∖ {∅}) assigning to each letter the set of processes the letter uses. A Zielonka automaton for this distribution is a tuple

A = ⟨(S_p)_{p∈ℙ}, (δ_a)_{a∈A}, s^0, F⟩,

where S_p is a finite set of states for each process p ∈ ℙ; we will denote by S the product ∏_{p∈ℙ} S_p. We can think of a Zielonka automaton as a constrained product of automata, each over its set of states S_p. The set S is the set of all possible states of this product; they are called global states. The state s^0 ∈ S is the initial (global) state, and F ⊆ S is the set of final states. The crucial part is the definition of the transition relation. We have that δ_a ⊆ (∏_{p∈dom(a)} S_p)^2, namely it acts only on the components assigned to the letter a. The transition relation δ ⊆ S × A × S on global states is then given by


(s, a, s′) ∈ δ if ((s_p)_{p∈dom(a)}, a, (s′_p)_{p∈dom(a)}) ∈ δ_a and s′_p = s_p for p ∉ dom(a). The automaton is deterministic if δ_a is a function for every a ∈ A. The language of the automaton A is the language of the finite automaton ⟨S, δ, s^0, F⟩ whose components are defined as above. This language has a particular property: if two letters a, b ∈ A have disjoint process domains, and a word vabw is in the language, then vbaw is too. We can thus define an independence relation on letters, (a, b) ∈ I if dom(a) ∩ dom(b) = ∅, and say that L ⊆ A^* is I-closed if whenever vabw ∈ L then vbaw ∈ L, for all words v, w and all (a, b) ∈ I. So the languages accepted by Zielonka automata over a fixed distributed alphabet are I-closed. Zielonka's theorem says that the converse is true: if a language is I-closed for some I induced by a distribution dom : A → (2^ℙ ∖ {∅}), then there is a Zielonka automaton accepting this language.
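A minimal executable sketch of this definition may help; the class name and the two-process automaton at the end are invented for the illustration and are not examples from the text. The point it shows is that a global transition on a letter a touches only the components in dom(a) and leaves the others unchanged, so letters with disjoint domains commute.

```python
class ZielonkaAutomaton:
    def __init__(self, processes, dom, delta, initial):
        self.processes = processes          # ordered list of process names
        self.dom = dom                      # letter -> set of processes using it
        self.delta = delta                  # letter -> { local-state tuple : new local-state tuple }
        self.state = dict(initial)          # global state: process -> local state

    def step(self, a):
        procs = [p for p in self.processes if p in self.dom[a]]
        local = tuple(self.state[p] for p in procs)
        new_local = self.delta[a][local]    # only the components in dom(a) move
        for p, s in zip(procs, new_local):
            self.state[p] = s

# Two processes p, q; letters a (on p), b (on q), c (on both).  Since dom(a) and
# dom(b) are disjoint, reading "ab" or "ba" leads to the same global state.
A = ZielonkaAutomaton(
    processes=["p", "q"],
    dom={"a": {"p"}, "b": {"q"}, "c": {"p", "q"}},
    delta={
        "a": {("0",): ("1",)},
        "b": {("0",): ("1",)},
        "c": {("1", "1"): ("0", "0")},
    },
    initial={"p": "0", "q": "0"},
)
for letter in "abc":
    A.step(letter)
print(A.state)   # {'p': '0', 'q': '0'}
```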

Theorem 5.1 ([75]). Let dom : A → (2^ℙ ∖ {∅}) be a distribution of letters and let I be the induced independence relation. If a language L ⊆ A^* is regular and I-closed, then there is a deterministic Zielonka automaton accepting L.

This theorem gives us a tool to implement a regular language on a simple distributed device. Even though the theorem was proved more than 20 years ago, there is still a continuing effort to simplify the proof and improve the complexity of the translation. To give an idea of the complications involved let us look at an example from [27].

Example. Let ℙ = {1, …, n} be the set of processes. The letters of the alphabet are pairs of processes; two letters are dependent if they have a process in common. Formally, the distributed alphabet A consists of the two-element subsets of ℙ, and the distribution is dom({p, q}) = {p, q}. The language Path_n is the set of words a_1 ⋯ a_k such that every two consecutive letters have a process in common: a_i ∩ a_{i+1} ≠ ∅ for i = 1, …, k−1. Observe that a deterministic sequential automaton recognising this language simply needs to remember the last letter it has read, so it has fewer than |ℙ|^2 states. Zielonka's theorem guarantees that there is a deterministic Zielonka automaton for the language. Even for n = 4 it is not clear how to construct such an automaton with less than a hundred states. In general we know how to construct a Zielonka automaton that is polynomial in the size of a given sequential automaton and simply exponential in the number of processes [35]. There are no non-trivial lower bounds known for the size of deterministic Zielonka automata for the languages Path_n. A lower bound is known under the additional assumption that the automata are locally rejecting, meaning that a word is rejected if and only if the run of the automaton passes through some local state designated as rejecting. A locally rejecting deterministic automaton for Path_n must have at least 2^{n/4} states [35].

One could try to use Zielonka's theorem directly to solve a distributed synthesis problem. For example, one can start with the Church synthesis problem, solve it, and if the solution happens to respect the required independence, then one could distribute it. Unfortunately, there is no reason for the solution to respect the independence.
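The remark about the small sequential automaton can be made concrete in a few lines; this is only a sketch (the function name is invented, and letters are represented as Python sets), showing that checking membership in Path_n needs no more memory than the previous letter.

```python
def in_path_language(word):
    """word is a sequence of two-element sets of processes; consecutive letters
    must share a process."""
    return all(prev & cur for prev, cur in zip(word, word[1:]))

print(in_path_language([{1, 2}, {2, 3}, {3, 4}]))   # True
print(in_path_language([{1, 2}, {3, 4}]))           # False: the two letters are independent
```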


Even worse, the following, relatively simple, result says that it is usually algorithmically impossible to approximate a regular language by a language respecting a given independence relation.

Theorem 5.2 ([65]). It is not decidable whether, given an independence relation I and a regular language L ⊆ A^*, there is an I-closed language included in L such that every letter from A appears in some word of that language.

The condition on the appearance of letters is not crucial here. Observe that we need some condition in order to make the problem non-trivial, since by definition the empty language is I-closed. The above theorem suggests that we will not be able to obtain decidability just by restricting the class of possible implementations to Zielonka automata. We also need to restrict specifications. It is quite natural to ask that specifications should be I-closed as well. This is the direction we will present below. Instead of extending the Church formulation we will immediately look at the more general Ramadge and Wonham formulation.

5.2. Control of Zielonka automata. The starting point of the Ramadge and Wonham formulation was the notion of a plant, which is just a finite automaton with all its states accepting. In the distributed case we first fix a set of processes ℙ and a distribution of actions over processes dom : A → (2^ℙ ∖ {∅}). With these fixed, a plant is just any Zielonka automaton with all states accepting. Similarly, a controller is also such an automaton. A controlled plant is modelled by the product of the two Zielonka automata. The control condition for a set of uncontrollable actions A_unc ⊆ A is formulated as in the case of finite automata: in every global state of the controller every uncontrollable action should be possible.

Once again it would be more convenient to formulate the synthesis problem in terms of languages instead of automata. For this we need to understand what kind of languages are recognised by deterministic Zielonka automata with all states accepting. We call such languages implementable. The characterisation of implementable languages is based on Zielonka's theorem. One needs to observe that languages of this kind satisfy an additional property called the forward diamond property: if wa, wb ∈ L and a I b then wab ∈ L.

Proposition 5.3 ([64] and [28]). A regular language over a distributed alphabet is implementable if and only if it is prefix-closed, I-closed, and satisfies the forward diamond condition.

Observe that I in the above proposition is determined by the distributed alphabet.

Definition 5.1 (decentralised control problem). Fix a distribution of actions over processes dom : A → (2^ℙ ∖ {∅}) and a set of uncontrollable actions A_unc ⊆ A. Given languages P, K implementable with respect to the distribution dom, find the biggest, with respect to set inclusion, language C such that P ∩ C ⊆ K and the following two conditions are satisfied: (implementable) C is implementable; (control) if w ∈ C and a ∈ A_unc then wa ∈ C.
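The three conditions of Proposition 5.3 can be checked mechanically on a finite sample of a language. The sketch below is only an illustration of the definitions, not a decision procedure for regular languages; the function names, the distribution dom and the sample set are all invented for the example, and `independent` is assumed to implement the relation I induced by dom. It shows in particular how a prefix-closed and I-closed set can still violate the forward diamond condition.

```python
def is_implementable_sample(L, alphabet, independent):
    """Check prefix closure, I-closure and the forward diamond on a finite set of words."""
    prefix_closed = all(w[:i] in L for w in L for i in range(len(w)))
    i_closed = all(
        w[:i] + w[i + 1] + w[i] + w[i + 2:] in L
        for w in L for i in range(len(w) - 1)
        if independent(w[i], w[i + 1])
    )
    forward_diamond = all(
        w + a + b in L
        for w in L for a in alphabet for b in alphabet
        if independent(a, b) and w + a in L and w + b in L
    )
    return prefix_closed and i_closed and forward_diamond

dom = {"a": {1}, "b": {2}, "c": {1, 2}}
independent = lambda x, y: not (dom[x] & dom[y])
sample = {"", "a", "b", "ac", "bc"}   # prefix-closed, vacuously I-closed
print(is_implementable_sample(sample, "abc", independent))
# False: a and b are both enabled after the empty word, but "ab" is not in the sample
```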


Observe that the definition is very similar to the centralised case, the difference being that the prefix-closure requirement is replaced by the implementability requirement. This is not surprising, as prefix closure is indeed a characterisation of the languages of finite automata with all states accepting. The distributed aspect in this definition is hidden in the notion of implementability, since it is equivalent to being the language of a Zielonka automaton. The intersection P ∩ C in the definition translates to the product of two Zielonka automata, the one of a plant and the one of a controller. Recall that a Zielonka automaton is a distributed device, so the plant is a set of processes communicating via rendez-vous. The controller can then be seen as a set of local controllers, one for each process. In the product of the plant and the controller, these local controllers can exchange their information at each rendez-vous. So the controller cannot add communication to the plant, but it can use the existing communication to transfer information between its parts. The important point is that the specification cannot limit the information they can exchange. In consequence, controllers have more power in this formulation than in the Pnueli and Rosner setting.

We will show that with a simple restriction on uncontrollable actions the decentralised control problem can be reduced to the centralised case. We say that an action b is local if dom(b) is a singleton.

Theorem 5.4. Let dom : A → (2^ℙ ∖ {∅}) be a distributed alphabet, and suppose that every uncontrollable action is local. Every decentralised control problem over such an alphabet has a unique maximal solution. Moreover, this solution is an implementable language and can be computed algorithmically.

Proof. For the proof of the theorem we will show that the controller C constructed in the centralised case is implementable. We know that C is prefix-closed. From the characterisation in Lemma 3.2 on p. 1229, and using the implementability of P and K, it is not difficult to show that C is I-closed. It remains to check the forward diamond property. Suppose that wa, wb ∈ C with a I b. We need to show that wab ∈ C. For this we will use the characterisation from Lemma 3.2. First observe that if wa ∈ P then wb ∈ P, by the forward diamond property of P. Hence we consider two cases. The first case is when wa, wb ∉ P. Then the automaton A_c reading wa reaches the state ⊤_c. This means that waA^* ⊆ C; in particular wab ∈ C. The second case is when wa, wb ∈ P. By Lemma 3.2 we get that wa ∈ P ∩ K and wa(A_unc)^* ∩ P ⊆ wa(A_unc)^* ∩ K, and similarly for wb. To show wab ∈ C we need to show two things:
• wab ∈ P ∩ K. This follows from the forward diamond properties of P and K.
• wab(A_unc)^* ∩ P ⊆ wab(A_unc)^* ∩ K. Take a word u ∈ (A_unc)^* and suppose that wabu ∈ P. We show that wabu ∈ K. Since all uncontrollable letters are local, we can split u into three parts: u_a, the subsequence of letters located on processes from dom(a); u_b, similarly but for dom(b); and u_r, the subsequence of the remaining letters. Observe that the three words are independent, and moreover u_r is independent from a and b. By the I-closure property of P we have that w u_r a u_a b u_b ∈ P and w u_r b u_b a u_a ∈ P. We get w u_r a u_a ∈ K since w a u_a u_r


is in P and hence also in K. For the same reason w u_r b u_b ∈ K. By induction on the lengths of u_a and u_b, from the I-diamond property we get w u_r a b u_a u_b ∈ K. Then by I-closure we get the desired w a b u_r u_a u_b ∈ K.

This positive result is not that satisfactory: the controllers it gives will often have deadlocks. For a simple example consider an alphabet A = {a, b, c} with a I b and c dependent on both a and b. One can imagine that a is executed on one process, b on the other, and c on both. Suppose that all actions are controllable. Let P = A^* and K = (a + b)c^* + (ab). The two languages are implementable. Of course the maximal controller is just K. But K has deadlocks, since when a and b happen concurrently no c action is possible. The maximal controller without deadlocks given by the procedure in the sequential setting is C = (a + b)c^*. But this controller is not implementable, since it does not satisfy the forward diamond property. An implementable controller needs to decide to permit only a or only b actions. So, for example, C_a = ac^* or C_b = bc^* are reasonable controllers for the problem, but there is no implementable controller containing the two at the same time.

5.3. Notes. It is not known at present whether the decentralised control synthesis problem is decidable. One direction to approach this problem could be to study the decidability of the MSOL theory of the event structures generated by plants. Using the same ideas as in the sequential case, it is possible to encode the decentralised control problem into the satisfiability problem of an MSOL formula over the event structure determined by the plant. Unfortunately, there are very simple plants with an undecidable MSOL theory of the generated event structure but with a decidable decentralised control problem. There exists, though, an interesting case when the MSOL theory is decidable [45]. The idea is that for every process of the automaton there should be a fixed bound on the number of actions other processes can do in parallel with this process. Under this restriction there is an MSOL-definable encoding of the event structure of the language of a given automaton into the full binary tree. On the other hand, it is easy to see that the MSOL theory of the event structure is undecidable if for every n it contains traces of the form x u^n v^n y with u independent from v, and y dependent on both u and v. The conjecture due to Thiagarajan is that this is the only forbidden pattern, namely that the event structure of a trace language without such a pattern has a decidable MSOL theory.

Other decidable cases of the decentralised control problem refer to the notion of a communication graph. This is a graph whose nodes are processes and whose edges are the possible communication channels between them; more precisely, there is an edge between two processes if there is an action involving both of them. So the communication graph is determined by the distribution of actions over processes. The decentralised control problem is decidable if the communication graph is a co-graph [33]. It is also decidable when all actions involve at most two processes and the communication graph is a tree [51], or in other words when every process can communicate only with its parent and with its children. The case when the communication graph is a cycle of more than 4 processes is open.


References [1] P. A. Abdulla, A. Bouajjani, and J. d’Orso, Deciding monotonic games. In Computer science logic (M. Baaz and J. A. Makowsky, eds.). Proceedings of the 17th International Workshop (CSL 2003), the 12th Annual Conference of the EACSL, and the 8th Kurt Gödel Colloquium (KGC 2003) held at the Vienna University of Technology, Vienna, August 25–30, 2003. Lecture Notes in Computer Science, 2803. Springer, Berlin etc., 2003, 1–14. MR 2043544 Zbl 1116.68491 q.v. 1227 [2] G. Alonso, F. Casati, H. A. Kuno, and V. Machiraju, Web services. Concepts, architectures and applications. Data-Centric Systems and Applications. Springer, 2004. Zbl 1029.68007 q.v. 1235 [3] H. R. Andersen, Partial model checking. In Proceedings of 10 th Annual IEEE Symposium on Logic in Computer Science. Held in San Diego, CA, June 26–29, 1995. IEEE Press, Los Alamitos, CA, 398–407. Zbl 523274 IEEEXplore 523274 q.v. 1234 [4] A. Arnold and D. Niwiński, Rudiments of -calculus. Studies in Logic and the Foundations of Mathematics, 146. North-Holland Publishing Co., Amsterdam, 2001. MR 1854973 Zbl 0968.03002 q.v. 1231 [5] A. Arnold, A. Vincent, and I. Walukiewicz, Games for synthesis of controllers with partial observation. Theoret. Comput. Sci. 303 (2003), no. 1, 7–34. Logic and complexity in computer science (Créteil, 2001). MR 1990739 Zbl 1175.93148 q.v. 1227, 1233, 1234, 1249 [6] A. Arnold and I. Walukiewicz, Nondeterministic controllers of nondeterministic processes. In Logic and automata (J. Flum, E. Grädel, and T. Wilke, eds.). History and perspectives. Texts in Logic and Games, 2. Amsterdam University Press, Amsterdam, 2008, 29–52. MR 2508739 Zbl 1215.93082 q.v. 1235, 1249 [7] D. Berardi, D. Calvanese, G. D. Giacomo, R. Hull, and M. Mecella, Automatic composition of web services with messaging. In VLDB ’05: Proceedings of the 31st International Conference on Very Large Data Bases (K. Böhm, C. S. Jensen, L. M. Haas, M. L. Kersten, P. Larson, and B. C. Ooi, eds.). Held in Trondheim, Norway, August 30–September 2, 2005. Association for Computer Machinery, New York, 2005, 613–624. q.v. 1235 [8] A. Bergeron, A unified approach to control problems in discrete event processes. RAIRO Inform. Théor. Appl. 27 (1993), no. 6, 555–573. MR 1258752 Zbl 0807.93002 q.v. 1235 [9] A. Bouquet, O. Serre, and I. Walukiewicz, Pushdown games with the unboundedness and regular conditions. In Pushdown games with unboundedness and regular conditions (P. K. Pandya and J. Radhakrishnan, eds.). Proceedings of the 23rd Conference held in Mumbai, December 15–17, 2003. Lecture Notes in Computer Science, 2914. Springer, Berlin, 2003, 88–99. MR 2093640 Zbl 1205.68194 q.v. 1235 [10] P. Bouyer, T. Brihaye, and F. Chevalier, O-minimal hybrid reachability games. Log. Methods Comput. Sci. 6 (2010), no. 1, 1:1, 48 pp. MR 2581397 Zbl 189.68070 q.v. 1227 [11] J. Bradfield and C. Stirling, Modal mu-calculi. In The handbook of modal logic (P. Blackburn, J. van Benthem, and F. Wolter, eds.). Elsevier, 2006, 721–756. q.v. 1231 [12] J. Bradfield and I. Walukiewicz, The mu-calculus and model-checking. In Handbook of model checking (E. M. Clarke, T. A. Henzinger, H. Veith and R. Bloem, eds.). Handbook of model checking Springer, Cham, 2018, 871–919. MR 3837861 Zbl 1392.68236 q.v. 1231 [13] C. Broadbent, A. Carayol, L. Ong, and O. Serre, Recursion schemes and logical reflection. In 25 th Annual IEEE Symposium on Logic in Computer Science. LICS 2010. Proceedings of the International Symposium held in Edinburgh, July 11–14, 2010. 
IEEE Press, Los Alamitos, CA, 2010, 120–129. MR 2953901 IEEEXplore 5570928 q.v. 1227


[14] J. R. Büchi, On a decision method in restricted second order arithmetic. In Logic, Methodology and Philosophy of Science (E. Nagel, P. Suppes, and A. Tarski, eds.). Proceedings of the 1960 International Congress. Stanford University Press, Stanford, CA, 1962, 1–11. MR 0183636 Zbl 0147.25103 q.v. 1218, 1222, 1226 [15] J. R. Büchi and L. Landweber, Solving sequential conditions by finite state strategies. Trans. Amer. Math. Soc. 138 (1969), 295–311. MR 0280205 Zbl 0182.02302 q.v. 1218 [16] T. Cachat, J. Duparc, and W. Thomas, Solving pushdown games with a †3 winning condition. In Computer science logic (J. C. Bradfield, ed.). Proceeding of the 16th International Workshop (CSL 2002) held at the 11th Annual Conference of the European Association for Computer Science (EACSL) at the University of Edinburgh, Edinburgh, September 22–25, 2002. Lecture Notes in Computer Science, 2471. Springer, Berlin, 2002, 322–336. MR 2048058 Zbl 1020.68049 q.v. 1235 [17] A. Carayol, M. Hague, A. Meyer, C.-H. L. Ong, and O. Serre, Winning regions of higherorder pushdown games. In 23 rd Annual IEEE Symposium on Logic in Computer Science. Held in Pittsburgh, PA, June 24–27, 2008 IEEE Press, Los Alamitos, CA, 2008, 193–204. IEEEXplore 4557911 q.v. 1227 [18] C. G. Cassandras and S. Lafortune, Introduction to discrete event systems. The Kluwer International Series on Discrete Event Dynamic Systems, 11. Kluwer Academic Publishers, Boston, MA, 1999. MR 1728175 Zbl 0934.93001 q.v. 1218, 1235 [19] I. Castellani, M. Mukund, and P. S. Thiagarajan, Synthesizing distributed transition systems from global specifications. In Foundations of software technology and theoretical computer science (C. P. Rangan, V. Raman, and R. Ramanujam, eds.). Proceedings of the 19 th Conference (FST&TCS) held in Chennai, December 13–15, 1999. Lecture Notes in Computer Science, 1738. Springer, Berlin, 1999, 219–231. MR 1776798 Zbl 0956.68008 q.v. 1249 [20] T. Chatain, P. Gastin, and N. Sznajder, Natural specifications yield decidability for distributed synthesis of asynchronous systems. In SOFSEM 2009: theory and practice of computer science (M. Nielsen, A. Kučera, P. B. Miltersen, C. Palamidessi, P. Tůma, and F. Valencia, eds.) Proceedings of the 35rd conference on current trends in theory and practice of computer science, held in Špindlerův Mlýn, Czech Republic, January 24–30, 2009. Lecture Notes in Computer Science, 5404. Springer, Berlin, 2009, 141–152. MR 2540011 Zbl 1206.68210 q.v. 1249 [21] K. Chatterjee, L. de Alfaro, and T. A. Henzinger, Qualitative concurrent parity games. ACM Trans. Comput. Log. 12 (2011), no. 4, art. no. 28, 51 pp. MR 2820103 Zbl 1351.68179 q.v. 1227 [22] K. Chatterjee and T. A. Henzinger, A survey of stochastic ! -regular games. J. Comput. System Sci. 78 (2012), no. 2, 394–413. MR 2881338 Zbl 1237.91036 q.v. 1227 [23] K. Chatterjee, T. A. Henzinger, and V. Prabhu, Timed games: complexity and robustness. Log. Methods Comput. Sci. 7 (2011), no. 4, 4:08, 55 pp. MR 2869430 Zbl 1237.68112 q.v. 1227 [24] A. Church, Applications of recursive arithmetic to the problem of circuit synthesis. In Summaries of the summer institute of symbolic logic. Vol. I, Cornell University, Ithaca, N.Y., 1957, 3–50. q.v. 1217 [25] E. M. Clarke and E. A. Emerson, Design and synthesis of synchronization skeletons using branching time temporal logic. In Logics of programs (D. Kozen, ed.). Proceedings of the 3rd Workshop on Logics of Programs (LOP) held in Yorktown Heights, N.Y., May 4–6,

1981. Lecture Notes in Computer Science, 131. Springer, Berlin, 1982, 52–71. MR 0663750 Zbl 0546.68014 q.v. 1226, 1249

[26] A. Condon, The complexity of stochastic games. Inform. and Comput. 211 (2012), 29–48. MR 2878805 Zbl 1238.91022 q.v. 1227

[27] R. Cori, Y. Métivier, and W. Zielonka, Asynchronous mappings and asynchronous cellular automata. Inform. and Comput. 106 (1993), no. 2, 159–202. MR 1241309 Zbl 0785.68068 q.v. 1251

[28] V. Diekert and A. Muscholl, On distributed monitoring of asynchronous systems. In Logic, language, information and computation (C. L. Ong and R. J. G. B. de Queiroz, eds.). Proceedings of the 19th International Workshop (WoLLIC 2012) held at the University of Buenos Aires, Buenos Aires, September 3–6, 2012. Lecture Notes in Computer Science, 7456. Springer, Berlin, 2012, 70–84. MR 3023180 Zbl 1361.68144 q.v. 1252

[29] V. Diekert and G. Rozenberg (eds.), The book of traces. World Scientific Publishing Co., River Edge, N.J., 1995. MR 1478992 q.v. 1249, 1250

[30] E. A. Emerson and C. S. Jutla, Tree automata, mu-calculus and determinacy. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science. Held in San Juan, Puerto Rico, October 1–4, 1991. IEEE Press, Los Alamitos, CA, 368–377. IEEEXplore 185392 q.v. 1226

[31] B. Finkbeiner and E. Olderog, Petri games: synthesis of distributed systems with causal memory. Inform. and Comput. 253 (2017), part 2, 181–203. MR 3621745 Zbl 1362.68211 q.v. 1249

[32] B. Finkbeiner and S. Schewe, Uniform distributed synthesis. In 20th Annual IEEE Symposium on Logic in Computer Science. LICS '05. Held in Chicago, IL, June 26–29, 2005. IEEE Press, Los Alamitos, CA, 321–330. IEEEXplore 1509236 q.v. 1249

[33] P. Gastin, B. Lerman, and M. Zeitoun, Distributed games with causal memory are decidable for series-parallel systems. In FST & TCS 2004: Foundations of software technology and theoretical computer science (K. Lodaya and M. Mahajan, eds.). Proceedings of the 24th International Conference held in Chennai, December 16–18, 2004. Lecture Notes in Computer Science, 3328. Springer, Berlin, 2004, 275–286. MR 2140402 Zbl 1117.68448 q.v. 1249, 1254

[34] P. Gastin, N. Sznajder, and M. Zeitoun, Distributed synthesis for well-connected architectures. Form. Methods Syst. Des. 34 (2009), no. 3, 215–237. Zbl 1180.68056 q.v. 1249

[35] B. Genest, H. Gimbert, A. Muscholl, and I. Walukiewicz, Optimal Zielonka-type construction of deterministic asynchronous automata. In Automata, languages and programming (S. Abramsky, C. Gavoille, C. Kirchner, F. M. auf der Heide, and P. G. Spirakis, eds.). Part II. Proceedings of the 37th International Colloquium (ICALP 2010) held in Bordeaux, July 6–10, 2010. Lecture Notes in Computer Science, 6199. Springer, Berlin, 2010, 52–63. MR 2734635 Zbl 1288.68154 q.v. 1251

[36] B. Genest, H. Gimbert, A. Muscholl, and I. Walukiewicz, Asynchronous games over tree architectures. In Automata, languages, and programming (F. V. Fomin, R. Freivalds, M. Z. Kwiatkowska, and D. Peleg, eds.). Part II. Proceedings of the 40th International Colloquium (ICALP 2013) held at the University of Latvia, Riga, July 8–12, 2013. Lecture Notes in Computer Science, 7966. Springer, Berlin, 2013, 275–286. MR 3109153 Zbl 1334.68150 q.v. 1249

[37] P. Hänsch, M. Slaats, and W. Thomas, Parametrized regular infinite games and higher-order pushdown strategies. In Fundamentals of computation theory (M. Kutyłowski, W. Charatonik, and M. Gębala, eds.). Proceedings of the 17th International Symposium

(FCT 2009) held at the Wrocław University of Technology, Wrocław, September 2–4, 2009. Lecture Notes in Computer Science, 5699. Springer, Berlin, 2009, 181–192. MR 2824644 Zbl 1252.68173 q.v. 1226

[38] D. Janin and I. Walukiewicz, On the expressive completeness of the propositional mucalculus with respect to monadic second order logic. In CONCUR ’96: concurrency theory (U. Montanari and V. Sassone, eds.). Lecture Notes in Computer Science, 1119. Springer, Berlin, 1996, 263–277. MR 1480434 q.v. 1233 [39] S. C. Kleene, Representation of events in nerve nets and finite automata. In Automata studies (C. E. Shannon and J. McCarthy, eds.). Annals of Mathematics Studies, 34. Princeton University Press, Princeton, N.J., 1956, 3–42. MR 0077478 q.v. 1226 [40] R. Kumar and V. K. Garg, Modeling and control of logical discrete event systems. The Kluwer International Series in Engineering and Computer Science. 300. Kluwer Academic Publishers, Dordrecht, 1995. Zbl 0875.68980 q.v. 1218, 1235 [41] O. Kupferman and M. Y. Vardi, Synthesizing distributed systems. In Proceedings of the 16 th Annual IEEE Symposium on Logic in Computer Science. Held in Boston, MA, June 16–19, 2001. IEEE Press, Los Alamitos, CA, 389–398. IEEEXplore 932514 q.v. 1242, 1249 [42] Y. Lustig and M. Y. Vardi, Synthesis from component libraries. In Foundations of software science and computational structures (L. de Alfaro, ed.). Proceedings of the 12th International Conference (FOSSACS 2009) held in York, U.K., March 22–29, 2009. Lecture Notes in Computer Science, 5504. Springer, Berlin, 2009, 395–409. MR 2545234 Zbl 1234.68260 q.v. 1235 [43] P. Madhusudan, Control and synthesis of open reactive systems. PhD thesis, University of Madras, Chennai, 2001. q.v. 1240, 1249 [44] P. Madhusudan and P. Thiagarajan, Distributed controller synthesis for local specifications. In Automata, languages and programming (F. Orejas, P. G. Spirakis, and J. van Leeuwen, eds.). Proceedings of the 28th International Colloquium (ICALP 2001) held in Crete, July 8–12, 2001. Lecture Notes in Computer Science, 2076. Springer, Berlin, 2001, 396–407. MR 2065879 Zbl 0986.68079 q.v. 1239, 1240, 1242, 1249 [45] P. Madhusudan, P. S. Thiagarajan, and S. Yang, The MSO theory of connectedly communicating processes. In FST & TCS 2005: Foundations of software technology and theoretical computer science (R. Ramanujam and S. Sen, eds.). Proceedings of the 25th International Conference held in Hyderabad, December 15–18, 2005. Lecture Notes in Computer Science, 3821. Springer, Berlin, 2005, 201–212. MR 2212993 Zbl 1172.68556 q.v. 1249, 1254 [46] Z. Manna and P. Wolper, Synthesis of communicating processes from temporal logic specifications. ACM Trans. Prog. Lang. Syst. 6 (1984), no 1, 68–93. Zbl 0522.68030 q.v. 1226 [47] D. Martin, Borel determinacy. Ann. of Math. (2) 102 (1975), no. 2, 363–371. MR 0403976 Zbl 0336.02049 q.v. 1226 [48] R. Morin, Decompositions of asynchronous systems. In CONCUR ’98: concurrency theory (D. Sangiorgi and R. de Simone, eds.). Proceedings of the 9 th International Conference held in Nice, September 8–11, 1998. Lecture Notes in Computer Science, 1466. Springer, Berlin, 1998, 549–564. MR 1683385 Zbl 0940.68056 q.v. 1249


[49] A. W. Mostowski, Regular expressions for infinite trees and a standard form of automata. In Computation theory (A. Skowron, ed.). Proceedings of the 5th Symposium held in Zaborów, December 3–8, 1984. Lecture Notes in Computer Science, 208. Springer, Berlin, 1985, 57–168. MR 0827531 Zbl 0612.68046 q.v. 1226 [50] A. W. Mostowski, Games with forbidden positions. Technical Report 78, Uniwersytet Gdański, Instytut Matematyki, Gdański, q.v. 1226 [51] A. Muscholl and I. Walukiewicz, Distributed synthesis for acyclic architectures. In 34 th International Conference on Foundation of Software Technology and Theoretical Computer Science (V. Raman and S. P. Suresh, eds.). Proceedings of the conference (FST & TCS 2014) held at the India International Centre, New Delhi, December 15–17, 2014. LIPIcs. Leibniz International Proceedings in Informatics, 29. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2014, 639–651. MR 3345028 Zbl 1360.68594 q.v. 1249, 1254 [52] A. Muscholl, I. Walukiewicz, and M. Zeitoun, A look at the control of asynchronous automata. In Perspectives in concurrency theory (K. Lodaya, M. Mukund and R. Ramanujam, eds.). A Festschrift for P. S. Thiagarajan. Universities Press, Hyderabad, and CRC Press, Boca Raton, FL, 2009, 356–371. MR 2569354 Zbl 1194.68145 q.v. 1249 [53] G. L. Peterson and J. H. Reif, Multiple-person alternation. In 20 th Annual Symposium on Foundations of Computer Science. SFCS 1979. Held in San Juan, Puerto Rico, October 29–31, 1979. IEEE Computer Society, Long Beach, CA, 348–363. MR 0598117 IEEEXplore 4568030 q.v. 1242 [54] A. Pnueli and R. Rosner, Distributed reactive systems are hard to synthesize. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science Held in St. Louis, MO, October 22–24, 1990. IEEE Press, Los Alamitos, CA, 746–757. IEEEXplore 89597 q.v. 1235, 1238, 1240, 1242 [55] M. O. Rabin, Decidability of second-order theories and automata on infinite trees. Trans. Amer. Math. Soc. 141 (1969), 1–35. MR 0246760 Zbl 0221.02031 q.v. 1218, 1222 [56] M. O. Rabin, Automata on infinite objects and Church’s problem. Conference Board of the Mathematical Sciences Regional Conference Series in Mathematics, 13. American Mathematical Society, Providence, R.I., 1972. MR 0321708 Zbl 0315.02037 q.v. 1218 [57] A. Rabinovich, The Church synthesis problem with parameters. Log. Methods Comput. Sci. 3 (2007), no. 4, 4:9, 23 pp. MR 2357496 Zbl 1131.03016 q.v. 1226 [58] P. J. G. Ramadge, Some tractable supervisory control problems for discrete-event systems modeled by Büchi automata. IEEE Trans. Automat. Control 34 (1989), no. 1, 10–19. MR 0970928 Zbl 0666.93095 IEEEXplore 8645 q.v. 1227, 1249 [59] P. J. G. Ramadge and W. M. Wonham, The control of discrete event systems. Proceedings of the IEEE 77 (1989), no. 2, 81–98. IEEEXplore 21072 q.v. 1218, 1227, 1229 [60] K. Rudie and W. Wonham, Think globally, act locally: Decentralized supervisory control. IEEE Trans. Automat. Control 37 (1992), no. 11, 1692–1708. MR 1195211 Zbl 0778.93002 IEEEXplore 173140 q.v. 1249 [61] S. Salvati and I. Walukiewicz, Evaluation is MSOL-compatible. In 33 rd International Conference on Foundations of Software Technology and Theoretical Computer Science (A. Seth and N. K. Vishnoi, eds.). Proceedings of the conference (FST & TCS 2013) held in Guwahati, December 12–14, 2013. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2013, 103–114. MR 3166005 Zbl 1359.03015 q.v. 1235 [62] S. Schewe, Synthesis of distributed systems. Ph.D. 
thesis, Universität des Saarlandes, Saarbrücken, Germany, 2008. q.v. 1249

1260

Igor Walukiewicz

[63] O. Serre, Games with winning conditions of high Borel complexity. In Automata, languages and programming (J. Díaz, J. Karhumäki, A. Lepistö, and D. Sannella, eds.) Proceedings of the 31st International Colloquium (ICALP 2004) held in Turku, July 12–16, 2004. Lecture Notes in Computer Science, 3142. Springer, Berlin, 2004, 1150–1162. MR 2161089 Zbl 1099.03510 q.v. 1235 [64] A. Stefanescu, Automatic synthesis of distributed transition systems. Ph.D. thesis. Universität Stuttgart, Stuttgart, Germany, 2006. q.v. 1252 [65] A. Stefanescu, J. Esparza, and A. Muscholl, Synthesis of distributed algorithms using asynchronous automata. In CONCUR 2003 – concurrency theory (R. M. Amadio and D. Lugiez, eds.). Proceedings of the 14 th International Conference held in Marseille, September 3–5, 2003. Lecture Notes in Computer Science, 2761. Springer, Berlin, 2003, 27–41. MR 2081879 Zbl 1274.68680 q.v. 1249, 1252 [66] L. J. Stockmeyer, The complexity of decision problems in automata theory and logic. Ph.D. thesis. Department of Electrical Engineering MIT, Boston, MA, 1974. q.v. 1242 [67] N. Sznajder, Synthèse de systèmes distribués ouverts. Ph.D. thesis. École Normale Supérieure de Cachan, Cachan, France, 2009. q.v. 1249 [68] J. G. Thistle, Undecidability in decentralized supervision. Systems Control Lett. 54 (2005), no. 5, 503–509. MR 2132820 Zbl 1129.93451 q.v. 1249 [69] W. Thomas, Languages, automata, and logic. In Handbook of formal languages (G. Rozenberg and A. Salomaa, eds.). Vol. 3. Beyond words. Springer, Berlin, 1997, 389–455. MR 1470024 q.v. 1226 [70] S. Tripakis, Undecidable problems of decentralized observation and control on regular languages. Inform. Process. Lett. 90 (2004), no. 1, 21–28. MR 2041981 Zbl 1178.68327 q.v. 1249 [71] M. Y. Vardi and T. Wilke, Automata: from logics to algorithms. In Logic and automata (J. Flum, E. Grädel, and T. Wilke, eds.). History and perspectives. Texts in Logic and Games, 2. Amsterdam University Press, Amsterdam, 2008, 629–736. MR 2508757 Zbl 1234.03026 q.v. 1231 [72] I. Walukiewicz, Pushdown processes: games and model-checking. Inform. and Comput. 164 (2001), no. 2, 234–263. MR 1816150 Zbl 1003.68072 q.v. 1227 [73] W. Wonham, Supervisory control of discrete-event systems. Technical Report ECE 1636F/1637S 2009-10, University of Toronto, Toronto, Canada, 2010. q.v. 1235 [74] T.-S. Yoo and S. Lafortune, A general architecture for decentralized supervisory control of discrete-event systems. Discrete Event Dyn. Syst. 12 (2002), no. 3, 335–377. WODES2000 (Ghent). MR 1914332 Zbl 1048.93067 q.v. 1249 [75] W. Zielonka, Notes on finite asynchronous automata. RAIRO Inform. Théor. Appl. 21 (1987), no. 2, 99–135. MR 0894706 Zbl 0623.68055 q.v. 1249, 1251

Chapter 34

Timed automata

Patricia Bouyer

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1261
2. Timed automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1262
3. The emptiness problem, why and how? . . . . . . . . . . . . . . . . . . . . 1265
4. The region abstraction: a key for decidability . . . . . . . . . . . . . . . 1266
5. Applications of the region automaton construction . . . . . . . . . . . . . 1278
6. The language-theoretic perspective . . . . . . . . . . . . . . . . . . . . . 1282
7. Conclusion and current developments . . . . . . . . . . . . . . . . . . . . 1289
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1290

1. Introduction

The growing importance of embedded systems and complex models. Embedded systems are a modern technology that has an important impact on our everyday life. There are indeed more and more computing systems, and our comfort relies on them more and more (as an illustration, in thirty years we have gone from “no phone at home” to “everyone with a mobile phone”). One of the characteristics shared by these systems is that they have to meet numerous quantitative constraints, such as resource constraints (power consumption, memory usage, costs, bandwidth, etc.), timing constraints (response time, propagation delays, etc.), and constraints on the environment in which they operate (signal sensors, interactions with a possibly continuous environment, etc.). Another important characteristic of embedded systems is that they have to be reliable, powerful, and efficient. Thus, their conception and verification pose a great challenge, and require the development of complex models and analysis techniques.

The timed automaton model. In this chapter, we concentrate on one fundamental aspect of embedded systems, the timing constraints. One of the prominent models for real-time systems is that of timed automata, defined by Rajeev Alur and David Dill in the early 1990's, see [4] and [5]. A timed automaton is a finite automaton that can manipulate variables. These variables have very specific behaviours: they increase synchronously with time, they can be compared with a constant, and they can be reset to 0. The number of states of a timed automaton is infinite, and hence no properties of standard finite automata can trivially be transferred to timed automata. However, a finite-state abstraction, called the region automaton, can be constructed that allows us to check for reachability properties, ω-regular properties, some branching-time timed temporal logics, etc. An extensive literature has been written on this model since the original paper [4], and it is worth noticing that Rajeev Alur and David Dill received the CAV award in 2008 for this seminal article, which is among the most quoted in computer-aided verification.¹ Research on timed automata is not only concerned with theory, but also with practical applications. Indeed, tools such as Uppaal² [11] have been developed that implement verification algorithms for timed automata, and many industrial case studies have been analyzed using such technologies.

Content of the chapter. The aim of this chapter is to give some basics of timed automata. First, we present the model of timed automata, and give two main semantics, one operational and one in terms of timed languages (§ 2). Then, we discuss the emptiness problem in timed automata (§ 3). One of the main parts of this chapter is § 4, where we present the fundamental region automaton construction, used to prove (among other results) the decidability of the reachability problem in timed automata. In § 6 we focus on the language semantics of timed automata and study properties of timed languages accepted by timed automata. In particular, we present the undecidability of the universality problem (and the inclusion problem) for timed automata, which shows that not everything can be reduced to the region automaton abstraction. We conclude with some current developments (§ 7).

¹ See https://www.princeton.edu/cav2008/cav_award_announce.html
² See https://www.uppaal.org

2. Timed automata

2.1. Preliminary notations. If Z is a set, let Z* be the set of finite sequences of elements of Z. We consider as time domain T the set Q+ of non-negative rationals or the set R+ of non-negative reals, and Σ is a finite set of actions. A time sequence over T is a finite non-decreasing sequence τ = (t_i)_{1≤i≤p} ∈ T*. A timed word w = (a_i, t_i)_{1≤i≤p} is an element of (Σ × T)*, also written as a pair w = (σ, τ), where σ = (a_i)_{1≤i≤p} is a word in Σ* and τ = (t_i)_{1≤i≤p} is a time sequence in T* of the same length. In the following, we denote by Untime(w) the finite word σ over the alphabet Σ.

2.1.1. Clock valuations, operations on clocks. We consider a finite set X of variables, called clocks. A (clock) valuation over X is a mapping v : X → T which assigns a time value to every clock. The set of all clock valuations over X is denoted T^X, and 0_X denotes the valuation assigning 0 to every clock x ∈ X. Let v ∈ T^X be a valuation and t ∈ T; the valuation v + t is defined by (v + t)(x) = v(x) + t for every x ∈ X. For Y ⊆ X, we let [Y ← 0]v denote the valuation such that ([Y ← 0]v)(x) = 0 for every x ∈ Y, and ([Y ← 0]v)(x) = v(x) for every x ∈ X \ Y.


2.1.2. Clock constraints. Given a finite set of clocks X, we introduce two sets of clock constraints over X. The most general one, denoted C(X), is defined by the grammar

    g ::= x ⋈ c | x - y ⋈ c | g ∧ g | true,    where x, y ∈ X, c ∈ Z and ⋈ ∈ {<, ≤, =, ≥, >}.

Remark 2.1. We could allow rational constants in clock constraints (i.e., have c ∈ Q in the above grammar), and everything that follows would still hold, but the developments would then be a bit more technical.

A clock constraint of the form x - y ⋈ c is called diagonal. Next, we also use the proper subset of diagonal-free clock constraints, where diagonal constraints are not allowed. This set is denoted C_df(X). A k-bounded clock constraint is a clock constraint which involves only (integral) constants c between -k and +k. The set of k-bounded (resp. k-bounded diagonal-free) clock constraints is denoted C_k(X) (resp. C_k^df(X)).

If v ∈ T^X, we write v ⊨ g when v satisfies the clock constraint g, and we say that v satisfies x ⋈ c (resp. x - y ⋈ c) whenever v(x) ⋈ c (resp. v(x) - v(y) ⋈ c). If g is a clock constraint, we write ⟦g⟧_X for the set of clock valuations {v ∈ T^X | v ⊨ g}.
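The operations on valuations and the satisfaction of atomic constraints are purely arithmetical, so they translate directly into code. The following Python fragment is a minimal illustrative sketch and is not part of the original development; the representation of valuations as dictionaries, of constraints as triples, and all function names are ad hoc choices made only for this illustration.

```python
# Illustrative sketch: clock valuations (Section 2.1.1) and satisfaction of
# atomic clock constraints (Section 2.1.2). Time values are exact rationals.
from fractions import Fraction
import operator

def zero_valuation(clocks):
    """The valuation 0_X assigning 0 to every clock."""
    return {x: Fraction(0) for x in clocks}

def delay(v, t):
    """The valuation v + t: every clock increases by t."""
    return {x: val + t for x, val in v.items()}

def reset(v, Y):
    """The valuation [Y <- 0]v: clocks in Y are reset to 0, the others unchanged."""
    return {x: (Fraction(0) if x in Y else val) for x, val in v.items()}

OPS = {'<': operator.lt, '<=': operator.le, '=': operator.eq,
       '>=': operator.ge, '>': operator.gt}

def satisfies(v, constraint):
    """Satisfaction of an atomic constraint (x, op, c) or a diagonal ((x, y), op, c)."""
    left, op, c = constraint
    value = v[left] if not isinstance(left, tuple) else v[left[0]] - v[left[1]]
    return OPS[op](value, c)

def satisfies_all(v, constraints):
    """Conjunction of atomic constraints."""
    return all(satisfies(v, g) for g in constraints)

# Example: two clocks, let 1.5 time units elapse, then reset x.
v = delay(zero_valuation({'x', 'y'}), Fraction(3, 2))
v = reset(v, {'x'})
assert satisfies_all(v, [('x', '=', 0), ('y', '>', 1), (('y', 'x'), '<=', 2)])
```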

2.2. The model of timed automata. A timed automaton over T is a tuple A = (L, L_0, L_F, X, Σ, T), where L is a finite set of locations, L_0 ⊆ L is the set of initial locations, L_F ⊆ L is the set of final locations, X is a finite set of clocks, Σ is a finite alphabet of actions, and T ⊆ L × C(X) × Σ × 2^X × L is a finite set of transitions.³ If all constraints appearing in A are diagonal-free (i.e., are in C_df(X)), we say that A is a diagonal-free timed automaton.

³ For better readability, a transition will often be written as ℓ →(g,a,Y) ℓ′, or even as ℓ →(g,a,Y:=0) ℓ′, instead of simply the tuple (ℓ, g, a, Y, ℓ′).

For modelling purposes (for instance, for ensuring some liveness property in the system), some definitions assume invariants in the model. Invariants are clock constraints assigned to locations, which have to be satisfied while the system is in the location. A timed automaton with invariants is then a tuple A = (L, L_0, L_F, X, Σ, T, Inv), where (L, L_0, L_F, X, Σ, T) is a timed automaton in the previous sense, and Inv : L → C_df(X), the invariant, assigns a clock constraint to every location. A timed automaton, as defined initially, is a special case of a timed automaton with invariants where the invariant assigns “true” to every location. In the sequel, when we speak of timed automata, we will equivalently mean timed automata with, or without, invariants.

Several semantics can be given to timed automata. We first give an operational semantics and then a language-based semantics. Let A = (L, L_0, L_F, X, Σ, T, Inv) be a timed automaton.

2.2.1. Semantics as a timed transition system. The operational semantics of A is given as the timed transition system T_A = (S, S_0, →) over the alphabet Σ, where the set of states is S = {(ℓ, v) ∈ L × T^X | v ⊨ Inv(ℓ)}, the set of initial states is S_0 = {(ℓ_0, 0_X) ∈ S | ℓ_0 ∈ L_0}, and → ⊆ S × (T ∪ Σ) × S is the set of moves defined as follows:


• Delay moves: if (ℓ, v) ∈ S and d ∈ T, there is a move (ℓ, v) →(d) (ℓ, v + d) whenever (ℓ, v + d′) ∈ S (i.e., v + d′ ⊨ Inv(ℓ)) for every 0 ≤ d′ ≤ d.

• Action moves: if (ℓ →(g,a,Y) ℓ′) ∈ T and (ℓ, v) ∈ S, there is a move (ℓ, v) →(a) (ℓ′, [Y ← 0]v) whenever v ⊨ g and ([Y ← 0]v) ⊨ Inv(ℓ′). In that case we say that the action move is associated with the transition ℓ →(g,a,Y) ℓ′.

Next, it will be more convenient to consider mixed moves, and hence to consider the timed transition system T_A^m = (S, S_0, →), where S and S_0 are defined as in T_A, and → ⊆ S × (T × Σ) × S. In that case a move is composed of a delay move directly followed by an action move: (ℓ, v) →(d,a) (ℓ′, v′) is a mixed move if (ℓ, v) →(d) (ℓ, v + d) is a delay move and (ℓ, v + d) →(a) (ℓ′, v′) is an action move.
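The one-step behaviour of the timed transition system can be mirrored in a few lines of code. The sketch below is illustrative only and is not the chapter's formal construction; guards and invariants are encoded as Python predicates on valuations, and the tuple encoding of transitions and all identifiers are ad hoc choices.

```python
# Illustrative sketch of the moves of Section 2.2.1. A transition is a tuple
# (source, guard, action, reset_set, target); guards and invariants are predicates.
from fractions import Fraction

def delay_move(location, v, d, inv):
    """Delay move (l, v) ->(d) (l, v + d). Invariants built from the constraints of
    Section 2.1.2 define convex sets, so if v and v + d both satisfy Inv(l), then
    every intermediate valuation does as well."""
    w = {x: val + d for x, val in v.items()}
    return (location, w) if inv[location](w) else None

def action_moves(location, v, a, transitions, inv):
    """All action moves (l, v) ->(a) (l', [Y <- 0]v) enabled by a-labelled transitions."""
    successors = []
    for (source, guard, action, reset_set, target) in transitions:
        if source == location and action == a and guard(v):
            w = {x: (Fraction(0) if x in reset_set else val) for x, val in v.items()}
            if inv[target](w):
                successors.append((target, w))
    return successors

def mixed_moves(location, v, d, a, transitions, inv):
    """Mixed moves: a delay of duration d immediately followed by an a-move."""
    after_delay = delay_move(location, v, d, inv)
    if after_delay is None:
        return []
    return action_moves(after_delay[0], after_delay[1], a, transitions, inv)

# Tiny example: one clock x, a single transition l0 --(x >= 1, a, {x})--> l1.
trans = [('l0', lambda v: v['x'] >= 1, 'a', {'x'}, 'l1')]
inv = {'l0': lambda v: v['x'] <= 2, 'l1': lambda v: True}
print(mixed_moves('l0', {'x': Fraction(0)}, Fraction(3, 2), 'a', trans, inv))
# -> [('l1', {'x': Fraction(0, 1)})]
```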

2.2.2. Semantics as a timed language. We now give the language-based semantics of A. A path in A is a finite sequence of consecutive transitions:

    P = ℓ_0 →(g_1,a_1,Y_1) ℓ_1 →(g_2,a_2,Y_2) ⋯ →(g_{p-1},a_{p-1},Y_{p-1}) ℓ_{p-1} →(g_p,a_p,Y_p) ℓ_p,

where ℓ_{i-1} →(g_i,a_i,Y_i) ℓ_i ∈ T for every 1 ≤ i ≤ p. The path is said to be accepting if it starts in an initial location (ℓ_0 ∈ L_0) and ends in a final location (ℓ_p ∈ L_F). A run of the automaton along the path P is a sequence of the form

    ϱ = (ℓ_0, v_0) →(d_1,a_1) (ℓ_1, v_1) →(d_2,a_2) ⋯ →(d_{p-1},a_{p-1}) (ℓ_{p-1}, v_{p-1}) →(d_p,a_p) (ℓ_p, v_p),

where v_0 ⊨ Inv(ℓ_0) and, for each 1 ≤ i ≤ p, (ℓ_{i-1}, v_{i-1}) →(d_i,a_i) (ℓ_i, v_i) is a mixed move of the timed transition system T_A^m (see § 2.2.1) associated with the transition ℓ_{i-1} →(g_i,a_i,Y_i) ℓ_i. The run is accepting if the underlying path is accepting and if v_0 = 0_X. The label of the run ϱ is the timed word w = (a_1, t_1) ⋯ (a_p, t_p), where t_i = d_1 + ⋯ + d_i for every 1 ≤ i ≤ p is the absolute time at which the i-th action a_i occurs (we also say that the run ϱ reads the timed word w). If the run ϱ is accepting, then the timed word w is said to be accepted by A. The set of all timed words accepted by A is denoted L(A), and is the timed language accepted (or equivalently recognised) by A.
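Because the dates t_i of a timed word fix all the delays d_i = t_i - t_{i-1}, the only choice left when looking for an accepting run reading a given timed word is which transition to fire at each step. The following sketch is purely illustrative (it is not part of the chapter); guards and invariants are Python predicates on valuations, and all identifiers are ad hoc. The example automaton is a hand-rolled one-clock variant in the spirit of the automaton of Figure 12 in § 6.1, which looks for two a's exactly one time unit apart.

```python
# Illustrative sketch: does a timed automaton accept a given finite timed word?
from fractions import Fraction

def accepts(timed_word, initial, final, transitions, inv, clocks):
    """transitions: (source, guard, action, reset_set, target); initial/final: sets
    of locations; guards and invariants are predicates on valuations."""
    def run(location, v, time, remaining):
        if not remaining:
            return location in final
        (a, t) = remaining[0]
        d = t - time
        if d < 0:
            return False
        w = {x: val + d for x, val in v.items()}      # delay move (convex invariants:
        if not inv[location](w):                      # checking the endpoint suffices)
            return False
        for (source, guard, action, reset_set, target) in transitions:
            if source == location and action == a and guard(w):
                u = {x: (Fraction(0) if x in reset_set else val) for x, val in w.items()}
                if inv[target](u) and run(target, u, t, remaining[1:]):
                    return True
        return False
    v0 = {x: Fraction(0) for x in clocks}
    return any(inv[l0](v0) and run(l0, v0, Fraction(0), list(timed_word)) for l0 in initial)

trans = [('l0', lambda v: True, 'a', set(), 'l0'),
         ('l0', lambda v: True, 'a', {'x'}, 'l1'),
         ('l1', lambda v: True, 'a', set(), 'l1'),
         ('l1', lambda v: v['x'] == 1, 'a', set(), 'l2'),
         ('l2', lambda v: True, 'a', set(), 'l2')]
inv = {'l0': lambda v: True, 'l1': lambda v: True, 'l2': lambda v: True}
word = [('a', Fraction(1, 2)), ('a', Fraction(3, 2)), ('a', Fraction(2))]
print(accepts(word, {'l0'}, {'l2'}, trans, inv, {'x'}))   # -> True
```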

Remark 2.2. For simplicity, we only consider finite paths, finite runs and finite timed words; but, as for classical finite automata, one could consider infinite timed words and ω-regular accepting conditions (Büchi, Muller, etc.); see [5].

2.3. An example of a timed automaton. We consider the (diagonal-free) timed automaton in Figure 1, with two clocks x and y, over the alphabet {problem, delayed, repair, done}. It has four locations (“safe,” “alarm,” “failsafe,” “repairing”), and location “safe” is both initial and final. Transitions are depicted in a standard way as labelled edges between locations. The transition between “alarm” and “repairing” has constraint x < 15, action label repair, and resets clock y. A state of this automaton is a location together with a valuation for the two clocks x and y. The following is a sequence of delay and action moves in the timed transition system of the above timed automaton.

[Figure 1: the timed automaton of § 2.3, over {problem, delayed, repair, done}, with locations “safe,” “alarm,” “failsafe,” and “repairing”]

Figure 2. Set of regions R

The main property of region equivalence is the following, which is a direct application of the three compatibility conditions. Proposition 4.1. Let A D .L; L0 ; LF ; X; †; T / be a timed automaton with set of constraints C. Assume that R is a set of regions for X and C. Then the equivalence relation ŠR defined on states of A by .`; v/ ŠR .`0 ; v 0 / () .` D `0 and v R v 0 /;

is a time-abstract bisimulation, which is called the region equivalence relation.

34. Timed automata

1269

This property of region equivalence will be used to construct a finite automaton, which is the quotient of the original timed automaton with the region equivalence relation. We start by constructing the region graph, which represents the (abstract) evolution of time and clock valuations. Then we construct the region automaton, which is a finite automaton, representing (in an abstract way) the behaviours of the timed automaton. 4.2.2. The region graph. From a set of regions R satisfying the three compatibility conditions, one can define the so-called region graph, which represents the possible evolutions of time in the system: the region graph is a finite automaton whose set of states is R and whose transitions are as follows: ´ " R ! R0 if R0 is a time successor5 of R; Y

R ! R0 if ŒY

0R  R0 :

Intuitively, the region graph records possible evolutions of time in the system: there " is a transition R ! R0 if, from some valuation in region R, it is possible to let some Y

time elapse and reach region R0 ; there is a transition R ! R0 if, from some valuation in region R, it is possible to reach region R0 by resetting clocks in Y . Note that due to conditions ➁ and ➂, if this is the case “for some valuation in region R,” then this is the case “for all valuation in region R.” The region graph is closed by reflexivity and transitivity for "-transitions, i.e., " " " " R ! R, and if R ! R0 and R0 ! R00 , then R ! R00 . Example 4.2. The region graph associated with the set of regions R mentioned in " Example 4.1 is represented in Figure 3. Plain edges are ! transitions of the region ¹xº

¹yº

graph, whereas dashed (resp. dotted) edges are ! (resp. !) transitions. We have ¹x;yº

omitted transitions ! (that reset both clocks) for readability reasons: there should be such a transition from any state to region R0 .

4.2.3. The region automaton. Let A D .L; L0 ; LF ; X; †; T / be a timed automaton, and assume that the set of constraints occurring in A is C. Let R be a finite set of regions for X and C (i.e., a partition of TX satisfying conditions ➀–➂). The region automaton €R .A/ is the finite automaton .Q; Q0 ; QF ; †; T 0 /, where Q D LR is the set of states, Q0 D L0  ¹Œ0X R º is the set of initial states, QF D LF  R is the set of final states, † is the same alphabet as that of A, and T 0 is the set of transitions defined as follows: a there is a transition .`; R/ ! .`0 ; R0 / in T 0 whenever there exists some region R00 2 R 5 R0 is a time successor of R whenever there is some v 2 R and some t 2 T such that v C t 2 R0 . Note that, due to the compatibility condition ➁, if R0 is a time successor of R, then for every v 2 R, there is some t 2 T such that v C t 2 R0 .

Patricia Bouyer

1270

R1 06x0 yD0

R4 x>0 y>1 x1 y>1 x>y

Figure 3. A simple example of a region graph g;a;Y

and some transition ` ! `0 in A such that 8 " 00 ˆ ˆ 1; a; x WD 0

y > 1; b

`1

`2

d

x < y ^ y 6 1; e

c; y WD 0

`4

`3

Figure 4. An example timed automaton

`0 ; R 0

a

`1 ; R 4

b

`2 ; R 4 c

d

`4 ; R 1

`1 ; R 3 `1 ; R 2

d b b b

`3 ; R 0 `2 ; R 3

`1 ; R 0

Figure 5. The region automaton

d

Patricia Bouyer

1272

For every timed automaton A for which we can effectively construct a finite set of regions R (satisfying conditions ➀–➂), we can transfer the checking of reachability properties in A to the finite automaton €R .A/. It remains to see how we can effectively build sets of regions for timed automata. 4.3. Effective construction of sets of regions. In the previous section, we presented an abstract construction which allows us to reduce the model-checking of reachability properties in timed automata to the model-checking of reachability properties in finite automata, under the condition that there is a finite set of regions. However, we did not explain how to construct a set of regions for timed automata, which is the basis of the whole construction. In this section, we fix a finite set of clocks X . 4.3.1. Regions for sets of diagonal-free constraints. Let M 2 N be an integer. We define a set of regions for X and the set of M -bounded diagonal-free clock constraints CM df .X /. A natural partition would be to take the partition induced by the set of constraints itself; see Figure 6(a) for an illustration with two clocks. But this is actually not refined enough because compatibility condition ➂ is not satisfied (as illustrated on the figure by the two gray valuations). A correct partition is given in Figure 6(b). clock y

clock y

2

2

1

1

0

0

clock x 0 1 2 (a) Partition compatible with the 2-bounded clock constraints and the resets (conditions ➀ and ➂), but not with time elapsing (condition ➁): the two gray points are not equivalent

clock x 0 1 2 (b) Partition R2df .¹x; yº/ satisfying all the compatibility constraints ➀–➂ for the set of constraints C2df .¹x; yº/

Figure 6. Region construction for set of diagonal-free constraints C2df .¹x; yº/

We formalise this idea and we define a partition of TX , denoted RM df .X /. We will give three different (but equivalent) definitions.

i. The first definition formalises the 2-dimensional intuition that we have seen earlier, and is the most standard one. It uses the definition of an equivalence relation X;M over valuations. Let v and v 0 be two valuations of TX . We say df X;M 0 that v df v if all three following conditions are satisfied:

34. Timed automata

1273

a. v.x/ > M if and only if v 0 .x/ > M , for every x 2 X ; b. if v.x/ 6 M , then bv.x/c D bv 0 .x/c and (¹v.x/º D 0 if and only if ¹v 0 .x/º D 0), for every 6 x 2 X ; c. if v.x/ 6 M and v.y/ 6 M , then ¹v.x/º 6 ¹v.y/º if and only if ¹v 0 .x/º 6 ¹v 0 .y/º, for every x; y 2 X . The relation X;M is an equivalence relation of finite index, and it naturally df X induces a finite partition RM df .X / of T . The construction for two clocks is precisely the one illustrated in Figure 6(b). ii. We give a second description of the regions in RM df .X /, which makes it easier to give an upper bound on the number of regions in RM df .X /. An interval of T with integral bounds is called M -simple if it is of one of the following forms: .c; c C 1/ with 0 6 c < M , or Œc; c with 0 6 c 6 M , or .M; C1/. It is called bounded if it is one of the first two forms, and singular in the second form. Each region of RM df .X / can then be characterised uniquely as follows:  an M -simple interval Ix for every clock x 2 X ;  a preorder  on the set of clocks Z.Ix /x2X D ¹x 2 X j Ix bounded and non-singularº:
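The first characterisation above (conditions a, b and c) is easy to turn into an executable test. The following fragment is an illustrative sketch only, not part of the chapter; valuations are dictionaries of exact rationals, M is the maximal constant, and the function names and encodings are ad hoc choices.

```python
# Illustrative sketch: region equivalence of two valuations for a maximal constant M,
# following the three conditions of the first characterisation.
from fractions import Fraction
from math import floor

def frac(q):
    return q - floor(q)

def region_equivalent(v, w, M):
    clocks = v.keys()
    for x in clocks:
        # (a) either both values are above M, or both are at most M
        if (v[x] > M) != (w[x] > M):
            return False
        # (b) below M: same integral part, and fractional part is 0 in both or in neither
        if v[x] <= M and (floor(v[x]) != floor(w[x]) or (frac(v[x]) == 0) != (frac(w[x]) == 0)):
            return False
    # (c) below M: same ordering of the fractional parts
    for x in clocks:
        for y in clocks:
            if v[x] <= M and v[y] <= M and (frac(v[x]) <= frac(v[y])) != (frac(w[x]) <= frac(w[y])):
                return False
    return True

# Same integral parts but a different ordering of the fractional parts: not equivalent.
print(region_equivalent({'x': Fraction(3, 10), 'y': Fraction(7, 10)},
                        {'x': Fraction(7, 10), 'y': Fraction(3, 10)}, 2))   # -> False
```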

Intuitively the interval Ix is the interval to which x belongs, and the preorder is given by the preorder on the fractional parts of all clocks, whose values are bounded by M and are non-integral. Assume that region R 2 RM df .X / is given by .Ix /x2X and . Then ´ 8x 2 X; v.x/ 2 Ix , v 2 R () 8x; y 2 Z.Ix /x2X ; .x  y () ¹v.x/º 6 ¹v.y/º/.

iii. We give a third description of the regions, which gives an interesting onedimensional understanding of the regions. It reuses elements of the characterisation above. Each region of RM df .X / can be characterised uniquely by  the set X1 D ¹x 2 X j Ix D .M; C1/º,  the set X0 D ¹x 2 X n X1 j Ix D Œc; c for some c 6 M º,  a partition .Xi /16i 6p of X n .X0 [ X1 / such that 7 – Xi ¤ ; for every 1 6 i 6 p , – for x 2 X n .X0 [ X1 /, writing i.x/ for the unique index such that x 2 Xi.x/ , for every x; y 2 X n .X0 [ X1 / x  y () i.x/ 6 i.y/,

 for every x 2 X n X1 , cx is an integer bounded by M , and if x … X0 , cx < M . Note: sets X0 and X1 can be empty (contrary to the other Xi ’s). 6 bc (resp. ¹º) represents the integral (resp. fractional) part. 7 That is, clocks in the same Xi have the same fractional part, whereas fractional parts of clocks in two different Xi ’s have their fractional parts ordered accordingly.

Patricia Bouyer

1274

The generic representation of such a region is given in Figure 7. It represents the interval Œ0; 1/ and shows the distribution of the clocks within that interval, according to their fractional part. X1 X0 0

X1

X2



Xp 1

8x 2 X n X1 ; x 7 ! cx

Figure 7. Linear representation of a region: clocks are ordered according to their fractional parts (except that are above the maximal constant), and their integral part is given apart

Assume that region R 2 RM df .X / is given by X0 , .Xi /16i 6p , X1 and .cx /x2XnX1 ; for every x 2 X n X1 , write i.x/ 2 ¹0; 1; : : : ; pº such that x 2 Xi . Then, 8 pC1 ˆ s.t. 0 D 0; i < j H) i < j ; M .

Example 4.4. For instance, the light gray region depicted in Figure 6(b) has the following second characterisation 8 ˆ 4. Those constraints express the fact that cell i contains an a and a b , respectively. We define the resetting sets Ya;i D ¹xi º and Yb;i D ;. These are sets of clocks to be reset for expressing the fact that we write an a, respectively a b , in cell i . – Consider a rule 8 .q; Read˛ ; Writeˇ ; Right; q 0 / in M. For every i 2 ¹1; : : : ; N 1º, there are the following transitions in T : uD2

 .q; i / ! .q; i; 0/;  if i ¤ j with i < N , there are two transitions .q; i; j

1/

.q; i; j

1/

ga;j ;Ya;j

! .q; i; j /

and

 .q; i; i

1/

gb;j ;Yb;j

! .q; i; j /I

g˛;i ;Yˇ;i

uD3

0

! .q; i; i /;

 .q; i; N / ! .q ; i C 1/. The case ˛ D a and ˇ D b is depicted in Figure 11, and informal explanations are given. – Consider a rule .q; Read˛ ; Writeˇ ; Left; q 0 / in M. For every i 2 ¹2; : : : ; N º, there are the following transitions in T : uD2

 .q; i / ! .q; i; 0/;  if i ¤ j with i < N , there are two transitions .q; i; j

1/

.q; i; j

1/

ga;j ;Ya;j

! .q; i; j /

and

 .q; i; i

1/

gb;j ;Yb;j

! .q; i; j /I

g˛;i ;Yˇ;i

uD3;uWD0

! .q; i; i /;

 .q; i; N / ! .q 0 ; i 1/. There is an extra transition from state init to initialise the input word w0 on the tape. We let w0 .i / denote the i -th letter of w0 . Then, there is a transition init

u>2;Yw0

! .q0 ; 1/.

We claim that there is a halting computation in M if and only if L.A/ ¤ ;, which we let as an exercise for the reader. As the halting problem for LBTMs is PSPACE-hard, we get the expected lower bound. 8 This rule reads as follows: from state q , if we read an ˛ in the current cell of the tape, then we write a ˇ onto the current cell, move the head of the tape to the right and go to state q 0 .

34. Timed automata u WD 0

q; i

uD2

x1 6 4; x1 WD 0 x1 > 4

xi 6 4

1281

xN 6 4; xN WD 0 uD3 xN > 4

q0 ; i C 1

Figure 11. Module for rule .q; Reada ; Writeb ; Right; q 0 /. We assume that initially clock xj encodes the content of cell Cj (a if xj 6 1 and b if xj > 2). In the central part of the module (between test u D 2 and u D 3), when clock xj 6 4, it means that cell Cj contains an a, and if xj > 4, it means that cell Cj contains a b . Due to proper resetting of the clocks, when we leave the module, the value of the clocks encode the new configuration of M (cell Cj contains an a if xj 6 1 and a b if xj > 2 – note that only the content of cell Ci has been changed). The head is moved to the right, thanks to new index i C 1.

Remark 5.2. In [22], a proof of PSPACE-hardness is given for timed automata with only three clocks, which is rather technical. This result has been recently refined in [25], where a PSPACE-hardness proof is given for timed automata with two clocks. 5.2. The case of “simply-timed” timed automata. The previous complexity result holds for timed automata with two clocks or more. For simpler systems – for instance, for systems with a single clock – this result can be improved. Of course, the same set of regions cannot be used, because even though there is only one clock, the number of regions given in Lemma 4.7 remains exponential, due to the binary encoding of constants in the timed automaton. However, we can choose a smaller and coarser set of regions which yields the following result [31]. Proposition 5.3. The reachability problem for timed automata with a single clock is NLOGSPACE-complete.

Proof. NLOGSPACE-hardness follows from that of reachability in finite graphs, see [28]. The NLOGSPACE membership can be obtained using a coarser set of regions than that presented in § 4.3. Given a finite set C of constraints over a single clock x, we define the set of constants {c ∈ N | (x ⋈ c) ∈ C for some ⋈} ∪ {0}, and we assume that this set is ordered: {c_0 < c_1 < c_2 < ⋯ < c_p}. We define the partition R_C as the (finite) set of intervals of one of the forms: (i) {c_i} with 0 ≤ i ≤ p, or (ii) (c_i, c_{i+1}) with 0 ≤ i < p, or (iii) (c_p, +∞). It is not hard to prove that R_C is a set of regions for {x} and C. The size of R_C is polynomial in the size of C, which yields a polynomial-size region automaton, thus proving the expected result.

5.3. Further applications of the region automaton abstraction. We have seen that the region automaton abstraction can be used for verifying reachability properties. It can actually be used for verifying many more properties; more precisely, it can be used for verifying all properties that are invariant by time-abstract bisimulation. This is the case for safety properties, ω-regular properties, and untimed properties expressed in LTL

1282

Patricia Bouyer

(see [35]) or CTL (see [21] and [36]). 9 However, this construction cannot be directly used to verify properties expressed in a timed temporal logic like TCTL [3] because a property like “reaching a state in exactly 5 units of time” is not invariant by time-abstract bisimulation. For these properties a refined construction is required, which uses an extra clock for the formula. We do not develop this construction here, but refer instead to the original articles on the subject [3].
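The coarse one-clock partition R_C used in the proof of Proposition 5.3 in § 5.2 is simple enough to compute explicitly. The following sketch is illustrative only and not part of the chapter; the tuple encoding of intervals and all names are ad hoc choices.

```python
# Illustrative sketch: the one-clock partition R_C of the proof of Proposition 5.3.
from fractions import Fraction

def one_clock_regions(constants):
    """Intervals {c_i}, (c_i, c_{i+1}) and (c_p, +oo); 0 is always included."""
    cs = sorted(set(constants) | {0})
    regions = []
    for i, c in enumerate(cs):
        regions.append(('point', c))                   # the singleton {c_i}
        if i + 1 < len(cs):
            regions.append(('open', c, cs[i + 1]))     # the interval (c_i, c_{i+1})
    regions.append(('open', cs[-1], None))             # the unbounded interval (c_p, +oo)
    return regions

def region_of(value, regions):
    """Locate the region of a clock value; the intervals partition the time domain."""
    for r in regions:
        if r[0] == 'point' and value == r[1]:
            return r
        if r[0] == 'open' and value > r[1] and (r[2] is None or value < r[2]):
            return r
    raise ValueError('not a partition')

R = one_clock_regions([1, 3])        # constants appearing in the constraints, plus 0
print(len(R))                        # -> 6: {0}, (0,1), {1}, (1,3), {3}, (3,+oo)
print(region_of(Fraction(5, 2), R))  # -> ('open', 1, 3)
```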

6. The language-theoretic perspective In the previous section we presented the region automaton abstraction, which can be used to model-check several kinds of simple properties, such as reachability properties. From a language perspective, this means that the emptiness problem is decidable for timed automata. In this section we further study language-theoretic properties of timed languages accepted by timed automata, and show, in particular, some negative results. 6.1. Boolean operations. Closure under Boolean operations is a basic property that is interesting for modelling and verification reasons. Proposition 6.1. The class of timed languages accepted by timed automata is closed under finite union and finite intersection. Sketch of proof. Closure under finite union is rather straightforward by taking the disjoint union of all timed automata. Closure under finite intersection follows the lines of the standard product construction used in the case of finite automata. Only clock constraints, invariants, and resets of clocks need be carefully handled. We illustrate the general construction with the intersection of two timed automata A1 D .L1 ; L1;0 ; L1;F ; X1 ; †; T1 ; Inv1 / and A2 D .L2 ; L2;0 ; L2;F ; X2 ; †; T2 ; Inv2 / over a single alphabet †. We assume that the two sets of clocks X1 and X2 are disjoint (otherwise we rename clocks so that this is actually the case). Then we define the timed automaton A D .L; L0 ; LF ; X; †; T; Inv/ as follows:  L D L1  L2 , L0 D L1;0  L2;0 , LF D L1;F  L2;F ;  X D X1 [ X2 (disjoint union);

 the set T is composed of transitions of the form .`1 ; `2 /

whenever there exist two transitions `1 in T2 such that – g D g1 ^ g2 ; – Y D Y1 [ Y2 ;  Inv..`1 ; `2 // D Inv1 .`1 / ^ Inv2 .`2 /.

g1 ;a;Y1

g;a;Y

! `01 in T1 and `2

! .`01 ; `02 / g2 ;a;Y2

! `02

9 Note that if we require infinite behaviours to be non-Zeno, that is, time-diverging, properties are no longer invariant by time-abstract bisimulation, but the region construction can nevertheless be used [5]. We refer to [37] for interesting developments on the detection of Zeno behaviours in timed automata.
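The product construction described in the proof of Proposition 6.1 can be written down directly. The sketch below is illustrative only and does not claim to be the chapter's formal construction; guards and invariants are encoded as lists of atomic constraints (so that conjunction is list concatenation), the clock sets are assumed disjoint, and all dictionary keys and names are ad hoc choices.

```python
# Illustrative sketch of the intersection (product) of two timed automata.
from itertools import product

def intersection(A1, A2):
    """Each automaton is a dict with keys 'locations', 'initial', 'final', 'clocks',
    'alphabet', 'transitions' (source, guard, action, resets, target) and 'inv'."""
    locations = set(product(A1['locations'], A2['locations']))
    transitions = []
    for (l1, g1, a, Y1, m1) in A1['transitions']:
        for (l2, g2, b, Y2, m2) in A2['transitions']:
            if a == b:                                  # synchronise on the common action
                transitions.append(((l1, l2), g1 + g2, a, Y1 | Y2, (m1, m2)))
    return {
        'locations': locations,
        'initial': set(product(A1['initial'], A2['initial'])),
        'final': set(product(A1['final'], A2['final'])),
        'clocks': A1['clocks'] | A2['clocks'],
        'alphabet': A1['alphabet'] | A2['alphabet'],
        'transitions': transitions,
        'inv': {(l1, l2): A1['inv'][l1] + A2['inv'][l2] for (l1, l2) in locations},
    }
```

As in the proof, the only delicate points are the conjunction of guards and invariants and the union of the reset sets; everything else is the classical product of finite automata.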

34. Timed automata

1283

It is straightforward to prove that a timed word is accepted by A if and only if it is both accepted by A1 and A2 . For modelling purposes, closure under intersection is too strong, and it is better to use parallel composition (the difference arises from the fact that some actions can be performed independently by the various components of the system). However it is not hard to modify the previous construction and prove that timed automata are also closed under parallel composition. The following proposition is, to the contrary, rather bad news. Proposition 6.2. The class of timed languages accepted by timed automata is not closed under complementation. The most well-known timed automaton that cannot be complemented, already given in [5], is depicted in Figure 12. a

`0

a a; x WD 0

`1

a x D 1; a

`2

Figure 12. A non-complementable timed automaton

This automaton, over the single-letter alphabet ¹aº, recognises the timed language

¹.a; t1 /.a; t2 /    .a; tn / j n > 2 and there exist 1 6 i < j 6 n with tj

ti D 1º.

Intuitively, to be recognised by a timed automaton, the complement of this timed language would require an unbounded number of clocks, because for any action a, we need to check that there is no a-action one time unit later, so, intuitively, a fresh clock is required. However the complete proof is rather technical and annoying [13], and we do not provide it here. An alternative and elegant proof of the above proposition was proposed in [8], and this is the one we present here. Proof. We consider the timed automaton in Figure 13. It accepts the following timed language over the alphabet ¹a; bº: ¹.˛1 ; t1 /    .˛n ; tn / j n > 1; 91 6 i 6 n s.t. ˛i D a and 8i < j 6 n; tj

ti ¤ 1º:

x (the complement of L) can be recognised by We assume, to get a contradiction, that L a timed automaton. It is not hard to get convinced that the timed language over ¹a; bº: L0 D ¹.aC b  ; / j all a0 s happen before 1 and no two a0 s simultaneouslyº

is accepted by the timed automaton in Figure 14. x \ L0 is accepted by some timed automaton. The Hence by Proposition 6.1, L 0 x following lemma is just a matter of expanding and manipulating the definition of L\L .

Patricia Bouyer

1284

x ¤ 1; a; b

a; b a; x WD 0

`0

`1

Figure 13. Another non-complementable timed automaton

y > 0; x < 1; a; y WD 0 `0

y > 0; x < 1; a

b

`1

Figure 14. Timed automaton for L0

x \ L0 is the non-regular language Lemma 6.3. The untiming of L ¹an b m j n 2 N; n > 1 and m > nº: x \ L0 is accepted by some This lemma yields a contradiction with the fact that L 0 x timed automaton, say B, because the untiming of L \ L would be recognised by the region automaton of B. Hence we conclude that the complement of L is not recognised by any timed automaton.

6.2. The universality and inclusion problems. The universality problem asks, given a timed automaton A, whether A accepts all (finite) timed words. The inclusion problem asks, given two timed automata A and B, whether all timed words accepted by B are also accepted by A; that is, whether L.B/  L.A/. Note that the universality problem is a special instance of the inclusion problem, where B is universal, i.e., accepts all (finite) timed words. The following result [5] is bad news in the context of verification, as argued in § 3. Theorem 6.4. The universality problem is undecidable for timed automata. Proof. We encode the halting problem for a two-counter machine as a universality problem of a timed automaton. Let M be a deterministic two-counter machine. We assume that Q is the set of states of M. A configuration of M is a triple .q; c; d / where q 2 Q is the current state, c 2 N is the value of the first counter, and d 2 N is the value of the second counter. We encode a finite execution .q0 ; c0 ; d0 / ! .q1 ; c1 ; d1 / !    ! .qn ; cn ; dn /

of M as a finite timed word w over the alphabet † D Q [ ¹c; d º such that

34. Timed automata

1285

1. Untime.w/ D q0 :c c0 :d d0 :q1 :c c1 :d d1    qn :c cn :d dn ; 2. no two events happen at the same date, and we write c0 d0 1 1 t0 < c;0 <    < c;0 < d;0 <    < d;0 < t1 <    < tn dn 1 cn 1 < c;n <    < c;n < d;n <    < d;n

for the corresponding increasing sequence of dates; 3. event qi happens at time i (ti D i ); 4. states faithfully follow instructions of M:  if the instruction starting in qi D q is of the form “q: if c D 0 then goto q 0 else c WD c 1; goto q 00 ” – if ci D 0, then qi C1 D q 0 ; – if ci > 0, then qi C1 D q 00 I  if the instruction starting in qi D q is of the form “q: c WD c C 1; goto q 0 ” 0 then qi C1 D q ; 5. depending on the nature of the instruction starting in location qi , we distinguish several rules: i. if the instruction does not change the first counter (this is for instance the case when this is the second counter which is updated), then ci C1 D ci j j and, c;i C1 D 1 C c;i for all 1 6 j 6 ci ; ii. if the instruction increments the first counter, then ci C1 D ci C 1, and j j c;i C1 D 1 C c;i for all 1 6 j 6 ci ; iii. if the instruction checks that the value of the first counter is positive and j j then decrements it, then ci C1 D ci 1, and c;i C1 D 1 C c;i for all 1 6 j 6 ci 1. Similarly for the second counter. The constraints are illustrated in Figure 15. We use c C C (resp. c denote an incrementation (resp. decrementation) of counter c . D 1 t.u. c q0

D 1 t.u. c c

q1

q2

cCC

cCC

) as macros to

D 1 t.u.

c c d

c ccd

q3

q4

d CC

cCC

c c d q5 c

c c dd q6

d CC

c q7

c

Figure 15. Encoding of the two-counter machine as a timed word

We construct a timed automaton that accepts all timed words over the alphabet † that are not encodings of (finite) halting executions of M. This automaton is a kind of test automaton [2] for those timed words. This automaton is “highly” non-deterministic and deniesone-by-one the conditions for a timed word to be an encoding of a halting

Patricia Bouyer

1286

execution of M (an execution as above is halting whenever it starts in the initial state and ends in the halting state of M). The first condition for being an encoding of an execution of M, as well as the characteristics for being a halting execution, can be denied by a(n untimed) finite automaton. The second condition can be denied by a simple timed automaton which performs two actions in 0-delay. The third and fourth conditions can also be denied by simple timed automata. We now explain how we can deny condition 5 (i). Similar reasoning can be given for all the other conditions, but we will omit it in these notes. Condition 5 (i) is made of two main constraints, and we deny each of them separately.  It says that every c that happens within the interval .i; i C 1/ must be followed one time unit later by another c . There are two ways of denying it: – either there is a c in .i; i C 1/ which is not followed by any action one time unit later, – or there is a c in .i; i C 1/ which is followed by some action one time unit later, but this is not a c as expected. A gadget to test for the above is given in Figure 16. Whenever the current instruction starts from state q , the automaton non-deterministically guesses a c (of the current configuration), such that either there is no action one time unit later (first branch), or there is an action, but it is not a c (second branch). On the picture, :c means “any action in † except c ,” and no action label on a transition means “any action in †.” c

x1

xD1 :c

Figure 16. Denies that every c in .i; i C 1/ is followed by a c 1 t.u. later

 It says that every c that happens within .i C 1; i C 2/ must be preceded one time unit earlier by another c . As in the previous case, there are two ways of denying it: – either there is a c in .i C 1; i C 2/ which is not preceded by any action one time unit earlier, – or there is a c in .i C 1; i C 2/ which is preceded by an action one time unit earlier, but this is not a c .

34. Timed automata

1287

The construction is illustrated in Figure 17 (:Q means “any action which does not belong to Q”). There are two possibilities: either (first automaton in the picture) it non-deterministically guesses two consecutive letters within the interval .i; i C 1/ that happen at dates, say t and t 0 , and checks that there is a c within the interval .t C 1; t 0 C 1/; hence not one time unit after another c (there is no action in the interval .t; t 0 /); or (second automaton in the picture) it non-deterministically guesses an action which is not a c and checks that there is a c one time unit later. We omit all other cases, but the reader is invited to take one of the remaining conditions and to construct the corresponding gadget as an exercise.

:Q; x WD 0 q x WD 0

y WD 0

c x > 1; y < 1

:Q q

:c x WD 0

c xD1

Figure 17. Denies that every c in .i C 1; i C 2/ is preceded 1 t.u. earlier by a c

The following is a straightforward corollary of the initial observation that the universality problem is a special instance of the inclusion problem. Corollary 6.5. The inclusion problem is undecidable for timed automata. It is interesting to notice that the reduction used in the above proof builds a timed automaton with two clocks. And actually, the universality problem (and also the inclusion problem) is decidable – though non-primitive recursive – for single-clock timed automata, see [1]. Recent developments have considered alternating timed automata – a natural extension of timed automata with alternations), see [32] and [34] – but Theorem 6.4 implies that the emptiness problem is already undecidable for alternating timed automata for two clocks. 6.3. Timed automata and determinism. In the context of formal languages, determinism is a standard and central notion that expresses that for a word there is at most one execution which reads that word. For regular languages, determinism does not restrict the recognition of languages, but for context-free languages, it does [27]. In this section, we discuss the issue of determinism in the context of timed automata, which gives some explanation for the previous negative results.

1288

Patricia Bouyer

6.3.1. The class of deterministic timed automata. We give a syntactic definition of determinism in timed automata (with no invariants, for simplicity). A timed automaton A D .L; L0 ; LF ; X; †; T / is deterministic whenever L0 is a singleton, and for every ` 2 L and a 2 †, it holds that .`; g1 ; a; Y1 ; `1 / 2 T and .`; g2 ; a; Y2 ; `2 / 2 T imply Jg1 ^ g2 KX D ;. This notion extends the standard notion of determinism in finite automata in a natural way. In a deterministic timed automaton, for every timed word, there is at most one run that reads that timed word from a given state. Example 6.1. The timed automaton in Figure 1 (see p. 1265) is deterministic. From location “alarm,” there are two outgoing transitions, but the constraints labelling those two transitions are disjoint. From the other locations, there is only one outgoing transition. On the other hand, the timed automata in Figures 12 and 13 are not deterministic. In the first one, there is a non-deterministic choice from location `1 , but it can be removed by strengthening the constraint on the self-loop (adding one self-loop with the constraint x < 1 and another one with the constraint x > 1). There is another non-deterministic choice from location `0 , and this one cannot be removed (note that it is in general not obvious to see whether a non-deterministic choice can be removed or not!): it is not possible to predict when there is an a that is followed one time unit later by another a. Deterministic timed automata form a strict subclass of timed automata.10 Using a product construction, as done in the proof of Proposition 6.1 for intersection, it is easy to convince oneself that this subclass is closed under finite union and finite intersection. On the other hand, the two timed automata we have given to illustrate the non-closure under complementation of the class of standard timed automata (Proposition 6.2) are not deterministic. And actually it is not very hard to be convinced that the class of deterministic timed automata is closed under complementation: add a sink location, add transitions to that sink from every location, with constraints complementing the union of all the constraints labelling the outgoing transitions from that location, and finally swap final and non-final locations. As a consequence, it is not possible to construct deterministic timed automata that accept the same languages as the two timed automata in Figures 12 and 13. And it is even possible to prove that it is not possible to decide whether a timed automaton can be made deterministic 11 or not, see [39] and [26]. Finally, it is interesting to mention that the reduction proving the undecidability of the universality problem (proof of Theorem 6.4) builds a non-deterministic timed automaton. And indeed the universality problem (and the inclusion problem) is decidable for the class of deterministic timed automata: to check the universality of a given deterministic timed automaton A, first build a (deterministic) timed automaton that accepts the complement of L.A/, and then check for emptiness of this automaton. 10 The strictness is obvious at the syntactic level, and also holds at the semantic level, as will be argued later: there exists a timed automaton such that no deterministic timed automata accepts the same timed language. 11 That is, whether we can construct another timed automaton, which is deterministic, and which recognises the same timed language.
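The syntactic condition of determinism can be checked mechanically. The sketch below is illustrative only, not the chapter's algorithm, and is restricted to diagonal-free guards given as conjunctions of non-strict bounds c_1 ≤ x ≤ c_2 (strict bounds are omitted only to keep it short), so that satisfiability of g_1 ∧ g_2 reduces to a clock-by-clock interval-intersection test; all encodings and names are ad hoc.

```python
# Illustrative sketch: syntactic determinism test for interval (diagonal-free) guards.
# A guard maps each constrained clock to (low, high); high may be None (no upper bound).
def guards_overlap(g1, g2):
    """Is the conjunction of the two interval guards satisfiable?"""
    for x in set(g1) | set(g2):
        lo1, hi1 = g1.get(x, (0, None))
        lo2, hi2 = g2.get(x, (0, None))
        lo = max(lo1, lo2)
        hi = hi1 if hi2 is None else hi2 if hi1 is None else min(hi1, hi2)
        if hi is not None and lo > hi:
            return False
    return True

def is_deterministic(initial, transitions):
    """transitions: (source, guard, action, resets, target); guards as interval dicts."""
    if len(initial) != 1:
        return False
    for i, (l1, g1, a1, _, _) in enumerate(transitions):
        for (l2, g2, a2, _, _) in transitions[i + 1:]:
            if l1 == l2 and a1 == a2 and guards_overlap(g1, g2):
                return False
    return True

# Two a-labelled transitions from the same location with disjoint interval guards.
print(is_deterministic({'l0'},
                       [('l0', {'x': (0, 1)}, 'a', set(), 'l1'),
                        ('l0', {'x': (2, 3)}, 'a', set(), 'l2')]))   # -> True
```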

34. Timed automata

1289

6.3.2. Determinisable classes of timed automata. As mentioned in the previous paragraph, not all timed automata can be determinised. Classes of automata which can be determinised (called determinisable) are of high interest for verification purpose, since they enjoy nice closure and decidability properties. One of the first determinisable classes of timed automata which have been investigated is the class of event-recording timed automata [6]. In such an automaton, every letter of the alphabet is associated with an event-recording clock, which measures delays since the last occurrence of this action. In the syntax of event-clock automata, resets of clocks are omitted as they are implicitly given by actions. Example 6.2. In Figure 18, we give an event-clock timed automaton; we make the convention that xa is the event-recording clock associated with a. In this automaton, b

`0

a

`1

xa D 1; b

`2

xa : event-recording clock for a

Figure 18. An event-clock timed automaton

the time between the last b and the unique initial a is precisely one time unit (specified with the constraint on the last transition xa D 1): when the last b is performed, it is checked that the last a was precisely one time unit earlier. It is shown in [6] that event-recording automata are (effectively) determinisable, in the sense that, given such an automaton, a standard deterministic timed automaton recognising the same timed language can be constructed. We give the intuition why an event-recording timed automaton can be determinised. The reason is that the timed behaviour of those automata is input-determined: given a timed word, the value of the clocks after each prefix of the timed word is determined by that prefix, and not by the run followed in the timed automaton. For that reason a subset construction can be done. This kind of arguments has later been used for more complex classes of timed systems [24]. Recently more determinisable classes of timed automata have been investigated (see [10] and [12]), among which we can find the class of so-called strongly non-Zeno timed automata (we omit the definition of this class here, but basically it enforces time elapsing in a rather strong way) or more dedicated classes, e.g., corresponding to logical formalisms (see [33]) or to simpler classes of timed systems (see [38]).

7. Conclusion and current developments In this chapter, we have presented the model of timed automata, and given the basic and fundamental properties of that model. Even though there is some negative news for

1290

Patricia Bouyer

the use of timed automata in verification (undecidability of the inclusion problem), this field of research has greatly expanded in the last thirty years, and is still very active, both from a theoretical and a practical point of view. We mention here some of the current developments.  Quantitative extensions of timed automata to answer the need for models for complex systems, see [30] and [16].  Verification of real-time properties expressed in some timed extensions of standard temporal logics like CTL or and LTL. As mentioned in § 5.3, the verification of TCTL, the timed extension of CTL, can be done using a refinement of the region automaton abstraction, see [3]. On the contrary, the verification of properties expressed as timed extensions of LTL is unfortunately very hard (see [7] and [34]), prompting the quest for tractable fragments. We refer to [17] for a recent survey.  Development of multi-agent timed systems, with the development of timed games, see [9], [20], [19], and [14].  Implementability and robustness of timed systems: timed automata are an idealised mathematical model which assumes perfect and infinitely precise clocks, instantaneous communication, etc. Those assumptions are unfortunately not realistic, and the correctness of systems proven on models can often not be transferred to real systems. This has naturally led to the question of the implementability of timed systems (“can a correct system be implemented on a processor?”) [23] and to the development of robust model-checking and control, which ensures correctness of the system even under slight perturbations of clock evolution. We refer to [18] for a survey.

References [1] P. A. Abdulla, J. Deneux, J. Ouaknine, and J. Worrell, Decidability and complexity results for timed automata via channel machines. In Automata, languages and programming (L. Caires, G. F. Italiano, L. Monteiro, C. Palamidessi, and M. Yung, eds.). Proceedings of the 32nd International Colloquium (ICALP 2005) held in Lisbon, July 11–15, 2005. Lecture Notes in Computer Science, 3580. Springer, Berlin, 2005, 1089–1101. MR 2184703 Zbl 1085.68078 q.v. 1287 [2] L. Aceto, P. Bouyer, A. Burgueño, and K. G. Larsen, The power of reachability testing for timed automata. Theoret. Comput. Sci. 300 (2003), no. 1–3, 411–475. MR 1976188 Zbl 1023.68060 q.v. 1285 [3] R. Alur, C. Courcoubetis, and D. L. Dill, Model-checking in dense real-time. Inform. and Comput. 104 (1993), no. 1, 2–34. Selected papers from the 1990 IEEE Symposium Logic in computer science. MR 1221370 Zbl 0783.68076 IEEEXplore 113766 q.v. 1266, 1282, 1290

34. Timed automata

1291

[4] R. Alur and D. L. Dill, Automata for modeling real-time systems. In Automata, languages and programming (M. Paterson, ed.). Proceedings of the Seventeenth International Colloquium held at the University of Warwick, Coventry, July 16–20, 1990. Lecture Notes in Computer Science, 443. Springer, New York, 1990, 322–335. MR 1076829 Zbl 0765.68150 q.v. 1261, 1262, 1266, 1279 [5] R. Alur and D. L. Dill, A theory of timed automata. Theoret. Comput. Sci. 126 (1994), no. 2, 183–235. MR 1271580 Zbl 0803.68071 q.v. 1261, 1264, 1266, 1279, 1282, 1283, 1284 [6] R. Alur, L. Fix, and T. A. Henzinger, A determinizable class of timed automata. In Computer aided verification (D. L. Dill, ed.). Proceedings of the Sixth International Conference (CAV ’94) held at Stanford University, Stanford, California, June 21–23, 1994. Lecture Notes in Computer Science, 818. Springer, Berlin, 1994, 1–13. MR 1323433 q.v. 1289 [7] R. Alur and T. A. Henzinger, Real-time logics: complexity and expressiveness. Inform. and Comput. 104 (1993), no. 1, 35–77. Selected papers from the 1990 IEEE Symposium Logic in computer science. MR 1221371 Zbl 0791.68103 IEEEXplore 113764 q.v. 1290 [8] R. Alur and P. Madhusudan, Decision problems for timed automata: a survey. In Formal methods for the design of real-time systems (M. Bernardo and F. Corradini, eds.). International school on formal methods for the design of computer, communication and software systems, SFM-RT 2004. Held in Bertinoro, Italy, September 13–18, 2004. Revised lectures. Lecture Notes in Computer Science, 3185. Springer, Berlin, 122–133. Zbl 1105.68057 q.v. 1283 [9] E. Asarin, O. Maler, A. Pnueli, and J. Sifakis, Controller synthesis for timed automata. In 5 th IFAC Conference on System Structure and Control (J.-F. Lafay, ed.). SSC ’98. Held in Nantes, France, July 8–10, 1998. FAC Proceedings Volumes, 31, no. 18. Elsevier, Oxford, U.K., and Burlingon, MA, 447–452. q.v. 1266, 1290 [10] C. Baier, N. Bertrand, P. Bouyer, and T. Brihaye, When are timed automata determinizable? In Automata, languages and programming (S. Albers, A. Marchetti-Spaccamela, Y. Matias, S. E. Nikoletseas, and W. Thomas:, eds.). Part II. Proceedings of the 36th International Colloquium (ICALP 2009) held on Rhodes, July 6–10, 2009. Lecture Notes in Computer Science, 5556. Springer, Berlin, 2009, 43–54. MR 2544783 Zbl 1248.68284 q.v. 1289 [11] G. Behrmann, A. David, K. G. Larsen, J. Håkansson, P. Pettersson, W. Yi, and M. Hendriks, UPPAAL 4.0. In Third International Conference on the Quantitative Evaluation of Systems. QEST ’06. Held in Riverside, CA, September 11–14, 2006. IEEE Press, Los Alamitos, CA, 125–126. IEEEXplore 1704000 q.v. 1262 [12] N. Bertrand, A. Stainer, J. Thierry, and M. Krichen, A game approach to determinize timed automata. In Foundations of software science and computational structures (M. Hofmann, ed.). Proceedings of the 14th International Conference (FOSSACS 2011) held as part of the Joint European Conferences on Theory and Practice of Software (ETAPS 2011) in Saarbrücken, March 26–April 3, 2011. Lecture Notes in Computer Science, 6604. Springer, Berlin, 2011, 245–259. MR 2813614 Zbl 1326.68173 q.v. 1289 [13] P. Bouyer, Automates temporisés et modularité. Master’s thesis. DEA Algorithmique. ENS Cachan, Paris, 1998. q.v. 1283 [14] P. Bouyer, R. Brenguier, and N. Markey, Nash equilibria for reachability objectives in multi-player timed games. In CONCUR 2010 – concurrency theory (P. Gastin and F. Laroussinie, eds.). Proceedings of the 21st International Conference, CONCUR 2010. 
Held in Paris, France, August 31–September 3, 2010. Lecture Notes in Computer Science, 6269. Springer, Berlin, 2010, 192–206. MR 2726752 Zbl 1287.68123 q.v. 1290

1292

Patricia Bouyer


Chapter 35

Higher-order recursion schemes and their automata models

Arnaud Carayol and Olivier Serre

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1295
2. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1300
3. From CPDA to recursion schemes . . . . . . . . . . . . . . . . . . . . 1316
4. From recursion schemes to collapsible pushdown automata . . . . . . . 1322
5. Safe higher-order recursion schemes . . . . . . . . . . . . . . . . . . . 1334
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336

1. Introduction

The main goal of this chapter is to give a self-contained presentation of the equivalence between two models: higher-order recursion schemes and collapsible pushdown automata. Roughly speaking, a recursion scheme is a finite typed term rewriting system, and a natural view of recursion schemes is to consider them as generators for (possibly infinite) trees. Collapsible pushdown automata (CPDA) are an extension of deterministic (higher-order) pushdown automata, and they naturally induce labelled transition systems (Lts). An Lts is merely a set of relations labelled by a finite alphabet, together with a distinguished element called the root. Hence, unfolding an Lts and contracting silent transitions define an infinite tree. Applying this construction to CPDA defines a family of trees that exactly coincides with the family of trees defined by higher-order recursion schemes. This introduction tries to provide the necessary background and motivation for these objects.

Recursive applicative program schemes. Historically, recursion schemes go back to Nivat's recursive applicative program schemes [47], which correspond to order-1 recursion schemes in our sense (also see related work by Garland and Luckham on so-called monadic recursion schemes [30]). We refer the reader to [24] which, among other things, contains a very detailed and rich history of the topic. For Nivat, a recursive applicative program scheme is a finite system of equations, each of the form F_i(x_1, …, x_n) = p_i, where the x_j are order-0 variables and p_i is some order-0 term over the non-terminals (the F_i's), the terminals, and the variables x_1, …, x_n. In Nivat's work, a program is a pair: a program scheme together with an interpretation over some domain. An interpretation gives any terminal a function (of the correct rank) over the domain.


Taking the least fixed point of the rewriting rules of a program scheme gives a (possibly infinite) term over the terminal alphabet (known as the value of the program in the free/Herbrand interpretation); applying the interpretation to this infinite term gives the value of the program. Hence, the program scheme gives the uninterpreted syntax tree of some functional program that is then fully specified owing to the interpretation. Nivat also defined a notion of equivalence for program schemes: two schemes are equivalent if and only if they compute the same function under every interpretation. Later, Courcelle and Nivat [19] showed that two schemes are equivalent if and only if they generate the same infinite term tree. This latter result clearly underscores the importance of studying the trees generated by a scheme. Following the work by Courcelle (see [16] and [17]), the equivalence problem for schemes is known to be interreducible to the problem of decidability of language equivalence between deterministic pushdown automata (DPDA). Research on the equivalence of program schemes stalled until Sénizergues in [58] and [59] established the decidability of DPDA equivalence, which therefore also solved the scheme equivalence problem. Sénizergues' proof was later simplified and improved by Stirling in [62] and [60]. For more insight about this topic, we refer the reader to [61].

Extension of schemes to higher orders. A recursive function is said to be of higher order if it takes arguments that are themselves functions. In Nivat's program schemes, both the non-terminals and the variables have order 0. Therefore, they cannot be used to model higher-order recursive programs. In the late 1970s, there was a substantial effort in extending program schemes in order to capture higher-order recursion, see [35], [20], [21], [27], and [28]. Note that evaluation, i.e., computing the value of a scheme in some interpretation, has been a very active topic, in particular because different evaluation policies, e.g., call by name (OI) or call by value (IO), lead to different semantics, see [27], [28], and [22]. In a very influential paper [22], Damm introduced order-n schemes and extended the previously mentioned result of Courcelle and Nivat. Damm's schemes mostly coincide with the safe fragment of recursion schemes as we define them later in this chapter. Note that at that time there was no known model of automata equi-expressive with Damm's schemes; in particular, there was no known reduction of the equivalence problem for schemes to a language equivalence problem for (some model of) automata. Later, Damm and Goerdt in [22] and [23] considered the word languages generated by level-n schemes and showed that they coincide with a hierarchy previously defined by Maslov in [44] and [45]. To define his hierarchy, Maslov introduced higher-order pushdown automata (higher-order PDA). He also gave an equivalent definition of the hierarchy in terms of higher-order indexed grammars. In particular, Maslov's hierarchy offers an attractive classification of the semi-decidable languages: orders 0, 1, and 2 are, respectively, the regular, context-free, and indexed languages, though little is known about languages at higher orders (see [34] for recent results on this topic). Later, Engelfriet in [25] and [26] considered the characterisation of complexity classes by higher-order pushdown automata.
In particular, he showed that alternating pushdown automata characterise deterministic iterated exponential time complexity classes.


Higher-order recursion schemes as generators of infinite structures. Since the late 1990s there has been a strong interest in infinite structures admitting finite descriptions (either internal, algebraic, logical or transformational), mainly motivated by applications to program verification. See [5] for an overview about this topic. The central question is model-checking: given some presentation of a structure and some formula, decide whether the formula holds. Of course, here decidability is a trade-off between the richness of the structure and the expressivity of the logic. Of special interest are tree-like structures. Higher-order PDA as a generating device for (possibly infinite) labelled ranked trees were first studied by Knapik, Niwiński and Urzyczyn [37]. As in the case of word languages, an infinite hierarchy of trees is defined according to the order of the generating PDA; lower orders of the hierarchy are well-known classes of trees: orders 0, 1, and 2 are respectively the regular [53], algebraic [18], and hyperalgebraic trees [36]. Knapik et al. considered another method of generating such trees, namely by higher-order (deterministic) recursion schemes that satisfy the constraint of safety. A major result in their work is the equi-expressivity of both methods as tree generators. In particular, it implies that the equivalence problem for higher-order safe recursion schemes is interreducible to the problem of decidability of language equivalence between deterministic higher-order PDA. An alternative approach was developed by Caucal, who introduced [15] two infinite hierarchies, one made of infinite trees and the other made of infinite graphs, defined by means of two simple transformations: unfolding, which goes from graphs to trees, and inverse rational mapping (or MSO-interpretation [14]), which goes from trees to graphs. He showed that the tree hierarchy coincides with the trees generated by safe schemes. However the fundamental question open since the early 1980s of finding a class of automata that characterises the expressivity of higher-order recursion schemes was left open. Indeed, the results of Damm and Goerdt, as well as those of Knapik et al. may only be viewed as attempts to answer the question, as they have both had to impose the same syntactic constraints on recursion schemes, called of derived types and safety, respectively, in order to establish their results. A partial answer was later obtained by Knapik, Niwiński, Urzyczyn, and Walukiewicz, who proved that order-2 homogeneously-typed (but not necessarily safe) recursion schemes are equi-expressive with a variant class of order-2 pushdown automata called panic automata [38]. Finally, Hague, Murawski, Ong, and Serre gave a complete answer to the question in [32]. They introduced a new kind of higher-order pushdown automata, which generalises pushdown automata with links [2], or equivalently panic automata, to all finite orders, called collapsible pushdown automata (CPDA), in which every symbol in the stack has a link to a (necessarily lower-ordered) stack situated somewhere below it. A major result of their paper is that for every n > 0, order-n recursion schemes and order-n CPDA are equi-expressive as generators of trees. Decidability of monadic second order logic. This quest for finding an alternative description of those trees generated by recursion schemes took place in parallel with the study of the decidability of the model-checking problem for monadic second-order


logic (MSO) and the modal μ-calculus (see [63], [3], [31], and [29] for background about these logics and connections with finite automata and games). Decidability of the MSO theories of trees generated by safe schemes was established by Knapik, Niwiński and Urzyczyn [37], and then Caucal [15] proved a stronger decidability result that holds on graphs as well. The decidability for order-2 unsafe schemes follows from [38] and was obtained thanks to the equi-expressivity with panic automata. This result was independently obtained in [2] with similar techniques. In 2006, Ong showed the decidability of MSO for arbitrary recursion schemes [48], and established that this problem is n-EXPTIME complete. This result was obtained using tools from innocent game semantics (in the sense of Hyland and Ong [33]) and does not rely on an equivalent automata model for generating trees.

Thanks to their equi-expressivity result, Hague et al. provided an alternative proof of the MSO decidability for schemes. Indeed, thanks to the equi-expressivity between schemes and CPDA, together with the well-known connections between MSO model-checking (for trees) and parity games, the model-checking problem for schemes is interreducible to the problem of deciding the winner in a two-player perfect-information turn-based parity game played over the Lts (i.e., transition graph) associated with a CPDA. They extended the techniques and results of Walukiewicz (for pushdown games) [64], Cachat (for higher-order pushdown games) [11] (also see [12] for a more precise study of higher-order pushdown games), and those of Knapik et al. [38]. These techniques were later extended by Broadbent, Carayol, Ong, and Serre to establish stronger results on schemes, in particular closure under MSO marking [8], and later by Carayol and Serre to prove that recursion schemes enjoy the effective MSO selection property [13].

Some years later, following initial ideas by Aehlig [1], Kobayashi [40], and Kobayashi and Ong [43] gave another proof of the decidability of MSO. The proof consists of showing that one can associate, with any scheme and formula, a typing system (based on intersection types) such that the scheme is typable in this system if and only if the formula holds. Typability is then reduced to solving a parity game. Using the λY-calculus and Krivine machines, Salvati and Walukiewicz proposed an alternative approach for the decidability of MSO, as well as a new proof of the equivalence between schemes and CPDA, see [55] and [56]. In particular, the translation from schemes to CPDA is very similar to the one that we present in this chapter and was independently obtained by the authors in [13]. Recently, Parys established the decidability of weak MSO logic extended by the unbounding quantifier (WMSO+U) for schemes [52].

Verification of higher-order programs. Functional languages such as Haskell, OCaml and Scala strongly encourage the use of higher-order functions. This represents a challenge for software verification, which usually does not model recursion accurately, or models only first-order calls (e.g., SLAM [4] and Moped [57]). However, higher-order recursion schemes offer a way of abstracting functional programs in


a manner that precisely models higher-order control flow, and because of the μ-calculus/MSO decidability results for them, it opened a very active line of research toward the verification of higher-order programs. Even reachability properties (subsumed by the μ-calculus) are very useful in practice: indeed, as a simple example, the safety of incomplete pattern-matching clauses could be checked by asking whether the program can reach a state where a pattern-match failure occurs. More complex reachability properties can be expressed using a finite automaton and could, for example, specify that the program respects a certain discipline when accessing a particular resource (see [42] for a detailed overview of the field).

Despite even reachability being (n−1)-EXPTIME complete, recent research has revealed that useful properties of HORS can be checked in practice. Kobayashi's TRecS [39] tool, which checks properties expressible by a deterministic trivial Büchi automaton (all states accepting), was the first to demonstrate that model-checking of schemes was possible in practice. It works by determining whether a HORS is typable in an intersection type system characterising the property to be checked [42]. In a bid to improve scalability, a number of other algorithms have subsequently been designed and implemented, such as Kobayashi's GTRecS(2) [41] and Neatherway, Ramsay, and Ong's TravMC [46] tools, all based on intersection type inference. Another approach, providing a fresh set of tools that contrast with previous intersection type techniques, was developed by Broadbent, Carayol, Hague and Serre, relying on an automata-theoretic perspective [9]. Their idea is to start from a recursion scheme, translate it to an equivalent CPDA, and then perform the verification on the latter. In order to avoid state explosion, they used saturation methods (which were well known to work successfully for pushdown systems [57]) together with an initial forward analysis. This led to the C-SHORe tool, which is the first model-checking tool for the (direct) analysis of collapsible pushdown systems. Since C-SHORe was released, two new tools have been developed. Broadbent and Kobayashi introduced HorSat (later subsumed by HorSat2), which is an application of the saturation technique and initial forward analysis directly to intersection type analysis of recursion schemes [10]. Secondly, Ramsay, Neatherway and Ong introduced Preface [54], using a type-based abstraction-refinement algorithm that attempts to simultaneously prove and disprove the property of interest. Both HorSat2 and Preface perform significantly better than previous tools.

Structure of this chapter. Higher-order recursion schemes are a very rich domain, and we had to make some choices for both the presentation and the content of this chapter. We decided to devote a large part to the equi-expressivity result between recursion schemes and collapsible pushdown automata. Indeed, it was a long-standing open question in the field; it made it possible to provide an automata-based proof of the decidability of MSO for recursion schemes; and it gives a tool to whoever wants to tackle the equivalence problem for recursion schemes (which is interreducible to language equivalence for deterministic CPDA). The presentation of the proof we give is novel and can be thought of as a simplification of the original proof in [32]. First, it introduces an alternative definition of schemes, called labelled recursion schemes, by means of labelled transition


systems. In these labelled transition systems, the domain is composed of the ground terms built using the non-terminals of the scheme; the relations come from the rewriting rules of the scheme and are labelled by terminals. Second, it presents a transformation from a recursion scheme to a CPDA which only uses basic automata techniques and does not appeal to objects from game semantics such as traversals. Nevertheless, it is important to stress that, even if concepts like traversals are no longer present in our proof, the key ideas come from [32], and the CPDA one derives from a scheme is the same as the one defined in [32].

The chapter is organised as follows. § 2 introduces the main concepts (schemes and CPDA) together with examples. Then in § 3 we give a transformation from CPDA to schemes, and in § 4 we provide the converse transformation. Finally, § 5 is devoted to the notion of safety.

2. Preliminaries

2.1. Trees and terms. Let A be a finite alphabet. We let A* denote the set of finite words over A, and we refer to a subset of A* as a language over A. A tree t with directions in A (or simply a tree if A is clear from the context) is a non-empty prefix-closed subset of A*. Elements of t are called nodes and ε is called the root of t. For a node u ∈ t, the subtree of t rooted at u, denoted t_u, is the tree { v ∈ A* | u·v ∈ t }. We let Trees_∞(A) denote the set of trees with directions in A.

A ranked alphabet A is an alphabet together with an arity function ϱ: A → ℕ. The terms built over a ranked alphabet A are those trees with directions in

  Ã = ⋃_{f ∈ A} f̃,  where f̃ = {f_1, …, f_{ϱ(f)}} if ϱ(f) > 0, and f̃ = {f} if ϱ(f) = 0.

For a tree t ∈ Trees_∞(Ã) to be a term, we require, for all nodes u, that the set A_u = { d ∈ Ã | ud ∈ t } is empty if and only if u ends with some f ∈ A (hence ϱ(f) = 0), and that, if A_u is non-empty, then it is equal to f̃ for some f ∈ A. We let Terms(A) denote the set of terms over A. For c ∈ A of arity 0, we let c denote the term {ε, c}. For f ∈ A of arity n > 0 and for terms t_1, …, t_n, we let f(t_1, …, t_n) denote the term {ε} ∪ ⋃_{i ∈ [1,n]} {f_i}·t_i. These notions are illustrated in Figure 1.
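To relate the two views of a term, here is a minimal OCaml sketch (the encoding and all names are ours, not the chapter's): it takes a finite term over a ranked alphabet and enumerates its set of nodes, i.e., the prefix-closed set of words over the directions f_1, …, f_{ϱ(f)} (and c itself for a constant c).

(* A minimal sketch, not from the chapter: finite terms over a ranked alphabet,
   with their node sets viewed as prefix-closed languages over the directions. *)
type term = Node of string * term list        (* a symbol applied to its arguments *)

(* directions of a symbol: f_1, ..., f_n for an arity-n symbol (n > 0), f itself for a constant *)
let directions f arity =
  if arity = 0 then [ f ]
  else List.init arity (fun i -> Printf.sprintf "%s_%d" f (i + 1))

(* the set of nodes of a term, i.e. the prefix-closed set of words over the directions *)
let rec nodes (Node (f, args)) =
  let dirs = directions f (List.length args) in
  [ [] ]                                        (* the root, i.e. the empty word *)
  @ (if args = [] then [ dirs ]                 (* a constant c contributes the single node c *)
     else List.concat (List.map2 (fun d t -> List.map (fun w -> d :: w) (nodes t)) dirs args))

(* example: the finite approximation f(c, f(c, c)) of the term of Figure 1 *)
let () =
  let c = Node ("c", []) in
  nodes (Node ("f", [ c; Node ("f", [ c; c ]) ]))
  |> List.iter (fun w -> print_endline (if w = [] then "ε" else String.concat " " w))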

2.2. Labelled transition systems. A rooted labelled transition system is an edge-labelled directed graph with a distinguished vertex, called the root. Formally, a rooted labelled transition system L (Lts for short) is a tuple ⟨D, r, Σ, (→^a)_{a∈Σ}⟩, where D is a finite or countable set called the domain, r ∈ D is a distinguished element called the root, Σ is a finite set of labels, and, for all a ∈ Σ, →^a ⊆ D × D is a binary relation on D.

Figure 1. Two representations of the infinite term f₂*·{f₁c, f₁, ε} = f(c, f(c, f(···))) over the ranked alphabet {f, c}, assuming that ϱ(f) = 2 and ϱ(c) = 0 (tree drawings omitted)

For any a ∈ Σ and any pair (s, t) ∈ D², we write s →^a t to indicate that (s, t) ∈ →^a, and we refer to it as an a-transition with source s and target t. For a word w = a_1 … a_n ∈ Σ*, we define a binary relation →^w on D by letting s →^w t (meaning that (s, t) ∈ →^w) if there exists a sequence s_0, …, s_n of elements of D such that s_0 = s, s_n = t, and, for all i ∈ [1, n], s_{i−1} →^{a_i} s_i. These definitions are extended to languages over Σ by taking, for all L ⊆ Σ*, the relation →^L to be the union of all →^w for w ∈ L.

When considering Lts associated with computational models, it is usual to allow silent (or internal) transitions. The symbol for silent transitions is usually ε but here, to avoid confusion with the empty word, we will use τ instead. Following [60], p. 31, we forbid a vertex to be the source of both a silent transition and a non-silent transition. Formally, an Lts with silent transitions is an Lts ⟨D, r, Σ, (→^a)_{a∈Σ}⟩ whose set of labels contains a distinguished symbol, denoted τ ∈ Σ, and such that, for all s ∈ D, if s is the source of a τ-transition, then s is not the source of any a-transition with a ≠ τ. We let Σ_{∖τ} denote the set Σ ∖ {τ} of non-silent transition labels. For all words w = a_1 … a_n ∈ (Σ_{∖τ})*, we let ⟹^w denote the relation →^{L_w}, where L_w = τ* a_1 τ* ⋯ τ* a_n τ* is the set of words over Σ obtained by inserting arbitrarily many occurrences of τ in w.

An Lts (with silent transitions) is said to be deterministic if, for all s, t_1 and t_2 in D and all a in Σ, if s →^a t_1 and s →^a t_2, then t_1 = t_2.

Caveat 2.1. From now on, we always assume that the Lts we consider are deterministic.

We associate a tree with every Lts with silent transitions L, denoted Tree(L), with directions in Σ_{∖τ}, reflecting the possible behaviours of L starting from the root. For this we let

  Tree(L) := { w ∈ (Σ_{∖τ})* | ∃ s ∈ D, r ⟹^w s }.

As L is deterministic, Tree(L) is obtained by unfolding the underlying graph of L from its root and contracting all τ-transitions. Figure 2 presents an Lts with silent transitions together with its associated tree Tree(L). As illustrated in Figure 2, the tree Tree(L) does not reflect the diverging behaviours of L (i.e., the ability to perform an infinite sequence of silent transitions). For instance, in the Lts of Figure 2, the vertex s diverges, whereas the vertex t does not. A more informative tree can be defined in which diverging behaviours are indicated by a ⊥-child for some fresh symbol ⊥. This tree, denoted Tree_⊥(L), is defined by letting

  Tree_⊥(L) := Tree(L) ∪ { w⊥ ∈ (Σ_{∖τ})*⊥ | ∀ n ≥ 0, r ⟹^{wτⁿ} s_n for some s_n }.

Figure 2. An Lts L with silent transitions of root r (on the left), the tree Tree(L) (in the centre) and the tree Tree_⊥(L) (on the right) (diagrams omitted)

2.3. Higher-order recursion schemes. Recursion schemes are grammars for simply typed terms, and they are often used to generate a possibly infinite term. Hence, before introducing recursion schemes, we start with some necessary definitions about simply typed terms. Also note that recursion schemes are not traditionally associated with an Lts. Hence we start with the standard definition of recursion schemes as generators for infinite terms, and then we provide an alternative definition based on Lts.

2.3.1. Simply typed terms. Types are generated by the grammar τ ::= o | τ → τ. Every type τ ≠ o can be uniquely written as τ_1 → (τ_2 → ⋯ (τ_n → o) ⋯), where n > 0 and τ_1, …, τ_n are types. The number n is the arity of the type and is denoted by ϱ(τ). To simplify the notation, we adopt the convention that the arrow is associative to the right and we write τ_1 → ⋯ → τ_n → o (or (τ_1, …, τ_n, o) to save space). Intuitively, the base type o corresponds to base elements (such as int in ML). An arrow type τ_1 → τ_2 corresponds to a function taking an argument of type τ_1 and returning an element of type τ_2. Even if there are no specific types for functions taking more than one argument, those functions are represented in their curried form. Indeed, a function taking two arguments of type o and returning a value of type o, in its curried


form, has the type o → o → o = o → (o → o); intuitively, the function only takes its first argument and returns a function expecting the second argument and returning the desired result.

The order measures the nesting of a type. Formally, one defines ord(o) = 0 and ord(τ_1 → τ_2) = max(ord(τ_1) + 1, ord(τ_2)). Alternatively, for a type τ = (τ_1, …, τ_n, o) of arity n > 0, the order of τ is the maximum of the orders of the arguments plus one, i.e., ord(τ) = 1 + max{ ord(τ_i) | 1 ≤ i ≤ n }.

Example 2.1. The type o → (o → (o → o)) has order 1, while ((o → o) → o) → o has order 3.
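For concreteness, here is a tiny OCaml sketch (our own code, purely illustrative) of types together with their arity and order, checking the two claims of Example 2.1.

(* A quick sketch (nothing beyond Section 2.3.1 is assumed) of simple types
   with their arity and order. *)
type ty = O | Arrow of ty * ty

let rec arity = function O -> 0 | Arrow (_, t2) -> 1 + arity t2

let rec ord = function
  | O -> 0
  | Arrow (t1, t2) -> max (ord t1 + 1) (ord t2)

let () =
  let t1 = Arrow (O, Arrow (O, Arrow (O, O))) in      (* o -> (o -> (o -> o)) *)
  let t2 = Arrow (Arrow (Arrow (O, O), O), O) in      (* ((o -> o) -> o) -> o *)
  assert (arity t1 = 3 && ord t1 = 1);
  assert (arity t2 = 1 && ord t2 = 3)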

Let X be a set of typed symbols. For every symbol f 2 X , and every type  , we write f W  to mean that f has type  . The set of applicative terms 1 of type  generated from X , denoted Terms .X /, is defined by induction over the following rules. If f W  is an element of X then f 2 Terms .X /; if s 2 Terms1 !2 .X / and t 2 Terms1 .X / then the applicative term obtained by applying t to s , denoted st , belongs to Terms2 .X /. For every applicative term t , and every type  , we write tW  to mean that t is an applicative term of type  . By convention, the application is considered to be left-associative, and thus we write t1 t2 t3 instead of .t1 t2 /t3 . Example 2.2. Assuming that f and g are two function symbols of respective types .o ! o/ ! o ! o and o ! o and c is a constant symbol of type o, we have gcW o;

fg : o → o,   fgc = (fg)c : o,   f(fg)c : o.

The set Subs.t/ of subterms of t is inductively defined by Subs.f / D ¹f º for f 2 X and Subs.t1 t2 / D Subs.t1 / [ Subs.t2 / [ ¹t1 t2 º. The subterms of the term f .f g/cW o in Example 2.2 are f .f g/c; f ; f g; f .f g/; c and g. A less permissive notion is that of argument subterms of t , denoted ASubs.t/, which only keep those subterms that appear as an argument. The set ASubs.t/ is inductively defined by letting ASubs.t1 t2 / D ASubs.t1 / [ ASubs.t2 / [ ¹t2 ºS and ASubs.f / D ¿ for f 2 X . In particular if n t D F t1 : : : tn , ASubs.t/ D i D1 .ASubs.ti / [ ¹ti º/. The argument subterms of f .f g/cW o are f g; c and g . In particular, for all terms t , one has jASubs.t/j < jtj.

Fact 1. Any applicative term t over X can be uniquely written as F t1 : : : tn where F is a symbol in X of arity %.F / > n and ti are applicative terms for all i 2 Œ1; n. Moreover if F has type .1 ; : : : ; %.F / ; 0/ 2 X , then for all i 2 Œ1; n, ti has type i and tW .nC1 ; : : : ; %.F / ; 0/. Remark 2.2. In the following, we will simply write “term” instead of “applicative term” and let Terms.X / denote the set of applicative terms of ground type over X . It should be clear from the context if we are referring to applicative terms over a typed alphabet or terms over a ranked alphabet. Of course, a ranked alphabet A can be seen as a typed alphabet by assigning the type o  !o ! „ !   ƒ‚ …o %.f /

1 Which should not be confused with terms over a ranked alphabet (cf. Remark 2.2).


to every symbol f of A. In particular, every symbol in A has order 0 or 1. The finite terms over A (seen as a ranked alphabet) are in bijection with the applicative ground terms over A (seen as a typed alphabet).

2.3.2. Recursion schemes. For each type τ, we assume an infinite set V_τ of variables of type τ, such that V_{τ_1} and V_{τ_2} are disjoint whenever τ_1 ≠ τ_2, and we write V for the union of those sets V_τ as τ ranges over types. We use letters x, y, φ, ψ, … to range over variables.

A (deterministic) recursion scheme is a 5-tuple S = ⟨A, N, R, Z, ⊥⟩ where
• A is a ranked alphabet of terminals and ⊥ is a distinguished terminal symbol of arity 0 (and hence of ground type) that does not appear in any production rule,
• N is a finite set of typed non-terminals; we use upper-case letters F, G, H, … to range over non-terminals,
• Z ∈ N is a distinguished initial symbol of type o which does not appear in any right-hand side of a production rule,
• R is a finite set of production rules, one for each non-terminal F : (τ_1, …, τ_n, o), of the form F x_1 … x_n → e, where the x_i are distinct variables with x_i : τ_i for i ∈ [1, n] and e is a ground term in Terms((A ∖ {⊥}) ∪ (N ∖ {Z}) ∪ {x_1, …, x_n}).

Note that the expressions on both sides of the arrow are terms of ground type. The order of a recursion scheme is defined to be the highest order of (the types of) its non-terminals.

2.3.3. Rewriting system associated with a recursion scheme. A recursion scheme S induces a rewriting relation, denoted →_S, over Terms(A ∪ N). Informally, →_S replaces any ground subterm F t_1 … t_{ϱ(F)} starting with a non-terminal F by the right-hand side of the production rule F x_1 … x_n → e in which the occurrences of the "formal parameter" x_i are replaced by the actual parameter t_i for i ∈ [1, ϱ(F)].

The term M[t/x] obtained by replacing a variable x : τ by a term t : τ over A ∪ N in a term M over A ∪ N ∪ V is defined² by induction on M by taking

  (t_1 t_2)[t/x] = t_1[t/x] t_2[t/x],
  φ[t/x] = φ   for φ ∈ A ∪ N ∪ V if φ ≠ x,
  x[t/x] = t.

The rewriting system →_S is defined by induction using the following rules:
• Substitution: F t_1 … t_n →_S e[t_1/x_1, …, t_n/x_n], where F x_1 … x_n → e is a production rule of S;
• Context: if t →_S t′ then (st) →_S (st′) and (ts) →_S (t′s).

² Note that t does not contain any variables and hence we do not need to worry about capture of variables.
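The substitution M[t/x] can be transcribed directly; the following OCaml fragment (our own names, a sketch only) mirrors the three defining equations on applicative terms built from symbols and variables.

(* A minimal sketch of the substitution M[t/x] on applicative terms. *)
type tm = Var of string | Cst of string | App of tm * tm

let rec subst m t x =
  match m with
  | App (m1, m2) -> App (subst m1 t x, subst m2 t x)   (* (t1 t2)[t/x] = t1[t/x] t2[t/x] *)
  | Var y when y = x -> t                              (* x[t/x] = t *)
  | _ -> m                                             (* any other symbol is left unchanged *)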


Example 2.3. Consider S, the order-2 recursion scheme with the set of non-terminals {Z : o, H : (o, o), F : ((o, o, o), o)}, variables {z : o, φ : (o, o, o)}, terminals A = {f, a} of arity 2 and 0 respectively, and the following rewrite rules:

  Z → f (H a) (F f),   H z → H (H z),   F φ → φ a (F φ).

Figure 3 depicts the first rewriting steps of →_S, starting from the initial symbol Z.

Figure 3 depicts the first rewriting steps of !S , starting from the initial symbol Z . f

a

f

H a

f

H

f

a

f

a

F

a

F

f

f f f f

H Z

H

F

a

f

a

H

F

a

f

f

f

H

F

H

F

H

f

H

f

a

H a

Figure 3

As illustrated above, the relation →_S is confluent, i.e., for all ground terms t, t_1, and t_2, if t →*_S t_1 and t →*_S t_2 (here →*_S denotes the transitive closure of →_S), then there exists t′ such that t_1 →*_S t′ and t_2 →*_S t′. The proof of this statement is similar to the proof of the confluence of the lambda-calculus [6].

2.3.4. Value tree of a recursion scheme. Informally, the value tree of (or the tree generated by) a recursion scheme S, denoted [[S]], is a (possibly infinite) term, constructed from the terminals in A, that is obtained as the "limit" of the set of all terms that can be obtained by iterative rewriting from the initial symbol Z.


The terminal symbol ⊥ : o is used to formally restrict terms over A ∪ N to their terminal symbols. We define a map (·)^⊥ : Terms(A ∪ N) → Terms(A) that takes an applicative term and replaces each non-terminal, together with its arguments, by ⊥ : o. We define (·)^⊥ inductively as follows, where a ranges over A-symbols and F over non-terminals in N:

  a^⊥ = a,   F^⊥ = ⊥,   (st)^⊥ = ⊥ if s^⊥ = ⊥, and (st)^⊥ = (s^⊥ t^⊥) otherwise.

Clearly, if t ∈ Terms(A ∪ N) is of ground type, then t^⊥ ∈ Terms(A) is of ground type as well. Terms built over A can be partially ordered by the approximation ordering ≼, defined for all terms t and t′ over A by t ≼ t′ if t ∩ (Ã ∖ {⊥})* ⊆ t′. In other terms, t′ is obtained from t by substituting some occurrences of ⊥ by arbitrary terms over A. The set of terms over A together with ≼ forms a directed complete partial order, meaning that any directed³ subset D of Terms(A) admits a supremum, denoted sup D.

Clearly, if s →_S t then s^⊥ ≼ t^⊥. The confluence of the relation →_S implies that the set { t^⊥ | Z →*_S t } is directed. Hence the value tree of (or the tree generated by) S can be defined as its supremum,

  [[S]] = sup { t^⊥ | Z →*_S t }.

We write RecTree_n(A) for the class of value trees [[S]], where S ranges over order-n recursion schemes.

Example 2.4. The value tree of the recursion scheme S of Example 2.3 is as in Figure 4.

Figure 4. The value tree of the scheme of Example 2.3, obtained as the supremum of its finite approximants (diagrams omitted)

³ A set D is directed if D is not empty and for all x, y ∈ D, there exists z ∈ D such that x ≼ z and y ≼ z.
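The rewriting relation and the (·)^⊥ map are easy to experiment with. Below is a small OCaml sketch, entirely our own encoding: the three rules of Example 2.3 are hard-coded, every helper name is ours, and each iteration performs a parallel outermost (OI) step, i.e., it rewrites every outermost redex once. It prints the successive approximants t^⊥ whose supremum is the value tree of Example 2.4.

(* A sketch (not the chapter's construction): approximants of the value tree of
   the scheme of Example 2.3, obtained by outermost rewriting from Z. *)
type tm = Sym of string | App of tm * tm            (* applicative terms over A ∪ N *)

let app2 f x y = App (App (f, x), y)

(* spine view: t = h t1 ... tn with h a symbol *)
let rec spine t acc = match t with App (u, v) -> spine u (v :: acc) | _ -> (t, acc)

let is_nonterminal = function Sym ("Z" | "H" | "F") -> true | _ -> false

(* one rewrite of a full redex F t1 ... t_{arity F}; None otherwise *)
let rule t =
  match spine t [] with
  | Sym "Z", [] -> Some (app2 (Sym "f") (App (Sym "H", Sym "a")) (App (Sym "F", Sym "f")))
  | Sym "H", [ z ] -> Some (App (Sym "H", App (Sym "H", z)))
  | Sym "F", [ phi ] -> Some (app2 phi (Sym "a") (App (Sym "F", phi)))
  | _ -> None

(* a parallel outermost step: rewrite every outermost redex once *)
let rec step t =
  match rule t with
  | Some t' -> t'
  | None -> (match t with App (u, v) -> App (step u, step v) | _ -> t)

(* the map (.)^⊥: replace every subterm headed by a non-terminal by ⊥ *)
let rec bottom t =
  let h, args = spine t [] in
  if is_nonterminal h then Sym "⊥"
  else List.fold_left (fun acc a -> App (acc, bottom a)) h args

let rec show t =
  match spine t [] with
  | Sym s, [] -> s
  | Sym s, args -> s ^ "(" ^ String.concat ", " (List.map show args) ^ ")"
  | _ -> assert false

let () =
  let t = ref (Sym "Z") in
  for _ = 1 to 4 do
    print_endline (show (bottom !t));      (* prints ⊥, f(⊥, ⊥), f(⊥, f(a, ⊥)), ... *)
    t := step !t
  done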


Remark 2.3. The relation !S is unrestricted, in the sense that any ground subterm starting with a non-terminal can be rewritten. A more constrained rewriting policy referred to as outermost-innermost (OI) only allows rewriting a ground non-terminal subterm if it is not below any non-terminal symbols (i.e., it is outermost) [22]. The corresponding rewriting relation is denoted !S;OI . Note that using !S;OI instead of !S does not change the value tree of the scheme, i.e., sup¹ t ? j Z !S t º D sup¹ t ? j Z !S;OI t º. Another rewriting policy referred to as innermost-outermost (IO) only allows rewriting a ground non-terminal subterm if this subterm does not contain a ground nonterminal as subterm (i.e., it is innermost) [22]. The corresponding rewriting relation is denoted !S;IO. Note that using !S;IO instead of !S may change the value tree of the scheme. Indeed, consider as an example the recursion scheme S0 obtained from the scheme S in Example 2.3 by replacing its first production rule by the following two rules: Z ! K.H a/.F f /;

Kxy ! f xy:

Hence, we just added an intermediate non-terminal K , and one easily checks that ŒŒ S  D ŒŒ S0 . As the non-terminal H is not productive, following the IO policy, the second production rule will never be used, and therefore sup¹ t ? j Z !S0 ;IO t º D ?. 2.3.5. Labelled recursion schemes. A labelled recursion scheme is a recursion scheme without terminal symbols but whose productions are labelled by a finite alphabet. This slight variation in the definition allows us to associate a Lts with every labelled recursion scheme. A deterministic labelled recursion scheme is a 5-tuple S D h †; N; R; Z; ? i where  † is a finite set of labels and ? is a distinguished symbol in †,  N is a finite set of typed non-terminals; we use upper-case letters F; G; H; : : : to range over non-terminals,  ZW o 2 N is a distinguished initial symbol which does not appear in any righthand side,  R is a finite set of production rules of the form a

F x1 : : : xn ! e

where a 2 † n ¹?º, F W .1 ; : : : ; n ; o/ 2 N , the xi are distinct variables, each xi is of type i , and e is a ground term over .N n ¹Zº/ [ ¹ x1 ; : : : ; xn º. In addition, we require that there is at most one production rule starting with a given non-terminal and labelled by a given symbol.


The Lts associated with S has the set of ground terms over N as domain, the initial a symbol Z as root, and, for all a 2 †, the relation ! is defined by   a t F t1 : : : t%.F / ! e xt11 ; : : : ; x%.F / %.F / a

if F x1 : : : xn ! e is a production rule. The tree generated by a labelled recursion scheme S, denoted Tree? .S/, is the tree Tree? of its associated Lts. To use labelled recursion schemes to generate terms over a ranked alphabet A, it is enough to enforce that for every non-terminal F 2 N :  either there is a unique production starting with F which is labelled by ,  or there is a unique production starting with F which is labelled by some symbol c of arity 0 and whose right-hand side starts with a non-terminal that comes with no production rule in the scheme,  or there exists a symbol f 2 A with %.f / > 0 such that the set of labels of production rules starting with F is exactly fE. fN

Figure 5. A labelled recursion scheme generating the same term as the scheme of Example 2.3 (rules and generated tree omitted)

Recursion schemes and labelled recursion schemes are equi-expressive for generating terms. Theorem 2.4. The recursion schemes and the labelled recursion schemes generate the same terms. Moreover the translations are linear and preserves order and arity. Proof. Let S D h A; N; R; Z; ? i be a recursion scheme. We define a labelled recursion E N 0 ; R0 ; Z; ? i generating the term ŒŒ S . For each terminal symbol scheme S0 D h A;


f 2 A, we introduce a non-terminal symbol, denoted fNW o !o! … o: „ !  ƒ‚ %.f /

The set of non-terminal symbols of S0 is N [ ¹fN j f 2 Aº [ ¹X º, where X is assumed to be a fresh non-terminal. With a term t over A [ N , we associate the term tN over N 0 obtained by replacing every occurrence of a terminal symbol f by its nonterminal counterpart fN. The production rules of S0 are as follows: 

¹F x1 : : : xn ! eN j F x1 : : : xn ! e 2 Rº

fi [ ¹fNx1 : : : x%.f / ! xi j f 2 A with %.f / > 0 and i 2 Œ1; %.f /º c

[ ¹cN ! X j c 2 A with %.c/ D 0º: E N; R; Z; ? i be a labelled Conversely, let A be ranked alphabet and let S D h A; recursion scheme respecting the syntactic restrictions mentioned above. We define a recursion scheme S0 D h A; N; R0 ; Z; ? i generating the same term as S. The set of production rules of S0 are defined as follows: 

 if F x1 : : : xn ! e belongs to R (in this case it is the only rule starting with F ) then F x1 : : : xn ! e belongs to R0 ; c  if, for some c of arity 0, F x1 : : : xn ! e belongs to R (in this case it is the only rule starting with F and e starts with a non-terminal that has no rule in R) then F x1 : : : xn ! c belongs to R0 ; fi

 if, for some f 2 A of arity %.f / > 0, F x1 : : : xn ! ei belongs to R for all 1 6 i 6 %.f /, then F x1 : : : xn ! f e1 : : : e%.f / belongs to R0 .

2.3.6. Examples of trees defined by labelled recursion schemes. In this section, we provide some examples of trees defined by labelled recursion schemes. Given a language L over Σ, we let Pref(L) denote the tree in Trees_∞(Σ) containing all prefixes of words in L.

The tree Pref({aⁿbⁿ | n > 0}). Let us start with the tree T_0 corresponding to the deterministic context-free language Pref({aⁿbⁿ | n > 0}). As is the case for all prefix-closed deterministic context-free languages (see [16] and [17], or Theorem 4.8 at order 1), T_0 is generated by an order-1 scheme S_0:

  Z →^a H X,   H x →^a H (B x),   H x →^b x,   B x →^b x,

with Z, X : o and H, B : o → o. The tree generated by S_0 is given in Figure 6.

Figure 6. The tree generated by the scheme S_0 (diagram omitted)
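As a complement, here is a small OCaml sketch (our own encoding and names) of the scheme S_0 viewed as an Lts, in the spirit of § 2.3.5: ground terms over the non-terminals are the states, the labelled productions give the transitions, and we enumerate the nodes of the generated tree up to a bounded length.

(* A sketch, ours only: the labelled scheme S_0 for Pref({a^n b^n}) as an Lts. *)
type gterm = Z | X | H of gterm | B of gterm

(* the labelled productions: Z -a-> H X,  H t -a-> H (B t),  H t -b-> t,  B t -b-> t *)
let step t a =
  match t, a with
  | Z, 'a' -> Some (H X)
  | H t, 'a' -> Some (H (B t))
  | H t, 'b' -> Some t
  | B t, 'b' -> Some t
  | _ -> None

(* nodes of the generated tree of length at most [depth] *)
let rec nodes t depth prefix acc =
  let acc = List.rev prefix :: acc in
  if depth = 0 then acc
  else
    List.fold_left
      (fun acc a ->
         match step t a with
         | None -> acc
         | Some t' -> nodes t' (depth - 1) (a :: prefix) acc)
      acc [ 'a'; 'b' ]

let () =
  nodes Z 4 [] []
  |> List.iter (fun w ->
         print_endline
           (if w = [] then "ε" else String.concat "" (List.map (String.make 1) w)))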

The tree Pref .¹an bn c n j n > 0/. Using order-2 schemes, it is possible to go beyond deterministic context-free languages and define, for instance the tree T1 D Pref .¹an b n c n j n > 0º/. Consider the order-2 scheme S1 given by a

Z ! F I .KC I /; b

Bx ! x; c

! F .KB'/.KC /;

F'

!

b

.'X /;



C x ! x;

K' x ! '. .x//;



with

a

F'

I x ! x:

 Z; X W o,  B; C; I W o ! o,  F W ..o ! o/; .o ! o/; o/, and  KW ..o ! o/; .o ! o/; o; o/. Intuitively, the non-terminal K plays the role of the composition of functions of 

type o ! o (i.e., for any terms F1 ; F2 W o ! o and tW o, KF1 F2 t ! F1 .F2 t/). For any term GW o ! o, we define G n for all n > 0 by taking G 0 D I and G nC1 D KGG n . For bn

any ground term t , G n t behaves as G.: : : .G .I t// : : : / and, in particular B n X H) X . „ ƒ‚ … n

For all n > 0, we have

an

Z ! F Bn n

1

b

C n ! C n .B n

1

bn

1cn

X / HHHH) X:

The tree Pref .¹an cb2 j n > 0º/. Following the same ideas as for S1 , the tree n

Texp D Pref .¹an cb 2 j n > 0º/:

is define by the order-2 scheme Sexp given below: 

Z ! F B; b

Bx ! x;

a

F ' ! F .D'/; c

F ' ! 'X;



D'x ! '.'x/;


with Z; X W o;

BW o ! o;

DW .o ! o; o; o/;

F W .o ! o; o/:

If we let D B denote the term of type o ! o defined by n

D0 B D B

and D nC1 B D D.D n B/

for n > 0, we have an

Z H) F D n B:

n

As, intuitively, D doubles its argument, D n B behaves like B 2 for n > 0. In particular, n D n BX reduces by b 2 to X . For all n > 0, an

b2

c

n

Z H) F D n B ! D n BX H) X:

The trees corresponding to the tower of exponentials of height k. At order k C 1 > 1, we can define the tree Texpk D Pref .¹an cb expk .n/ j n > 0º/ where we let exp0 .n/ D n and expkC1 .n/ D 2expk .n/ for k > 0. We illustrate the idea by giving an order-3 scheme 2n generating Texp2 D Pref .¹an cb 2 j n > 0º/, 

Z ! F D1 ;

a

! F .D2 /;

F

b

c

Bx ! x;

F ' ! 'BX



D2 'x ! . . '//x; 

D1 x !

. x/;

with Z; X W o;

BW o ! o;

D1 W .o ! o; o; o/;

F W ..o ! o; o; o/; o/;

D2 W ..o ! o; o; o/; o ! o; o; o/:

If we let D2n D1 denote the term of type .o ! o; o; o/ defined by D20 D1 D D1

and D2nC1 D1 D D2 D2n D1

for n > 0,

then an

Z H) F D2n D1 : n

As D2 intuitively double its argument with each application, D2n D1 behaves as D12 2n

n

and hence D12 B behaves as B 2 . For all n > 0, an

c

b2

2n

Z H) F D2n D1 ! D2n D1 BX H) X:

Arnaud Carayol and Olivier Serre

1312

The tree of the Urzyczyn language. All schemes presented in this section satisfy a syntactic restriction, called the safety condition, that will be discussed in the last section of this chapter. Paweł Urzyczyn conjectured that (a slight variation) of the tree described below, though generated by a order-2 scheme, could not be generated by any order-2 scheme satisfying the safety condition. This conjecture was proved by Paweł Parys in [49]. The tree TU has directions in ¹.; /; ?º. A word over ¹.; /º is well bracketed if it has as many opening brackets as closing brackets and if, for every prefix, the number of opening brackets is greater than the number of closing brackets. The language U is defined as the set of words of the form w?n where w is a prefix of a well-bracketed word and n is equal to jwj juj C 1, where u is the longest suffix of w that is well bracketed. In other words, n equals 1 if w is well bracketed, and otherwise it is equal to the index of the last unmatched opening bracket plus one. For instance, the words ./...// ? ? ? ? and ./././? belong to U . The tree TU is simply Pref .U /. The following scheme SU generates TU : 

Z ! G.H X /; .

Gz ! F Gz.H z/; ?

Gz ! X; ?

H u !;

.

F 'xy ! F .F 'x/y.Hy/; /

F 'xy ! '.H y/; ?

F 'xy ! x;

with Z; X W o, G; H W o ! o and F W .o ! o; o; o/. To better explain the inner workings of this scheme, let us introduce some syntactic sugar. With every integer, we associate a ground term by letting 0 D X and, for all n > 0, n C 1 D H n. With every sequence Œn1    n`  of integers, we associate a term of type o ! o by letting Œ  D G and Œn1    n` n`C1  D F Œn1    n` n`C1 . Finally we write .Œn1    n` ; n/ to denote the ground term Œn1    n` n. The scheme can be revisited as follows: 

Z ! .Œ ; 1/;

?

.

.Œ ; n C 1/ ! 0;

?

.Œn1    n` ; n/ ! n` ;

?

n C 1 ! n;

.Œn1    n` ; n/ ! .Œn1    n` n; n C 1/; /

.Œn1    n` ; n/ ! .Œn1    n`

Let w D w0    wjwj

1

1 ; n

C 1/:

be a prefix of a well-bracketed word. We have w

Z H) .Œn1    n` ; jwj C 1/;

where Œn1    n`  is the sequence (in increasing order) of those indices of unmatched ?

?n`

opening brackets in w . In turn, .Œn1    n` ; jwj/ ! n` ! 0. Hence, as expected, the number of ? symbols is equal to 1 if w is well bracketed (i.e., ` D 0), and otherwise it is equal to the index of the last unmatched opening bracket plus one.

35. Higher-order recursion schemes and their automata models

1313

2.4. Higher-order pushdown automata 2.4.1. Higher-order stack and their operations. Higher-order pushdown automata were introduced by Maslov [45] as a generalisation of pushdown automata. First, recall that a (order-1) pushdown automaton is a machine with a finite control together with an auxiliary storage given by a (order-1) stack whose symbols are taken from a finite alphabet. A higher-order pushdown automaton is defined in a similar way, except that it uses a higher-order stack as auxiliary storage. Intuitively, an order-n stack is a stack whose base symbols are order-.n 1/ stacks, with the convention that order-1 stacks are just stacks in the classical sense. Fix a finite stack alphabet € and a distinguished bottom-of-stack symbol ? 62 € . An order-1 stack is a sequence ?; a1 ; : : : ; a` 2 ?€  which is denoted [?a1 : : : a` ]1 . An order-k stack (or a k -stack), for k > 1, is a non-empty sequence s1 ; : : : ; s` of order.k 1/ stacks which is written [s1 : : : s` ]k . For convenience, we may sometimes see an element a 2 € as an order-0 stack, denoted [a]0 . We let Stacksk denote the set of S all order-k stacks and Stacks D k>1 Stacksk the set of all higher-order stacks. The height of the stack s denoted jsj is simply the length of the sequence. We denote by ord.s/ the order of the stack s . A substack of an order-1 stack [?a1 : : : ah ]1 is a stack of the form [?a1 : : : ah0 ]1 for some 0 6 h0 6 h. A substack of an order-k stack [s1    sh ]k , for k > 1, is either a stack of the form [s1    sh0 ]k with 0 1: topi .s/ returns the top .i 1/-stack of s , and popi .s/ returns s with its top .i 1/stack removed. Formally, for an order-n stack [s1 : : : s`C1 ]n with ` > 0 ´ s if i D n; topi .[s1 : : : s`C1 ]n / D `C1 topi .s`C1 / if i < n;

1314

Arnaud Carayol and Olivier Serre

´ [s1 : : : s` ]n popi .[s1 : : : s`C1 ]n / D [s1 : : : s` popi .s`C1 /]n

if i D n and ` > 1; if i < n:

By abuse of notation, we let topord.s/C1 .s/ D s . Note that popi .s/ is defined if and only if the height of topi C1 .s/ is strictly greater than 1. For example, pop2 .[[?ab]1 ]2 / is undefined. We introduce the operations pushi with i > 2 that duplicates the top .i 1/-stack of a given stack. More precisely, for an order-n stack s and for 2 6 i 6 n, we let pushi .s/ D s CCtopi .s/. The last operation, pusha1 pushes the symbol a 2 € on top of the top 1-stack. More precisely, for an order-n stack s and for a symbol a 2 € , we let pusha1 .s/ D s CC[a]0 . Example 2.6. Let s be the order-3 stack of Example 2.5. Then we have

top3 .s/ D [[?baa]1 [?bc]1 [?bab]1 ]2 ; pop3 .s/ D [[[?baac]1 [?bb]1 [?bcc]1 [?cba]1 ]2 ]3 :

Note that pop3 .pop3 .s// is undefined. We also have that

push2 .pop3 .s// D [[[?baac]1 [?bb]1 [?bcc]1 [?cba]1 [?cba]1 ]2 ]3 ; pushc1 .pop3 .s// D [[[?baac]1 [?bb]1 [?bcc]1 [?cbac]1 ]2 ]3 : 2.4.2. Stacks with links and their operations. We define a richer structure of higherorder stacks where we allow links. Intuitively, a stack with links is a higher-order stack in which any symbol may have a link that points to an internal stack below it. This link may be used later to collapse part of the stack. Order-n stacks with links are order-n stacks with a richer stack alphabet. Indeed, each symbol in the stack can be either an element a 2 € (i.e., not being the source of a link) or an element .a; `; h/ 2 €  ¹2; : : : ; nº  N (i.e., being the source of an `-link pointing to the h-th .` 1/-stack inside the topmost `-stack). Formally, order-n stacks with links over the alphabet € are defined as order-n stacks 4 over alphabet € [ €  ¹2; : : : ; nº  N.

Example 2.7. The stack s equals to

[[[?baac]1 [?bb]1 [?bc.c; 2; 2/]1 ]2 [[?baa]1 [?bc]1 [?b.a; 2; 1/.b; 3; 1/]1 ]2 ]3

is an order-3 stack with links. To improve readability when displaying n-stacks in examples, we shall explicitly draw the links rather than using stacks symbols in €  ¹2; : : : ; nº  N. For instance, we shall rather represent s as follows:

[[[?baac]1 [?bb]1 [?bcc]1 ]2 [[?baa]1 [?bc]1 [?bab]1 ]2 ]3 4 Note that we therefore slightly generalise our previous definition, as we implicitly use an infinite stack alphabet, but this does not introduce any technical change in the definition.

35. Higher-order recursion schemes and their automata models

1315

In addition to the previous operations popi , pushi and pusha1 , we introduce two extra operations: one to create links, and the other to collapse the stack by following a link. Link creation is made when pushing a new stack symbol, and the target of an `-link is always the .` 1/-stack below the topmost one. Formally, we define pusha;` 1 .s/ D .a;`;h/ push1 where we let h D jtop` .s/j 1 and require that h > 1. The collapse operation is defined only when the topmost symbol is the source of an `-link, and results in truncating the topmost ` stack to only keep the component below the target of the link. Formally, if top1 .s/ D .a; `; h/ and s D s 0 CCŒt1 : : : tk ` with k > h we let collapse.s/ D s 0 CCŒt1 : : : th ` . For any n, we let Opn .€/ denote the set of all operations over order-n stacks with links. Example 2.8. Take the 3-stack s D [[[?a]1 ]2 [[?]1 [?a]1 ]2 ]3 . We have

pushb;2 1 .s/ D [[[?a]1 ]2 [[?]1 [?ab]1 ]2 ]3 ;

collapse.pushb;2 1 .s// D [[[?a]1 ]2 [[?]1 ]2 ]3 ;

pushc;3 .pushb;2 .s// D [[[?a]1 ]2 [[?]1 [?abc]1 ]2 ]3 : „ 1 ƒ‚ 1 … 

Then push2 ./ and push3 ./ are respectively

[[[?a]1 ]2 [[?]1 [?abc]1 [?abc]1 ]2 ]3

and [[[?a]1 ]2 [[?]1 [?abc]1 ]2 [[?]1 [?abc]1 ]2 ]3 :

We have

collapse.push2 .// D collapse.push3 .// D collapse./ D [[[?a]1 ]2 ]3 :

2.4.3. Higher-order pushdown automata and collapsible automata. An order-n (deterministic) collapsible pushdown automaton (n-CPDA) is a 5-tuple A D h †; €; Q; ı; q0 i where † is an input alphabet containing a distinguished symbol denoted , the set € is a stack alphabet, Q is a finite set of control states, q0 2 Q is the initial state, and ıW Q  €  † ! Q  Opn .€/ is a (partial) transition function such that, for all q 2 Q and 2 € , if ı.q; ; / is defined then for all a ¤ , the value ı.q; ; a/ is undefined, i.e., if some -transition can be taken, then no other transition is possible. We require ı to respect the convention that ? cannot be pushed onto or popped from the stack. In the special case where ı.q; ; / is undefined for all q 2 Q and 2 € , we refer to A as a -free n-CPDA. In the special case where collapse … ı.q; ; a/ for all q 2 Q, 2 € and a 2 †, A is called a higher-order pushdown automaton.

1316

Arnaud Carayol and Olivier Serre

Let A D h †; €; Q; ı; q0 i be an n-CPDA. A configuration of an n-CPDA is a pair of the form .q; s/ where q 2 Q and s is an n-stack with link over € ; we let Config.A/ denote the set of configurations of A and we call .q0 ; ŒŒ: : : Œ?1 : : : n 1 n / the initial configuration. It is then natural to associate with A a deterministic Lts a denoted LA D h D; r; †; . !/a2† i and defined as follows. We let D be the set of all configurations of A and r be the initial one. Then, for all a 2 † and all a .q; s/; .q 0 ; s 0 / 2 D we have .q; s/ ! .q 0 ; s 0 / if and only if ı.q; top1 .s/; a/ D .q 0 ; op/ and s 0 D op.s/. The tree generated by an n-CPDA A, denoted Tree? .A/, is the tree Tree? .LA / of its Lts.

3. From CPDA to recursion schemes In this section, we argue that, for any CPDA A, one can construct a labelled recursion scheme (of the same order) that generates the same tree. For this, we first introduce a representation of stacks and configurations of A by applicative terms. Then we define a labelled recursion scheme S and finally we show that the Lts associated with S is the same as the one associated with A, which shows that S and A define the same tree. For the rest of this section we fix an order-n CPDA A D h †; €; Q; ı; q1 i and we let the state-set of A be Q D ¹q1 ; : : : ; qm º where m > 1. In order to treat in a uniform way those stack symbols that come with a link and those that do not, we will attach fake links, which we refer to as 1-links (recall that so far all links were `-links with ` > 1) to those symbols that have no link; moreover collapse.s/ will be undefined for any stack s such that top1 .s/ has a 1-link. In the following, we therefore write pusha;1 1 instead of pusha1 . 3.1. Term representation of stacks and configurations. We start by defining some useful types. First we identify the base type o with a new type denoted n. Inductively, for each 0 6 k < n we define a type k D .k C 1/m ! .k C 1/

where, for types A and B , we write Am ! B as a shorthand for A  ! A „ ! ƒ‚ … ! B.

In particular, for every 0 6 k 6 n,

k D .k C 1/m ! .k C 2/m !    ! nm ! n:

m

We also introduce, for every 1 6 k 6 n a non-terminal Voidk of type k. Assume s is an order-n stack and p is a control state of A. In the sequel, we will define, for every 0 6 k 6 n, a term Œjsjpk W k that represents the behaviour of the topk stack in s . To understand why Œjsjpk is of type k one can view an order-k stack as acting on order-.k C 1/ stacks: for every order-.k C 1/ stack we can build a new order.k C1/ stack by pushing an order-k stack on top of it. This behaviour corresponds to the type .k C 1/ ! .k C 1/. However, for technical reasons, when dealing with control

35. Higher-order recursion schemes and their automata models

1317

states and configurations, we need to work with m copies of each stack (one per control state). Hence we view a k -stack as mapping m copies of an order-.k C 1/ stack to a single order-.k C 1/ stack. This explains why k is defined to be .k C 1/m ! .k C 1/. For every stack symbol a, every 1 6 ` 6 n and every state p 2 Q, we introduce a non-terminal Fpa;` W `m ! 1m !    ! nm ! n For every 0 6 k 6 n, every state p and every order-n stack s whose top1 symbol is some a with an `-link, we define (inductively) the following term of order k D .k C 1/m !    ! nm ! n: Œjsjpk D Fpa;` Œjcollapse.s/j`q1 :::qm

Œjpop1 .s/j1q1 :::qm Œjpop2 .s/j2q1 :::qm : : : Œjpopk .s/jkq1 :::qm

where  Œjtjhq1 :::qm is a shorthand for (the sequence), Œjtjqh1 Œjtjqh2 : : : Œjtjqhm , q  Œjpopi .s/ji j D Voidi for all j 2 Œ1; m if popi .s/ is undefined, q  Œjcollapse.s/j1j D Void1 for all j 2 Œ1; m; note that it corresponds to the case where top1 .s/ has a 1-link (i.e., a fake link); hence collapse.s/ is undefined.

Note that the previous definition is well founded, as every stack in the definition of Œjsjpk has fewer symbols than s . Intuitively, Œjsjpk represents the top k -stack of the configuration .p; s/, i.e., top.kC1/ .s/. Example 3.1. Consider the following order-2 stack s D [[?a]1 [?b]1 [?bc]1 ]2

and assume (for simplicity) that we have a unique control state p . Then one has p

where

Œjsj2 D Fpc;1 Void1 .Fpb;2 .Fp?;1 Void1 Void1 //.Fpb;2 .Fp?;1 Void1 Void1 ///;  D Œj[[?a]1 ]2 jp2 D Fpa;1 Void1 .Fp?;1 Void1 Void1 /Void2 :

Let s and t be two order-n stacks with links and let k > 1. We shall say that s and t are topk -identical if and only if the following holds:  s and t are top1 -identical if and only if s and t have the same top1 symbol with an `-link (for some `) and (if defined) collapse.s/ and collapse.t/ are top`C1 -identical;  and for k > 1, s and t are topk -identical if and only if for all j > 0, popjk 1 .s/ is defined if and only if popjk 1 .t/ is defined, and when defined, popjk 1 .s/ and popjk 1 .t/ are top.k 1/ -identical.

Note that the previous definition is well founded, as it always refer to stacks with fewer symbols than s or t .

Arnaud Carayol and Olivier Serre

1318

Lemma 3.1. Let s and t be order-n stacks with links, and let k > 0. If s and t are top.kC1/ -identical then Œjsjpk D Œjtjpk for every state p .

Proof. The proof is by induction on the maximal size (i.e., the number of stack symbols) of s and t , and once the maximal size is fixed we reason by induction on k . The base case of s and t containing only the bottom-of-stack symbol is trivial. Hence assume that the property is established for any pair of stacks with less than N symbols for some N > 0, and consider two stacks s and t whose maximal size is N C1. Assume that s and t are top.kC1/ -identical for some k > 0. If s and t are top1 -identical, then, by definition, we have that top1 .s/ D .a; `; k/ and top1 .t/ D .a; `; k 0 / for some a 2 € , 1 6 e 6 n and k; k 0 2 N, and that (when defined) collapse.s/ and collapse.t/ are top`C1 -identical. As collapse.s/ and collapse.t/ are both of size 6 N , we have, by induction hypothesis, that Œjcollapse.s/j`q1 :::qm D Œjcollapse.t/j`q1 :::qm . Thus it immediately follows that Œjsjp0 D Œjtjp0 . We now consider some k > 0 and assume that the property is established for any h 6 k . We consider the case .k C 1/ and thus assume that s and t are top.kC2/ -identical: in particular pop.h 1/ .s/ and pop.h 1/ .t/ are also toph -identical for any h 6 .k C 1/, and by induction hypothesis, we have, for any h 6 k and any state q , that Œjsjqh D Œjtjqh . By definition, we also have that top1 .s/ D .a; `; k/ and top1 .t/ D .a; `; k 0 / for some a 2 € , 1 6 e 6 n and k; k 0 2 N, and that (when defined) collapse.s/ and collapse.t/ are top.`C1/ -identical. As collapse.s/ and collapse.t/ are both of size 6 N , we have, by induction hypothesis, that Œjcollapse.s/j`q1 :::qm D Œjcollapse.t/j`q1 :::qm . We let js D jt be the maximal j such that popj.kC1/ .s/ (equiv. popj.kC1/ .t/) is defined. By definition and

q1 :::qm ; Œjsjp.kC1/ D Fpa;` Œjcollapse.s/j`q1 :::qm Œjpop1 .s/j1q1 :::qm : : : Œjpop.kC1/ .s/j.kC1/ q1 :::qm : Œjtjp.kC1/ D Fpa;` Œjcollapse.t/j`q1 :::qm Œjpop1 .t/j1q1 :::qm : : : Œjpop.kC1/ .t/j.kC1/

Now if js D 0, we have

q1 :::qm Œjpop.kC1/ .s/j.kC1/ D Œjpop.kC1/ .t/j1q1 :::qm D Void.kC1/ : : : Void.kC1/ ;

and thus Œjsjp.kC1/ D Œjtjp.kC1/ . If js > 0, we note that jpop.kC1/ .s/ D jpop.kC1/ .t / D js 1 and pop.kC1/ .s/ and recall that pop.kC1/ .t/ are top.kC2/ -identical. Thus, by induction on js , we have Œjpop.kC1/ .s/jq.kC1/ D Œjpop.kC1/ .t/jq.kC1/ for any state q , and we conclude that Œjsjp.kC1/ D Œjtjp.kC1/ . 3.2. The labelled recursion scheme associated with A. We let S D h †; N; R; Z; ? i where N D ¹Fpa;` j p 2 Q; a 2 €; and 1 6 ` 6 nº [ ¹Voidk j 0 6 k 6 nº:

The set of productions R contains the production 

Z ! Œj[ : : : [?]1 : : : ]n jqn1 S

35. Higher-order recursion schemes and their automata models

1319

and the production a

x 1 : : : ‰n ! „q;op Fpa;` ˆ‰ S

if ı.p; a; a/ D .q; op/ and the term „q;op is equal to     

0

0

0

0

x 1 i‰2 : : : ‰n if op D pusha ;` for `0 > 1, Fqa ;` ‰`0 hF?a;` ˆ‰ 1 0 a0 ;1 a;` x i‰ : : : ‰ if op D push Fqa ;1 Voidm hF ˆ‰ 1 2 n ? 1 1 , x 1 : : : ‰.k 1/ hF?a;` ˆ‰ x 1 : : : ‰k i‰.kC1/ : : : ‰n if op D pushk , Fqa;` ˆ‰ ‰k;q ‰k 1 : : : ‰n if op D popk , ˆq ‰` 1 : : : ‰n if op D collapse and ` > 1,

x 1 : : : ‰k i as a shorthand for the sequence where hF?a;` ˆ‰

x 1 : : : ‰k Fqa;` ˆ‰ x 1 : : : ‰k : : : Fqa;` ˆ‰ x 1 : : : ‰k Fqa;` ˆ‰ m 1 2

and Voidm Void1 : : : Void1 . 1 is a shorthand for „ ƒ‚ … m

3.3. Correctness of the representation. The following proposition relates the Lts defined by A with the one defined by S. Proposition 3.2. Let .p; s/ be a configuration of A and let a 2 †. Then a

a

A

S

.p; s/ ! .q; t/ () Œjsjpn ! Œjtjqn :

Proof. Let a be the top symbol in s and let 0 6 ` 6 n be such that a has an .` C 1/ link. By definition, the head non-terminal symbol of Œjsjpn is Fpa;` . a

Remark that ı.p; a; a/ is defined, i.e., there exists some .q; t/ with .p; s/ ! .q; t/, A

a

if and only if there is some term  such that Œjsjpn !  . Hence it suffices to show, when q

S

ı.p; a; a/ D .q; op/ is defined, that  D Œjop.s/jn , and for this we do a case analysis. First, we let Œjsjpn D Fpa;` C q1 : : : C qm T1q1 : : : T1qm : : : Tnq1 : : : Tnqm

where C qi D Œjcollapse.s/jq` i W ` and Tkqi D Œjpopk .s/jqki W k for every 1 6 i 6 m and every 1 6 k 6 n. Then we distinguish the five possible cases for op . 0

0

 Assume that op D pusha1 ;` , with `0 > 1. Then, by definition 0

0

0

0

0

0

Œjpusha1 ;` .s/jqn D Fqa ;` Œjcollapse.pusha1 ;` .s//j`q01 :::qm 0

0

0

0

Œjpop1 .pusha1 ;` .s//j1q1 :::qm : : : Œjpopn .pusha1 ;` .s//jnq1 :::qm : 0

0

For every j > 1, one has popj .pusha1 ;` .s// D popj .s/, and therefore 0

0

Œjpopj .pusha1 ;` .s//jjq1 :::qm D Tjq1 : : : Tjqm :

Arnaud Carayol and Olivier Serre

1320

0

0

One has collapse.pusha1 ;` .s// D pop`0 .s/, and therefore 0

0

Œjcollapse.pusha1 ;` .s//j`q01 :::qm D T`q0 1 : : : T`q0m : 0

0

Finally, we have that pop1 .pusha1 ;` .s// D s , and therefore 0

0

q

q

q

C q1 : : : C qm T1 1 : : : T1 m : Œjpop1 .pusha1 ;` .s//j1i D Fqa;` i

Hence, it follows that 0

0

0

0

Œjpusha1 ;` .s/jqn D Fqa ;` T`q01 : : : T`q0m

C q1 : : : C qm T1q1 : : : T1qm : : : Fqa;` 1 Fqa;` C q1 : : : C qm T1q1 : : : T1qm m T2q1 : : : T2qm : : : Tnq1 : : : Tnqm :

On the other hand, it follows syntactically from the definition of S that the a right hand side of the previous expression is the term  such that Œjsjpn !  . S

0

 Assume that op D pusha1 ;1 . Then, by definition 0

0

Œjpusha1 ;1 .s/jqn D Fqa ;1 Void1 : : : Void1 0

Œjpop1 .pusha1 ;1 .s//j1q1 :::qm : : : 0

Œjpopn .pusha1 ;1 .s//jnq1 :::qm : 0

For every j > 1, one has popj .pusha1 ;1 .s// D popj .s/, and therefore 0

0

q :::qm

Œjpopj .pusha1 ;` .s//jj 1

q

q

D Tj 1 : : : Tj m :

0

Finally, we have that pop1 .pusha1 ;1 .s// D s , and therefore 0

q

C q1 : : : C qm T1q1 : : : T1qm : Œjpop1 .pusha1 ;1 .s//j1i D Fqa;` i

Hence, it follows that 0

0

0

0

Œjpusha1 ;` .s/jqn D Fqa ;` Void1 : : : Void1

C q1 : : : C qm T1q1 : : : T1qm : : : Fqa;` 1 Fqa;` C q1 : : : C qm T1q1 : : : T1qm m T2q1 : : : T2qm : : : Tnq1 : : : Tnqm :

On the other hand, it follows syntactically from the definition of S that the a right hand side of the previous expression is the term  such that Œjsjpn !  .

 Assume that op D pushk . Then, by definition,

Œjpushk .s/jqn D Fqa;` Œjcollapse.pushk .s//j`q1 :::qm

S

Œjpop1 .pushk .s//j1q1 :::qm : : : Œjpopn .pushk .s//jnq1 :::qm :

35. Higher-order recursion schemes and their automata models

1321

Note that we used the fact that the top1 element in pushk .s/ is a a and has an .` C 1/-link. Now, note that for every j > k , one has popj .pushk .s// D popj .s/, and therefore q :::qm

Œjpopj .pushk .s//jj 1

q

q

D Tj 1 : : : Tj m :

Also, popk .pushk .s// D s , and therefore, for every 1 6 i 6 m, q

q

q

q

q

C q1 : : : C qm T1 1 : : : T1 m : : : Tk 1 : : : Tk m : Œjpopk .pushk .s//jki D Fqa;` i

Now, for j < k , popj .pushk .s// and popj .s/ are topj C1 -identical and, thanks to Lemma 3.1, we have that Œjpopj .pushk .s//jjqi D Œjsjjqi D Tjqi . If ` D 1, then both collapse.s/ and collapse.pushk .s// are undefined, q hence we have Œjcollapse.pushk .s//j` i D Void1 D C qi . If 1 < ` 6 k , then collapse.pushk .s// and s are top`C1 -identical and, thanks to Lemma 3.1, Œjcollapse.pushk .s//jq` i D Œjcollapse.s/jq` i D C qi . If ` > k , then collapse.s/ D collapse.pushk .s// hence q

Œjcollapse.pushk .s//j` i D C qi :

Therefore, it follows that q1 Œjpushk .s/jqn D Fqa;` C q1 : : : C qm T1q1 : : : T1qm : : : T.k q

q

1/

qm : : : T.k 1/

q

q

C q1 : : : C qm T1 1 : : : T1 m : : : Tk 1 : : : Tk m : : : Fqa;` 1 q

q

q

q

Fqa;` C q1 : : : C qm T1 1 : : : T1 m : : : Tk 1 : : : Tk m m q

q

m 1 T.kC1/ : : : T.kC1/ : : : Tnq1 : : : Tnqm :

On the other hand, it follows syntactically from the definition of S that the a right hand side of the previous expression is the term  such that Œjsjpn !  .

 Assume that op D popk . Then, by definition 0

S

0

Œjpopk .s/jqn D Fqa ;` Œjcollapse.popk .s//j`q1 :::qm

Œjpop1 .popk .s//j1q1 :::qm : : : Œjpopn .popk .s//jnq1 :::qm

where the top1 element in popk .s/ is a a0 and has an .`0 C1/-link. Equivalently, Œjpopk .s/jqn D Œjpopk .s/jqk

q1 :::qm Œjpop.kC1/ .popk .s//j.kC1/ : : : Œjpopn .popk .s//jnq1 :::qm :

For every j > k , one has popj .popk .s// D popj .s/, and therefore, we have Œjpopj .popk .s//jjq1 :::qm D Tjq1 : : : Tjqm . Moreover, by definition we have that Tkq D Œjpopk .s/jqk . Hence, it follows that q

q

q

m 1 Œjpopk .s/jqn D Tk TkC1 : : : TkC1 : : : Tnq1 : : : Tnqm :

1322

Arnaud Carayol and Olivier Serre

On the other hand, it follows syntactically from the definition of S that the a right hand side of the previous expression is the term  such that Œjsjpn !  . S

 Assume that op D collapse. Then, by definition 0

0

q :::qm

Œjcollapse.s/jqn D Fqa ;` Œjcollapse.collapse.s//j` 1 q :::qm

Œjpop1 .collapse.s//j11

:::

Œjpopn .collapse.s//jnq1 :::qm ;

where the top1 element in collapse.s/ is a0 and has an .`0 C 1/-link. Equivalently, Œjcollapse.s/jqn D Œjcollapse.s/jq`

q1 :::qm Œjpop.`C1/ .collapse.s//j.kC1/ :::

Œjpopn .collapse.s//jnq1 :::qm :

For every j > e , one has popj .collapse.s// D popj .s/, and therefore we have Œjpopj .collapse.s//jjq1 :::qm D Tjq1 : : : Tjqm . Moreover, by definition we have that C q D Œjcollapse.s/jq` . On the other hand, it follows syntactically from the definition of S that the a right-hand side of the previous expression is the term  such that Œjsjpn !  . S

Corollary 3.3. The Lts defined by A is isomorphic to the one defined by S. In particular, A and S generate the same tree. Proof. Immediate from Proposition 3.2.

4. From recursion schemes to collapsible pushdown automata In this section, we construct, for any labelled recursion scheme S, a collapsible pushdown automaton A of the same order defining the same tree as S – i.e., Tree? .S/ D Tree? .A/. Recall that a silent production rule is a production rule labelled by . To simplify the presentation we assume that S does not contain any such production rule. If S were to contain silent transitions, we would treat the symbol  as any other symbol 5 in †. For the rest of this section, we fix a labelled recursion scheme h †; N; R; Z; ? i of order n > 1 without silent transitions. 5 Formally, one labels all silent production rules of S by a fresh symbol e to obtain a labelled scheme S0 without silent transitions. The construction presented in this section produces an automaton A0 such that Tree? .S0 / D Tree? .A0 /. The automaton A obtained by replacing all e -labelled rules of A by  is such that Tree? .S/ D Tree? .A/.

35. Higher-order recursion schemes and their automata models

1323

The automaton A has a distinguished state, denoted q? , and we associate a ground term over N denoted by ŒŒ s  with a configuration of the form .q? ; s/. Other configurations correspond to internal steps of the simulation and are only the source of silent transitions. To show that the two Lts define the same trees, we will establish that, for any reachable configuration of the form .q? ; s/ and for any a 2 †, the following holds: a

a

 if .q? ; s/ ! .q? ; s 0 / then ŒŒ s  ! ŒŒ s 0 ; A

S

a

a

S

A

 if ŒŒ s  ! t then .q? ; s/ ! .q? ; s 0 / and ŒŒ s 0  D t .

Hence, the main ingredient of the construction is the partial mapping ŒŒ   associating a ground term over N with an order-n stack. The main difficulty is to guarantee that any rewriting rule of S applicable to the encoded term ŒŒ s  can be simulated by applying a sequence of stack operations to s . In § 4.1, we present the mapping ŒŒ   together with its basic properties; in § 4.2, we give the definition of A and prove the desired properties. To simplify the presentation, we assume, without loss of generality, that all productions starting with a non-terminal A have the same left-hand side (i.e., they use the same variables in the same order) and that two productions starting with different nonterminals do not share any variables. Hence a variable x 2 V appears in a unique left-hand side Ax1 : : : ; x%.A/ and we denote by rk.x/ the index of x in the sequence x1 : : : x%.A/ (i.e., x D xrk.x/ ). Example 4.1. Throughout the whole section, we will illustrate definitions and constructions using the order-2 scheme SU generating the tree TU presented at the end of § 2.3.6 as a running example. We recall its definition below: 

Z ! G.H X /; .

Gz ! F Gz.H z/; 

Gz ! X; ?

H u ! u;

.

F 'xy ! F .F 'x/y.Hy/; /

F 'xy ! '.H y/; ?

F 'xy ! x;

with Z; X W o, G; H W o ! o, and F W .o ! o; o; o/. We have rk.'/ D rk.z/ D rk.u/ D 1, rk.x/ D 2 and rk.y/ D 3.

4.1. Stacks representing terms. The stack alphabet € consists of the initial symbol and of the right-hand sides of the production rules in R and their argument subterms (cf. § 2.3.1), i.e., [ def ¹eº [ ASubs.e/: € D ¹Zº [ a

F x1 :::x%.x/ !e

Example 4.2. For the scheme SU , one gets the following stack alphabet: € D ¹Z; G.H X /; H X ; X ; F .F 'x/y.Hy/; F 'x; Hy; F Gz.H z/; G; H z; '.Hy/º [ ¹x; y; z; u; 'º:

Arnaud Carayol and Olivier Serre

1324

Notation 4.1. For ' 2 V [ N , a ' -stack designates a stack whose top symbol starts with ' . By extension, a stack s is said to be an N -stack (resp., a V -stack) if it is a ' -stack for some ' 2 N (resp., ' 2 V ). In order to represent a term in Terms.N /, a stack over € must be well formed, i.e., it must satisfy syntactic conditions given in the following definition.

Definition 4.2 (well-formed stack). A non-empty stack of order-n over € is well formed if every non-empty substack r of s satisfies the following two conditions:  if top1 .r/ is not equal to Z then pop1 .r/ is an A-stack for some A 2 N and top1 .r/ belongs to an A-production rule,  if top1 .r/ is of type  of order k > 0 then top 1 .r/ is the source of an .n k C1/link and collapse.r/ is a ' -stack for some variable ' 2 V of type  . We let WStacks denote the set of all well-formed stacks. Example 4.3. For the scheme SU , the order-2 stacks in Figure 7 are well formed. ' .H y/

' .H y/

F .F ' x/ y .H y/

F .F ' x/ y .H y/

F 'x

F G z .H z/

F G z .H z/

F G z .H z/

G .H X/

G .H X/

G .H X/

Z

Z

Z

s1

s2 y ' .H y/

F .F ' x/ y .H y/

F .F ' x/ y .H y/

F 'x

F G z .H z/

F G z .H z/

G .H X/

G .H X/

Z

Z s3

Figure 7

Notation 4.3. We write s WW t for s 2 WStacks and t 2 € to mean that if t belongs to the r.h.s. of a production starting with A 2 N then s is an A-stack. In particular, if s 2 WStacks then pop1 .s/ WW top1 .s/. We let CStacks denote the set of such s WW t , and define the size of an element s WW t as the pair .jsj; jtj/, where jsj denotes the number of stack symbols in s and jtj the length of the term t . When comparing sizes, we use the standard lexicographic (total) order over N  N.

35. Higher-order recursion schemes and their automata models

1325

In Definition 4.5, we will associate a ground term over N with any well-formed stack s that we refer to as the value of s . To define this value, we first associate, with any element s WW t in CStacks, a value denoted ŒŒ s WW t . This value is a term over N of the same type as t . Intuitively, it is obtained by replacing the variables appearing in the term t by values encoded in the stack s , and one should therefore understand ŒŒ s WW t  as the value of the term t in the context (or environment) of s . For all ' 2 V [ N , all k 2 Œ1; %.'/ and all ' -stack s 2 WStacks, we define an element of CStacks, denoted Argk .s/, representing the k -th argument of the term represented by s . More precisely if the top symbol of s is 't1 : : : t` , we take ´ Argk .s/ D pop1 .s/ WW tk if k 6 `, Argk .s/ D Argk ` .collapse.s// otherwise. Definition 4.4. For all s WW t 2 CStacks, we take 8 ˆ `).  If rk.x/ 6 ` then the term associated with x in s is equal to the term associated with trk.x/ in pop1 .s/, i.e., ŒŒ s WW x  D ŒŒ pop1 .s/ WW trk.x/ .  If rk.x/ > ` then the term ŒŒ s WW x  is obtained by following the link attached to top1 .s/. Recall that, as s is a well-formed stack and top1 .s/ is not of ground type (as ` < %.A/), there exists a link attached to top1 .s/. Moreover, collapse.s/, the stack obtained by following the link, has a top-symbol of the 0 for some ' 2 V and m > 0. Intuitively, ti0 corresponds to the form 't10 : : : tm .` C i /-th argument of A. If rk.x/ belongs to Œ` C 1; ` C m, then the term 0 . If rk.x/ is ŒŒ s WW x  is defined to be the term ŒŒ pop1 .collapse.s// WW trk .x/ ` greater than ` C m then the link attached to the top symbol of collapse.s/ is followed and the process is reiterated. As the size of the stack strictly decreases at each step, this process terminates.

Now, if s is a well-formed ' -stack, its value is obtained by applying the value of all its %.'/ arguments to the value of ' in the context of pop1 .s/. This leads to the following formal definition.

Arnaud Carayol and Olivier Serre

1326

Definition 4.5. The term associated with a well-formed ' -stack s 2 WStacks with ' 2 N [ V is def

ŒŒ s  D ŒŒ pop1 .s/ WW ' ŒŒ Arg1 .s/  : : : ŒŒ Arg%.'/ .s/ :

Fact 2. Let s be a well-formed ' -stack. If top1 .s/W o then ŒŒ s  D ŒŒ pop1 .s/ WW top1 .s/ :

If top1 .s/W 1 !    ! ` ! o then

ŒŒ s  D ŒŒ pop1 .s/ WW top 1 .s/ ŒŒ Arg1 .collapse.s//  : : : ŒŒ Arg` .collapse.s// :

Proof. The first case (i.e., top 1 .s/W o) is immediate. Assume that top1 .s/ is equal to 't1 : : : tn with ' 2 N [ V of type 1 !    ! %.'/ ! o and ti 2 € of type i , for all i 2 Œ1; n. Note that ` D %.'/ n. We have def

ŒŒ s  D ŒŒ pop1 .s/ WW ' ŒŒ Arg1 .s/  : : : ŒŒ Argn .s/ ŒŒ ArgnC1 .s/  : : : ŒŒ Arg%.'/ .s/  „ ƒ‚ … ŒŒ pop1 .s/WW't1 :::tn 

D ŒŒ pop1 .s/ WW top1 .s/ ŒŒ Arg 1 .collapse.s//  : : : ŒŒ Arg` .collapse.s// :

Example 4.4. Let us consider the well-formed stacks s1 , s2 , and s3 presented in Example 4.3. In the representation in Figure 8, the association between variables and their “values” are made explicit by the red arrows. The following lemma states the basic properties of the encoding ŒŒ   and Argk .  /.

Lemma 4.1. We have the following properties: 1. for all ' -stacks s 2 WStacks with ' 2 V [ N of type 1 !    ! %.'/ ! o and for all k 2 Œ1; %.'/, the stack Argk .s/ is equal to some r WW t 2 CStacks with t of type k ; 2. for all s WW t 2 CStacks with tW  2 € , the term ŒŒ s WW t  belongs to Terms .N /; 3. for all s 2 WStacks, the term ŒŒ s  belongs to Terms.N /.

Proof. We start proving the first point and then use it to obtain the second one. Combining them, we finally prove the last point. 1. We proceed by induction on the size of s 2 WStacks. The base case considers the stack Œ: : : Œ?Z1 : : : n . As %.Z/ D 0, there is nothing to prove. Fix some stack s and assume that the property holds for all stacks smaller than s 2 WStacks. Let 't1 : : : t` W  be the top symbol of s with ' 2 N [ V and ti 2 € for all i 2 Œ1; `. If ' is of type 1 !    ! %.'/ ! o then for all i 2 Œ1; `, ti is of type i and  is the type `C1 !    ! %.'/ ! o. def

If k 6 `, then Argk .s/ D pop1 .s/ WW tk and there is nothing to prove. If def

%.'/ > k > `, then Argk .s/ D Argk ` .collapse.s//. To conclude the result by induction, the only thing we have to prove is that Argk ` .collapse.s// is well defined. As ord./ > 0, we have by definition of WStacks that collapse.s/ is well defined and that its top symbol starts with a symbol of type  . As jcollapse.s/j < jsj and as %. / D %.'/ ` > k ` > 1, we have by the

35. Higher-order recursion schemes and their automata models ' .H y/ F .F ' x/ y .H y/

' .H y/ F .F ' x/ y .H y/

F 'x

F G z .H z/

F G z .H z/

F G z .H z/

G .H X/

G .H X/

G .H X/

Z

Z

Z

s1

1327

s2

y F .F ' x/ y .H y/

' .H y/ F .F ' x/ y .H y/

F 'x

F G z .H z/

F G z .H z/

G .H X /

G .H X/

Z

Z s3

Figure 8

induction hypothesis that Argk ` .collapse.s// is well defined and is equal to some r WW t 2 CStacks with t 2 € of type k `C` D k . 2. We proceed by induction on the size of s WW t . The base case deals with the def stack Œ : : : Œ ? 1 : : : n WW Z . As ŒŒ Œ n WW Z  D Z , the property holds. Assume that the property holds for all elements of CStacks smaller than some s WW t 2 CStacks with tW  . Let us show that ŒŒ s WW t  is of type  . The case where t 2 N is trivial. The one where t D t1 t2 is immediate by induction, as both ŒŒ s WW t2  and ŒŒ s WW t1  have a size smaller than ŒŒ s WW t . The last case is when t is a variable x 2 V . Assume that the variable x appears in an A-production for some AW  D 1 !    ! %.A/ ! o in N . In particular, the def

variable x is of type rk.x/ . We have ŒŒ s WW x  D ŒŒ Argrk.x/ .s/ . By definition of CStacks, s is an A-stack and using point .1/, Argrk.x/ .s/ is equal to r WW t 0 with r 2 S tacks and t 0 W rk.x/ 2 € . Thus ŒŒ s WW x  D ŒŒ r WW t 0  for some r smaller than s , and using the induction hypothesis, one concludes that ŒŒ s WW x  is a term in Termsrk.x/ .N /. 3. Let s 2 WStacks whose top symbol starts with 'W  D 1 !    ! %.'/ ! o. Clearly pop1 .s/ WW ' belongs to CStacks and by point .2/, ŒŒ pop1 .s/ WW '  is of type  . Points .1/ and .2/ imply that, ŒŒ Argk .s/  is of type k , for all k 2 Œ1; %.'/. Hence, from Definition 4.5 it directly follows that ŒŒ s  is of type o.

1328

Arnaud Carayol and Olivier Serre

We conclude with two fundamental properties of Argk ./ that will allow us to simulate the rewriting of the scheme using sta ck operations and finite memory. The first property is that the arguments represented by a well-formed stack are not modified when performing a pushk operation. More precisely, for all ' -stacks s 2 WStacks with ' 2 N [ V , we have ŒŒ Arg` .pushk .s//  D ŒŒ Arg` .s/  for all ` 2 Œ1; %.'/ and all k 2 Œ2; m. This follows (by letting r D topk .s/) from the following slightly more general result. Lemma 4.2. Let k 2 Œ2; m and let s D s 0 CCtopk .s/ 2 WStacks. For all non-empty ' -stacks r v topk .s/, we have ŒŒ Arg` .s 0 CCr/  D ŒŒ Arg` .s CCr/  for all ` 2 Œ1; %.'/. Proof. We show, by induction on the size of r , that s CCr and s 0 CCr are well formed and ŒŒ Arg` .s 0 CCr/  D ŒŒ Arg` .s CCr/  for all ` 2 Œ1; %.'/, where ' 2 N [ V denotes the head symbol of top1 .r/. The base case (which considers Œ: : : Œ?Z1 : : : k ) is immediate. Assume that the property holds for all substacks of topk .s/ smaller than some ' -stack r v topk .s/. We will show that it holds for r . The key observation is that top2 .s CCr/ D top2 .s 0 CCr/ and either

collapse.s CCr/ D collapse.s CCr/ if the link attached to topmost symbol of r is order greater than k , or

collapse.s CCr/ D s CCcollapse.r/and collapse.s 0 CCr/ D s 0 CCcollapse.r/ otherwise. As s 0 CCr is a substack of s (which is well formed), s 0 CCr is well formed as well. To prove that s CCr is well formed, we need to show that every non-empty substack of s CCr satisfies the two properties expressed in Definition 4.2. The case of a proper substack immediately follows the induction hypothesis. We can deduce that s CCr satisfies these two properties from the above observations. Indeed the first property only depends on the top most order-1 stack (and top2 .s CCr/ D top2 .s 0 CCr/) and the second property follows from the fact that top1 .s CCr/ D top1 .s 0 CCr/ and top1 .collapse.s CCr// D top1 .collapse.s 0 CCr//. Assume that the top symbol of r is equal to 't1 : : : tn . Let ` 2 Œ1; %.'/ and let us show that ŒŒ Arg` .s CCr/  D ŒŒ Arg` .s 0 CCr/ . If ` 6 n, then ŒŒ Arg` .s CCr/  D ŒŒ s CCpop1 .r/ WW t`  D ŒŒ s 0 CCpop1 .r/ WW t` . By the induction hypothesis, we have that ŒŒ s CCr 0 WW t  D ŒŒ s 0 CCr 0 WW t  for any proper substack r 0 of r , and in particular for r 0 D pop1 .r/. If ` > n then ŒŒ Arg` .s CCr/  is equal to both ŒŒ Arg` n .collapse.s CCr//  and ŒŒ Arg` n .collapse.s CCr// . From the above observation, we either have that the stack collapse.s CCr/ is equal to collapse.s 0 CCr/ and the equality trivially holds, or we have collapse.s CCr/ D s CCcollapse.r/ and collapse.s 0 CCr/ D s 0 CCcollapse.r/ in which case the equality follows by the induction hypothesis as j collapse.r/ j < j r j.

35. Higher-order recursion schemes and their automata models

1329

The next property will later been use to prove that any rewriting step can be simulated by a finite number of transitions in the automaton. Lemma 4.3. Let s be a ' -stack in WStacks for some 'W 1 !    ! %.'/ ! o in V [N and let ` 2 Œ1; %.'/ with ` of order k > 0. If Arg` .s/ is equal to r WW t 2 CStacks with t starting with 2 N [ V then

popn

kC1 .s/

D popn

kC1 .r/

and j topn

kC1 .s/ j

> j topn

kC1 .r/ j:

Proof. We proceed by induction of the size of s . The base case, which considers the stack Œ : : : Œ ?Z 1 : : : n , is immediate as %.Z/ D 0. Assume that the property holds for all stacks in WStacks smaller than some stack s 2 WStacks. Let 't1    tm be the top symbol of s with 'W 1 !    ! %.'/ ! o in V [ N and m 2 Œ0; %.'/. Let ` 2 Œ1; %.'/ and let k be the order of ` . Assume that Arg` .s/ D r WW t . If ` 6 m, then Arg` .s/ D pop1 s WW t` . In particular, r is equal to pop1 .s/ and the property holds because popn kC1 .r/ D popn kC1 .pop1 .s// D popn kC1 .s/ as n k C 1 > 2 (indeed k < n by definition of n). If ` > m, Arg` .s/ D Arg` m .collapse.s//. By the induction hypothesis, we have

popn

kC1 .collapse.s//

D popn

kC1 .r/:

To conclude the result, it is enough to show that popn kC1 .collapse.s// D popn kC1 .s/. Let k 0 be the order of top 1 .s/. As top1 .s/ D 't1 : : : tm is of type mC1 !    ! %.'/ ! o, we have k 0 > k . By definition of well-formed stacks, the order of the link attached to top symbol is equal to n k 0 C 1. In particular, popn kC1 .collapse.s// D popn kC1 .s/. 4.2. Simulating the Lts of S on stacks. As an intermediate step, we define an Lts M over well-formed stacks and we prove that it generates the same tree as S (i.e., Tree? .M/ D Tree? .S/). From M, a CPDA generating Tree? .M/ is then easily defined. a

We let M D h WStacks; Œ: : : Œ?Z : : : n ; †; . !/a2† i and define the transitions as

follows: 8 a ˆ s ! pusht1 .s/ ˆ ˆ M ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ <  s ! pusht1 .r/ M ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ  ˆ kC1 ˆ s ! pusht;n .r/ ˆ 1 ˆ : M

M

if s is an A-stack with A 2 N a and Ax1 : : : x%.A/ ! t 2 R, if s is a ' -stack with 'W o 2 V and Argrk.'/ .pop1 .s// D r WW t , if s is a ' -stack with 'W  2 V of order k > 0 and Argrk.'/ .pop1 .pushn kC1 .s/// D r WW t .

Example 4.5. In Figure 9, we illustrate the definition of M on the scheme SU .



?

Z

G.H X/

Z

G.H X/

Z

Z

F Gz.H z/

HX

F Gz.H z/

'.H y/ F .F 'x/y.Hy/

'.H y/

F .F 'x/y.Hy/

?

Z

Z

/

x



Z

'.H y/



Z

Z

G.H X/

F Gz.H z/

'.H y/



Z

X

Z

G.H X/

F Gz.H z/

z Z

G.H X/

Z

Z

G.H X/

F Gz.H z/

F Gz.H z/ G.H X/

F 'x

'.H y/ F .F 'x/y.Hy/

F .F 'x/y.Hy/

G.H X/

F Gz.H z/

F .F 'x/y.Hy/

G.H X/

'.H y/ F .F 'x/y.Hy/

F Gz.H z/

Figure 9

Z

HX

x

G.H X/

G.H X/

G.H X/ Z

F Gz.H z/

F Gz.H z/

F Gz.H z/ 

'.H y/

Z

G.H X/

F .F 'x/y.Hy/

.

x

Z

G.H X/

F 'x

.

'.H y/

Z

G.H X/

F Gz.H z/

F .F 'x/y.Hy/



F Gz.H z/

F .F 'x/y.Hy/

1330 Arnaud Carayol and Olivier Serre

35. Higher-order recursion schemes and their automata models

1331

The first line of the definition of ! corresponds to the case of an N -stack. To M

a

simulate the application of a production rule Ax1 : : : xn ! e on the term encoded by an A-stack s , we simply push the right-hand side e of the production on top of s . The correctness of this rule directly follows from the definition of ŒŒ   (cf. Lemma 4.4 below). Doing so, a term starting with a variable may be pushed on top of the stack, e.g., /

when applying the production rule F 'xy ! '.H y/. Indeed, we need to retrieve the value of the head variable in order to simulate the next transition of S: the second and third lines of the definition are normalisation rules that aim at replacing the variable at the head of the top of the stack (in Example 4.5 ' ) by its definition (hence not changing the value of the associated term). By iterative application, we eventually end up with an N -stack encoding the same term and we can apply again the first rule. The following lemma states the soundness of the first line of the definition of !. M

Lemma 4.4. Let s be an N -stack in WStacks and a 2 †. 8 a a < 9t 2 Terms.N /; ŒŒ s  ! t H) 9s 0 2 WStacks; s ! s 0 and ŒŒ s 0  D t; a

: 9s 0 2 WStacks; s ! s 0 M

a

M

H) ŒŒ s  ! ŒŒ s 0 :

Proof. Let s 2 WStacks be an A-stack for some A 2 N and let a 2 †. By definition of ŒŒ s , ŒŒ s  is equal to AŒŒ Arg1 .s/  : : : ŒŒ Arg%.A/ .s/ . a

a

Assume that ŒŒ s  ! t for some t 2 Terms.N /. By definition of !, there exists a a production Ax1 : : : x%.A/ ! t 0 in R such that t is equal to t 0 Œx1 =ŒŒ Arg1 .s/ ; : : : ; x%.A/ =ŒŒ Arg%.A/ .s/ :

a

By definition of !, we have s 0

M

a

0

! pusht1 .s/ hence we only need to note that the

M

term ŒŒ pusht1 .s/  is equal to t 0 Œx1 =ŒŒ Arg1 .s/ ; : : : ; x%.A/ =ŒŒ Arg%.A/ .s/ . Indeed, as t 0 0 is of ground type, ŒŒ pusht1 .s/  is equal to ŒŒ s WW t 0  which is by definition equal to t 0 Œx1 =ŒŒ Arg1 .s/ ; : : : ; x%.A/ =ŒŒ Arg%.A/ .s/ . Now, assume that s

a

a

! s 0 for some s 0 2 WStacks. By definition of !, there

M

a

0

M

exists a production Ax1 : : : x%.A/ ! t 0 2 R such that s 0 D pusht1 .s/. As s is an A-stack, we have ŒŒ s  D AŒŒ Arg1 .s/  : : : ŒŒ Arg%.A/ .s/ . Furthermore ŒŒ s 0  is equal to a a t 0 Œx1 =Arg1 .s/; : : : ; x%.A/ =Arg%.A/ .s/. Hence by definition of !, ŒŒ s  ! ŒŒ s 0 . The next lemma states the soundness of the second and third lines of the definition of M. It also permits concluding that there are no infinite paths labelled by  in M. Lemma 4.5. We have the following properties: 1. let s 2 WStacks be a ' -stack with ' 2 V and s 0 2 WStacks be a -stack  with 2 V [ N . If s ! s 0 then ord.'/ 6 ord. / and ŒŒ s  D ŒŒ s 0  with M j topn ord.'/C1 .s/ j > j topn ord.'/C1 .s 0 / j;

Arnaud Carayol and Olivier Serre

1332

2. for all stack s 2 WStacks there exists a unique N -stack s 0 2 WStacks such 

that s ! s 0 . M

Proof. 1. Let ' be a variable in V and let s be a ' -stack in WStacks.We distinguish two cases depending on the order of the ' . Assume that ' is of ground type and that Argrk.'/ .pop1 .s// is some r WW t 2 CStacks.  We have by definition of M that s ! s 0 D pusht1 .r/. To show that ŒŒ s  is equal M

to ŒŒ s 0 , we simply unfold the definitions. def

def

ŒŒ s  D ŒŒ pop1 .s/ WW '  D ŒŒ Argrk.'/ .pop1 .s//  Def. 4.5

def

D

D ŒŒ r WW t 

def

ŒŒ pusht1 .r/  D ŒŒ s 0 :

Assume that s 0 D pusht1 .r/ is a -stack for some 2 N [ V . We have ord. / > ord.'/ D 0. As j Argk .pop1 .s// j 6 j s j 2, we have that j topnC1 .s/ D s j > j topnC1 .s 0 / D s 0 j. Assume that ' is of type  D 1 !    ! %.'/ ! o of order k > 0. Assume that Argrk.'/ .pop1 .pushn kC1 .s/// is equal to r WW t 2 CStacks. First recall that, from kC1 .r/. Lemma 4.1, we have that tW  . We have by definition that s ! s 0 D pusht;n 1

Let us show that ŒŒ s  D ŒŒ s 0 . Using Fact 2, we have that

M

ŒŒ s 0  D ŒŒ pop1 .s 0 / WW top1 .s 0 /  ŒŒ Arg1 .collapse.s 0 //     ŒŒ Arg%.'/ .collapse.s 0 //  ƒ‚ …„ ƒ‚ … „ „ ƒ‚ … DŒŒ pop1 .s/WW'  .1/ DŒŒ Arg1 .s/  .2/ DŒŒ Arg%.'/ .s/  .2/ D ŒŒ pop1 .s/ WW ' ŒŒ Arg1 .s/  : : : ŒŒ Arg%.'/ .s/  D ŒŒ s :

The equalities denoted .1/ and .2/ are proven below: def

ŒŒ pop1 .s 0 / WW top1 .s 0 /  D

Lemma 4.2

D

ŒŒ r WW t  D ŒŒ Argrk.'/ .pop1 .pushn

kC1 .s/// 

(1)

ŒŒ Argrk.'/ .pop1 .s//  D ŒŒ pop1 .s/ WW ' 

and for all i 2 Œ1; %.'/, ŒŒ Argi .collapse.s 0 //  D

D

ŒŒ Argi .collapse.pusht;n 1 ŒŒ Argi .popn

Lemma 4.3

D

D

ŒŒ Argi .popn

kC1

.r/// 

kC1 .r//  kC1 .pop1 .pushn kC1 .s//// 

(2)

ŒŒ Argi .s/ :

As both ' and t have type  , and as t is of the form t1 : : : t` for some ` > 0, it directly follows that ord.'/ 6 ord. /. The inequality j topn ord.'/C1 .s/ j > j topn ord.'/C1 .s 0 / j follows from Lemma 4.3.

35. Higher-order recursion schemes and their automata models

1333

2. Assume, to get a contradiction, that there exists an infinite sequence .si /i >0 of 

stacks in WStacks such that for all i > 0, si ! si C1 . For all i > 0, we let ti denote M

the top symbol of si and 'i the head symbol of ti . According to (1), the order of the 'i increases and hence is ultimately constant. Let j and k be such that, for all i > j , ord.'i / is equal to k . Using (1), the size of the topn kC1 .si / is strictly decreasing starting from j , which provides the contradiction. From Lemmas 4.4 and 4.5, M and S generate the same trees. Proposition 4.6. Tree? .S/ D Tree? .M/: Proof. By definition of M, only well-formed N -stacks can be the source of non-silent a transitions. Let s be a well-formed N -stack. If ŒŒ s  ! t for some a 2 †, then the S

a

a

N -stack s 0 such that s ! s 0 is such that ŒŒ s 0  D t . Conversely if s ! s 0 for some M

0

a

M

0

N -stack s , then ŒŒ s  ! ŒŒ s . S

From M we now define an n-CPDA A D h †; €; Q; ı; q0 i generating the same tree as M. The set of states Q is equal to ¹q0 ; q1 ; : : : ; q%.S/ ; q ; qV º where %.S/ denotes the maximal arity appearing in S. Intuitively, the initial state q0 is only used to go from .q0 ; Œ   Œ?1    n / to .q ; Œ   Œ?Z1    n /; the state q is used to mark N -stacks; for k 2 Œ1; %.S/, the state qk is used to the compute Argk .   /. The state qV is used to signal stacks that appear in the derivation of system M that are V -stacks. The transitions are given below.  ı.q0 ; ?; / D .q ; pushZ 1 /.

a

 If t starts with F 2 N and F x1 : : : x%.F / ! e 2 R: – ı.q ; t; a/ D .q ; pushe1 / if e starts with a symbol in N , – ı.q ; t; a/ D .qa ; pushe1 / if e starts with a variable.  If t is a term of the form 't1 : : : t` for some ' 2 V : – ı.qV ; t; / D .qrk.'/ ; pop1 / if ' is an order-0 variable, – ı.qV ; t; / D .qrk.'/ ; pushn kC1 I pop1 / if ' is a variable of order k > 0.  If t is a term of the form 't1 : : : t` for some ' 2 V [ N : – ı.qk ; t; / D .qrk.tk / ; pop1 I pusht1k / if k 6 ` and tk W o, – ı.qk ; t; / D .qrk.tk / ; pop1 I pusht1k ;n hC1 / if k 6 ` and tk has order h > 0, – ı.qk ; t; / D .qk ` ; collapse/ if k > `.

where, for all t 2 € , qrk.t / designates the state qrk.x/ if t starts with a variable x and q otherwise, and op1 I op2 means applying op1 followed by op2 . An equivalent CPDA using only one operation per transition may be obtained by adding intermediary states. Remark 4.7. The previously given CPDA uses several operations per transition. An equivalent CPDA using only one operation per transition may be obtained by adding intermediary states.

Arnaud Carayol and Olivier Serre

1334

Theorem 4.8. For every labelled recursion scheme S of order-n, there is an n-CPDA A that generates the same tree. Moreover, the number of states in A is linear in the maximal arity appearing in S, and its alphabet is of size linear in the one of S. Proof (sktech). Let s be a well-formed stack. We denote by hhsii the configuration of A defined by hhsii D .q ; s/ if s is an N -stack and hhsii D .qrk.x/ ; s/ if s is a V -stack whose topmost symbol starts with a variable x . a

a

A



M 

M

A

Clearly for any well-formed N -stack s , s ! s 0 if and only if hhsii ! hhs 0 ii. For any V -stack s , if s

! s 0 , then hhsii

! hhs 0 ii as intuitively ! combines A



the definition of both ! and Argk .  /. Conversely, for all V -stacks, if s ! s 0 and M





A

A

M

hhsii ! hhs2 ii then hhs2 ii ! hhs 0 ii.

5. Safe higher-order recursion schemes In this last section, we consider a syntactic subfamily of recursion schemes called safe recursion schemes. The safety constraint was introduced in [36], but was already implicit in the work of Damm [22] (also see [24], p. 44, for a detailed presentation). This restriction constrains the way variables are used to form argument subterms of the rules’ right-hand sides. Definition 5.1 ([36]). A recursion scheme is safe if none of its right-hand sides contains an argument-subterm of order k containing a variable of order strictly less than k . Other than the scheme SU generating the tree of the Urzyczyn language, all examples we gave are safe schemes. The scheme SU is not safe, as the production .

F 'x ! F .F 'x/y.Hy/

contains in its right-hand side the argument subterm F 'xW o ! o of order-1, which contains the variable xW o of order-0. Urzyczyn conjectured that (a slight variation) of the tree TU generated by SU , though generated by a order-2 scheme, could not be generated by any safe scheme. This conjecture was recently proved by Parys in [49] and [50]. Remark 5.1. In [36], the notion of safety is only defined for homogeneous schemes. A type is said to be homogeneous if it is either ground or equal to 1 !    ! n ! o where the i ’s are homogeneous and ord.1 / >    > ord.n /. By extension, a scheme is homogeneous if all its non-terminal symbols have homogeneous types. For instance, .o ! o/ ! o ! o is an homogeneous type whereas o ! .o ! o/ ! o is not. In Proposition 5.5, we will see that dropping the homogeneity constraint in the definition of safety does not change the family of generated trees.

35. Higher-order recursion schemes and their automata models

1335

5.1. Safety and the Translation from Schemes to CPDA. In [36] and [37], the motivation for considering the safety constraint was that safe schemes can be translated into a subfamily of the collapsible automata, namely higher-order pushdown automata. Recall that an order-k pushdown automaton is an order-k CPDA that does not use the collapse operation (hence, links are useless). Theorem 5.2 below shows that the translation of recursion schemes into collapsible automata presented in § 4, when applied to a safe scheme, yields an automaton in which links are not really needed. Obviously the automaton performs the collapse operations but whenever it is applied to an order-k link, its target is the .k 1/-stack below the top .k 1/-stack. Hence any collapse operation can safely be replaced by a popk operation. This notion is captured by the notion of link-free CPDA. Definition 5.2. A CPDA is link-free if for every configuration .p; s/ reachable from the initial configuration and for every transition ı.p; top1 .s/; a/ D .q; collapse/, we have collapse.s/ D pop` .s/, where ` is the order of the link attached to top1 .s/.

Theorem 5.2. The translation of § 4 applied to a safe recursion scheme yields a linkfree collapsible automaton. We get the following corollary extending a previous result from [36], by dropping the homogeneity assumption. Corollary 5.3. Order-k safe schemes and order-k pushdown automata generate the same trees. 5.2. Damm’s view of safety. The safety constraint may seem unnatural and purely ad hoc. Inspired by the constraint of derived types of Damm, we introduce a more natural constraint, Damm safety, which leads to the same family of trees [22]. Damm safety syntactically restricts the use of partial application: in any argument subterm of a right-hand side, if one argument of some order-k is provided, then all arguments of order-k must also be provided. For instance if f W o ! o, cW o and 'W .o ! o/ ! .o ! o/ ! o ! o ! o, the terms ' , 'ff and 'ff cc can appear as argument subterms in a Damm-safe scheme, but 'f and 'ff c are forbidden. Definition 5.3 ([22]). A recursion scheme is Damm safe if it is homogeneous and all argument-subterms appearing in a right hand-side are of the form 't1 : : : tk with 'W 1 !    ! n ! o and either k 2 ¹0; nº or ord.k / > ord.kC1 /. Remark 5.4. The second constraint in the definition of Damm safety can be reformulated as follows: all argument subterms of an argument subterm of order-k appearing in a right-hand side have at least order-k .

Using Remark 5.4, it is easy to see that Damm-safety implies the safety constraint. However, the safety constraint, even when restricted to homogeneous schemes, is less restrictive than Damm safety. Consider, for instance, a variable xW o and non-terminals GW o ! o ! o and C W o. Then Gx cannot appear as an argument-subterm in a safe scheme, but GC can. As GC does not satisfy the Damm-safety constraint, safety is syntactically more permissive than Damm-safety. However unsurprisingly, any safe

1336

Arnaud Carayol and Olivier Serre

scheme can be transformed into an equivalent Damm-safe scheme of the same order. The transformation consists of converting the safe scheme into a higher-order pushdown automaton (Corollary 5.3) and then converting this automaton back to a scheme using the translation of [36]. In fact, this translation of higher-order pushdown automata into safe schemes produces Damm-safe schemes. Proposition 5.5. Damm-safe schemes are safe and for every safe scheme, there exists a Damm-safe scheme of the same order generating the same tree. This proposition in particular shows that any safe scheme can be transformed into an equivalent homogeneous one. Broadbent, using the translation from schemes into CPDA, showed that any scheme (possibly unsafe) can be converted into an equivalent one that is homogeneous [7]. Recently, Parys gave a new proof of this result by directly manipulating the scheme; he also provided another construction that preserves safety [51].

References [1] K. Aehlig, A finite semantics of simply-typed lambda terms for infinite runs of automata. In Computer science logic (Z. Ésik, ed.). Proceedings of the 20 th International Workshop (CSL 2006), the 15th Annual Conference of the EACSL, held in Szeged, September 25–29, 2006. Lecture Notes in Computer Science, 4207. Springer, Berlin, 2006, 104–118. MR 2334418 Zbl 1225.68107 q.v. 1298 [2] K. Aehlig, J. de Miranda, and L. Ong, Safety is not a restriction at level 2 for string languages. In Foundations of software science and computation structures (V. Sassone, ed.). Proceedings of the 8th international conference, FOSSACS 2005, held as part of the joint European conferences on theory and practice of software, ETAPS 2005, Edinburgh, UK, April 4–8, 2005. Lecture Notes in Computer Science 3441, Springer, Berlin, 2005, 490–501. Zbl 1119.68102 q.v. 1297, 1298 [3] A. Arnold and D. Niwiński, Rudiments of -calculus. Studies in Logic and the Foundations of Mathematics, 146. North-Holland Publishing Co., Amsterdam, 2001. MR 1854973 Zbl 0968.03002 q.v. 1298 [4] T. Ball and S. K. Rajamani, The SLAM project: Debugging system software via static analysis. ACM SIGPLAN Notices 37 (2002), no. 1, 1–3. Proceedings of the 29 th ACM Symposium on Principles of Programming Language. q.v. 1298 [5] V. Bárány, E. Grädel, and S. Rubin, Automata-based presentations of infinite structures. In Finite and algorithmic model theory (J. Esparza, C. Michaux, and C. Steinhorn, eds.) London Mathematical Society Lecture Note Series, 379. Cambridge University Press, Cambridge, 2011, 1–76. MR 2856983 Zbl 1246.03056 q.v. 1297 [6] H. P. Barendregt, The lambda calculus. Its syntax and semantics. Revised edition. Studies in Logic and the Foundations of Mathematics, 103. North-Holland Publishing Co., Amsterdam, 1984. MR 0774952 Zbl 0551.03007 q.v. 1305 [7] C. Broadbent, On collapsible pushdown automata, their graphs and the power of links. Ph.D. thesis. Oxford University, Oxford, 2011. q.v. 1336

35. Higher-order recursion schemes and their automata models

1337

[8] C. Broadbent, A. Carayol, C.-H. L. Ong, and O. Serre, Recursion schemes and logical reflection. In 25 th Annual IEEE Symposium on Logic in Computer Science LICS 2010. Proceedings of the International Symposium held in Edinburgh, July 11–14, 2010. IEEE Computer Society, Los Alamitos, CA, 2010, 120–129. MR 2953901 IEEEXplore 5570928 q.v. 1298 [9] C. Broadbent, A. Carayol, M. Hague, and O. Serre, C-SHORe: a collapsible approach to higher-order verification. In Proceedings of the 18 th ACM SIGPLAN international conference on functional programming (G. Morrisett and T. Uustalu, eds.) ICFP ’13. Held in Boston, MA, USA, September 25–27, 2013. Association for Computing Machinery (ACM), New York, 2013, 13–24. Zbl 1323.68364 q.v. 1299 [10] C. Broadbent and N. Kobayashi, Saturation-based model checking of higher-order recursion schemes. In Computer science logic 2013 (S. Ronchi Della Rocca, ed.). Papers from the 22nd Annual Conference of the EACSL (CSL ’13) held in Torino, September 2–5, 2013. LIPIcs. Leibniz International Proceedings in Informatics, 23. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2013, 129–148. MR 3111737 Zbl 1356.68141 q.v. 1299 [11] T. Cachat, Higher order pushdown automata, the Caucal hierarchy of graphs and parity games. In Automata, languages and programming (J. C. M. Baeten, J. K. Lenstra, J. Parrow, and G. J. Woeginger, eds.). Proceedings of the 30 th International Colloquium (ICALP 2003) held at the Technische Universiteit Eindhoven, Eindhoven, June 30–July 4, 2003. Lecture Notes in Computer Science, 2719. Springer, Berlin, 2003, 556–569. MR 2080728 Zbl 1039.68063 q.v. 1298 [12] A. Carayol, A. Meyer, M. Hague, C.-H. L. Ong, and O. Serre, Winning regions of higherorder pushdown games. In 2008 23 rd Annual IEEE Symposium on Logic in Computer Science. Held in Pittsburgh, PA, June 24–27, 2008. IEEE Computer Society, Los Alamitos, CA, 2008, 19–204. IEEEXplore 4557911 q.v. 1298 [13] A. Carayol and O. Serre, Collapsible pushdown automata and labeled recursion schemes equivalence, safety and effective selection. In Proceedings of the 2012 27 th Annual ACM/IEEE Symposium on Logic in Computer Science. Held at the University of Dubrovnik, Dubrovnik, June 25–28, 2012. IEEE Computer Society, Los Alamitos, CA, 2012, 165–174. MR 3050437 Zbl 1360.68543 IEEEXplore 6280435 q.v. 1298 [14] A. Carayol and S. Wöhrle, The Caucal hierarchy of infinite graphs in terms of logic and higher-order pushdown automata. In FST TCS 2003: Foundations of software technology and theoretical computer science (P. K. Pandya and J. Radhakrishnan, eds.). Proceedings of the 23rd Conference held in Mumbai, December 15–17, 2003. Lecture Notes in Computer Science, 2914. Springer, Berlin, 2003, 112–123. MR 2093642 Zbl 1205.03022 q.v. 1297 [15] D. Caucal, On infinite terms having a decidable monadic theory. In Mathematical foundations of computer science 2002 (K. Diks and W. Rytter, eds.). Papers from the 27 th International Symposium (MFCS 2002) held in Warsaw, August 26–30, 2002. Lecture Notes in Computer Science, 2420. Springer, Berlin, 2002, 165–176. MR 2064455 Zbl 1014.68077 q.v. 1297, 1298 [16] B. Courcelle, A representation of trees by languages I. Theoret. Comput. Sci. 6 (1978), no. 3, 255–279. MR 0495225 Zbl 0377.68040 q.v. 1296, 1309 [17] B. Courcelle, A representation of trees by languages II. Theoret. Comput. Sci. 7 (1978), no. 1, 25–55. MR 0495226 Zbl 0428.68088 q.v. 1296, 1309 [18] B. Courcelle, The monadic second-order logic of graphs IX: machines and their behaviours. Theoret. Comput. 
Sci. 151 (1995), no. 1, 125–162. Topology and completion in semantics (Chartres, 1993). MR 1362151 Zbl 0872.03026 q.v. 1297

1338

Arnaud Carayol and Olivier Serre

[19] B. Courcelle and M. Nivat, The algebraic semantics of recursive program schemes. In Mathematical foundations of computer science, 1978 (J. Winkowski, ed.) Proceediings of the 7 th Symposium, Zakopane, 1978. Lecture Notes in Computer Science, 64. Springer, Berlin, 1978, 16–30. MR 0519827 Zbl 0384.68016 q.v. 1296 [20] W. Damm, Higher type program schemes and their tree languages. In Theoretical computer science (H. Tzschach, H. Waldschmidt, and H. K. Walter, eds.). 3rd GI (Gesellschaft für Informatik) Conference. Fachtagung Theoretische Informatik held in Darmstadt, March 28–30, 1977. Lecture Notes in Computer Science, 48. Springer, Berlin, 1977, 51–72. MR 0521202 Zbl 0358.68009 q.v. 1296 [21] W. Damm, Languages defined by higher type program schemes. In Automata, languages and programming (A. Salomaa and M. Steinby, eds.). Fourth Colloquium, held at the University of Turku, Turku, July 18–22, 1977. Lecture Notes in Computer Science, 52. Springer, Berlin, 1977, 164–179. MR 0483620 Zbl 0356.68078 q.v. 1296 [22] W. Damm, The IO- and OI-hierarchies. Theoret. Comput. Sci. 20 (1982), no. 2, 95–207. MR 0666544 Zbl 0478.68012 q.v. 1296, 1307, 1334, 1335 [23] W. Damm and A. Goerdt, An automata-theoretical characterization of the OI-hierarchy. Inform. and Control 71 (1986), no. 1–2, 1–32. MR 0864744 Zbl 0628.68061 q.v. 1296 [24] J. de Miranda, Structures generated by higher-order grammars and the safety constraint. Ph.D. thesis. University of Oxford, Oxford, 2006. q.v. 1295, 1334 [25] J. Engelfriet, Iterated pushdown automata and complexity classes. In Proceedings of the 15 th Annual ACM Symposium on Theory of Computing (D. S. Johnson, R. Fagin, M. L. Fredman, D. Harel, R. M. Karp, N. A. Lynch, C. H. Papadimitriou, R. L. Rivest, W. L. Ruzzo, and J. I. Seiferas, eds.) STOC 1983. Held in Boston, MA, April 25–27, 1983. Association for Computing Machinery, New York, 1983, 365–373. q.v. 1296 [26] J. Engelfriet, Iterated stack automata and complexity classes. Inform. and Comput. 95 (1991), no. 1, 21–75. MR 1133778 Zbl 0758.68029 q.v. 1296 [27] J. Engelfriet and E. M. Schmidt, IO and OI. I. J. Comput. System Sci. 15 (1977), no. 3, 328–353. MR 0502290 Zbl 0366.68053 q.v. 1296 [28] J. Engelfriet and E. M. Schmidt, IO and OI. II. J. Comput. System Sci. 16 (1978), no. 1, 67–99. MR 0502291 Zbl 0371.68020 q.v. 1296 [29] J. Flum, E. Grädel, and T. Wilke (eds.), Logic and automata. History and perspectives. Texts in Logic and Games, 2. Amsterdam University Press, Amsterdam, 2008. MR 2549260 Zbl 1198.03006 q.v. 1298 [30] S. Garland and D. Luckham, Program schemes, recursion schemes and formal languages. J. Comput. System Sci. 7 (1973), 119–160. MR 0315930 Zbl 0277.68010 q.v. 1295 [31] E. Grädel, W. Thomas, and T. Wilke (eds.), Automata, logics, and infinite games. A guide to current research. Lecture Notes in Computer Science, 2500. Springer, Berlin, 2002. MR 2070731 Zbl 1011.00037 q.v. 1298 [32] M. Hague, A. S. Murawski, C.-H. L. Ong, and O. Serre, Collapsible pushdown automata and recursion schemes. In 2008 23 rd Annual IEEE Symposium on Logic in Computer Science. Held in Pittsburgh, PA, 2008. IEEE Computer Society, Los Alamitos, CA, June 24–27, 2008, 452–461. IEEEXplore 4557934 q.v. 1297, 1299, 1300 [33] J. M. E. Hyland and C.-H. L. Ong, On Full Abstraction for PCF: I. Models, observables and the full abstraction problem. II. Dialogue games and innocent strategies. III. A fully abstract and universal game model. Inform. and Comput. 163 (2000), no. 2, 285–408. MR 1808886 Zbl 1006.68027 q.v. 1298

35. Higher-order recursion schemes and their automata models

1339

[34] K. Inaba and S. Maneth, The complexity of tree transducer output languages. In FST & TCS 2008: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science. (R. Hariharan, M. Mukund, and V. Vinay, eds.) LIPIcs. Leibniz International Proceedings in Informatics, 2. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2008, 244–255. MR 2874087 Zbl 1248.68205 q.v. 1296 [35] K. Indermark, Schemes with recursion on higher types. In Mathematical Foundations of Computer Science (A. W. Mazurkiewicz, ed.). Proceedings of the 5th Symposium, 1976. Lecture Notes in Computer Science, 45. Springer, Berlin, 1976. 352–358. Zbl 0337.68015 q.v. 1296 [36] T. Knapik, D. Niwiński, and P. Urzyczyn, Deciding monadic theories of hyperalgebraic trees. In Typed lambda calculi and applications. (S. Abramsky, ed.) Proceedings of the 5th International Conference (TLCA 2001) held in Kraków, May 2–5, 2001. Samson Abramsky. Lecture Notes in Computer Science, 2044. Springer, Berlin, 2001, 253–267. MR 1890277 Zbl 0981.03012 q.v. 1297, 1334, 1335, 1336 [37] T. Knapik, D. Niwiński, and P. Urzyczyn, Higher-order pushdown trees are easy. In Foundations of software science and computation structures (M. Nielsen and U. Engberg, eds.). Proceedings of the 5th International Conference (FOSSACS 2002) held as part of the Joint European Conference on Theory and Practice of Software (ETAPS 2002) in Grenoble, April 10–12, 2002. Lecture Notes in Computer Science, 2303. Springer, Berlin, 2002, 205–222. q.v. 1297, 1298, 1335 [38] T. Knapik, D. Niwiński, P. Urzyczyn, and I. Walukiewicz, Unsafe grammars and panic automata. In Automata, languages and programming (L. Caires, G. F. Italiano, L. Monteiro, C. Palamidessi, and M. Yung, eds.) Proceedings of the 32 nd International Colloquium (ICALP 2005) held in Lisbon, July 11–15, 2005. Lecture Notes in Computer Science, 3580. Springer, Berlin, 2005, 1450–1461. MR 2184732 Zbl 1081.68054 q.v. 1297, 1298 [39] N. Kobayashi, Model-checking higher-order functions. In PPDP ’09: Proceedings of the 11th ACM SIGPLAN conference on Principles and practice of declarative programming (A. Porto and F. J. López-Fraguas, eds.). Association for Computing Machinery, New York, 2009, 25–36. q.v. 1299 [40] N. Kobayashi, Types and higher-order recursion schemes for verification of higher-order programs. In Proceedings of the 36 th annual ACM SIGPLAN-SIGACT symposium on principles of programming languages (Z. Shao and B. C. Pierce, eds.) POPL ’09, Savannah, GA, January 18–24, 2009. Association for Computing Machinery, New York, 2009, 416–428. Zbl 1315.68099 q.v. 1298 [41] N. Kobayashi, A practical linear time algorithm for trivial automata model checking of higher-order recursion schemes. In Foundations of software science and computational structures (M. Hofmann, ed.). Proceedings of the 14th International Conference (FOSSACS 2011) held as part of the Joint European Conferences on Theory and Practice of Software (ETAPS 2011) in Saarbrücken, March 26–April 3, 2011. Lecture Notes in Computer Science, 6604. Springer, Berlin, 2011, 260–274. MR 2813615 Zbl 1326.68187 q.v. 1299 [42] N. Kobayashi, Model checking higher-order programs. J. ACM 60 (2013), no. 3, Art. 20, 62 pp. MR 3078707 Zbl 1281.68157 q.v. 1299 [43] N. Kobayashi and C.-H. L. Ong, A type system equivalent to the modal mu-calculus model checking of higher-order recursion schemes. In 24 th Annual IEEE Symposium on Logic in Computer Science. 

Chapter 36

Analysis of probabilistic processes and automata theory

Kousha Etessami

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1343
2. Definitions and Background . . . . . . . . . . . . . . . . . . . . . . . . 1347
3. Analysis of finite-state Markov chains . . . . . . . . . . . . . . . . . . 1362
4. Analysis of finite-state MDPs . . . . . . . . . . . . . . . . . . . . . . . 1368
5. Adding recursion to MCs and MDPs . . . . . . . . . . . . . . . . . . . . 1372

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1379

1. Introduction

Markov chains are a fundamental mathematical model for systems that evolve randomly over time. They thus play a central role in stochastic modeling in many fields. In settings where, in addition to stochastic behaviour, we also allow control (or non-determinism), so that the system state evolves partly randomly and partly based on decisions by a controller, the resulting model is called a Markov decision process (MDP). MDPs give rise to a variety of stochastic dynamic optimisation problems, depending on what objective the controller wishes to optimise.

Historically, automata theory developed entirely separately from the theory of stochastic processes and stochastic optimal control, with each developed by a separate mathematical community having distinct motives. It turns out, however, that there are fruitful connections between these fields. In particular, a number of classic infinite-state automata-theoretic models, such as one-counter automata, context-free grammars, and pushdown automata, are in fact closely related to corresponding classic and well-studied countably infinite-state stochastic processes. Roughly speaking, such automata-theoretic models share the same (or, a closely related) underlying state transition system with corresponding classic stochastic processes.

Upon reflection, it should not be entirely surprising that this is the case. After all, Markov chains are nothing other than probabilistic state transition systems. In order for a class of infinite-state Markov chains to be considered important, it should not only model interesting real-world phenomena, but it should also hopefully be “analyzable” in some sense. Better yet, its analyses should have reasonable computational complexity. But these same criteria also apply to infinite-state automata-theoretic models: their relevance is at least partly dictated by whether we have efficient algorithms for analyzing them.


Clearly, we can not devise effective algorithms for analyzing arbitrary finitelypresented countably infinite-state transition systems. For example, Turing machines are clearly finitely presented, but we can not decide whether a Turing machine halts, i.e., whether we can reach the halting configuration from the start configuration. Furthermore, if we consider probabilistic Turing machines (PTMs), we easily see that there can not exist any algorithm that computes any non-trivial approximation of the probability that a given probabilistic Turing machine halts. Researchers working on automata theory and on stochastic processes have, over time, arrived at related classes of “analyzable” infinite-state transition systems, and they have built automata-theoretic structure, or stochastic structure, upon them to suit their own purposes. Let us mention a couple of examples. Consider the derivation graph of a context-free grammar (CFG), in which states consist of sequences of terminals and nonterminals and with a simultaneous derivation law defining transitions between states, so all non-terminals in a sequence are expanded at once according to rules associated with those nonterminals. The state transition systems obtained this way are intimately related to the underlying state transition systems of multi-type branching processes (BP), a classic stochastic process ([33]). Basically, the transition system for the BP corresponding to a CFG is the quotient of the CFG’s transition system under the equivalence that equates any two sequences of terminals and nonterminals that contain the same number of occurrences of each nonterminal symbol in them (see [28] for a detailed explanation). Likewise, one-counter automata share essentially the same state transition system with quasi-birth-death processes (QBDs) (see [21] for the details). QBDs are a class of stochastic processes heavily studied in queuing theory, where the counter can basically be used to keep track of the number of jobs in the queue. A generalisation of QBDs, referred to as tree-like QBDs in the queuing theory literature, turns out to share its state transition graphs with pushdown automata (again, see [21] for the precise correspondence). The aforementioned stochastic models (in discrete-time) can all be formulated as subcases, in precise ways, of a model obtained by adding a natural recursion feature to finite-state Markov chains, called recursive Markov chains (RMCs) [28]. RMCs are also essentially equivalent to probabilistic pushdown systems [17] (see [28] for the precise sense of this equivalence). RMCs and RMDPs constitute natural abstract models of the control flow of probabilistic procedural programs with recursion. Of course, being analyzable as automata does not automatically imply that the corresponding class of probabilistic transition systems or MDPs is also analyzable, nor the other way around. For some classes of transition systems, effective/efficient “analyzability” does coincide in the two settings, whereas for others it does not. We shall see examples of both. This chapter surveys some basic algorithmic results for the analysis of Markov chains (MCs) and Markov decision processes (MDPs), in both finite-state settings, as well as in finitely presented countably-infinite state settings. We will consider a few different analyses, focusing on computation of hitting (reachability) probabilities and on model checking. But we will also discuss important reward-based analyses.


We will also emphasise computational complexity considerations for the relevant problems. Finally, we shall very briefly mention the extension from MDPs to stochastic games and give some references to the relevant literature. Algorithmic analyses of MCs and MDPs, including transient analyses, steady state analyses, optimal reward analyses, and model checking, play an important role in many application areas. A sampling of the many application areas where stochastic modeling and analysis play a role includes: queueing theory, computational biology, natural language processing, verification, economics, finance, and operations research in general. Automata-theoretic models and methods come into play for analysis of stochastic systems in several ways. To begin with, we can view a Markov chain as a probabilistic state transition system (or probabilistic automaton). For model checking of MCs (and, respectively, MDPs), one is interested in determining the (optimal) probability with which a random walk on the MC (respectively, on the MDP using a chosen strategy for the controller) satisfies a given temporal property. The temporal property may be specified, for example, as a linear temporal logic (LTL) formula, or as an ! -automaton. In the latter case the connection to automata theory is very direct: the properties are given by automata, or formalisms closely related to automata, so automata-theoretic methods are largely unavoidable. Even for classic analyses of MCs and MDPs, as already indicated, there are deeper connections between the transition graphs of models studied originally in automata theory, such as context-free grammars, one-counter automata, and pushdown automata, and classic stochastic models that have been studied extensively in the stochastic processes literature over many decades, such as (multi-type) branching processes and (quasi-)birth-death processes. Recently, these connections have been exploited to develop efficient algorithms for analyzing such stochastic models, and to obtain results about the computational complexity of such analyses. We will survey some of this work. The literature on analysis of such Markov chains and MDPs is large and growing, even when restricted to aspects involving automata-theoretic connections. Thus, in this brief survey I can only hope to cover a very limited selection of the many models and algorithms. We will restrict our attention entirely to finite or countable-state discrete-time Markov chains (MCs) and MDPs. After providing some basic background, in § 2.1 we will define formally a number of important analysis problems for MCs and MDPs, and discuss carefully the different computation and decision problems that they give rise to, and we give some examples of analyses on finite-state MCs and MDPs in § 2.2, to help build the intuition of the reader. We then proceed in subsequent sections to discuss algorithms for and complexity of these analyses, beginning in § 3, then proceeding in § 4 to finite-state MDPs. We then define recursive Markov chains and recursive MDPs in § 5. As already discussed, these recursive models subsume a number of stochastic models and MDPs which have tight automata-theoretic connections. We then briefly discuss algorithms and complexity of analyzing RMCs and RMDPs, and provide pointers to the by now large relevant literature.


One of the themes that will emerge in this survey is that for key analyses of both finite-state MCs and MDPs, as well as for analysing classes of infinite-state recursive MCs and MDPs, a basic ingredient in their algorithms will be to find a solution to a corresponding system of equations. In the case of MDPs, these equations correspond to the appropriate Bellman optimality equations for the classes of MDPs involved. In particular, in several settings we will need to find the least fixed point (least nonnegative) solution to a monotone system of equations. As the models become richer, these systems of equations become richer and more involved, e.g., going from linear to non-linear and requiring richer sets of algebraic operators (e.g., going from operators ¹Cº to ¹C; maxº, or to ¹C; º, and then to ¹C; ; maxº, etc.). The computational complexity of finding solutions to such systems of equations, which turn out to be very intriguing problems with interesting connections to several areas of research, are thus intimately connected to the computational complexity of basic analysis problems for such stochastic models. Finally, although we do not have room to discuss it in this chapter, let us briefly mention that one can also study the complexity of analysis problems for the extension of MDPs to stochastic games. In particular, in a two-player zero-sum stochastic game, there is not just one controller, but also an adversary, whose objective is the opposite of that of the controller. In turn-based stochastic games, also referred to as simple stochastic games (SSGs), and first studied by Condon [10], the two players control different states. Condon [10] showed that deciding whether the value is > 12 for a given SSG with the objectives of maximising and minimising the probability of hitting a target state for the two adversarial players is in NP\co-NP, and it is a major open problem whether this problem can be decided in P-time. (The problem is well known to be at least as hard as solving parity games and mean payoff games; see e.g. [45].) Although we shall not have room to discuss it this survey, we note that, again, key computational questions for stochastic games boil down to finding a solution for certain equation systems, and again, these equations become richer as the class of stochastic game models becomes richer, for example, going from ¹C; max; minº to ¹C; ; max; minº and to ¹C; Valº, and to ¹C; ; Valº, where Val is the value operator Val.M / that gives the minimax value of a 2-player zero-sum matrix game with matrix M . Note that Val clearly generalises both max and min. Equations over ¹C; Valº were already used by Shapley [39] to characterise the value of his original 2-player zero-sum stochastic games (which, in the parlance used in this paper, constituted stochastic games with a discounted total payoff objective). Shapley’s discounted equations for these defined a contraction mapping whose Banach fixed point gives the value of the stochastic game starting at each state. In other settings, e.g., in (concurrent) stochastic reachability (hitting) games, the equations define a monotone mapping whose Tarski least fixed point defines the value vector (note that Val). These games further generalise to infinite-state recursive settings and require monotone equations over ¹C; ; Valº for their value [25]. The reader interested in learning more about the stochastic game extensions of some of the models we discuss in this chapter can consult [30], [27], [22], and [18]. Warning. 
This chapter is certainly not a comprehensive survey of algorithms for analysis and verification of Markov chains and MDPs and their connections to automata


theory. These are vast and rapidly growing subjects, with a huge existing theoretical and practical literature. No comprehensive survey is feasible now, and it is not our intention to attempt one. This chapter only highlights a few basic topics, based largely on the author’s own research interests, focusing on some connections between probabilistic processes and automata theory, and on recent research on algorithms for analyzing infinite-state recursive probabilistic systems. We do not mention many important related subjects. For example, we do not discuss existing software tools for analysis and model checking of probabilistic systems. There are many; see, e.g., [34]. Also, some software already exists for analysing recursive probabilistic systems; see, e.g., [43]. We also do not mention verification of probabilistic models against branching-time temporal logics like PCTL (see, e.g., Chapter 10 of [2] for one treatment of this in a textbook). We also do not discuss probabilistic (bi)simulation and related topics (again, see Chapter 10 of [2] for a brief treatment of this). There are many other topics related to both algorithms for analysis of probabilistic processes and to automata theory that we shall not mention at all.
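To make the recurring least-fixed-point theme above concrete before turning to definitions, here is a tiny sketch (in Python; the particular equation is our own illustration, not an example from this chapter). It computes the least non-negative solution of the monotone equation x = 1/4 + (3/4)x² by iterating the map from 0. The equation has the two solutions 1/3 and 1, and the iteration converges to the least one; it is the extinction probability of a branching process in which an individual has no children with probability 1/4 and two children with probability 3/4.

```python
def lfp_iterate(f, x0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate a monotone map f from x0 (Kleene/value iteration) until convergence."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# x = 1/4 + (3/4) x^2: a monotone equation whose least non-negative solution is 1/3.
f = lambda x: 0.25 + 0.75 * x * x
print(lfp_iterate(f))  # ~0.3333..., the least fixed point (the other solution is 1.0)
```

Richer classes of models lead to richer systems of this kind (with max, min, or value operators added), but the basic picture of iterating a monotone operator towards its least fixed point is the one that recurs throughout this chapter.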

2. Definitions and Background

Although we will endeavour to provide most of the formal definitions needed for our purposes, our subject is vast and we will need to assume some familiarity with basic notions and facts from probability theory, the theory of Markov chains, and the theory of Markov decision processes. For background on these topics the reader is referred, for example, to the following excellent textbooks [9], [36], and [38].

Recall that a $\sigma$-algebra over a set $\Omega$ is a set $\mathcal{F} \subseteq 2^{\Omega}$ of subsets of $\Omega$, such that $\Omega \in \mathcal{F}$, and such that $\mathcal{F}$ is closed under countable union and under complementation with respect to $\Omega$. Recall that a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ consists of a set of outcomes $\Omega$ (i.e., the sample space), a $\sigma$-algebra $\mathcal{F} \subseteq 2^{\Omega}$ of events over $\Omega$, and a probability measure $\mathbb{P}\colon \mathcal{F} \to [0,1]$. For a real-valued random variable (r.v.) $X\colon \Omega \to \mathbb{R}$ over a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, the expected value of $X$, when it exists, is denoted $\mathbb{E}(X) := \int_{\Omega} X \, d\mathbb{P}$. Note that, when $\mathbb{E}(X)$ is defined, $\mathbb{E}(X) \in \overline{\mathbb{R}} = [-\infty, +\infty]$. We will sometimes need to consider extended-real-valued r.v.'s $X\colon \Omega \to \overline{\mathbb{R}}$ and their expectation. The theory for these r.v.'s is readily available (see, e.g., [9]), and consists of natural extensions to the definitions for real-valued r.v.'s and their expectation.

A probability distribution over a finite or countably-infinite set $U$ is a function $F\colon U \to [0,1]$ such that $\sum_{u \in U} F(u) = 1$. The support of the distribution $F$ is the set $\mathrm{support}(F) := \{u \in U \mid F(u) > 0\}$.

Markov chains. We view a (denumerable, discrete-time, time homogeneous) Markov chain (MC) as being given by a pair M D .S; P / consisting of a countable (or finite) set of states PS and a probabilistic transition function P W S  S ! Œ0; 1 such that, for all s 2 S , s 0 2S P .s; s 0 / D 1. P is also referred to as the transition probability matrix of M, and for s; s 0 2 S we often use the notation Ps;s 0 as an alternative to P .s; s 0 /. When jS j D n is finite, we will indeed find it convenient to view P as an .n  n/ matrix, and we will often find it convenient to view the countable (or finite) state set S


as consisting of (an initial segment of) the positive integers NC D ¹1; 2; : : :º. P is thus, by definition, a stochastic matrix, meaning it is non-negative and all its rows sum to 1. We use   S  S to denote the underlying transition relation of the Markov chain M, defined by  D ¹.s; s 0 / j P .s; s 0 / > 0º. The state set S together with  defines the underlying directed graph G D .S; / of the Markov chain M. For every s 2 S , define successors.s/ D ¹s 0 j .s; s 0 / 2 º. Clearly, for all s 2 S , successors.s/ ¤ ;, so all states have at least one successor in . We use the notation s ! s 0 as an alternative  to .s; s 0 / 2 , and we use s Ý s 0 to denote that .s; s 0 / is in the transitive closure  of , i.e., that there is a (possibly empty) directed path in G from s to s 0 . We C

k

use s Ý s 0 (respectively, s Ý s 0 ) to denote there is a directed path of positive length (respectively, of length k ) from s to s 0 . The Markov chain is called irreducible if, for all  states s; s 0 2 S , s Ý s 0 holds. In other words, irreducibility means the graph G has one strongly-connected component (SCC). Recall that an SCC is a maximal subset C  S  such that for all s; s 0 2 C , s Ý s 0 . The structure of the strongly-connected components of G plays an important role in the analysis of finite-state Markov chains M. Particularly important are bottom strongly-connected components (BSCCs). A BSCC C  S of G  is an SCC such that for all s 2 C there is no state s 0 62 C such that s Ý s 0 . For s 2 S , 0 we use Ps to denote the function Ps W S ! Œ0; 1 defined by Ps .s / WD P .s; s 0 / for all s 0 2 S . Note that, for all s 2 S , Ps defines a probability distribution on S . A Markov chain M D .S; P /, together with an initial probability distribution on states, IW S ! Œ0; 1, defines a probability space .; F; PI / where the sample space  D S ! consists of the set of infinite trajectories, or sample paths, or runs 1 of M. A trajectory  D 0 1    2  D S ! is simply an infinite word (! -word) over the alphabet S . For a finite string w 2 S  , let CM .w/ WD wS !   denote the set of trajectories that have the string w as an initial prefix. The (Borel)  -algebra F  2 of measurable events associated with trajectories of the MC M is the (unique)  -algebra generated by (i.e., the smallest  -algebra containing) all basic open sets or basic cylinders, given by ¹CM .w/ j w 2 S  º. The probability measure PI W F ! Œ0; 1, which is parametrised by the initial distribution I, is uniquely determined by specifying, as follows, the probabilities of all basic cylinders CM .w/. Firstly, for the empty string w D  , we have CM ./ D S ! D , so of course we define PI .CM .// WD 1. For any non-empty string w D w0 w1    wk 2 S C , where wi 2 S , i D 0; : : : ; k , k > 0, Q we define PI .CM .w// WD I.w0 /  kiD1 P .wi 1 ; wi /. This definition extends uniquely to all events in the  -algebra F. When the initial distribution I assigns probability 1 to a single state s , we will sometimes use Ps instead of PI to denote the associated probability measure. A more common formulation of Markov chains, encountered in the probability theory literature, is the following: a Markov chain M, together with initial distribution I, defines a discrete-time stochastic process .Xi W i 2 N/ consisting of a sequence of random variables Xi W  ! S over the probability space .; F; PI /, where each Xi 1 In the probability theory literature the word run is not often used to refer to sample paths. We use it here to highlight the close correspondence with the notion of runs in automata theory.


maps a trajectory $\tau = \tau_0 \tau_1 \tau_2 \cdots \in S^{\omega} = \Omega$ to the $i$-th state along that trajectory, i.e., $X_i(\tau) := \tau_i$. Clearly, according to these definitions, $\mathbb{P}(X_0 = s) = \mathcal{I}(s)$ for all $s \in S$, and furthermore, $(X_i)_{i \in \mathbb{N}}$ satisfies the Markov property, i.e., for any finite sequence of states $s_0, s_1, \ldots, s_k, s_{k+1}$, where $k \geq 0$,
\[
\mathbb{P}(X_{k+1} = s_{k+1} \mid X_0 = s_0, \ldots, X_k = s_k) \;=\; \mathbb{P}(X_{k+1} = s_{k+1} \mid X_k = s_k) \;=\; P(s_k, s_{k+1}).
\]
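The probability measure on trajectories is determined by the cylinder probabilities $\mathbb{P}_{\mathcal{I}}(C_{\mathcal{M}}(w)) = \mathcal{I}(w_0) \cdot \prod_{i=1}^{k} P(w_{i-1}, w_i)$, and the process $(X_i)$ depends on the past only through the current state. The following small sketch (Python; the two-state chain and its state names are an invented illustration, not one of the chapter's examples) computes cylinder probabilities and samples a finite prefix of a trajectory using only the current state, in line with these definitions.

```python
import random

# A finite Markov chain given by an initial distribution I and transition function P,
# both as dictionaries; the states "s" and "t" are illustrative placeholders.
I = {"s": 1.0}
P = {"s": {"t": 0.5, "s": 0.5}, "t": {"s": 1.0}}

def cylinder_prob(I, P, w):
    """P_I of the basic cylinder C_M(w): all trajectories starting with the finite word w."""
    if not w:
        return 1.0
    p = I.get(w[0], 0.0)
    for a, b in zip(w, w[1:]):
        p *= P[a].get(b, 0.0)
    return p

def sample_prefix(I, P, n):
    """Sample the first n states of a trajectory; each step uses only the current state."""
    x = random.choices(list(I), weights=I.values())[0]
    out = [x]
    for _ in range(n - 1):
        x = random.choices(list(P[x]), weights=P[x].values())[0]
        out.append(x)
    return out

print(cylinder_prob(I, P, ["s", "t", "s", "s"]))  # 1.0 * 0.5 * 1.0 * 0.5 = 0.25
```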

Clearly, these properties also uniquely characterise the Markov chain M (and initial distribution I), so they can alternatively be taken as the definition of the Markov chain. Let us observe here that, for any finite-state MC M with any initial distribution I with probability 1, a trajectory of M will eventually enter some bottom strongly connected component (BSCC) C  S of G and will forever thereafter stay in C . In other words, if the BSCCs of the underlying graph G of a finite-state MC M are given  W by C1 ; C2 ; : : : ; Ck , then PI jkD1 9t > 0W 8t 0 > tW X t 0 2 Cj D 1. We will sometimes wish to consider a labelled Markov chain M D .S; P; l/ where lW S ! † is a mapping that assigns to each state s 2 S a symbol l.s/ 2 † from some alphabet †. The labels on distinct states need not be distinct. Sometimes, we may wish to associate rewards ( payoffs) to states, in which case the labelling function lW S ! † assigns numerical values to states. For example, we may have † D Z. We associate with every trajectory  D 0 1 2    2 S ! of M, an ! -word l./ 2 †! over the : alphabet †, defined by l./ D l.0 /l.1 /l.2 /    . x over the probability space .; F; PI / of traFor a random variable Y W  ! R : Rjectories generated by a Markov chain M with initial distribution I, we use EI .Y / D  Yd PI , to denote the expected value of Y , assuming it exists, parametrised by initial distribution I. If I assigns probability 1 to a state s , then we typically write Es .Y / instead of EI .Y /. Example 2.1. A simple example of a labelled finite-state Markov chain M1 D .S; P; l/ with 6 states S D ¹s1 ; : : : ; s6 º is depicted in Figure 1. This 6-state MC has the transition probability matrix P D .Pi;j /i;j 2¹1;:::;6º : 0

\[
P \;=\;
\begin{pmatrix}
0 & \tfrac{1}{3} & \tfrac{1}{2} & 0 & \tfrac{1}{6} & 0\\[2pt]
\tfrac{2}{5} & 0 & \tfrac{1}{5} & 0 & \tfrac{2}{5} & 0\\[2pt]
0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0\\[2pt]
0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0\\[2pt]
0 & 0 & 0 & 0 & 0 & 1\\[2pt]
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}.
\]

Each state s has a label l.s/ 2 † D ¹a; b; cº, and these are depicted in red in Figure 1. So, for example, l.s1 / D a.



Figure 1. A simple 6-state labelled Markov chain, M1
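As a quick numerical check of this example (a sketch we add here, not code from the chapter), the following Python/numpy snippet builds the matrix $P$ above (with states $s_1, \ldots, s_6$ mapped to rows $0, \ldots, 5$), verifies that it is stochastic, and solves the two-variable linear system quoted in the discussion below to recover the hitting probability $q^*_{1,3} = 17/26$.

```python
import numpy as np

# Transition matrix of M1 (states s1..s6, as in Figure 1 and the matrix P above).
P = np.array([
    [0,   1/3, 1/2, 0,   1/6, 0],
    [2/5, 0,   1/5, 0,   2/5, 0],
    [0,   0,   1/2, 1/2, 0,   0],
    [0,   0,   1/2, 1/2, 0,   0],
    [0,   0,   0,   0,   0,   1],
    [0,   0,   0,   0,   1,   0],
])
assert np.allclose(P.sum(axis=1), 1.0)  # P is a stochastic matrix

# Hitting probabilities for the target F = {s3}: only s1 and s2 can still reach s3
# before entering a bottom SCC, so it suffices to solve the 2x2 linear system
# x1 = 1/3*x2 + 1/2 and x2 = 2/5*x1 + 1/5 quoted in the example below.
A = np.array([[1.0, -1/3],
              [-2/5, 1.0]])
b = np.array([1/2, 1/5])
x = np.linalg.solve(A, b)
print(x[0], 17/26)  # both ~0.6538...: q*_{1,3} = 17/26
```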

Let us consider hitting probabilities in this MC. It is clear that in the MC $\mathcal{M}_1$, regardless of what state a trajectory starts in, with probability 1 the trajectory will eventually hit (reach) one of the two states $s_3$ or $s_5$, and will thereafter infinitely often return to that state. Consider the hitting (or reachability) probabilities $q^*_{i,j}$, where $q^*_{i,j}$ is defined as the probability of eventually hitting vertex $s_j$ starting at vertex $s_i$, with $i, j = 1, \ldots, 6$. What, for example, is the probability $q^*_{1,3}$ for $\mathcal{M}_1$ in Figure 1? This hitting probability happens to be $\tfrac{17}{26}$. How can we compute it? We will come back to this question in § 3. For finite-state MCs, such probabilities can be computed easily by solving corresponding systems of linear equations. For this example, the probabilities $(q^*_{1,3}, q^*_{2,3})$ constitute the unique solution vector of the linear system of equations in two variables $(x_1, x_2)$ given by $x_1 = \tfrac{1}{3} x_2 + \tfrac{1}{2}$ and $x_2 = \tfrac{2}{5} x_1 + \tfrac{1}{5}$. Hitting probabilities form a basic ingredient for many other kinds of analyses of MCs, including model checking.

Markov decision processes. A (finite-state or countable-state) Markov decision process (MDP) is a tuple $\mathcal{D} = (S, (S_0, S_1), \Delta, P)$, where $S$ is a (finite or countable) set of states; $(S_0, S_1)$ is a partition of $S$ into random states, $S_0$, and controlled states, $S_1$, i.e., $S = S_0 \cup S_1$ and $S_0 \cap S_1 = \emptyset$; $\Delta \subseteq S \times S$ is a transition relation; and finally $P\colon S_0 \times S \to [0,1]$ is a probabilistic transition function out of random states. For every $s \in S$, define $\mathrm{successors}(s) = \{s' \mid (s, s') \in \Delta\}$. We assume that, for all states $s \in S$, $\mathrm{successors}(s) \neq \emptyset$, so all states have at least one successor in $\Delta$. For each $s \in S_0$, we again use $P_s$ to denote the function $P_s\colon S \to [0,1]$ defined by letting $P_s(s') := P(s, s')$. We assume that for each $s \in S_0$, $P_s$ defines a probability distribution (i.e., furthermore $\sum_{s' \in S} P_s(s') = 1$), and that $\mathrm{support}(P_s) = \mathrm{successors}(s)$. In other words, the transitions that are assigned positive probability are precisely transitions to those states


that are immediate successors of s according to the transition relation , and these probabilities must of course sum to 1. We will be focusing on either finite-state MDPs, or countably infinite-state MDPs that are finitely presented. Every specific family of MCs and MDPs that we consider is finitely-branching, meaning that for all s 2 S , the set successors.s/ is finite. Indeed, all families of MDPs that we consider are boundedly-branching, meaning there is an integer k > 1 (depending on the MDP) such that for all s 2 S , jsuccessors.s/j 6 k . An MDP represents a partially controlled stochastic process. The controller (also known as player) exerts its control by choosing a strategy (also known as policy, also known as scheduler). A strategy ( policy) is a function  that, to each string ws 2 S  S1 ending in a controlled state s 2 S1 , assigns a probability distribution on the neighbors of s , .ws/W successors.s/ ! Œ0; 1. We say that a strategy  is memoryless if .ws/ depends only on the last vertex s . In this case we can denote the strategy by a function which assigns to every state s 2 S0 a probability distribution .s/W successors.s/ ! Œ0; 1. We say that a strategy  is deterministic if, for every ws 2 S  S1 , these is some s 0 2 S such that .ws/.s 0 / D 1, in other words, .ws/ assigns probability 1 to some neighbor of s . When  is deterministic, we write .ws/ D s 0 instead of .ws/.s 0 / D 1. Likewise, for a memoryless deterministic strategy  , we write .s/ D s 0 instead of .ws/.s 0 / D 1. Strategies that are not necessarily memoryless (respectively, deterministic) are called history-dependent (respectively, randomised). Given an MDP D D .S; ; .S0 ; S1 /; P /, fixing a strategy  for the controller determines a unique Markov chain D./ D .S C ; P  / for which the set of states is S C (i.e., the non-empty strings over S ), and where, for all w; w 0 2 S  and s; s 0 2 S , 8 0 0 0 ˆ

Consider the maximal hitting probabilities $q^{\max}_{i,j} := \sup_{\sigma} \mathbb{P}^{\sigma}_{s_i}(\exists t \geq 0\colon X_t = s_j)$, defining the supremum probability (over all strategies) of eventually hitting vertex $s_j$ starting at vertex $s_i$, $i, j = 1, \ldots, 6$. What is $q^{\max}_{5,3}$ for the MDP $\mathcal{M}_2$ in Figure 2? The maximum hitting probability happens to be $q^{\max}_{5,3} = \tfrac{3}{4}$, and the memoryless strategy that always chooses to move from state $s_5$ to state $s_1$, and from state $s_6$ to state $s_2$, achieves this optimal probability. Indeed, for finite-state MDPs, there is always a memoryless optimal strategy for maximising (or minimising) the probability of eventually hitting given target states. How can we compute such probabilities? We will come back to this question in § 4. For general finite-state MDPs these maximum (minimum) probabilities can be computed by solving corresponding systems of max(min)-linear Bellman equations. Such equations can be solved in polynomial time,

² Of course we can also simply view the labels $l(s)$ as assigning to each state $s$ a pair $(y_s, z_s)$ consisting of a label $y_s \in \Sigma = \{a, b, c\}$ and a payoff $z_s \in \mathbb{Z}$.


Figure 2. A 6-state labelled MDP, M2

using linear programming. Optimal hitting probabilities again form a basic ingredient for many other kinds of analyses of MDPs, including model checking. Quick review of Büchi automata, !-regular languages, and linear temporal logic. In order to discuss model checking problems for MCs and MDPs, we now review basic facts about and fix notations for ! -automata and linear temporal logic, which are topics covered in more detail in Chapter 6 of this Handbook. Two standard formalisms for specifying languages of ! -words are Büchi automata and linear temporal logic. A Büchi automaton (BA) is given by a tuple B D .Q; †; q0 ; ı; F /, where Q is a finite set of states, † is a finite alphabet, q0 2 Q is an initial state, ı  Q  †  Q is a transition relation, and F  Q is a set of accepting states. We can assume without loss of generality (if necessary, by adding an extra dummy state) that the transition relation ı is total in the sense that for every state q 2 Q and every letter a 2 † of the alphabet there is some state q 0 2 Q such that there is a transition .q; a; q 0 / 2 ı . The Büchi automaton is called deterministic if for every state q and every a 2 Q0 there exists at most one state q 0 such that .q; a; q 0 / 2 ı . Otherwise, it is nondeterministic. A run of B is a sequence  D q0 v0 q1 v1 q2    of alternating states qi 2 S and letters vi 2 †, i > 0, such that for all i > 0 .qi ; vi ; qi C1 / 2 ı . The ! -word associated with run  is L./ D v0 v1 v2    2 †! . The run  is accepting if for infinitely many i , qi 2 F . We define the ! -regular language associated with B by L.B/ D ¹L./ j  is an accepting run of Bº. Note that L.B/  †! .
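Büchi acceptance on ultimately periodic words can be made concrete with a few lines of code. The following sketch (Python; the automaton and the lasso-word representation are our own illustration, not taken from the text) encodes a deterministic Büchi automaton for the language of $\omega$-words over $\{a, b\}$ containing infinitely many $b$'s, and decides acceptance of an ultimately periodic word $u \cdot v^{\omega}$ by locating the cycle of its unique run and checking whether that cycle contains an accepting state.

```python
# Deterministic Büchi automaton for "infinitely many b's" over the alphabet {a, b}.
# States: q0 (last letter was not b), q1 (last letter was b); q1 is accepting.
delta = {("q0", "a"): "q0", ("q0", "b"): "q1",
         ("q1", "a"): "q0", ("q1", "b"): "q1"}
initial, accepting = "q0", {"q1"}

def accepts_lasso(u, v):
    """Accept the ultimately periodic omega-word u v^omega (u, v: strings over {a, b})."""
    q = initial
    for c in u:                      # read the finite prefix u
        q = delta[(q, c)]
    # Iterate configurations (state, position in v); they must eventually cycle.
    seen, trace, i = {}, [], 0
    while (q, i) not in seen:
        seen[(q, i)] = len(trace)
        trace.append((q, i))
        q = delta[(q, v[i])]
        i = (i + 1) % len(v)
    cycle = trace[seen[(q, i)]:]     # configurations visited infinitely often
    return any(s in accepting for s, _ in cycle)

print(accepts_lasso("a", "ab"))   # True: b occurs infinitely often
print(accepts_lasso("ab", "a"))   # False: only finitely many b's
```

For nondeterministic Büchi automata one would instead search for an accepting lasso in a product construction, but the deterministic case suffices to illustrate the definitions of runs and acceptance given above.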



It is well known that any ! -regular language can be described as the language of ! -words associated with a (nondeterministic) Büchi automaton. Indeed, we can take this as the definition ! -regular languages. However, unlike the fact that deterministic finite automata (DFAs) suffice to capture all regular languages over finite strings, deterministic BAs do not suffice for expressing all ! -regular languages. For example, the language of ! -words over the alphabet ¹a; bº that contains only a finite number of b ’s can not be described by any deterministic BA. To capture all ! -regular languages using deterministic automata, we need more sophisticated acceptance conditions, like Müller, Rabin, Streett, or Parity acceptance conditions (see Chapter 6). In particular, the standard subset construction, which when applied to any (nondeterministic) finite automaton yields a deterministic finite automaton that accepts the same language of finite strings, does not work for ! -automata: it may yield an ! automaton that accepts a strictly larger language of ! -words. Remarkably however, it turns out that in a certain sense the standard subset construction does work for the purpose of model checking of ! -regular properties of labelled Markov chains and Markov decision processes. This is one of several key insights first revealed in the tour-de-force papers by Courcoubetis and Yannakakis [11], [12], [13], and [14]. These papers also established the best complexity bounds available (and best possible, subject to complexity-theoretic assumptions), for model checking finite-state Markov chains and MDPs. We will highlight some of these results. Another major insight in the papers by Courcoubetis and Yannakakis relates to model checking linear temporal logic properties of MCs. Recall that linear temporal logic (LTL) [37] formulas are built from a finite set Prop D ¹P1 ; : : : ; Pk º of propositions, using the usual Boolean connectives, :; _, and ^, the unary temporal connective next (denoted ) and the binary temporal connective until (U); thus, if ; are LTL formulas, then  and  U are also LTL formulas, as are both : and  _ , as well as  ^ . This constitutes an inductive definition of temporal formulas. Note that other useful temporal connectives can be defined using U. The formula True U means “eventually holds” and is abbreviated Þ . The formula :.Þ: / means “always holds” and is abbreviated  . An LTL formula specifies a language of ! -words over the alphabet † D 2Prop , as follows. If w D w0 ; w1 ; w2 ; : : : 2 †! is an ! -word, and ' is an LTL formula, then first we define satisfaction of the formula by w at position i , where i > 0, denoted .w; i / ˆ ' . We define this inductively on the structure of the formula ' as follows:      

.w; i / ˆ p for p 2 Prop if and only if p 2 wi ; .w; i / ˆ : if and only if not .w; i / ˆ  ; .w; i / ˆ  _ if and only if .w; i / ˆ  or w; i ˆ ; .w; i / ˆ  ^ if and only if .w; i / ˆ  and w; i ˆ ; .w; i / ˆ  if and only if .w; .i C 1// ˆ  ; .w; i / ˆ  U if and only if 9j > i W ..w; j / ˆ and 8k.i 6 k < j /W .w; k/ ˆ /.



The $\omega$-language specified by an LTL formula $\varphi$ is $L(\varphi) := \{w \in \Sigma^{\omega} \mid (w, 0) \models \varphi\}$. The language specified by every LTL formula is $\omega$-regular, and in fact any LTL formula can be converted to an equivalent (albeit, exponentially bigger) nondeterministic Büchi automaton that accepts the same language; see, e.g., [42] and Chapter 6.

2.1. Some important analysis problems for MCs and MDPs. We now formally define a variety of important algorithmic analyses that one might wish to perform on MCs and MDPs. Given an MDP $\mathcal{D}$, an initial distribution $\mathcal{I}$, and a strategy $\sigma$, let $X_i$ denote the random variable that assigns to a trajectory $\tau$ of the Markov chain $\mathcal{D}(\sigma)$ the state $X_i(\tau) = s_i \in S$ of $\mathcal{D}$ that is visited by the play at time $i$ (in other words, $X_i(\tau) = s_i$ if $\tau_i = w s_i$ for some $w \in S^*$ and $s_i \in S$). The controller's goal is to optimise the (expected) value of some random variable, or the probability of some event, both of which could be a function of the entire random trajectory. There are a wide variety of objectives that have been studied in the MDP literature. We now list some important analyses that have been considered. Note that all of the analyses listed below are also applicable to purely stochastic Markov chains, because MCs are just special cases of MDPs where there are no controlled nodes. In other words, in an MC the controller has only one (vacuous) strategy, which is to do nothing.

I. MP: mean payoff.³ The labelling function $l$ is a payoff function $l\colon S \to \mathbb{Q}$ which associates to every state $s$ a (rational valued) payoff⁴ $l(s) \in \mathbb{Q}$. The goal of the controller is to maximise⁵ the expected mean payoff of the play $\tau = s_0 s_1 s_2 s_3 \cdots$,
\[
\mathbb{E}_{\mathcal{I}}\Bigl(\liminf_{n \to \infty} \frac{\sum_{i=0}^{n-1} l(X_i)}{n}\Bigr).
\]
Note that in the case of irreducible finite-state MCs, mean payoff analysis subsumes, as a very special case, computation of the invariant (stationary) distribution of the MC. Recall, the invariant distribution for an irreducible MC $\mathcal{M} = (S, P)$ with $S = \{1, \ldots, n\}$ is the unique probability distribution $\pi$ on states, given by a non-negative row vector $\pi = (\pi_1, \ldots, \pi_n)$ with $\sum_i \pi_i = 1$, such that $\pi P = \pi$. When the finite-state MC is ergodic (irreducible and aperiodic), the invariant distribution $\pi$ is the steady-state distribution, giving the long-run probability of being in any particular state, regardless of the initial distribution. Consider a state $j \in S$, and consider the following labelling of the states of $\mathcal{M}$ with payoffs: let $l(j) := 1$, and for all other states $j' \in S \setminus \{j\}$, let $l(j') := 0$. Then $\pi_j = \mathbb{E}_j\bigl(\lim_{n \to \infty} \frac{\sum_{i=0}^{n-1} l(X_i)}{n}\bigr)$.

3 This objective is also known as the limiting-average payoff objective in the MDP literature. 4 We restrict to rational payoffs in Q, rather than payoffs in R, for computational reasons. We wish to analyze the complexity of algorithms also in terms of the encoding size of the input coefficients. 5 Note that maximising expected mean payoff (or discounted payoff) when payoffs can be both positive and negative rational values is computationally equivalent to minimising expected mean payoff, because minimising the mean payoff amounts to maximising the mean payoff when all payoffs labelling states are negated.
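As a concrete illustration of the invariant distribution discussed under I, the following sketch (Python with numpy; our own illustration, with the two-state chain taken from the BSCC $\{s_3, s_4\}$ of $\mathcal{M}_1$ as reconstructed in Example 2.1) computes $\pi$ by solving the linear system $\pi P = \pi$ together with the normalisation $\sum_i \pi_i = 1$.

```python
import numpy as np

def invariant_distribution(P):
    """Solve pi P = pi, sum(pi) = 1 for an irreducible finite-state MC (illustrative sketch)."""
    n = P.shape[0]
    # pi (P - I) = 0 together with sum(pi) = 1, written as an overdetermined system A pi = b.
    A = np.vstack([(P - np.eye(n)).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# The two-state MC restricted to the BSCC {s3, s4} of M1 (each row is (1/2, 1/2)):
P_C1 = np.array([[0.5, 0.5], [0.5, 0.5]])
print(invariant_distribution(P_C1))  # [0.5, 0.5]
```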



II. DTP: discounted total payoff. Given a payoff function $l\colon S \to \mathbb{Q}$ labelling the states and given a rational discount factor $0 < \beta < 1$, the goal is to maximise the expected discounted total payoff
\[
\mathbb{E}_{\mathcal{I}}\Bigl(\lim_{n \to \infty} \sum_{i=0}^{n} \beta^{i}\, l(X_i)\Bigr).
\]
The limit in the expression exists under mild conditions on the MDP (e.g., it suffices if the payoffs labelling states are bounded in absolute value). Discounted payoff objectives play an important role, e.g., in economics and finance, where the discount factor $\beta$ can often be viewed as being given by the rate of inflation, i.e., the rate at which the present value of money depreciates over time.

III. NTP: non-negative total payoff. There is no discount, and the states are labelled by non-negative payoffs $l\colon S \to \mathbb{Q}_{\geq 0}$. The goal is to either maximise or minimise the expected total reward, which may in general be $+\infty$:
\[
\mathbb{E}_{\mathcal{I}}\Bigl(\lim_{n \to \infty} \sum_{i=0}^{n} l(X_i)\Bigr). \tag{1}
\]

Sometimes the structure of the MCs or MDPs implies that this expectation is finite. Analyzing expected non-negative total reward includes, as a special case, analysis of the expected hitting time of a set of target states. Consider an MDP $\mathcal{M} = (S, (S_0, S_1), P, \Delta)$ with a set $F \subseteq S$. Turn all target states in $F$ into absorbing random states, meaning re-define the random states as $S_0' := S_0 \cup F$ and the controlled states as $S_1' := S_1 \setminus F$, and let $P_{s,s} := 1$ for $s \in F$. Define the payoff labels at states as follows: for $s \in F$, let $l(s) := 0$; for $s \in S \setminus F$, let $l(s) := 1$. Let $H_F$ denote the random variable (family) defining the hitting time of the target set $F$. Then, clearly, for every strategy $\sigma$, $\mathbb{E}^{\sigma}_{\mathcal{I}}(H_F) = \mathbb{E}^{\sigma}_{\mathcal{I}}\bigl(\lim_{n \to \infty} \sum_{i=0}^{n} l(X_i)\bigr)$.

IV. HP: hitting probability of desired (or undesired) target states. Given a set of target states $F \subseteq S$, the goal is to maximise (or minimise) the probability of eventually hitting a state $s \in F$. In other words, we wish to choose a strategy $\sigma$ to maximise, or minimise, $\mathbb{P}^{\sigma}_{\mathcal{I}}(\exists i \geq 0\colon X_i \in F)$. Let us denote the supremum and infimum of these probabilities by
\[
q^{\max}_{\mathcal{I},F} \;\equiv\; \sup_{\sigma} \mathbb{P}^{\sigma}_{\mathcal{I}}(\exists i\colon X_i \in F)
\quad\text{and}\quad
q^{\min}_{\mathcal{I},F} \;\equiv\; \inf_{\sigma} \mathbb{P}^{\sigma}_{\mathcal{I}}(\exists i\colon X_i \in F).
\]
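To illustrate how such maximum hitting probabilities can be computed via the max-linear Bellman equations mentioned earlier, here is a small value-iteration sketch (Python; the tiny MDP is an invented illustration, not the MDP $\mathcal{M}_2$ of Figure 2). It iterates $q(s) \leftarrow 1$ for $s \in F$, $q(s) \leftarrow \max_{s' \in \mathrm{successors}(s)} q(s')$ for controlled $s \notin F$, and $q(s) \leftarrow \sum_{s'} P(s, s')\, q(s')$ for random $s \notin F$, starting from the all-zero vector; the iterates increase monotonically to the optimal values.

```python
# Value iteration for maximum hitting probabilities in a tiny finite MDP (illustrative).
# States 0..4; state 3 is the target, state 4 is a dead end. State 0 is controlled.
successors = {0: [1, 2]}                                    # controlled state -> moves
P = {1: {0: 0.5, 4: 0.5}, 2: {3: 0.1, 4: 0.9},              # random state -> distribution
     3: {3: 1.0}, 4: {4: 1.0}}
F = {3}
states = [0, 1, 2, 3, 4]

q = {s: 0.0 for s in states}
for _ in range(200):   # enough for this example; in general iterate to convergence
    q = {s: 1.0 if s in F
         else (max(q[t] for t in successors[s]) if s in successors
               else sum(p * q[t] for t, p in P[s].items()))
         for s in states}
print(q)  # q[0] = 0.1: the controller's best move from state 0 is to state 2
```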

It need not in general be the case that there exists any optimal strategy     such that qmax ;I;F D PI .9i W Xi 2 F /; likewise, for infinitely-branching   MDPs, there need not exist any strategy   such that qmin ;I;F D PI .9i W Xi 2F /.



Indeed, one can easily construct examples of infinite-state MDPs where no optimal strategy exists for maximising/minimising the probability of hitting a set of target states.6 In such cases, there only exist  -optimal strategies, for every  > 0. The objective of optimising hitting probability can also be easily reformulated as a special case of NTP, i.e., of optimising expected total non-negative reward, as follows. Remove all out-going transitions from states in F and replace them with a single transition from each state in F to a new state s  . Let .s/ D 1 for all s 2 F and let .s/ D 0 otherwise. Then the goal of maximising/minimising the probability of eventually hitting the target states F is equivalent to the goal of maximising/minimising the undiscounted expected total non-negative payoff, when the payoffs labelling the states are given by . However, the ability to label non-absorbing states with reward 0 is crucial for this. In fact, in some MDP settings, analysing expected total reward when all non-absorbing states are labelled by strictly positive rewards is substantially easier than analyzing hitting probability (see, in particular, [22] for an example). V. MoCh: m o d el c h ec k i ng o f ! - r eg u la r o r LT L p ro p ert i es . Given a labelled MDP D D .S; ; P; l/ and initial distribution I, where lW S ! †, and given an ! -regular language L over the alphabet †, specified by giving a Büchi automaton B or LTL formula ' , so that L D L.B/ or L D L.'/, the goal of the controller is to choose a strategy  so as to maximise (or minimise) the probability that the trajectory  of D./ generates an ! -word l./ 2 L. In other words, we can associate with the ! -regular language L the corresponding event (family) EL ./ D ¹ 2 ./ j l./ 2 Lº in the probability space generated of trajectories of the MC D./ generated by the MDP D and the strategy  . It can be checked that, regardless of what the strategy  is used, for any ! -regular language L the property EL ./ does indeed constitute an event in the  -algebra F./. (This was noted already, e.g., in [41].) When it is clear from the context, we overload notation and use L to refer to the event family EL ./ parametrised by the strategy  . The goal of model checking 7 for MDPs is thus to maximise (or minimise) the probability PI .L/. For analyses like HP and MoCh, which involve computing the (optimal) probability of some event, the associated computational problems can be further subdivided and classified as either qualitative or quantitative analyses, as we now discuss. Sometimes we may not need to know the (optimal) probability of the event in question and we may instead just be satisfied to know whether or not the event holds almost surely, i.e., with (maximum) probability 1 or, equivalently, whether the complement 6 In the case of minimisation, such examples require infinitely-branching infinite-state MDPs, but for maximisation simple finitely-presented boundedly-branching infinite-state MDPs suffice to show that no optimal strategy for hitting the target states exists. 7 In the context of MDPs, as phrased here, this is an optimisation problem, and not a decision problem, so the word “model checking” is a bit of a misnomer. But we will adhere to this terminology.



event has (infimum) probability 0. These constitute what are generally referred to as qualitative analyses, whereas quantitative analyses involve computing the (optimal) probability of the event in question. However, particularly for MDPs, there are subtle distinctions between different forms of qualitative analysis, and also between different forms of quantitative analysis. In some settings these distinctions can make a big difference in terms of the computational complexity of the problems involved. So we now examine these distinctions more carefully. 1. Q ua li tat i v e a na lys i s o f M C s a n d M D P s . Given an (MC or) MDP D and an initial distribution I for an event E (again, strictly speaking, a family of event E./ in the respective probability spaces of trajectories of the MCs D./ parametrised by the strategy  ), and for a set ‰ of strategies constraining the strategies that the controller may use (e.g., ‰ may simply be all strategies, or only memoryless ones, or deterministic ones, etc.), consider the following decision problem: decide whether 9 2 ‰W PI .E/ D 1:

(2)

This decision problem is referred to as the qualitative almost-sure decision problem for the event E (and with respect to the strategy constraint ‰ ). This x D 0, problem is of course equivalent to asking whether 9 2 ‰W PI .E/ x where E D  n E denotes the complement event. (Again, strictly speaking, x E./ D ./ n E./ is a family of events parametrised by  .) If such a strategy  exists, then we may also want to compute (some representation of) such a strategy, in which case this is no longer just a decision problem. A closely related, but in general not equivalent, problem is decide whether

sup PI .E/ D 1:

(3)

2‰

This is referred to as the qualitative limit-sure 8 decision problem for the event E . Although the almost-sure and limit-sure decision problems are related, and although they are obviously equivalent if the model is simply a Markov chain, these problems are certainly not equivalent for all MDPs, because as already discussed in relation to HP, in general there need not exist any optimal : strategy  that achieves probability 1 for the event HitF D .9i W Xi 2 F /, and yet there may exist a sequence of strategies 1 ; 2 ; 3 ; : : :, which achieve  probabilities arbitrarily close to 1. For example, we could have PI i .HitF / D i 1 1=2 . In such a case, the limit-sure condition (3) holds while the almost-sure condition (2) does not. 8 The term limit-sure was first used in [15], where they considered the distinct almost-sure and limitsure decision problems in the context of concurrent (stochastic) reachability games. As we shall see, the distinction between almost-sure and limit-sure qualitative analyses is relevant in various other contexts, including for important classes of finitely-presented infinite-state MDPs.



We also in general need to consider, as distinct qualitative problems for MDPs, the following duals of the above problems, which are not in general equivalent, namely, decide whether 8 2 ‰W PI .E/ D 1: This is of course the complement of deciding whether 9 2 ‰W PI .E/ < 1, which is equivalent to x > 0: decide whether 9 2 ‰W PI .E/

(4)

Note however that in this dual setting there is no distinction between the almost-sure and the limit-sure cases. The above problems are also equivalent to x deciding whether inf 2‰ PI .E/0 . We refer to problem (4) as the qualitative witness-positivity 9 decision problem for (the family of) events Ex . Let us also mention some “qualitative” problems that can be associated with objectives such as NTP, where the objective is optimise the expected total non-negative payoff. It is possible, for example, that 9 2 ‰W EI



lim

n!1

holds true, or else that

sup EI 2‰



lim

n!1

n X i D0

n X i D0

 l.Xi / D C1

 l.Xi / D C1:

Again, the latter may hold true while the former does not, because there may be no optimal strategy. These problems are clearly analogous to the almost-sure and limit-sure qualitative decision problems for the probability of an event E . We will call them the qualitative witness-infinity problem and the qualitative limit-infinity problem for the expectation of the associated random variable (family) Y . In many settings, such “qualitative” problems are not relevant because the random variable Y is guaranteed to have bounded expectation. For example, this holds for finite-state MDPs with MP and DTP, namely mean payoff and discounted total payoff objectives. 2. Q ua n t i tat i v e a na lys i s o f M C s a n d M D P s . Quantitative analysis problems can be considered for I–V on on our list, and not just for those relating to the (optimal) probability of an event. In general, for quantitative analysis we want to compute the optimal (supremum or infimum) expected value of some random variable family Y or the optimal probability of some event family E . However, it may not always be possible to compute the quantity in question exactly. This may be because of the computational complexity doing so. It may also be because of a more basic reason: in a variety of stochastic models we can consider, the optimal (supremum or infimum) value over all  2 ‰ may be irrational, even when all of the finite data describing the Markov chain or 9 Or witness-less-than-one, where appropriate.



MDP consists of rational values. In such cases, we can still consider approximating the optimal value within some desired error bound, or deciding whether the optimal value is at least a given rational value r 2 Q. Again, there are some subtle distinctions, so let us formulate these problems more precisely. a. Q ua n t i tat i v e d ec i s i o n p ro b lem s . Given an MDP D, an initial distribution I, and some event (family) E , and given a rational value r 2 Q, decide whether 9 2 ‰W PI .E/ > r; (5) Or, if the objective is to optimise the expected value of a r.v. Y , we may want to decide whether or not 9 2 ‰W EI .Y / > r . Of course, if such a strategy  exists, we may also wish to compute (some representation of) such a strategy. A different decision problem is decide whether 9 2 ‰W PI .E/ 6 r;

(6)

Analogously, decide whether 9 2 6 r. Note that (5) is concerned with the goal of maximising the probability of the event E (or expectation of the r.v. Y ): does there exist a strategy that obtains a value of at least r ? Whereas, (6) is concerned with the goal of minimising the probability of E (or expectation of r.v. Y ): does there exist a strategy that obtains an value of at most r ? Sometimes, the above decision problems are too hard computationally, whereas the corresponding approximation problems are not as hard. b. Q ua n t i tat i v e  - a p p rox i m at i o n p ro b lem s . We are given an MC or MDP, D, and initial distribution I, some event (family) E whose probability we are interested in, or a random variable (family) Y whose expectation we are interested in. Let ‰W EI .Y /

v  D sup PI .E/ 2‰

or

v  D inf EI .Y /; 2‰

in the respective cases. We are also given a rational positive error threshold  > 0. We wish to Compute an  -approximate value 10 v 2 Q such that jv 

vj < :

(7)

We may then also wish to compute (a representation of) an  -optimal 0 0 strategy: a strategy  0 such that jv  PI .E/j <  or jv  EI .Y /j <  , respectively. 2.2. More examples of analyses for finite-state MCs and MDPs. We now reconsider the example MC and MDP given in Figures 1 and 2, and consider other analyses for these. Example 2.3. Let us consider again the labelled 6-state finite-state Markov chain M1 D .S; P; l/ depicted in Figure 1, and let us consider some other analyses for that MC. 10 Such an  -approximation may be impossible with v 2 Q, e.g., because v  D sup 2‰ E I .Y / D 1.


1361

MoCh. Consider the following model checking problem. The LTL formula  Þ b , expresses the property that the symbol b occurs infinitely often in the ! -word. What is the probability Ps1 .L. Þ b//? It is not difficult to see, by inspection of M1 , that Ps1 .L. Þ b// is precisely equal to the probability of eventually hitting state s5 starting  . Furthermore, since we know that in state s1 . In other words, Ps1 .L. Þ b// D q1;5 starting from state s1 , with probability 1 we will eventually hit either state s3 or s5 , i.e.,    that q1;3 C q1;5 D 1, and since we have already noted that q1;3 D 17 , we can conclude 26 9 that Ps1 .L. Þ b// D 26 . Note that in this case model checking was boiled down to computing hitting probabilities. The general algorithms for model checking Markov chains against ! regular properties are much more involved, but as we shall see they also ultimately reduce the problem to computing hitting probabilities on certain associated Markov chains. MP. Now let us use hitting probabilities to do mean payoff analysis on the MC M1 . In

particular, suppose that the labels on states are associated with payoffs, as follows: Let

a WD 4;

b WD

3;

c WD 7:

Pn 1  l.Xi /  vi D Esi lim inf i D0 n!1 n denote the expected mean payoff when starting in state si . In the MC M1 , what is v1 ? Let G1 denote the underlying graph of M1 . The two BSCCs of G1 are C1 D ¹s3 ; s4 º and C2 D ¹s5 ; s6 º. Clearly, starting in state s1 of M1 , with probability 1 we will eventually hit one of these two BSCCs and stay in that BSCC forever thereafter. We already 17  , and that we will hit know that we will eventually hit C1 with probability q1;3 D 26 9  C2 with probability q1;5 D 26 . Note that the MC defined by restricting M1 to the nodes of BSCC C1 is ergodic, and that its unique steady-state distribution is clearly 1 1 of C2 is 2 ; 2 . Likewise, although the MC defined by restricting M1 to the nodes  1 1 not ergodic, it is irreducible, and its unique invariant distribution is 2 ; 2 . In other words, in the case of both BSCCs C1 and C2 , once we enter such a BSCC, in the long run we spend 12 the time in each of the two states of that BSCC. Thus v1 , the longrun mean payoff starting in s1 , can be calculated  state  via the following expression: 9 17  12  7 C 12  4 C 26  12  3 C 12  7 D 217 v1 D 26 52 .

Example 2.4. Now let us reconsider the 6-state labelled MDP M2 D .S; .S0 ; S1 /; ; P; l/, with states S D ¹s1 ; : : : ; s6 º, depicted in Figure 2.

MoCh. Consider, in particular, the following model checking problem. What is the supremum probability sup Ps5 .L. Þ b//? It is not difficult to see, by inspection of M2 , that regardless what strategy  is used, Ps5 .L. Þ b// is precisely equal to the probability Ps5 .9i W Xi D s4 / of eventually hitting state s4 starting at state s5 . It can furthermore be seen that the probability of hitting state s4 is maximised by the simple memoryless strategy   that always moves to state s2 whenever in state s5 and always moves to state s4 whenever in state s6 .


Kousha Etessami

And, furthermore the (maximum) probability that this strategy achieves of eventually  hitting state s4 is 13 . In other words, sup Ps1 .M1 ˆ Þ b/ D P .M1 ˆ Þ b/ D 22 13  qmax ;5;4 D 22 . This example is too simple in at least one sense: the maximum probability in this case is attained by a deterministic memoryless strategy, but in general for obtaining the maximum probability of an LTL or ! -regular property on a finite-state MDPs it need not suffice to use a deterministic memoryless strategy (in particular, memory may be required). MP. Finally, let us consider the mean payoff objective on the MDP M2 in Figure 2,

where the aim is to maximise the expected limiting (lim inf of the) average payoff per step, where the one-step reward at state s is given by the function r.s/. In other Pn 1 r.X /  words, the aim is to maximise EI lim inf n!1 i D0n i . Note that in the MDP M2 regardless of what strategy is employed by the controller, with probability 1 the trajectory will eventually enter one of the two states s3 or s4 , and stay there forever thereafter. Once it is in one of these two states, the (expected) limiting average payoff thereafter is simply the payoff at that state, which is r.s3 / D 7 for state s3 and and r.s4 / D 8 for state s4 . Thus, since r.s4 / > r.s3 /, in order to maximise the expected mean payoff starting at any other state, we simply need to maximise the probability of eventually hitting state s4 . We already know from our previous calculations that, 13 , and starting at state s5 , the maximum probability of eventually hitting state s4 is 22 this is achieved by the deterministic memoryless strategy that always moves from state s5 to state s2 , and from state s6 to state s4 . Thus the maximum expected mean  13 payoff is 22 8 C 1 13  7 D 167 22 22 , and this is achieved by the same deterministic memoryless strategy. For finite-state MDPs, it is always the case that there exists an optimal deterministic memoryless strategy for maximising the expected limiting average payoff (see, e.g., Theorem 9.1.8 in [38]), and one can compute the optimal limiting average payoff, and an optimal memoryless strategy, in polynomial time using linear programming (see, e.g., Chapters 8 and 9 of [38]).

3. Analysis of finite-state Markov chains

In this section we review some algorithms for analyzing finite-state MCs, and discuss their complexity. Let us already summarise the known facts: for all of the analyses I–V listed in § 2.1, all qualitative and quantitative decision and computation problems are solvable in strongly polynomial time,¹¹ as a function of the encoding size of the given finite-state MC $M$.

¹¹ Recall that a problem whose input instances are represented by a vector of rational values is said to be solvable in strongly polynomial time if the problem can be solved by an algorithm that both: (i) runs in polynomial time, as a function of the dimension $n$ of the input vector, in the unit-cost (arbitrary-precision) arithmetic RAM model of computation, where standard arithmetic operations $\{+, \cdot\}$ on, and comparisons of, arbitrary rational numbers require unit cost, and (ii) runs in polynomial space as a function of the encoding size of the input vector, where the rational coordinates are encoded as usual, with numerator and denominator given in binary.


For qualitative analyses, the algorithms only involve graph-theoretic analysis of the underlying transition graph $G$ of the MC $M$. For quantitative analyses, the algorithms additionally involve solving corresponding systems of linear equations. For model checking (MoCh) the complexity is polynomial in the encoding size of $M$ but exponential in the encoding size of the $\omega$-regular language $L$, and remarkably this is so whether $L$ is specified by a non-deterministic Büchi automaton (BA) $B$, or as an LTL formula $\varphi$ (as shown by Courcoubetis and Yannakakis in [11] and [13]). This is despite the fact that worst-case exponential blow-up is unavoidable when translating LTL formulas to BAs. We shall only discuss III–V in more detail. We will also observe that some key facts used for analyzing finite-state MCs hold more generally, for all denumerable MCs.

Suppose we are "given" a MC $M = (S, P)$, where for now we allow the set $S$ to be countably infinite. Later, for computational purposes, we will assume $S$ is finite. For convenience, we equate $S$ with (an initial segment of) the positive natural numbers $\mathbb{N}_+ = \{1, 2, \ldots\}$. We let $n \doteq |S|$. Thus, if $n \in \mathbb{N}_+$, then $S = \{1, \ldots, n\}$, and otherwise if $n = \infty$ (i.e., if $n = \omega$), then $S = \mathbb{N}_+$.

HP. Suppose we are "given" a subset $F \subseteq S$ of target states, and suppose we wish to compute the probabilities, $q^*_i$, of eventually hitting a target state in $F$ starting from initial state $i \in S$. In other words, $q^*_i \doteq P_i(\exists t \geq 0\colon X_t \in F)$. We first observe that hitting probabilities for a denumerable MC can be "computed" by "solving" the following linear system of equations (albeit, with infinitely many equations, if there are infinitely many states). There is one variable, $x_i$, and one equation, for every state $i \in S$:
\[
x_i =
\begin{cases}
1 & \text{for all } i \in F,\\
\sum_{j \in S} P_{i,j}\, x_j & \text{for all } i \in S \setminus F.
\end{cases}
\]
In vector notation, we write this system as $x = R(x)$, where $x = (x_i \mid i \in S)$ is the vector of variables and $R(x) = (R_i(x) \mid i \in S)$ collects the right-hand sides. The map $R\colon \mathbb{R}^n_{\geq 0} \to \mathbb{R}^n_{\geq 0}$ defines a monotone mapping from non-negative vectors to non-negative vectors. That is, for all $x \geq y \geq 0$, we have $R(x) \geq R(y) \geq 0$. It is easy to see that the hitting probabilities $q^* = (q^*_i \mid i \in S)$ must be a solution of $x = R(x)$. Indeed, if $i \in F$, then clearly $q^*_i = 1$, and if $i \in S \setminus F$, then clearly $q^*_i = \sum_{j \in S} P_{i,j}\, q^*_j$, because starting at $i \notin F$, in order to eventually hit $F$, we first have to take one step and thereafter eventually hit $F$, and $\sum_{j \in S} P_{i,j}\, q^*_j$ captures the probability of eventually hitting $F$ after one step, starting at $i$.


Unfortunately, in general the equations $x = R(x)$ can have multiple solutions, for trivial reasons. To see this, consider the trivial 2-state Markov chain with states $S = \{1, 2\}$, with transition probabilities defined by $P_{1,1} = P_{2,2} = 1$, and $P_{i,j} = 0$ for $i \neq j$, and where the target state is $F = \{1\}$. The equations $x = R(x)$ are thus given by $(x_1 = 1,\; x_2 = x_2)$. Obviously, any pair $(1, r)$ for $r \in \mathbb{R}$ is a solution. It turns out the hitting probabilities $q^* = (q^*_i \mid i \in S)$ are always the least non-negative solution of $x = R(x)$, which is the least fixed point (LFP) of the monotone operator $R\colon \mathbb{R}^n_{\geq 0} \to \mathbb{R}^n_{\geq 0}$. Let us state this more precisely. For a vector $y \in \mathbb{R}^n$, let $R^0(y) = y$, and for $k \geq 0$, let $R^{k+1}(y) = R(R^k(y))$. For any $k \geq 0$, let $q^k_i$ denote the probability of hitting the target set $F$ starting in initial state $i$, in at most $k$ time steps. In other words, $q^k_i \doteq P_i(\exists t\,(0 \leq t \leq k)\colon X_t \in F)$. Note that $q^k_i \uparrow q^*_i$, meaning $q^k_i$ converges monotonically from below to $q^*_i$ as $k \to +\infty$. Let $q^k = (q^k_i \mid i \in S)$ denote the corresponding vector. We shall use $\mathbf{0}$, or just $0$, to denote an all-zero vector of the appropriate dimension, when this is clear from the context. The following key Proposition 3.1 is well known and easy to prove: part (1) can be proved by induction on $k$, and the rest follows. (We will later learn that variants of Proposition 3.1 hold in much more general settings, when the symbols in the proposition are interpreted differently.)

Proposition 3.1.
1. For all $k \geq 0$, $q^k = R^{k+1}(0)$, and thus $R^{k+1}(0) \leq q^*$ and $\lim_{k\to\infty} R^k(0) \uparrow q^*$.
2. $q^* = R(q^*)$, and if $q' \in \mathbb{R}^n_{\geq 0}$ and $q' = R(q')$, then $q^* \leq q'$. In other words, $q^*$ is the least fixed point (LFP) of $R(x)$.

Now suppose that $M = (S, P)$ is a finite-state Markov chain, so $n \doteq |S| < \infty$, and that we are given the transition probability matrix $P$ explicitly. How can we use Proposition 3.1 to compute the hitting probabilities $q^*$? We have to compute the least non-negative solution to the linear system of equations $x = R(x)$. One (not very efficient) way to do this in polynomial time is to formulate this as a linear programming problem. Namely, the vector $q^*$ is the unique optimal solution to the following LP:
\[
\text{minimise } \sum_{i \in S} x_i \quad \text{subject to } R(x) \leq x,\; x \geq 0. \tag{9}
\]
Note that the inequality $R(x) \leq x$ stands for a system of inequalities $R_i(x) \leq x_i$, $i \in S$, and likewise $x \geq 0$ stands for $x_i \geq 0$, $i \in S$. Although this already shows we can compute $q^*$ in P-time, we can do much better. Namely, let us denote by $G = (S, \rightarrow)$ the underlying directed graph of the MC $M$. Note that $q^*_i = 0$ if and only if there is no path in $G$ from $i$ to any state $j \in F$. We can thus easily compute the set $S_{\mathrm{Zero}} = \{i \mid q^*_i = 0\}$ in P-time by a simple depth-first search in $G$. We can then remove the equations corresponding to variables $x_i$, $i \in S_{\mathrm{Zero}}$, from the system of equations $x = R(x)$, and replace occurrences of variables $x_i$, $i \in S_{\mathrm{Zero}}$, by $0$ on the right-hand side of any other equations $x_j = R_j(x)$ where they occur.
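As a concrete illustration of LP (9), the following sketch (Python, using scipy.optimize.linprog) computes the least solution of $x = R(x)$ for a small hypothetical chain; the chain itself is not taken from the chapter.

```python
import numpy as np
from scipy.optimize import linprog

def hitting_probabilities_lp(P, F):
    """Least solution of x = R(x) via LP (9):
    minimise sum_i x_i subject to R(x) <= x, x >= 0."""
    n = P.shape[0]
    A_ub, b_ub = [], []
    for i in range(n):
        if i in F:
            row = np.zeros(n); row[i] = -1.0
            A_ub.append(row); b_ub.append(-1.0)   # 1 <= x_i
        else:
            row = P[i].copy(); row[i] -= 1.0
            A_ub.append(row); b_ub.append(0.0)    # sum_j P[i,j] x_j <= x_i
    res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub))
    return res.x

# Hypothetical chain: from state 0, go to the target 2 w.p. 1/3
# and to the absorbing non-target state 1 w.p. 2/3.
P = np.array([[0.0, 2/3, 1/3],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(hitting_probabilities_lp(P, F={2}))        # approx. [1/3, 0, 1]
```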


For convenience in what is to follow, we also remove the variables $x_i$ for $i \in F$, and their equations $x_i = 1$, and replace the occurrences of variables $x_i$, $i \in F$, by $1$ on the RHS of any other equations $x_j = R_j(x)$ where they occur. This gives us a new system of linear equations $\hat{x} = \hat{R}(\hat{x})$ in fewer variables. It turns out that this new system has a unique solution, corresponding to the remaining (positive) coordinates of $q^*$; furthermore, if the equation is written in matrix notation as $\hat{x} = \hat{P}\hat{x} + b$, then the matrix $(I - \hat{P})$ is guaranteed to be invertible and the (positive) coordinates of $q^*$ that were not eliminated are given by the solution $(I - \hat{P})^{-1} b$. Thus, we can compute $q^*$ in (strongly) polynomial time by first doing some simple graph-theoretic analysis on $G$, and then solving a linear system of equations.

We note that it follows from basic facts in matrix theory that $(I - \hat{P})^{-1} = \sum_{k=0}^{\infty} \hat{P}^k$. We can use this to put a probabilistic interpretation on the calculation $q^* = (I - \hat{P})^{-1} b = \sum_{k=0}^{\infty} \hat{P}^k b$. Note that $\hat{P}^k_{i,j} = \hat{P}_i(X_k = j)$ is the probability that, in a Markov chain $\hat{M}$ derived from $M$, which excludes all states in $S_{\mathrm{Zero}} \cup F$ and replaces them with dead-end absorbing states, starting in state $i$, at time $k$ the trajectory is in state $j$. Thus, for $k \geq 0$, $(\hat{P}^k b)_i$ is the probability of entering a state in $F$ for the first time at time $k+1$. It is thus clear, by a probabilistic argument, that $q^* = \sum_{k=0}^{\infty} \hat{P}^k b = (I - \hat{P})^{-1} b$.

A more basic method for computing $q^*$ numerically is already immediately suggested by Proposition 3.1, and it "works" even for infinite-state MCs. Namely, we can simply iteratively compute a sequence of vectors $y^k = R^k(0)$, $k = 0, 1, \ldots$, letting $y^0 := 0$ and $y^{k+1} := R(y^k)$. By Proposition 3.1, the sequence $y^k = R^k(0)$ converges monotonically to $q^*$. This well-known method is called value iteration. Of course, one issue is that we do not a priori know how many iterations of value iteration are required, as a function of the input matrix $P$, in order to converge to within a desired error bound of the vector $q^*$. It turns out that in the worst case there are bad examples of finite-state MCs where convergence of value iteration can be very slow. For example, consider the MC $M = (S, P)$, where $S = \{1, \ldots, n\}$, where the target set is $F = \{n\}$, and where for all $i \in \{1, \ldots, n-1\}$, $P_{i,1} = \frac{1}{2}$, $P_{i,i+1} = \frac{1}{2}$, and $P_{n,n} = 1$, and all other transition probabilities are of course $0$. Note that $q^*_i = 1$ for all $i \in S$. Now by Proposition 3.1, $q^k_i = R^{k+1}_i(0)$. However, it can be seen that, for all $k \leq 2^{n-1}$, $q^k_1 \leq 1 - (1 - \tfrac{1}{2^{n-1}})^k$, which (for $n \geq 4$) stays below $\frac{2}{3}$. Thus, we need at least $k > 2^{n-1}$ value iterations, i.e., exponentially many as a function of $n$, before $|R^k_1(0) - q^*_1| \leq \frac{1}{3}$. However, value iteration works reasonably well on many instances of MCs, and optimised variants of it are widely used in practice (also for MDPs).
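The following sketch implements value iteration on the slowly converging family of chains just described (transcribed to 0-based indexing); it is meant only to illustrate Proposition 3.1 and the slow convergence, not as an optimised implementation.

```python
import numpy as np

def value_iteration(P, F, iters):
    """Iterate x <- R(x) starting from 0 (Proposition 3.1); converges
    monotonically from below to the hitting probabilities q*."""
    n = P.shape[0]
    x = np.zeros(n)
    for _ in range(iters):
        y = P @ x
        y[list(F)] = 1.0            # x_i = 1 for target states
        x = y
    return x

def bad_chain(n):
    """The slow example: from state i < n-1, go back to state 0 w.p. 1/2
    or forward to i+1 w.p. 1/2; state n-1 is absorbing and is the target."""
    P = np.zeros((n, n))
    for i in range(n - 1):
        P[i, 0] = 0.5
        P[i, i + 1] = 0.5
    P[n - 1, n - 1] = 1.0
    return P

n = 12
P = bad_chain(n)
# q*_i = 1 for every state, yet after 2**(n-1) iterations the value at
# state 0 is still well below 1.
print(value_iteration(P, F={n - 1}, iters=2 ** (n - 1))[0])
```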

NTP. Let us now consider non-negative total payoff analysis of MCs which, as already noted, generalises hitting probability analysis. We shall now reuse the symbols $q^*$ and $R(x)$, with a different interpretation, for reasons that will become clear shortly. Suppose we have a non-negative payoff-labelled MC, $M = (S, P, l)$, with $n \doteq |S|$ states (possibly infinite), and with $l\colon S \to \mathbb{N}$. We wish to compute $q^*_j = \mathbb{E}_j\bigl[\lim_{k\to\infty} \sum_{i=0}^{k} l(X_i)\bigr]$. We can again write a linear system of equations for


this, with one equation per variable $x_i$, over variables $x = (x_i \mid i \in S)$, as follows:
\[
x_i = l(i) + \sum_{j \in S} P_{i,j}\, x_j, \quad \text{for all } i \in S. \tag{10}
\]

We can again denote this system of linear equations, in vector notation, as $x = R(x)$. Since $l(i) \geq 0$, the operator $R(x)$ is again monotone, and it turns out that again the vector $q^*$ of expected total payoffs is the least non-negative solution of $x = R(x)$, except with the difference that we now must also allow for the possibility that some coordinates of $q^*$ are $+\infty$. Formally, we can work over the ordered semiring $\bar{\mathbb{R}}_{\geq 0} = \mathbb{R}_{\geq 0} \cup \{+\infty\}$, where by definition $+\infty \cdot 0 = 0$, $+\infty + r = +\infty$, and $+\infty \geq r$, for all $r \in \bar{\mathbb{R}}_{\geq 0}$. Let $q^k_j = \mathbb{E}_j\bigl[\sum_{t=0}^{k} l(X_t)\bigr]$. Then,

Proposition 3.2. The statement of Proposition 3.1 holds true, verbatim, for the above re-interpretations of $x = R(x)$, $q^*$, and $q^k$.

Thus the expectation vector $q^*$ is the least fixed point of the monotone operator $R\colon \bar{\mathbb{R}}^n_{\geq 0} \to \bar{\mathbb{R}}^n_{\geq 0}$. Thus, by Proposition 3.2, the value iteration $y^k := R^k(0)$ converges monotonically to the expected total payoff vector $q^*$. However, since some coordinates of $q^*$ may now be $+\infty$, the value iterates $y^k$ may never actually get "close enough" to $q^*$. We can nevertheless again compute the expectations $q^*$ in strongly polynomial time for finite-state MCs, including determining those coordinates that are $+\infty$, using a variant of what was described earlier for computing hitting probabilities. First, consider the underlying graph $G = (S, \rightarrow)$ of $M$. For any bottom-SCC $C \subseteq S$ of $G$, if there is some $j \in C$ such that $l(j) > 0$, then clearly $q^*_{j'} = +\infty$ for all $j' \in C$ and for all $j' \in S$ such that $j' \rightsquigarrow j$. Indeed, this describes all states such that $q^*_{j'} = +\infty$, because with probability 1 the trajectory will eventually hit some BSCC, and thereafter stay in that BSCC forever. We can thus use depth-first search to decompose $G$ into its DAG of SCCs, and find and remove from the equations $x = R(x)$ any variable $x_i$ such that $q^*_i = +\infty$. Likewise, by simple reachability analysis on $G$ we can find and remove all variables $x_i$ such that $q^*_i = 0$, by just noting that $q^*_i = 0$ if and only if there is no state $j \in S$ such that both $l(j) > 0$ and $i \rightsquigarrow j$. After we remove, as indicated, both the $+\infty$ and the $0$ variables from the equations, we are left either with an empty list of equations or with a system of linear equations on the remaining variables whose unique solution is a positive real-valued vector that yields the remaining coordinates of the vector $q^*$. We can thus compute these remaining coordinates (in strongly polynomial time) by solving the remaining linear equations.
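A compact sketch of this classification, assuming networkx for the SCC and reachability analysis (the data representation is ours, not the chapter's): coordinates of $q^*$ are $+\infty$ if a positive-reward BSCC is reachable, $0$ if no positive-reward state is reachable at all, and otherwise are obtained by solving the remaining linear system.

```python
import numpy as np
import networkx as nx

def total_payoff(P, l):
    """Expected total payoff q*_i = E_i[sum_t l(X_t)] for a finite MC with
    non-negative rewards l (a numpy array), via the classification above."""
    n = P.shape[0]
    G = nx.DiGraph((i, j) for i in range(n) for j in range(n) if P[i, j] > 0)
    G.add_nodes_from(range(n))
    cond = nx.condensation(G)                  # DAG of SCCs
    # bottom SCCs containing a state with positive reward
    bad = {c for c in cond.nodes
           if cond.out_degree(c) == 0
           and any(l[s] > 0 for s in cond.nodes[c]['members'])}
    inf_states = {i for i in range(n)
                  if any(nx.has_path(cond, cond.graph['mapping'][i], c) for c in bad)}
    zero_states = {i for i in range(n)
                   if all(l[j] == 0 for j in nx.descendants(G, i) | {i})}
    rest = [i for i in range(n) if i not in inf_states and i not in zero_states]
    q = np.zeros(n)
    q[list(inf_states)] = np.inf
    if rest:
        A = np.eye(len(rest)) - P[np.ix_(rest, rest)]
        q[rest] = np.linalg.solve(A, l[rest])  # x = l + P x on the remaining states
    return q
```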

MoCh. Suppose we are given a labelled finite-state MC $M = (S, P, l)$, an initial distribution $I$, and a Büchi automaton $B = (Q, \Sigma, q_0, \delta, F)$. Suppose we wish to compute the probability $p^* = P_I(L(B))$. We now describe an algorithm for computing $p^*$, due to Courcoubetis and Yannakakis [13], which runs in time polynomial in the encoding size $|M|$ of $M$, and exponential in the encoding size $|B|$ of $B$.

We can assume, without loss of generality, that $\Sigma = S$, i.e., that the alphabet of $B$ is the set of states of $M$. We can do so because we can always update the transition relation $\delta$ of $B$, refining it so that if $(q, a, q')$ was in $\delta$, and for some $s \in S$ we have $l(s) = a$,


then we put $(q, s, q')$ in the new transition relation. It is clear that the probability that $M$ generates a trajectory accepted by the new BA is the same as the probability that $M$ generates a trajectory labelled by an $\omega$-word in $L(B)$. So from now on, we assume $\Sigma = S$.

We first perform a naive subset construction on the BA $B$ to obtain a deterministic BA. Recall, however, that the subset construction does not, in general, preserve the $\omega$-regular language of a BA, and that in fact $\omega$-regular languages accepted by some nondeterministic BAs are not accepted by any deterministic BA. Nevertheless, it was shown in [11] and [13] that the subset construction "works" in a suitable way for probabilistic model checking. Let $B' = (2^Q, \Sigma, \{q_0\}, \delta', F')$ be the deterministic BA obtained by performing the usual subset construction on $B$. The states of $B'$ are $2^Q$, the alphabet is $\Sigma = S$, the start state is $\{q_0\}$, and $\delta' \subseteq 2^Q \times \Sigma \times 2^Q$ is a deterministic transition relation defined by $\delta' := \{(T, a, T') \mid T' = \{q' \in Q \mid \exists q \in T\colon (q, a, q') \in \delta\}\}$. Finally, we let $F' = \{T \subseteq Q \mid T \cap F \neq \emptyset\}$.

Next, we define the product MC, $M \otimes B' = (S \times 2^Q, \tilde{P})$, obtained from the MC $M$ and the deterministic Büchi automaton $B'$. The states of $M \otimes B'$ are pairs $(s, T)$, where $s \in S$ and $T \in 2^Q$. The transition probability function $\tilde{P}$ is defined as follows:
\[
\tilde{P}((s, T), (s', T')) =
\begin{cases}
P(s, s') & \text{if } (T, s', T') \in \delta',\\
0 & \text{otherwise.}
\end{cases}
\]
Note that $M \otimes B'$ is indeed an MC, whose trajectories are a refinement of the trajectories of $M$. In particular, projecting a trajectory $\pi \in (S \times 2^Q)^\omega$ onto its left coordinates yields a trajectory of $M$. Let $G_{M \otimes B'}$ denote the underlying directed graph of the MC $M \otimes B'$. Finally, for a pair $(s, T) \in S \times 2^Q$, which defines a state of $M \otimes B'$, and thus also a node of $G_{M \otimes B'}$, let $G_{M \otimes B'}((s, T))$ denote the directed subgraph of $G_{M \otimes B'}$ induced by the set of nodes consisting of all of the nodes $(s', T') \in S \times 2^Q$ of $G_{M \otimes B'}$ that are reachable from $(s, T)$, i.e., such that $(s, T) \rightsquigarrow (s', T')$.

The following important definitions are key to the algorithm. A pair $(s, q) \in S \times Q$ is called special¹² if $q \in F$ and some bottom-SCC $C$ of $G_{M \otimes B'}((s, \{q\}))$ contains a node $(s, T) \in C$ with $q \in T$. For a bottom-SCC $C \subseteq S \times 2^Q$ of $G_{M \otimes B'}$ (and thus also of $M \otimes B'$), we shall call $C$ accepting if there exists some $(s, T) \in C$ such that there exists $q \in T \cap F$ such that $(s, q)$ is a special pair. The following theorem from [11] and [13] reduces the MoCh problem for finite-state MCs to HP problems on (larger) finite-state MCs.

Theorem 3.3 ([11] and [13]). Given a labelled finite-state MC $M = (S, P, l)$ with initial state $s \in S$, and given a non-deterministic BA $B$ with initial state $q_0$, the probability $P_s(L(B))$ is equal to the probability that in the MC $M \otimes B'$, starting from initial state $(s, \{q_0\})$, the trajectory eventually reaches an accepting bottom-SCC of $M \otimes B'$.

¹² In [13] "recurrent" was used, but "recurrent" has other meanings; so we use "special" instead, as in [29].


In order to compute $P_s(L(B))$, we first need to do a graph-theoretic analysis on the directed graph $G_{M \otimes B'}$, and also an analysis of various subgraphs $G_{M \otimes B'}((s', \{q\}))$, for $s' \in S$ and $q \in F$, so as to compute the special pairs $(s', q) \in S \times Q$, and use these to compute all accepting bottom-SCCs of $G_{M \otimes B'}$. We can then consider all nodes in such accepting bottom-SCCs as target nodes and compute the probability of hitting a target node starting from the initial state $(s, \{q_0\})$ of the MC $M \otimes B'$, which yields the probability $P_s(L(B))$ that we are after. To compute the hitting probabilities we of course use the methods already described for solving HP. Note that this algorithm does not involve full-fledged determinisation of Büchi automata (such as Safra's construction), which involves a $2^{\Theta(|B| \log |B|)}$ blow-up in size and requires more sophisticated acceptance conditions such as Rabin or Muller conditions. Overall, this algorithm runs in strongly polynomial time as a function of $|M|$ (assuming $B$ is fixed) and in exponential time as a function of $|B|$, when $B$ is nondeterministic (and polynomial in $|B|$ when $B$ is deterministic). It was furthermore shown in [11] and [13] that, given an MC $M$ and a nondeterministic BA $B$ as input, the qualitative problem of deciding whether $P_s(L(B)) = 1$ is in PSPACE, and it was already shown in [41] that the problem is PSPACE-hard, so the qualitative problem is PSPACE-complete.

Courcoubetis and Yannakakis in [11] and [13] also considered model checking of finite-state MCs with respect to properties specified by LTL formulas and, remarkably, they showed that both the quantitative problem and the qualitative problem for LTL model checking of MCs have the same complexity as that of model checking an $\omega$-regular property given by a nondeterministic BA. This was surprising, because it is well known that in general translating an LTL formula to a BA requires a worst-case exponential blow-up. Their algorithm involves iterative constructions of larger and larger finite-state MCs, starting from $M$, built up via a structural induction on the subformulas of the LTL formula. The transition probabilities of the new MCs in the iterative construction are obtained by computing certain hitting probabilities on the old MCs. See [13] for details.
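To make the constructions above concrete, here is a small Python sketch of the naive subset construction and of the product chain $M \otimes B'$ (with $\Sigma = S$, so the automaton reads the successor state of each transition). It stops short of the special-pair analysis and the final hitting-probability computation, which would reuse the HP routines sketched earlier; the dictionary-based representations are our own choices, not the chapter's.

```python
import numpy as np

def subset_construction(delta, q0, alphabet):
    """Naive subset construction of a nondeterministic BA; delta maps
    (q, a) to an iterable of successor automaton states."""
    start = frozenset({q0})
    det, todo = {}, [start]
    while todo:
        T = todo.pop()
        if T in det:
            continue
        det[T] = {}
        for a in alphabet:
            T2 = frozenset(q2 for q in T for q2 in delta.get((q, a), ()))
            det[T][a] = T2
            todo.append(T2)
    return start, det          # start subset {q0} and transition table of B'

def product_chain(P, q0, delta):
    """Product MC M (x) B' over states (s, T): (s, T) moves to
    (s', delta'(T, s')) with probability P[s, s']."""
    n = P.shape[0]
    T0, det = subset_construction(delta, q0, range(n))
    trans = {}
    for T in det:
        for s in range(n):
            for s2 in range(n):
                if P[s, s2] > 0:
                    trans[((s, T), (s2, det[T][s2]))] = P[s, s2]
    return T0, trans           # M started at s corresponds to product state (s, T0)
```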

4. Analysis of finite-state MDPs

We now review some algorithms for analyzing finite-state MDPs and discuss their complexity. Many analogies with the algorithms for finite-state MCs will soon become clear. In fact, we have deliberately stated some equations and facts for finite-state MCs in a general enough form so as to be able to reuse them here (and also later, for recursive MCs and 1-recursive MDPs). Let us already summarise the known facts: again, for I–V listed in § 2.1, all qualitative and quantitative decision and computation problems are solvable in polynomial time as a function of the encoding size of the given MDP (but the known P-time algorithms for all of them require solving linear programming problems, and thus none of


them are currently known to be solvable in strongly polynomial time¹³). For qualitative analyses, the algorithms only involve and-or game graph analysis on the underlying transition graph $G$ of the MDP $D$, which can be done in P-time. For quantitative analyses, the algorithms additionally involve solving corresponding max/min-linear Bellman optimality equations, which can be solved in P-time using linear programming. For model checking (MoCh) the complexity is polynomial in the encoding size of $D$, but again exponential in the encoding size $|B|$ if the $\omega$-regular property $L = L(B)$ is given by a nondeterministic Büchi automaton $B$. However, for finite-state MDPs, unlike finite-state MCs, if $L = L(\varphi)$ is given by an LTL formula $\varphi$, then the complexity is double-exponential as a function of the encoding size of $\varphi$. These complexity bounds cannot be improved, because the problems are EXPTIME-hard and 2EXPTIME-hard, respectively. These results on model checking finite-state MDPs were established by Courcoubetis and Yannakakis in [11], [13], [12], and [14].

I.MP and II.DTP are standard for finite-state MDPs and algorithms for them can be found in any textbook on MDPs. See, e.g., [38] for a thorough treatment. Let us mention that for I–IV on finite-state MDPs it is well known that there always exist deterministic memoryless optimal strategies (see [38]). For model checking (V.MoCh), memoryless strategies do not suffice in general for optimising the probability of an $\omega$-regular property, but bounded-memory strategies do suffice ([14]). We shall only discuss the analyses HP and MoCh further.

Suppose we are "given" an MDP $D = (S, (S_0, S_1), \Delta, P)$, where for now we allow the set $S$ to be countably infinite. Again, for convenience, we equate $S$ with (an initial segment of) the positive natural numbers $\mathbb{N}_+ = \{1, 2, \ldots\}$ and let $n \doteq |S|$. We will furthermore assume that every state $i \in S_1$ is boundedly branching, meaning there is some $k \in \mathbb{N}$ (depending on the MDP) such that, for every $i \in S_1$, $|\mathit{successors}(i)| \leq k$. This allows us to use $\max$ and $\min$ operators in the Bellman optimality equations, whereas we would otherwise require $\sup$ and $\inf$.

HP. Suppose we are "given" a subset $F \subseteq S$ of target states and suppose we wish to compute the supremum probabilities $q^*_{\max,i}$ or the infimum probabilities $q^*_{\min,i}$ of eventually hitting a target state in $F$ starting from initial state $i \in S$. In other words, $q^*_{\max,i} \doteq \sup_\sigma P^\sigma_i(\exists t \geq 0\colon X_t \in F)$ and $q^*_{\min,i} \doteq \inf_\sigma P^\sigma_i(\exists t \geq 0\colon X_t \in F)$. Maximum (respectively, minimum) hitting probabilities for a denumerable MDP can be "computed" by "solving" the following max-(min-)linear system of equations, called the Bellman optimality equations. There is one variable $x_i$ and one equation for every state $i \in S$. Let $\mathrm{opt} = \max$ or $\min$, according to whether we are maximising or minimising the hitting probability. The equations are given by
\[
x_i =
\begin{cases}
1 & \text{for all } i \in F,\\
\sum_{j \in S} P_{i,j}\, x_j & \text{for all } i \in S_0 \setminus F,\\
\mathrm{opt}_{j \in \mathit{successors}(i)}\; x_j & \text{for all } i \in S_1 \setminus F.
\end{cases}
\]

In vector notation, we again write this system as $x = R(x)$, where $R\colon \mathbb{R}^n_{\geq 0} \to \mathbb{R}^n_{\geq 0}$ again defines a monotone map from non-negative vectors to non-negative vectors. Let $q^* = (q^*_{\mathrm{opt},i} \mid i \in S)$, where $\mathrm{opt} = \max$ or $\mathrm{opt} = \min$, respectively. For any $k \geq 0$, let $q^k_{\mathrm{opt},i}$ denote the optimal probability of hitting the target set $F$ starting in initial state $i$, in at most $k$ time steps. Let $q^k = (q^k_{\mathrm{opt},i} \mid i \in S)$ denote the corresponding vector of optimal probabilities. The following is again easy to prove by induction on $k$.

Proposition 4.1. The statement of Proposition 3.1 holds true, verbatim, for the above re-interpretations of $x = R(x)$, $q^*$, and $q^k$.

Thus the optimal hitting probabilities $q^*$ are the LFP of $x = R(x)$. Now suppose that $D = (S, (S_0, S_1), \Delta, P)$ is a finite-state MDP. How can we use Proposition 4.1 to compute the optimal hitting probabilities $q^*$? We have to compute the least non-negative solution to the system of equations $x = R(x)$. One way to do this in polynomial time for maximising MDPs is to formulate this as a linear programming problem. Namely, the vector $q^*_{\max}$ is the unique optimal solution to the LP given in (9), with this new interpretation of $R(x)$. However, to express the constraints $R(x) \leq x$ as an LP, and recalling that, for $i \in S_1$, $R_i(x) \doteq \max_{j \in \mathit{successors}(i)} x_j$, we need to rewrite the corresponding constraints $R_i(x) \leq x_i$ as a system of linear inequality constraints $(x_j \leq x_i \mid j \in \mathit{successors}(i))$. With this modification, (9) again defines an LP, and the vector $q^*_{\max}$ is the unique optimal solution to this LP.

For minimising MDPs, computing $q^*_{\min}$ can also be reduced to linear programming, but this case involves some more preprocessing. In order to express the problem as an LP one first needs to do a little graph-theoretic analysis. Specifically, we first need to identify and remove all states $i$ such that $q^*_{\min,i} = 0$. We can do this by a simple and-or game graph analysis on the underlying graph $G$ of the MDP. Once this is done, it turns out that on the remaining MDP one can solve for $q^*_{\min}$ as the unique optimal solution of a different LP, namely the LP given by: maximise $\sum_i x_i$, subject to $R(x) \geq x$, $x \geq 0$, where in this case, when we have $R_i(x) = \min_{j \in \mathit{successors}(i)} x_j$, we have to rewrite the constraint $R_i(x) \geq x_i$ as a system of constraints $(x_j \geq x_i \mid j \in \mathit{successors}(i))$.

A more basic method for computing $q^*$ is again already immediately suggested by Proposition 3.1: value iteration. By Proposition 3.1, the sequence $y^k = R^k(0)$ converges monotonically to $q^*$. As we already saw, even for finite-state MCs, value iteration can be slow to converge in the worst case, but it is widely used in practice, also for MDPs.
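The following sketch (Python with scipy) spells out the LP for $q^*_{\max}$ just described, with the max constraint at each controlled state expanded into one linear inequality per successor. The 4-state MDP at the bottom is a hypothetical example, not taken from the chapter.

```python
import numpy as np
from scipy.optimize import linprog

def max_hitting_probability(P, controlled, succ, F):
    """LP for q*_max: minimise sum_i x_i subject to x_i >= 1 (i in F),
    x_i >= sum_j P[i,j] x_j (probabilistic i), x_i >= x_j for every
    successor j (controlled i), and x >= 0."""
    n = P.shape[0]
    A_ub, b_ub = [], []
    def leq(row, rhs):                            # encode row . x <= rhs
        A_ub.append(row); b_ub.append(rhs)
    for i in range(n):
        if i in F:
            row = np.zeros(n); row[i] = -1.0
            leq(row, -1.0)                        # 1 <= x_i
        elif i in controlled:
            for j in succ[i]:
                row = np.zeros(n); row[j] = 1.0; row[i] -= 1.0
                leq(row, 0.0)                     # x_j <= x_i
        else:
            row = P[i].copy(); row[i] -= 1.0
            leq(row, 0.0)                         # (P x)_i <= x_i
    res = linprog(c=np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub))
    return res.x

# Hypothetical MDP: controlled state 0 chooses between state 1
# (which reaches the target 3 w.p. 1/2) and the dead state 2.
P = np.array([[0, 0, 0, 0],
              [0, 0, 1/2, 1/2],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
print(max_hitting_probability(P, controlled={0}, succ={0: [1, 2]}, F={3}))
# approx. [0.5, 0.5, 0, 1]
```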


Another standard method for solving HP for maximising MDPs, as well as for solving many other classes of MDPs, is called policy iteration or strategy improvement. It involves initially fixing an arbitrary (memoryless) strategy for the controller, evaluating $q^*$ on the resulting MC, and then updating the strategy (at every state) by choosing a neighbor whose value is strictly greater than that of the neighbor chosen by the previous fixed strategy, if such a strictly greater neighbor exists. See, e.g., [38] for much more on policy iteration for MDPs.

It is worth mentioning that answering the qualitative questions of whether $q^*_{\max,i} = 0$ or $1$, or whether $q^*_{\min,i} = 0$ or $1$, requires only (game) graph-theoretic analyses that do not depend on the actual probabilities of transitions in the given MDP, and so does not require solving LPs. Thus, these qualitative questions for HP can be answered in strongly polynomial time (see, e.g., [13] and [14]).
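A simplified sketch of policy iteration for maximising the probability of hitting $F$ follows; the evaluation step solves the linear system of the MC induced by the current strategy (as in the HP discussion for MCs), and the improvement step switches only on a strict gain. It glosses over degenerate corner cases and is not tuned; see [38] for the general theory.

```python
import numpy as np

def hitting_probs(P, F):
    """Hitting probabilities of the set F in a finite MC (least solution of
    the linear system, with the zero-probability states removed first)."""
    n = P.shape[0]
    reach, changed = set(F), True
    while changed:                       # states that can reach F at all
        changed = False
        for i in range(n):
            if i not in reach and any(P[i, j] > 0 and j in reach for j in range(n)):
                reach.add(i); changed = True
    q = np.zeros(n)
    rest = sorted(reach - set(F))
    if rest:
        A = np.eye(len(rest)) - P[np.ix_(rest, rest)]
        b = P[np.ix_(rest, sorted(F))].sum(axis=1)
        q[rest] = np.linalg.solve(A, b)
    q[list(F)] = 1.0
    return q

def policy_iteration(P, controlled, succ, F):
    """Strategy improvement for maximising the probability of reaching F."""
    sigma = {i: succ[i][0] for i in controlled}      # arbitrary initial strategy
    while True:
        P_sigma = P.copy()
        for i, j in sigma.items():                   # MC induced by sigma
            P_sigma[i] = 0.0
            P_sigma[i, j] = 1.0
        q = hitting_probs(P_sigma, F)
        improved = False
        for i in controlled:                         # switch only on strict gain
            best = max(succ[i], key=lambda j: q[j])
            if q[best] > q[sigma[i]] + 1e-12:
                sigma[i], improved = best, True
        if not improved:
            return sigma, q
```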

MoCh. Given a labelled finite-state MDP $D = (S, (S_0, S_1), \Delta, P, l)$, an initial state $s_0 \in S$, and a Büchi automaton $B = (Q, \Sigma, q_0, \delta, F)$, we wish to compute the optimum (w.l.o.g., maximum) probability $p^* = \sup_\sigma P^\sigma_{s_0}(L(B))$. Qualitative decision problems associated with this were studied in [13] and [41], and quantitative decision problems were studied in [14]. We briefly mention the main results of [14].

As in the case of MoCh for MCs, we can assume, w.l.o.g., that $\Sigma = S$, and we let $B' = (2^Q, \Sigma, \{q_0\}, \delta', F')$ be the deterministic BA obtained by performing the usual (naive) subset construction on $B$. Next, as for MCs, we define the product MDP $D \otimes B' = (S \times 2^Q, (S_0 \times 2^Q, S_1 \times 2^Q), \tilde{\Delta}, \tilde{P})$. Note that there is a one-to-one correspondence between strategies $\sigma$ on $D$ and strategies $\sigma'$ on $D \otimes B'$ (because $B'$ is deterministic). Using a more involved analysis than for the case of MCs, employing the notion of controllably recurrent pairs $(s, q) \in S \times Q$ (which we will not define here) that roughly correspond to the special pairs in the case of MCs, [14] showed how one can compute a set of target states $Z \subseteq S \times 2^Q$ of $D \otimes B'$ such that, in order to optimise the probability $P_I(L(B))$ in $D$, it suffices for the strategy $\sigma$ to first optimise the probability of hitting the target set $Z$ in $D \otimes B'$; once a target state $z \in Z$ is hit, the strategy $\sigma$ should then switch to a different strategy $\sigma_z$ that thereafter ensures that with probability 1 the infinite trajectory is accepted by $B$ (which is made possible by the definition of the target states $Z$). In this way, the problem MoCh is reduced to (much larger) instances of the problem HP, which as we saw can be solved using linear programming. Let us note, however, that whereas for HP we always have memoryless deterministic (positional) optimal strategies, the optimal strategies obtained this way for MoCh by [14] are not positional, and in fact it is easy to see that optimal positional strategies for MoCh need not exist.

The complexity of [14]'s algorithm for computing $p^* = \max_\sigma P^\sigma_{s_0}(L(B))$ is polynomial in $|D|$ and exponential in the size $|B|$ for a nondeterministic Büchi automaton $B$. It was previously shown in [13] that even the qualitative decision problem of determining whether $p^* = 1$ is EXPTIME-complete, and thus we cannot improve substantially on this complexity upper bound. If the $\omega$-regular property is specified as an LTL formula instead, then it was shown in [13] that the resulting qualitative problem of determining whether $p^* = 1$ is already 2EXPTIME-complete.


5. Adding recursion to MCs and MDPs

As mentioned in the introduction, a number of important classes of countably infinite-state MCs and MDPs that are closely related to automata-theoretic models are subsumed, in precise senses, by adding a natural recursion feature to MCs and MDPs, in a manner similar to allowing potentially recursive subroutine calls in procedural programs. The resulting formal models, called recursive Markov chains (RMCs) and recursive Markov decision processes (RMDPs), were defined and studied in [28] and [29] and, respectively, in [30] and [27]. RMCs and RMDPs provide natural abstract models for probabilistic procedural programs with recursion (and this indeed partly motivated their study). RMCs (and RMDPs), and various of their subclasses, capture probabilistic and controlled extensions of classic infinite-state automata-theoretic models, including pushdown automata, context-free grammars, and one-counter automata. Indeed, RMCs and RMDPs can equivalently be viewed as probabilistic and MDP extensions of pushdown automata. We refer the reader to [28] and [30] for detailed formal definitions and results about RMCs and RMDPs, respectively.

A (not-necessarily finitely-presented) recursive Markov chain (RMC) is a tuple $A = (A_1, \ldots, A_k)$, where each component $A_i = (N_i, B_i, Y_i, En_i, Ex_i, \delta_i)$

consists of the following data.
• A (countable, or finite) set $N_i$ of nodes.
• A subset of entry nodes $En_i \subseteq N_i$, and a subset of exit nodes $Ex_i \subseteq N_i$.
• A (countable, or finite) set $B_i$ of boxes, and a mapping $Y_i\colon B_i \to \{1, \ldots, k\}$ that assigns to every box (the index of) one of the components $A_1, \ldots, A_k$. To each box $b \in B_i$ we associate a set of call ports $\mathit{Call}_b = \{(b, en) \mid en \in En_{Y_i(b)}\}$, corresponding to the entries of the corresponding component, and a set of return ports $\mathit{Return}_b = \{(b, ex) \mid ex \in Ex_{Y_i(b)}\}$, corresponding to the exits of the corresponding component.
• A probabilistic transition relation $\delta_i$, where transitions are of the form $(u, p_{u,v}, v)$

such that
1. the source $u$ is either a non-exit node $u \in N_i \setminus Ex_i$, or a return port $u = (b, ex)$ of a box $b \in B_i$;
2. the destination $v$ is either a non-entry node $v \in N_i \setminus En_i$, or a call port $v = (b, en)$ of a box $b \in B_i$;
3. $p_{u,v} \in \mathbb{R}_{>0}$ is the transition probability from $u$ to $v$;
4. consistency of probabilities: for each $u$, $\sum_{\{v' \mid (u, p_{u,v'}, v') \in \delta_i\}} p_{u,v'} = 1$, unless $u$ is a call port or exit node, neither of which have outgoing transitions, in which case by default $\sum_{v'} p_{u,v'} = 0$.

When we want to ensure that an RMC is finitely-presented for computational purposes, we assume that all the sets involved (like the nodes $N_i$ and boxes $B_i$) are finite, we assume that the transition probabilities $p_{u,v}$ are rational numbers, given as the ratio


of two integers, and we measure their size by the number of bits in the numerator and denominator. The size $|A|$ of a given finitely-presented RMC $A$ is the number of bits needed to specify it (including the encoding size of the transition probabilities). As in the case of MCs and MDPs, some general theorems used for the analysis of RMCs hold true even when the sets defining them, like the nodes $N_i$ and boxes $B_i$, are (countably) infinite.

We will use the term vertex of $A_i$ to refer collectively to its set of nodes, call ports, and return ports, and we denote this set by $Q_i$. Thus, the transition relation $\delta_i$ is a set of probability-weighted directed edges on the set $Q_i$ of vertices of $A_i$. We will use all the notations without a subscript to refer to the union over all the components of the RMC $A$. Thus, $N = \bigcup_{i=1}^{k} N_i$ denotes the set of all the nodes of $A$; $Q = \bigcup_{i=1}^{k} Q_i$ the set of all vertices; $B = \bigcup_{i=1}^{k} B_i$ the set of all the boxes; $Y = \bigcup_{i=1}^{k} Y_i$ the map $Y\colon B \to \{1, \ldots, k\}$ of all boxes to components; and $\delta = \bigcup_i \delta_i$ the set of all transitions of $A$.

Example 5.1. Figure 3 depicts an example RMC (taken from [29]). This RMC has two components $A_1$ and $A_2$, each with one entry and two exits (in general different components may have different numbers of entries and exits). Component $A_2$ has two boxes: $b_1'$, which maps to $A_1$, and $b_2'$, which maps to $A_2$.


Figure 3. A sample recursive Markov chain (taken from [29])

An RMC $A$ defines a global denumerable Markov chain $M_A = (V, P_A)$ as follows. The global states $V \subseteq B^* \times Q$ are pairs of the form $\langle \beta, u \rangle$, where $\beta \in B^*$ is a (possibly empty) sequence of boxes, denoting the call stack, and $u \in Q$ is a vertex of $A$. More precisely, the states $V \subseteq B^* \times Q$ and the transition probabilities $P_A$ of $M_A$ are defined inductively as follows:
1. $\langle \epsilon, u \rangle \in V$ for $u \in Q$ ($\epsilon$ denotes the empty string);
2. if $\langle \beta, u \rangle \in V$ and $(u, p_{u,v}, v) \in \delta$, then $\langle \beta, v \rangle \in V$ and $P_A(\langle \beta, u \rangle, \langle \beta, v \rangle) = p_{u,v}$;


3. if $\langle \beta, (b, en) \rangle \in V$, where $(b, en) \in \mathit{Call}_b$, then $\langle \beta b, en \rangle \in V$ and $P_A(\langle \beta, (b, en) \rangle, \langle \beta b, en \rangle) = 1$;

4. if $\langle \beta b, ex \rangle \in V$, where $(b, ex) \in \mathit{Return}_b$, then $\langle \beta, (b, ex) \rangle \in V$ and $P_A(\langle \beta b, ex \rangle, \langle \beta, (b, ex) \rangle) = 1$.

Case (1) corresponds to the possible initial states, (2) corresponds to a transition within a component, (3) corresponds to a recursive call when a new component is entered via a box, and (4) corresponds to the end of a recursive call when the process exits a component and control returns to the calling component. Some states of $M_A$ are terminating, having no outgoing transitions. These are precisely the states $\langle \epsilon, ex \rangle$, where $ex$ is an exit. If we want to view $M_A$ as a proper Markov chain, then we can consider the terminating states of $M_A$ to be absorbing states, with a self-loop transition to themselves having probability 1.

Unrestricted RMCs are essentially equivalent, in a precise sense, to probabilistic pushdown automata (pPDA) (see [28] for the precise equivalence). An RMC where every component has at most one exit is called a 1-exit RMC, or just 1-RMC. 1-RMCs correspond, in a precise sense, to the stochastic process generated by left-most derivations of a stochastic context-free grammar (SCFG), and they are also intimately related to multi-type branching processes (see [28] for details of these relationships). An RMC where there is only one box in the entire RMC is called a 1-box RMC. As shown in [23], these correspond to probabilistic 1-counter automata, and to (discrete-time) quasi-birth-death processes.

Termination probability analysis (VI.TP). We now define a key analysis for RMCs, namely the computation of termination probabilities, which plays a central role in many other analyses of RMCs. For an RMC $A = (A_1, \ldots, A_k)$, given a vertex $u \in Q_i$ and an exit $ex \in Ex_i$, both in the same component $A_i$, let $q^*_{(u,ex)}$ denote the probability of eventually reaching the state $\langle \epsilon, ex \rangle$, starting at the state $\langle \epsilon, u \rangle$. The computation of the termination probabilities $q^*_{(u,ex)}$ plays an important role for many other analyses of RMCs, including for MoCh, in a way analogous to the role that HP plays for analysing (finite-state) MCs.

Considering the termination probabilities $q^*_{(u,ex)}$ as unknowns, we can set up a system of non-linear polynomial equations, such that the probabilities $q^*_{(u,ex)}$ are the least fixed point (LFP) solution of this system. We use a variable $x_{(u,ex)}$ for each unknown probability $q^*_{(u,ex)}$. We will often find it convenient to index the variables $x_{(u,ex)}$ according to a fixed order, so we can refer to them also as $x_1, \ldots, x_n$, with each $x_{(u,ex)}$ identified with $x_j$ for some $j$. Of course, if $N_i$ or $B_i$ are infinite for some component $A_i$, then we have an infinite vector $x = (x_1, \ldots, x_j, \ldots)$ of variables, rather than an $n$-vector $x = (x_j \mid j \in \{1, \ldots, n\})$ for some $n < \infty$. Given an RMC $A = (A_1, \ldots, A_k)$, we define a system of polynomial equations $x = R(x)$ over the variables $x_{(u,ex)}$, where $u \in Q_i$ and $ex \in Ex_i$, for $1 \leq i \leq k$. The system contains one equation $x_{(u,ex)} = R_{(u,ex)}(x)$ for each variable $x_{(u,ex)}$, where $R_{(u,ex)}(x)$


is a multivariate polynomial with positive rational coefficients. The system $x = R(x)$ is defined as follows. There are several cases, based on the "type" of the vertex $u$. Let $[k] = \{1, \ldots, k\}$:
\[
x_{(u,ex)} =
\begin{cases}
1 & \text{if } u = ex \in Ex_i, \text{ for } i \in [k],\\[2pt]
0 & \text{if } u, ex \in Ex_i \text{ and } u \neq ex, \text{ for } i \in [k],\\[2pt]
\displaystyle\sum_{\{v \mid (u, p_{u,v}, v) \in \delta\}} p_{u,v}\, x_{(v,ex)} & \text{if } u \in N_i \setminus \{ex\} \text{ or } u = (b, ex'), \text{ for } b \in B_i,\ i \in [k],\\[2pt]
\displaystyle\sum_{ex' \in Ex_{Y(b)}} x_{(en,ex')}\, x_{((b,ex'),ex)} & \text{if } u = (b, en), \text{ for } b \in B_i,\ i \in [k].
\end{cases}
\tag{12}
\]

Given a (finitely-presented) RMC $A$, we can obviously construct the system $x = R(x)$ in polynomial time. $R(x)$ has size $O(|A|\,\theta^2)$, where $\theta$ denotes the maximum number of exits of any component. Let $q^* \in \mathbb{R}^n$ denote the $n$-vector of probabilities $q^*_{(u,ex)}$, using the same indexing as used for $x$. The map $R\colon \mathbb{R}^n_{\geq 0} \to \mathbb{R}^n_{\geq 0}$ is clearly monotone on $\mathbb{R}^n_{\geq 0}$, and furthermore the following analog of Proposition 3.1 holds.

Theorem 5.1 (see [28]¹⁴). The termination probability vector $q^*$ for an RMC is the least fixed point of $x = R(x)$. Thus, $q^* = R(q^*)$ and, for all $q' \in \mathbb{R}^n_{\geq 0}$, if $q' = R(q')$, then $q^* \leq q'$. Furthermore, $R^k(0) \leq R^{k+1}(0) \leq q^*$ for all $k \geq 0$, and $q^* = \lim_{k\to\infty} R^k(0)$.

¹⁴ In [28] this theorem is only claimed for finitely-presented RMCs, where the sets of nodes and boxes are finite, but exactly the same proofs establish the result when the sets of nodes and boxes can be countably infinite.
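As a toy illustration of Theorem 5.1 (and of the Newton iteration discussed just below), consider the single-variable PPS $x = \frac{1}{2}x^2 + \frac{1}{2}$, whose least fixed point is $q^* = 1$; this system is not taken from the chapter's Figure 3, and real RMC systems are multivariate, with Newton's method applied componentwise to the decomposed system as in [28]. Kleene/value iteration $R^k(0)$ converges to the LFP, but only slowly on this critical example, whereas Newton's method analytically halves the error at every step.

```python
# One-variable PPS:  x = 1/2 x^2 + 1/2.  Its least fixed point is q* = 1
# (e.g. the termination probability of the 1-exit RMC / branching process
# that, with probability 1/2, spawns two independent copies of itself).
R = lambda x: 0.5 * x**2 + 0.5

def kleene(iters):
    """Value iteration x <- R(x) starting from 0, as in Theorem 5.1."""
    x = 0.0
    for _ in range(iters):
        x = R(x)
    return x

def newton(iters):
    """Newton's method on f(x) = R(x) - x from 0; on this system each
    step maps x to (x + 1)/2, i.e. it halves the distance to q* = 1."""
    x = 0.0
    for _ in range(iters):
        d = R(x) - x
        if d == 0.0:               # converged to working precision
            break
        x = x - d / (x - 1.0)      # derivative of R(x) - x is x - 1
    return x

print(1 - kleene(50))   # ~0.04: the Kleene error decays only like 2/k here
print(1 - newton(50))   # ~1e-8 in double precision (analytically 2**-50);
                        # rounding of R(x) - x near x = 1 limits the accuracy
```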

For (finitely-presented) RMCs the termination probabilities $q^*$ are in general irrational, so we can't compute them "exactly." However, using decision procedures for the existential theory of the reals, we can decide, e.g., whether $q^*_j \geq r$, for any given rational value $r$, in PSPACE (see [28]). It was shown in [28] that for general RMCs any nontrivial approximation of the probabilities $q^*$ is at least as hard as long-standing open problems in the complexity of numerical computation, namely the square-root sum problem and a harder arithmetic circuit decision problem known as PosSLP (see [1]), both of which are not even known to be decidable in NP, nor in the polynomial-time hierarchy.

In [28], a decomposed multivariate Newton's method is studied and shown to converge monotonically to the LFP $q^*$ of $x = R(x)$ for an arbitrary RMC, starting from $0$; more generally this holds for any monotone polynomial system of equations (MPS) $x = R(x)$. The convergence behaviour of Newton's method for MPSs was subsequently studied further in [16], yielding some important insights. Firstly, [16] gave examples of (not strongly connected) 1-exit RMCs on whose system of equations $x = R(x)$ Newton's method would require an exponential number of iterations, as a function of the encoding size of the 1-RMC (and of $x = R(x)$), to converge to within


even 1 bit of precision of the LFP vector $q^*$, starting from $0$; on the other hand, in certain strongly connected cases of RMCs, [16] gave exponential upper bounds on the number of iterations required to obtain a desired approximation of $q^*$, as a function of the encoding size of $x = R(x)$ for RMCs. For arbitrary MPSs, [16] gave no upper bounds on the number of Newton iterations required as a function of the encoding size of the input MPS. Recently, in [40], an exponential worst-case upper bound was established for Newton's method, as a function of the encoding size of the MPS, for computing its LFP to desired precision. The bound in [40] is essentially optimal in several important parameters of the problem.

In the case of 1-exit RMCs, the corresponding equation system $x = R(x)$ is a probabilistic polynomial system of equations (PPS). These consist of equations of the form $x_i = R_i(x)$, where $R_i(x)$ is a probabilistic polynomial, meaning a multivariate polynomial in the variables $x$ whose monomial coefficients and constant term are all non-negative and their sum is (at most) 1. A recent result in [19] shows that Newton's method, combined with P-time methods from [28] for qualitative analysis of termination for 1-exit RMCs, can be used to obtain a P-time algorithm (in the standard Turing model of computation) for approximating $q^*$ for PPSs and 1-exit RMCs to within arbitrary desired precision $2^{-j}$, for $j$ given in unary. This result also has important consequences for multi-type branching processes (BPs) and stochastic context-free grammars (SCFGs). For instance, it yields the first P-time algorithm for computing extinction probabilities of BPs and for computing the probability of generating a given string for arbitrary SCFGs (see [19]). See also the recent paper [20], where it has been further shown that for a very broad class of SCFGs, excluding only some degenerate "deeply critical" SCFGs, Newton's method yields a P-time algorithm for computing, to within desired precision, the probability that the SCFG generates a string in a regular language given by a DFA. In particular, [20] shows that this runs in P-time for any SCFG whose parameters are estimated using the standard EM ("inside-outside") method.

In the case of 1-box RMCs, which are essentially equivalent to discrete-time quasi-birth-death processes (QBDs) and to probabilistic one-counter automata, it was shown in [21] that decomposed Newton's method requires only polynomially many iterations, as a function of the encoding size of $x = R(x)$ and of $j$, to compute $q^*$ to within additive error $2^{-j}$. The vector $q^*$ corresponds to the so-called $G$ matrix of a QBD, which is a key to many other analyses of QBDs (see, e.g., [35] and [3]), and this thus yields the first P-time algorithm, in the unit-cost arithmetic RAM model of computation, for computing the $G$ matrix of an arbitrary QBD. More recently, in [40], it was shown that, with suitable rounding of Newton's method, the $G$ matrix can be computed in P-time in the standard Turing model of computation.

Model checking (MoCh). Model checking of RMCs was studied in [29], where it was shown how to use TP analysis toward both qualitative and quantitative model checking of RMCs. The algorithms are involved: in brief, given a labelled RMC $A$ and an $\omega$-regular property, specified say by a Büchi automaton $B$, it is possible to use the termination


probabilities $q^*$ to first define a finite-state MC $M'$, called the conditioned summary chain, of the "product" of the RMC and the naive determinisation $B'$ of $B$, and then to boil down the probability of $L(B)$ in the original RMC to the probability of hitting a subset $T$ of states in $M'$, where $T$ can be computed using suitable modifications of the notion of special pairs used earlier by [13] for solving MoCh for finite-state MCs. Furthermore, a different algorithm can be used for properties specified by LTL formulas. For the resulting complexity bounds for the various cases of qualitative and quantitative analysis, see [29], whose results also yield the best available complexity bounds (improving by more than one exponential on the prior bounds) for model checking $\omega$-regular and LTL properties of probabilistic pushdown systems, a problem which was first studied in [17].

For model checking 1-box RMCs (equivalently, probabilistic one-counter automata (pOCAs)), a recent paper [8] shows how to use the polynomial-time algorithm obtained in [23] and [40] for computing (to within any desired precision) the termination probabilities $q^*$ for 1-box RMCs and pOCAs, in order to obtain an algorithm for computing (to within desired precision) the probability of an $\omega$-regular property for pOCAs which, for a fixed $\omega$-regular property, also runs in polynomial time (see also [40]).

Recursive Markov decision processes (RMDPs). It is not difficult to generalise the definition of RMCs to define RMDPs, by allowing some nodes of the RMC to be controlled. RMDPs were first studied in [24] and [30], where it was shown that, unfortunately, even very basic computational problems, such as computing any nontrivial approximation of the optimal (supremum or infimum) termination probabilities of finitely-presented RMDPs, are not computable. Furthermore, [24] and [30] showed that even qualitative model checking (MoCh) analyses are undecidable already for 1-exit RMDPs. Fortunately, it was also shown in [24] and [30] that for 1-exit RMDPs (1-RMDPs), which correspond also to controlled versions of BPs and SCFGs, it is possible to set up a monotone max/min probabilistic polynomial system of equations (max/minPPS) $x = R(x)$ whose LFP $q^*$ corresponds precisely to the vector of optimal termination probabilities. A maxPPS (respectively, minPPS) $x = R(x)$ consists of equations $x_i = R_i(x)$, where each $R_i(x)$ has the form $\max\{Q_1(x), \ldots, Q_{k_i}(x)\}$ (respectively, $\min\{Q_1(x), \ldots, Q_{k_i}(x)\}$), where each $Q_j(x)$ is a probabilistic polynomial in the variables $x$. It was furthermore shown in [24] and [30] that the controller always has deterministic, stackless and memoryless, optimal strategies for optimising the termination probability in a 1-RMDP. Already for 2-exit RMDPs, it is not even the case that there necessarily exists any optimal strategy for maximising the probability of termination (see [24]).

It was subsequently shown in [25] and [30] that qualitative optimal termination problems for 1-RMDPs can be decided in P-time, using a spectral optimisation method that requires the use of linear programming. The algorithms from [25] for deciding whether the optimal termination probability for 1-RMDPs is exactly 1 were later used in [7] in order to show that there is a P-time algorithm for detecting whether there exists a strategy which achieves probability 1 of reaching a given vertex of a 1-exit RMDP in any calling context (any call stack). However, there need not exist any optimal strategy


for reaching a vertex in any calling context, even when the supremum probability of doing so is 1, and even the decidability of determining whether the supremum probability is 1 for this problem remains open. Finally, in a recent advance, it was shown in [18] that for 1-RMDPs the vector $q^*$ of optimal termination probabilities can be approximated in P-time to within arbitrary desired precision, by using a generalisation of Newton's method applied to the corresponding max/minPPS equations $x = R(x)$, which converges monotonically to their LFP. The generalised Newton method requires solving an LP in each iteration (in both the maximising and the minimising cases, which are different).

For 1-box RMDPs, corresponding to controlled QBDs and to MDP extensions of probabilistic one-counter automata, we do not have an associated equation system $x = R(x)$ which captures their termination probabilities. Nevertheless, it was shown in [6] and [4] that, for both maximising and minimising the termination probability in 1-box RMDPs, the qualitative problem of deciding whether the optimal probability of termination, i.e., of hitting counter value 0 in any state, is 1 can be decided in P-time using, among other things, linear programming. Subsequently, it was shown in [5] that for a 1-box RMDP one can approximate the optimal probability of termination in any state in exponential time. Optimal strategies need not exist for maximising the termination probability in 1-box RMDPs (see [5]). It remains open whether this exponential-time upper bound can be improved. Deciding whether the (optimal) termination probability is, say, $\geq \frac{1}{2}$ is already square-root-sum-hard, even for 1-box RMCs (see [23]). Apparently harder selective termination problems for 1-box RMDPs were also studied in [6], such as whether there is some strategy with which we hit counter value 0 in a desired control state with probability 1. It was shown in [6] that this problem is already PSPACE-hard, and that this particular qualitative selective termination problem is decidable. However, the decidability of limit-sure (and quantitative) "selective" termination for 1-box RMDPs remains open.

Recursive stochastic games. Although we have not discussed stochastic games (see, e.g., [39], [10], and [31]), we mention that a number of results, in particular about 1-RMDPs, extend naturally to two-player zero-sum 1-exit recursive simple stochastic games (1-RSSGs) (see [24] and [30]) and to 1-exit recursive concurrent stochastic games (1-RCSGs) (see [26] and [27]). In particular, corresponding to 1-RSSGs with the objective (and counter-objective) of maximising (and minimising) the termination probability, there are monotone min-max-polynomial equations $x = R(x)$ whose LFP yields the vector of termination values starting at each vertex (see [24] and [30]). Corresponding to 1-RCSGs, as shown in [27], there are monotone minimax-polynomial equations, in which the value operator $\mathrm{Val}(M)$ for a one-shot two-player zero-sum matrix game $M$ is used, and whose LFP yields the value vector of the 1-RCSG. It was shown in [25] and [30] that deciding whether the value of a 1-RSSG termination game is exactly 1 is in NP $\cap$ co-NP, and that this problem is already at least as hard as Condon's quantitative decision problem for finite-state SSGs [10], whereas for finite-state SSGs the qualitative decision problem of deciding whether the value is 1 is known to be in


P-time. For 1-RCSG termination games, it was shown in [27] that the quantitative decision and approximation problems for the game value are solvable in PSPACE, using the associated monotone system of equations $x = R(x)$, and it was shown that even the qualitative problem of deciding whether the game value is 1 is at least as hard as the square-root sum problem, which, as already discussed, is not even known to be in NP. The complexity of analyzing 1-box RSSGs (equivalently, one-counter SSGs) was studied in [6], [4], and [5], where some upper and lower bounds were established, but the precise complexity of a number of analysis problems for one-counter SSGs (and one-counter MDPs) remains open.

References

[1] E. Allender, P. Bürgisser, J. Kjeldgaard-Pedersen, and P. B. Miltersen, On the complexity of numerical analysis. SIAM J. Comput. 38 (2008/09), no. 5, 1987–2006. MR 2476283 Zbl 1191.68329 q.v. 1375

[2] C. Baier and J.-P. Katoen, Principles of model checking. With a foreword by K. G. Larsen. MIT Press, Cambridge, MA, 2008. MR 2493187 Zbl 1179.68076 q.v. 1347

[3] D. Bini, G. Latouche, and B. Meini, Numerical methods for structured Markov chains. Numerical Mathematics and Scientific Computation. Oxford Science Publications. Oxford University Press, New York, 2005. MR 2132031 Zbl 1076.60002 q.v. 1376

[4] T. Brázdil, V. Brožek, and K. Etessami, One-counter stochastic games. In 30th International Conference on Foundations of Software Technology and Theoretical Computer Science (K. Lodaya and M. Mahajan, eds.). Papers from the conference (FST & TCS 2010) held in Chennai, December 15–18, 2010. LIPIcs. Leibniz International Proceedings in Informatics, 8. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2010, 108–119. MR 2853829 Zbl 1245.68099 q.v. 1378, 1379

[5] T. Brázdil, V. Brožek, K. Etessami, and A. Kučera, Approximating the termination value of one-counter MDPs and stochastic games. Inform. and Comput. 222 (2013), 121–138. MR 3000966 Zbl 1267.68160 q.v. 1378, 1379

[6] T. Brázdil, V. Brožek, K. Etessami, A. Kučera, and D. Wojtczak, One-counter Markov decision processes. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (M. Charikar, ed.). Proceedings of SODA 2010 held in Austin, TX, January 17–19, 2010, 863–874. MR 2809711 Zbl 1288.90119 q.v. 1378, 1379

[7] T. Brázdil, V. Brožek, V. Forejt, and A. Kučera, Reachability in recursive Markov decision processes. Inform. and Comput. 206 (2008), no. 5, 520–537. MR 2412365 Zbl 1145.91011 q.v. 1377

[8] T. Brázdil, S. Kiefer, and A. Kučera, Efficient analysis of probabilistic programs with an unbounded counter. In Computer aided verification (G. Gopalakrishnan and S. Qadeer, eds.). Lecture Notes in Computer Science, 6806. Springer, Berlin, 2011, 208–224. MR 2870753 q.v. 1377

[9] K. L. Chung, A course in probability theory. Third edition. Academic Press, San Diego, CA, 2001. MR 1796326 Zbl 0980.60001 q.v. 1347

[10] A. Condon, The complexity of stochastic games. Inform. and Comput. 96 (1992), no. 2, 203–224. MR 1147987 Zbl 0756.90103 q.v. 1346, 1378


[11] C. Courcoubetis and M. Yannakakis, Verifying temporal properties of finite-state probabilistic programs. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science, held in White Plains, N.Y., October 24–26, 1988. IEEE Computer Society, Los Alamitos, CA, 1988, 338–345. IEEEXplore 21950 q.v. 1354, 1363, 1367, 1368, 1369

[12] C. Courcoubetis and M. Yannakakis, Markov decision processes and regular events (extended abstract). In Automata, languages and programming (M. Paterson, ed.). Proceedings of the 17th International Colloquium, Warwick University, Coventry, July 16–20, 1990. Lecture Notes in Computer Science, 443. Springer, Berlin, 1990, 336–349. Zbl 0765.68152 q.v. 1354, 1369

[13] C. Courcoubetis and M. Yannakakis, The complexity of probabilistic verification. J. Assoc. Comput. Mach. 42 (1995), no. 4, 857–907. MR 1411788 Zbl 0885.68109 q.v. 1354, 1363, 1366, 1367, 1368, 1369, 1371, 1377

[14] C. Courcoubetis and M. Yannakakis, Markov decision processes and regular events. IEEE Trans. Automat. Control 43 (1998), no. 10, 1399–1418. MR 1646707 Zbl 0954.90061 IEEEXplore 720497 q.v. 1354, 1369, 1371

[15] L. de Alfaro, T. A. Henzinger, and O. Kupferman, Concurrent reachability games. Theoret. Comput. Sci. 386 (2007), no. 3, 188–217. Journal version of a FOCS '98 paper. MR 2359533 Zbl 1154.91306 q.v. 1358

[16] J. Esparza, S. Kiefer, and M. Luttenberger, Computing the least fixed point of positive polynomial systems. SIAM J. Comput. 39 (2010), no. 6, 2282–2335. MR 2607902 Zbl 1213.65076 q.v. 1375, 1376

[17] J. Esparza, A. Kučera, and R. Mayr, Model checking probabilistic pushdown automata. Log. Methods Comput. Sci. 2004, Special issue: Conference "Logic in Computer Science 2004", 1:2, 31 pp. MR 2357547 Zbl 1126.68053 q.v. 1344, 1377

[18] K. Etessami, A. Stewart, and M. Yannakakis, Polynomial time algorithms for branching Markov decision processes and probabilistic min(max) polynomial equations. In Automata, languages, and programming (A. Czumaj, K. Mehlhorn, A. M. Pitts, and R. Wattenhofer, eds.). Part I. Proceedings of the 39th International Colloquium (ICALP 2012) held at the University of Warwick, Coventry, July 9–13, 2012. Lecture Notes in Computer Science, 7391. Springer, Berlin, 2012, 314–326. MR 2995318 Zbl 1272.68458 q.v. 1346, 1378

[19] K. Etessami, A. Stewart, and M. Yannakakis, Polynomial time algorithms for multi-type branching processes and stochastic context-free grammars. In STOC '12 – Proceedings of the 2012 ACM Symposium on Theory of Computing, held in New York, May 19–22, 2012. Association for Computing Machinery, New York, 2012, 579–588. MR 2961532 Zbl 1286.68188 q.v. 1376

[20] K. Etessami, A. Stewart, and M. Yannakakis, Stochastic context-free grammars, regular languages, and Newton's method. In Automata, languages, and programming (F. V. Fomin, R. Freivalds, M. Z. Kwiatkowska, and D. Peleg, eds.). Part II. Lecture Notes in Computer Science, 7966. Springer, Berlin, 2013, 199–211. MR 3109147 Zbl 1334.68111 q.v. 1376

[21] K. Etessami, D. Wojtczak, and M. Yannakakis, Quasi-birth-death processes, tree-like QBDs, probabilistic 1-counter automata, and pushdown systems. In Fifth International Conference on Quantitative Evaluation of Systems, held in St. Malo, France, September 14–17, 2008. IEEE Computer Society, Los Alamitos, CA, 2008, 243–253. IEEEXplore 4634979 q.v. 1344, 1376

[22] K. Etessami, D. Wojtczak, and M. Yannakakis, Recursive stochastic games with positive rewards. In Automata, languages and programming (L. Aceto, I. Damgård, L. A. Goldberg,

M. M. Halldórsson, A. Ingólfsdóttir, and I. Walukiewicz, eds.). Part I. Proceedings of the 35th International Colloquium (ICALP 2008) held in Reykjavik, July 7–11, 2008. Lecture Notes in Computer Science, 5125. Springer, Berlin, 2008, 711–723. MR 2500313 Zbl 1153.91328 q.v. 1346, 1357

[23] K. Etessami, D. Wojtczak, and M. Yannakakis, Quasi-birth-death processes, tree-like QBDs, probabilistic 1-counter automata, and pushdown systems. Perform. Eval. 67 (2010), no. 9, 837–857. q.v. 1374, 1377, 1378

[24] K. Etessami and M. Yannakakis, Recursive Markov decision processes and recursive stochastic games. In Automata, languages and programming (L. Caires, G. F. Italiano, L. Monteiro, C. Palamidessi, and M. Yung, eds.). Proceedings of the 32nd International Colloquium (ICALP 2005) held in Lisbon, July 11–15, 2005. Lecture Notes in Computer Science, 3580. Springer, Berlin, 2005, 891–903. MR 2184687 Zbl 1085.68089 q.v. 1377, 1378

[25] K. Etessami and M. Yannakakis, Efficient qualitative analysis of classes of recursive Markov decision processes and simple stochastic games. In STACS 2006 (B. Durand and W. Thomas, eds.). Proceedings of the 23rd Annual Symposium on Theoretical Aspects of Computer Science held in Marseille, February 23–25, 2006. Lecture Notes in Computer Science, 3884. Springer, Berlin, 2006, 634–645. MR 2249403 Zbl 1136.90499 q.v. 1346, 1377, 1378

[26] K. Etessami and M. Yannakakis, Recursive concurrent stochastic games. In Automata, languages and programming (M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, eds.). Part II. Lecture Notes in Computer Science, 4052. Springer, Berlin, 2006, 324–335. MR 2307246 Zbl 1133.91317 q.v. 1378, 1381

[27] K. Etessami and M. Yannakakis, Recursive concurrent stochastic games. Log. Methods Comput. Sci. 4 (2008), no. 4, 4:7, 21 pp. (Journal version of [26].) MR 2456846 Zbl 1161.68619 q.v. 1346, 1372, 1378, 1379

[28] K. Etessami and M. Yannakakis, Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. J. ACM 56 (2009), no. 1, Art. 1, 66 pp. MR 2541335 Zbl 1325.68091 q.v. 1344, 1372, 1374, 1375, 1376

[29] K. Etessami and M. Yannakakis, Model checking of recursive probabilistic systems. ACM Trans. Comput. Log. 13 (2012), no. 2, Art. 12, 40 pp. MR 2915655 Zbl 1351.68159 q.v. 1367, 1372, 1373, 1376, 1377

[30] K. Etessami and M. Yannakakis, Recursive Markov decision processes and recursive stochastic games. J. ACM 62 (2015), no. 2, Art. 11, 69 pp. MR 3346150 Zbl 1333.91005 q.v. 1346, 1372, 1377, 1378

[31] J. Filar and K. Vrieze, Competitive Markov decision processes. Springer, New York, 1997. MR 1418636 Zbl 0934.91002 q.v. 1378

[32] T. D. Hansen, P. B. Miltersen, and U. Zwick, Strategy iteration is strongly polynomial for 2-player turn-based stochastic games with a constant discount factor. J. ACM 60 (2013), no. 1, Art. 1, 16 pp. (Journal version of an ITCS '11 paper.) MR 3033218 q.v. 1369

[33] T. E. Harris, The theory of branching processes. Die Grundlehren der Mathematischen Wissenschaften, 119. Springer, Berlin, and Prentice-Hall, Englewood Cliffs, N.J., 1963. MR 0163361 Zbl 0117.13002 q.v. 1344

[34] M. Kwiatkowska, G. Norman, and D. Parker, PRISM 4.0: verification of probabilistic real-time systems. In Computer aided verification (G. Gopalakrishnan and S. Qadeer, eds.). Proceedings of the 23rd International Conference, CAV 2011, Snowbird, UT, July 14–20, 2011. Lecture Notes in Computer Science, 6806. Springer, Berlin, 2011, 585–591. https://www.prismmodelchecker.org MR 2870782 q.v. 1347

[35] G. Latouche and V. Ramaswami, Introduction to matrix analytic methods in stochastic modeling. ASA-SIAM Series on Statistics and Applied Probability. Society for Industrial and Applied Mathematics, Philadelphia, PA, and American Statistical Association, Alexandria, VA, 1999. MR 1674122 Zbl 0922.60001 q.v. 1376

[36] J. R. Norris, Markov chains. Cambridge Series in Statistical and Probabilistic Mathematics, 2. Cambridge University Press, Cambridge, 1997. MR 1600720 Zbl 0873.60043 q.v. 1347

[37] A. Pnueli, The temporal logic of programs. In 18th Annual Symposium on Foundations of Computer Science, held in Providence, R.I., October 31–November 2, 1977. IEEE Computer Society, Long Beach, CA, 1977. MR 0502161 IEEEXplore 4567924 q.v. 1354

[38] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. A Wiley-Interscience Publication. John Wiley & Sons, New York, 1994. MR 1270015 Zbl 0829.90134 q.v. 1347, 1362, 1369, 1371

[39] L. Shapley, Stochastic games. Proc. Nat. Acad. Sci. U.S.A. 39 (1953), 1095–1100. MR 0061807 Zbl 0051.35805 q.v. 1346, 1378

[40] A. Stewart, K. Etessami, and M. Yannakakis, Upper bounds for Newton's method on monotone polynomial systems, and P-time model checking of probabilistic one-counter automata. In Computer aided verification (N. Sharygina and H. Veith, eds.). Proceedings of the 25th International Conference (CAV 2013) held in Saint Petersburg, July 13–19, 2013. Lecture Notes in Computer Science, 8044. Springer, Berlin, 2013, 495–510. MR 3115524 q.v. 1376, 1377

[41] M. Y. Vardi, Automatic verification of probabilistic concurrent finite-state programs. In 26th Annual Symposium on Foundations of Computer Science (SFCS 1985), held in Portland, OR, October 21–23, 1985. IEEE Computer Society, Los Alamitos, CA, 1985, 327–338. IEEEXplore 4568158 q.v. 1357, 1368, 1371

[42] M. Y. Vardi and P. Wolper, An automata-theoretic approach to automatic program verification. In Proceedings, Symposium on Logic in Computer Science, held in Cambridge, Massachusetts, USA, June 16–18, 1986. IEEE Computer Society, Los Alamitos, CA, 1986, 322–331. q.v. 1355

[43] D. Wojtczak and K. Etessami, PReMo: an analyzer for probabilistic recursive models. In Proceedings of the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (O. Grumberg and M. Huth, eds.). Lecture Notes in Computer Science, 4424. Springer, Berlin, 2007, 66–71. q.v. 1347

[44] Y. Ye, A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30 (2005), no. 3, 733–749. MR 2161207 Zbl 1082.90132 q.v. 1369

[45] U. Zwick and M. Paterson, The complexity of mean payoff games on graphs. Theoret. Comput. Sci. 158 (1996), no. 1–2, 343–359. MR 1388974 Zbl 0871.68138 q.v. 1346

Chapter 37

Natural language parsing
Mark-Jan Nederhof and Giorgio Satta

Contents

1. Introduction to natural language parsing  1383
2. Preliminaries  1387
3. Tabulation  1390
4. LR recognition  1400
5. Earley's algorithm  1404
6. Cocke–Younger–Kasami algorithm  1407
7. Bibliographic notes  1408
References  1409

1. Introduction to natural language parsing

Natural language processing, or NLP for short, is a wide research area within the field of artificial intelligence. The goal of NLP is the development of computer applications involving human language, either in the form of written text or speech, with the aim of improving communication between humans and machines, as well as supporting communication between humans. Among the most attractive applications that are nowadays targeted by NLP we can list simple, goal-directed conversations over the phone, search for query-relevant web pages, translation and summarisation of written text and speech, and extraction of basic knowledge from documents describing predefined scenarios. Unfortunately, the best language-aware technologies available today can only deal with each of the above tasks on restricted domains, one at a time, and are still error prone. Nevertheless, with the growing flood of text-encoded information in our society, language technologies are needed today more than ever to extract information from documents, in order to keep ourselves informed and organised. Motivated by such an impelling need, NLP is nowadays a rapidly growing field. It is developing tools whose degree of accuracy is constantly improving, and it is attracting the attention of many researchers as well as commercial companies.

From a technical perspective, NLP systems are based on mathematical and statistical models of natural language. Each model represents knowledge about some specific aspects of natural language. Simple examples are the phonetic model, the morphological dictionary, the syntactic component, and so on.


[Figure 1 shows a parse tree: S dominates NP and VP; the NP is the Noun John; the VP consists of the Verb eats and an NP, which in turn consists of the NP strawberries and the PP with chocolate (Prep with, NP chocolate).]
Figure 1. One possible parse tree for the English sentence John eats strawberries with chocolate

State-of-the-art models can be abstractly viewed as parametric systems, where each parameter is responsible for some elementary decision that must be taken by the system in the process of assigning some linguistically meaningful structure to the input. More precisely, each parameter is associated with some score, and appropriate functions of these scores are used to drive the search in the space of all potential solutions. Specialised algorithms are then developed for training these models from large amounts of linguistic examples. Besides real world applications, it turns out that the above models and the associated processing algorithms are also of independent interest in the fields of computer science and cognitive science.

In most NLP systems the central model is the so-called parsing component. The term parsing here refers to the process of automatically analyzing a given sentence, viewed as a sequence of words, in order to determine its possible underlying syntactic structures. Syntactic structures for natural language sentences are most conveniently represented by means of node-labelled, ordered trees called parse trees. Parse trees reveal how sentences can be analyzed as being composed of units of words, called phrases or constituents, and how these units are hierarchically organised.

As a simple example, consider the English sentence John eats strawberries with chocolate. One possible parse tree for this sentence is depicted in Figure 1. This structure conveys the information that the noun phrase (NP) John is the subject of some eating action expressed by means of a verb phrase (VP). The internal structure of the verbal phrase also reveals that the eating action involves as a direct object the noun phrase strawberries, and that this noun phrase is modified by the prepositional phrase (PP) with chocolate, indicating the specific way strawberries are accompanied in John's meal. The main importance of the parsing process lies therefore in the


grammatical information that syntactic structures convey to modules that implement semantic, pragmatic and discourse processing, which are crucial in most applications. For this reason parsing is considered a central part in typical natural language processing systems, and the accuracy of the parse trees can have much impact on the success of an application as a whole.

The models that are most often used for the parsing component are based on formal language theory. Since the early work by the linguist Noam Chomsky, it was recognised that regular languages do not have enough generative power to model the syntax of natural language. One of the most commonly used models is therefore the class of context-free languages. Although some natural language constructions have been shown to fall beyond the generative power of this class, context-free languages are adequate for capturing the vast majority of natural language constructions that are observed in real world applications. For reasons that will be apparent in later sections, when dealing with parsing algorithms the most convenient way of representing context-free languages is through the class of pushdown automata, or PDAs for short. As is well-known, the generative power of these devices is the same as that of the class of context-free grammars.

One of the main problems in adopting PDAs in the syntactic component is due to the fact that the set of parse trees that these devices may assign to a given input sentence is typically very large. This is because models of context-free power often fail to capture subtle properties of the structure, meaning, and use of natural language, and consequently allow many parse trees that humans would not find plausible. Consider again our sample sentence John eats strawberries with chocolate. While it should be clear to the reader that, in the context of this sentence, the prepositional phrase with chocolate indicates the specific way strawberries are accompanied in John's meal, one alternative syntactic analysis treats with chocolate as an instrumental means by which the action of eating is carried out by John. The resulting parse tree is depicted in Figure 2. Here we see that the prepositional phrase is a modifier of the verb phrase eats strawberries, as opposed to a modifier of the noun phrase strawberries as in the analysis in Figure 1. Although this latter reading does not make much sense given our knowledge of the world, the corresponding syntactic analysis must nevertheless be elicited by the parsing model, since prepositional phrases can generally be used to modify verb phrases. This can be seen if we contrast the above sentence with the similar sentence John eats strawberries with a fork, where we have just changed the lexical content of the prepositional phrase.

As already mentioned, in real world applications we have to deal with sentences that are massively ambiguous under PDA models. In contrast, many NLP systems require the availability of a small set of preferred parse trees, ideally only one, from among the full set of parse trees. Narrowing down a large set of candidate parse trees is called syntactic disambiguation. This is perhaps the most important computational problem that one deals with in natural language parsing. The most common approach to syntactic disambiguation consists in the adoption of lexicalised and weighted PDAs, as described in what follows. A lexicalised PDA is


[Figure 2 shows a parse tree in which S dominates NP (Noun John) and VP; the VP consists of a VP (Verb eats, NP strawberries) and the PP with chocolate (Prep with, NP chocolate).]
Figure 2. An alternative (unlikely) parse tree for the English sentence John eats strawberries with chocolate

designed in such a way that its internal stack symbols (see the next section for a precise definition) record information about lexical elements and/or word senses. In this way, the meanings of each individual word or word class can be appropriately treated by the automaton when discriminating among different syntactic configurations.

Furthermore, the transitions in a lexicalised PDA are augmented with some kind of numeric value, called a weight. Weights are empirically estimated from real data, in order to capture semantic and pragmatic factors that are related to specific lexical choices. During parsing, these weights are then combined to provide scores for each complete parse tree. This results in the ranking of the syntactic analyses of each sentence, and the preferred parse tree can be selected as the one with the optimal score, which might be the minimum or the maximum, depending on the nature of the scores themselves.

In the development of the next sections we will not further delve into the precise way in which lexical information can be embedded into a PDA, nor consider the specific techniques for weight estimation from annotated corpora, since the focus of this chapter is on parsing algorithms rather than on language modeling and learning. Nevertheless, the reader should always keep in mind that the PDA models used in the next sections assign scores to the produced parse trees in a way that reflects the preferred use of language in given domains.

A second aspect of the parsing process that is of major importance when dealing with natural language is related to the representation of the search space, and the specific order in which we visit this space, which is reflected by the specific order in which individual parse trees are constructed. This is commonly referred to as the problem of the choice of the parsing strategy. In this chapter we introduce several families of weighted PDAs, implementing the most commonly used parsing strategies that can be found in the literature, such as bottom-up and top-down parsing. We also discuss more specialised strategies such as LR parsing.


In natural language parsing, all of the PDAs of interest are nondeterministic. In order to be able to parse strings, we then need to introduce specialised dynamic programming algorithms that simulate all possible computations of a PDA on a given input string, in a time-efficient manner. We also deal with the problem of representing in a space-efficient manner the parse forest of all parse trees that the model assigns to an input, and show how to extract the parse trees of interest from such a representation. Several dynamic programming algorithms for parsing based on the equivalent formalism of context-free grammars are found in the literature. They are also called tabular parsing methods. Examples of tabular parsing methods are the Cocke–Younger– Kasami algorithm and Earley’s algorithm. What is often overlooked is that these algorithms can be straightforwardly derived from parsing techniques expressed by means of PDAs, as we show in this chapter. Therefore, the approach we follow here is also attractive from a foundational perspective, since PDAs are simpler devices than the tabular methods that can be derived from them. This approach allows one to become acquainted with simple, non-tabular forms of context-free parsing before moving on to tabulation, which also simplifies the relevant formal proofs.

2. Preliminaries

In this section we discuss grammars and automata with weights that are elements from a semiring.

2.1. Semirings. A semiring is an algebraic structure $(W, \oplus, \otimes, 0, 1)$, consisting of a non-empty set $W$ of elements, which are referred to in the following sections as weights, with two binary operations $\oplus$ and $\otimes$, called addition and multiplication, and two elements $0, 1 \in W$, such that:

• $(W, \oplus, 0)$ is a commutative monoid with neutral element $0$, that is, for all $x, y, z \in W$:
  – $(x \oplus y) \oplus z = x \oplus (y \oplus z)$,
  – $x \oplus y = y \oplus x$,
  – $x \oplus 0 = 0 \oplus x = x$;
• $(W, \otimes, 1)$ is a monoid with neutral element $1$, that is, for all $x, y, z \in W$:
  – $(x \otimes y) \otimes z = x \otimes (y \otimes z)$,
  – $x \otimes 1 = 1 \otimes x = x$;
• $\otimes$ distributes over $\oplus$ from either side, that is, for all $x, y, z \in W$:
  – $(x \oplus y) \otimes z = (x \otimes z) \oplus (y \otimes z)$,
  – $x \otimes (y \oplus z) = (x \otimes y) \oplus (x \otimes z)$;
• $0$ annihilates $W$ with respect to $\otimes$, that is, for all $x \in W$:
  – $x \otimes 0 = 0 \otimes x = 0$.

A semiring is called commutative if $(W, \otimes, 1)$ is commutative, that is, for all $x, y \in W$: $x \otimes y = y \otimes x$.


A semiring is called idempotent if for all $x \in W$: $x \oplus x = x$.
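To make the algebraic setting concrete, the following minimal Python sketch (not from this chapter; all names are illustrative) packages a semiring as a record of its two operations and two distinguished elements, and instantiates the examples discussed next.

    # A minimal sketch (illustrative names): a semiring bundles (add, mul, zero, one).
    from dataclasses import dataclass
    from typing import Callable, Any

    @dataclass(frozen=True)
    class Semiring:
        add: Callable[[Any, Any], Any]   # the operation "oplus"
        mul: Callable[[Any, Any], Any]   # the operation "otimes"
        zero: Any                        # neutral element of oplus, annihilator of otimes
        one: Any                         # neutral element of otimes

    # Boolean semiring ({false, true}, or, and, false, true): recognition.
    BOOLEAN = Semiring(lambda x, y: x or y, lambda x, y: x and y, False, True)

    # "Tropical" semiring (R>=0 with infinity, min, +, inf, 0): cheapest derivations.
    TROPICAL = Semiring(min, lambda x, y: x + y, float("inf"), 0.0)

    # Real semiring (R>=0, +, *, 0, 1): probabilities, when all weights lie in [0, 1].
    REAL = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)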

For an idempotent semiring we define the relation $\leq_W$ by $x \leq_W y$ if and only if $x \oplus y = y$, for all $x, y \in W$. This relation is a partial order over $W$.

An example of a commutative semiring is $(\mathbb{R}_{\geq 0}, +, \cdot, 0, 1)$, consisting of the set of non-negative real numbers and arithmetic addition and multiplication. An example of a commutative and idempotent semiring is the Boolean semiring $(\{\mathrm{false}, \mathrm{true}\}, \vee, \wedge, \mathrm{false}, \mathrm{true})$, where clearly $\mathrm{false} \leq_W \mathrm{true}$. Another commutative and idempotent semiring is $(\mathbb{R}_{\geq 0} \cup \{\infty\}, \min, +, \infty, 0)$, where $x \min y$ stands for the minimum of $x$ and $y$. The relation $\leq_W$ here is the familiar $\geq$ relation on (non-negative) real numbers. For each alphabet $\Sigma$, there is an idempotent semiring $(2^{\Sigma^*}, \cup, \cdot, \emptyset, \{\varepsilon\})$, where $2^{\Sigma^*}$ is the powerset of the set of strings over $\Sigma$, and $\cdot$ is defined by $V \cdot W = \{vw \mid v \in V, w \in W\}$. The relation $\leq_W$ here amounts to the subset relation.

2.2. Grammars. As usual, a context-free grammar (CFG) is defined to be a 4-tuple $G = (\Sigma, N, S, R)$, where $\Sigma$ and $N$ are two finite disjoint sets of terminals and nonterminals, respectively, $S \in N$ is the start symbol, and $R$ is a finite set of rules, each of the form $A \to \alpha$, where $A \in N$ and $\alpha \in (\Sigma \cup N)^*$. A weighted CFG (WCFG) over a semiring $(W, \oplus, \otimes, 0, 1)$ is a 5-tuple $G = (\Sigma, N, S, R, \mu)$ where $(\Sigma, N, S, R)$ is a CFG and $\mu$ is a mapping from $R$ to $W$. Let $\rho_1, \ldots, \rho_m$ be the sequence of rules applied in a left-most derivation of a string $w$. The weight of that derivation is defined by $\mu(\rho_1) \otimes \cdots \otimes \mu(\rho_m)$. The weight of a string $w$ is the addition (by $\oplus$) of the weights of all the left-most derivations of $w$, and is $0$ if $w$ cannot be derived.

A number of semirings are of special importance for WCFGs. By restricting $(\mathbb{R}_{\geq 0}, +, \cdot, 0, 1)$ to numbers between 0 and 1 we obtain probabilistic CFGs (PCFGs), also called stochastic CFGs. The weights of rules can be interpreted as probabilities if the grammar is proper, that is, if for each nonterminal $A$, the weights of all rules with $A$ as left-hand side sum to one. Under the condition of consistency, not discussed further here, the weights of all strings sum to one, which means that the PCFG defines a probability distribution over strings as well as over derivations. Recognition is obtained with the Boolean semiring, if $\mu$ maps each rule to true. The weight of a string will then be true if and only if there is at least one derivation. A WCFG with alphabet $\Sigma_1$ over semiring $(2^{\Sigma_2^*}, \cup, \cdot, \emptyset, \{\varepsilon\})$ can be used to describe an algebraic transduction between a source language over $\Sigma_1$ and a target language over $\Sigma_2$, assuming each rule is mapped to a singleton set.

2.3. Automata. In contrast to standard definitions in automata theory, our pushdown automata do not possess states. This is firstly because each transition explicitly names the symbol on top of the stack before and the one that results after. Therefore, information that could be kept in a separate state can here be encoded in the top-most stack symbol. Secondly, absence of states simplifies the tabular simulation of nondeterministic pushdown automata, as we will show later. For the same reason, our automata can


push at most one symbol at a time, but can pop an arbitrary number of symbols at a time. The bottom-most stack symbol cannot be popped or replaced by another symbol.

Hence, a pushdown automaton (PDA) $M$ is a 5-tuple $(\Sigma, \Gamma, s_0, s_f, \Delta)$, where $\Sigma$ is a finite set of input symbols, $\Gamma$ is a finite set of stack symbols, $s_0 \in \Gamma$ is the initial stack symbol, $s_f \in \Gamma$ is the final stack symbol, and $\Delta$ is the set of transitions. Each transition has the form $s\sigma \stackrel{x}{\mapsto} sr$, where $s \in \Gamma$, $\sigma \in \Gamma^*$, $r \in \Gamma \setminus \{s_0\}$ and $x \in \Sigma \cup \{\varepsilon\}$. In our notation, stacks grow from left to right, i.e., the top-most stack symbol is found at the right end.

Assume the input string is $w = a_1 \cdots a_n$, where $a_i \in \Sigma$ ($1 \leq i \leq n$). A configuration is a pair consisting of a stack, which is a string over $\Gamma$, and an input position $i$, which is a number such that $0 \leq i \leq n$. The initial configuration is $(s_0, 0)$ and the final configuration is $(s_0 s_f, |w|)$. A step of automaton $M$ relative to $w$ is defined by the relation $\vdash_{M,w}$. Two kinds of steps are allowed, depending on whether an input symbol is read:

• $(\gamma s\sigma, i) \vdash_{M,w} (\gamma s r, i)$ if there is a transition $s\sigma \stackrel{\varepsilon}{\mapsto} sr$, and
• $(\gamma s\sigma, i) \vdash_{M,w} (\gamma s r, i + 1)$ if there is a transition $s\sigma \stackrel{a}{\mapsto} sr$ and $a = a_{i+1}$.

The reflexive transitive closure of $\vdash_{M,w}$ is denoted as $\vdash^*_{M,w}$. A computation is a sequence of configurations resulting from steps relative to a fixed string $w$. We say a computation accepts that string $w$ if it starts with $(s_0, 0)$ and ends with $(s_0 s_f, |w|)$. The language recognised by $M$ is the set of all strings $w$ that can be accepted, or in other words, the set of strings $w$ such that $(s_0, 0) \vdash^*_{M,w} (s_0 s_f, |w|)$. The class of languages that PDAs recognise equals the class of context-free languages, as will be shown in § 6.

In the computational analysis of PDAs, an important quantity is the length $\ell_M$ of the longest string $s\sigma$ in transitions $s\sigma \stackrel{x}{\mapsto} sr$. This is a bound on the number of stack symbols that needs to be considered before each step of the automaton.

We assume below that a PDA is always reduced, which means that each stack symbol is used in some computation that accepts a string. If a PDA is not reduced, one may make it reduced by removing the stack symbols and the associated transitions that cannot be used in computations accepting strings. There are effective algorithms for reducing PDAs, which are similar to well-known algorithms for reducing context-free grammars.

We call a PDA deterministic if for each input string and each configuration for that input string, at most one transition is applicable; furthermore, no transitions are applicable when the top-most stack symbol is $s_f$. It is clearly decidable whether a PDA is deterministic, as one only needs to consider a finite number of stacks, with length no more than $\ell_M$, and at most one input symbol $a$ ahead of the current input position.

A weighted pushdown automaton (WPDA) $M$ over a semiring $(W, \oplus, \otimes, 0, 1)$ is a 6-tuple $(\Sigma, \Gamma, s_0, s_f, \Delta, \mu)$, where $(\Sigma, \Gamma, s_0, s_f, \Delta)$ is a PDA, and $\mu$ is a mapping from $\Delta$ to $W$. Let $\tau_1, \ldots, \tau_m$ be the sequence of transitions applied in a computation accepting a string $w$. The weight of that computation is defined by $\mu(\tau_1) \otimes \cdots \otimes \mu(\tau_m)$. The weight of a string $w$ is the addition (by $\oplus$) of the weights of all computations that accept $w$.


The equivalence of CFGs and PDAs carries over to WCFGs and WPDAs. More precisely, for a given semiring, a mapping from strings to weights described by a WCFG can also be described by a WPDA, and vice versa. This will become clear when we discuss top-down recognition, in § 5.

As in the case of WCFGs, a number of semirings are of special importance to WPDAs. With a semiring $(\mathbb{R}_{\geq 0}, +, \cdot, 0, 1)$ restricted to numbers between 0 and 1 we can obtain probabilistic PDAs (PPDAs). We say a stack $\gamma$ is derivable if $(s_0, 0) \vdash^*_{M,w} (\gamma, i)$, for some $w$ and $i$ ($0 \leq i \leq |w|$). We say a transition $s\sigma \stackrel{x}{\mapsto} sr$ is applicable for stack $\gamma$ if $s\sigma$ is a suffix of $\gamma$. A PPDA is called proper if for each derivable stack the weights of applicable transitions sum to one. Properness is decidable because the set of derivable stacks can be effectively computed, as we will see in § 3.1.

3. Tabulation

In this section we investigate efficient simulation of nondeterministic pushdown automata. We start with unweighted PDAs, then extend the basic algorithm to compute weights, and consider finite automata in place of linear input.

3.1. Tabulation of PDAs. If a PDA is deterministic, then there is at most one computation that accepts a given string $w$. This computation then has length linear in $n = |w|$. If, however, the PDA is nondeterministic, then there may be exponentially many computations accepting a single string. In special cases, this number may even be infinite. In order to keep recognition tractable, we therefore need algorithms that can determine the existence of an accepting computation without considering each computation individually.

The central observation is that if we have two configurations $(\gamma_1 s, i)$ and $(\gamma_2 s, i)$, with $\gamma_1 \neq \gamma_2$, then the same sequences of steps may apply in both cases, at least until the stack is popped to expose the differences between $\gamma_1$ and $\gamma_2$. In other words, computations that increase the length of the stack starting from top-of-stack element $s$ are independent of the stack elements below $s$, and can be seen as isolated subcomputations.

In the below algorithm, we focus on subcomputations that establish that $(s, i) \vdash^*_{M,w} (sr, j)$. This will be represented by a pair of pairs of the form $((s, i), (r, j))$, which also implies that $(\gamma s, i) \vdash^*_{M,w} (\gamma s r, j)$ for any $\gamma$, in the light of the above observation. Concretely, we simulate a PDA $M$ on input $w$ by the construction of a directed graph $P_{M,w} = (V_{M,w}, E_{M,w})$, with a set $V_{M,w}$ of vertices that is $\Gamma \times \{0, \ldots, |w|\}$, and a set $E_{M,w}$ of directed edges defined as the smallest set such that:

1. if there is a transition $s s_1 \cdots s_m \stackrel{\varepsilon}{\mapsto} sr$ and if there are edges
   $((s, i_0), (s_1, i_1)), ((s_1, i_1), (s_2, i_2)), \ldots, ((s_{m-1}, i_{m-1}), (s_m, i_m)) \in E_{M,w}$,
   then also $((s, i_0), (r, i_m)) \in E_{M,w}$; and


2. if there is a transition $s s_1 \cdots s_m \stackrel{a}{\mapsto} sr$ and if there are edges
   $((s, i_0), (s_1, i_1)), ((s_1, i_1), (s_2, i_2)), \ldots, ((s_{m-1}, i_{m-1}), (s_m, i_m)) \in E_{M,w}$,
   with $a_{i_m + 1} = a$, then also $((s, i_0), (r, i_m + 1)) \in E_{M,w}$.
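Clauses (1) and (2) can be read directly as a worklist procedure that adds edges until nothing new can be derived. The Python sketch below is illustrative only (the encoding and function names are not from the chapter): a transition $s s_1 \cdots s_m \stackrel{x}{\mapsto} sr$ is represented as (lhs, x, r) with lhs = (s, s1, ..., sm), and x = None stands for $\varepsilon$.

    # A minimal sketch (illustrative): compute the edge set E of the graph P_{M,w}
    # by naive iteration to a fixed point over clauses (1) and (2).

    def chain_ends(edges, syms, n):
        """For syms = [s, s1, ..., sm], return the pairs (i0, im) such that the
        chain of edges ((syms[k-1], i_{k-1}), (syms[k], i_k)) exists for k = 1..m."""
        current = {(i0, i0) for i0 in range(n + 1)}     # nothing matched yet
        for k in range(1, len(syms)):
            current = {(i0, j)
                       for (i0, i) in current
                       for ((a, ii), (b, j)) in edges
                       if a == syms[k - 1] and ii == i and b == syms[k]}
        return current

    def tabulate(transitions, w):
        n = len(w)
        edges = set()                                   # elements ((s, i), (r, j))
        while True:
            new_edges = set(edges)
            for lhs, x, r in transitions:
                s = lhs[0]
                for i0, im in chain_ends(edges, list(lhs), n):
                    if x is None:                       # epsilon transition
                        new_edges.add(((s, i0), (r, im)))
                    elif im < n and w[im] == x:         # reads a_{im+1}
                        new_edges.add(((s, i0), (r, im + 1)))
            if new_edges == edges:
                return edges
            edges = new_edges

This naive version recomputes all chains in every round; a careful implementation, as discussed later in this section, indexes the items so that each new edge is combined only with the edges it can actually extend.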

The paths in the graph starting with vertex $(s_0, 0)$ tell us precisely which stacks can occur at which input positions:

Lemma 3.1. a. If there is a sequence of edges of the form $((s_0, 0), (s_1, i_1))$, . . . , $((s_{m-1}, i_{m-1}), (s_m, i_m)) \in E_{M,w}$, then $(s_0, 0) \vdash^*_{M,w} (s_0 s_1 \cdots s_m, i_m)$.

b. If $(s_0, 0) \vdash^*_{M,w} (s_0 s_1 \cdots s_m, i)$, for some $s_1, \ldots, s_m \in \Gamma$ and input position $i$, then there is at least one sequence of edges of the form $((s_0, 0), (s_1, i_1))$, . . . , $((s_{m-1}, i_{m-1}), (s_m, i_m)) \in E_{M,w}$, with $i_m = i$.

Proof. A proof of (a) can be based on clauses (1) and (2) of the definition of $E_{M,w}$. By these clauses, one can construct a sequence of sets $\emptyset = E_0 \subseteq E_1 \subseteq \cdots \subseteq E_{|E_{M,w}|} = E_{M,w}$, each having one more element than the previous set, and this element is derived by either clause from elements in the previous set. One can prove that the invariant as stated in part (a) of the lemma is preserved from one set to the next. Part (b) can be proven by induction on the lengths of computations.

The problem of recognition can be solved by looking at existence of one particular edge in the graph:

Corollary 3.2. A string $w$ is accepted by $M$ if and only if $((s_0, 0), (s_f, |w|)) \in E_{M,w}$.

Example 3.1. Consider the PDA $M$ with a single stack symbol $s = s_0 = s_f$ and the following transitions:
$s \stackrel{a}{\mapsto} ss$,
$sss \stackrel{\varepsilon}{\mapsto} ss$.

Assume the input string is $w = aaaa$. This string can be accepted by $M$ by means of five different computations. One possible accepting computation consists of the following steps:

Assume the input string is w D aaaa. This string can be accepted by M by means of five different computations. One possible accepting computation consists of the following steps: .s; 0/ `M;w `M;w `M;w `M;w `M;w `M;w `M;w

.ss; 1/; .sss; 2/; .ss; 2/; .sss; 3/; .ssss; 4/; .sss; 4/; .ss; 4/:


The seven steps above correspond to the construction of the following edges of graph $P_{M,w}$ by the PDA tabulation algorithm, in the given order:
$((s, 0), (s, 1))$, $((s, 1), (s, 2))$, $((s, 0), (s, 2))$, $((s, 2), (s, 3))$, $((s, 3), (s, 4))$, $((s, 2), (s, 4))$, $((s, 0), (s, 4))$.

The portion of the graph $P_{M,w}$ associated with the above computation is depicted in Figure 3. The complete graph $P_{M,w}$ constructed by the PDA tabulation algorithm running on $M$ and $w$ is depicted in Figure 4.

An alternative way to characterise the construction of $E_{M,w}$ is in terms of a deduction system, as illustrated in Figure 5, which covers transitions reading an input symbol $x = a$ as well as those reading the empty string $x = \varepsilon$.

[Figure 3: the vertices $(s, 0), (s, 1), (s, 2), (s, 3), (s, 4)$ connected by the seven edges listed above.]
Figure 3. The portion of graph $P_{M,w}$ representing an individual computation on string $w = aaaa$ by the PDA $M$ in Example 3.1

[Figure 4: the vertices $(s, 0), (s, 1), (s, 2), (s, 3), (s, 4)$ with all edges $((s, i), (s, j))$ for $0 \leq i < j \leq 4$.]
Figure 4. Graph $P_{M,w}$ constructed by the tabulation algorithm for PDA $M$ in Example 3.1 and string $w = aaaa$. The graph compactly represents all five computations.

A deduction system consists of a number of inference rules, each having zero or more antecedents and one consequent. The antecedents are written above a horizontal line and the consequent is written below it. The antecedents represent items that

$((s, i_0), (s_1, i_1))$
$((s_1, i_1), (s_2, i_2))$
⋮
$((s_{m-1}, i_{m-1}), (s_m, i_m))$
──────────────────────────────
$((s, i_0), (r, i))$
  provided $s s_1 \cdots s_m \stackrel{x}{\mapsto} sr \in \Delta$, and $(x = \varepsilon \wedge i = i_m) \vee (x = a_{i_m+1} \wedge i = i_m + 1)$

Figure 5. Construction of the graph in terms of a deduction system, which assumes a fixed PDA with a set $\Delta$ of transitions and a fixed input string $a_1 \cdots a_n$. In this special case, the system has a single inference rule.

have already been derived. In the case at hand, these items are existing edges in the graph. A consequent represents a new item that may be derived from the items in the antecedent. The side conditions, written to the right of the horizontal line, need to be fulfilled for the inference rule to be applicable. By an instantiation of an inference rule we mean a consistent substitution of variables by elements from a presumed set of structures. In the case at hand, these structures are a fixed automaton and a fixed input string, which act as parameters to the deduction system.

A deduction system has a natural interpretation as a dynamic programming algorithm. An appropriate kind of indexing needs to be set up in order to detect when there are new instantiations of inference rules that can be applied. With proper care, the time complexity can be kept linear in the number of possible instantiations of inference rules. In the case at hand, this is $O(|\Delta| \cdot |w|^{\ell_M})$, where $\Delta$ is the set of transitions of PDA $M$ and $w$ is the input string. As defined in § 2, $\ell_M$ is the length of the longest string $s\sigma$ in transitions $s\sigma \stackrel{x}{\mapsto} sr$.

A consequence of the above analysis is that the time complexity of this type of PDA recognition is polynomial in the length of the input string, but exponential in the size of the automaton, as the degree of the polynomial is determined by $\ell_M$. However, by a transformation of the PDA one may bring down $\ell_M$ to 3, and thereby reduce the time complexity to cubic. This "binarising" transformation proceeds by breaking down long transitions into shorter transitions, each of which pops at most two symbols from the stack. For example, a transition $s s_1 s_2 s_3 \stackrel{a}{\mapsto} sr$ would be replaced by two transitions $s_1 s_2 s_3 \stackrel{\varepsilon}{\mapsto} s_1 (s_2, s_3)$ and $s s_1 (s_2, s_3) \stackrel{a}{\mapsto} sr$, where $(s_2, s_3)$ stands for a new stack symbol. The sketched transformation can be extended to weighted PDAs, in such a way that weights of computations (and thereby weights of strings) are preserved.

Lemma 3.1 and Corollary 3.2 pertain to paths in the graph starting with $(s_0, 0)$. However, there may be paths elsewhere in the graph that do not correspond to any computation of the automaton. The reason lies in transitions with left-hand sides of length one, which push one symbol on the stack. The inference rule from Figure 5 in that case simplifies to


──────────────────
$((s, i_0), (r, i))$
  provided $s \stackrel{x}{\mapsto} sr \in \Delta$, and $(x = \varepsilon \wedge i = i_0) \vee (x = a_{i_0+1} \wedge i = i_0 + 1)$.

Note that the rule is applicable even if $s$ can never occur on top of the stack at input position $i_0$, in any computation starting from the initial configuration $(s_0, 0)$. However, application of the inference rule is useless in this case, as it can play no role in the acceptance of the input. The running costs of the algorithm could therefore be reduced if we block application of transitions in such cases. This can be done at the expense of a small amount of overhead involved in recording which vertices in the graph correspond to derivable configurations. A practical implementation would in fact not add a vertex until existence of a relevant configuration is established.

The refinement that realises these ideas is presented as a deduction system in Figure 6. It manipulates a new kind of item $(r, j)$, which can be derived from an item of the form $((s, i), (r, j))$. The item $(s_0, 0)$ is an axiom, that is, it is derived by an inference rule without antecedents. There are now two inference rules manipulating transitions. One is as before, but only deals with transitions whose left-hand sides have length two or greater. The other inference rule deals with transitions whose left-hand sides have length one. Its application is conditional on the existence of an item that shows the relevant stack symbol can occur at the relevant input position.

$((s, i), (r, j))$
──────────────
$(r, j)$

──────────────
$(s_0, 0)$

$((s, i_0), (s_1, i_1))$
$((s_1, i_1), (s_2, i_2))$
⋮
$((s_{m-1}, i_{m-1}), (s_m, i_m))$
──────────────────────────────
$((s, i_0), (r, i))$
  provided $s s_1 \cdots s_m \stackrel{x}{\mapsto} sr \in \Delta$ and $m \geq 1$, and $(x = \varepsilon \wedge i = i_m) \vee (x = a_{i_m+1} \wedge i = i_m + 1)$

$(s, i_0)$
──────────────
$((s, i_0), (r, i))$
  provided $s \stackrel{x}{\mapsto} sr \in \Delta$, and $(x = \varepsilon \wedge i = i_0) \vee (x = a_{i_0+1} \wedge i = i_0 + 1)$

Figure 6. Refined construction of the graph, which only allows application of a transition $s \stackrel{x}{\mapsto} sr$ that pushes $r$ on top of $s$ at input position $i_0$ if $s$ can be on top of the stack at that input position

Lemma 3.1 and Corollary 3.2 remain valid, and in addition we have:

Lemma 3.3. Assume the set $E_{M,w}$ of edges is derived by the refined tabulation algorithm in Figure 6. Then,

a. if there is a sequence of edges of the form $((s_1, i_1), (s_2, i_2))$, . . . , $((s_{m-1}, i_{m-1}), (s_m, i_m)) \in E_{M,w}$, then $(s_0, 0) \vdash^*_{M,w} (\gamma s_1, i_1)$, for some $\gamma \in \Gamma^*$, and $(s_1, i_1) \vdash^*_{M,w} (s_1 \cdots s_m, i_m)$, and thereby
$(s_0, 0) \vdash^*_{M,w} (\gamma s_1, i_1) \vdash^*_{M,w} (\gamma s_1 \cdots s_m, i_m)$;


b. if $(s_0, 0) \vdash^*_{M,w} (\gamma s_1, i)$ and $(s_1, i) \vdash^*_{M,w} (s_1 \cdots s_m, j)$ for some $s_1, \ldots, s_m \in \Gamma$ and input positions $i$ and $j$, then there is at least one sequence of edges of the form $((s_1, i_1), (s_2, i_2))$, . . . , $((s_{m-1}, i_{m-1}), (s_m, i_m)) \in E_{M,w}$, with $i_1 = i$ and $i_m = j$.

A PDA is said to have the valid prefix property if it does not scan an input string beyond its first error. More precisely, if $wv$ is not accepted by a PDA $M$, for fixed $w$ and any $v$, then there is no stack $\gamma$ and no input position $i > |w|$ such that $(s_0, 0) \vdash^*_{M,wv} (\gamma, i)$ for any $v$. The refined tabulation algorithm in Figure 6 preserves the valid prefix property in the sense that the largest $i$ in any $((s, j), (r, i))$ that is added to $E_{M,wv}$ equals the largest $i$ such that $(s_0, 0) \vdash^*_{M,wv} (\gamma, i)$.

3.2. Tabulation of WPDAs. An instantiation of the inference rule from Figure 5 can be seen as a context-free rule, with the consequent as left-hand side, and the antecedents as members in the right-hand side. The collection of all such rules, with start nonterminal $((s_0, 0), (s_f, |w|))$, forms a context-free grammar that is a compact representation of all possible computations for given PDA $M$ and input string $w$. Each derivation in the grammar corresponds to one computation of the automaton and vice versa.

If $M$ is a weighted PDA, then we can extend this view by adding rules that incorporate the weights of the transitions. More precisely, given a WPDA $M = (\Sigma, \Gamma, s_0, s_f, \Delta, \mu)$ over a semiring $(W, \oplus, \otimes, 0, 1)$, and an input string $w$ with length $|w| = n$, the WCFG $G_{M,w}$ is defined as $(\emptyset, N_{M,w}, S_{M,w}, R_{M,w}, \mu_{M,w})$, where
$N_{M,w} = (\Gamma \times \{0, \ldots, n\}) \times (\Gamma \times \{0, \ldots, n\}) \cup \{A_\tau \mid \tau \in \Delta\}$,

where $A_\tau$ are new symbols, $S_{M,w} = ((s_0, 0), (s_f, n))$, and $R_{M,w}$ is defined as follows.

• For each transition $\tau = (s s_1 \cdots s_m \stackrel{x}{\mapsto} sr)$ in $\Delta$ and each sequence of input positions $i_0, \ldots, i_m$ such that $0 \leq i_0 \leq \cdots \leq i_m \leq n$ and $i$ such that $(x = \varepsilon \wedge i = i_m) \vee (x = a_{i_m+1} \wedge i = i_m + 1)$, there is a rule
$((s, i_0), (r, i)) \to ((s, i_0), (s_1, i_1))\ ((s_1, i_1), (s_2, i_2))\ \cdots\ ((s_{m-1}, i_{m-1}), (s_m, i_m))\ A_\tau$
in $R_{M,w}$ with weight $1$.
• For each transition $\tau$ there is a rule $A_\tau \to \varepsilon$ in $R_{M,w}$ with weight $\mu(\tau)$.

The motivation for the additional rules $A_\tau \to \varepsilon$ is to ensure that the weights are multiplied in the same order in which they are multiplied in the WPDA. If the semiring is commutative, however, then the additional rules $A_\tau \to \varepsilon$ are not needed, and the above definition of $R_{M,w}$ can be simplified to the following.


• For each transition $\tau = (s s_1 \cdots s_m \stackrel{x}{\mapsto} sr)$ in $\Delta$ and each sequence of input positions $i_0, \ldots, i_m$ such that $0 \leq i_0 \leq \cdots \leq i_m \leq n$ and $i$ such that $(x = \varepsilon \wedge i = i_m) \vee (x = a_{i_m+1} \wedge i = i_m + 1)$, there is a rule
$((s, i_0), (r, i)) \to ((s, i_0), (s_1, i_1))\ ((s_1, i_1), (s_2, i_2))\ \cdots\ ((s_{m-1}, i_{m-1}), (s_m, i_m))$
in $R_{M,w}$ with weight $\mu(\tau)$.

As $G_{M,w}$ has no terminals, the language it generates is $\{\varepsilon\}$ if $w$ is accepted by $M$ and $\emptyset$ otherwise. The importance of $G_{M,w}$ comes from the following observation:

Lemma 3.4. Given WPDA $M$, input string $w$ and $G_{M,w}$ as above, the weight of $w$ according to $M$ equals the weight of $\varepsilon$ according to $G_{M,w}$.

Proof. A straightforward proof starts by defining a bijection from computations for $M$ and $w$ to derivations for $G_{M,w}$. The next step is to show that the weights on both sides of the bijection are equal. The weight of $w$ according to $M$ is the addition of the weights of the computations for $M$ and $w$, and the weight of $\varepsilon$ according to $G_{M,w}$ is the addition of the weights of the derivations of $\varepsilon$, by which the statement in the lemma follows.

Because the parsing problem of $w$ is already solved with the creation of $G_{M,w}$, the computation of the weight of $w$ can be solved by adding the weights of all derivations. In general, given a WCFG $(\Sigma, N, S, R, \mu)$, the addition of the weights of all derivations is expressed by the value $P(S)$. We will refer to $P$ as the partition function. We can extend it to other nonterminals in the grammar, and define $P(a) = 1$ for all $a \in \Sigma$, and for all $A \in N$:

$P(A) = \bigoplus_{\rho = (A \to X_1 \cdots X_m)} \mu(\rho) \otimes \bigotimes_{1 \leq i \leq m} P(X_i).$   (1)

In words, the value of $P$ for nonterminal $A$ is obtained by adding the values for each rule with left-hand side $A$. The value for a rule is obtained by multiplying the weight of the rule with the values of $P$ applied to the members in the right-hand side (or with $1$ if the right-hand side is empty). If a WCFG is recursive, then the above amounts to a system of non-linear equations. Whether such a system can be solved to give $P(S)$ in closed form depends on the semiring. In the case of the Boolean semiring, the solution can be trivially computed in polynomial time. In the case of $(\mathbb{R}_{\geq 0}, +, \cdot, 0, 1)$, the solution can, in general, only be approximated, for example by Newton's method.

A special case is a WPDA $M = (\Sigma_1, \Gamma, s_0, s_f, \Delta, \mu)$ over a semiring $(2^{\Sigma_2^*}, \cup, \cdot, \emptyset, \{\varepsilon\})$, such that each transition is mapped to a singleton set in $2^{\Sigma_2^*}$. The result of parsing an input string $w \in \Sigma_1^*$ is a WCFG $G_{M,w} = (\emptyset, N_{M,w}, S_{M,w}, R_{M,w}, \mu_{M,w})$ over the same semiring, and each rule in $R_{M,w}$ of the form $\rho = (A_\tau \to \varepsilon)$ is mapped to a singleton set $\mu_{M,w}(\rho) = \mu(\tau) = \{v\}$, for some $v \in \Sigma_2^*$, and all other rules in $R_{M,w}$ are mapped to $1 = \{\varepsilon\}$.


We can now construct an unweighted CFG $G'_{M,w} = (\Sigma_2, N_{M,w}, S_{M,w}, R'_{M,w})$. For each rule from $R_{M,w}$ of the form $\rho = (A_\tau \to \varepsilon)$, with $\mu(\rho) = \{v\}$, we let $R'_{M,w}$ contain $A_\tau \to v$. The other rules from $R_{M,w}$ are copied unchanged to $R'_{M,w}$, ignoring their weights, which are all $1 = \{\varepsilon\}$. The language generated by the unweighted CFG $G'_{M,w}$ is exactly the value of $P(S_{M,w})$ in the WCFG $G_{M,w}$.
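For the semiring $(\mathbb{R}_{\geq 0}, +, \cdot, 0, 1)$, the least solution of the system (1) can also be approached from below by simple fixed-point iteration, which is slower than the Newton iteration mentioned above but easy to state. The Python sketch below is illustrative only (the rule encoding and names are assumptions, not the chapter's):

    # A minimal sketch (illustrative): approximate the partition function P of
    # equation (1) for a WCFG over (R>=0, +, *, 0, 1) by fixed-point iteration.
    # A rule is (lhs, rhs, weight), with rhs a tuple of terminals and nonterminals.

    def partition_function(rules, nonterminals, iterations=100):
        P = {A: 0.0 for A in nonterminals}          # start from the zero vector
        for _ in range(iterations):
            new_P = {A: 0.0 for A in nonterminals}
            for lhs, rhs, weight in rules:
                value = weight
                for X in rhs:
                    value *= P.get(X, 1.0)          # terminals contribute 1
                new_P[lhs] += value
            P = new_P
        return P

For a non-recursive grammar, such as the grammar $G_{M,w}$ of the next example, the iteration becomes exact after a number of rounds bounded by the height of the derivation trees.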

Example 3.2. Consider the PPDA $M$ defined by augmenting the transitions of the PDA from Example 3.1 with probabilities from the already mentioned commutative semiring $([0, 1], +, \cdot, 0, 1)$:
$s \stackrel{a}{\mapsto} ss$, 0.75,
$sss \stackrel{\varepsilon}{\mapsto} ss$, 0.25.

Assume the input string $w = aaaa$. Since our semiring is commutative, we can parse $w$ under $M$ using the simplified version of the algorithm in this subsection. This results in the PCFG $G_{M,w}$ with start symbol $((s, 0), (s, 4))$, consisting of the following rules and associated probabilities:
$((s, 0), (s, 4)) \to ((s, 0), (s, 1))\ ((s, 1), (s, 4))$, 0.25,
$((s, 0), (s, 4)) \to ((s, 0), (s, 2))\ ((s, 2), (s, 4))$, 0.25,
$((s, 0), (s, 4)) \to ((s, 0), (s, 3))\ ((s, 3), (s, 4))$, 0.25,
$((s, 0), (s, 3)) \to ((s, 0), (s, 1))\ ((s, 1), (s, 3))$, 0.25,
$((s, 0), (s, 3)) \to ((s, 0), (s, 2))\ ((s, 2), (s, 3))$, 0.25,
$((s, 1), (s, 4)) \to ((s, 1), (s, 2))\ ((s, 2), (s, 4))$, 0.25,
$((s, 1), (s, 4)) \to ((s, 1), (s, 3))\ ((s, 3), (s, 4))$, 0.25,
$((s, 0), (s, 2)) \to ((s, 0), (s, 1))\ ((s, 1), (s, 2))$, 0.25,
$((s, 1), (s, 3)) \to ((s, 1), (s, 2))\ ((s, 2), (s, 3))$, 0.25,
$((s, 2), (s, 4)) \to ((s, 2), (s, 3))\ ((s, 3), (s, 4))$, 0.25,
$((s, 0), (s, 1)) \to \varepsilon$, 0.75,
$((s, 1), (s, 2)) \to \varepsilon$, 0.75,
$((s, 2), (s, 3)) \to \varepsilon$, 0.75,
$((s, 3), (s, 4)) \to \varepsilon$, 0.75.

There is a bijection between the set of trees generated by the grammar $G_{M,w}$ and the set of computations of $M$ on $w$. Such a correspondence preserves the weights. As an example, the computation of $M$ on $w$ consisting of the steps
$(s, 0) \vdash_{M,w} (ss, 1) \vdash_{M,w} (sss, 2) \vdash_{M,w} (ss, 2) \vdash_{M,w} (sss, 3) \vdash_{M,w} (ssss, 4) \vdash_{M,w} (sss, 4) \vdash_{M,w} (ss, 4)$
is represented by the tree depicted in Figure 7. Both have overall weight $(0.25)^3 \cdot (0.75)^4 \approx 0.0049438$.


[Figure 7: a derivation tree with root $((s, 0), (s, 4))$, whose children are $((s, 0), (s, 2))$ and $((s, 2), (s, 4))$; these in turn rewrite to $((s, 0), (s, 1))\ ((s, 1), (s, 2))$ and $((s, 2), (s, 3))\ ((s, 3), (s, 4))$, respectively, and each of the four leaves rewrites to $\varepsilon$.]
Figure 7. One of the trees generated by grammar $G_{M,w}$, representing an individual computation of $M$ on $w$

Note that the PCFG $G_{M,w}$ is not proper. In the simple case at hand, $G_{M,w}$ is also non-recursive, and therefore it generates only a finite set of trees. Thus, the value of the partition function $P(((s, 0), (s, 4)))$ can be computed exactly. This value is the probability of $aaaa$ under the source PPDA $M$.

3.3. Knuth's optimal-derivation algorithm. The shortest-path algorithm of Dijkstra finds the shortest path in a graph between two vertices, without the need for considering all paths. A generalisation by Knuth finds an optimal derivation in a weighted grammar without the need for considering all derivations. We will briefly discuss Knuth's algorithm, specialised to applying the partition function $P$ to the start symbol $S$ of a WCFG.

We assume a WCFG over an idempotent semiring $(W, \oplus, \otimes, 0, 1)$, such that $\leq_W$ is a total order. This means that $\oplus$ can be seen as a "maximum" function on its arguments. We further assume that $\otimes$ is a superior function on its arguments, which means that it is monotone non-decreasing in each argument, that is, $x_1 \leq_W x_2$ implies $x_1 \otimes y \leq_W x_2 \otimes y$ and $y \otimes x_1 \leq_W y \otimes x_2$ for all $x_1, x_2, y \in W$, and that $x \otimes y \leq_W x$ and $x \otimes y \leq_W y$ for all $x, y \in W$.

An example of an idempotent semiring in which $\otimes$ is a superior function is $(\mathbb{R}_{\geq 0} \cup \{\infty\}, \min, +, \infty, 0)$. As pointed out in § 2, here $\leq_W$ is the $\geq$ relation on real numbers. Therefore, $\oplus$ is a maximum function in terms of $\leq_W$, while being the minimum function on real numbers.

Suppose that we have a probabilistic CFG over the semiring $(\mathbb{R}_{\geq 0}, +, \cdot, 0, 1)$, where each rule is mapped by the function $\mu$ to a probability, that is, a real number in the range $[0, 1]$. The same CFG with the same weight function $\mu$ can also be interpreted in terms of an idempotent semiring $([0, 1], \max, \cdot, 0, 1)$, where $+$ has been replaced by $\max$. It is clear that $\cdot$ is a superior function. With this new semiring, $P(S)$, the partition function applied to the start symbol, becomes the probability of the most probable derivation, rather than the sum of the probabilities of all derivations.


Algorithm 1 presents Knuth's algorithm. In each iteration, the value of $P(A)$ is determined for a nonterminal $A$. The set $E$ contains all grammar symbols $X$ for which $P(X)$ has already been determined; this is initially the set of terminals $\Sigma$, as we set $P(a) = 1$ for each $a \in \Sigma$. The set $F$ contains the nonterminals not yet in $E$ that are candidates to be added next. Each nonterminal $A$ in $F$ is such that a derivation from $A$ exists consisting of a rule $A \to X_1 \cdots X_m$, and derivations from $X_1, \ldots, X_m$ matching the values of $P(X_1), \ldots, P(X_m)$ found earlier. The nonterminal $A$ for which such a derivation has the maximum weight is then added to $E$.

Algorithm 1 Knuth's generalisation of Dijkstra's algorithm
  $E = \Sigma$
  repeat
    $F = \{A \mid A \notin E \wedge \exists A \to X_1 \cdots X_m\ [X_1, \ldots, X_m \in E]\}$
    if $F = \emptyset$ then return $0$
    for all $A \in F$ do
      $T(A) = \bigoplus_{\rho = (A \to X_1 \cdots X_m):\ X_1, \ldots, X_m \in E} \mu(\rho) \otimes P(X_1) \otimes \cdots \otimes P(X_m)$
    choose $A \in F$ such that $T(A) = \bigoplus_{B \in F} T(B)$
    $P(A) = T(A)$
    $E = E \cup \{A\}$
  until $S \in E$
  return $P(S)$

A suitable implementation of Knuth's algorithm considers each symbol in each grammar rule only once, which amounts to $O(|G|)$ steps; here $|G|$ denotes the size of the grammar measured in the number of symbol occurrences in rules. Added to this is the cost of selecting $A \in F$ such that $T(A) = \bigoplus_{B \in F} T(B)$ in each iteration. By maintaining a priority queue for $F$ across iterations, the time costs are the number of times elements are added to, or removed from, the priority queue, which is bounded by $|N|$, times the cost of adding and removing one element, which is bounded by $\log(|N|)$ for a priority queue containing up to $|N|$ elements. The total running time is therefore $O(|G| + |N| \log(|N|))$, assuming the addition and multiplication operations of the semiring can be performed in unit time.
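As an illustration of the role of the priority queue, here is a small Python sketch of Algorithm 1 for the idempotent semiring $([0, 1], \max, \cdot, 0, 1)$ discussed above. The rule encoding and names are assumptions, and, for brevity, candidate values are recomputed by rescanning all rules after each settled nonterminal, so this sketch does not attain the $O(|G| + |N| \log(|N|))$ bound.

    # A minimal sketch of Knuth's algorithm over ([0,1], max, *, 0, 1);
    # rules are (lhs, rhs, weight), with rhs a tuple of symbols.
    import heapq

    def knuth_best(rules, terminals, start):
        P = {a: 1.0 for a in terminals}              # E = Sigma, with P(a) = 1
        heap = []                                    # candidate values (-T(A), A)
        def push_candidates():
            # recompute T(A) for every A not yet settled whose rhs lies in E
            for lhs, rhs, weight in rules:
                if lhs not in P and all(X in P for X in rhs):
                    value = weight
                    for X in rhs:
                        value *= P[X]
                    heapq.heappush(heap, (-value, lhs))
        push_candidates()
        while heap:
            neg_value, A = heapq.heappop(heap)
            if A in P:
                continue                             # outdated entry for a settled symbol
            P[A] = -neg_value                        # the maximal candidate is final
            if A == start:
                return P[start]
            push_candidates()
        return 0.0                                   # F became empty: start not derivable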


Figure 8. (One may alternatively take the refined algorithm from Figure 6 as a starting point if the valid prefix property is to be preserved.) ..s; q0 /; .s1 ; q1 // ..s1 ; q1 /; .s2 ; q2 // :: : ..sm

1 ; qm 1 /; .sm ; qm //

..s; q0 /; .r; q//

8 < ss    s 7 x! sr 2  ; m 1 M : .x D " ^ q D q / _ .x D a ^ q m

a

m

7 ! q 2 A /

Figure 8. Construction of edges in the graph PM;A , assuming a PDA M with a set M of transitions and a finite automaton A with a set A of transitions. In order to keep the discussion simple, we assume there are no epsilon transitions in A.

The recognition problem now amounts to determining whether the languages recognised by $M$ and $A$ have a non-empty intersection. This can be solved much as before, by testing whether $P_{M,A}$ contains an edge of the form $((s_0, q_0), (s_f, q_f))$, where $q_0$ and $q_f$ are an initial and a final state of $A$, respectively. The time complexity of recognition is now $O(|\Delta_M| \cdot (|Q_A|^{\ell_M} + |Q_A|^{\ell_M - 1} |\Delta_A|))$, where $\Delta_M$ is the set of transitions of $M$, $Q_A$ is the set of states of $A$, and $\Delta_A$ is its set of transitions. The expression reflects that we can distinguish two cases in Figure 8, one where $x = \varepsilon$ and no transitions in $A$ are used, and another case where a choice of a transition in $A$ determines the state $q_m$, leaving $m$ more states to be chosen, where $m \leq \ell_M - 1$. Before constructing $P_{M,A}$, we may binarise $M$ in order to reduce the value of $\ell_M$ to 3.

Much as in the previous sections, we can construct a WCFG $G_{M,A}$. A minor complication is that we need an extra start symbol $S$, and additional rules of the form $S \to ((s_0, q_0), (s_f, q_f))$, where $q_f$ is one of the final states; these rules are all given weight $1$. The derivations of $G_{M,A}$ represent computations of $M$ that accept strings that are also accepted by $A$. If the finite automaton is deterministic (or non-ambiguous in general), then each accepting computation of $M$ is represented by at most one derivation of $G_{M,A}$. This implies that $P(S)$ equals the addition of the weights of all computations of $M$ that accept strings that are also accepted by $A$.

Further generalisations are possible, for example by letting $A$ be a weighted finite automaton, or letting $M$ be a more general type of automaton capable of handling languages beyond the class of context-free languages. These extensions are beyond the scope of this chapter, however.

4. LR recognition

Before the tabulation algorithms from the previous section can be applied, we need to obtain a (W)PDA. In the context of natural language processing, the obvious starting point would be a linguistically motivated grammar, generating the kinds of phrase


structures that were discussed in the introduction to this chapter. By parsing strategy we mean a general procedure to map a (W)CFG generating a language $L$ to a (W)PDA recognising the same language; in the case of weighted grammars and weighted automata, we also require that the weight of each string is preserved. Among the parsing strategies that we discuss in this chapter, LR parsing is the only one that provides PDAs of a type on which we can readily apply the tabulation algorithms from § 3. For the other strategies, some modifications will be in order, as we will see in following sections.

LR parsing has traditionally been used for a class of unambiguous grammars that allow construction of deterministic PDAs. However, linguistically motivated grammars tend to be ambiguous, which means that the resulting automata will be nondeterministic.

Let $G = (\Sigma, N, S, R)$ be a CFG. For technical reasons, we assume that there is only one rule in $R$ with left-hand side $S$, which is denoted $S \to \gamma$, and we assume that $S$ does not appear in the right-hand side of any rule. This is without loss of generality, as a grammar for which this does not hold can be transformed to the required form by introducing a new symbol $S^{\dagger}$, adding the rule $S^{\dagger} \to S$, and setting $S^{\dagger}$ to be the new start symbol.

The set $I_G$ of dotted rules, also called LR(0) items, is defined as follows:

$I_G = \{[A \to \alpha \bullet \beta] \mid A \to \alpha\beta \in R\}.$   (2)

Intuitively, a dotted rule is obtained by inserting a bullet dividing the right-hand side into a prefix and a suffix. The prefix consists of symbols that have already been matched to input that the automaton has read so far.

The function $\mathrm{closure}_G$ is a mapping from $2^{I_G}$ to $2^{I_G}$. For any $s \subseteq I_G$, $\mathrm{closure}_G(s)$ is defined to be the smallest set satisfying the following conditions:
• $s \subseteq \mathrm{closure}_G(s)$; and
• $[B \to \bullet\,\beta] \in \mathrm{closure}_G(s)$ whenever $[A \to \alpha \bullet B\gamma] \in \mathrm{closure}_G(s)$ and $B \to \beta \in R$.

We also define the function $\mathrm{goto}_G$ from $2^{I_G} \times (\Sigma \cup N)$ to $2^{I_G}$, such that

$\mathrm{goto}_G(s, X) = \{[A \to \alpha X \bullet \beta] \mid [A \to \alpha \bullet X\beta] \in \mathrm{closure}_G(s)\}.$   (3)

Intuitively, the goto function updates the position of the bullets in right-hand sides as the automaton reads input from left to right. The symbol $X$ can be a nonterminal, which must then derive a sequence of input symbols, ending in the current input position. The function is extended in a natural way to allow a string of grammar symbols as second argument, by setting $\mathrm{goto}_G(s, \varepsilon) = s$ and $\mathrm{goto}_G(s, X\alpha) = \mathrm{goto}_G(\mathrm{goto}_G(s, X), \alpha)$.

The set of LR(0) states, denoted $\Gamma_G$, is defined to be the smallest subset of $2^{I_G}$ that satisfies the following conditions:
• $\{[S \to \bullet\,\gamma]\} \in \Gamma_G$; and
• $s \in \Gamma_G$ whenever $\mathrm{goto}_G(s', X) = s \neq \emptyset$ for some $s' \in \Gamma_G$ and $X \in \Sigma \cup N$.
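The functions $\mathrm{closure}_G$ and $\mathrm{goto}_G$ and the construction of $\Gamma_G$ translate directly into code. The Python sketch below is illustrative only (the grammar encoding and names are not the chapter's): a grammar maps each nonterminal to its right-hand sides, and an item $[A \to \alpha \bullet \beta]$ is a triple (A, rhs, dot).

    # A minimal sketch (illustrative) of closure_G, goto_G and the LR(0) states.

    def closure(items, grammar):
        result = set(items)
        changed = True
        while changed:
            changed = False
            for lhs, rhs, dot in list(result):
                if dot < len(rhs) and rhs[dot] in grammar:   # bullet before nonterminal B
                    for beta in grammar[rhs[dot]]:
                        item = (rhs[dot], tuple(beta), 0)
                        if item not in result:
                            result.add(item)
                            changed = True
        return result

    def goto(state, X, grammar):
        return frozenset((lhs, rhs, dot + 1)
                         for lhs, rhs, dot in closure(state, grammar)
                         if dot < len(rhs) and rhs[dot] == X)

    def lr0_states(grammar, start, symbols):
        s0 = frozenset({(start, tuple(grammar[start][0]), 0)})   # {[S -> . gamma]}
        states, worklist = {s0}, [s0]
        while worklist:
            s = worklist.pop()
            for X in symbols:
                t = goto(s, X, grammar)
                if t and t not in states:
                    states.add(t)
                    worklist.append(t)
        return states

For instance, with grammar = {"S": [["A"]], "A": [["A", "+", "A"], ["a"]]} and symbols {"S", "A", "+", "a"}, lr0_states returns the five states of Example 4.1 below.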


Note that if $[A \to \alpha \bullet \beta] \in s$, with $s \in \Gamma_G$, then either $s = \{[S \to \bullet\,\gamma]\}$ or $\alpha \neq \varepsilon$. The dotted rules in elements of $\Gamma_G$ are called kernel items. The dotted rules in $\mathrm{closure}_G(s) \setminus s$, for $s \in \Gamma_G$, are called closure items. The set of kernel items is disjoint from the set of closure items.

The function $\mathrm{redex}_G$ from $N$ to $2^{\Gamma_G^*}$ is defined as follows. For $A \in N$, the string $s_0 s_1 s_2 \cdots s_m$ ($m \geq 0$) of elements from $\Gamma_G$ belongs to $\mathrm{redex}_G(A)$ if and only if the following conditions are both satisfied:
• $[A \to \alpha\, \bullet] \in \mathrm{closure}_G(s_m)$ for some $\alpha = X_1 X_2 \cdots X_m$;
• $\mathrm{goto}_G(s_{k-1}, X_k) = s_k$ for $1 \leq k \leq m$.

Note that the above conditions imply that $[A \to \bullet\, X_1 X_2 \cdots X_m] \in \mathrm{closure}_G(s_0)$ and $[A \to X_1 \cdots X_k \bullet X_{k+1} \cdots X_m] \in s_k$ for $0 < k \leq m$. Furthermore, the string $X_1 X_2 \cdots X_m$ is unique for each redex $s_0 s_1 s_2 \cdots s_m$, and will be denoted $\mathrm{rhs}(s_0 s_1 s_2 \cdots s_m)$.

Let $G = (\Sigma, N, S, R)$ be a CFG. The LR(0) automaton associated with $G$ is the PDA $M_G = (\Sigma, \Gamma_G, s_0, s_f, \delta_G)$, where $s_0 = \{[S \to \bullet\,\gamma]\}$, $s_f = \mathrm{goto}_G(s_0, \gamma)$, and $\delta_G$ includes all and only the following transitions:

1. $s \stackrel{a}{\mapsto} s\, s'$ whenever $s \in \Gamma_G$ and $s' = \mathrm{goto}_G(s, a) \neq \emptyset$; and
2. $s\pi \stackrel{\varepsilon}{\mapsto} s\, s'$ whenever $s\pi \in \mathrm{redex}_G(A)$ and $s' = \mathrm{goto}_G(s, A) \neq \emptyset$.

The transitions of the first form are called shift actions and those of the second form are called reduce actions. Each reduce action $s\pi \stackrel{\varepsilon}{\mapsto} s\, s'$ with $A$ as above is said to have $A \to \mathrm{rhs}(s\pi)$ as underlying rule. We say a grammar $G$ is LR(0) if $M_G$ is a deterministic PDA. A language is said to be deterministic if there is an LR(0) grammar generating that language. The class of LR($k$) grammars, where $k \geq 1$ denotes the number of symbols of lookahead, is outside the scope of this chapter.

A computation of a PDA implementing the LR parsing strategy corresponds to a right-most derivation of the grammar, but in reverse order. More precisely, if we construct a list of rules underlying reduce actions in an accepting computation, then the mirror image of this list equals the list of rules used in a right-most derivation. For this reason, it is in general not possible to extend the LR parsing strategy to preserve weights in a semiring, unless that semiring is commutative. For commutative semirings, the weight function $\mu_M$ of the WPDA is obtained from the weight function $\mu_G$ of the original WCFG as follows. We set $\mu_M(\tau) = 1$ for each shift action $\tau = (s \stackrel{a}{\mapsto} s\, s')$. For each reduce action $\tau = (s\pi \stackrel{\varepsilon}{\mapsto} s\, s')$, we copy the weight from the underlying grammar rule $\rho = (A \to \mathrm{rhs}(s\pi))$ by $\mu_M(\tau) = \mu_G(\rho)$.

Example 4.1. Consider the CFG $G$ with the following three rules:
$S \to A$, $A \to A + A$, $A \to a$.


The set $\Gamma_G$ of LR(0) states includes $s_0 = \{[S \to \bullet A]\}$. By applying the function $\mathrm{closure}_G$ to $s_0$, we obtain $\{[S \to \bullet A], [A \to \bullet A + A], [A \to \bullet a]\}$. The function $\mathrm{goto}_G$ produces a non-empty set for only two grammar symbols as second argument:
$\mathrm{goto}_G(s_0, S) = \emptyset$,
$\mathrm{goto}_G(s_0, A) = \{[S \to A\, \bullet], [A \to A \bullet + A]\}$,
$\mathrm{goto}_G(s_0, +) = \emptyset$,
$\mathrm{goto}_G(s_0, a) = \{[A \to a\, \bullet]\}$.

The resulting two non-empty sets give us two more states in $\Gamma_G$, on which we can again apply the $\mathrm{closure}_G$ and $\mathrm{goto}_G$ functions. By repeating this exhaustively, the entire set $\Gamma_G$ is obtained, which consists of the states:
$s_0 = \{[S \to \bullet A]\}$,
$s_1 = \{[S \to A\, \bullet], [A \to A \bullet + A]\}$,
$s_2 = \{[A \to a\, \bullet]\}$,
$s_3 = \{[A \to A + \bullet A]\}$,
$s_4 = \{[A \to A + A\, \bullet], [A \to A \bullet + A]\}$.

Figure 9 presents the function $\mathrm{goto}_G$ as a directed graph with the set of vertices being $\Gamma_G$.

[Figure 9: a directed graph with edges $s_0 \to s_1$ labelled $A$, $s_0 \to s_2$ labelled $a$, $s_1 \to s_3$ labelled $+$, $s_3 \to s_2$ labelled $a$, $s_3 \to s_4$ labelled $A$, and $s_4 \to s_3$ labelled $+$.]
Figure 9. Function $\mathrm{goto}_G$ from Example 4.1 represented as a directed graph over the set of vertices $\Gamma_G$

We can now define $M_G$, the PDA that implements the LR(0) parsing strategy for $G$. The initial stack symbol is $s_0$ and the final stack symbol is $s_1$. The transitions are
$s_0 \stackrel{a}{\mapsto} s_0 s_2$,
$s_1 \stackrel{+}{\mapsto} s_1 s_3$,
$s_3 \stackrel{a}{\mapsto} s_3 s_2$,
$s_4 \stackrel{+}{\mapsto} s_4 s_3$,
$s_0 s_2 \stackrel{\varepsilon}{\mapsto} s_0 s_1$,
$s_3 s_2 \stackrel{\varepsilon}{\mapsto} s_3 s_4$,
$s_0 s_1 s_3 s_4 \stackrel{\varepsilon}{\mapsto} s_0 s_1$,
$s_3 s_4 s_3 s_4 \stackrel{\varepsilon}{\mapsto} s_3 s_4$.

Note that $M_G$ is nondeterministic. For instance, with stack contents $s_0 s_1 s_3 s_4$ and next symbol $+$ in the input, the automaton can apply a shift action $s_4 \stackrel{+}{\mapsto} s_4 s_3$ or a reduce action $s_0 s_1 s_3 s_4 \stackrel{\varepsilon}{\mapsto} s_0 s_1$.

Assume that the input string is $w = a + a + a$. Because of the shift–reduce conflict mentioned above, $w$ can be accepted by $M_G$ by means of two different computations:

$(s_0, 0) \vdash (s_0 s_2, 1) \vdash (s_0 s_1, 1) \vdash (s_0 s_1 s_3, 2) \vdash (s_0 s_1 s_3 s_2, 3) \vdash (s_0 s_1 s_3 s_4, 3) \vdash (s_0 s_1 s_3 s_4 s_3, 4) \vdash (s_0 s_1 s_3 s_4 s_3 s_2, 5) \vdash (s_0 s_1 s_3 s_4 s_3 s_4, 5) \vdash (s_0 s_1 s_3 s_4, 5) \vdash (s_0 s_1, 5)$

and

$(s_0, 0) \vdash (s_0 s_2, 1) \vdash (s_0 s_1, 1) \vdash (s_0 s_1 s_3, 2) \vdash (s_0 s_1 s_3 s_2, 3) \vdash (s_0 s_1 s_3 s_4, 3) \vdash (s_0 s_1, 3) \vdash (s_0 s_1 s_3, 4) \vdash (s_0 s_1 s_3 s_2, 5) \vdash (s_0 s_1 s_3 s_4, 5) \vdash (s_0 s_1, 5)$,

where each step $\vdash$ stands for $\vdash_{M_G,w}$.

By applying the refined tabulation algorithm (Figure 6) on $M_G$ and $w$, we obtain the graph $P_{M_G,w}$, which is given in Figure 10. The result of applying a tabulation algorithm from § 3 on a PDA implementing the LR strategy is known as GLR parsing (for generalised LR).
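Continuing the example, the eight transitions of $M_G$ can be fed to the illustrative tabulation sketch given in § 3.1 above (an assumption carried over from that sketch, not part of the chapter), giving a toy GLR recogniser:

    # The LR(0) PDA of Example 4.1, in the (lhs, x, r) encoding of the earlier sketch.
    LR_TRANSITIONS = [
        (("s0",), "a", "s2"), (("s1",), "+", "s3"),
        (("s3",), "a", "s2"), (("s4",), "+", "s3"),
        (("s0", "s2"), None, "s1"), (("s3", "s2"), None, "s4"),
        (("s0", "s1", "s3", "s4"), None, "s1"), (("s3", "s4", "s3", "s4"), None, "s4"),
    ]
    edges = tabulate(LR_TRANSITIONS, "a+a+a")
    print((("s0", 0), ("s1", 5)) in edges)   # True: the acceptance edge of Corollary 3.2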

5. Earley's algorithm

For a somewhat simpler kind of pushdown automaton, we turn to top-down parsing. In our formulation, the stack symbols are dotted rules, rather than sets of dotted rules as in the case of LR parsing. Instead of the closure function we have a predictor transition, which produces only one new dotted rule from an existing one. Because stack symbols are single dotted rules, we do not need the goto function. When one more symbol in the right-hand side of a rule has been recognised, we do not push a new stack symbol as in the case of LR parsing, but replace the current dotted rule by another dotted rule that results from shifting the position of the bullet one place to the right. As a consequence, there is no more need for reduce actions that pop several symbols at once, and instead


Figure 10. Graph P_{M_G,w} constructed by the refined tabulation algorithm for string w = a + a + a and PDA M_G from Example 4.1

there is a completer transition, which shortens the length of the stack by only one symbol. The transitions are formalised below, in terms of dotted rules obtained from the set R of rules of a fixed CFG, plus an additional stack symbol ⊥, which we need for technical reasons. The initial stack symbol is ⊥ and the final stack symbol is [S → σ •], where S → σ is the (unique) rule with the start symbol S in the left-hand side. There are four types of transitions:

• initialiser: ⊥ ↦^ε ⊥ [S → • σ];
• predictor: [A → α • B γ] ↦^ε [A → α • B γ] [B → • β], where (A → α B γ), (B → β) ∈ R;
• scanner: [A → α • a γ] ↦^a [A → α a • γ], where (A → α a γ) ∈ R;
• completer: [A → α • B γ] [B → β •] ↦^ε [A → α B • γ], where (A → α B γ), (B → β) ∈ R.

The scanner and completer transitions do not conform, however, to the definition of PDAs in § 2. One solution is to interpret a scanner transition as an abbreviation for a set of transitions of the form s [A → α • a γ] ↦^a s [A → α a • γ] for all choices of s, and similarly, to interpret a completer transition as an abbreviation for a set of transitions of the form s [A → α • B γ] [B → β •] ↦^ε s [A → α B • γ] for all choices of s. It is not difficult to see that the only relevant choices for s are dotted rules of the form [D → δ₁ • A δ₂], or ⊥ if A → α B γ is S → σ.


With the resulting transitions we can apply the algorithms from § 3. In particular, if we specialise the refined deduction system from Figure 6 to the types of transitions derived above, we obtain the system in Figure 11.

(a) infer (⊥, 0);
(b) from ((s, i), (r, j)) infer (r, j);
(c) from (⊥, i₀) infer ((⊥, i₀), ([S → • σ], i₀));
(d) from ([A → α • B γ], i) infer (([A → α • B γ], i), ([B → • β], i)), where (B → β) ∈ R;
(e) from ((s, i), ([A → α • a γ], j)) infer ((s, i), ([A → α a • γ], j + 1)), where a = a_{j+1};
(f) from ((s, i), ([A → α • B γ], j)) and (([A → α • B γ], j), ([B → β •], k)) infer ((s, i), ([A → α B • γ], k)).

Figure 11. First attempt to define a deduction system for top-down recognition

This deduction system allows a number of simplifications. First, the only purpose of inference rule (a) is to provide the antecedent in inference rule (c), and the value of i₀ therein is always 0. Therefore these two inference rules can be combined into the axiom

((⊥, 0), ([S → • σ], 0)).

Similarly, the only purpose of inference rule (b) is to provide items of the form ([A → α • B γ], i) to inference rule (d). Both rules can be combined into: from ((s, j), ([A → α • B γ], i)) infer (([A → α • B γ], i), ([B → • β], i)), where (B → β) ∈ R.

The most important simplification follows from the observation that in an item ((s, j), (r, i)), the symbol s is never needed. In particular, in inference rule (f), the second antecedent (([A → α • B γ], j), ([B → β •], k)) implies the existence of a subderivation from symbols in β covering the input string from position j to position k. If there is another item ((s′, i′), ([A′ → α′ • B γ′], j)), then at some point the item (([A′ → α′ • B γ′], j), ([B → β •], k)) will also be derived. By omitting s from ((s, j), (r, i)) and writing the remaining three components as a triple, we obtain the deduction system in Figure 12, which is better known as Earley's algorithm.

Initialiser: infer (0, [S → • σ], 0);
Predictor: from (j, [A → α • B γ], i) infer (i, [B → • β], i), where (B → β) ∈ R;
Scanner: from (i, [A → α • a γ], j) infer (i, [A → α a • γ], j + 1), where a = a_{j+1};
Completer: from (i, [A → α • B γ], j) and (j, [B → β •], k) infer (i, [A → α B • γ], k).

Figure 12. Earley's algorithm

The construction above of the PDA that implements the top-down parsing strategy can be straightforwardly extended to weighted automata, assuming a WCFG as input to the construction. The weight of a predictor transition is the weight of the grammar rule that is predicted. The weight of the initialiser transition is the weight of S → σ. The weights of all other transitions are 1. Note that the semiring need not be commutative, in contrast to the LR parsing strategy.
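The deduction system of Figure 12 can be read off as a chart-based recogniser. The following is a minimal sketch (not the chapter's formal construction): an item (i, A → α • γ, j) is stored in chart cell j together with its start position i, the predictor and completer are iterated to a fixpoint in each cell, and the scanner fills the next cell.

```python
def earley(rules, start, w):
    """Return True iff the CFG (rules, start) generates the string w."""
    nonterminals = {lhs for lhs, _ in rules}
    # chart[j] holds items (i, lhs, rhs, dot): lhs -> rhs with a dot, started at i
    chart = [set() for _ in range(len(w) + 1)]
    for lhs, rhs in rules:                                   # initialiser
        if lhs == start:
            chart[0].add((0, lhs, rhs, 0))
    for j in range(len(w) + 1):
        changed = True
        while changed:                                       # predictor + completer
            changed = False
            for (i, lhs, rhs, dot) in list(chart[j]):
                if dot < len(rhs) and rhs[dot] in nonterminals:      # predictor
                    for l2, r2 in rules:
                        if l2 == rhs[dot] and (j, l2, r2, 0) not in chart[j]:
                            chart[j].add((j, l2, r2, 0)); changed = True
                if dot == len(rhs):                                   # completer
                    for (i2, l2, r2, d2) in list(chart[i]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (i2, l2, r2, d2 + 1)
                            if item not in chart[j]:
                                chart[j].add(item); changed = True
        if j < len(w):                                                # scanner
            for (i, lhs, rhs, dot) in chart[j]:
                if dot < len(rhs) and rhs[dot] == w[j]:
                    chart[j + 1].add((i, lhs, rhs, dot + 1))
    return any(i == 0 and lhs == start and dot == len(rhs)
               for (i, lhs, rhs, dot) in chart[len(w)])

# the grammar of Example 4.1
rules = [("S", ("A",)), ("A", ("A", "+", "A")), ("A", ("a",))]
print(earley(rules, "S", "a+a+a"))   # True
```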

6. Cocke–Younger–Kasami algorithm

The next parsing strategy we consider is (pure) bottom-up parsing. The most straightforward formulation is available if we assume the CFG is in Chomsky normal form, which means that each context-free rule is either of the form A → a, where a ∈ Σ, or of the form A → B C, where B, C ∈ N. The stack symbols are nonterminals, plus the extra symbol ⊥, which we need for technical reasons. The initial stack symbol is ⊥, and the final stack symbol is the start symbol S of the grammar. There are two types of transitions:

• s ↦^a s A, where (A → a) ∈ R, for any stack symbol s;
• s B C ↦^ε s A, where (A → B C) ∈ R, for any stack symbol s.

Note that if we omitted the symbol s in the above, then the transitions would not conform to the definition of PDAs in § 2. If we specialise the deduction system from Figure 5 to the types of transitions above, we obtain the system in Figure 13. Much as in the previous section, the symbol s in items ((s, j), (r, i)) is never needed. In particular, in inference rule (b), the second antecedent ((B, j), (C, k)) implies the existence of a subderivation from C covering the input string from position j to position k. If there is another item ((s′, i′), (B′, j)), then at some point the item ((B′, j), (C, k)) will also be derived.

(a) infer ((s, i), (A, i + 1)), where (A → a_{i+1}) ∈ R;
(b) from ((s, i), (B, j)) and ((B, j), (C, k)) infer ((s, i), (A, k)), where (A → B C) ∈ R.

Figure 13. First attempt to define a deduction system for bottom-up recognition

By omitting s from ((s, j), (r, i)) and writing the remaining three components as a triple, we obtain the deduction system in Figure 14. The items are commonly derived in such a way that each (i, A, j) is obtained before any (i′, A′, j′) with j < j′. Further, each (i, A, j) is obtained before any (i′, A′, j) with i′ < i. Typical implementations use three nested loops for inference rule (b), with the outer loop for increasing k, a loop therein for decreasing i, and a loop therein for increasing j. This is better known as the Cocke–Younger–Kasami algorithm. As in the case of LR parsing, the extension to weighted grammars and weighted automata requires a commutative semiring. This is because the order in which PDA transitions are applied in a computation differs from the order in which grammar rules are applied in a corresponding left-most derivation. With a commutative semiring, each of the two types of transitions adopts the weight of the underlying grammar rule.

(a) infer (i, A, i + 1), where (A → a_{i+1}) ∈ R;
(b) from (i, B, j) and (j, C, k) infer (i, A, k), where (A → B C) ∈ R.

Figure 14. Cocke–Younger–Kasami recognition
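Figure 14 translates directly into a recogniser with the three nested loops described above. The following is a minimal sketch (the Chomsky-normal-form version of Example 4.1's grammar given at the end is a hand-made assumption, not from the chapter).

```python
def cyk(terminal_rules, binary_rules, start, w):
    """terminal_rules: set of (A, a); binary_rules: set of (A, B, C)."""
    n = len(w)
    items = set()
    for i in range(n):                                  # inference rule (a)
        for (A, a) in terminal_rules:
            if a == w[i]:
                items.add((i, A, i + 1))
    for k in range(2, n + 1):                           # inference rule (b):
        for i in range(k - 2, -1, -1):                  #   increasing k, decreasing i,
            for j in range(i + 1, k):                   #   increasing j, as in the text
                for (A, B, C) in binary_rules:
                    if (i, B, j) in items and (j, C, k) in items:
                        items.add((i, A, k))
    return (0, start, n) in items

# A hand-converted Chomsky-normal-form version of Example 4.1's grammar,
# using fresh nonterminals P (for "+") and Y (for "+ A"):
terminal_rules = {("A", "a"), ("S", "a"), ("P", "+")}
binary_rules = {("A", "A", "Y"), ("S", "A", "Y"), ("Y", "P", "A")}
print(cyk(terminal_rules, binary_rules, "S", "a+a+a"))   # True
```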

7. Bibliographic notes

Application of semirings to parsing has its origins in [3], [47], [31], [19], [28], [27], and [14]. There is an intimate connection with formal power series [24]. General books on the use of probabilities in natural language processing are [9], [32], and [8]. A good overall textbook on NLP is [20]. Formal properties of context-free grammars are discussed, amongst others, in [18] and [45]. Probabilistic CFGs that are not proper can be made proper without changing the weights they assign to strings. This issue is addressed in [48], [1], [36], and [37]. Much of our treatment of probabilistic parsing can be extended to more powerful grammatical formalisms, see [39] and [41]. Application of semirings with a focus on finite-state transducers is discussed in [34].


Reduction of PDAs is very similar to reduction of CFGs. The latter can be carried out in linear time in the size of a CFG, as shown in [45]. This linear-time complexity result carries over to reduction of PDAs. Probabilistic PDAs were defined, for example, in [40]. Extending parsing strategies to be probabilistic was investigated in [38], where it was shown that top-down parsing and LR parsing are incomparable in their power to describe probability distributions on strings. Also see [1] for observations on the power of bottom-up parsing. We have shown how tabular parsing algorithms can be derived from parsing strategies, via nondeterministic PDAs. What is described in this chapter is based on ideas from [2], [25], and [7]. An alternative approach to obtain tabular parsing algorithms is memoisation of functions [29]. Deduction systems for the description of tabular parsing algorithms have been used in [43]; also see [44]. An early paper that addresses abstract specifications of parsing algorithms is [11]. For an analysis of the time complexity of deduction systems, see [33]. A compact representation of all computations of a PDA on a string input is related to a compact representation of all derivations of a CFG. The idea of such parse forests goes back to [6], which in fact showed the existence of the representation for input that is a finite automaton instead of a string. This generalisation was already briefly mentioned in § 3.4. For further discussion of structures related to parse forests, see [10], [42], [49], and [30]. Similar ideas for richer formalisms than CFGs are due to [51], [26], [5], and [52]. Solving a system of non-linear equations derived from a weighted CFG is addressed in [13]. Knuth’s algorithm was introduced in [23]. Also see [21] and [35]. Further reading on LR parsing can be found in [22] and [46]. The number of LR(0) states grows exponentially with the size of the grammar, as shown in Proposition 6.46 of [46]. The term “GLR parsing” is due to [49] and [50]. Earley’s algorithm and variants of it were introduced in [12], [4], [15], and [16]. Further treatment of bottom-up parsing can be found in [45]. For the Cocke– Younger–Kasami algorithm, also see [53] and [4]. The Chomsky normal form is discussed in [17], which points out that known transformations may increase the size of the grammar quadratically.

References

[1] S. Abney, D. McAllester, and F. Pereira, Relating probabilistic grammars and automata. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. Held in College Park, Maryland, June 20–26, 1999. Association for Computational Linguistics, 1998, 542–549. q.v. 1408, 1409
[2] A. Aho, J. Hopcroft, and J. Ullman, Time and tape complexity of pushdown automaton languages. Information and Control 13 (1968), 186–206. Zbl 0257.68065 q.v. 1409


[3] A. Aho and T. Peterson, A minimum distance error-correcting parser for context-free languages. SIAM J. Comput. 1 (1972), no. 4, 305–312. MR 0315937 Zbl 0241.68038 q.v. 1408 [4] A. Aho and J. Ullman, Parsing. The theory of parsing, translation, and compiling. Vol. I: Parsing. Prentice-Hall Series in Automatic Computation. Prentice-Hall, Englewood Cliffs, N.J., 1972. MR 0408321 q.v. 1409 [5] M. A. Alonso Pardo, M.-J. Nederhof, and E. Villemonte de la Clergerie, Tabulation of automata for tree-adjoining languages. Grammars 3 (2000), no. 2–3, 89–110. Sixth Meeting on Mathematics of Language (MOL6, Orlando, FL, 1999). MR 1811020 q.v. 1409 [6] Y. Bar-Hillel, M. Perles, and E. Shamir, On formal properties of simple phrase structure grammars. In Language and information (Y. Bar-Hillel, ed.). Selected essays on their theory and application. Chapter 9. Addison-Wesley, Reading, Massachusetts, 1964, 116–150. q.v. 1409 [7] S. Billot and B. Lang, The structure of shared forests in ambiguous parsing. In ACL ’89: Proceedings of the 27 th annual meeting on Association for Computational Linguistics. Vancouver, June 1989. Association for Computational Linguistics, 1989, 143–151. q.v. 1409 [8] R. Bod, J. Hay, and S. Jannedy (eds.), Probabilistic linguistics. Papers from the Symposium “Probability Theory in Linguistics” held in Washington, DC, January 2001. A Bradford Book. MIT Press, Cambridge, MA, 2003. MR 1994896 Zbl 1059.68142 q.v. 1408 [9] E. Charniak, Statistical language learning. MIT Press, Cambridge, MA, 1993. q.v. 1408 [10] J. Cocke and J. Schwartz, Programming languages and their compilers – preliminary notes. Second revised version. Courant Institute of Mathematical Sciences, New York University, April 1970, 184–206. q.v. 1409 [11] S. Cook, Path systems and language recognition. In STOC ’70: Proceedings of the second annual ACM symposium on Theory of computing (P. C. Fischer, R. Fabian, J. D. Ullman, and R. M. Karp, editors.) Association for Computing Machinery, New York, 1970, 70–72. q.v. 1409 [12] J. Earley, An efficient context-free parsing algorithm. Comm. ACM 13 (1970), no. 2, 94–102. Zbl 0185.43401 q.v. 1409 [13] K. Etessami and M. Yannakakis, Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. In STACS 2005 (V. Diekert and B. Durand, eds.). Proceedings of the 22nd Annual Symposium on Theoretical Aspects of Computer Science held in Stuttgart, February 24–26, 2005. Lecture Notes in Computer Science, 3404. Springer, Berlin, 2005, 340–352. MR 2151630 Zbl 1118.68497 q.v. 1409 [14] J. Goodman, Semiring parsing. Comput. Linguist. 25 (1999), no. 4, 573–605. MR 1746110 q.v. 1408 [15] S. Graham and M. Harrison, Parsing of general context free languages. Adv. Comput. 14 (1976), 77–185. q.v. 1409 [16] S. Graham, M. Harrison, and W. Ruzzo, An improved context-free recognizer. ACM Trans. Prog. Lang. Syst. 2 (1980), 415–462. Zbl 0461.68084 q.v. 1409 [17] M. Harrison, Introduction to formal language theory. Addison-Wesley Publishing Co., Reading, MA, 1978. MR 0526397 Zbl 0411.68058 q.v. 1409 [18] J. Hopcroft and J. Ullman, Introduction to automata theory, languages, and computation. Addison-Wesley Series in Computer Science. Addison-Wesley Publishing Co., Reading, MA, 1979. MR 0645539 Zbl 0426.68001 q.v. 1408


[19] F. Jelinek, J. Lafferty, and R. Mercer, Basic methods of probabilistic context free grammars. In Speech recognition and understanding (P. Laface and R. De Mori, eds.). Recent advances, trends and applications. Proceedings of the NATO Advanced Study Institute, held in Cetraro, Italy, July 1-13, 1990. NATO ASI Series. Series F. Computer and Systems Sciences, 75. Springer, Berlin, 1992, 345–360. q.v. 1408 [20] D. Jurafsky and J. Martin, Speech and language processing. Second ediction. Pearson, London, 2008. q.v. 1408 [21] D. Klein and C. Manning, Parsing and hypergraphs. In Proceedings of the Seventh International Workshop on Parsing Technologies. Held inn Beijing, October 2001. Tsinghua University Press, 123–134. q.v. 1409 [22] D. E. Knuth, On the translation of languages from left to right. Information and Control 8 (1965), 607–639. MR 0189942 Zbl 0231.68027 q.v. 1409 [23] D. E. Knuth, A generalization of Dijkstra’s algorithm. Information Processing Lett. 6 (1977), no. 1, 1–5. MR 0455525 Zbl 0363.68056 q.v. 1409 [24] W. Kuich, Semirings and formal power series: their relevance to formal languages and automata. In Handbook of formal languages (G. Rozenberg and A. Salomaa, eds.). Vol. 1. Word, language, grammar. Springer, Berlin, 1997, 609–677. MR 1470001 q.v. 1408 [25] B. Lang, Deterministic techniques for efficient non-deterministic parsers. In Automata, languages and programming (J. Loeckx, ed.). 2nd Colloquium, University of Saarbrücken, July 29–August 2, 1974. Springer, Berlin, 1974. 255–269. MR 0428824 Zbl 0299.68020 q.v. 1409 [26] B. Lang, Recognition can be harder than parsing. Comput. Intell. 10 (1994), no. 4, 486–494. q.v. 1409 [27] A. Lavie, An integrated heuristic scheme for partial parse evaluation. In 32 nd Annual Meeting of the ACL, Held in Las Cruces, New Mexico, June 1994. Association for Computational Linguistics, 1994, 316–318. q.v. 1408 [28] A. Lavie and M. Tomita, GLR – an efficient noise-skipping parsing algorithm for contextfree grammars. In Recent advances in parsing technology (H. Bunt and M. Tomita, eds.). Chapter 10. Text, Speech and Language Technology, 1. Kluwer Academic Publishers, Dordrecht, 1996, 183–200. q.v. 1408 [29] R. Leermakers, The functional treatment of parsing. The Kluwer International Series in Engineering and Computer Science. 242. Kluwer Academic Publishers, Dordrecht, 1993. Zbl 0785.68056 q.v. 1409 [30] H. Leiss, On Kilbury’s modification of Earley’s algorithm. ACM Trans. Prog. Lang. Syst. 12 (1990), 610–640. q.v. 1409 [31] G. Lyon, Syntax-directed least-errors analysis for context-free languages: a practical approach. Comm. ACM 17 (1974), no 1, 3–14. q.v. 1408 [32] C. Manning and H. Schütze, Foundations of statistical natural language processing. MIT Press, Cambridge, MA, 1999. MR 1722790 Zbl 0951.68158 q.v. 1408 [33] D. McAllester, On the complexity analysis of static analyses. J. ACM 49 (2002), no. 4, 512–537. MR 2146459 Zbl 1326.68102 q.v. 1409 [34] M. Mohri, Statistical natural language processing. In Applied combinatorics on words (M. Lothaire, ed.). Chapter 4. A collective work by J. Berstel, D. Perrin, M. Crochemore, E. Laporte, M. Mohri, N. Pisanti, M.-F. Sagot, G. Reinert, S. Schbath, M. Waterman, Ph. Jacquet, W. Szpankowski, D. Poulalhon, G. Schaeffer, R. Kolpakov, G. Koucherov,

J.-P. Allouche, and V. Berthé (eds.). With a preface by J. Berstel and D. Perrin. Encyclopedia of Mathematics and its Applications, 105. Cambridge University Press, Cambridge, 2005, 210–240. q.v. 1408
[35] M.-J. Nederhof, Weighted deductive parsing and Knuth's algorithm. Comput. Linguist. 29 (2003), no. 1, 135–143. MR 2112909 Zbl 1234.68426 q.v. 1409
[36] M.-J. Nederhof and G. Satta, Probabilistic parsing as intersection. In Proceedings of the Eighth International Conference on Parsing Technologies. LORIA, INRIA, Nancy, 2003. Association for Computational Linguistics, 2003, 137–148. q.v. 1408
[37] M.-J. Nederhof and G. Satta, Estimation of consistent probabilistic context-free grammars. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference. New York, June 2006. Association for Computational Linguistics, 2006, 343–350. q.v. 1408
[38] M.-J. Nederhof and G. Satta, Probabilistic parsing strategies. J. ACM 53 (2006), no. 3, 406–436. MR 2238951 Zbl 1326.68171 q.v. 1409
[39] P. Resnik, Probabilistic tree-adjoining grammar as a framework for statistical natural language processing. In COLING '92: Proceedings of the 14th conference on Computational linguistics. Volume 2. Nantes, August 1992. Association for Computational Linguistics, 1992, 418–424. q.v. 1408
[40] E. Santos, Probabilistic grammars and automata. Information and Control 21 (1972), 27–47. MR 0323477 Zbl 0243.94053 q.v. 1409
[41] Y. Schabes, Stochastic lexicalized tree-adjoining grammars. In Proceedings of the Fifteenth International Conference on Computational Linguistics. Volume 2. Nantes, August 1992. Association for Computational Linguistics, 1992, 426–432. q.v. 1408
[42] B. Sheil, Observations on context-free parsing. In Statistical Methods in Linguistics. Skriptor, Stockholm, 1976, 71–109. q.v. 1409
[43] S. Shieber, Y. Schabes, and F. Pereira, Principles and implementation of deductive parsing. J. Logic Programming 24 (1995), 3–36. q.v. 1409
[44] K. Sikkel, Parsing schemata. Texts in Theoretical Computer Science. Springer, Berlin, 1997. Zbl 0876.68071 q.v. 1409
[45] S. Sippu and E. Soisalon-Soininen, Parsing theory. Vol. I. Languages and parsing. EATCS Monographs on Theoretical Computer Science, 15. Springer, Berlin, 1988. MR 0960693 Zbl 0651.68007 q.v. 1408, 1409
[46] S. Sippu and E. Soisalon-Soininen, Parsing theory. Vol. II. LR(k) and LL(k) parsing. EATCS Monographs on Theoretical Computer Science, 20. Springer, Berlin, 1990. MR 1080358 Zbl 0703.68071 q.v. 1409
[47] R. Teitelbaum, Context-free error analysis by evaluation of algebraic power series. In Proceedings of the 5th Annual ACM Symposium on Theory of Computing (A. V. Aho, A. Borodin, R. L. Constable, R. W. Floyd, M. A. Harrison, R. M. Karp, and H. R. Strong, eds.). STOC 1973. Held in Austin, TX, 1973. Association for Computing Machinery, New York, 1973, 196–199. q.v. 1408
[48] R. Thompson, Determination of probabilistic grammars for functionally specified probability-measure languages. IEEE Trans. Comput. C-23 (1974), no. 6, 603–614. MR 0426521 Zbl 0287.68050 IEEEXplore 1672594 q.v. 1408
[49] M. Tomita, Efficient parsing for natural language. Kluwer Academic Publishers, Dordrecht, 1986. q.v. 1409
[50] M. Tomita, An efficient augmented-context-free parsing algorithm. Comput. Linguist. 13 (1987), 31–46. q.v. 1409


[51] K. Vijay-Shanker and D. Weir, The use of shared forests in tree adjoining grammar parsing. In Sixth Conference of the European Chapter of the Association for Computational Linguistics. Held in Utrecht, The Netherlands, 1993. Association for Computational Linguistics, 384–393. q.v. 1409 [52] E. Villemonte de la Clergerie and F. Barthélemy, Information flow in tabular interpretations for generalized push-down automata. Theoret. Comput. Sci. 199 (1998), no. 1–2, 167–198. MR 1622958 Zbl 0983.68103 q.v. 1409 [53] D. Younger, Recognition and parsing of context-free languages in time n3 . Information and Control 10 (1967), 189–208. Zbl 0149.24803 q.v. 1409

Chapter 38

Verification

Javier Esparza, Orna Kupferman, and Moshe Y. Vardi

Contents

1. Introduction . . . 1415
2. Linear-time logics . . . 1417
3. Applications . . . 1422
4. Branching-time logics . . . 1429
5. Applications . . . 1439
References . . . 1450

1. Introduction

This chapter describes the automata-theoretic approach to the satisfiability and model-checking problems for temporal logics. In a nutshell, the approach reduces these problems to standard decision problems about automata, like nonemptiness, language containment, or membership (whether a given object is accepted by a given automaton). These problems are solved using results of automata theory, which leads to algorithms for satisfiability and model-checking.

Temporal logics are modal logics for the description of the temporal ordering of events. They have become one of the most popular and flexible formalisms for specifying the behaviour of reactive systems, see [53] and [46]. In the early 1980s, algorithmic methods were proposed for automatically verifying temporal logic properties of finite-state programs, see [56], [44], [12], and [79]. (A state of a program is a complete description of its status, including the assignment of values to variables, the value of the program counter, and the like. Finite-state programs have finitely many possible states. Many hardware designs, synchronisation and communication protocols, abstract versions of device drivers and many other systems can be modeled as finite-state programs.) The behaviour of a finite-state program can be formalised as a finite propositional Kripke structure, and its desired behaviour as a formula of propositional temporal logic. In order to verify the correctness of the program, one checks that its associated Kripke structure is a model of (satisfies) the formula. In other words, the problem of verifying whether a given finite-state program behaves as desired is reduced to the model-checking problem for the temporal logic. Extensive introductions to model checking can be found in [13] and [1].


Temporal logics can describe time as linear or branching. In linear-time logics, each moment in time has a unique possible future, while in branching-time logics, each moment in time may split into several possible futures. (For an extensive discussion of various temporal logics, see [20].) For both types, a close and fruitful connection with the theory of automata on infinite structures has been developed. The central idea is to associate with each temporal logic formula a finite automaton on infinite structures recognising the models of the formula. For linear temporal logic the structures are infinite words (see [65], [45], [68], and [81]), while for branching temporal logic the structures are infinite trees, see [25], [71], [19], [80], and [22]. Once this has been achieved, the satisfiability problem for a logic reduces to the nonemptiness problem for its corresponding class of automata. The model-checking problem reduces to the language-containment problem or the membership problem, depending on the logic.

In the 1980s, and the first half of the 1990s, the literature produced direct translations from temporal logic formulas to nondeterministic automata (cf. [80], [81], and [29]). However, for branching-time logics these translations did not lead to asymptotically optimal algorithms: in particular, algorithms for branching-time logics derived from these translations were exponential, while other approaches only required linear time. Work carried out since the second half of the 1990s has solved this problem by splitting the translation into two steps: a first translation of temporal formulas into alternating automata, followed by a translation of alternating into nondeterministic automata, see [76] and [42]. The existential and universal states of alternating automata match the disjunctive and conjunctive operators of the logic, which makes the first translation simple and succinct: the size of the alternating automaton is linear in the size of the formula, see [49], [42], [23], and [76]. The two steps also decouple the logical and the combinatorial part of the problem: the translations from formulas to automata handle the logical part, while the combinatorics are handled by automata constructions.

The chapter is divided into two parts, corresponding to linear-time and branching-time logics. In the first part we present a translation of the logic LTL [54] into alternating Büchi word automata. The second part contains translations of the logics CTL, CTL*, and the propositional μ-calculus into different classes of symmetric alternating tree automata: weak automata for CTL, hesitant automata for CTL*, and parity automata for the μ-calculus.

Historical note. The connection between logic and automata goes back to work in the early 1960s (see [8], [18], and [75]) on monadic second-order logic and automata over finite words. This was extended in [9] to infinite words, in [16] and [73] to finite trees, and in [57] to infinite trees. As temporal logics can be expressed in first-order or monadic second-order logic (see [35] and [31]), the connection between monadic second-order logic and automata yields a connection between temporal logics and automata. Developing decision procedures that go via monadic second-order logic was a standard approach in the 1970s; see [27]. A direct translation to automata was proposed first in [70] in the context of propositional dynamic logic.
A direct translation from temporal logic to automata was first given in [83] (also see [81] for linear time and [78] for branching time). The translation to alternating automata was first proposed in [49] and pursued further in [76], [77], and [42].


2. Linear-time logics

2.1. Linear temporal logic. The logic LTL is a linear temporal logic [54]. Formulas of LTL are constructed from a set AP of atomic propositions using the usual Boolean operators and the temporal operators X ("next time") and U ("until"). Formally, an LTL formula over AP is defined as follows:

• true, false, or p, for p ∈ AP;
• ¬ψ₁, ψ₁ ∧ ψ₂, X ψ₁, or ψ₁ U ψ₂, where ψ₁ and ψ₂ are LTL formulas.

The logic LTL is used for specifying properties of reactive systems. The systems are modeled by Kripke structures, and the semantics of LTL is defined with respect to infinite computations, modeled by infinite paths in Kripke structures. Formally, a Kripke structure is K = ⟨AP, W, R, W₀, ℓ⟩, where AP is the set of atomic propositions, W is a set of states, R ⊆ W × W is a total transition relation (that is, for every state w ∈ W there is at least one state w′ ∈ W such that R(w, w′)), W₀ ⊆ W is a set of initial states, and ℓ: W → 2^AP maps each state to the set of atomic propositions that hold in this state. A path of the Kripke structure K is an infinite sequence w₀, w₁, ... of states such that w₀ ∈ W₀ and R(w_i, w_{i+1}) for all i ≥ 0. A computation over AP is an infinite word over the alphabet 2^AP, namely a sequence of truth assignments to the atomic propositions in AP. Every path w₀, w₁, ... of K induces the computation ℓ(w₀), ℓ(w₁), ... of K. Consider a computation π = σ₀, σ₁, σ₂, ..., where for every j ≥ 0, the set σ_j ⊆ AP is the set of atomic propositions that hold in the j-th position of π. We denote the suffix σ_j, σ_{j+1}, ... of π by π^j. We write π ⊨ ψ to denote that the computation π satisfies the LTL formula ψ. The relation ⊨ is inductively defined as follows:

• for all π, we have π ⊨ true and π ⊭ false;
• for an atomic proposition p ∈ AP, we have π ⊨ p if and only if p ∈ σ₀;
• π ⊨ ¬ψ₁ if and only if π ⊭ ψ₁;
• π ⊨ ψ₁ ∧ ψ₂ if and only if π ⊨ ψ₁ and π ⊨ ψ₂;
• π ⊨ X ψ₁ if and only if π¹ ⊨ ψ₁;
• π ⊨ ψ₁ U ψ₂ if and only if there exists k ≥ 0 such that π^k ⊨ ψ₂ and π^i ⊨ ψ₁ for all 0 ≤ i < k.

Each LTL formula ψ over AP defines a language L_ψ ⊆ (2^AP)^ω of the computations that satisfy ψ. Formally,

L_ψ = {π ∈ (2^AP)^ω | π ⊨ ψ}.

We use the following abbreviations in writing formulas:
• ∨, →, and ↔, interpreted in the usual way;
• ψ₁ R ψ₂ = ¬((¬ψ₁) U (¬ψ₂)). That is, ψ₁ R ψ₂ is such that the operator R ("release") dualizes the operator U;
• F ψ = true U ψ ("eventually", where "F" stands for "future");
• G ψ = ¬F ¬ψ ("always", where "G" stands for "globally"). Equivalently, G ψ = false R ψ.
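The satisfaction relation above can be evaluated effectively on computations that are ultimately periodic, i.e., of the form u·v^ω. The following is a minimal sketch (not from the chapter; the lasso representation and the fixpoint treatment of U are assumptions made for the example) that labels the finitely many distinct suffixes of u·v^ω bottom-up.

```python
# Formulas are nested tuples built with the chapter's primitive operators:
# ("p", name), ("not", f), ("and", f, g), ("X", f), ("U", f, g); True and False
# stand for the constants true and false. Letters are Python sets of propositions.

def holds(phi, u, v):
    """Return True iff the computation u . v^omega satisfies phi (v nonempty)."""
    n = len(u) + len(v)                               # distinct suffix positions
    letter = lambda j: u[j] if j < len(u) else v[j - len(u)]
    nxt = lambda j: j + 1 if j + 1 < n else len(u)    # successor position, wrapping into the loop

    def label(f):
        """Compute sat[0..n-1]: does the suffix starting at position j satisfy f?"""
        if f is True:  return [True] * n
        if f is False: return [False] * n
        op = f[0]
        if op == "p":   return [f[1] in letter(j) for j in range(n)]
        if op == "not": return [not x for x in label(f[1])]
        if op == "and":
            s1, s2 = label(f[1]), label(f[2])
            return [a and b for a, b in zip(s1, s2)]
        if op == "X":
            s = label(f[1])
            return [s[nxt(j)] for j in range(n)]
        if op == "U":
            s1, s2 = label(f[1]), label(f[2])
            sat = [False] * n
            for _ in range(n):   # least fixpoint of  f1 U f2 = f2 or (f1 and X(f1 U f2))
                sat = [s2[j] or (s1[j] and sat[nxt(j)]) for j in range(n)]
            return sat
        raise ValueError(op)

    return label(phi)[0]

# Example: the mutual exclusion property G not(cs0 and cs1) of Example 2.1 below,
# written with the primitive operators as  not(true U (cs0 and cs1)).
me = ("not", ("U", True, ("and", ("p", "cs0"), ("p", "cs1"))))
print(holds(me, [set()], [{"try0"}, {"cs0"}, set()]))   # True: cs0 and cs1 never overlap
```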


Example 2.1. We use LTL to formalise some desirable properties of a mutual exclusion algorithm for two processes, Process 0 and Process 1. Let AP contain the atomic propositions (with i ∈ {0, 1}) cs_i (Process i is in its critical section) and try_i (Process i tries to enter its critical section).

• The mutual exclusion property states that Process 0 and Process 1 are never simultaneously in their critical sections. We can express it using the LTL formula

ψ_me = G((¬cs₀) ∨ (¬cs₁)).

Note that we could have used several equivalent formulas, like G(cs₀ → ¬cs₁) or ¬F(cs₀ ∧ cs₁). The corresponding language L_me contains all computations having no occurrences of letters (elements of 2^AP) containing both cs₀ and cs₁. Thus,

L_me = {π ∈ (2^AP)^ω | for all j ≥ 0, we have cs₀ ∉ σ_j or cs₁ ∉ σ_j}.

• The finite waiting property for Process i states that if Process i tries to access its critical section, it eventually will. In LTL, we have

ψ^i_fw = G(try_i → F cs_i).

The corresponding language L^i_fw contains all computations in which every occurrence of a letter containing try_i is followed later by an occurrence of some letter containing cs_i:

L^i_fw = {π ∈ (2^AP)^ω | for all j ≥ 0, if try_i ∈ σ_j, then there is k ≥ j such that cs_i ∈ σ_k}.

• The access only after trying property for Process i states that Process i enters its critical section only after it has tried to enter it. In LTL,

ψ^i_at = ((¬cs_i) U try_i) ∨ G ¬cs_i.

The corresponding language L^i_at contains all computations in which every occurrence of a letter containing cs_i is preceded by an occurrence of some letter containing try_i:

L^i_at = {π ∈ (2^AP)^ω | for all j ≥ 0, if cs_i ∈ σ_j, then there is k ≤ j such that try_i ∈ σ_k}.

2.2. Alternating Büchi word automata. For a finite alphabet Σ, a word w = σ₀ σ₁ ⋯ is a (finite or infinite) sequence of letters from Σ. A property of a system with a set AP of atomic propositions can be viewed as a language over the alphabet 2^AP. We have seen in § 2.1 that LTL can be used to formalise properties. Another way to define properties is to use automata.

A nondeterministic finite automaton is a tuple A = ⟨Σ, Q, Q₀, δ, α⟩, where Σ is a finite nonempty alphabet, Q is a finite nonempty set of states, Q₀ ⊆ Q is a nonempty set of initial states, δ: Q × Σ → 2^Q is a transition function, and α is an acceptance condition.


Intuitively, when an automaton A runs on an input word over Σ, it starts in one of the initial states, and it proceeds along the word according to the transition function. Thus, δ(q, σ) is the set of states that A can enter when it is in state q and it reads the letter σ. Note that the automaton may be nondeterministic, since it may have many initial states and the transition function may specify many possible transitions for each state and letter. The automaton A is deterministic if |Q₀| = 1 and |δ(q, σ)| ≤ 1 for all states q ∈ Q and symbols σ ∈ Σ.

Formally, a run r of A on a finite word w = σ₁ ⋯ σ_n ∈ Σ* is a sequence q₀, q₁, ..., q_n of n + 1 states in Q such that q₀ ∈ Q₀, and q_{i+1} ∈ δ(q_i, σ_{i+1}) for all 0 ≤ i < n. Note that a nondeterministic automaton can have many runs on a given input word; in contrast, a deterministic automaton can have at most one. If the input word is infinite, then a run of A on it is an infinite sequence of states. The acceptance condition α determines which runs are accepting. For automata on finite words, α ⊆ Q and a run r is accepting if q_n ∈ α. Otherwise, r is rejecting. For automata on infinite words, one can consider several acceptance conditions. In the Büchi acceptance condition, α ⊆ Q, and a run r is accepting if it visits some state in α infinitely often. Formally, let inf(r) = {q : q_i = q for infinitely many i}. Then, r is accepting if and only if inf(r) ∩ α ≠ ∅. A nondeterministic automaton A accepts a word w if there is an accepting run of A on w. A universal automaton has the same components as a nondeterministic one, but it accepts a word w if all its runs on w are accepting.

We now turn to define alternating automata. We first need some notation. For a given set X, let B⁺(X) be the set of positive Boolean formulas over X (i.e., Boolean formulas built from elements in X using ∧ and ∨), where we also allow the formulas true and false. For Y ⊆ X, we say that Y satisfies a formula θ ∈ B⁺(X) if and only if the truth assignment that assigns true to the members of Y and assigns false to the members of X \ Y satisfies θ. For example, the sets {q₁, q₃} and {q₂, q₃} both satisfy the formula (q₁ ∨ q₂) ∧ q₃, while the set {q₁, q₂} does not satisfy this formula.

Consider an automaton A = ⟨Σ, Q, Q₀, δ, α⟩. We can represent δ using B⁺(Q). For example, a transition δ(q, σ) = {q₁, q₂, q₃} of a nondeterministic automaton A can be written as δ(q, σ) = q₁ ∨ q₂ ∨ q₃. If A is universal, the transition can be written as δ(q, σ) = q₁ ∧ q₂ ∧ q₃. While transitions of nondeterministic and universal automata correspond to disjunctions and conjunctions, respectively, transitions of alternating automata can be arbitrary formulas in B⁺(Q). We can have, for instance, a transition δ(q, σ) = (q₁ ∧ q₂) ∨ (q₃ ∧ q₄), meaning that the automaton accepts a suffix w^i of w from state q if it accepts w^{i+1} from both q₁ and q₂ or from both q₃ and q₄. Such a transition combines existential and universal choices.

Formally, an alternating automaton on infinite words is a tuple A = ⟨Σ, Q, q_in, δ, α⟩, where Σ, Q, and α are as in nondeterministic automata, q_in ∈ Q is an initial state (we will later explain why it is technically easier to assume a single initial state), and δ: Q × Σ → B⁺(Q) is a transition function. In order to define runs of alternating automata, we first have to define trees and labelled trees. Given a set Υ of directions, an Υ-tree is a prefix-closed set T ⊆ Υ*. Thus, if x · c ∈ T where x ∈ Υ* and c ∈ Υ,


then x ∈ T also. The elements of T are called nodes, and the empty word ε is the root of T. For every x ∈ T and c ∈ Υ, the node x · c is a successor of x. The number of successors of x is called the degree of x and is denoted by d(x). A node is a leaf if it has no successors. We sometimes refer to the length |x| of x as its level in the tree. A path of an Υ-tree T is a set π ⊆ T such that ε ∈ π and for every x ∈ π, either x is a leaf or there exists a unique c ∈ Υ such that x · c ∈ π. Given an alphabet Σ, a Σ-labelled Υ-tree is a pair ⟨T, V⟩ where T is an Υ-tree and V: T → Σ maps each node of T to a letter of Σ.

While a run of a nondeterministic automaton on an infinite word is an infinite sequence of states, a run of an alternating automaton is a Q-labelled ℕ-tree. Formally, given an infinite word w = σ₀ σ₁ ⋯, a run of A on w is a Q-labelled ℕ-tree ⟨T_r, r⟩ such that the following two conditions hold:
• ε ∈ T_r and r(ε) = q_in;
• if x ∈ T_r, r(x) = q, and δ(q, σ_{|x|}) = θ, then there is a (possibly empty) set S = {q₁, ..., q_k} such that S satisfies θ and for all 1 ≤ c ≤ k, we have x · c ∈ T_r and r(x · c) = q_c.
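The condition "S satisfies θ" used in this definition can be checked by a direct recursion over the positive Boolean formula. A minimal sketch (with a hypothetical encoding of formulas as nested tuples, not the chapter's notation):

```python
def satisfies(Y, theta):
    """Does the assignment 'true exactly on the states in Y' satisfy theta in B+(X)?"""
    if theta is True or theta is False:
        return theta
    if isinstance(theta, tuple):            # ("and", f, g) or ("or", f, g)
        op, f, g = theta
        if op == "and":
            return satisfies(Y, f) and satisfies(Y, g)
        if op == "or":
            return satisfies(Y, f) or satisfies(Y, g)
        raise ValueError(op)
    return theta in Y                        # theta is a single state of X

# the example discussed above: (q1 or q2) and q3
theta = ("and", ("or", "q1", "q2"), "q3")
print(satisfies({"q1", "q3"}, theta))   # True
print(satisfies({"q1", "q2"}, theta))   # False
```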

For example, if δ(q_in, σ₀) = (q₁ ∨ q₂) ∧ (q₃ ∨ q₄), then every run of A on w has a root labelled q_in, a node in level 1 labelled q₁ or q₂, and another node in level 1 labelled q₃ or q₄. Note that if θ = true, then x need not have children. This is the reason why T_r may have leaves. Also, since no set S satisfies θ = false, no run ever takes a transition with θ = false.

A run ⟨T_r, r⟩ is accepting if and only if all its infinite paths, labelled by words in Q^ω, satisfy the acceptance condition. A word w is accepted if and only if there exists an accepting run on it. Note that while conjunctions in the transition function of A are reflected in branches of ⟨T_r, r⟩, disjunctions are reflected in the fact we can have many runs on the same word. The language of A, denoted L(A), is the set of infinite words that A accepts.

We define the size |A| of an automaton A = ⟨Σ, Q, δ, q₀, α⟩ as |Q| + |δ| + |α|, where |Q| and |α| are the respective cardinalities of the sets Q and α, and where |δ| is the sum of the lengths of formulas that appear as δ(q, σ) for some q ∈ Q and σ ∈ Σ.

Example 2.2. We describe an alternating Büchi automaton A_n over the alphabet Σ_n = {1, 2, ..., n} such that A_n accepts exactly all words containing the subword i³ for all letters i ∈ Σ_n. Let A_n = ⟨Σ_n, Q_n, q_in, δ, ∅⟩ be defined as follows.
• Q_n = {q_in} ∪ (Σ_n × {3, 2, 1}). Thus, in addition to an initial state, the automaton A_n contains three states for each letter. Intuitively, the automaton is going to spawn into n different copies, with copy i waiting for the subword i³ using the states ⟨i, 3⟩, ⟨i, 2⟩, and ⟨i, 1⟩.
• In its first transition, A_n spawns into n copies, taking the first letter into account. Thus, for all i ∈ Σ_n, we have δ(q_in, i) = ⟨i, 2⟩ ∧ ⋀_{j ≠ i} ⟨j, 3⟩.


In addition, for all i ∈ Σ_n and c ∈ {3, 2, 1}, the state ⟨i, c⟩ waits for c further consecutive occurrences of i: reading the letter i moves the copy from ⟨i, c⟩ to ⟨i, c − 1⟩ (and to true when c = 1), while reading any other letter resets the copy to ⟨i, 3⟩.
• If φ = A θ, we construct and dualise the HAA of E ¬θ.


We prove the correctness of the construction by induction on the structure of φ. The proof is immediate for the case φ is of the form p, ¬p, φ₁ ∧ φ₂, φ₁ ∨ φ₂, or A θ. We consider here the case where φ = E θ. If a tree ⟨T_K, V_K⟩ satisfies φ, then there exists a path π in it such that π ⊨ θ. Thus, there exists an accepting run r of U on a word that agrees with π on the formulas in max(φ). It is easy to see that a run of A_φ that proceeds on π according to r can accept ⟨T_K, V_K⟩. Indeed, by the definition of A′_φ, the copies that proceed according to δ′ satisfy the acceptance condition. In addition, by the adjustment of A′_φ to the alphabet 2^AP and by the induction hypothesis, copies that take care of the maximal formulas can fulfill the acceptance condition. Now, if a run r of A_φ accepts a tree ⟨T_K, V_K⟩, then there must be a path π in this tree such that A_φ proceeds according to an accepting run of U on a word that agrees with π on the formulas in max(φ). Thus, π ⊨ θ and ⟨T_K, V_K⟩ satisfies φ.

We now consider the size of A_ψ. For every φ, we prove, by induction on the structure of φ, that the size of A_φ is exponential in |φ|.

• Clearly, for φ = p or φ = ¬p for some p ∈ AP, the size of A_φ is constant.
• For φ = φ₁ ∧ φ₂ or φ = φ₁ ∨ φ₂, we have |A_φ| = O(|A_{φ₁}| + |A_{φ₂}|). By the induction hypothesis, |A_{φ₁}| is exponential in |φ₁| and |A_{φ₂}| is exponential in |φ₂|. Thus, |A_φ| is surely exponential in |φ|.
• For φ = E θ, we know, by Theorem 3.2, that the number of states of the word automaton U is exponential in |θ|. Therefore, A′_φ is exponential in |φ|. Also, |Σ′| is exponential in |max(φ)| and, by the induction hypothesis, for all φ_i ∈ max(φ), the size of A_{φ_i} is exponential in |φ_i|. Therefore, A_φ is also exponential in |φ|.
• For φ = A θ, we know, by the above, that |A_{E¬θ}| is exponential in |φ|. Since complementing an HAA does not change its size, the result for φ follows.

Finally, since each subformula of ψ induces exactly one set, the depth of A_ψ is linear in |ψ|.

Example 5.4. Consider the CTL* formula ψ = A G F(p ∨ AX p). We describe the construction of A_ψ step by step. Since ψ is of the form A θ, we need to construct and dualise the HAA of E F G((¬p) ∧ EX ¬p). We start with the HAAs A_φ and Ã_φ for φ = (¬p) ∧ EX ¬p.

• A_φ = ⟨{{p}, ∅}, {q₂, q₃}, δ, q₂, ⟨∅, ∅⟩⟩, with

state q    δ(q, {p})    δ(q, ∅)
q₂         false        (◇, q₃)
q₃         false        true

• Ã_φ = ⟨{{p}, ∅}, {q̃₂, q̃₃}, δ̃, q̃₂, ⟨∅, ∅⟩⟩, with

state q    δ̃(q, {p})    δ̃(q, ∅)
q̃₂         true         (□, q̃₃)
q̃₃         true         false


Starting with a Büchi word automaton U for θ = F G φ, we construct A′_{Eθ}. We have U = ⟨{{φ}, ∅}, {q₀, q₁}, η, q₀, {q₁}⟩, with

η(q₀, {φ}) = {q₀, q₁},    η(q₀, ∅) = {q₀},
η(q₁, {φ}) = {q₁},        η(q₁, ∅) = ∅.

Hence, A′_{Eθ} = ⟨{{φ}, ∅}, {q₀, q₁}, δ′, q₀, ⟨{q₁}, ∅⟩⟩, with

state q    δ′(q, {φ})            δ′(q, ∅)
q₀         (◇, q₀) ∨ (◇, q₁)     (◇, q₀)
q₁         (◇, q₁)               false

We are now ready to compose the automata into an automaton A_{Eθ} over the alphabet {{p}, ∅}. We set A_{Eθ} = ⟨{{p}, ∅}, {q₀, q₁, q₃, q̃₃}, δ, q₀, ⟨{q₁}, ∅⟩⟩ with δ defined as follows. (Note that we simplify the transitions, replacing true ∧ θ or false ∨ θ by θ, replacing true ∨ θ by true, and replacing false ∧ θ by false.)

state q    δ(q, {p})    δ(q, ∅)
q₀         (◇, q₀)      [((◇, q₀) ∨ (◇, q₁)) ∧ (◇, q₃)] ∨ [(◇, q₀) ∧ (□, q̃₃)]
q₁         false        (◇, q₁) ∧ (◇, q₃)
q₃         false        true
q̃₃         true         false

Consider δ(q₀, ∅). The first disjunct corresponds to the case where A′_{Eθ} guesses that φ holds in the present. Then, A_{Eθ} proceeds with δ′(q₀, {φ}) = (◇, q₀) ∨ (◇, q₁) conjuncted with δ(q₂, ∅) = (◇, q₃). The latter guarantees that φ indeed holds in the present. The second disjunct corresponds to the case where φ does not hold in the present. Then, A_{Eθ} proceeds with δ′(q₀, ∅) = (◇, q₀) conjuncted with δ̃(q̃₂, ∅) = (□, q̃₃).

We obtain A_ψ by dualising A_{Eθ}. Hence,

A_ψ = ⟨{{p}, ∅}, {q̃₀, q̃₁, q̃₃, q₃}, δ̃, q̃₀, ⟨∅, {q̃₁}⟩⟩,

with

state q    δ̃(q, {p})    δ̃(q, ∅)
q̃₀         (□, q̃₀)      (((□, q̃₀) ∧ (□, q̃₁)) ∨ (□, q̃₃)) ∧ ((□, q̃₀) ∨ (◇, q₃))
q̃₁         true         (□, q̃₁) ∨ (□, q̃₃)
q̃₃         true         false
q₃         false        true


Consider the state q̃₁. A copy of A_ψ that visits q̃₁ keeps creating new copies of A_ψ, all visiting q̃₁, unless it reaches a node that satisfies p or AX p. Since q̃₁ ∈ B, all the copies should eventually reach such a node. So, by sending a copy that visits the state q̃₁ to a node x, the HAA A_ψ guarantees that all the paths in the subtree with x as root eventually reach a node satisfying p ∨ AX p. Hence, in the state q̃₀, unless A_ψ gets convinced that p ∨ AX p holds in the present, it sends copies that visit q̃₁ to all the successors. In addition, it always sends copies visiting q̃₀ to all the successors.

5.2. Automata-based satisfiability and model-checking procedures for branching temporal logics. In § 3.2 we have seen how the satisfiability and model-checking problems for LTL can be reduced to the nonemptiness and the language-disjointness problems for nondeterministic Büchi word automata. In this section we reduce the satisfiability and model-checking problems for branching-time temporal logics to the nonemptiness and 1-letter nonemptiness problems for alternating tree automata. The type of the automaton depends on the logic considered. We start with the satisfiability problem.

Theorem 5.7. The satisfiability problem is
• EXPTIME-complete for CTL, alternation-free μ-calculus, and μ-calculus;
• 2EXPTIME-complete for CTL*.

Proof. By [25], all the logics considered in the theorem satisfy the linear sufficient branching degree property. More specifically, if ψ is satisfiable, then there is a Kripke structure of branching degree |ψ| + 1 that satisfies ψ. The upper bounds then follow from the translations described in § 5.1 and the complexities of the nonemptiness problem specified in Theorem 4.1, applied to the corresponding symmetric alternating automata and branching degree. The lower bounds are proven in [26] and [61].

We continue with the model-checking problem. Given the alternating tree automaton A_ψ and a Kripke structure K, we define their product A_{K,ψ} to be a 1-letter alternating word automaton (see below). Thus, the product with K takes us from a tree automaton to a word automaton, and from an automaton over an alphabet 2^AP to a 1-letter automaton. Obviously, the nonemptiness problem for tree automata cannot, in general, be reduced to the nonemptiness problem of word automata. Also, as discussed above, the nonemptiness problem for alternating word automata cannot, in general, be reduced to the 1-letter nonemptiness problem. It is taking the product with K that makes both reductions valid here. Since each state in A_{K,ψ} is associated with a state w of K, each state has the exact information as to which subtree of ⟨T_K, V_K⟩ it is responsible for (i.e., which subtree it would have run over if A_{K,ψ} had not been a 1-letter word automaton). The branching structure of ⟨T_K, V_K⟩ and its 2^AP-labelling are thus embodied in the states of A_{K,ψ}. In particular, it is guaranteed that all the copies of the product automaton that start in a certain state, say one associated with w, follow the same labelling: the one that corresponds to computations of K that start in w.

Let A_ψ = ⟨2^AP, Q_ψ, δ_ψ, q⁰_ψ, α_ψ⟩ be a symmetric alternating tree automaton for L(ψ) and let K = ⟨AP, W, R, w⁰, ℓ⟩ be a Kripke structure. The product automaton


of A_ψ and K is an alternating word automaton A_{K,ψ} = ⟨{a}, W × Q_ψ, δ, ⟨w⁰, q⁰_ψ⟩, α⟩, where δ and α are defined as follows.

• Let q ∈ Q_ψ, w ∈ W, and δ_ψ(q, ℓ(w)) = θ. Then δ(⟨w, q⟩, a) = θ′, where θ′ is obtained from θ by replacing
  – each atom (ε, q′) in θ by the atom ⟨w, q′⟩,
  – each atom (□, q′) in θ by the conjunction ⋀_{w′: R(w,w′)} ⟨w′, q′⟩, and
  – each atom (◇, q′) in θ by the disjunction ⋁_{w′: R(w,w′)} ⟨w′, q′⟩.
• The acceptance condition α is defined according to the acceptance condition α_ψ of A_ψ: each set F in α_ψ is replaced by the set W × F.

It is easy to see that A_{K,ψ} is of the same type as A_ψ. In particular, if A_ψ is a WAA (with a partition {Q₁, Q₂, ..., Q_n}), then so is A_{K,ψ} (with a partition {W × Q₁, W × Q₂, ..., W × Q_n}).

Proposition 5.8 ([42]). Consider a Kripke structure K and a branching temporal logic formula ψ. Then,
• |A_{K,ψ}| = O(|K| · |A_ψ|);
• L(A_{K,ψ}) is nonempty if and only if K ⊨ ψ.

Proof. The claim about the size of A_{K,ψ} follows easily from the definition of A_{K,ψ}. Indeed, |W × Q_ψ| = |W| · |Q_ψ|, |δ| = |W| · |δ_ψ|, and |α| = |W| · |α_ψ|. To prove the correctness of the reduction, we show that L(A_{K,ψ}) is nonempty if and only if A_ψ accepts ⟨T_K, V_K⟩. Since A_ψ accepts L(ψ), the latter holds if and only if K ⊨ ψ. Given an accepting run of A_ψ over ⟨T_K, V_K⟩, we construct an accepting run of A_{K,ψ}. Also, given an accepting run of A_{K,ψ}, we construct an accepting run of A_ψ over ⟨T_K, V_K⟩.

First, assume that A_ψ accepts ⟨T_K, V_K⟩. Thus, there exists an accepting run ⟨T_r, r⟩ of A_ψ over ⟨T_K, V_K⟩. Recall that T_r is labelled with ℕ* × Q_ψ. A node y ∈ T_r with r(y) = (x, q) corresponds to a copy of A_ψ that is in the state q and reads the tree obtained by unwinding K from V_K(x). Consider the tree ⟨T_r, r′⟩ where T_r is labelled with 0* × W × Q_ψ and for every y ∈ T_r with r(y) = (x, q), we have r′(y) = (0^{|x|}, V_K(x), q). We show that ⟨T_r, r′⟩ is an accepting run of A_{K,ψ}. In fact, since α = W × α_ψ, we only need to show that ⟨T_r, r′⟩ is a run of A_{K,ψ}; acceptance follows from the fact that ⟨T_r, r⟩ is accepting. Intuitively, ⟨T_r, r′⟩ is a "legal" run, since the W-component in r′ always agrees with V_K. This agreement is the only additional requirement of δ with respect to δ_ψ. Consider a node y ∈ T_r with r(y) = (x, q), V_K(x) = w. Let δ_ψ(q, w) = θ. Since ⟨T_r, r⟩ is a run of A_ψ, there exists a set {(c₀, q₀), (c₁, q₁), ..., (c_n, q_n)} satisfying θ, such that the successors of y in T_r are y · i, for 1 ≤ i ≤ n, each labelled with (x · c_i, q_i). In ⟨T_r, r′⟩, by its definition, r′(y) = (0^{|x|}, w, q) and the successors of y are y · i, each labelled with (0^{|x|+1}, w_{c_i}, q_i), where {w₀, ..., w_n} is the set of successors of w in K. Let δ(⟨w, q⟩, a) = θ′. By the definition of δ, the set {(w_{c₀}, q₀), (w_{c₁}, q₁), ..., (w_{c_n}, q_n)} satisfies θ′. Thus, ⟨T_r, r′⟩ is a run of A_{K,ψ}.

Now, assume that A_{K,ψ} accepts a^ω. Thus, there exists an accepting run ⟨T_r, r⟩ of A_{K,ψ}. Recall that T_r is labelled with 0* × W × Q_ψ. Consider the tree ⟨T_r, r′⟩ labelled


with ℕ* × Q_ψ, where r′(ε) = (ε, q⁰_ψ) and for every y · c ∈ T_r with r′(y) ∈ {x} × Q_ψ and r(y · c) = (0^{|x|+1}, w, q), we have r′(y · c) = (x · i, q), where i is such that V_K(x · i) = w. As in the previous direction, it is easy to see that ⟨T_r, r′⟩ is an accepting run of A_ψ over ⟨T_K, V_K⟩.

Proposition 5.8 can be viewed as an automata-theoretic generalisation of Theorem 4.1 in [24]. Theorems 5.1, 5.2, 5.4, and 5.6, together with Theorem 4.2, then imply the following.

Theorem 5.9 ([42]). The model-checking problem for
• CTL can be solved in linear time and in space O(m log²(mn)), where m is the length of the formula and n is the size of the Kripke structure;
• the alternation-free μ-calculus can be solved in linear time;
• the μ-calculus is in NP ∩ co-NP;
• CTL* can be solved in space O(m(m + log n)²), where m is the length of the formula and n is the size of the Kripke structure.

Now, let us define the program complexity [79] of model checking as the complexity of this problem in terms of the size of the input Kripke structure, i.e., assuming the formula fixed. Consider the translation of CTL* formulas to HAA. Fixing the formula, we get an HAA of a fixed depth, so its nonemptiness test is in NLOGSPACE. On the other hand, the 1-letter nonemptiness problem for weak alternating word automata is PTIME-complete [42]. Thus, the restricted structure of HAAs is essential for a space-efficient nonemptiness test, inducing the following complexities for the program complexity of model checking.

Theorem 5.10 ([42]). The program complexity of the model-checking problem is
• NLOGSPACE-complete for CTL and CTL*;
• PTIME-complete for the alternation-free μ-calculus.
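The rewriting of transition formulas in the product A_{K,ψ} defined in § 5.2 is purely syntactic. The following is a minimal sketch (a hypothetical encoding of formulas and of the Kripke structure's successor relation, not the chapter's notation) of that rewriting for a single state w.

```python
# Transition formulas are nested tuples ("and", f, g), ("or", f, g), atoms
# ("eps", q), ("box", q), ("dia", q), or the constants True/False. The product
# replaces each atom by a formula over pairs (w, q), as defined above.

def product_transition(theta, w, succ):
    """Rewrite delta_psi(q, l(w)) = theta at state w; succ(w) lists w's successors."""
    if theta is True or theta is False:
        return theta
    op = theta[0]
    if op in ("and", "or"):
        return (op, product_transition(theta[1], w, succ),
                    product_transition(theta[2], w, succ))
    kind, q = theta
    if kind == "eps":                      # stay in the same node of the input tree
        return ("state", (w, q))
    atoms = [("state", (w2, q)) for w2 in succ(w)]
    if kind == "box":                      # a copy in state q for every successor of w
        out = True
        for a in atoms:
            out = a if out is True else ("and", out, a)
        return out
    if kind == "dia":                      # a copy in state q for some successor of w
        out = False
        for a in atoms:
            out = a if out is False else ("or", out, a)
        return out
    raise ValueError(kind)

# Example: theta = (box, q') and (dia, q''), at a state w with successors w1, w2.
succ = lambda w: ["w1", "w2"] if w == "w" else []
theta = ("and", ("box", "qp"), ("dia", "qpp"))
print(product_transition(theta, "w", succ))
```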

References [1] C. Baier and J.-P. Katoen, Principles of model checking. With a foreword by K. G. Larsen. MIT Press, Cambridge, MA, 2008. MR 2493187 Zbl 1179.68076 q.v. 1415 [2] C. Beeri, On the membership problem for functional and multivalued dependencies in relational databases. ACM Trans. on Database Syst. 5 (1980), 241–259. Zbl 0441.68118 q.v. 1439 [3] C. Beeri and P. Bernstein, Computational problems related to the design of normal form relational schemas. ACM Trans. on Database Syst. 4 (1979), 30–59. q.v. 1439 [4] G. Bhat and R. Cleaveland, Efficient local model-checking for fragments of the modal -calculus. In Tools and Algorithms for the Construction and Analysis of Systems (T. Margaria and B. Steffen, eds.). Proceedings of the Second International Workshop, TACAS ’96, held in Passau, March 27–29, 1996. Lecture Notes in Computer Science 1055. Springer, Berlin, 1996, 107–126. q.v. 1443


[5] G. Bhat and R. Cleaveland, Efficient model checking via the equational -calculus. In 11 th Annual IEEE Symposium on Logic in Computer Science. Held in New Brunswick, NJ, July 27–30, 1996. IEEE Computer Society, Los Alamitos, CA, 1996, 304–312. MR 1461843 IEEEXplore 561358 q.v. 1443 [6] U. Boker and O. Kupferman, Co-ing Büchi made tight and useful. In 24 th Annual IEEE Symposium on Logic in Computer Science. Proceedings of the symposium (LICS 2009) held at UCLA, Los Angeles, CA, August 11–14, 2009. IEEE Computer Society, Los Alamitos, CA, 2009, 245–254. MR 2932389 Zbl 5230576 q.v. 1427 [7] U. Boker, O. Kupferman, and A. Rosenberg, Alternation removal in Büchi automata. In Automata, languages and programming (S. Abramsky, C. Gavoille, C. Kirchner, F. M. auf der Heide, and P. G. Spirakis, eds.). Part II. Proceedings of the 37 th International Colloquium (ICALP 2010) held in Bordeaux, July 6–10, 2010. Lecture Notes in Computer Science, 6199. Springer, Berlin, 2010, 76–87. MR 2734637 Zbl 1288.68148 q.v. 1422, 1424 [8] J. R. Büchi, Weak second-order arithmetic and finite automata. Z. Math. Logik Grundlagen Math. 6 (1960), 66–92. MR 0125010 Zbl 0103.24705 q.v. 1416 [9] J. R. Büchi, On a decision method in restricted second order arithmetic. In Logic, Methodology and Philosophy of Science (E. Nagel, P. Suppes, and A. Tarski, eds.). Proceedings of the 1960 International Congress. Stanford University Press, Stanford, CA, 1962, 1–11. MR 0183636 Zbl 0147.25103 q.v. 1416 [10] A. Chandra, D. Kozen, and L. J. Stockmeyer, Alternation. J. Assoc. Comput. Mach. 28 (1981), no. 1, 114–133. MR 0603186 Zbl 0473.68043 q.v. 1427 [11] Y. Choueka, Theories of automata on ! -tapes: A simplified approach. J. Comput. System Sci. 8 (1974), 117–141. MR 0342378 Zbl 0292.02033 q.v. 1425 [12] E. Clarke, E. A. Emerson, and A. P. Sistla, Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Trans. Prog. Lang. Syst. 8 (1986), no. 2, 244–263. Zbl 0591.68027 q.v. 1415 [13] E. Clarke, O. Grumberg, and D. Peled, Model checking. MIT Press, Cambridge, MA, 1999. q.v. 1415 [14] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms. Third edition. MIT Press, Cambridge, MA, 2009. MR 2572804 Zbl 1187.68679 q.v. 1424 [15] C. Courcoubetis and M. Yannakakis, The complexity of probabilistic verification. J. Assoc. Comput. Mach. 42 (1995), no. 4, 857–907. MR 1411788 Zbl 0885.68109 q.v. 1426 [16] J. Doner, Decidability of the weak second-order theory of two successors. Notices Amer. Math. Soc 12 (1965), 819, 35 pp. q.v. 1416 [17] W. Dowling and J. Gallier, Linear-time algorithms for testing the satisfiability of propositional Horn formulae. J. Logic Programming 1 (1984), no. 3, 267–284. MR 0770156 Zbl 0593.68062 q.v. 1439 [18] C. Elgot, Decision problems of finite-automata design and related arithmetics. Trans. Amer. Math. Soc. 98 (1961), 21–51. MR 0139530 Zbl 0111.01102 q.v. 1416 [19] E. A. Emerson, Automata, tableaux, and temporal logics. In Logics of programs (R. Parikh, ed.). Proceedings of the conference held at Brooklyn College, Brooklyn, N.Y., June 17–19, 1985. Lecture Notes in Computer Science, 193. Springer, Berlin, 1985, 79–87. MR 0808790 Zbl 0603.03005 q.v. 1416 [20] E. A. Emerson, Temporal and modal logic. In Handbook of theoretical computer science (J. van Leeuwen, ed.). Vol. B. Formal models and semantics. Chapter 16. Elsevier Science Publishers, Amsterdam, and MIT Press, Cambridge, MA, 1990, 997–1072. MR 1127201 Zbl 0900.03030 q.v. 1416


[21] E. A. Emerson and J. Halpern. “Sometimes” and “not never” revisited: on branching versus linear time. J. Assoc. Comput. Mach. 33 (1986), no. 1, 151–178. MR 0820103 Zbl 0629.68020 q.v. 1429 [22] E. A. Emerson and C. S. Jutla, The complexity of tree automata and logics of programs (extended abstract). In 29 th Annual Symposium on Foundations of Computer Science. Held in White Plains, N.Y., October 24–26, 1988. IEEE Press, Los Alamitos, CA, 1988, 328–337. IEEEXplore 21949 q.v. 1416 [23] E. A. Emerson and C. S. Jutla, Tree automata, mu-calculus and determinacy. In Proceedings of the 32 nd Annual Symposium of Foundations of Computer Science. Held in San Juan, Puerto Rico, October 1–4, 1991. IEEE Press, Los Alamitos, CA, 368–377. IEEEXplore 185392 q.v. 1416, 1443 [24] E. A. Emerson, C. S. Jutla, and A. P. Sistla, On model-checking for fragments of -calculus. In Computer aided verification (C. Courcoubetis, ed.). Proceedings of the Fifth International Conference (CAV ’93) held in Elounda, June 28–July 1, 1993. Lecture Notes in Computer Science, 697. Springer, Berlin, 1993, 385–396. MR 1254452 Zbl 1390.68436 q.v. 1450 [25] E. A. Emerson and A. P. Sistla, Deciding branching time logic. In STOC ’84: Proceedings of the sixteenth annual ACM symposium on Theory of computing (R. A. DeMillo, ed.). Association for Computing Machinery, New York, 1984, 14–24. q.v. 1416, 1448 [26] M. Fischer and R. Ladner, Propositional dynamic logic of regular programs. J. Comput. System Sci. 18 (1979), no. 2, 194–211. MR 0532175 Zbl 0408.03014 q.v. 1437, 1448 [27] D. Gabbay, Applications of trees to intermediate logics. J. Symbolic Logic 37 (1972), 135–138. MR 0319709 Zbl 0243.02019 q.v. 1416 [28] P. Gastin and D. Oddoux, Fast LTL to Büchi automata translation. In Computer aided verification (G. Berry, H. Comon, and A. Finkel, eds.). Proceedings of the 13th International Conference (CAV 2001) held in Paris, July 18–22, 2001. Lecture Notes in Computer Science, 2102. Springer, Berlin, 2001, 53–65. MR 2048982 Zbl 0991.68044 q.v. 1424 [29] R. Gerth, D. Peled, M. Y. Vardi, and P. Wolper, Simple on-the-fly automatic verification of linear temporal logic. In Protocol specification, testing, and verification (P. Dembinski and M. Sredniawa, eds.). Proceedings of the Fifteenth IFIP WG6.1 International Symposium on Protocol Specification, Testing and Verification, Warsaw, Poland, June 1995. IFIP Advances in Information and Communication Technology, 38. Springer International Publishing, Cham, 1995, 3–18. q.v. 1416 [30] Y. Gurevich and L. Harrington, Trees, automata, and games. In Proceedings of the 14 th annual ACM symposium on theory of computing. (H. R. Lewis, B. B. Simons, W. A. Burkhard, and L. H. Landweber, eds.). Held in San Francisco, CA, May 5–7, 1982. Association for Computing Machinery, New York, 1982, 60–65. q.v. 1437 [31] T. Hafer and W. Thomas, Computation tree logic CTL? and path quantifiers in the monadic theory of the binary tree. In Automata, languages and programming (T. Ottmann, ed.). Proceedings of the fourteenth international colloquium held at the University of Karlsruhe, Karlsruhe, July 13–17, 1987. Lecture Notes in Computer Science, 267. Springer, Berlin, 1987, 269–279. MR 0912715 Zbl 0632.03027 q.v. 1416 [32] D. Janin and I. Walukiewicz, Automata for the modal -calculus and related results. In Mathematical foundations of computer science 1995 (J. Wiedermann and P. Hájek, eds.). Proceedings of the 20 th International Symposium (MFCS ’95) held in Prague, August 28–September 1, 1995. 
Lecture Notes in Computer Science, 969. Springer, Berlin, 1995, 552–562. MR 1467281 Zbl 1193.68163 q.v. 1433

38. Verification

1453

[33] N. Jones, Space-bounded reducibility among combinatorial problems. J. Comput. System Sci. 11 (1975), no. 1, 68–85. MR 0398165 Zbl 0317.02039 q.v. 1425 [34] D. Kähler and T. Wilke, Complementation, disambiguation, and determinization of Büchi automata unified. In Automata, languages and programming (L. Aceto, I. Damgård, L. A. Goldberg, M. M. Halldórsson, A. Ingólfsdóttir, and I. Walukiewicz, eds.). Part I. Proceedings of the 35 th International Colloquium (ICALP 2008) held in Reykjavik, July 7–11, 2008 Lecture Notes in Computer Science, 5125. Springer, Berlin, 2008, 724–735. MR 2500314 Zbl 1153.68032 q.v. 1426, 1427 [35] J. Kamp, Tense logic and the theory of order. Ph.D. thesis. University of California – Los Angeles, Los Angeles, 1968. q.v. 1416 [36] D. Kozen, Results on the propositional -calculus. Theoret. Comput. Sci. 27 (1983), no. 3, 333–354. MR 0731069 Zbl 0553.03007 q.v. 1431, 1432 [37] S. Krishnan, A. Puri, and R. Brayton, Deterministic ! -automata vis-a-vis deterministic Büchi automata. In Algorithms and computation (D. Du and X. Zhang, eds.). Proceedings of the Fifth International Symposium (ISAAC ’94) held in Beijing, August 25–27, 1994. Lecture Notes in Computer Science, 834. Springer, Berlin, 1994, 378–386. MR 1316437 Zbl 0953.68563 q.v. 1427 [38] O. Kupferman and R. Lampert, On the construction of fine automata for safety properties. In Automated technology for verification and analysis (S. Graf and W. Zhang, eds.). Proceedings of the 4 th international symposium, ATVA 2006, Beijing, China, October 23–26, 2006. Lecture Notes in Computer Science, 4218. Springer, Berlin, 110–124. Zbl 1161.68572 q.v. 1429 [39] O. Kupferman and A. Rosenberg, The blow-up in translating LTL to deterministic automata. In Model checking and artificial intelligence (R. van der Meyden and J. Smaus, eds.). 6 th international workshop, MoChArt 2010, Atlanta, GA, USA, July 11, 2010. Revised selected and invited papers. Lecture Notes in Computer Science, 6572. Lecture Notes in Artificial Intelligence. Springer, Berlin, 2010, 85–94. Zbl 1327.68161 q.v. 1427 [40] O. Kupferman and M. Y. Vardi, Model checking of safety properties. J. Form. Methods in Syst. Design 19 (2001), no. 3, 291–314. Zbl 0995.68061 q.v. 1428 [41] O. Kupferman and M. Y. Vardi, From linear time to branching time. ACM Trans. Comput. Log. 6 (2005), no. 2, 273–294. MR 2126057 Zbl 1367.68195 q.v. 1426, 1427 [42] O. Kupferman, M. Y. Vardi, and P. Wolper, An automata-theoretic approach to branchingtime model checking. J. ACM 47 (2000), no. 2, 312–360. MR 1769445 Zbl 1133.68376 q.v. 1416, 1437, 1439, 1440, 1442, 1443, 1444, 1449, 1450 [43] L. H. Landweber, Decision problems for ! -automata. Math. Systems Theory 3 (1969), 376–384. MR 0260595 Zbl 0182.02402 q.v. 1426 [44] O. Lichtenstein and A. Pnueli, Checking that finite state concurrent programs satisfy their linear specification. In POPL ’85: Proceedings of the 12 th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (M. S. V. Deusen, Z. Galil, and B. K. Reid, eds.). Association for Computing Machinery, New York, 1985, 97–107. q.v. 1415, 1426 [45] O. Lichtenstein, A. Pnueli, and L. Zuck, The glory of the past. In Logics of programs (R. Parikh, ed.). Proceedings of the conference held at Brooklyn College, Brooklyn, N.Y., June 17–19, 1985. Lecture Notes in Computer Science, 193. Springer, Berlin, 1985, 196–218. MR 0808799 Zbl 0586.68028 q.v. 1416 [46] Z. Manna and A. Pnueli, The temporal logic of reactive and concurrent systems. Specification. Springer, New York, 1992. 
MR 1156076 Zbl 0753.68003 q.v. 1415

1454

Javier Esparza, Orna Kupferman, and Moshe Y. Vardi

[47] S. Miyano and T. Hayashi, Alternating finite automata on ! -words. Theoret. Comput. Sci. 32 (1984), no. 3, 321–330. MR 0761350 Zbl 0544.68042 q.v. 1421 [48] D. E. Muller, A. Saoudi, and P. E. Schupp, Alternating automata, the weak monadic theory of trees and its complexity. Theoret. Comput. Sci. 97 (1992), no. 2, 233–244. MR 1163817 Zbl 0776.03017 q.v. 1435, 1437, 1444 [49] D. E. Muller, A. Saoudi, and P. E. Schupp, Weak alternating automata give a simple explanation of why most temporal and dynamic logics are decidable in exponential time. In Third Annual Symposium on Logic in Computer Science. Held in Edinburgh, July 5–8, 1988. IEEE Computer Society, Los Alamitos, CA, 1988, 422–427. IEEEXplore 5139 q.v. 1416 [50] D. E. Muller and P. E. Schupp, Alternating automata on infinite trees. Theoret. Comput. Sci. 54 (1987), no. 2–3, 267–276. MR 0919595 Zbl 0636.68108 q.v. 1433, 1444 [51] D. E. Muller and P. E. Schupp, Simulating alternating tree automata by nondeterministic automata: New results and new proofs of theorems of Rabin, McNaughton and Safra. Theoret. Comput. Sci. 141 (1995), no. 1–2, 69–107. MR 1323149 Zbl 0873.68135 q.v. 1437 [52] N. Piterman, From nondeterministic Büchi and Streett automata to deterministic parity automata. Log. Methods Comput. Sci. 3 (2007), no. 3, 3:5, 21 pp. MR 2346962 Zbl 1125.68067 q.v. 1426, 1427 [53] A. Pnueli, The temporal logic of programs. In 18 th Annual Symposium on Foundations of Computer Science. SFCS 1977. Held in Providence, R.I., October 31–November 2, 1977. IEEE Press, Computer Society, Long Beach, CA, 1977, 46–57. MR 0502161 IEEEXplore 4567924 q.v. 1415 [54] A. Pnueli, The temporal semantics of concurrent programs. Theoret. Comput. Sci. 13 (1981), no. 1, 45–60. MR 0593863 Zbl 0441.68010 q.v. 1416, 1417 [55] A. Pnueli and R. Rosner, On the synthesis of a reactive module. In Proceedings of the 16 th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL ’89. Austin, TX, January 11–13, 1989. Association for Computing Machinery, New York, 1989, 179–190. q.v. 1426 [56] J. Queille and J. Sifakis, Specification and verification of concurrent systems in CESAR. In International symposium on programming (M. Dezani-Ciancaglini and U. Montanari, eds.) Proceedings of the symposium held in Turin, April 6–8, 1982. Lecture Notes in Computer Science, 137. Springer, Berlin, 1982, 337–351. MR 0807187 Zbl 0482.68028 q.v. 1415 [57] M. O. Rabin, Decidability of second order theories and automata on infinite trees. Trans. Amer. Math. Soc. 141 (1969), 1–35. MR 0246760 Zbl 0221.02031 q.v. 1416, 1426 [58] M. O. Rabin, Weakly definable relations and special automata. In Mathematical logic and foundations of set theory (Y. Bar-Hillel, ed.) Proceedings of an International Colloquium held under the auspices of the Israel Academy of Sciences and Humanities, Jerusalem, November 11–14, 1968. Studies in Logic and the Foundations of Mathematics North-Holland Publishing Co., Amsterdam and London, 1970, 1–23. MR 0277388 Zbl 0214.02208 q.v. 1444 [59] M. O. Rabin and D. Scott, Finite automata and their decision problems. IBM J. Res. Develop. 3 (1959), 114–125. MR 0103795 Zbl 0158.25404 q.v. 1422 [60] S. Safra, On the complexity of ! -automata. In 29 th Annual Symposium on Foundations of Computer Science. Held in White Plains, N.Y., October 24–26, 1988. IEEE Press, Los Alamitos, CA, 1988, 319–327. IEEEXplore 21948 q.v. 1426, 1427

38. Verification

1455

[61] S. Safra and M. Y. Vardi, On ! -automata and temporal logic. In STOC ’89: Proceedings of the twenty-first annual ACM symposium on Theory of computing (D. S. Johnson, ed.). Association for Computing Machinery, New York, 1989, 127–137. q.v. 1448 [62] S. Schewe, Büchi complementation made tight. In STACS 2009: 26 th International Symposium on Theoretical Aspects of Computer Science (S. Albers and J. Marion, eds.). LIPIcs. Leibniz International Proceedings in Informatics, 3. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2009, 661–672. MR 2870691 Zbl 1236.68176 q.v. 1427 [63] S. Schewe, Tighter bounds for the determinisation of Büchi automata. In Foundations of software science and computational structures (L. de Alfaro, ed.). Proceedings of the 12 th International Conference (FOSSACS 2009) held in York, March 22–29, 2009. Lecture Notes in Computer Science, 5504. Springer, Berlin, 2009, 167–181. MR 2545219 Zbl 1234.68239 q.v. 1426 [64] H. Seidl, Deciding equivalence of finite tree automata. SIAM J. Comput. 19 (1990), no. 3, 424–437. MR 1041537 Zbl 0699.68075 q.v. 1437 [65] A. P. Sistla, Theoretical issues in the design of distributed and concurrent systems. Ph.D. thesis. Harvard University, Cambridge, MA, 1983. q.v. 1416 [66] A. P. Sistla, Safety, liveness and fairness in temporal logic. Form. Asp. Comput. 6 (1994), 495–511. q.v. 1428 [67] A. P. Sistla and E. Clarke, The complexity of propositional linear temporal logic. J. Assoc. Comput. Mach. 32 (1985), no. 3, 733–749. MR 0796211 q.v. 1425, 1426 [68] A. P. Sistla, M. Y. Vardi, and P. Wolper, The complementation problem for Büchi automata with applications to temporal logic. Theoret. Comput. Sci. 49 (1987), no. 2–3, 217–237. MR 0909332 Zbl 0613.03015 q.v. 1416, 1425 [69] G. Slutzki, Alternating tree automata. Theoret. Comput. Sci. 41 (1985), no. 2–3, 305–318. MR 0847683 Zbl 0595.68050 q.v. 1433 [70] R. Streett, Propositional dynamic logic of looping and converse is elementarily decidable. Inform. and Control 54 (1982), no. 1–2, 121–141. MR 0713309 Zbl 0515.68062 q.v. 1416 [71] R. Streett and E. A. Emerson, The propositional -calculus is elementary. In Automata, languages and programming (J. Paredaens, ed.). Proceedings of the eleventh colloquium held in Antwerp, July 16–20, 1984. Lecture Notes in Computer Science, 172. Springer, Berlin, 1984, 465–472. MR 0784273 Zbl 0556.68005 q.v. 1416 [72] D. Tabakov and M. Y. Vardi, Monitoring temporal systemC properties. In Eighth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010). Held in Grenoble, July 26–28, 2010. IEEE Computer Society, Los Alamitos, CA, 2010, 123–132. IEEEXplore 5558640 q.v. 1426 [73] J. Thatcher and J. Wright, Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Systems Theory 2 (1968), 57–81. MR 0224476 Zbl 0157.02201 q.v. 1416 [74] W. Thomas, Automata on infinite objects. In Handbook of theoretical computer science (J. van Leeuwen, ed.). Vol. B. Formal models and semantics. Elsevier Science Publishers, Amsterdam, and MIT Press, Cambridge, MA, 1990, 133–191. MR 1127189 Zbl 0900.68316 q.v. 1433 [75] B. Trakhtenbrot, Конечные автоматы и логика одноместных предикатов. Sibirsk. Mat. Ž. 3 (1962), 103–131. English trastlation, Finite automata and monadic second order logic. AMS Transl. 59 (1966), 23–55. q.v. 1416

1456

Javier Esparza, Orna Kupferman, and Moshe Y. Vardi

[76] M. Y. Vardi, Nontraditional applications of automata theory. In Theoretical aspects of computer software (M. Hagiya and J. C. Mitchell, eds.). Proceedings of the Second International Symposium (TACS ’94) held at Tohoku University, Sendai, April 19–22, 1994. Lecture Notes in Computer Science, 789. Springer, Berlin, 1994, 575–597. MR 1296693 Zbl 0942.68600 q.v. 1416, 1422 [77] M. Y. Vardi, Alternating automata and program verification. In Computer science today (J. van Leeuwen, ed.). Recent trends and developments. Lecture Notes in Artificial Intelligence. Springer, Berlin, 1995, 471–485. MR 1389588 q.v. 1416 [78] M. Y. Vardi and P. Wolper, Yet another process logic. In Logics of programs (E. M. Clarke and D. Kozen, eds.). Proceedings of the fourth workshop held at Carnegie-Mellon University, Pittsburgh, PA, June 6–8, 1983. Lecture Notes in Computer Science, 164. Springer, Berlin, 1984, 501–512. MR 0778958 Zbl 0549.68020 q.v. 1416 [79] M. Y. Vardi and P. Wolper, An automata-theoretic approach to automatic program verification. In Proceedings of the 1 st IEEE Symposium on Logic in Computer Science. Held in Cambridge, 1986. IEEE Computer Society, Los Alamitos, CA, 1986, 332–344. q.v. 1415, 1426, 1450 [80] M. Y. Vardi and P. Wolper, Automata-theoretic techniques for modal logics of programs. J. Comput. System Sci. 32 (1986), no. 2, 183–221. 16 th annual ACM-SIGACT symposium on the theory of computing (Washington, D.C., 1984). MR 0851189 Zbl 0622.03017 q.v. 1416 [81] M. Y. Vardi and P. Wolper, Reasoning about infinite computations. Inform. and Comput. 115 (1994), no. 1, 1–37. MR 1303019 Zbl 0827.03009 q.v. 1416, 1423, 1424 [82] T. Wilke, CTLC is exponentially more succinct than CTL. In Foundations of software technology and theoretical computer science (C. P. Rangan, V. Raman, and R. Ramanujam, eds.). Proceedings of the 19 th Conference (FST&TCS) held in Chennai, December 13–15, 1999. Lecture Notes in Computer Science, 1738. Springer, Berlin, 1999, 110–121. MR 1776791 Zbl 0952.03017 q.v. 1433 [83] P. Wolper, M. Y. Vardi, and A. P. Sistla, Reasoning about infinite computation paths (extended abstract). In Proceedings of the 24 th Annual Symposium on Foundations of Computer Science. Held in Tucson, AZ, November 7–9, 1983. IEEE Press, Los Alamitos, CA, 1983, 185–194. IEEEXplore 4568076 q.v. 1416

Chapter 39

Automata and quantum computing

Andris Ambainis and Abuzer Yakaryılmaz

Contents
1. Introduction . . . 1457
2. Mathematical background . . . 1458
3. Preliminaries . . . 1461
4. One-way QFAs . . . 1463
5. Two-way QFAs . . . 1474
6. Other models and results . . . 1480
7. Concluding remarks . . . 1483
References . . . 1485

1. Introduction

Quantum computing combines quantum physics and computer science, by studying computational models based on quantum physics (which is substantially different from conventional physics) and building quantum devices which implement those models. If a quantum computer is ever built, it will be able to solve certain computational problems much faster than conventional computers. The best known examples of such problems are factoring and the discrete logarithm. These two number-theoretic problems are thought to be very difficult for conventional computers, but can be solved efficiently (in polynomial time) on a quantum computer [94]. Since several widely used cryptosystems (such as RSA and Diffie–Hellman) are based on the difficulty of factoring or the discrete logarithm, a quantum computer would be able to break those cryptosystems, shaking the foundations of cryptography.

Another, equally surprising discovery was made in 1996 by Lov Grover [50], who designed a quantum algorithm that solves a general exhaustive search problem with $N$ possible solutions in time $O(\sqrt{N})$. This provides a quadratic speedup for a range of search problems, from problems that are classically solvable in polynomial time to NP-complete problems. Many other quantum algorithms have been discovered since then. (More information can be found in the surveys [13] and [72], and the "Quantum algorithm zoo" website [57].)


Given that finite automata are one of the most basic models of computation, it is natural to study them in the quantum setting. Soon after the discovery of Shor's factoring algorithm [94], the first models of quantum finite automata (QFAs) appeared in [62] and [71]. A number of different models and questions about the power of QFAs and their properties have been studied since then. In this chapter, we cover most of this work. We particularly focus on the results that show advantages of QFAs over their classical counterparts (in the context of quantum computing, "classical" means "non-quantum"; for finite automata, this usually means deterministic or probabilistic automata), because those results show how "quantumness" adds power to the computational models. We note that some of the early research on QFAs also claimed that, in some contexts, QFAs can be weaker than their classical counterparts. This was due to the first definitions of QFAs being too restricted [102]. Quantum computation is a generalisation of classical computation [101], and QFAs should be able to simulate classical finite automata if we define QFAs in a sufficiently general way. Therefore, we particularly emphasise the most general model of QFAs, which fully reflects the power of quantum computation.

We begin with an introductory section (§ 2) on the basics of quantum computation for readers who are not familiar with it. Then, we give the basic notation and conventions used throughout this chapter (§ 3). After that, § 4 presents the main results on 1-way QFAs and § 5 presents the main results on 2-way QFAs. Each of those sections covers the different models of QFAs that have been proposed, the classes of languages that they recognise, their state complexity in comparison to the corresponding classical models, and decidability and undecidability results. In § 6, we describe results about QFAs in less conventional models or settings (for example, interactive proof systems with a QFA verifier or QFAs augmented with extra resources beyond the usual quantum model). We conclude with a discussion of directions for future research in § 7. We also refer the reader to [90] for an introductory paper on quantum automata and to [83] for another survey on quantum automata.

2. Mathematical background

In this section, we review the basics of quantum computation. We refer the reader to [76] for more information.

Quantum systems. The simplest way of understanding the quantum model is by thinking of quantum systems as a generalisation of probabilistic systems. If we have a probabilistic system with $m$ possible states $1, 2, \ldots, m$, we can describe it by a probability distribution $p_1, \ldots, p_m$ over those $m$ possibilities. The probabilities $p_i$ must be nonnegative real numbers and satisfy $p_1 + \cdots + p_m = 1$. In the quantum case, the probabilities $p_1, \ldots, p_m$ are replaced by amplitudes $\alpha_1, \ldots, \alpha_m$. The amplitudes can be complex numbers and must satisfy $|\alpha_1|^2 + \cdots + |\alpha_m|^2 = 1$.


More formally, let us consider a quantum system with $m$ basis states (for some finite $m$) which we denote by $|q_1\rangle, |q_2\rangle, \ldots, |q_m\rangle$. A state of such a system is a linear combination of basis states with complex coefficients (called amplitudes)
\[ |\psi\rangle = \alpha_1 |q_1\rangle + \alpha_2 |q_2\rangle + \cdots + \alpha_m |q_m\rangle \tag{1} \]
that must satisfy $|\alpha_1|^2 + \cdots + |\alpha_m|^2 = 1$. We say that $|\psi\rangle$ is a superposition of $|q_1\rangle, \ldots, |q_m\rangle$. For example, if we have a system with 2 basis states $|0\rangle$ and $|1\rangle$, some of the possible superpositions are $\frac{4}{5}|0\rangle + \frac{3}{5}|1\rangle$, $\frac{4}{5}|0\rangle - \frac{3}{5}|1\rangle$, and $\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$.

We can view $|\psi\rangle$ as a vector consisting of amplitudes:
\[ |\psi\rangle = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{pmatrix}. \]
Then the basis states $|q_i\rangle$ are vectors with 1 in the $i$-th component and 0 everywhere else, and (1) can be interpreted as the addition of vectors. The length of the vector $|\psi\rangle$ is $\|\psi\| = \sqrt{|\alpha_1|^2 + \cdots + |\alpha_m|^2}$. Thus, in vector language, the requirement that $|\alpha_1|^2 + \cdots + |\alpha_m|^2 = 1$ is equivalent to just saying that $\|\psi\| = 1$. That is, a quantum state is a vector of length 1.

Unitary transformations. A transformation on a quantum state is specified by a transformation matrix $U$. If the state before the transformation is $|\psi\rangle$, the state after the transformation is $U|\psi\rangle$. A transformation is valid (allowed by the rules of quantum physics) if and only if $\|\psi\| = 1$ implies $\|U|\psi\rangle\| = 1$. Transformation matrices that satisfy this constraint are called unitary.

A transformation $U$ can also be specified by describing $U|q_1\rangle, \ldots, U|q_m\rangle$. Then, for any $|\psi\rangle = \alpha_1 |q_1\rangle + \alpha_2 |q_2\rangle + \cdots + \alpha_m |q_m\rangle$, we have
\[ U|\psi\rangle = \alpha_1 U|q_1\rangle + \alpha_2 U|q_2\rangle + \cdots + \alpha_m U|q_m\rangle. \]

For example, if we have a system with 2 basis states $|0\rangle$ and $|1\rangle$, we can specify a transformation $H$ by saying that $H$ maps
\[ |0\rangle \to \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle \quad\text{and}\quad |1\rangle \to \tfrac{1}{\sqrt{2}}|0\rangle - \tfrac{1}{\sqrt{2}}|1\rangle. \tag{2} \]
This determines how $H$ acts on superpositions of $|0\rangle$ and $|1\rangle$. For example, (2) implies that $H$ maps $\frac{4}{5}|0\rangle - \frac{3}{5}|1\rangle$ to
\[ \frac{4}{5}\Bigl(\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle\Bigr) - \frac{3}{5}\Bigl(\frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle\Bigr) = \frac{1}{5\sqrt{2}}|0\rangle + \frac{7}{5\sqrt{2}}|1\rangle. \]

Measurements. To obtain information about a quantum state, we have to measure it. The simplest measurement is observing $|\psi\rangle = \alpha_1 |q_1\rangle + \alpha_2 |q_2\rangle + \cdots + \alpha_m |q_m\rangle$ with respect to $|q_1\rangle, \ldots, |q_m\rangle$. It gives $|q_j\rangle$ with probability $|\alpha_j|^2$. (The equality $\|\psi\| = 1$ guarantees that the probabilities of the different outcomes sum to 1.) After the measurement, the state of the system changes to $|q_j\rangle$, and repeating the measurement gives the same state $|q_j\rangle$. For example, observing $\frac{4}{5}|0\rangle + \frac{3}{5}|1\rangle$ gives 0 with probability $\bigl(\frac{4}{5}\bigr)^2 = \frac{16}{25}$ and 1 with probability $\bigl(\frac{3}{5}\bigr)^2 = \frac{9}{25}$.

Partial measurement. In the context of QFAs, it may be the case that we only need to know whether the state $q_i$ is an accepting state or not. In this case, we can perform a partial measurement. Let $Q_1, \ldots, Q_k$ be a partition of $\{q_1, q_2, \ldots, q_m\}$ into disjoint subsets. Then, measuring a state $|\psi\rangle = \alpha_1 |q_1\rangle + \alpha_2 |q_2\rangle + \cdots + \alpha_m |q_m\rangle$ with respect to this partition gives the result $Q_i$ with probability $p_i = \sum_{q_j \in Q_i} |\alpha_j|^2$, and the state after the measurement is
\[ \sum_{q_j \in Q_i} \frac{\alpha_j}{\sqrt{p_i}} |q_j\rangle. \]

For example, if the quantum state is $\frac{1}{2}|1\rangle + \frac{1}{2}|2\rangle + \frac{1}{2}|3\rangle + \frac{1}{2}|4\rangle$ and the partition is $Q_1 = \{1, 2\}$ and $Q_2 = \{3, 4\}$, a partial measurement would give the result $Q_1$ with probability $\bigl(\frac{1}{2}\bigr)^2 + \bigl(\frac{1}{2}\bigr)^2 = \frac{1}{2}$, and the state after the measurement is
\[ \frac{1}{\sqrt{2}}|1\rangle + \frac{1}{\sqrt{2}}|2\rangle. \]
Such a measurement tells whether a QFA accepts a string and, at the same time, preserves the part of the quantum state which consists of accepting states (or the part of the quantum state which consists of nonaccepting states).
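To make the vector picture concrete, here is a minimal NumPy sketch (an illustration, not part of the original text) that stores a state as its amplitude vector, applies the transformation $H$ from (2), and computes the outcome probabilities of a full and a partial measurement for the examples above.

```python
import numpy as np

# |psi> = 4/5 |0> + 3/5 |1>, stored as its amplitude vector.
psi = np.array([4/5, 3/5], dtype=complex)

# The transformation H from equation (2).
H = np.array([[1,  1],
              [1, -1]], dtype=complex) / np.sqrt(2)
print(H @ np.array([4/5, -3/5], dtype=complex))   # [1/(5*sqrt(2)), 7/(5*sqrt(2))], as in the text

# Full measurement: outcome j occurs with probability |alpha_j|^2.
print(np.abs(psi) ** 2)                           # [0.64, 0.36] = [16/25, 9/25]

# Partial measurement of (1/2)(|1>+|2>+|3>+|4>) w.r.t. Q1 = {1,2}, Q2 = {3,4}.
phi = np.array([1/2, 1/2, 1/2, 1/2], dtype=complex)
Q1 = [0, 1]                                        # indices of the states in Q1
p_Q1 = np.sum(np.abs(phi[Q1]) ** 2)                # probability of outcome Q1 = 1/2
post = np.zeros_like(phi)
post[Q1] = phi[Q1] / np.sqrt(p_Q1)                 # renormalised post-measurement state
print(p_Q1, post)                                  # 0.5, (1/sqrt(2))|1> + (1/sqrt(2))|2>
```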

Dirac notation. As already mentioned above, we can view a quantum state $|\psi\rangle$ as a vector consisting of amplitudes. The conjugate transpose of this vector is denoted $\langle\psi|$:
\[ \langle\psi| = (\,\bar{\alpha}_1 \;\; \bar{\alpha}_2 \;\; \cdots \;\; \bar{\alpha}_m\,), \]
where $\bar{\alpha}_i$ denotes the conjugate of the complex number $\alpha_i$. (If $\alpha_i$ is real, then $\bar{\alpha}_i = \alpha_i$.) If we multiply $|\psi\rangle$ with $\langle\psi|$ (according to the usual rules for matrix multiplication), we get an $m \times m$ matrix:
\[ |\psi\rangle\langle\psi| = \begin{pmatrix} \alpha_1\bar{\alpha}_1 & \alpha_1\bar{\alpha}_2 & \cdots & \alpha_1\bar{\alpha}_m \\ \alpha_2\bar{\alpha}_1 & \alpha_2\bar{\alpha}_2 & \cdots & \alpha_2\bar{\alpha}_m \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_m\bar{\alpha}_1 & \alpha_m\bar{\alpha}_2 & \cdots & \alpha_m\bar{\alpha}_m \end{pmatrix}. \]
This is the density matrix of the state $|\psi\rangle$.

Mixed states. A mixed state (or mixture) $(p_j, |\psi_j\rangle)$ is a probabilistic combination of several quantum states $|\psi_j\rangle$, with probabilities $p_j$ (where $p_j > 0$ for all $j$ and $\sum_j p_j = 1$). For such a state, its density matrix is just the sum of the density matrices of the $|\psi_j\rangle$, weighted by their respective probabilities:
\[ \rho = \sum_j p_j |\psi_j\rangle\langle\psi_j|. \]

If we measure a mixed state $(p_j, |\psi_j\rangle)$, the probabilities of the different measurement outcomes can be calculated from the density matrix $\rho$. Thus, $\rho$ provides a complete description of a mixed state: there may be multiple decompositions $(p_j, |\psi_j\rangle)$ that give the same matrix $\rho$, but they are all equivalent with respect to any measurement that we may perform.

Superoperators. In general, we can perform a sequence of unitary transformations and measurements on a quantum state, with each transformation possibly depending on the results of the previous measurements. Such a sequence is called a completely positive superoperator or CPSO. Alternatively, a completely positive superoperator can be described by a sequence of $m \times m$ matrices $\mathcal{E} = \{E_1, \ldots, E_k\}$ (called Kraus operators) such that $\sum_{i=1}^{k} E_i^{\dagger} E_i = I$. Such a CPSO maps a mixed state with the density matrix $\rho$ to a mixed state with the density matrix $\rho' = \mathcal{E}(\rho) = \sum_{i=1}^{k} E_i \rho E_i^{\dagger}$. The two definitions are equivalent: for any sequence of unitary transformations and measurements, there is a set of Kraus operators $E_1, \ldots, E_k$ which produces the same result, and the other way around. A bistochastic quantum operation, say $\mathcal{E} = \{E_1, \ldots, E_k\}$, is a special kind of superoperator satisfying both $\sum_{i=1}^{k} E_i^{\dagger} E_i = I$ and $\sum_{i=1}^{k} E_i E_i^{\dagger} = I$.
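A short NumPy sketch of these definitions (again only an illustration; the particular mixture and Kraus operators, a standard-basis measurement on one qubit, are chosen purely as an example): it builds the density matrix of a mixed state, checks the completeness condition $\sum_i E_i^{\dagger}E_i = I$, and applies the CPSO.

```python
import numpy as np

def density(states_with_probs):
    """Density matrix of a mixture (p_j, |psi_j>)."""
    return sum(p * np.outer(psi, psi.conj()) for p, psi in states_with_probs)

# Mixture: |0> with probability 1/2 and (|0>+|1>)/sqrt(2) with probability 1/2.
rho = density([(0.5, np.array([1, 0], dtype=complex)),
               (0.5, np.array([1, 1], dtype=complex) / np.sqrt(2))])

# Kraus operators of an example CPSO: a measurement in the standard basis.
E = [np.array([[1, 0], [0, 0]], dtype=complex),
     np.array([[0, 0], [0, 1]], dtype=complex)]

# Completeness condition: sum_i E_i^dagger E_i = I.
assert np.allclose(sum(Ei.conj().T @ Ei for Ei in E), np.eye(2))

# Action of the CPSO: rho' = sum_i E_i rho E_i^dagger.
rho_prime = sum(Ei @ rho @ Ei.conj().T for Ei in E)
print(rho_prime)   # off-diagonal entries vanish; the diagonal gives the outcome probabilities
```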

3. Preliminaries

Basic notation. Throughout the chapter,
- $\Sigma$ is the input alphabet, not containing the end markers ¢ and $\$$, and $\widetilde{\Sigma} = \Sigma \cup \{¢, \$\}$;
- for a given string $w$, its length is denoted $|w|$, its $i$-th symbol is denoted $w_i$, and the string ¢$w$$\$$ is denoted $\widetilde{w}$;
- $Q$ is the set of internal states, $q_1 \in Q$ is the initial state, and $Q_a \subseteq Q$ is the set of accepting states;
- $f_M(w)$ is the accepting probability (or the accepting value) of $M$ on the string $w$;
- for a given vector (row or column) $v$, its $i$-th entry is denoted $v[i]$;
- for a given matrix $A$, its $(i,j)$-th entry is denoted $A[i,j]$.

Language recognition. Let $M$ be a machine and $\lambda \in \mathbb{R}$. The language $L \subseteq \Sigma^*$ recognised by $M$ with (strict) cutpoint or nonstrict cutpoint $\lambda$ is defined as
\[ L = \{ w \in \Sigma^* \mid f_M(w) > \lambda \} \quad\text{or}\quad L = \{ w \in \Sigma^* \mid f_M(w) \ge \lambda \}, \]
respectively. The language $L \subseteq \Sigma^*$ is said to be recognised by $M$ with unbounded error if there exists a cutpoint $\lambda$ such that $L$ is recognised by $M$ with strict or nonstrict cutpoint $\lambda$.

In the following, we assume that $f_M(w) \in [0,1]$ for all $w \in \Sigma^*$. The language $L \subseteq \Sigma^*$ recognised by $M$ with positive or negative one-sided unbounded error is defined as
\[ L = \{ w \in \Sigma^* \mid f_M(w) > 0 \} \quad\text{or}\quad L = \{ w \in \Sigma^* \mid f_M(w) = 1 \}, \]
respectively. The language $L \subseteq \Sigma^*$ is said to be recognised by $M$ with error bound $\varepsilon$ ($0 \le \varepsilon < \frac{1}{2}$) if (i) $f_M(w) \ge 1 - \varepsilon$ when $w \in L$ and (ii) $f_M(w) \le \varepsilon$ when $w \notin L$.


This notion is also known as recognition with bounded error. Moreover, in the case of positive one-sided bounded error, $f_M(w) = 0$ when $w \notin L$; and, in the case of negative one-sided bounded error, $f_M(w) = 1$ when $w \in L$.

Transition amplitudes and probabilities. In both probabilistic and quantum finite automata (see [85], [80], and [62]), the transition values (probabilities or amplitudes) are traditionally allowed to be in $\mathbb{R}$ and in $\mathbb{C}$, respectively. On the other hand, the transition values of Turing machines (TMs) – see [43], [15], and [100] – are often selected from restricted subsets of $\mathbb{R}$ or $\mathbb{C}$. For example, for probabilistic Turing machines, it is often assumed that, at each step, there are two possible choices, and the machine chooses each of them with probability $\frac{1}{2}$. In this chapter, we assume the most general model possible. That is, unless specified otherwise, the transition values of probabilistic (or quantum) machines are supposed to be in $\mathbb{R}$ (or $\mathbb{C}$). We use $\overline{\mathbb{R}}$, $\widetilde{\mathbb{C}}$, and $\mathbb{A}$ to denote the computable real numbers, the efficiently computable complex numbers, and the algebraic numbers, respectively; see [15] and [100].

Classical finite automata. We will compare quantum automata with the following models of classical automata:
- deterministic finite automaton (DFA),
- nondeterministic finite automaton (NFA),
- probabilistic finite automaton (PFA), and
- generalised finite automaton (GFA) of Turakainen [95].

The first three models are quite widely known. Each of them can be studied both in a 1-way version (where the head of the automaton moves from the left to the right) and in a 2-way version (where the automaton is allowed to move in both directions). We refer to the 1-way models as 1DFA, 1NFA and 1PFA, and we refer to the 2-way models as 2DFA, 2NFA and 2PFA.

A 1-way probabilistic finite automaton (1PFA) [85] can be described by a 5-tuple
\[ P = (Q, \Sigma, \{A_\sigma \mid \sigma \in \widetilde{\Sigma}\}, q_1, Q_a), \]
where $A_\sigma$ is the transition matrix, i.e., $A_\sigma[j,i]$ is the probability of the transition from state $q_i$ to state $q_j$ when reading symbol $\sigma$. We require that all $A_\sigma$ are stochastic: $A_\sigma[j,i] \ge 0$ and $\sum_j A_\sigma[j,i] = 1$ for all $i$. The computation of a 1PFA can be traced by a probability vector $v$ in which $v[i]$ is the probability of being in state $q_i$. For a given input string $w \in \Sigma^*$, the string $\widetilde{w} = $ ¢$w$$\$$ is read symbol by symbol and $v_i = A_{\widetilde{w}_i} v_{i-1}$, where $1 \le i \le |\widetilde{w}|$ and $v_0$ is the initial state vector, whose first entry is equal to 1. The acceptance probability of $P$ on string $w$ is defined as
\[ f_P(w) = \sum_{q_i \in Q_a} v_{|\widetilde{w}|}[i]. \]

If we allow the transition matrices $A_\sigma$ to be arbitrary matrices consisting of arbitrary real numbers, we obtain a generalised finite automaton (GFA), see [95]. Then the range of $f_G(\cdot)$ is the real numbers, and it is called the accepting value instead of the accepting probability.
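The following NumPy sketch traces a 1PFA exactly as in the definition above: column-stochastic matrices $A_\sigma$, the recurrence $v_i = A_{\widetilde{w}_i} v_{i-1}$, and the acceptance probability summed over accepting states. The two-state automaton and its matrices are invented for the example, not taken from the text.

```python
import numpy as np

# A toy 2-state 1PFA over Sigma = {a, b}; column j of A[s] is the distribution
# over successor states when reading s in state q_{j+1}.
A = {
    'cent':   np.eye(2),                 # left endmarker: do nothing
    'a':      np.array([[0.5, 0.0],
                        [0.5, 1.0]]),
    'b':      np.array([[1.0, 0.3],
                        [0.0, 0.7]]),
    'dollar': np.eye(2),                 # right endmarker: do nothing
}
accepting = [1]                          # Q_a = {q_2} (0-indexed as state 1)

def f_P(w):
    v = np.array([1.0, 0.0])             # v_0: start in q_1
    for s in ['cent'] + list(w) + ['dollar']:
        assert np.allclose(A[s].sum(axis=0), 1.0)   # each A_sigma is stochastic
        v = A[s] @ v                      # v_i = A_{w~_i} v_{i-1}
    return v[accepting].sum()             # sum of probabilities over accepting states

print(f_P("ab"), f_P("ba"))
```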


1DFAs, 1NFAs, 2DFAs, 2NFAs and 1PFAs with bounded error all recognise the same class of languages: the regular languages (REG). On the other hand, 2PFAs can recognise some nonregular languages [41] with bounded error, such as $\mathtt{UPAL} = \{a^n b^n \mid n \ge 0\}$, but require exponential expected runtime [39]. The languages recognised by 1PFAs with cutpoint (nonstrict cutpoint) form the class of stochastic languages (co-stochastic languages), denoted by $\mathrm{S}$ ($\mathrm{coS}$). The set $\mathrm{S} \cup \mathrm{coS}$ forms the class of unbounded-error stochastic languages ($\mathrm{uS}$). 1GFAs are equivalent to 1PFAs: the classes of languages recognised by 1GFAs with cutpoint and nonstrict cutpoint are also $\mathrm{S}$ and $\mathrm{coS}$, respectively [95]. This equivalence makes 1GFAs useful for proving facts about 1PFAs. Moreover, any language $L$ defined as $L = \{ w \mid f_P(w) \neq \frac{1}{2} \}$ for a 1PFA $P$ is called an exclusive stochastic language, and $\mathrm{S}^{\neq}$ denotes the class of all such languages. The complement of $\mathrm{S}^{\neq}$ is denoted $\mathrm{S}^{=}$. The class $\mathrm{S}^{=}_{\mathbb{Q}}$ is the subset of $\mathrm{S}^{=}$ defined by 1PFAs with rational-valued transitions. The class $\mathrm{S}^{\neq}_{\mathbb{Q}}$ is defined similarly.

4. One-way QFAs

A number of different definitions of one-way QFAs (1QFAs) have been proposed over the years. They can all be described in a similar framework, by specifying a 5-tuple
\[ (\Sigma, Q, \{T_\sigma \mid \sigma \in \widetilde{\Sigma}\}, q_1, R), \]
with the following specifications.
- $Q$ is a finite set of (classical) states. The (quantum) state of a 1QFA can be any superposition of the basis states $\{|q\rangle \mid q \in Q\}$: $|\psi\rangle = \sum_{q \in Q} \alpha_q |q\rangle$. In mixed state models, the 1QFA can also be in a mixture $(p_i, |\psi_i\rangle)$ of such states.
- $|\psi_0\rangle = |q_1\rangle$ is the initial state of the 1QFA.
- For each symbol $\sigma \in \widetilde{\Sigma}$, we have a corresponding transformation $T_\sigma$ on the 1QFA's current state. In simpler models, $T_\sigma$ is a unitary transformation, denoted $U_\sigma$. In more general models, $T_\sigma$ can be a sequence of unitary transformations and measurements, with the next operations in the sequence depending on the previous ones.
- $R$ is a rule for determining how the 1QFA accepts the strings. Typically, $R$ is specified by a set of accepting states ($Q_a \subseteq Q$) and measuring the final state of the QFA in the standard basis. If an accepting state $q \in Q_a$ is obtained, the automaton accepts. Otherwise the automaton rejects.

The models differ in the set of transformations $T_\sigma$ and acceptance rules $R$ that are allowed.

Example. Let $p$ be an odd number. Consider the following 1QFA $M$ over the one-symbol alphabet $\Sigma = \{a\}$. The set of states is $Q = \{q_1, q_2\}$ and the initial state is $|\psi_0\rangle = |q_1\rangle$.


The transformations on the endmarkers are identities, and the transformation $U_a$ that corresponds to reading $a$ is defined by
\[ U_a |q_1\rangle = \cos\theta\, |q_1\rangle + \sin\theta\, |q_2\rangle, \qquad U_a |q_2\rangle = -\sin\theta\, |q_1\rangle + \cos\theta\, |q_2\rangle, \]
where $\theta = \frac{2\pi}{p}$.

The acceptance rule $R$ is as follows: at the end of the computation, we measure the state. If the result is $q_1$, we accept; otherwise, we reject. (In other words, we have $Q_a = \{q_1\}$.)

The initial state is $|q_1\rangle$. After reading the first symbol $a$, the quantum state becomes
\[ |\psi_1\rangle = \cos\theta\, |q_1\rangle + \sin\theta\, |q_2\rangle. \]
After reading the second symbol $a$, it becomes
\[ U_a |\psi_1\rangle = \cos\theta\, U_a|q_1\rangle + \sin\theta\, U_a|q_2\rangle = \cos\theta\,(\cos\theta\, |q_1\rangle + \sin\theta\, |q_2\rangle) + \sin\theta\,(-\sin\theta\, |q_1\rangle + \cos\theta\, |q_2\rangle) = \cos 2\theta\, |q_1\rangle + \sin 2\theta\, |q_2\rangle. \]

We can show that each consecutive application of $U_a$ also rotates the state in the plane formed by $|q_1\rangle, |q_2\rangle$ by an angle of $\theta = \frac{2\pi}{p}$, as shown in Figure 1. Thus, we have the following lemma.

Figure 1. The 2-state 1QFA $M$: successive applications of $U_a$ rotate $|q_1\rangle$ towards $|q_2\rangle$ by the angle $\theta$.

Lemma 4.1 ([9]). After reading $a^j$, the state of $M$ is
\[ \cos\Bigl(\frac{2\pi j}{p}\Bigr) |q_1\rangle + \sin\Bigl(\frac{2\pi j}{p}\Bigr) |q_2\rangle. \]

Therefore, $M$ accepts $a^j$ with probability $\cos^2\bigl(\frac{2\pi j}{p}\bigr)$. If $p$ divides $j$, then
\[ \cos^2\Bigl(\frac{2\pi j}{p}\Bigr) = \cos^2 0 = 1. \]
Otherwise,
\[ \cos^2\Bigl(\frac{2\pi j}{p}\Bigr) < 1. \]


So, for any odd $p > 2$, $M$ recognises
\[ \mathtt{MOD}_p = \{ a^i \mid p \text{ divides } i \} \tag{3} \]
with negative one-sided error bound $\cos^2\frac{\pi}{p}$.
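As a quick check of Lemma 4.1 and of the error bound above, here is a small NumPy sketch (not from the original text) that simulates the two-state automaton $M$ for a chosen odd $p$ and prints its acceptance probability $\cos^2\frac{2\pi j}{p}$ on inputs $a^j$.

```python
import numpy as np

p = 5                                     # any odd p; theta = 2*pi/p
theta = 2 * np.pi / p
Ua = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])

def accept_prob(j):
    """Probability that M accepts a^j (the endmarker transformations are identities)."""
    psi = np.array([1.0, 0.0])             # initial state |q_1>
    psi = np.linalg.matrix_power(Ua, j) @ psi
    return abs(psi[0]) ** 2                # measure; accept iff the outcome is q_1

for j in range(0, 3 * p):
    print(j, round(accept_prob(j), 4))     # 1 (up to floating point) exactly when p divides j

print(max(accept_prob(j) for j in range(1, p)))   # = cos^2(pi/p), the error bound
```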

Models. We now describe some important 1QFA models, in order from the most restrictive to the most general. Let $w$ be the input string.

1. The Moore–Crutchfield quantum finite automaton (MCQFA) [71] is the most restricted of the known QFA models. In this model, the transformations $T_\sigma$ have to be unitary ($U_\sigma$). The acceptance rule $R$ is of the form described above: we measure the state of the QFA after reading $\widetilde{w}$ and accept if the measurement gives $q \in Q_a$.

2. The Kondacs–Watrous quantum finite automaton (KWQFA) [62] is a model in which the $T_\sigma$ still have to be unitary ($U_\sigma$), but the acceptance rule $R$ involves measurements after every step. The state set $Q$ is partitioned into the set of accepting states $Q_a$, the set of rejecting states $Q_r$, and the set of nonhalting states $Q_n$, i.e., $Q = Q_a \cup Q_r \cup Q_n$. After reading each symbol $\widetilde{w}_i$, we perform a partial measurement of whether the state is in $Q_a$, $Q_r$, or $Q_n$. If $Q_a$ ($Q_r$) is obtained, the computation is terminated and the input string is accepted (rejected). If $Q_n$ is obtained, the computation is continued by reading the next symbol $\widetilde{w}_{i+1}$ when $i < |\widetilde{w}|$, and the computation is terminated and the input string is rejected when $i = |\widetilde{w}|$.

3. The Latvian quantum finite automaton (LaQFA) [5] is a model in which each $T_\sigma$ can be a sequence $U_1, M_1, \ldots, U_m, M_m$ consisting of unitary transformations $U_1, \ldots, U_m$ and measurements $M_1, \ldots, M_m$. It is required that the transformations in the sequence are independent of the outcomes of the measurements $M_i$ ($1 \le i \le m$). The acceptance is by measuring the state of the QFA after reading $\widetilde{w}$ and accepting if the measurement gives $q \in Q_a$.

4. The Nayak quantum finite automaton (NaQFA) [74] combines KWQFA and LaQFA. The transformations $T_\sigma$ of a NaQFA are as for LaQFA, and the acceptance rule $R$ is as for KWQFA. A bistochastic quantum finite automaton (BiQFA) [49] is a generalisation of NaQFA such that the transformations $T_\sigma$ can be bistochastic quantum operations and the acceptance rule $R$ is as for KWQFA.

5. The general one-way quantum finite automaton (1QFA) ([54] and [113]) allows each $T_\sigma$ to be a sequence $U_1, M_1, \ldots, U_m, M_m$ of unitary transformations and measurements. Moreover, each transformation $U_i$ or measurement $M_i$ can depend on the outcomes of the previous measurements. This is the most general model. It has been discovered several times in different but equivalent forms: quantum finite automaton with ancilla qubits (QFA-A) [79], fully quantum finite automaton (CiQFA) [32], quantum finite automaton with control language (QFA-CL) [21], and one-way finite automaton with quantum and classical states (1QCFA) [122].


Why do we have so many models? Initially, researchers did not recognise the power that comes from performing sequences of unitary transformations and measurements. For this reason, the first models of QFAs were defined in an unnecessarily restrictive form.

Which is the right model? Physically, we can perform any sequence of measurements and unitary transformations. Hence, 1QFAs should be the main model. Among the more restricted models, MCQFA and LaQFA can be motivated by the fact that measuring the quantum state during each transformation $T_a$ may be difficult. Because of that, it may be interesting to consider models in which measurements are restricted. One natural restriction is to allow only one measurement at the end of the computation, as in MCQFA. The other possibility is to allow intermediate measurements after each $T_a$, as long as the rest of the computation does not depend on their outcomes. Such measurements are easier to realise than general measurements (for example, this is the case in liquid state NMR quantum computing [75]). This leads to LaQFA.

There is no compelling physical motivation behind KWQFAs and NaQFAs. These models allow the computation to stop depending on the result of an intermediate measurement. If we are able to do that, it is natural that we also allow the next transformations to depend on the measurement outcome – which leads to the most general model of 1QFAs.

4.1. Simulations. In this section, we present some basic simulation results that relate the power of 1QFAs to the power of their classical counterparts. First, any probabilistic automaton can be transformed into an equivalent QFA, if the model of QFA is sufficiently general.

Theorem 4.2. For a given $n$-state 1PFA $P$, there exists an $n$-state 1QFA $M$ such that $f_P(w) = f_M(w)$ for any $w \in \Sigma^*$.

This result easily follows from the fact that stochastic operators (the transformations $A_\sigma$ in a 1PFA) are a special case of superoperators (used by 1QFAs). The proof of the theorem can be found in [54], [113], and [90], but it was known as a folklore result in the quantum computing community long before that. The same result is also valid in many other settings, for example, for probabilistic and quantum Turing machines [100].

The second simulation shows how to convert a 1QFA to a GFA with a quadratic increase in the number of internal states.

Theorem 4.3 ([71], [64], and [113]). For a given $n$-state 1QFA $M$, there exists an $n^2$-state GFA $G$ such that $f_M(w) = f_G(w)$ for any $w \in \Sigma^*$.

Proof. If we apply a superoperator, say $\mathcal{E}$, to a quantum system in a state $\rho$, the new state is $\rho' = \mathcal{E}(\rho) = \sum_{i=1}^{k} E_i \rho E_i^{\dagger}$. From this expression, one can see that the entries of the density matrix $\rho'$ are linear combinations of the entries of $\rho$. We can linearise the computation of a given 1QFA (with a quadratic increase in the size of the set of states [113]) in the following way. We transform the density matrix into a real-valued vector, replacing each complex entry of the density matrix with two real-valued elements of the vector. We choose the transition matrices $A_\sigma$ of the 1GFA so that they transform this vector in the same way as the superoperators $T_\sigma$ of the 1QFA transform the density matrix.

Due to the equivalence between 1GFAs and 1PFAs, this simulation result is very useful. For example, it is used to show the equivalence of 1PFAs and 1QFAs in the unbounded error case (§ 4.4) and various decidability and undecidability results (§ 4.5).

In the bounded-error case, 1QFAs can recognise only regular languages (similarly to 1PFAs). The pure state version of this result was first shown for KWQFAs in [62], with the bounds on the number of states shown in [7].

Theorem 4.4 ([62] and [7]). If a language $L$ is recognised by an $n$-state 1QFA with pure states (e.g., MCQFAs and KWQFAs) with bounded error, then it can be recognised by a 1DFA with $2^{O(n)}$ states.

Proof. Let $M$ be the minimal 1DFA that recognises $L$, and let $N$ be the number of states of $M$. Let $q_1$ and $q_2$ be two states of $M$. Then there is a string $w$ such that reading $w$ in one of the two states $q_1, q_2$ leads to an accepting state, and reading it in the other state leads to a rejecting state.

Let $M'$ be a 1QFA that recognises $L$. Let $M$ be in state $q_1$ (resp., $q_2$) after reading $w_1$ (resp., $w_2$), and let $|\psi_1\rangle$ and $|\psi_2\rangle$ be the states of $M'$ after reading $w_1$ and $w_2$, respectively. Let $T$ be the sequence of transformations that corresponds to reading $w$ (including the final measurement that produces the answer that says whether $M'$ accepts or rejects the input string $w$). If $M'$ correctly recognises $L$, then applying $T$ to one of $|\psi_1\rangle, |\psi_2\rangle$ leads to a "yes" answer with probability at least $\frac{2}{3}$, and applying $T$ to the other state leads to a "no" answer with probability at least $\frac{2}{3}$. The next lemma provides a necessary condition for that.

Lemma 4.5 ([15]). Let
\[ |\psi_1\rangle = \sum_{i=1}^{n} \alpha_i |i\rangle \quad\text{and}\quad |\psi_2\rangle = \sum_{i=1}^{n} \beta_i |i\rangle. \]
Then, for all $T$, the probabilities of $T$ producing a "yes" answer on $|\psi_1\rangle$, $|\psi_2\rangle$ differ by at most $\|\psi_1 - \psi_2\|$, where
\[ \|\psi_1 - \psi_2\| = \sqrt{\sum_{i=1}^{n} |\alpha_i - \beta_i|^2}. \]

Hence, if a 1-way pure state QFA $M'$ with an $n$-dimensional state space recognises $L$, there must be $N$ pure states $|\psi_1\rangle, \ldots, |\psi_N\rangle$ in $n$ dimensions such that $\|\psi_i - \psi_j\| \ge 1/3$ for all $i, j$ such that $i \neq j$. Such sets of states are known as quantum fingerprints, and quite tight bounds for the maximum number of quantum fingerprints in $n$ dimensions are known [31]. In particular, we know that $N = 2^{O(n)}$.

We now sketch the proof of a similar result for the general case. Simulations of general 1QFAs by DFAs can be found in several papers (for example, [66]) but we also provide an upper bound on the number of states.


Theorem 4.6. If a language is recognised by an $n$-state 1QFA with mixed states (e.g., a 1QFA) with bounded error, then it can be recognised by a 1DFA with $2^{O(n^2)}$ states.

Proof. The proof is similar to the previous theorem, but now we have to answer the question: how many mixed states $\rho_i$ can one construct so that, for any $i \neq j$, there is a sequence of transformations $T$ that produces different outcomes ("yes" in one case and "no" in the other case) with probability at least $\frac{2}{3}$?

The answer is that the number of such $\rho_i$ in $n$ dimensions is at most $2^{O(n^2)}$. This follows from the fact that a mixed state in $n$ dimensions can be expressed as a mixture $(p_l, |\psi_l\rangle)$ of at most $n$ pure states. We can then approximate each of the $|\psi_l\rangle$ by a state $|\psi_l'\rangle$ from an $\varepsilon$-net for the unit sphere in $n$ dimensions. (An $\varepsilon$-net is a set of states $S$ such that, for any $|\psi\rangle$, there exists $|\psi'\rangle \in S$ such that $\|\psi - \psi'\| \le \varepsilon$.) Since one can construct an $\varepsilon$-net with $2^{O(n)}$ states, there will be $(2^{O(n)})^n = 2^{O(n^2)}$ choices for the set of states $(|\psi_1\rangle, |\psi_2\rangle, \ldots, |\psi_n\rangle)$. We also need to use another $\varepsilon$-net for $(p_1, \ldots, p_n)$, but the size of this $\varepsilon$-net is $2^{O(n)} = 2^{o(n^2)}$.

4.2. Succinctness results. In [7], it was shown that 1QFAs can indeed be exponentially more succinct than 1PFAs. Let $p$ be a prime and consider the language $\mathtt{MOD}_p$ defined by (3).

Theorem 4.7 ([7]). (i) If $p$ is a prime, any 1PFA recognising $\mathtt{MOD}_p$ has at least $p$ states. (ii) For any $\varepsilon > 0$, there is an MCQFA with $O(\log p)$ states recognising $\mathtt{MOD}_p$ with error bound $\varepsilon$.

We now describe the construction of [7] (in a simplified form due to [9]). Let $M_k$, for $k \in \{1, \ldots, p-1\}$, be the two-state MCQFA given in Lemma 4.1 for $\theta = \frac{2\pi k}{p}$. Thus, $M_k$ accepts $a^j$ with probability $\cos^2\frac{2\pi jk}{p}$. If $p$ divides $j$, then $\cos^2\frac{2\pi jk}{p} = \cos^2 0 = 1$. For $j$ that are not divisible by $p$, about half of them are accepted with probability less than $\frac{1}{2}$. (This happens if $\frac{2\pi jk}{p}$ belongs to one of the intervals $\bigl[2\pi m + \frac{\pi}{4},\, 2\pi m + \frac{3\pi}{4}\bigr]$ or $\bigl[2\pi m + \frac{5\pi}{4},\, 2\pi m + \frac{7\pi}{4}\bigr]$.) That is, each of the MCQFAs $M_k$ distinguishes $a^j \in L_p$ from many (but not all) $a^j \notin L_p$.

We now combine $O(\log p)$ of the $M_k$'s into one MCQFA $M$ that distinguishes $a^j \in L_p$ from all $a^j \notin L_p$. Let $k_1, \ldots, k_d$ be a sequence of $d$ numbers, for an appropriately chosen $d = O(\log p)$. The set of states of $M$ consists of the $2d$ states $q_{1,1}, q_{1,2}, q_{2,1}, q_{2,2}, \ldots, q_{d,1}, q_{d,2}$. The transformation for $a$ is defined by
\[ U_a |q_{i,1}\rangle = \cos\Bigl(\frac{2\pi k_i}{p}\Bigr) |q_{i,1}\rangle + \sin\Bigl(\frac{2\pi k_i}{p}\Bigr) |q_{i,2}\rangle, \qquad U_a |q_{i,2}\rangle = -\sin\Bigl(\frac{2\pi k_i}{p}\Bigr) |q_{i,1}\rangle + \cos\Bigl(\frac{2\pi k_i}{p}\Bigr) |q_{i,2}\rangle. \]
That is, on the states $|q_{i,1}\rangle$ and $|q_{i,2}\rangle$, the automaton $M$ acts in the same way as $M_{k_i}$.
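A hedged NumPy sketch of how the combined automaton behaves. The uniform superposition over the $|q_{i,1}\rangle$ created by $U_¢$ and the final rotation back by $U_\$$ (both described in the following paragraph) are modelled directly: with that start and end, the acceptance probability on $a^j$ works out to $\bigl(\frac{1}{d}\sum_{i}\cos\frac{2\pi j k_i}{p}\bigr)^2$, which is $1$ when $p$ divides $j$ and, for a random choice of the $k_i$ (mirroring the nonconstructive argument of [9]), small otherwise with high probability. The particular values of $p$, $\varepsilon$ and the choice of $d$ below are only illustrative.

```python
import numpy as np

p = 101                                       # an odd prime (example value)
eps = 0.05
rng = np.random.default_rng(0)
d = int(2 * np.log(2 * p) / eps)              # roughly O(log p) copies, in the spirit of Theorem 4.8
k = rng.integers(1, p, size=d)                # random k_1, ..., k_d in {1, ..., p-1}

def accept_prob(j):
    # Amplitude on |q_{1,1}> after reading cent, a^j, $: the inner product of the
    # uniform start state with its rotated image, i.e. the average of cos(2*pi*j*k_i/p).
    amp = np.mean(np.cos(2 * np.pi * j * k / p))
    return amp ** 2

print(accept_prob(0), accept_prob(p), accept_prob(7 * p))   # all equal 1 (p divides j)
print(max(accept_prob(j) for j in range(1, p)))              # small for j not divisible by p
```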


The starting state is $|q_{1,1}\rangle$. The transformation $U_¢$ can be any unitary transformation that satisfies $U_¢ |q_{1,1}\rangle = |\psi_0\rangle$, where $|\psi_0\rangle = \frac{1}{\sqrt{d}}(|q_{1,1}\rangle + |q_{2,1}\rangle + \cdots + |q_{d,1}\rangle)$, and $U_\$$ can be any unitary transformation that satisfies $U_\$ |\psi_0\rangle = |q_{1,1}\rangle$. The set of accepting states $Q_a$ consists of the single state $q_{1,1}$.

If $p$ divides $j$, then, by Lemma 4.1, the transformation $(U_a)^j$ maps $|q_{i,1}\rangle$ to itself. Since this happens for every $i$, the state $|\psi_0\rangle$ is also left unchanged by $(U_a)^j$. Thus, $U_\$$ maps $|\psi_0\rangle$ to $|q_{1,1}\rangle$, the only accepting state. If $j$ is not divisible by $p$, we have the following result.

Theorem 4.8 ([9]). There is a choice of $d = \frac{2 \log 2p}{\varepsilon}$ values $k_1, \ldots, k_d \in \{1, \ldots, p-1\}$ such that the MCQFA $M$ described above rejects all $a^j \notin L_p$ with probability at least $1 - \varepsilon$.

The proof of this theorem is nonconstructive. It was shown in [9] that a random choice of $k_1, \ldots, k_d$ works with high probability. An explicit construction of $k_1, \ldots, k_d$ for a slightly larger $d = O(\log^{2+3\varepsilon} p)$ is also given in [9]. Constructing an explicit set $k_1, \ldots, k_d$ such that $d = O(\log p)$ and $M$ recognises $\mathtt{MOD}_p$ is still an open problem, which is linked to estimating exponential sums in number theory [29].

Currently, it is also open what the biggest possible advantage of general 1QFAs over 1PFAs or 1DFAs is. By Theorem 4.6, 1QFAs with $n$ states can be simulated by 1DFAs with $2^{O(n^2)}$ states. On the other hand, [42] and [44] give a 1QFA with $n$ states for a language over a $2^{\Theta(n \log n)}$-symbol alphabet that requires $2^{\Theta(n \log n)}$ states on 1DFAs and 1PFAs. (The paper [42] also claims a similar result for a language over a 4-symbol alphabet, but the proof of that appears to be either incomplete or incorrect.)

There has been a substantial amount of further work on the state complexity of 1QFAs. We highlight some results:
1. any periodic language with period $n$ over a unary alphabet can be recognised by a $2\sqrt{6n}$-state MCQFA with bounded error (see [70]);
2. there exists a language with period $n$ over a one-symbol alphabet that requires $\sqrt{\frac{n}{\log n}}$ states to be recognised by MCQFAs (see [20]);
3. if the $l_1$ norm of the Fourier transform of the characteristic function of $L$ (for a periodic $L$ over a one-symbol alphabet) is small, a 1QFA with a smaller number of states is possible (see [22]).

There are also some negative results for restricted models. For example, there is a language that is recognised by a 1DFA with $n$ states, but requires $2^{\Omega(n)}$ states for NaQFAs [10]. Due to Theorem 4.4, no such result is possible for the more general models.

4.3. Bounded-error language recognition in restricted models. For language recognition with bounded error, we can put the models of QFAs in order from the weakest to the strongest:
\[ \mathrm{MCQFA} < \mathrm{LaQFA} < \mathrm{KWQFA} \le \mathrm{NaQFA} \le \mathrm{BiQFA} < \bigl\{\,\mathrm{1QFA},\ \mathrm{QFA\text{-}CL},\ \mathrm{QFA\text{-}A},\ \mathrm{CiQFA},\ \mathrm{1QCFA}\,\bigr\} = \mathrm{1DFA}. \tag{4} \]
(The five models in braces are all equivalent in power.)

It is open whether the inclusions $\mathrm{KWQFA} \le \mathrm{NaQFA}$ and $\mathrm{NaQFA} \le \mathrm{BiQFA}$ are strict. The class of languages recognised by MCQFAs with bounded error (BMO) is exactly the class of group languages, see [18], [30], and [71]. (See [81] for the definition and the details about group languages.) Belovs et al. [14] have shown that the power of MCQFAs can be increased by reading multiple symbols at a time. However, even in this case, they cannot recognise all regular languages, see [14] and [84].

Similarly to MCQFAs, a complete characterisation of the class of languages recognised by LaQFAs (BLaQAL) was obtained by algebraic techniques [5]. Namely, BLaQAL is equal to the class of languages whose syntactic monoid is in BG, i.e., the class of block groups. Therefore, $\mathrm{BMO} \subseteq \mathrm{BLaQAL}$. On the other hand, BLaQAL is a proper subset of the class of languages recognised by KWQFAs with bounded error, i.e., BMM, since BMM contains $a\{a,b\}^*$, see [5], which is not in BLaQAL.

The classes of languages recognisable by the other models of 1QFAs have not been characterised as precisely (and it is not clear whether they even have simple characterisations). While KWQFAs recognise more languages than MCQFAs and LaQFAs, they cannot recognise some regular languages [62]; for example, $\{a,b\}^* a \notin \mathrm{BMM}$. Researchers have also identified two different sets of properties of minimal 1DFAs that are called forbidden constructions (see [7], [30], and [49]) such that (i) the languages corresponding to the first type of forbidden construction cannot be recognised by KWQFAs with any error bound, and (ii) the languages corresponding to the second type of forbidden construction can be recognised by KWQFAs, but only with certain error bounds. Using forbidden constructions, it was also shown that BMM is not closed under intersection or union [8]. Moreover, KWQFAs can recognise more languages if the error bound gets closer to $\frac{1}{2}$, see [6]. NaQFAs and BiQFAs share many of the properties of KWQFAs in the bounded error setting, see [69] and [49]. In [49], it was shown that any language recognised by a BiQFA with bounded error is in the language class ER, which is a proper subset of REG.

The relative power of the various models has also been studied for subclasses of the regular languages. LaQFAs (and all models that are more powerful than LaQFAs) recognise all unary regular languages, since all unary regular languages are in the BG language variety [48]. MCQFAs cannot recognise all unary regular languages because they cannot recognise any finite language. If we restrict ourselves to R1 languages (another proper subclass of the regular languages), the computational powers of KWQFA, NaQFA, and BiQFA are equivalent [49].


4.4. Unbounded-error, nondeterminism, and alternation. In the unbounded error setting, the language recognition powers of 1PFAs and 1QFAs (and GFAs) are equivalent. This result follows by combining Theorems 4.2 and 4.3 and the simulation of GFAs by 1PFAs given in [95]. That is, due to Theorem 4.2, any language recognised by a 1PFA with cutpoint is also recognised by a 1QFA with cutpoint; due to Theorem 4.3, any language recognised by a 1QFA is also recognised by a GFA; and any language recognised by a GFA with cutpoint is stochastic [95]. Therefore, the class of languages recognised by 1QFAs with unbounded error is $\mathrm{uS} = \mathrm{S} \cup \mathrm{coS}$ [113]. Note that it is still an open problem whether $\mathrm{S}$ is closed under complementation (p. 158 of [80]). Unlike in the bounded error case, KWQFAs are sufficient to achieve equivalence with 1PFAs in the unbounded error setting [109].

For weaker models of QFAs, MCQFAs can recognise a proper subset of uS with unbounded error (see [71] and [17]), and the class of languages recognised by MCQFAs with cutpoint is not closed under complement. Moreover, if a language is recognised by an MCQFA and the isolation gap is $\frac{1}{p(n)}$ for some polynomial $p$, then it is a regular language [17]. For LaQFAs, it is still an open problem whether they can recognise every stochastic language with cutpoint.

Nondeterministic versions of quantum models are defined by fixing the error type to be positive one-sided unbounded error [3]. That is, all strings accepted with nonzero probability form the language recognised by the nondeterministic quantum model. Notice that the same definition also works for classical models. 1NQFAs are the nondeterministic version of 1QFAs. NQAL denotes the class of languages recognised by 1NQFAs, see [110]. The first result on the computational power of 1NQFAs was that some nonregular languages, such as $\mathtt{NEQ} = \{ w \in \{a,b\}^* \mid |w|_a \neq |w|_b \}$, are recognised by 1NQFAs, see [17] and [30]. A complete characterisation of NQAL was given in [110]: $\mathrm{NQAL} = \mathrm{S}^{\neq}$. Similarly to the unbounded-error case, the most restricted known 1NQFA model recognising all languages in $\mathrm{S}^{\neq}$ is the nondeterministic KWQFA. Moreover, since any unary language in $\mathrm{S}^{\neq}$ is regular (p. 89 of [88]), 1NQFAs and 1NFAs have the same computational power on unary languages.

Setting the error type to negative one-sided unbounded error ($M$ must accept all $x \in L$ with probability 1 and reject every $x \notin L$ with a non-zero probability), we obtain one-way universal QFAs (1UQFAs). A language $L$ is recognised by a 1UQFA if and only if its complement is recognisable by a 1NQFA.

Recently, alternating quantum models were introduced as a generalisation of the nondeterministic quantum model [105], and it was shown that one-way alternating QFAs with $\varepsilon$-moves (i.e., the automaton can spend more than one step on each symbol) can recognise any recursively enumerable language. Their one-way variants are also powerful: they can recognise the NP-complete problem SUBSETSUM and some nonregular and nonstochastic unary languages like $\{a^{n^2} \mid n \ge 0\}$ with only two alternations, and the PSPACE-complete problem SUBSETSUM-GAME with unlimited alternations, see [107] and [36].

Andris Ambainis and Abuzer Yakaryılmaz

4.5. Decidability and undecidability results. In this section, we consider decidability and complexity of various problems involving one-way QFAs whose transitions are defined using computable numbers or a subset of computable numbers (e.g., rational or algebraic numbers). 4.5.1. Equivalence and minimisation.Two automata A1 and A2 are said to be equivalent if fA1 .x/ D fA2 .x/ for all input strings x 2 † and they are said to be l -equivalent if fA1 .x/ D fA2 .x/ for all input strings x 2 † of length at most l . It is known that any two GFAs G1 and G1 with n1 and n2 states are equivalent if and only if they are .n1 C n2 1/-equivalent (see [80] and [97]). 3 Due to Theorem 4.3, any n-state 1QFA can be converted to an equivalent n2 -state GFA. Therefore, it follows that two 1QFAs M1 and M2 with n1 and n2 states are equivalent, if and only if they are .n21 C n22 1/equivalent (also see [30], [64], and [65].) Since there is a polynomial-time algorithm for checking the equivalence of two GFAs with rational amplitudes [97], this implies that the equivalence of two 1QFAs with rational amplitudes can be checked in polynomial time. The minimisation of a given 1QFA with algebraic numbers is decidable: there is an algorithm that takes a 1QFA as an input and then outputs a minimal size 1QFA that is equivalent to A, see [68]. Moreover, the algorithm runs in exponential space if the transitions of 1QFAs are rational numbers, see [67] and [82]. In [23], the problem of finding the minimum MCQFA for a unary periodic language (given by a vector that describes which strings belong to the language) was studied, and it was shown that the minimum MCQFA can be constructed in exponential time. 4.5.2. Emptiness problems and problems regarding isolated cutpoint. We continue with five emptiness problems and two problems regarding isolated cutpoints. Let A be an automaton and  be a cutpoint. 1. 2. 3. 4. 5. 6. 7.

Given A and  2 Œ0; 1, does there exist w 2 † such that fA .w/ > ? Given A and  2 Œ0; 1, does there exist w 2 † such that fA .w/ 6 ? Given A and  2 Œ0; 1, does there exist w 2 † such that fA .w/ D ? Given A and  2 Œ0; 1, does there exist w 2 † such that fA .w/ > ? Given A and  2 Œ0; 1, does there exist w 2 † such that fA .w/ < ? Given A and  2 .0; 1/, is the cutpoint isolated? Given A, is there a cutpoint 0 that is isolated?

All of these problems are known to be undecidable for 1PFAs with rational-valued transitions and rational cutpoints, see [80], [16], [19], and [26]. Since 1PFAs are a restricted form of 1QFAs, it follows that problems 1–6 are undecidable for general 1QFAs with rational-valued transitions and rational cutpoints, and problem 7 is undecidable for general 1QFAs with rational-valued transitions. On the other hand, the situation is not straightforward for the restricted one-way QFAs ([27] and [37]): problems 1–3 are undecidable for MCQFAs with rational-valued 3 The method presented in [97] was given for 1PFAs but it can be easily applied to any linearised one-way computational model (also see the bilinear machine given in [64]).

39. Automata and quantum computing

1473

transitions and rational cutpoints. However, problems 4–6 are decidable for MCQFAs with algebraic-valued transitions and algebraic cutpoints, and problem 7 is decidable for MCQFAs with algebraic-valued transitions. Furthermore, problems 1–3 remain undecidable for 13-state MCQFAs with rationalvalued transitions and an input alphabet of size 7 (see [37]) and for 25-state MCQFAs with rational-valued transitions and a binary alphabet [53]. If algebraic transitions are allowed, the number of states in undecidability results for MCQFAs can be decreased by 6. For KWQFAs, problems 4 and 5 are undecidable for algebraic-valued transitions and rational cutpoints due to [55], [109], [113], and [56]: given a fixed rational cutpoint  2 Œ0; 1, a 1PFA P with algebraic-valued transitions can be transformed into a KWQFA M with algebraic-valued transitions so that for any string x , if fP .x/ < , fP .x/ D , or fP .x/ > , then fM .x/ < , fM .x/ D , or fM .x/ > , respectively. Therefore, the undecidability results for 1PFAs imply similar undecidability results for KWQFAs regarding problems 4 and 5, see [55] and [56]. Currently, it is open whether problems 4 and 5 are decidable for LaQFAs and whether problems 6 and 7 are decidable for the models between MCQFAs and general 1QFAs. In [34], a promise version of the emptiness problem for 1PFAs was considered, with a promise that either the automaton accepts at least one input w 2 † with probability at least 1  or the accepting probability is at most  for all w 2 † , where  < 12 . It was shown that this problem is undecidable for 1PFAs with rational-valued transitions. This implies that the same problem is also undecidable for 1QFAs. Recently, the emptiness problem for alternating QFAs, i.e., whether the given automaton defines an empty set or not, was examined in [36], with the following results: (i) the problem is decidable for NQFAs with algebraic-valued transitions on general alphabet and UQFAs with computable-valued transitions on unary alphabets, but (ii) it is undecidable for UQFAs on general alphabets and alternating 1QFAs on unary alphabets, where both are defined with rational-valued transitions. 4.5.3. Other problems. In [80] (Theorem 6.17 on p. 190), the problem of deciding whether the stochastic language recognised by a 1PFA P with cutpoint  is regular (or context-free) was shown to be undecidable for 1PFAs with rational-valued transitions and rational  2 Œ0; 1/. By the discussion above, the same problem is undecidable for 1QFAs with rational-valued transitions and rational cutpoints and KWQFAs with algebraic-valued transitions and rational cutpoints. A k -QFA classifier [23] is a system of k QFAs .M1 ; : : : ; Mk / on an alphabet † such that each Mi accepts at least one string with probability bigger than 21 and there is no string which is accepted by both Mi and Mj (for some i; j with i ¤ j ) with probability bigger than 12 . A complete k -QFA classifier is a k -QFA classifier such that each string is accepted by exactly one QFA with probability bigger than 12 . It was shown that [23] for any k > 2, it is decidable whether .M1 ; : : : ; Mk / is a k -QFA classifier. On the other hand, it is undecidable whether .M1 ; : : : ; Mk / is a complete k -QFA classifier.


In [25], two polynomial-time algorithms were given for KWQFAs on a unary alphabet $\Sigma = \{a\}$ with rational-valued transitions. A KWQFA $M$ on a unary alphabet can be viewed as a quantum Markov chain. Its non-halting subspace then decomposes into an ergodic and a transient subspace (see [7] and [25] for the details). The first algorithm of [25] computes the dimensions of these subspaces. The second algorithm decides whether $f_M$ has a period $d > 0$, i.e., whether $f_M(a^k) = f_M(a^{k+d})$ for all $k \in \mathbb{N}$.
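As a small illustration of the property decided by the second algorithm, the following sketch (Python; the function name, tolerance, and sample values are our own, and this brute-force window check is not the polynomial-time algorithm of [25]) tests whether a table of acceptance probabilities $f_M(a^k)$ is consistent with a candidate period $d$.

```python
def looks_periodic(f_values, d, tol=1e-12):
    """Check f_M(a^k) == f_M(a^{k+d}) for all k covered by the table.

    f_values[k] is assumed to hold the acceptance probability f_M(a^k);
    inspecting a finite window can refute but not prove periodicity, so
    this merely illustrates the property defined above.
    """
    return all(abs(f_values[k] - f_values[k + d]) <= tol
               for k in range(len(f_values) - d))

# Made-up sample behaviour with period 3:
sample = [0.5, 0.2, 0.8] * 10
print(looks_periodic(sample, 3), looks_periodic(sample, 2))   # True False
```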

5. Two-way QFAs

A two-way model has a read-only input tape on which the given input, say $w$, is written between ¢ (the left endmarker) and $ (the right endmarker). The tape square on which $\tilde{w}_i$ is written is indexed by $i$, where $1 \leq i \leq |\tilde{w}|$. A two-way QFA can be defined either as a fully quantum machine or as a classical machine augmented with a finite-size quantum register (memory). The former model is known as two-way QFAs with quantum head (2QFAs), and the latter model is known as two-way QFAs with classical head (2QCFAs).

5.1. 2-way QFAs with classical head. A 2-way QFA with classical head (2QCFA), also known as a two-way finite automaton with quantum and classical states [11], is a 2-way automaton augmented with a quantum register. The computation is governed classically. In each step, the classical part applies a quantum operator to the quantum register and then updates itself, also taking into account any measurement outcome obtained from the quantum part. Formally, a 2QCFA $M$ is an 8-tuple

$$M = (S, Q, \Sigma, \delta, q_1, s_1, s_a, s_r),$$

where

• $S$ is the set of states for the classical part and $Q$ is the set of basis states for the quantum part;

• $\delta$ is a transition function (consisting of $\delta_c$ and $\delta_q$, which govern the classical part and the quantum part of the machine, described in more detail below);

• $s_1 \in S$ and $q_1 \in Q$ are the initial states of the classical and the quantum part, respectively;

• $s_a \in S$ and $s_r \in S$ ($s_a \neq s_r$) are the accepting and the rejecting states, respectively.

Each step of $M$ has two stages: a quantum transition ($\delta_q$) and then a classical transition ($\delta_c$):

• the classical state $s \in S \setminus \{s_a, s_r\}$ and the input symbol $\sigma \in \tilde{\Sigma}$ determine an action $\delta_q(s, \sigma)$ that is performed on the quantum register. This action can be a unitary transformation or a projective measurement;

• then, the computation is continued classically. If $\delta_q(s, \sigma)$ was a unitary transformation, then the classical transition $\delta_c(s, \sigma)$ is an element of $S \times \{-1, 0, +1\}$, specifying a new classical state and a movement of the tape head (left, stationary, or right, respectively).


If $\delta_q(s, \sigma)$ is a measurement, the classical transition is again an element of $S \times \{-1, 0, +1\}$, but it is now defined on a triple $(s, \sigma, \tau)$ that also includes the outcome $\tau$ of the measurement on the quantum part, i.e., $\delta_c(s, \sigma, \tau) \in S \times \{-1, 0, +1\}$.

At the beginning of the computation, the head is on the left endmarker, the classical state is $s_1$, and the quantum state is $|q_1\rangle$. The computation is terminated and the input is accepted (resp., rejected) when $M$ enters the state $s_a$ (resp., $s_r$). It is obvious that any 2PFA can be simulated by a 2QCFA. A particular case of a 2QCFA is a 1QFA with restart: it reads the input from left to right in one-way mode, and if the computation does not halt (does not enter an accepting or rejecting state), the computation is restarted after reading the right endmarker [111]. Its probabilistic counterpart is the 1PFA with restart.

5.1.1. Bounded-error language recognition. Unlike the one-way models, 2QCFAs are more powerful than their classical counterparts (2PFAs) [11]:

• the language $\mathtt{EQ} = \{w \in \{a,b\}^* : |w|_a = |w|_b\}$ is recognised by 2QCFAs in polynomial expected time [11], but can be recognised by 2PFAs only in exponential time, see [41] and [39];

• the language $\mathtt{PAL} = \{w \in \{a,b\}^* : w = w^r\}$ can be recognised by 2QCFAs in exponential expected time, but cannot be recognised by 2PFAs (and, more generally, by Turing machines with a working tape of size $o(\log n)$) at all.

We now describe 2QCFAs for these languages. Both of them execute an infinite loop with two parts: the first part is quantum and the second part is classical.

2QCFA for EQ. The 2QCFA $M_1$ has two quantum states $\{q_1, q_2\}$.

• In the quantum part, $M_1$ starts in state $|q_1\rangle$ in its quantum register and reads $w$ from left to right. Each time $M_1$ reads an $a$, it applies a rotation by the angle $\sqrt{2}\pi$ in the real $|q_1\rangle$–$|q_2\rangle$ plane in the counterclockwise direction. When $M_1$ reads a $b$, it applies a rotation by $\sqrt{2}\pi$ in the clockwise direction. When $M_1$ arrives at the right endmarker, the quantum register is measured in the computational basis, and the input is rejected if $|q_2\rangle$ is observed. If $w \in \mathtt{EQ}$, the rotations in both directions cancel out and the final quantum state is exactly $|q_1\rangle$; therefore, $w$ is never rejected. If $w \notin \mathtt{EQ}$, then the final quantum state is always away from the $|q_1\rangle$-axis, and the resulting rejecting probability can be bounded from below by $\frac{1}{2|w|^2}$ (a nice property of the rotation angle $\sqrt{2}\pi$). See Figure 2 for some details of the quantum phase.

• In the classical part, $M_1$ performs a classical procedure (two consecutive random walks on the string $w$ and then $k$ consecutive coin flips) that results in accepting $w$ with probability $\frac{1}{2^k|w|^2}$ for some $k > 1$, using expected time $O(|w|^2)$.

Figure 2. Some details of the quantum phase, partially taken from [90]: starting from the initial state $|q_1\rangle$, a single rotation by the angle $\sqrt{2}\pi$, the state $\cos(r\sqrt{2}\pi)\,|q_1\rangle + \sin(r\sqrt{2}\pi)\,|q_2\rangle$ after $r$ rotations, and the lower bound $|\sin(r\sqrt{2}\pi)| \geq \frac{1}{\sqrt{2}\,r}$ on its distance to the $|q_1\rangle$-axis.

If $w \in \mathtt{EQ}$, then $w$ is accepted with probability 1 in $O(|w|^4)$ expected time, through the classical part of the loop. If $w \notin \mathtt{EQ}$, the probability of rejection in the quantum part of the loop is larger than the probability of accepting in the classical part, i.e., $w$ is rejected with probability at least $\frac{2^k}{2^k+2} > \frac{1}{2}$ in $O(|w|^2)$ expected time.
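To make the quantum part of the loop concrete, here is a minimal numerical sketch (Python; the function name and the sample strings are ours, and it simulates only one pass of the quantum phase, not the full 2QCFA). It rotates the two-dimensional register by $\sqrt{2}\pi$ per $a$ and $-\sqrt{2}\pi$ per $b$ and compares the resulting rejection probability with the $\frac{1}{2|w|^2}$ bound quoted above.

```python
import math

SQRT2_PI = math.sqrt(2) * math.pi  # rotation angle used by M1 for each input symbol

def quantum_phase_reject_prob(w: str) -> float:
    """Probability that one pass of M1's quantum phase rejects w.

    The register starts in |q1>, is rotated by +sqrt(2)*pi for every 'a'
    and by -sqrt(2)*pi for every 'b'; observing |q2> at the right
    endmarker rejects the input.
    """
    angle = sum(SQRT2_PI if c == 'a' else -SQRT2_PI for c in w)
    return math.sin(angle) ** 2  # amplitude of |q2> is sin(angle)

# Members of EQ are never rejected; non-members are rejected with
# probability at least 1/(2|w|^2).
for w in ["ab", "aabb", "aab", "abbba", "a" * 7 + "b" * 4]:
    p = quantum_phase_reject_prob(w)
    in_eq = w.count('a') == w.count('b')
    bound = 0.0 if in_eq else 1 / (2 * len(w) ** 2)
    print(f"{w:12s} in EQ: {in_eq!s:5s} reject prob: {p:.6f} (bound {bound:.6f})")
```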

2QCFA for PAL. The 2QCFA $M_2$ has three quantum states $\{q_1, q_2, q_3\}$.

• It starts the quantum phase in state $|q_1\rangle$ and reads the input from left to right twice. In the first scan, it applies

$$U_a = \frac{1}{5}\begin{pmatrix} 4 & 3 & 0 \\ -3 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \qquad\text{and}\qquad U_b = \frac{1}{5}\begin{pmatrix} 4 & 0 & 3 \\ 0 & 5 & 0 \\ -3 & 0 & 4 \end{pmatrix}$$

for each $a$ and $b$, respectively. In the second scan, it applies the inverses of these matrices, respectively. Then, the quantum register is fully measured with respect to $\{|q_1\rangle, |q_2\rangle, |q_3\rangle\}$ and the input is rejected if the result is not $|q_1\rangle$. So, if $w$ is a palindrome, the state ends in $|q_1\rangle$, i.e.,

$$U_{w_{|w|}}^{-1} \cdots U_{w_1}^{-1}\, U_{w_{|w|}} \cdots U_{w_2} U_{w_1} |q_1\rangle = |q_1\rangle,$$

and so $w$ is not rejected. Otherwise, the computation does not return exactly to the initial quantum state; the final state is away from $|q_1\rangle$ by at least a value exponentially small in the length of the input, and the input is rejected with probability at least $25^{-|w|}$ (due to certain properties of $U_a$ and $U_b$; see [11] for the details).

• Similarly to $M_1$, in the classical phase the input is accepted with a sufficiently small probability, namely $2^{-4k|w|}$ for some $k > 1$.

Thus, $M_2$ accepts $w$ with probability 1 if $w \in \mathtt{PAL}$, and otherwise rejects $w$ with probability at least $\frac{16^k}{16^k+25} > \frac{1}{2}$.
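The next sketch (Python with exact rational arithmetic; our own illustration, not taken from [11]) traces the two scans of $M_2$'s quantum phase for a few strings: the first loop applies $U_a$/$U_b$, the second applies their transposes (which equal their inverses), and the final measurement rejects unless the register is back in $|q_1\rangle$.

```python
from fractions import Fraction as F

# The rational 3x3 matrices used by M2; their inverses are their transposes.
U = {
    'a': [[F(4, 5), F(3, 5), F(0)], [F(-3, 5), F(4, 5), F(0)], [F(0), F(0), F(1)]],
    'b': [[F(4, 5), F(0), F(3, 5)], [F(0), F(1), F(0)], [F(-3, 5), F(0), F(4, 5)]],
}

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def pal_quantum_phase_reject_prob(w):
    v = [F(1), F(0), F(0)]             # start in |q1>
    for c in w:                        # first scan: apply U_a / U_b
        v = apply(U[c], v)
    for c in w:                        # second scan: apply the inverses, again left to right
        v = apply(transpose(U[c]), v)
    return 1 - v[0] ** 2               # reject unless the measurement yields |q1>

for w in ["abba", "aba", "ab", "aab", "abab"]:
    p = pal_quantum_phase_reject_prob(w)
    print(w, "palindrome" if w == w[::-1] else "not palindrome", "reject prob =", p)
```

Palindromes give rejection probability exactly 0 in this simulation, while the non-palindromes in the sample stay above the $25^{-|w|}$ bound mentioned above.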


We note that $M_2$ only uses rational-valued amplitudes. On the other hand, allowing arbitrary real numbers does not help 2PFAs to recognise PAL, see [40]. These results were generalised in [111] by showing that all languages in $S^{\neq}_{\mathbb{Q}} \cup S^{=}_{\mathbb{Q}}$ can be recognised by KWQFAs with restart (and so by 2QCFAs) with bounded error. $S^{=}_{\mathbb{Q}}$ contains many well-known languages: EQ, PAL, $\mathtt{TWIN} = \{wcw \mid w \in \{a,b\}^*\}$, $\mathtt{SQUARE} = \{a^n b^n \mid n \geq 0\}$, $\mathtt{POWER} = \{a^n b^{2^n} \mid n \geq 0\}$, the word problem of finitely generated free groups, all polynomial languages defined in [96], and

$$\mathtt{MULT} = \{x\#y\#z \mid x, y, z \text{ are natural numbers in binary notation and } x \cdot y = z\}.$$

Note that KWQFAs with restart are the most restricted of the known two-way QFA models that are more powerful than their classical counterparts (1PFAs with restart).

5.1.2. Succinctness results. 2QFAs can also be more succinct than their one-way versions and their classical counterparts, see [111] and [118]. The main result is that, for any $m > 0$, $\overline{\mathtt{MOD}_m}$ (the complement of the language $\mathtt{MOD}_m$ defined in (3)) can be recognised by a 1QFA with restart (and so by a 2QCFA) with a constant number of states for any one-sided error bound. On the other hand, the number of states required by bounded-error 2PFAs increases as $m$ gets bigger. This also implies a similar gap between 2QCFAs and 1QFAs: due to Theorem 4.6, a 1QFA with a constant number of states can be simulated by a 1DFA (and, hence, by a 2PFA) with a constant number of states (where the constant may be exponentially larger).

5.1.3. Other results. In [86], a simulation of a restricted bounded-error 2QCFA by weighted automata was presented. No other "non-trivial" upper bound is known for bounded-error 2QCFAs. On the other hand, it was shown that, if we allow arbitrary transition amplitudes (including non-computable ones), bounded-error 2QCFAs can recognise uncountably many languages in polynomial time [91]. This is evidence that 2QFAs can be very sensitive to the type of numbers that we use as transition amplitudes.

5.2. 2-way QFAs with quantum head. The definition of 2QFAs with quantum head is technically more involved than that of 2QCFAs. Because of that, we only provide an informal definition and an example of a 2QFA, and refer the reader to [113] for the remaining details. Let $M$ be an $n$-state 2-way automaton with the set of states $\{q_1, \ldots, q_n\}$ and let $w$ be an input string. The possible configurations of $M$ on the input $w$ can be described by pairs $(q_i, j)$ consisting of the automaton's internal state $q_i$, for $1 \leq i \leq n$, and the location $j$, for $1 \leq j \leq |\tilde{w}|$, in the input string which the automaton is currently reading. A probabilistic automaton (2PFA) can be in a probability distribution over the classical configurations during its computation. A 2QFA can be in a quantum state with basis states $|q_i, j\rangle$. The evolution of a 2QFA is governed by quantum operators (measurements, unitary operators, superoperators, etc.).


The automaton $M$ evolves according to a transition rule that depends on the current state $q_i$ and the symbol $\tilde{w}_j$ at the current location. For example, if $M$ evolves unitarily, we have local transitions of the form

$$|q, j\rangle \;\longrightarrow\; \sum_{\substack{q' \in \{q_1, \ldots, q_n\} \\ c \in \{-1, 0, +1\}}} \alpha^{(q, \tilde{w}_j)}_{q', c}\, |q', j + c\rangle, \qquad (5)$$

where $c \in \{-1, 0, +1\}$ corresponds to moving left, staying in place, or moving right, and the transition amplitudes $\alpha^{(q, \tilde{w}_j)}_{q', c}$ depend on the state $q$ before the transition and the symbol $\tilde{w}_j$ that the automaton reads. By combining these transitions for all $q$ and $j$, we get an operator $U_M(w)$ that describes the evolution of the whole state space of $M$. This operator $U_M(w)$ must be unitary for any $w \in \Sigma^*$. This implies a finite list of constraints on the amplitudes $\alpha^{(q, \tilde{w}_j)}_{q', c}$ in the local transition rules (5), known as the well-formedness conditions, see [103] and [113]. To stop the computation, we perform a partial measurement on the QFA's quantum state after each application of $U_M(w)$, with respect to the partition of the basis states into the set of accepting states $Q_a$, the set of rejecting states $Q_r$, and the set of non-halting states $Q_n$. If the result is $Q_a$ (resp., $Q_r$), the computation is terminated and the input is accepted (resp., rejected). Otherwise, the computation continues.

The model above is the first 2QFA model, called the two-way KWQFA (2KWQFA), see [62]. Although some interesting results have been obtained based on this model, it is still open whether 2KWQFAs can simulate 2PFAs. The Hilbert space can also evolve by superoperators (see [113]), and then 2QFAs can simulate both 2QCFAs and 2PFAs exactly. If the head of a 2QFA is not allowed to move left, then we obtain a 1.5-way QFA (1.5QFA). Here "1.5" emphasises that the head is quantum, and so it can be in more than one position during the computation.

5.2.1. Bounded-error language recognition. As described above, 2QFAs and 1.5QFAs can be in a superposition over different locations of the input tape, instead of only being in a superposition of states. This enables them to use the length of the input as a counter. We present a linear-time 1.5-way KWQFA for the language EQ using this idea.

1.5-way KWQFA for EQ. Our automaton $M$ has 5 states $\{q_1, q_2, q_w, q_a, q_r\}$, with $q_1$ as the starting state. To determine whether the input should be accepted, we use the following measurement: if the computation is in a configuration containing $q_a$ (resp., $q_r$), then the input is accepted (resp., rejected); otherwise, the computation goes on. The transitions are defined as follows (all transitions that are omitted below are not significant, and so they can be chosen arbitrarily, as long as the related operator is unitary):


• on the left endmarker, the starting state $|q_1\rangle$ is mapped to $\frac{1}{\sqrt{2}}|q_1\rangle + \frac{1}{\sqrt{2}}|q_2\rangle$, and the head of $M$ moves one square to the right;

• on symbol $a$, $M$ performs the mapping $|q_1\rangle \to |q_w\rangle$, $|q_w\rangle \to |q_1\rangle$, $|q_2\rangle \to |q_2\rangle$, staying in place if the state after the transformation is $|q_w\rangle$ and moving to the right otherwise;

• on symbol $b$, $M$ performs the mapping $|q_2\rangle \to |q_w\rangle$, $|q_w\rangle \to |q_2\rangle$, $|q_1\rangle \to |q_1\rangle$, staying in place if the state after the transformation is $|q_w\rangle$ and moving to the right otherwise;

• on the right endmarker, $M$ maps $|q_1\rangle \to \frac{1}{\sqrt{2}}|q_a\rangle + \frac{1}{\sqrt{2}}|q_r\rangle$ and $|q_2\rangle \to \frac{1}{\sqrt{2}}|q_a\rangle - \frac{1}{\sqrt{2}}|q_r\rangle$.

An example run of the machine is given in Figure 3, in which each arrow represents a single step. Note that after the second step the two branches of the computation occupy different squares of the tape until the end of the computation, where they meet again and can therefore affect each other.

Figure 3. An example run of $M$.

To analyze how $M$ works, we observe that, on the left endmarker, it enters the state $\frac{1}{\sqrt{2}}|q_1\rangle + \frac{1}{\sqrt{2}}|q_2\rangle$. Every $a$ symbol results in the $q_1$ component moving to the right in two steps and the $q_2$ component moving to the right in one step. Every $b$ results in $q_1$ moving to the right in one step and $q_2$ moving to the right in two steps. If $|w|_a = |w|_b$, the automaton reaches the right endmarker at the same time in $q_1$ and $q_2$. If $|w|_a \neq |w|_b$, one of the components reaches the endmarker earlier than the other. In the first case, applying the transformation on the right endmarker gives the configuration

$$\frac{1}{2}|q_a, |\tilde{w}|\rangle + \frac{1}{2}|q_r, |\tilde{w}|\rangle + \frac{1}{2}|q_a, |\tilde{w}|\rangle - \frac{1}{2}|q_r, |\tilde{w}|\rangle = |q_a, |\tilde{w}|\rangle,$$

so the input is accepted with probability 1. In the second case, the transformations

$$|q_1\rangle \to \frac{1}{\sqrt{2}}|q_a\rangle + \frac{1}{\sqrt{2}}|q_r\rangle \qquad\text{and}\qquad |q_2\rangle \to \frac{1}{\sqrt{2}}|q_a\rangle - \frac{1}{\sqrt{2}}|q_r\rangle$$

are applied on the right endmarker at different times and, in each case, $q_a$ and $q_r$ are obtained (observed) with equal probability. Thus, the input is accepted with probability $\frac{1}{2}$.
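As an illustration of how a superposition over head positions behaves, the sketch below (Python; our own illustration, with hypothetical function names) simulates the five-state machine on small inputs by tracking amplitudes of (state, position) configurations, measuring away accepting and rejecting configurations after every step exactly as described above. Transitions that the text leaves unspecified are never used on these inputs.

```python
from collections import defaultdict
from math import sqrt

def run_15kwqfa_eq(w: str, max_steps: int = 10_000):
    """Simulate the 5-state 1.5-way KWQFA for EQ described above.

    Configurations are (state, head position) pairs on the tape <w>;
    amplitudes of halting configurations are measured away after every
    step and their probabilities accumulated (no renormalisation).
    """
    tape = '<' + w + '>'                 # '<' = left endmarker, '>' = right endmarker
    amps = defaultdict(float, {('q1', 0): 1.0})
    p_acc = p_rej = 0.0
    r = 1 / sqrt(2)
    for _ in range(max_steps):
        new = defaultdict(float)
        for (q, i), a in amps.items():
            s = tape[i]
            if s == '<' and q == 'q1':                 # left endmarker
                new[('q1', i + 1)] += a * r
                new[('q2', i + 1)] += a * r
            elif s == '>':                             # right endmarker
                if q == 'q1':
                    new[('qa', i)] += a * r
                    new[('qr', i)] += a * r
                elif q == 'q2':
                    new[('qa', i)] += a * r
                    new[('qr', i)] -= a * r
            elif s == 'a':
                if q == 'q1':   new[('qw', i)] += a    # stay in place
                elif q == 'qw': new[('q1', i + 1)] += a
                elif q == 'q2': new[('q2', i + 1)] += a
            elif s == 'b':
                if q == 'q2':   new[('qw', i)] += a    # stay in place
                elif q == 'qw': new[('q2', i + 1)] += a
                elif q == 'q1': new[('q1', i + 1)] += a
        # partial measurement: halt on accepting / rejecting states
        amps = defaultdict(float)
        for (q, i), a in new.items():
            if q == 'qa':   p_acc += a * a
            elif q == 'qr': p_rej += a * a
            elif abs(a) > 1e-12: amps[(q, i)] = a
        if not amps:
            break
    return p_acc, p_rej

for w in ["aabb", "ab", "ba", "aab", "abb", ""]:
    print(w or "(empty)", run_15kwqfa_eq(w))
```

On these inputs, the members of EQ are accepted with probability 1, while the other strings yield acceptance and rejection probabilities of 1/2 each, matching the analysis above.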


The probability of accepting $x \notin \mathtt{EQ}$ can be decreased from $\frac{1}{2}$ to $\frac{1}{k}$, for arbitrary $k$ (see [62] and [108]), with the number of states in the automaton increasing to $O(k^2)$ using the construction of [62], and to $O(\log^c k)$ using the construction of [108]. Currently we do not know any language separating 2QCFAs and 2QFAs, or any language requiring exponential expected time from two-way QFAs. Also, even though 1.5KWQFAs can recognise non-regular languages (such as EQ), it is not known whether they can recognise all regular languages with bounded error. It is also open whether 2QFAs can recognise a nonstochastic language with bounded error.

5.2.2. Unbounded-error language recognition. The superiority of 2QFAs also holds in the unbounded-error case. The language

$$\mathtt{NH} = \Big\{ a^x b\, a^{y_1} b\, a^{y_2} b \cdots a^{y_t} b \;\Big|\; x, t, y_1, \ldots, y_t \in \mathbb{Z}^+ \text{ and } \exists k\ (1 \leq k \leq t),\ x = \sum_{i=1}^{k} y_i \Big\}$$

is nonstochastic [73], but is recognised by 1.5-way KWQFAs, see [109] and [113] (by a generalisation of the technique used by the 1.5-way KWQFA for EQ in the previous section). This shows their superiority over probabilistic automata, because 2PFAs cannot recognise any nonstochastic language [59]. In fact, 1.5-way KWQFAs can recognise most of the nonstochastic languages defined in the literature, see [38] and [45]. We note that the best known upper bound (in terms of complexity classes) for unbounded-error 2QFAs (with algebraic-valued transitions) is $\mathsf{P} \cap \mathsf{L}^2$, see [100]. (Also see [117], where certain relations and upper bounds were given on the running time of 2KWQFAs under different recognition modes.)

5.2.3. Undecidability of the emptiness problem. 1.5KWQFAs have the capability of checking successive equalities, e.g., $a^n b a^n b a^n b a^n \cdots$ (see [104]). This leads to the following result: the emptiness problem for one-way KWQFAs (with algebraic-valued transitions) is undecidable [4]. This is shown by a reduction from the halting problem for one-register machines, which is known to be undecidable.

6. Other models and results

Interactive proof systems. An interactive proof system consists of two parties: the prover, with unlimited computational power, and the verifier, who is computationally limited. Both parties are given an input $w$ and can send messages to one another. We say that a language $L$ has an interactive proof system if there is a strategy for the verifier with the following two properties:

a. if $w \in L$, there exists a strategy for the prover such that, given a prover who acts according to this strategy, the verifier accepts with probability at least $\frac{2}{3}$;

b. if $w \notin L$, then, for every prover's strategy, the verifier rejects with probability at least $\frac{2}{3}$.


$\mathrm{QIP}(M)$ denotes the class of all languages that have quantum interactive proof systems with verifiers of type $M$. Obviously, if $L$ is recognisable by a type-$M$ machine, it has a trivial interactive proof system in which the verifier runs its algorithm for recognising $L$ and disregards the prover. Thus, $\mathrm{QIP}(M)$ can be much larger than $M$. For finite automata, BMM is smaller than REG, but $\mathrm{QIP}(\mathrm{KWQFA}) = \mathrm{REG}$. For 2-way automata, we have the upper bound QIPCz(poly-time 2KWQFA) $\subseteq$ NP, see [77]. In the multiprover version of this model (denoted $\mathrm{QMIP}(\cdot)$), the verifier can communicate with multiple provers, and it is guaranteed that the provers do not interact with one another. Then, we know [115]:

• CFL $\subseteq$ QMIPCz(KWQFA) $\subseteq$ NE,

• QMIPCz(poly-time 2KWQFA) $=$ NEXP, and

• every recursively enumerable language is in QMIPCz(2KWQFA),

where CFL and NE are the classes of context-free languages and of languages recognisable in time $2^{O(n)}$, respectively. It is interesting to compare this with the classical case, where NEXP is equal to the class of all languages that have interactive proofs with a polynomial-time Turing machine as the verifier (which is a much stronger model than a 2KWQFA).

An Arthur–Merlin (AM) proof system is an interactive proof system in which all of the verifier's probabilistic choices are visible to the prover. Thus, the prover has complete information about the computational state of the verifier. In the quantum version, the verifier has a quantum register, and the outcome is sent to the prover whenever it is measured (so that the prover still has complete information about the state of the verifier). If the verifier is a 2QCFA and all the transitions are restricted to rational numbers, we have the following results [106]:

• $\mathrm{AM}_{\mathrm{Q}}$(2QCFA) contains $\mathrm{ASPACE}(n) \cup \mathrm{PSPACE}$ and some NEXP-complete languages (the proof that $\mathrm{AM}_{\mathrm{Q}}$(2QCFA) contains PSPACE will appear in an extended version of [106]), and

• every recursively enumerable language is in weak-$\mathrm{AM}_{\mathrm{Q}}$(2QCFA), where the prefix "weak-" denotes the class of languages having a proof system in which non-members need not be rejected with high probability.

The first result should be contrasted with the fact that weak-$\mathrm{AM}$(2PFA) is a proper subset of P (see [40]), and the second result should be compared with the fact that every recursively enumerable language is in weak-$\mathrm{IP}$(2PFA) (see [34]), which is a similar result but uses a stronger computational model: IP instead of AM. If we allow arbitrary real numbers as amplitudes, $\mathrm{AM}_{\mathrm{R}}$(2QCFA) contains all languages, and the corresponding class with computable real amplitudes is equivalent to the class of recursive languages [91]. Moreover, it was shown that $\mathrm{AM}_{\mathrm{A}}$(poly-time 2QCFA) contains a language that is not in $\mathrm{AM}_{\mathrm{R}}$(poly-time 2PFA), see [120]. Before closing this item, we also refer to [98] and [78] for further results on weaker QFA verifiers in different set-ups.


Debate systems. A debate system is a generalisation of an IP system, in which the verifier interacts with a prover (who tries to convince the verifier that the input $w \in L$) and a refuter (who tries to prove that the input $w \notin L$). If $w \in L$, there should be a strategy for the prover such that, regardless of the refuter's strategy, the verifier accepts with probability at least $\frac{2}{3}$. If $w \notin L$, the refuter should have a strategy such that, for every prover's strategy, the verifier rejects with probability at least $\frac{2}{3}$. The debate version of $\mathrm{AM}_{\mathrm{Q}}$(2QCFA) has been shown to contain all recursive languages, see [114]. In contrast, the debate version of $\mathrm{AM}_{\mathrm{Q}}$(2PFA) is a subset of NP, see [33].

Postselection. Postselection is the ability to discard some outcomes at the end of the computation and make the decision based on the surviving outcomes (even though these outcomes might occur with a very small probability). For example, if we have a QFA with 3 basis states $|q_1\rangle, |q_2\rangle, |q_3\rangle$, we could discard the $|q_3\rangle$ part of the final state of the QFA and make the accept/reject decision based on the part of the final state that consists of $|q_1\rangle$ and $|q_2\rangle$. Postselection is not possible physically, but it is interesting as a thought experiment. It has been studied for both quantum circuits [1] and quantum automata, see [93] and [112]. It has been shown that 1QFAs (1PFAs) with postselection have the same computational power as 1QFAs (1PFAs) with restart.

Closed timelike curves. Similarly to postselection, closed timelike curves (CTCs) are a model which is impossible physically, but interesting as a thought experiment. A CTC is a device that allows sending information back in time, to previous steps of the computation, as long as this does not result in inconsistencies in the computation. In [92] and [89], 1QFAs and 1PFAs with the capability of sending one classical bit from the end of the computation to the beginning of the computation through a CTC have been examined. Surprisingly, it was shown that such 1QFAs can simulate 1QFAs with postselection, and vice versa, when their transitions are restricted to rational numbers. The same result was obtained also for 1PFAs, even for arbitrary transition probabilities.

Promise problems. Promise problems are computational tasks where the goal is to separate two languages $L_1, L_2$ with $L_1 \cap L_2 = \emptyset$ (the automaton must accept all $w \in L_1$, reject all $w \in L_2$, and is allowed to output any answer for $w \notin L_1 \cup L_2$). Promise problems allow us to show separations between types of automata that are equivalent in the standard setting of recognising languages. For example, in the case of exact computation (no error allowed), 1QFAs cannot be more concise than 1PFAs [61]. On the other hand, for promise problems, the superiority of 1QFAs over 1PFAs can be unbounded [12]: there exists an infinite family of promise problems that can be solved exactly by tuning the transition amplitudes of a two-state MCQFA, while the size of the corresponding classical automata grows to infinity, see [87] and [47]. Recently, this result was generalised in [51] and [24], and further succinctness results were given in [121], [46], [52], and [118].


Several results about the computational power of QFAs on promise problems have been obtained in [87] and [46]. For example, there is a binary promise problem solvable by a Las Vegas 1QFA and a unary promise problem solvable by a bounded-error 1QFA, but neither of them can be solved by any bounded-error 1PFA. (For language recognition, these one-way models are of equal power and recognise exactly REG.) Moreover, there is a promise problem solvable by an exact 2QCFA in exponential expected time, but not by any bounded-error sublogarithmic-space probabilistic Turing machine. No similar example is known for language recognition. Additionally, in [119], a particular subset of promise problems solvable by one-way classical and quantum models was considered, and certain separation results were obtained.

Advice. In computation with advice, the automaton is provided with extra information, called advice, which depends on the length of the input $w$ but not on the particular $w$. Advice is a well-known notion in complexity theory, but it has not been studied much in the setting of QFAs. The first model was introduced in [116], but it was based on KWQFAs. (Note also that the usage of advice defined in [116] is different from the usual definition for classical finite automata [35].) As a result, some regular languages were shown not to be recognised by this model, even with advice of up to linear size. Recently, this framework was generalised in [63], which can be a good starting point for studying QFAs with advice.

Determining the bias of a coin. In [2], the state complexities of 1QFAs and 1PFAs were compared for the problem of determining the bias of a coin, if it is known that the coin lands "heads" with probability either $p$ or $p + \epsilon$ for some known $p$ and $\epsilon$. A 1QFA can distinguish between the two cases with a number of states that is independent of $p$ and $\epsilon$, while any bounded-error 1PFA must have $\frac{p(1-p)}{\epsilon}$ states [2]. Recently, it was also proven [60] that there is no 1QFA having the following property: simultaneously for every $\epsilon \in [-\frac{1}{2}, \frac{1}{2}] \setminus \{0\}$, given access to an infinite sequence of coin tosses, if the coin is $(\frac{1}{2} + \epsilon)$-biassed then the automaton spends at least $\frac{2}{3}$ of its time guessing "biassed," and if the coin is fair then the automaton spends at least $\frac{2}{3}$ of its time guessing "fair."

Learning theory. The problem of learning probability distributions produced by QFA sources, i.e., identifying an unknown QFA from examples of its behaviour, was studied in [58]. Information-theoretically, QFAs can be learned from a polynomial number of examples, similarly to classical hidden Markov models. However, computationally, the problem is as hard as learning noisy parities, a very difficult problem in computational learning theory [28].

7. Concluding remarks

Quantum finite automata (QFAs) combine the theory of finite automata with quantum computing. Many different models and aspects of QFAs have been studied, and this research topic has recently celebrated its 20-year anniversary.


There are some contexts in which quantum models are of the same power as classical models (for example, language recognition of 1QFAs with bounded or unbounded error) or have similar properties as classical models (for example, undecidability of the emptiness problem for 1-way automata). On the other hand, there are many cases in which quantum models are superior to classical models (for example, succinctness results for almost all models, nondeterministic language recognition power, and language recognition power of 2QFAs with bounded or unbounded error). Besides these, there are many research questions that are still open. Among restricted one-way QFAs, LaQFAs deserve a special attention. Moreover, it would be interesting to find more examples where QFAs can be substantially smaller than DFAs and PFAs. So far, most examples are periodic languages over a unary alphabet (see, e.g., [7] and [70]) or their simple generalisations. This raises a question: for what non-unary languages do QFAs achieve a quantum advantage in a non-trivial way? Investigating the state complexity of “non-uniform” QFAs is another interesting direction (see [99] for an example of measuring the state complexity (of the restricted QFA models) by fixing the input length). Compared to one-way models, two-way QFA models have not been widely examined, and there are many open problems related to them. Furthermore, promise problems, interactive proof systems, and computation with advice are hot new topics having connections with computational complexity. Further research on them will likely provide new insights. Another promising direction is connections of QFAs with algebra and using algebraic methods to study the power of QFAs. Acknowledgement. We are grateful to A. C. Cem Say and John Watrous for their helpful comments on the subject matter of this chapter. We would like to thank our anonymous referee for his/her helpful comments, and Narad Rampersad and Jeffrey Shallit for their suggestions to improve the language of the chapter significantly. We also would like to thank Marats Golovkins, Paulo Mateus, Emmanuel Jeandel, Carlo Mereghetti, Farid Ablayev, Daowen Qiu, Jozef Gruska, and James P. Crutchfield for kindly answering our questions. A. Yakaryılmaz would like to sincerely thank his Ph.D. supervisor A. C. Cem Say for introducing him to the field of quantum computation and for their collaborative work where he has learned much and gained a great deal of experience. A. Ambainis was supported by ERC Advanced Grant MQC and FP7 FET Proactive project QALGO. A. Yakaryılmaz was partially supported by TÜBİTAK with grant 108E142, CAPES with grant 88881.030338/2013-01, ERC Advanced Grant MQC, and FP7 FET projects QALGO and QCS. A. Yakaryılmaz worked on the chapter when he was in Boğaziçi University in Turkey, University of Latvia, and LNCC in Brazil.


References [1] S. Aaronson, Quantum computing, postselection, and probabilistic polynomial-time. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 461 (2005), no. 2063, 3473–3482. MR 2171273 Zbl 1337.81032 q.v. 1482 [2] S. Aaronson and A. Drucker, Advice coins for classical and quantum computation. In Automata, languages and programming (L. Aceto, M. Henzinger, and J. Sgall, eds.). Proceedings of the 38 th International Colloquium (ICALP 2011) held in Zurich, July 4–8, 2011. Lecture Notes in Computer Science, 6755. Springer, Berlin, 2011, 61–72. MR 2874095 Zbl 1332.68052 q.v. 1483 [3] L. M. Adleman, J. DeMarrais, and M.-D. A. Huang, Quantum computability. SIAM J. Comput. 26 (1997), no. 5, 1524–1540. MR 1471992 Zbl 0895.68043 q.v. 1471 [4] M. Amano and K. Iwama. Undecidability on quantum finite automata. In Annual ACM Symposium on Theory of Computing (J. S. Vitter, L. L. Larmore, and F. T. Leighton, eds.). Proceedings of the 31st Symposium (STOC ’99) held in Atlanta, GA, May 1–4, 1999. Association for Computing Machinery, New York, 1999, 368–375. MR 1798057 Zbl 1346.68090 q.v. 1480 [5] A. Ambainis, M. Beaudry, M. Golovkins, A. K, ikusts, M. Mercer, and D. Thérien, Algebraic results on quantum automata. Theory Comput. Syst. 39 (2006), no. 1, 165–188. MR 2189805 Zbl 1101.68029 q.v. 1465, 1470 [6] A. Ambainis, R. Bonner, R. Freivalds, and A. K, ikusts, Probabilities to accept languages by quantum finite automata. In Computing and combinatorics (T. Asano, H. Imai, D. T. Lee, S.-I. Nakano, and T. Tokuyama, eds.). Proceedings of the 5th Annual International Conference (COCOON’99) held in Tokyo, July 26–28, 1999. Lecture Notes in Computer Science, 1627. Springer, Berlin, 1999, 174–183. MR 1730333 Zbl 0944.68118 q.v. 1470 [7] A. Ambainis and R. Freivalds, 1-way quantum finite automata: strengths, weaknesses and generalizations. In Proceedings 39 th Annual Symposium on Foundations of Computer Science. Held in Palo Alto, CA, November 8–11, 1998. IEEE Computer Society, Los Alamitos, CA, 1998, 332–341. IEEEXplore 743469 q.v. 1467, 1468, 1470, 1474, 1484 [8] A. Ambainis, A. K, ikusts, and M. Valdats, On the class of languages recognizable by 1-way quantum finite automata. In STACS 2001 (A. Ferreira and H. Reichel, eds.). Proceedings of the 18 th Annual Symposium on Theoretical Aspects of Computer Science held in Dresden, February 15–17, 2001. Lecture Notes in Computer Science, 2010. Springer, Berlin, 2001, 75–86. MR 1890780 Zbl 0976.68087 q.v. 1470 [9] A. Ambainis and N. Nahimovs, Improved constructions of quantum automata. Theoret. Comput. Sci. 410 (2009), no. 20, 1916–1922. MR 2517650 Zbl 1163.68020 q.v. 1464, 1468, 1469 [10] A. Ambainis, A. Nayak, A. Ta-Shma, and U. Vazirani, Dense quantum coding and quantum finite automata. J. ACM 49 (2002), no. 4, 496–511. MR 2146458 Zbl 1326.68133 q.v. 1469 [11] A. Ambainis and J. Watrous, Two–way finite automata with quantum and classical states. Theoret. Comput. Sci. 287 (2002), no. 1, 299–311. MR 1944456 Zbl 1061.68047 q.v. 1474, 1475, 1476 [12] A. Ambainis and A. Yakaryılmaz, Superiority of exact quantum automata for promise problems. Inform. Process. Lett. 112 (2012), no. 7, 289–291. MR 2879167 Zbl 1237.68082 q.v. 1482


[13] D. Bacon and W. van Dam, Recent progress in quantum algorithms. Comm. ACM 53 (2010), no. 2, 84–93. q.v. 1457 [14] A. Belovs, A. Rosmanis, and J. Smotrovs, Multi-letter reversible and quantum finite automata. In Developments in language theory (T. Harju, J. Karhumäki, and A. Lepistö, eds.). Proceedings of the 11th International Conference (DLT 2007) held at the University of Turku, Turku, July 3–6, 2007. Lecture Notes in Computer Science, 4588. Springer, Berlin, 2007, 60–71. MR 2380420 Zbl 1179.68066 q.v. 1470 [15] E. Bernstein and U. Vazirani, Quantum complexity theory. SIAM J. Comput. 26 (1997), no. 5, 1411–1473. MR 1471988 Zbl 0895.68042 q.v. 1462, 1467 [16] A. Bertoni, The solution of problems relative to probabilistic automata in the frame of the formal languages theory. In Vierte Jahrestagung der Gesellschaft für Informatik. Herausgegeben im Auftrag der GI von D. Siefkes. Lecture Notes in Computer Science, Vol. 26. Springer, Berlin, 1975, 107–112. MR 0386897 Zbl 0327.94069 q.v. 1472 [17] A. Bertoni and M. Carpentieri, Analogies and differences between quantum and stochastic automata. Theoret. Comput. Sci. 262 (2001), no. 1–2, 69–81. MR 1836211 Zbl 0983.68094 q.v. 1471 [18] A. Bertoni and M. Carpentieri, Regular languages accepted by quantum automata. Inform. and Comput. 165 (2001), no. 2, 174–182. MR 1823662 Zbl 1003.68061 q.v. 1470 [19] A. Bertoni, G. Mauri, and M. Torelli, Some recursively unsolvable problems relating to isolated cutpoints in probabilistic automata. In Automata, languages and programming (A. Salomaa and M. Steinby, eds.). Fourth Colloquium, held at the University of Turku, Turku, July 18–22, 1977. Lecture Notes in Computer Science, 52. Springer, Berlin, 1977, 87–94. MR 0460000 Zbl 0366.94064 q.v. 1472 [20] A. Bertoni, C. Mereghetti, and B. Palano, Lower bounds on the size of quantum automata accepting unary languages. In Theoretical computer science (C. Blundo and C. Laneve, eds.). Proceedings of the 8 th Italian Conference (ICTCS2003) held in Bertinoro, October 13–15, 2003. Lecture Notes in Computer Science, 2841. Springer, Berlin, 2003, 86–96. MR 2072859 Zbl 1257.68096 q.v. 1469 [21] A. Bertoni, C. Mereghetti, and B. Palano, Quantum computing: 1-way quantum automata. In Developments in language theory (Z. Ésik and Z. Fülöp, eds.) Papers from the 7 th International Conference (DLT 2003) held at the University of Szeged, Szeged, July 7–11, 2003. Lecture Notes in Computer Science, 2710. Springer, Berlin, 2003., 1–20. MR 2053870 Zbl 1037.68058 q.v. 1465 [22] A. Bertoni, C. Mereghetti, and B. Palano, Small size quantum automata recognizing some regular languages. Theoret. Comput. Sci. 340 (2005), no. 2, 394–407. MR 2150762 Zbl 1087.68047 q.v. 1469 [23] A. Bertoni, C. Mereghetti, and B. Palano, Some formal tools for analyzing quantum automata. Theoret. Comput. Sci. 356 (2006), no. 1–2, 14–25. MR 2217824 Zbl 1160.68375 q.v. 1472, 1473 [24] M. P. Bianchi, C. Mereghetti, and B. Palano, Complexity of promise problems on classical and quantum automata. In Computing with new resources (C. S. Calude, R. Freivalds, and K. Iwama, eds.). Essays dedicated to J. Gruska on the occasion of his 80 th birthday. Lecture Notes in Computer Science, 8808. Springer, Cham, 2014, 161–175. MR 3332390 Zbl 1323.68338 q.v. 1482 [25] M. P. Bianchi and B. Palano, Behaviours of unary quantum automata. Fund. Inform. 104 (2010), no. 1–2, 1–15. MR 2791738 Zbl 1214.68191 q.v. 1474


[26] V. D. Blondel and V. Canterini, Undecidable problems for probabilistic automata of fixed dimension. Theory Comput. Syst. 36 (2003), no. 3, 231–245. MR 1962327 Zbl 1039.68061 q.v. 1472 [27] V. D. Blondel, E. Jeandel, P. Koiran, and N. Portier, Decidable and undecidable problems about quantum automata. SIAM J. Comput. 34 (2005), no. 6, 1464–1473. MR 2165750 Zbl 1078.81012 q.v. 1472 [28] A. Blum, M. Furst, M. Kearns, and R. J. Lipton, Cryptographic primitives based on hard learning problems. In Advances in cryptology – CRYPTO ’93 (D. R. Stinson, ed.). Proceedings of the Thirteenth Annual International Cryptology Conference held at the University of California, Santa Barbara, CA, August 22–26, 1993. Lecture Notes in Computer Science, 773. Springer, Berlin, 1994, 278–291. MR 1288971 Zbl 0870.94021 q.v. 1483 [29] J. Bourgain, Estimates on exponential sums related to Diffie–Hellman distributions. Geom. Funct. Anal. 15 (2005), no. 1, 1–34. MR 2140627 Zbl 1102.11041 q.v. 1469 [30] A. Brodsky and N. Pippenger, Characterizations of 1-way quantum finite automata. SIAM J. Comput. 31 (2002), no. 5, 1456–1478. MR 1936654 Zbl 1051.68062 q.v. 1470, 1471, 1472 [31] H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, Quantum fingerprinting. Phys. Rev. Lett. 87 (2001), no. 16, 167902, 4 pp. q.v. 1467 [32] M. P. Ciamarra, Quantum reversibility and a new model of quantum automaton. In Fundamentals of computation theory (R. Freivalds, ed.). Proceedings of the 13th international symposium, FCT 2001, Riga, Latvia, August 22–24, 2001. Lecture Notes in Computer Science. 2138. Springer, Berlin, 376–379. Zbl 0999.68512 q.v. 1465 [33] A. Condon, Computational models of games. ACM Distinguished Dissertations. MIT Press, Cambridge, MA, 1989. MR 1100709 q.v. 1482 [34] A. Condon and R. J. Lipton, On the complexity of space bounded interactive proofs (extended abstract). In 30 th Annual Symposium on Foundations of Computer Science. Held in Research Triangle Park, NC, October 30–November 1, 1998. IEEE Computer Society, Los Alamitos, CA, 1998, 462–467. IEEEXplore 63519 q.v. 1473, 1481 [35] C. Damm and M. Holzer, Automata that take advice. In Mathematical foundations of computer science 1995 (J. Wiedermann and P. Hájek, eds.) Proceedings of the 20 th International Symposium (MFCS ’95) held in Prague, August 28–September 1, 1995. Lecture Notes in Computer Science, 969. Springer, Berlin, 1995, 149–158. MR 1467255 Zbl 1193.68152 q.v. 1483 [36] H. G. Demirci, M. Hirvensalo, K. Reinhardt, A. C. C. Say, and A. Yakaryılmaz, Classical and quantum realtime alternating automata. In Proceedings of the 6 th Workshop on NonClassical Models for Automata and Applications (S. Bensch, R. Freund, and F. Otto, eds.). books ocg.at, 304. Austrian Computer Society, Vienna, 2014, 101–114. q.v. 1471, 1473 [37] H. Derksen, E. Jeandel, and P. Koiran, Quantum automata and algebraic groups. J. Symbolic Comput. 39 (2005), no. 3–4, 357–371. MR 2168287 Zbl 1124.81004 q.v. 1472, 1473 [38] P. D. Diêu, Критерии представимости языков в вероятностных автоматах (Criteria of representability of languages in probabilistic automata). Kibernetika (Kiev) 1977, no. 3, 39–50, in Russian. English translation, Cybernetics 13 (1977), no. 3, 352–364. MR 0460001 q.v. 1480 [39] C. Dwork and L. J. Stockmeyer, A time complexity gap for two-way probabilistic finitestate automata. SIAM J. Comput. 19 (1990), no. 6, 1011–1023. MR 1069095 Zbl 0711.68075 q.v. 1463, 1475


[40] C. Dwork and L. J. Stockmeyer, Finite state verifiers. I. The power of interaction. J. Assoc. Comput. Mach. 39 (1992), no. 4, 800–828. MR 1187213 Zbl 0799.68099 q.v. 1477, 1481 [41] R. Freivalds, Probabilistic two-way machines. In Mathematical foundations of computer science (J. Gruska and M. Chytil, eds.) Proceedings of the 10 th Symposium (Štrbské Pleso, 1981). Lecture Notes in Computer Science, 118. Springer, Berlin, 1981, 33–45. MR 0652738 Zbl 0486.68045 q.v. 1463, 1475 [42] R. Freivalds, Super-exponential size advantage of quantum finite automata with mixed states. In Algorithms and computation (S. Hong, H. Nagamochi, and T. Fukunaga, eds.). Lecture Notes in Computer Science, 5369. Springer, Berlin, 2008, 931–942. MR 2539983 Zbl 1183.68339 q.v. 1469 [43] R. Freivalds and M. Karpinski, Lower space bounds for randomized computation. In Automata, languages and programming (S. Abiteboul and E. Shamir, eds.) Proceedings of the Twenty-first International Colloquium (ICALP ’94) held in Jerusalem, July 11–14, 1994. Lecture Notes in Computer Science, 820. Springer, Berlin, 1994, 580–592. MR 1334131 Zbl 1418.68094 q.v. 1462 [44] R. Freivalds, M. Ozols, and L. Mančinska, Improved constructions of mixed state quantum automata. Theoret. Comput. Sci. 410 (2009), no. 20, 1923–1931. MR 2517651 Zbl 1163.68022 q.v. 1469 [45] R. Freivalds, A. Yakaryılmaz, and A. C. C. Say, A new family of nonstochastic languages. Inform. Process. Lett. 110 (2010), no. 10, 410–413. MR 2662059 Zbl 1229.68048 q.v. 1480 [46] A. Gainutdinova and A. Yakaryılmaz, Unary probabilistic and quantum automata on promise problems. Quantum Inf. Process. 17 (2018), no. 2, Paper No. 28, 17 pp. MR 3740504 Zbl 1402.81087 q.v. 1482, 1483 [47] V. Geffert and A. Yakaryılmaz, Classical automata on promise problems. Discrete Math. Theor. Comput. Sci. 17 (2015), no. 2, 157–180. MR 3400314 Zbl 1333.68166 q.v. 1482 [48] M. Golovkins, private communication, September 2012. q.v. 1470 [49] M. Golovkins, M. Kravtsev, and V. Kravcevs, Quantum finite automata and probabilistic reversible automata: R-trivial idempotent languages. In Mathematical foundations of computer science 2011 (F. Murlak and P. Sankowski, eds.). Proceedings of the 36 th International Symposium (MFCS 2011) held in Warsaw, August 22–26, 2011. Lecture Notes in Computer Science, 6907. Springer, Berlin, 2011, 351–363. MR 2881708 Zbl 1343.68138 q.v. 1465, 1470 [50] L. Grover, A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-eighth Annual ACM Symposium on the Theory of Computing (G. L. Miller, ed.) STOC 1996. Held in Philadelphia, PA, May 22–24, 1996. Association for Computer Machinery, New York, 1996, 212–219. MR 1427516 Zbl 0922.68044 q.v. 1457 [51] J. Gruska, D. Qiu, and S. Zheng, Potential of quantum finite automata with exact acceptance. Internat. J. Found. Comput. Sci. 26 (2015), no. 3, 381–398. MR 3366960 Zbl 1327.81128 q.v. 1482 [52] J. Gruska, D. Qiu, and S. Zheng, Generalizations of the distributed Deutsch–Jozsa promise problem. Math. Structures Comput. Sci. 27 (2017), no. 3, 311–331. MR 3606717 Zbl 1364.68211 q.v. 1482 [53] M. Hirvensalo, Improved undecidability results on the emptiness problem of probabilistic and quantum cut-point languages. In SOFSEM 2007: Theory and practice of computer science (J. van Leeuwen, G. F. Italiano, W. van der Hoek, C. Meinel, H. Sack, and F. Plasil, eds.) Proceedings of the 33rd Conference on Current Trends in Theory and Practice of

Computer Science held in Harrachov, January 20–26, 2007. Lecture Notes in Computer Science, 4362. Springer, Berlin, 2007, 309–319. MR 2497100 Zbl 1132.68039 q.v. 1473
[54] M. Hirvensalo, Quantum automata with open time evolution. Int. J. Nat. Comput. Res. 1 (2010), no. 1, 70–85. q.v. 1465, 1466
[55] E. Jeandel, Indécidabilité sur les automates quantiques. Master's thesis, ENS Lyon, Lyon, 2002. ftp://ftp.ens-lyon.fr/pub/LIP/Rapports/DEA/DEA2002/DEA2002-02.ps.gz q.v. 1473
[56] E. Jeandel, private communication, September 2012. q.v. 1473
[57] S. Jordan, Quantum algorithm zoo. https://quantumalgorithmzoo.org q.v. 1457
[58] B. Juba, On learning finite-state quantum sources. Quantum Inf. Comput. 12 (2012), no. 1–2, 105–118. MR 2896299 Zbl 1268.81041 q.v. 1483
[59] Ya. Ya. Kaneps, Стохастичность языков, распознаваемых двусторонними конечными вероятностными автоматами. Discret. Mat. 1, no. 4, 63–77, in Russian. English translation, Stochasticity of the languages acceptable by two-way finite probabilistic automata. Discrete Math. Appl. 1 (1991), no. 4, 405–421. Zbl 0796.68157 q.v. 1480
[60] G. Kindler and R. O'Donnell, Quantum automata cannot detect biased coins, even in the limit. In 44th International Colloquium on Automata, Languages, and Programming (I. Chatzigiannakis, P. Indyk, F. Kuhn, and A. Muscholl, eds.). Proceedings of the colloquium (ICALP 2017) held in Warsaw, July 10–14, 2017. LIPIcs. Leibniz International Proceedings in Informatics, 80. Schloss Dagstuhl. Leibniz-Zentrum für Informatik, Wadern, 2017, Art. No. 15, 8 pp. MR 3685755 Zbl 1441.68058 q.v. 1483
[61] H. Klauck, On quantum and probabilistic communication: Las Vegas and one-way protocols. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing (F. F. Yao and E. M. Luks, eds.). Association for Computing Machinery, New York, 2000, 644–651. https://www.hklauck.com/stoc00.pdf q.v. 1482
[62] A. Kondacs and J. Watrous, On the power of quantum finite state automata. In Proceedings 38th Annual Symposium on Foundations of Computer Science. Held in Miami Beach, FL, October 20–21, 1997. IEEE Computer Society, Los Alamitos, CA, 1997, 66–75. IEEEXplore 646094 q.v. 1458, 1462, 1465, 1467, 1470, 1478, 1480
[63] U. Küçük, A. C. C. Say, and A. Yakaryılmaz, Finite automata with advice tapes. Internat. J. Found. Comput. Sci. 25 (2014), no. 8, 987–1000. MR 3315802 Zbl 1309.68120 q.v. 1483
[64] L. Li and D. Qiu, Determining the equivalence for one-way quantum finite automata. Theoret. Comput. Sci. 403 (2008), no. 1, 42–51. MR 2435630 Zbl 1175.68250 q.v. 1466, 1472
[65] L. Li and D. Qiu, A note on quantum sequential machines. Theoret. Comput. Sci. 410 (2009), no. 26, 2529–2535. MR 2522984 Zbl 1172.68019 q.v. 1472
[66] L. Li, D. Qiu, X. Zou, L. Li, L. Wu, and P. Mateus, Characterizations of one-way general quantum finite automata. Theoret. Comput. Sci. 419 (2012), 73–91. MR 2885820 Zbl 1235.68102 q.v. 1467
[67] P. Mateus, private communication, October 2012. q.v. 1472
[68] P. Mateus, D. Qiu, and L. Li, On the complexity of minimizing probabilistic and quantum automata. Inform. and Comput. 218 (2012), 36–53. MR 2967324 Zbl 1279.68164 q.v. 1472
[69] M. Mercer, Lower bounds for generalized quantum finite automata. In Language and automata theory and applications (C. Martín-Vide, F. Otto, and H. Fernau, eds.). Revised papers from the 2nd International Conference (LATA 2008) held in Tarragona, March 13–19, 2008. Lecture Notes in Computer Science, 5196. Springer, Berlin, 2008, 373–384. MR 2540339 Zbl 1156.68459 q.v. 1470
[70] C. Mereghetti and B. Palano, On the size of one-way quantum finite automata with periodic behaviors. Theor. Inform. Appl. 36 (2002), no. 3, 277–291. MR 1958244 Zbl 1013.68088 q.v. 1469, 1484
[71] C. Moore and J. P. Crutchfield, Quantum automata and quantum grammars. Theoret. Comput. Sci. 237 (2000), no. 1–2, 275–306. MR 1756213 Zbl 0939.68037 q.v. 1458, 1465, 1466, 1470, 1471
[72] M. Mosca, Quantum algorithms. In Encyclopedia of complexity and systems science (R. A. Meyers, ed.). Springer Reference. Springer, Berlin, 2010, 7088–7118. q.v. 1457
[73] M. Nasu and N. Honda, A context-free language which is not acceptable by a probabilistic automaton. Information and Control 18 (1971), no. 3, 233–236. Zbl 0218.68012 q.v. 1480
[74] A. Nayak, Optimal lower bounds for quantum automata and random access codes. In 40th Annual Symposium on Foundations of Computer Science. Proceedings of the symposium (FOCS '99) held in New York, October 17–19, 1999. IEEE Computer Society, Los Alamitos, CA, 1999, 369–376. MR 1917575 IEEEXplore 814608 q.v. 1465
[75] C. Negrevergne, T. S. Mahesh, C. A. Ryan, M. Ditty, F. Cyr-Racine, W. Power, N. Boulant, T. Havel, D. G. Cory, and R. Laflamme, Benchmarking quantum control methods on a 12-qubit system. Phys. Rev. Lett. 96 (2006), no. 17, 170501, 4 pp. q.v. 1466
[76] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information. 10th anniversary edition. Cambridge University Press, Cambridge, 2010. MR 1796805 Zbl 1288.81001 q.v. 1458
[77] H. Nishimura and T. Yamakami, An application of quantum finite automata to interactive proof systems. J. Comput. System Sci. 75 (2009), no. 4, 255–269. MR 2499873 Zbl 1183.68350 q.v. 1481
[78] H. Nishimura and T. Yamakami, Interactive proofs with quantum finite automata. Theoret. Comput. Sci. 568 (2015), 1–18. MR 3298430 Zbl 1312.68123 q.v. 1481
[79] K. Paschen, Quantum finite automata using ancilla qubits. Technical report, University of Karlsruhe, Karlsruhe, 2000. https://publikationen.bibliothek.kit.edu/1452000 q.v. 1465
[80] A. Paz, Introduction to probabilistic automata. Academic Press, New York and London, 1971. MR 0289222 Zbl 0234.94055 q.v. 1462, 1471, 1472, 1473
[81] J.-É. Pin, On the language accepted by finite reversible automata. In Automata, languages and programming (T. Ottmann, ed.). Proceedings of the fourteenth international colloquium held at the University of Karlsruhe, Karlsruhe, July 13–17, 1987. Lecture Notes in Computer Science, 267. Springer, Berlin, 1987, 237–249. MR 0912712 q.v. 1470
[82] D. Qiu, private communication, October 2012. q.v. 1472
[83] D. Qiu, L. Li, P. Mateus, and J. Gruska, Handbook of finite state based models and applications, chapter "Quantum finite automata." Discrete Mathematics and Its Applications. Chapman and Hall/CRC, 2012. q.v. 1458
[84] D. Qiu and S. Yu, Hierarchy and equivalence of multi-letter quantum finite automata. Theoret. Comput. Sci. 410 (2009), no. 30–32, 3006–3017. MR 2543354 Zbl 1179.68073 q.v. 1470
[85] M. O. Rabin, Probabilistic automata. Information and Control 6 (1963), 230–243. Zbl 0182.33602 q.v. 1462


[86] M. V. P. Rao and V. Vinay, Quantum finite automata and weighted automata. J. Autom. Lang. Comb. 13 (2008), no. 2, 125–139. MR 2549086 Zbl 1184.68326 q.v. 1477 [87] J. Rashid and A. Yakaryılmaz, Implications of quantum automata for contextuality. In Implementation and application of automata (M. Holzer and M. Kutrib, eds.). Proceedings of the 19 th International Conference (CIAA 2014) held at Universität Giessen, Giessen, July 30–August 2, 2014. Lecture Notes in Computer Science, 8587. Springer, Cham, 2014, 318–331. MR 3247102 Zbl 1302.68177 q.v. 1482, 1483 [88] A. Salomaaa and M. Soittola, Automata-theoretic aspects of formal power series. Bull. Amer. Math. Soc. (N.S.) 1 (1979), no. 4, 675–678. MR 1567169 Zbl 0377.68039 q.v. 1471 [89] A. C. C. Say and A. Yakaryılmaz, Computation with multiple CTCs of fixed length and width. Nat. Comput. 11 (2012), no. 4, 579–594. MR 3002415 Zbl 1332.68051 q.v. 1482 [90] A. C. C. Say and A. Yakaryılmaz, Quantum finite automata: a modern introduction. In Computing with new resources (C. S. Calude, R. Freivalds, and K. Iwama, eds.). Essays dedicated to J. Gruska on the occasion of his 80 th birthday. Lecture Notes in Computer Science, 8808. Springer, Cham, 2014, 208–222. MR 3332394 Zbl 1323.68278 q.v. 1458, 1466, 1476 [91] A. C. C. Say and A. Yakaryılmaz, Magic coins are useful for small-space quantum machines. Quantum Inf. Comput. 17 (2017), no. 11–12, 1027–1043. MR 3728367 q.v. 1477, 1481 [92] A. C. C. Say and A. Yakaryılmaz, Computation with narrow CTCs. In Unconventional computation (C. S. Calude, J. Kari, I. Petre, and G. Rozenberg, eds.). Proceedings of the 10 th International Conference (UC 2011) held at the University of Turku, Turku, June 6–10, 2011. Lecture Notes in Computer Science, 6714. Springer, Berlin, 2011, 201–211. MR 2833872 Zbl 1330.68080 q.v. 1482 [93] O. Scegulnaja-Dubrovska, L. LaN ce, and R. Freivalds, Postselection finite quantum automata. In Unconventional computation (C. S. Calude, M. Hagiya, K. Morita, G. Rozenberg, and J. Timmis, eds.) Proceedings of the 9 th International Conference (UC 2010) held in Tokyo, June 21–25, 2010. Lecture Notes in Computer Science, 6079. Springer, Berlin, 2010, 115–126. MR 2673483 Zbl 1286.68143 q.v. 1482 [94] P. W. Shor, Algorithms for quantum computation: Discrete logarithms and factoring. In 35 th Annual Symposium on Foundations of Computer Science (S. Goldwasser, ed.). Proceedings of the IEEE Symposium held in Santa Fe, NM, November 20–22, 1994. IEEE Computer Society Press, Los Alamitos, CA, 1994. 124–134. MR 1489242 IEEEXplore 365700 q.v. 1457, 1458 [95] P. Turakainen, Generalized automata and stochastic languages. Proc. Amer. Math. Soc. 21 (1969), 303–309. MR 0242596 Zbl 0184.02802 q.v. 1462, 1463, 1471 [96] P. Turakainen, Rational stochastic automata in formal language theory. In Discrete mathematics (J. L. Kulikowski, M. Michalewicz, S. V. Yablonskij, and Yu. I. Zhuravlev, eds.). Banach Center Publications, 7. Polish Academy of Sciences, Institute of Mathematics. PWN – Polish Scientific Publishers, Warsaw, 1982, 31–44. MR 0698096 Zbl 0541.68054 q.v. 1477 [97] W.-G. Tzeng, A polynomial-time algorithm for the equivalence of probabilistic automata. SIAM J. Comput. 21 (1992), no. 2, 216–227. MR 1154521 Zbl 0755.68075 q.v. 1472 [98] M. Villagra and T. Yamakami, Quantum and reversible verification of proofs using constant memory space. In Theory and practice of natural computing (A. Dediu, M. Lozano, and C. Martín-Vide, eds.). Proceedings of the 3rd International Conference (TPNC 2014)

held in Granada, December 9–11, 2014. Lecture Notes in Computer Science, 8890. Springer, Cham, 2014, 144–156. MR 3333318 q.v. 1481
[99] M. Villagra and T. Yamakami, Quantum state complexity of formal languages. In Descriptional complexity of formal systems (J. Shallit and A. Okhotin, eds.). Proceedings of the 17th International Workshop (DCFS 2015) held in Waterloo, ON, June 25–27, 2015. Lecture Notes in Computer Science, 9118. Springer, Cham, 2015, 280–291. MR 3375039 Zbl 1432.68249 q.v. 1484
[100] J. Watrous, On the complexity of simulating space-bounded quantum computations. Comput. Complexity 12 (2003), no. 1–2, 48–84. MR 2054894 Zbl 1068.68066 q.v. 1462, 1466, 1480
[101] J. Watrous, Quantum computational complexity. In Encyclopedia of complexity and systems science (R. A. Meyers, ed.). Springer Reference. Springer, Berlin, 2010, 7174–7201. MR 3074622 q.v. 1458
[102] J. Watrous, private communication, May 2009. q.v. 1458
[103] A. Yakaryılmaz, Classical and quantum computation with small space bounds. Ph.D. thesis. Boğaziçi University, Istanbul, 2011. arXiv:1102.0378 [cs.CC] q.v. 1478
[104] A. Yakaryılmaz, Superiority of one-way and realtime quantum machines. RAIRO Theor. Inform. Appl. 46 (2012), no. 4, 615–641. MR 3107866 Zbl 1279.68090 q.v. 1480
[105] A. Yakaryılmaz, One-counter verifiers for decidable languages. In Computer science – theory and applications (A. A. Bulatov and A. M. Shur, eds.). Proceedings of the 8th International Computer Science Symposium in Russia (CSR 2013) held in Ekaterinburg, June 25–29, 2013. Lecture Notes in Computer Science, 7913. Springer, Berlin, 2013, 366–377. MR 3101990 Zbl 1382.68138 q.v. 1471
[106] A. Yakaryılmaz, Public qubits versus private coins. In Proceedings of the Workshop on Quantum and Classical Complexity. University of Latvia Press, Riga, 2013, 45–60. ECCC:TR12-130. q.v. 1481
[107] A. Yakaryılmaz, Quantum alternation. Lobachevskii J. Math. 37 (2016), no. 6, 637–649. MR 3579037 Zbl 1407.68182 q.v. 1471
[108] A. Yakaryılmaz and A. C. C. Say, Efficient probability amplification in two-way quantum finite automata. Theoret. Comput. Sci. 410 (2009), no. 20, 1932–1941. MR 2517652 Zbl 1163.68026 q.v. 1480
[109] A. Yakaryılmaz and A. C. C. Say, Languages recognized with unbounded error by quantum finite automata. In Computer science – theory and applications (A. E. Frid, A. Morozov, A. Rybalchenko, and K. W. Wagner, eds.). Proceedings of the 4th International Computer Science Symposium in Russia (CSR 2009) held in Novosibirsk, August 18–23, 2009. Lecture Notes in Computer Science, 5675. Springer, Berlin, 2009, 356–367. Zbl 1248.68315 q.v. 1471, 1473, 1480
[110] A. Yakaryılmaz and A. C. C. Say, Languages recognized by nondeterministic quantum finite automata. Quantum Inf. Comput. 10 (2010), no. 9–10, 747–770. MR 2731439 Zbl 1236.81071 q.v. 1471
[111] A. Yakaryılmaz and A. C. C. Say, Succinctness of two-way probabilistic and quantum finite automata. Discrete Math. Theor. Comput. Sci. 12 (2010), no. 4, 19–40. MR 2760333 Zbl 1286.68297 q.v. 1475, 1477
[112] A. Yakaryılmaz and A. C. C. Say, Probabilistic and quantum finite automata with postselection. Preprint, 2011. arXiv:1102.0666 [cs.CC] (A preliminary version of this paper appeared in Proceedings of Randomized and Quantum Computation, satellite workshop of MFCS and CSL 2010, 14–24.) q.v. 1482


[113] A. Yakaryılmaz and A. C. C. Say, Unbounded-error quantum computation with small space bounds. Inform. and Comput. 209 (2011), no. 6, 873–892. MR 2817180 Zbl 1221.68092 q.v. 1465, 1466, 1471, 1473, 1477, 1478, 1480
[114] A. Yakaryılmaz, A. C. C. Say, and H. G. Demirci, Debates with small transparent quantum verifiers. Internat. J. Found. Comput. Sci. 27 (2016), no. 2, 283–300. MR 3493549 Zbl 1344.68133 q.v. 1482
[115] T. Yamakami, Constant-space quantum interactive proofs against multiple provers. Inform. Process. Lett. 114 (2014), no. 11, 611–619. MR 3230909 Zbl 1371.68090 q.v. 1481
[116] T. Yamakami, One-way reversible and quantum finite automata with advice. Inform. and Comput. 239 (2014), 122–148. MR 3281904 Zbl 1309.68125 q.v. 1483
[117] T. Yamakami, Complexity bounds of constant-space quantum computation (extended abstract). In Developments in language theory (I. Potapov, ed.). Proceedings of the 19th International Conference (DLT 2015) held in Liverpool, July 27–30, 2015. Lecture Notes in Computer Science, 9168. Springer, Cham, 2015, 426–438. MR 3440690 Zbl 1434.68292 q.v. 1480
[118] S. Zheng, J. Gruska, and D. Qiu, On the state complexity of semi-quantum finite automata. In Language and automata theory and applications (A. Dediu, C. Martín-Vide, J. L. Sierra-Rodríguez, and B. Truthe, eds.). Proceedings of the 8th International Conference (LATA 2014) held in Madrid, March 10–14, 2014. Lecture Notes in Computer Science, 8370. Springer, Cham, 2014, 601–612. MR 3173928 Zbl 1407.68269 q.v. 1477, 1482
[119] S. Zheng, L. Li, D. Qiu, and J. Gruska, Promise problems solved by quantum and classical finite automata. Theoret. Comput. Sci. 666 (2017), 48–64. MR 3606060 Zbl 1359.68181 q.v. 1483
[120] S. Zheng, D. Qiu, and J. Gruska, Power of the interactive proof systems with verifiers modeled by semi-quantum two-way finite automata. Inform. and Comput. 241 (2015), 197–214. MR 3337351 Zbl 1309.68074 q.v. 1481
[121] S. Zheng, D. Qiu, J. Gruska, L. Li, and P. Mateus, State succinctness of two-way finite automata with quantum and classical states. Theoret. Comput. Sci. 499 (2013), 98–112. MR 3084151 Zbl 1296.68098 q.v. 1482
[122] S. Zheng, D. Qiu, L. Li, and J. Gruska, One-way finite automata with quantum and classical states. In Languages alive (H. Bordihn, M. Kutrib, and B. Truthe, eds.). Essays dedicated to J. Dassow on the occasion of his 65th birthday. Lecture Notes in Computer Science, 7300. Springer, Berlin, 2012, 273–290. MR 3002062 Zbl 1330.68183 q.v. 1465

Index

2-renewing sequence . . . 538

—A—
Ap . . . 572, 626
abelianisation map . . . 961
AC0 . . . 497, 574
ACC0 . . . 497
accessible
– automaton . . . 8
– state . . . 5
action
– algebra . . . 755
– free – . . . 871
– lattice . . . 756
a-cycle . . . 550
additive theory of the reals . . . 229
advice . . . 1483
a-graph . . . 550
a-level . . . 550
algebra
– action – . . . 755
– compact – . . . 621
– congruence – . . . 617
– free – . . . 617
– homomorphism . . . 617
– Kleene – . . . 751–754
– residuated – . . . 755
– with domain . . . 753
– with tests . . . 753
– locally finite – . . . 634
– profinite – . . . 624
– Hopfian – . . . 627
– self-free – . . . 628
– pro-T – . . . 624
– quotient – . . . 617
– relatively free – . . . 617

– residually in C . . . . . . . . . 617 – subalgebra . . . . . . . . . . . 617 – syntactic ranked tree – . . . . . 805 – term . . . . . . . . . . . . . . 617 – topological – . . . . . . . . . . 621 – generator . . . . . . . . . . . 621 – recognisable subset . . . . . 622 – residually in a class . . . . . 622 – self-free – . . . . . . . . . . 628 – trivial – . . . . . . . . . . . . 617 – uniform – . . . . . . . . . . . 622 algorithm – compression . . . . . . . . . . 537 – extension . . . . . . . . . . . . 541 – Hopcroft’s – . . . . . . . . . . 345 – McNaughton–Yamada – 45, 49, 66, 72 – state elimination – . . . . . . . 424 almost – finite type shift . . . . . . . . 1022 – prime . . . . . . . . . . . . . . 920 alphabet . . . . . . . . . . . . . . . 4 – canonical – . . . . . . . . . . . 950 – input – . . . . . . . . . . . . . . 81 – involutive – . . . . . . . . . . 842 – output – . . . . . . . . . . . . . 81 – ranked – . . . . . . . . . . . 1300 amalgamated free product . . . . . 864 amenable group . . . . . . . . . . 891 amplitude . . . . . . . . . 1459, 1462 analytic set . . . . . . . . . . . . . 702 anticipation block map . . . . . . . 990 aperiodic – identity . . . . . . . . . . . . 47, 72 – monoid . . . . . . . . . . . 572, 853 – group-free – . . . . . . . . . 499


– semigroup . . . . . . . . . . . 626 – tiling . . . . . . . . . . . . . . . 93 Arden’s lemma . . . . . . 46, 49, 65, 67 arena . . . . . . . . . . . . . . . . 274 arithmetic – cardinal – . . . . . . . . . . . 698 – integer – . . . . . . . . . . . 1205 – Presburger – . 957, 1193, 1199, 1201 – progression . . . . . . . . . . . 948 – real – . . . . . . . . . . . . . 1205 arithmetical hierarchy . . . . . . . 793 Artin group . . . . . . . . . . . . 879 Artin–Schreier polynomial . . . . . 938 atom . . . . . . . . . . . . . . . . 413 átomaton . . . . . . . . . . . . . . 413 automata – enumeration . . . . . . . . . . 460 – group . . . . . . . . . 842, 885–902 – minimisation of – . . . 1190, 1472 – morphism of – . . . . . . . . . . 17 – see also automaton automatic – group . . . . . . . . . . . 875–883 – normal forms . . . . . . . . 875 – right/left – . . . . . . . . . . 878 – mapping . . . . . . . . . . . . 885 – monoid . . . . . . . . . . . . . 878 – presentation . . . . . . 1036, 1038 – equivalent – . . . . . . . . 1057 – injective – . . . . . . . . . 1040 – with advice . . . . . . . . 1061 – real numbers . . . . . . . . . . 922 – semigroup . . . . . . . . . . . 878 – sequence . . . . . . . . . . 914, 953 – set . . . . . . . . . . . . . . . 916 – structure . . . . . . 875, 1036, 1038 – geodesic . . . . . . . . . . . 877 – with uniqueness . . . . . . . 877 – transformation . . . . . . . . . 885 automaton . . . . . . . . 5, 42, 61, 737 – 1.5-way quantum – . . . . . . 1478 – 2D – . . . . . . . . . . . . . . 312 – deterministic – . . . . . . . . 316

– accessible – . . . . . . . . . . . 8 – adjacency matrix . . . . . . . . 999 – almost finite type – . . . . . . . 355 – ˛ -extensible – . . . . . . . . . 541 – alternating – – finite – (AFA) . . 443, 770, 1419 – tree . . . . . . . . . . . . 1433 – Antimirov – . . . . . . . . . . 422 – asynchronous – . . . . . . . 1176 – behaviour . . . . . . . . 43, 62, 66 – bideterministic – . . . . . . 417, 442 – bipartite – . . . . . . . . . . 1013 – biseparable – . . . . . . . . . . 414 – bistochastic quantum finite – (BiQFA) . . . . . . . . . . . 1465 – Blum and Hewitt – . . . . . . . 317 – Boolean – . . . . . . . . . 443, 770 – bounded – . . . . . . . . . . . 891 – Büchi – . . . . . . . . . . . 1419 – Carton–Michel – . . . . . . . . 194 – Cayley – . . . . . . . . . 897–898 – cellular – . . . . . . . . . . . . 777 – Černý – . . . . . . . . . . . . 535 – characteristic – . . . . . . . . . 173 – coaccessible – . . . . . . . . . . 8 – Cocke–Younger–Kasami – . . 1407 – communicating – . . . . . . . 1149 – complete – . . . . . . . 8, 444, 1019 – unambiguous – . . . . . . . 194 – conjugate – . . . . . . . . . . . 999 – contained in another . . . . . 1018 – contracting – . . . . . . . . 891, 897 – cyclic – . . . . . . . . . . . . 353 – decomposition – . . . . . . . 1013 – depth – . . . . . . . . . . . . . 343 – desert – . . . . . . . . . . . . 781 – deterministic finite – (DFA) . 7, 411, 1004, 1462 – complete – . . . . . . . . . . 412 – minimal – . . . . . . . . . . 412 – one-way – (1DFA) . . . . . 1462 – quotient . . . . . . . . . . . 412 – two-way – (2DFA) . . . . . 1462


– dimension . . . . . . . . . . . . 42 – distance – . . . . . . . . . . . 672 – Earley’s – . . . . . . . . . . 1405 – edge . . . . . . . . . . . . . . 998 – equation – . . . . . . . . . . . 422 – equivalence . . . . . . . . . . . 6 – essential – . . . . . . . . . . . 998 – Eulerian – . . . . . . . . . . . 542 – expansion . . . . . . . . . . 1026 – extended – . . . . . . . . . . . . 32 – extension – . . . . . . . . . . 1002 – finite tree – (NFTA) . . . . . . 239 – finite truncated – . . . . . . . . 865 – finitely unambiguous – . . . . . 155 – Fischer – . . . . . . . . . . . 1008 – flower – . . . . . . . . . . . . 845 – follow – . . . . . . . . . . 73, 422 – fundamental theorem . 41, 43, 62, 66 – generalised finite – (GFA) 1462, 1466 – Glushkov – . . . . . 54, 68, 73, 421 – guidable – . . . . . . . . . . . 285 – heap . . . . . . . . . . . . . . 154 – hedge – – language . . . . . . . . . . . 256 – inherently weak – . . . . . . 1211 – in-split . . . . . . . . . . . . 1016 – inverse – . . . . . . . . . . . . 844 – involutive – . . . . . . . . . . 844 – irreducible – . . . . . . . . . . 920 – k - – . . . . . . . . . . . . . . 914 – Kari – . . . . . . . . . . . . . 544 – Kondacs–Watrous quantum finite – (KWQFA) . . . . . . . . . . 1465 – Krieger – . . . . . . . . . . . 1005 – latest appearance – (LAA) . . . 202 – Latvian quantum finite – (LaQFA) . . . . . . . . . . . 1465 – left delay . . . . . . . . . . . 1022 – level – . . . . . . . . . . . . 1113 – limited – . . . . . . . . . . . . 428 – nondeterminism . . . . . . . 421 – local – . . . . . . . . 25, 355, 1018 – LR – . . . . . . . . . . 1400–1402


– max-plus – . . . . . . . . . . . 154 – Mealy – . . . . . . . . . . . . 885 – bireversible – . . . 887, 899–902 – contracting – . . . . . . . . . 891 – dual – . . . . . . . . . . . . 887 – nuclear – . . . . . . . . . . . 891 – reset machine . . . . . . . . 898 – reversible – . . . . . . . 887, 897 – minimal – . . . 17, 339, 1005, 1009 – complete – . . . . . . . . . . . 17 – nondeterministic – . . . . . . 368 – Moore–Crutchfield quantum finite – (MCQFA) . . . . . . . . . . 1465 – multidimensional – . . . . . . . 918 – multi-head – . . . . . . . . . . 426 – multiple initial state . 420–421, 442 – multitape – . . . . . . . . . . . . 82 – Nayak quantum finite – (NaQFA) . . . . . . . . . . . 1465 – Nerode – . . . . . . . . . . . . . 17 – nondeterministic – finite – (NFA) 411, 769, 1418, 1462 – Chrobak normal form 420–421, 425, 431 – one-way – (1NFA) . . . 1462 – two-way – (2NFA) . . . 1462 – finite hedge – (NFHA) . . . . 256 – non-uniform – . . . . . . . . . 501 – nuclear – . . . . . . . . . . . . 891 – of an expression – derived-term . . . . . . 60, 71, 73 – equation – . . . . . . . . . . . 73 – Thompson – . . . . . . . . 57, 62 – on finite-trees . . . . . . . . 1038 – on ! -strings . . . . . . 1036, 1046 – on ! -trees . . . . . . . . . . 1036 – one-cluster – . . . . . . . . . . 544 – one-way – quantum finite – (1QFA) . . 1463 – real-time cellular – . . . . . . 777 – parity – . . . . . . . . . . . . 1220 – partially ordered – . . . . . . . 520 – polynomially unambiguous – . 155


– position . . . . . . . . 54, 73, 421 – probabilistic finite – (PFA) . . 1462 – one-way – (1PFA) . . . . . 1462 – prophetic – . . . . . . . . . . . 194 – pushdown – (PDA) . . . 1385, 1389 – deterministic – . . . . . . . 1389 – probabilistic – (PPDA) . . . 1390 – reduced – . . . . . . . . . 1389 – tabulation . . . . . . . . . 1390 – valid prefix property . . . . 1395 – weighted – (WPDA) . . . . 1389 – quasi-reversible – . . . . . . . 442 – quotient – . . . . . . . . . 340, 412 – random – . . . . . . . . . . . . 546 – real – number – (RNA) . . . . . . . 972 – vector – (RVA) 972, 1204–1205, 1208–1209, 1211 – reduced – . . . . . . . . . . 1005 – reduction . . . . . . . . . . . 1008 – region – . . . . . . . . . . . 1269 – residual – . . . . . . . . 369, 1005 – restarting – . . . . . . . . . . . 427 – reversal – . . . . . . . . . . . 341 – reversible – . . . . . . . . 433, 442 – right delay . . . . . . . . . . 1021 – rotating limited – . . . . . . . . 429 – semi- – . . . . . . . . . . . . . 344 – sequential – . . . . . . . . . . 155 – shift recognised – . . . . . . . 998 – simple – . . . . . . . . . . . . 358 – slow – . . . . . . . . . . . . . 344 – for Hopcroft . . . . . . . 344, 352 – for Moore . . . . . . . . 344, 351 – split . . . . . . . . . . . . . . 724 – splitter . . . . . . . . . . . . . 340 – Stallings – . . . . . . . . . . . 845 – standard – . . 10, 55–56, 62, 68, 73 – weighted – . . . . . . . . . . . 68 – standard local – . . . . . . . 1019 – state . . . . . . . . . . . . . . 998 – strongly connected . . . . . . 1007 – subset – . . . . . . . . . . . . 530

– subset accepted . . . . . . . . . 62 – sweeping limited – . . . . . . . 429 – symbolic conjugate – . . . . . 1011 – synchronised – . . . . . . . . 1005 – synchronising – . . . . . . . . 525 – tessellation – . . . . . . . . . . 312 – timed – . . . . . . . . . . . . 1263 – deterministic – . . . . . . . 1288 – diagonal-free – . . . . . . . 1263 – with invariants – . . . . . . 1263 – tree – . . . . . . . . . . . . . . 804 – tree walking – (TWA) . . . . . 250 – trellis – . . . . . . . . . . . . . 777 – trie – . . . . . . . . . . . . . . 356 – trim – . . . . . . . . . 8, 844, 998 – two-way – . . 425, 520, 1462, 1474 – quantum finite – (2QFA) . . 1474 – simulation . . . . . . . . . . 425 – unambiguous – . . . . . . . . . 155 – universal – . . . . . . . . . . . 781 – weak – . . . . . . . . . . . . 1205 – weighted – . . . . . . 65, 115, 1477 – weighted finite – (WAF) . . . 1112 – with advice . . . . . . . . . . 1061 – Zielonka – . . . . . . . 1176, 1250 – see also automata automorphism – orbits . . . . . . . . . . . . . . 865 – Whitehead – . . . . . . . . . . 856 —B— Baire space . . . . . . . . . . . . . 698 base – -complement . . . . . . 1191, 1202 – -k expansion . . . . . . . . . . 914 – numeration – . . . . 949, 1191, 1202 – Bertrand – . . . . . . . . . . 951 – linear – . . . . . . . . . . . 950 basis . . . . . . . . . . . . . . . . 497 – complete – . . . . . . . . . . . 497 – converging to the identity . . . 644 – of self-free topological algebra . 628


– standard – . . . . . . . . . . . 497 – theorem . . . . . . . . . . . . 640 Benois’ theorem . . . . . . . . . . 858 beta-polynomial . . . . . . . . . . 951 biclique edge cover . . . . . . . . 414 bicombing . . . . . . . . . . . . . 883 biseparable residual . . . . . . . . 414 bisimulation . . . . . . . . . . . 1232 – strong timed – . . . . . . . . 1267 – time-abstract – . . . . . . . . 1266 bistochastic quantum finite automaton (BiQFA) . . . . . . . . . . . 1465 black-box checking . . . . . . . . 403 Blikle net . . . . . . . . . . . . . 744 block . . . . . . . . . . . . . . 642, 990 – decomposition . . . . . . . . . . 51 – groups . . . . . . . . . . . . 1470 – map . . . . . . . . . . . . . . 973 – substitution . . . . . . . . . . . 990 blocking condition . . . . . . . . 1230 Blum and Hewitt automaton . . . . 317 Boolean – automaton . . . . . . . . . . . 443 – operator . . . . . . . . . . . 1190 – semiring . . . . . . . . 3, 730, 1388 – space . . . . . . . . . . . . . . 626 bootstrapping . . . . . . . . . . . 471 Borel – hierarchy . . . . . . . . . . . . 699 – set . . . . . . . . . . . . . . . 698 bounded – gap . . . . . . . . . . . . . . . 953 – section problem . . . . . . . . 672 – for CFMs . . . . . . . . . 1153 boundedness problem . . . . . . . 673 Bounded-Synchronising-Colouring 556 branched covering . . . . . . . . . 896 branching-time logic . . . . . . . 1429 – CTL . . . . . . . . . . . . . 1430 – CTL? . . . . . . . . . . . . 1429 – -calculus . . . . . . . . . . 1431 breakpoint construction . . . . 199, 279 Brown’s lemma . . . . . . . . . . 669


Büchi – automaton . . . . . . . . . . 1419 – condition . . . . . . . . . . . . 267 – recurrence condition . . . . 193, 267 – ’s theorem . . . . . . . . 311, 1046 – set . . . . . . . . . . . . . . . 193 Büchi–Elgot–Trakhtenbrot theorem 1077 —C— Cantor – normal form – (CNF) . . . . . 697 – set . . . . . . . . . . . . . . 1106 – space . . . . . . . . . . . . . . 698 cardinal – arithmetic . . . . . . . . . . . 698 – number . . . . . . . . . . 695–696 Carton–Michel automaton . . . . . 194 Catalan numbers . . . . . . . . . . 464 CC0 . . . . . . . . . . . . . . . . 497 Černý – automaton . . . . . . . . . . . 535 – conjecture . . . . . . . . . . . 536 – function . . . . . . . . . . . . 536 channel . . . . . . . . . . . . . 1149 – bounded – . . . . . . . . . . 1166 – fifo . . . . . . . . . . . . . . 1149 – system – insertion . . . . . . . . . . 1157 – lossy – . . . . . . . . . . . 1156 characteristic . . . . . . . . . . . . 475 – automaton . . . . . . . . . . . 173 – mapping . . . . . . . . . . . . 379 – polynomial – . . . . . . . . . . 950 – sequence . . . . . . . . . . . . 954 – vector . . . . . . . . . . . . . 543 Chomsky normal form . . . . . . 1407 Chomsky–Schützenberger theorem 463, 465 Christol’s theorem . . . . . . . . . 937 Chrobak normal form 420–421, 425, 431 Church problem 1218–1220, 1222–1224, 1226–1227, 1234–1235


circuit . . . 495
– class . . . 497
– critical – . . . 153
– depth . . . 496
– size . . . 496
– uniformity . . . 496
– width . . . 496
classification theorem . . . 995
clique-width . . . 689
clock constraint . . . 1263
clopen . . . 620
closed timelike curves . . . 1482
closure
– backward – . . . 54
– deterministic – . . . 602
– horizontal – . . . 305
– concatenation – . . . 326
– polynomial – . . . 603
– properties . . . 1131
– reflexive-transitive – . . . 732, 754
– unambiguous – . . . 602
– vertical – . . . 305
– concatenation – . . . 326
coaccessible
– automaton . . . 8
– state . . . 5
Cobham–Semenov theorem . . . 948, 1057, 1193, 1205
Cocke–Younger–Kasami automaton . . . 1407
code
– bifix – . . . 849
– sliding block – . . . 973
coding . . . 915, 954
coefficient in a series . . . 64
cofinality . . . 709
coincidence condition . . . 529
co-induction . . . 74
colours . . . 221
combing . . . 878
communicating
– automaton . . . 1149
– finite-state machine (CFM) . . . 1152
– insertion . . . 1157

– local acceptance . . . . . . 1172 – lossy – . . . . . . . . . . . 1156 – model checking . . . . . . 1166 communication complexity . . . . 416 commutator in groups . . . . . . . 842 compact space . . . . . . . . . . . 620 complement . . . . . . . . . . . 1191 – picture language . . . . . . . . 313 complete – automaton . . . . . . . 8, 444, 1019 – minimal automaton . . . . . . . 17 – set . . . . . . . . . . . . . . . 701 – unambiguous automaton . . . . 194 compressible – pair . . . . . . . . . . . . . . . 550 – set . . . . . . . . . . . . . . . 550 computable real number . . . 921, 1462 computation . . . . . . . . . . . . . 82 concatenation . . . . . . . . . . . 502 – column – . . . . . . . . . . . . 305 – horizontal – . . . . . . . . . . 305 – modp -concatenation . . . . . . 502 – product . . . . . . . . . . . . . . 4 – row – . . . . . . . . . . . . . . 305 – vertical – . . . . . . . . . . . . 305 cone type . . . . . . . . . . . . 873, 882 congruence . . . . . . . 213, 842, 1009 – kernel – . . . . . . . . . . . . 619 – Nerode – . . . . . . . . . . . . 340 – on an algebra . . . . . . . . . . 617 – syntactic – . . . . 21, 214, 618, 1009 conjugacy . . . . . . . . . . . . . 786 – labelled – . . . . . . . . . . . 999 – of shifts . . . . . . . . . . . . 990 – problem . . . . . . . . . . . . 842 conjugate – automata . . . . . . . . . . . . 999 – elements . . . . . . . . . . . . 842 consecutive transitions . . . . . . . . 5 consistency problem . . . . . . . . 387 constant term – of a language . . . . . . . . . . . 42


– of a series . . . . . . . . . . . . 65 – of an expression . . . . . . . . . 42 constrained queue-content decision diagrams (CQDD) . . . . . . 1155 constraint – equality . . . . . . . . 1194, 1205 – inequation . . . . . . . 1196, 1208 – linear – . . 1193–1194, 1196–1197, 1205, 1208 – modular – . . . . . . . 1193, 1197 context – of a word . . . . . . . . . . . 1009 – right – . . . . . . . . . . . . 1005 context-free – grammar (CFG) . . 464, 772, 1388 – probabilistic – (PCFG) . . . 1388 – consistency . . . . . . . 1388 – proper – . . . . . . . . . . 1388 – stochastic – . . . . . . . . 1388 – weighted – (WCFG) . . . . 1388 – Kolam grammar . . . . . . . . 319 – language . . 88, 464, 772–774, 1481 – matrix grammar . . . . . . . . 320 – word problem submonoid . . . 863 continuous – function . . . . . . . . . . . . 700 – operation . . . . . . . . . 766–768 contracting – automaton . . . . . . . . . 891, 897 – group . . . . . . . . . . . . 891, 897 contraction – graph – . . . . . . . . . . . . . 996 – symbol . . . . . . . . . . . . . 996 control problem . . . . . . . . . 1228 controllability condition . . . . . 1228 controller . . . . . . . . . . . . 1227 – synthesis – centralised – . . . . . 1228, 1234 – decentralised – . . . . . . . 1252 – distributed – . . . . . . . . 1237 – generalised – . . . . . . . . 1234 convolution – string . . . . . . . . . . 1038, 1058


– tree . . . 1036, 1038
Conway
– partial – semiring . . . 731
– quasi- – semiring . . . 63
– 's leap . . . 732
– semiring . . . 63, 72, 731
Coxeter group . . . 879
Cramér's model . . . 919
Crespi–Reghizzi–Pradella tile grammars . . . 324
critical
– circuit . . . 153
– graph . . . 153
cross-section theorem . . . 87
Curtis–Lyndon–Hedlund theorem . . . 990
cut . . . 657
cycle rank . . . 417–418, 424, 440
cyclic automaton . . . 353
cyclicity . . . 152
cylinder . . . 973

—D—
D0L language . . . 103
DAG
– finitary vertex . . . 199
– infinitary vertex . . . 199
– leveled – . . . 191
– peeling . . . 204
– rank function . . . 204
– run – . . . 203
D-class . . . 1024
– rank . . . 1024
– regular – . . . 1024
– structure group . . . 1024
decentralised control problem . . . 1252
decomposition
– of an automaton . . . 1013
– prefix-suffix – . . . 363
– theorem . . . 995
deduction system . . . 1392
degree of irreversibility . . . 443


delay – left – . . . . . . . . . . . . . 1022 – right – . . . . . . . . . . . . 1021 density matrix . . . . . . . . . . 1460 derivation – of an expression . . . . . 58, 62, 70 – broken – . . . . . . . . . . . . 73 descriptional complexity measure . . . . . . . . . 412–443 – quotient complexity . . . . 413, 430 – syntactic complexity . . . . . . 433 desert automaton . . . . . . . . . . 781 determinacy . . . . . . . 274, 702–703 – regular – . . . . . . . . . . . . 221 – Wadge–Borel – . . . . . . . . 706 determined game . . . . . . . . . . 703 determinisation . . . . . . . . . . 418 – NDD . . . . . . . . . . . . . 1197 – problem . . . . . . . . . . . . 420 deterministic – 2D automaton . . . . . . . . . 316 – finite automaton (DFA) 7, 411, 1004, 1462 – complete – . . . . . . . . . . 412 – minimal – . . . . . . . . . . 412 – one-way – (1DFA) . . . . . 1462 – quotient . . . . . . . . . . . 412 – two-way – (2DFA) . . . . . 1462 – state complexity . . . . . . 412, 430 – tiling system . . . . . . . . . . 316 – transducer . . . . . . . . . . . . 81 – transition complexity . . . . . . 414 Diag-DREC . . . . . . . . . . . . 316 digit . . . . . . . . . . . . . . . 1191 – sign – . . . . . . . . . 1191, 1203 dimension of an automaton . . . . . 42 Dirac notation . . . . . . . . . . 1460 direct – product . . . . . . . . . . . . . 617 – sum . . . . . . . . . . . . . . 732 Dirichlet’s theorem . . . . . . . . 920 discriminant . . . . . . . . . . . . 471 disjunctive rational subset . . . . . 862

distance – geodesic – . . . . . . . . . . . 850 – prefix metric . . . . . . . . . . 853 – profinite – . . . . . . . . . . . 576 distributed – alphabet . . . . . . . . . . . 1250 – synthesis – architecture . . . . . . . . 1236 – local specifications . . . . . 1239 – problem . . . . . . . . . . 1237 divisible group . . . . . . . . . . . 939 division matrix . . . . . . . . 993–994 document type definitions (DTD) . 259 domain of a tree . . . . . . . . 237, 266 dominant – eigenvalue . . . . . . . . . . . 962 – singularity . . . . . . . . . . . 470 domino system . . . . . . . . . . . 311 Doner–Thatcher–Wright theorem 1084 DOTA . . . . . . . . . . . . . . . 316 dot-depth hierarchy . . . . . 510, 1083 dual formula . . . . . . . . . . . . 278 duplicator player . . . . . . . . . . 226 dyadic rational . . . . . . . . . . 1115 Dyck language . . . . . . . . . 777, 862 dynamical system . . . . . . . . . 972 —E— Earley’s automaton . . . . . . . 1405 edge . . . . . . . . . . . . . . 989, 993 EF logic . . . . . . . . . . . . . . 822 Ehrenfeucht–Fraïssé game . . . . . 593 Eilenberg’s theorem . . . . . . . . 582 elementary – equivalent matrices . . . 995, 1003 – function . . . . . . . . . . . 1079 elimination ordering . . . . . . 424, 468 emptiness problem . . . . . . 241, 1265, 1472–1473, 1480 empty word . . . . . . . . . . . . . 4 encoding – dual – . . . . . . . . . . . . 1203 – extension . . . . . . . . . . . . 255

Index

– first-child-next-sibling (FNCS) . 254 – fcns.t/ . . . . . . . . . . . . 255 – fractional part . . . . . 1202, 1206 – integer – part . . . . . . . . . 1202, 1206 – vectors . . . . . . . . . . . 1191 – integers . . . . . . . . . . . 1191 – operation . . . . . . . . . . . . 255 – real – numbers . . . . . . . . . . 1202 – vectors . . . . . . . . . . . 1203 – relation . . . . . . . . . . . . 1189 – serialised – . 1192, 1196, 1201, 1203 – valid – . . . . . . 1190–1191, 1203 endomorphism – extension . . . . . . . . . . . . 857 – virtually injective – . . . . . . 857 entourage . . . . . . . . . . . . . 619 entropy . . . . . . . . . . . . . 643, 991 equation – existential theory of –s . . . . . 865 – language . . . . . . . . 29–32, 765 – profinite – . . . . . . . . . . . 578 – symmetrical – . . . . . . . . . 580 – system of –s . . . . . . . . . . 631 – explicit – . . . . . . . . . . . 771 – resolved – . . . . . . . . . . 771 – strict – . . . . . . . . . . . . 772 – with rational constraints . . . . 865 – word – . . . . . . . . . . . . . 789 equivalence – Moore – . . . . . . . . . . . . 343 – of automata . . . . . . . . . 1472 error – bounded – . 1462, 1467–1469, 1475, 1478 – one-sided unbounded . . . . 1471 – unbounded – 1461, 1467, 1471, 1480 – zero – . . . . . . . . . . . . 1482 essential – automaton . . . . . . . . . . . 998 – graph . . . . . . . . . . . . . . 989 Euclidean division . . . . . . . . . 697

xxxi

even shift . . . . . . . . . . . . . 989 exact computation . . . . . . . . 1482 example – negative – . . . . . . . . . . . 379 – positive – . . . . . . . . . . . 379 exceptional set . . . . . . . . . . . 471 existential theory of equations . . . 865 expansion – ˛ -expansion . . . . . . . . . . 951 – automaton . . . . . . . . . . 1026 – graph – . . . . . . . . . . . . . 996 – symbol – . . . . . . . . . . . . 996 explicit – operation . . . . . . . . . . . . 629 – profinite equation . . . . . . . 578 expression – broken derivation . . . . . . . . 73 – constant term . . . . . . . . . . . 42 – depth . . . . . . . . . . . . . . . 42 – derivation . . . . . . . . 58, 62, 70 – equivalent – . . . . . . . . . . . 42 – language denoted by . . . . . . . 42 – literal length . . . . . . . . . . . 42 – omega-regular – . . . . . . . . 195 – rational – . . . . . . . . . . 41, 741 – linear – . . . . . . . . . . . . 27 – weighted – . . . . . . . . . . . 65 – reduced – . . . . . . . . . . 46, 66 – regular – . . . . . 42, 411, 459, 502 – defined by grammar . . . . . 463 – length . . . . . . . . . . . . 461 – size . . . . . . . . . . . . . 461 – uncollapsible – . . . . . . . . 462 – series denoted by an – . . . . . . 65 – star-normal form . . . . . . . . . 57 – valid – . . . . . . . . . . . . . . 65 – weighted rational – . . . . . . . . 74 extension – algebraic – . . . . . . . . . . . 852 – finite-index – . . . . . . . . . . 853 – HNN – . . . . . . . . . . . . . 864 – operation . . . . . . . . . . . . 255


—F— factor map . . . . . . . . . . . . . 972 factorisation – forest theorem . . . . . . . 654, 659 – tree . . . . . . . . . . . . . . . 654 fellow traveller property . . . . . . 876 field . . . . . . . . . . . . . . . . 466 finite – closure property . . . . . . . . 669 – language . . . . 419–420, 425, 437 – operation problem . . . . . . 435 – model theory . . . . . . . . . 1071 – order element . . . . . . . . . 842 – substitution . . . . . . . . . 80, 103 finitely presented group . 842, 864, 872 first-order logic (FOL) . 289, 687, 1031, 1034, 1074 – interpretation . . . . . . . . . 1049 – on trees . . . . . . . . . . . . 803 – with child relations . . . . . . . 810 Fischer automaton . . . . . . . . 1008 fixed point . . . . . . . . 747, 915, 954 – induction . . . . . . . . . 751–752 – subgroup . . . . . . . . . . . . 856 – see also pre-fixed point fixpoint . . . . . . . . . . . . . 1024 flow equivalent . . . . . . . . . . . 996 flower automaton . . . . . . . . . 845 follow automaton . . . . . . . 73, 422 fooling set . . . . . . . . . . . . . 414 forbidden – factor . . . . . . . . . . . . . . 988 – pattern . . . . . . . . . . . . . 591 – characterisation . . . . . . . 443 forest . . . . . . . . . . . . . . . . 819 – algebra . . . . . . . . . . . . . 818 – syntactic – . . . . . . . . . . 821 formal – Laurent series . . . . . . . . . 466 – power series . . . . . . . . 464, 937 – algebraic – . . . . . . . . . . 464 formula – first-order – . . . . . . . . . . 592

– monadic second-order – . . . . 592 forward diamond condition . . . 1252 forward Ramsey split . . . . . . . 682 fragile language . . . . . . . . . . 788 Franks’ theorem . . . . . . . . . . 997 free – abelian group . . . . . . . 863, 875 – action . . . . . . . . . . . . . 871 – group . . . . . . . . . 843, 892, 901 – basis . . . . . . . . . . . . . 843 – free factor . . . . . . . . . . 851 – generalised word problem . . 847 – rank . . . . . . . . . . . . . 843 – monoid . . . . . . . . . . 4, 58, 498 – ! -semigroup . . . . . . . . . . 721 – partially abelian group . . . . . 863 – product . . . . . . . . . . . . . 901 – amalgamated – . . . . . . . . 864 – profinite – group – rank . . . . . . . . . . 643, 645 – monoid . . . . . . . . . . . 577 – semigroup . . . . . . . . . . . . 4 – variable . . . . . . . . . . . . 957 full shift . . . . . . . . . . . . . . 988 function – automatic real – . . . . . . . 1107 – complexity – . . . . . . . . . . 955 – continuous – . . . . . . . . . . 700 – elementary – . . . . . . . . . 1079 – multi-valued – . . . . . . . . . 470 – quasi-automatic – . . . . . . . 940 – rank – . . . . . . . . . . . . . 204 – rational – . . . . . . . . . . 84, 931 – real – – continuous – . . . . . . . . 1116 – smooth – . . . . . . . . . . 1116 – uniformly continuous – . . . . 620 – word – . . . . . . . . . . . . 1113 functionally recursive group . . . . 888 functorial star . . . . . . . . . . . 742 fundamental group . . 872, 874, 880, 884


—G— grammar – 2D – . . . . . . . . . . . . . . 318 Gaifman’s theorem . . . . . . . . 1081 – Boolean – . . . . . . . . . . . 775 Gale–Stewart game . . . . . . . . 703 – conjunctive – . . . . . . . . . . 774 game . . . . . . . . . . . . . . 221, 274 – linear – . . . . . . . . . . . 777 – colouring function . . . . . . . 221 – context-free – (CFG) 464, 772, 1388 – colours . . . . . . . . . . . . . 221 – context-free matrix – . . . . . . 320 – cops and robber – . . . . 417–418 – Kolam – . . . . . . . . . . . . 319 – determined – . . . . . . . . . . 703 – picture – . . . . . . . . . . . . 318 – duplicator players . . . . . . . 226 – Průša grid – (PGG) . . . . . . . 323 – Gale–Stewart – . . . . . . . . . 703 – puzzle – . . . . . . . . . . . . 319 – graph – . . . . . . . . . . . . . 417 – regional-tile – (RTG) . . . . . . 324 – membership – . . . . . . . . . 276 – symbolic – . . . . . . . . . . . 319 – parity – . . . . . . . . . . . 221, 274 – tile – (TG) . . . . . . . . . . . 324 – players Zero and One . . . . . 221 – Crespi–Reghizzi–Pradella – . 324 – positional strategy . . . . . 222, 275 – unambiguous – . . . . . . . . . 464 – positionally determined – . . . 275 – with grid . . . . . . . . . . . . 321 – simulation – . . . . . . . . . . 225 – spoiler players . . . . . . . . . 226 graph . . . . . . . . . . . . . . 530, 989 – adjacency matrix . . . . . . . . 989 – two-player – . . . . . . . . . . 221 – Cayley – . . . . . . . . . . 863, 872 – Wadge – . . . . . . . . . . . . 705 – colouring . . . . . . . . . . . . 547 – see also determinacy – completely reducible – . . . . . 152 – see also winning – contraction . . . . . . . . . . . 996 gate . . . . . . . . . . . . . . . . 495 – critical – . . . . . . . . . . . . 153 – generalised input . . . . . . . . 496 – edge . . . . . . . . . . . . . . 989 – input . . . . . . . . . . . . . . 495 – essential – . . . . . . . . . . . 989 – output . . . . . . . . . . . . . 495 – Eulerian – . . . . . . . . . . . 542 genealogical ordering . . . . . . . 960 – expansion . . . . . . . . . . . 996 generalised – game . . . . . . . . . . . . . . 417 – centralised controller synthesis 1234 – higher edge – . . . . . . . . . . 990 – finite automaton (GFA) 1462, 1466 – hyperbolic – . . . . . . . . . . 891 – power series . . . . . . . . . . 938 – morphism . . . . . . . . . . . 992 – sequential machine (GSM) . . . . 81 – of a max-plus matrix . . . . . . 153 – word problem . . . . . . . 842, 883 – of constant out-degree . . . . . 547 generator of a topological class . . 701 – of groups . . . . . . . . . . . . 864 geodesic distance . . . . . . . . . 850 – path . . . . . . . . . . . . . . 989 gliding point . . . . . . . . . . 159, 161 – primitive – . . . . . . . . . . . 548 Glushkov automaton . . 54, 68, 73, 421 – Schreier – . . . . 848, 861, 873, 891 golden – state . . . . . . . . . . . . . . 989 – mean shift . . . . . . . . . . . 989 – strongly connected – . . . . . . 542 – ratio . . . . . . . . . . . . . . 442 – syntactic – . . . . . . . . . . 1024 good substitution . . . . . . . . . . 968 – tameness . . . . . . . . . . . . 637 graded monoid . . . . . . . . . . 46, 64 – totally synchronising – . . . . . 556


– underlying – . . . . . . . . . . 530 – underlying – of a letter . . . . . 550 – vertex . . . . . . . . . . . . . 989 – Wielandt – . . . . . . . . . . . 555 greedy algorithm – compression . . . . . . . . . . 537 – extension . . . . . . . . . . . . 541 Green’s relations . . . . . . . 658, 1024 grid constructor . . . . . . . . . . 322 Grigorchuk group . 885, 887–888, 890, 892–895 Gröbner basis . . . . . . . . . . . 466 group – affine – . . . . . . . . . . . 888, 898 – Alëshin – . . . . . . . 888, 892, 900 – amenable – . . . . . . . . . . . 891 – Artin – . . . . . . . . . . . . . 879 – asynchronously automatic – . . 878 – automata – . . . . . . 842, 885–902 – regular weakly branch . . . . 893 – automatic – . . . . . . . 875–883 – normal forms . . . . . . . . 875 – right/left – . . . . . . . . . . 878 – ball of radius n . . . . . . . . . 874 – Basilica – . . . . . . 890, 892–893 – Baumslag–Solitar – . 884, 888, 898 – biautomatic – . . . . . . . . . 878 – conjugacy problem . . . . . 881 – bounded – . . . . . . . . 891–892 – Bowen–Franks – . . . . . . . . 997 – braid – . . . . . . . . . . . . . 879 – branch – . . . . . . . . . 893–894 – commensurator . . . . . . . . . 852 – conjugacy separable – . . . . . 890 – contracting – . . . . . . . . 891, 897 – Coxeter – . . . . . . . . . . . 879 – divisible – . . . . . . . . . . . 939 – exponential growth . . . . . . . 894 – finite – . . . . . . . . . . . . . 862 – finitely presented – . . 842, 864, 872 – free – . . . . . . . . . 843, 892, 901 – abelian – . . . . . . . . . 863, 875 – basis . . . . . . . . . . . . . 843

– free factor . . . . . . . . . . 851 – generalised word problem . . 847 – partially abelian – . . . . . . 863 – rank . . . . . . . . . . . . . 843 – functionally recursive – . . . . 888 – fundamental – . . 872, 874, 880, 884 – graph – . . . . . . . . . . . 863, 865 – Grigorchuk – . . 885, 887–888, 890, 892–895 – growth . . . . . . . . . . 894–895 – Gupta–Sidki – . . . . 885, 887–888, 892–893 – Heisenberg – . . . . . . . . . . 884 – identity . . . . . . . . . . . . . 741 – intermediate growth . . . . . . 894 – iterated monodromy . . . . . . 897 – Kazhdan – . . . . . . . . . . . 902 – kernel . . . . . . . . . . . . . 636 – lamplighter – . . . . . . . . . . 888 – language . . . . . . . . . 791, 1470 – mapping class – . . . . . . . . 879 – nilpotent – . . . . . . 855, 864, 884 – non-uniformly exponential growth . . . . . . . . . . . . . 895 – of an automaton . . . . . . . . 885 – p -group . . . . . 499, 855, 888, 892 – polynomial growth . . . . . . . 894 – pure braid – . . . . . . . . . . 879 – regular branch – . . . . . . . . 893 – relatively hyperbolic – . . . . . 883 – relators . . . . . . . . . . . . . 872 – residually finite – . . . . . . . . 855 – right angled Artin – . . . . . . 863 – self-similar – . . . . . . . . 887, 895 – semi-hyperbolic – . . . . . . . 883 – structure – . . . . . . . . . . 1024 – surface – . . . . . . . . . 873–874 – virtually – abelian – . . . . . . . . . . . 864 – free – . . . . . . . 862–863, 865 – weakly branch – . . . . . . 891, 894 – word-hyperbolic – . . 865, 881–884


growing
– letter . . . 965
– substitution . . . 966
growth
– function . . . 874
– maximal – . . . 966
– of groups . . . 894–895
– series . . . 874
– type . . . 965–966
guidable automaton . . . 285
Gupta–Sidki group . . . 885, 887–888, 892–893

—H— Hadamard’s theorem . . . . . . . . 469 Hahn’s power series . . . . . . . . 938 Hankel matrix . . . . . . . . . . . 314 Hausdorff completion . . . . . . . 621 Hausdorffisation . . . . . . . 620–621 HD0L (ultimate) periodicity problem 977 heap – automaton . . . . . . . . . . . 154 – model . . . . . . . . . . . . . 154 hedge – automaton – language . . . . . . . . . . . 256 – nondeterministic finite – (NFHA) . . . . . . . . . . . . 256 – of a tree . . . . . . . . . . . . 254 Heisenberg group . . . . . . . . . 884 hierarchy – Borel – . . . . . . . . . . . . . 699 – dot-depth – . . . . . . . . 510, 1083 – Straubing – . . . . . . . . . . 1083 – Wadge – . . . . . . . . . . . . 700 higher-order – collapsible pushdown automaton 1315 – pushdown automaton . . . . 1315 – stack . . . . . . . . . . . . . 1313 Higman–Haines set . . . . . . . . 435 history tree . . . . . . . . . . . . . 210 HNN extension . . . . . . . . . . 864 homogeneous partition . . . . . . . 324

Hopcroft’s algorithm
– configuration . . . 346
– splitter . . . 345
– waiting set . . . 345
hyper-arithmetical set . . . 793
hyperbolic
– boundary . . . 854, 857, 883
– graph . . . 891
– plane . . . 873
– space . . . 881

—I— I -closed . . . . . . . . . . . . . 1251 ideal . . . . . . . . . . . . . . 467, 730 idempotent . . . . . . . . . . . . 1024 – tree . . . . . . . . . . . . . . . 814 identity . . . . . . . . . . . . . . . 618 – aperiodic – . . . . . . . . . . 47, 72 – classical – . . . . . . . . . . . 749 – group – . . . . . . . . . . . . . 741 – matrix star – . . . . . . . . . . 734 – natural – . . . . . . . . . . . 46, 67 – of unary algebras . . . . . . . . 530 – permutation – . . . . . . . . . 734 – product star – . . . . . . . . . 731 – profinite . . . . . . . . . . . . 629 – rational – . . . . . . . . . . . . . 45 – sum star – . . . . . . . . . . . 731 – trivial – . . . . . . . . . 46, 67, 69 image . . . . . . . . . . . . . . 1109 implicit operation . . . . . . . . . 629 incidence matrix . . . . . . . . . . 961 inclusion problem . . . . . . . . 1284 incompressible – pair . . . . . . . . . . . . . . . 550 – set . . . . . . . . . . . . . . . 550 independent looping tree automaton (ILTA) . . . . . . . . . . . . . 770 index of a partition . . . . . . . . . 339 inference rule . . . . . . . . . . 1392 – instantiation . . . . . . . . . 1393 – item . . . . . . . . . . . . . 1392 initialisable set . . . . . . . . . . . 714


in-merge
– graph morphism . . . 992
– labelled – . . . 1000
– of graph . . . 992
inner-cuts u . . . 657
input alphabet . . . 81
in-split
– labelled – . . . 1000
– of graph . . . 992
integer
– arithmetic . . . 1205
– multiplicatively
– dependent – . . . 948
– independent – . . . 948
intersection . . . 1190
inverse
– automaton . . . 844
– monoid . . . 844, 853
– semigroup . . . 865
involutive automaton . . . 844
irreducible
– automaton . . . 920
– matrix . . . 961
– shift space . . . 642, 1007
– substitution . . . 961
isomorphism problem . . . 842, 883
isoperimetric inequality . . . 880, 883
iterated
– function system . . . 1107
– monodromy group . . . 897

—J—
J1 . . . 570–571, 639
Jensen’s inequality . . . 477
join operation (of two lattices) . . . 599
Julia set . . . 897

—K—
k-automaton . . . 914
k-block . . . 990
κ-implicit signature . . . 637
k-recognisable set . . . 947, 956, 972
Kazhdan group . . . 902
kernel . . . 84
– congruence . . . 619
– group – . . . 636
– k- – . . . 916, 919, 954
Kleene
– lattice . . . 753
– monoid . . . 73
– ’s theorem . . . 39, 43, 62, 66
Kleene algebra . . . 751–754
Kleene–Schützenberger theorem . . . 118
Kondacs–Watrous quantum finite automaton (KWQFA) . . . 1465
Krieger automaton . . . 1005
Kripke structure . . . 1430
Krohn–Rhodes
– complexity . . . 640
– theorem . . . 606
Kronecker theorem . . . 948

—L—
label . . . 5
Lagrange implicit function theorem . . . 475–476
lamplighter group . . . 888
Landau’s function . . . 419
language . . . 4
– accepted – . . . 43
– base . . . 833
– class . . . 580
– computable operation . . . 782
– conjugacy . . . 786
– conjunctive – . . . 774–777
– linear – . . . 777–778
– constant term . . . 42
– context-free – . . . 88, 464, 772–774, 1481
– continuous operation . . . 766–768
– convergent sequence . . . 766
– D0L – . . . 103
– definite –
– tree – . . . 808
– word – . . . 807
– dense – . . . 588
– Dyck – . . . 777, 862

. . . . .

. . . . .

. . . . .

. . . . .

Index

– equation . . . . . . . . 29–32, 765 – exclusive stochastic . . . . . 1471 – finite – . . . . . 419–420, 425, 437 – operation problem . . . . . . 435 – forest – – recognised – . . . . . . . . . 821 – fragile – . . . . . . . . . . . . 788 – group – . . . . . . . . . . 791, 1470 – implementable – . . . . . . . 1252 – infinite – – operation problem . . . . . . 430 – label testable – . . . . . . . . . 821 – lattice . . . . . . . . . . . . . 578 – left – . . . . . . . . . . . . . . 339 – local – . . . . . . . . . . . . . . 24 – max-regular – . . . . . . . . . 723 – monomial – . . . . . . . . . . 678 – monotone operation . 768–769, 791 – nesting tree – . . . . . . . . . . 833 – non-counting – . . . . . . . . . 509 – nonstochastic – . . . . . . . . 1480 – of a hedge automaton . . . . . 256 – path testable – . . . . . . . . . 831 – P-complete – . . . . . . . . . . 778 – periodic – . . . . . . . 1469, 1472 – picture – . . . . . . . . . . . . 304 – closure properties . . . . . . 312 – complement . . . . . . . . . 313 – local – . . . . . . . . . . . . 307 – logic formula . . . . . . . . 311 – recognisable – . . . . . . . . 308 – piecewise-testable – . 571, 584, 639 – polynomial – . . . . . . . 678, 1477 – prime – . . . . . . . 778, 788, 1468 – quotient . . . . . . . . . . 58, 573 – rational – . . . . . . . . . 4, 41, 857 – recognisable – . . . . . . 6, 43, 857 – tiling – . . . . . . . . . . . . 308 – recursive – 779, 782–789, 1481–1482 – recursively enumerable – . 782–789, 1471, 1481 – regional – . . . . . . . . . . . 324 – regular – . . . . . . . . . . . 4, 459

xxxvii

– reversible – . . . . . . . . . . 433 – right – . . . . . . . . . . . . . 339 – †1 -language . . . . . . . . . . 587 – slender – . . . . . . . . . . . . 588 – sparse – . . . . . . . . . . . . 588 – star-free – 420, 433, 503, 600, 1080 – stochastic – 1463, 1471, 1473, 1477 – subregular – . . . . . 420, 433, 437 – super-turtle – . . . . . . . . . . 521 – tree – – NFTA-recognisable . . . . . 239 – recognisable – . . . . . . 238, 805 – regular – . . . . . . 243, 268, 805 – turtle – . . . . . . . . . . . . . 520 – unambiguous – . . . . . . . . . 519 – unary – 419–420, 425–426, 428, 437, 440, 1463, 1469–1471, 1473–1474, 1483 – operation problem . . . . 430, 435 – universal witness – . . . . . . . 432 – valid accepted – computation (VALC) . . . . . . . . . . . . 776 – variety . . . . . . . . . . . . . 573 – with zero . . . . . . . . . . . . 587 – word-based operation . . . . . 768 Las Vegas computation . . . . . 1483 latest appearance – automaton (LAA) . . . . . . . 202 – record (LAR) . . . . . . . . . 202 lattice – generated by a set . . . . . . . 599 – of languages . . . . . . . . . . 578 Latvian quantum finite automaton (LaQFA) . . . . . . . . . . . 1465 leaf transitions . . . . . . . . . . . 239 learner . . . . . . . . . . . . . . . 379 learning – from given data . . . . . . . . 387 – in the limit . . . . . . . . . . . 379 – through a minimally adequate teacher (MAT) . . . . . . . . . . . . . 382 length – litteral- . . . . . . . . . . . . . . 42

Index

xxxviii

– of a path . . . . . . . – of a regular expression – of a word . . . . . . . letter . . . . . . . . . . . – arity . . . . . . . . . – growing – . . . . . . – neutral – . . . . . . . – nullary – . . . . . . . – occurrence . . . . . . level automaton . . . . . lift construction . . . . . limit space . . . . . . . . limited – automaton . . . . . . – nondeterminism . . . – series . . . . . . . . . limitedness problem . . . Lindenmayer system . . . linear – numeration base . . . – ordering . . . . . . . – rational expression . . – recurrence . . . . . . – representation . . . . – temporal logic (LTL) . – time computation linearisation . . . . . linked pair . . . . . . Liouville – number . . . . . . – ’s inequality . . . literal length . . . . . local – automaton . . . . – language . . . . . – picture language . – specifications . . . locally finite – algebra . . . . . . – semigroup . . . . – semiring . . . . . – series . . . . . . .

. . . . . . . . . . . . . . . .

. . . .

. . . .

. . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . .

. 5 locally threshold testable . . . . . 1081 461 logic – definable relation . . . . . . . 1035 . 4 . 4 – first-order – (FOL) . 289, 687, 1031, 1034, 1074 804 – on trees . . . . . . . . . . . 803 965 – with child relations . . . . . 810 504 – formula . . . . . . . . . . . 1035 804 – picture language . . . . . . . 311 . 4 1113 – interpretation . . . 1043, 1045, 1049 – linear temporal – (LTL) . 816, 1081, . 200 1158, 1417 . 897 – monadic second-order – (MSO) 244, 289, 1075, 1163, 1221 . . . . . 428 – MSO-definable . . . . . . . 245 . . . . . 421 – weak – (WMSO) . . . . . . . 290 . . . . . 157 – weighted – . . . . . . . . . . 129 . . . . . 673 – propositional dynamic – (PDL) 1164 . . 793, 1107 – global formula . . . . . . . 1165 – local formula . . . . . . . 1165 . . . . . 950 – path expression . . . . . . 1165 . . . . . 657 – sentence . . . . . . . . . . . 1035 . . . . . . 27 – temporal – . . . . . . . . . . . 228 . . . . . 929 . . . . . 118 logical – interpretation . . . . . . . . . 1092 . . 816, 1081, 1158, 1417 – reduction . . . . . . . . 1092–1093 . . . . . 922 loop . . . . 1161 – complexity . . . . . . . . . . . . 53 . . . . . 721 – in an ! -automaton . . . . . . . 217 – index . . . . . . . . . . . . . . . 53 . . . . . 923 – testing predicate . . . . . . . 1233 . . . . . 922 lower bound technique . . . . 415–418 . . . . . . 42 – communication complexity . . 416 – pumping method . . . . . . . . 416 25, 355, 1018 LR automaton . . . . . . . 1400–1402 . . . . . . 24 Lyndon word . . . . . . . . . . . . 361 . . . . . 307 —M— . . . . 1239 magic number problem . . . . . . 421 . . . . . 634 Mal’cev product . . . . . . . . 602, 631 . . . . . 669 map . . . . . 134 – abelianisation – . . . . . . . . 961 . . . . . 119 – factor – . . . . . . . . . . . . . 972

Index

– in-merging – . . . . . . . . . 1011 – in-splitting – . . . . . . . . . 1011 – out-splitting – . . . . . . . . 1012 – Parikh – . . . . . . . . . . . . 961 – sliding block – . . . . . . . . . 990 Maple . . . . . . . . . . . . . 468, 472 mapping class group . . . . . . . . 879 marker . . . . . . . . . . . . . . . . 89 marking . . . . . . . . . . . . . . . 89 Markov – chain . . . . . . . . . . . . . 1347 – quantum – . . . . . . . . . 1474 – recursive – (RMC) . . . . . 1372 – decision process (MDP) . . . 1350 – recursive – . . . . . . . . . 1377 MAT-learning . . . . . . . . . . . 382 matrix – adjacency – . . . . . . . . 989, 999 – alphabetic – . . . . . . . . . . 999 – block decomposition . . . . . . . 51 – column division – . . . . . . . 993 – context-free – grammar . . . . 320 – density – . . . . . . . . . . . 1460 – elementary equivalent – . 995, 1003 – embedding . . . . . . . . . . . 887 – Hankel – . . . . . . . . . . . . 314 – incidence – . . . . . . . . . . . 961 – irreducible – . . . . . . . . . . 961 – primitive – . . . . . . . . . . . 961 – representation . . . . . . . . . . 87 – row division – . . . . . . . . . 994 – similar – . . . . . . . . . . . 1013 – stabilisation . . . . . . . . . . 674 – stable – . . . . . . . . . . . . . 670 – stochastic – . . . . . . 1462, 1466 – strong shift equivalent – . 995, 1003 – symbolic – elementary equivalent – . . 1013 – strong shift equivalent – . . 1013 – transition – . . . . . . . . . 43, 999 max-plus – automaton . . . . . . . . . . . 154

xxxix

– convex – hull . . . . . . . . . . . . . 162 – set . . . . . . . . . . . . . . 162 – eigenvalue . . . . . . . . . . . 164 – eigenvector . . . . . . . . . 161, 164 – semiring . . . . . . . . . . . . 152 – series . . . . . . . . . . . . . . 154 – spectral theorem . . . . . . . . 159 max-regular language . . . . . . . 723 Mazurkiewicz trace . . . . . . . 1176 McNaughton–Papert theorem 596, 1080 McNaughton–Yamada algorithm 45, 49, 66, 72 Mealy – automaton . . . . . . . . . . . 885 – bireversible – . . . 887, 899–902 – contracting – . . . . . . . . . 891 – dual – . . . . . . . . . . . . 887 – nuclear – . . . . . . . . . . . 891 – reset machine . . . . . . . . 898 – reversible – . . . . . . . 887, 897 – machine . . . . . . . . . . . . . 82 measure – ergodic – . . . . . . . . . . . . 973 – invariant – . . . . . . . . . . . 973 – uniquely ergodic – . . . . . . . 973 measurement – partial – . . . . . . . . . . . 1460 – quantum – . . . . . . . . . . 1459 membership – game . . . . . . . . . . . . . . 276 – problem . . . . . . . . . . . . 842 – rational subset – . . . . . . . 864 message sequence chart (MSC) . 1161 meta-transition . . . . . . . . . . . 181 method – recursive – . . . . . . . . 45, 51, 66 – state-elimination – . . 45, 47, 66, 72 – system-solution – . . 45, 48, 66–67 metric . . . . . . . . . . . . . . . 619 – space . . . . . . . . . . . . . . 619 – see also pseudometric – see also pseudo-ultrametric

xl

Index

military ordering . . . . . . . . 210, 960 minimal – automaton . . . 17, 339, 1005, 1009 – complete – . . . . . . . . . . . 17 – nondeterministic – . . . . . . 368 – dynamical system . . . . . . . 972 – strongly connected component 1007 – weighted finite automaton (WFA) . . . . . . . . . . . . 1122 minimisation of automata . 1190, 1472 min-plus semiring . . . . . . . 152, 669 model checking . . . . . . 1424, 1448 – CFMs . . . . . . . . . . . . 1166 – program complexity . . . . . 1450 monadic – second-order – logic (MSO) 244, 289, 1075, 1163, 1221 – MSO-definable . . . . . . 245 – weak – (WMSO) . . . . . 290 – weighted – . . . . . . . . . 129 – theory of one successor (S1S) 228 – transitive closure logic (MTC) 1079 monoid . . . . . . . . . . . . . . . . 3 – aperiodic – . . . . . . . . . 572, 853 – group-free – . . . . . . . . . 499 – automatic – . . . . . . . . . . 878 – commutative – . . . . . . . . . 570 – countably factorisable – . . . . 745 – finitely – factorisable – . . . . . . . . 734 – generated – . . . . . . . . 61, 64 – free – . . . . . . . . . . . 4, 58, 498 – free profinite – . . . . . . . . . 577 – generating set . . . . . . . . . . 61 – graded – . . . . . . . . . . . 46, 64 – group kernel – . . . . . . . . . 636 – idempotent – . . . . . . . . . . 570 – inverse – . . . . . . . . . . 844, 853 – J-trivial – . . . . . . . . . . . 571 – Kleene – . . . . . . . . . . . . . 73 – non-solvable – . . . . . . . . . 499 – partial – . . . . . . . . . . . . 734

– rational – . . . . . . . . . . . . . 73 – recognisable – . . . . . . . . . . 37 – solvable – . . . . . . . . . . . 499 – syntactic – . . . . . . . 20, 498, 861 – transition – . . . . . . . . . 844, 853 monomial language . . . . . . . . 678 monotone operation . . . 768–769, 791 Moore – equivalence . . . . . . . . . . 343 – ’s algorithm . . . . . . . . . . 343 – slow automaton . . . . . . 344, 351 Moore–Crutchfield quantum finite automaton (MCQFA) . . . . 1465 morphic – composition . . . . . . . . . . . 80 – equivalence problem . . . . . . 102 morphism . . . . . . . . . . . 529, 915 – fixed point . . . . . . . . . 915, 954 – graph – . . . . . . . . . . . . . 992 – injective – . . . . . . . . . . . . 92 – in-merge – . . . . . . . . . . . 992 – length multiplying – . . . . . . 575 – nonerasing (non-erasing) – . 80, 575, 585 – of automata . . . . . . . . . . . 17 – of deterministic automata . . . 844 – of shifts . . . . . . . . . . . . 990 – prolongable – . . . . . . . 915, 954 – recognising – . . . . . . . . . . . 37 – relational – . . . . . . . . . . . 632 – semigroup – . . . . . . . . . . 996 – syntactic – . . . . . . . . . . . . 20 – uniform – . . . . . . . 80, 915, 954 Morse–Hedlund theorem . . . . . . 955 mu-calculus . . . . . . . . . . . 1231 Muller – recurrence condition . . . . 193, 267 – set . . . . . . . . . . . . . . . 193 multicontext . . . . . . . . . . . . 806 multidimensional – automatic set . . . . . . . . . . 918 – automaton . . . . . . . . . . . 918 multi-head finite automaton . . . . 426

Index

multiple initial state . . multiplication (exterior) multiplicatively – dependent set . . . – independent set . . multitape automaton . . multi-valued function . Myhill–Nerode – relation . . . . . . . – atom . . . . . . . – theorem . . . . . .

. 420–421, 442 . . . . . . . 64 . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

948 948 . 82 470

. . . . . . 412 . . . . . . 413 . . . . . . 376

—N— natural language processing (NLF) 1383 Nayak quantum finite automaton (NaQFA) . . . . . . . . . . . 1465 NC . . . . . . . . . . . . . . . . . 497 NC1 . . . . . . . . . . . . . . . . 497 negative example . . . . . . . . . 379 Nerode – automaton . . . . . . . . . . . . 17 – congruence . . . . . . . . . . . 340 – equivalence . . . . . . . . . . . 18 net . . . . . . . . . . . . . . . . . 620 – Cauchy – . . . . . . . . . . . . 620 – convergence . . . . . . . . . . 620 network – communication – . . . . . . . 1149 nilpotent group . . . . . . 855, 864, 884 Nivat – conjecture . . . . . . . . . . . 956 – theorem . . . . . . . . . . . . 127 nondeterministic – finite – automaton (NFA) 411, 769, 1418, 1462 – Chrobak normal form 420–421, 425, 431 – one-way – (1NFA) . . . 1462 – two-way – (2NFA) . . . 1462 – tree automaton (NFTA) . . . 239 – finite hedge automaton (NFHA) 256 – message complexity . . . . . . 414

– minimal automaton . . . 368
– quantum finite automaton (NQFA) . . . 1471
– state complexity . . . 412, 430
– transition complexity . . . 414
nonerasing (non-erasing) morphism . . . 80, 575, 585
non-self dual . . . 707
– class . . . 709
– degree . . . 708
nonstochastic language . . . 1480
non-uniformity . . . 496
normal subgroup . . . 850
normalisation . . . 950
number
– β- – . . . 951
– cardinal – . . . 695–696
– decision diagram (NDD) . . . 948, 1192–1194, 1196–1199, 1201
– ordinal – . . . 695
– Parry – . . . 951
– Perron – . . . 962
– Pisot – . . . 950
numeration
– base . . . 949, 1191, 1202
– Bertrand – . . . 951
– linear – . . . 950
– system
– abstract – (ASN) . . . 959
– Pisot – . . . 950
—O—
observability condition . . . 1230
observation table . . . 377–378
occurrence of a letter . . . 4
ω-automatic
– presentation . . . 227
– structures
– first-order theory . . . 227
ω-automaton . . . 189
– alternating – . . . 222
– dual . . . 223
– state-controlled – . . . 222

– transition-controlled – . . . 223
– weak – . . . 224
– cascade . . . 199
– conditional determination . . . 219
– finite-state – . . . 192
– loop . . . 217
– strongly unambiguous – . . . 208
– tower . . . 218
– unambiguous – . . . 208
– universal – . . . 222
– wall . . . 218
– with output . . . 198
ω-concatenation . . . 190
ω-idempotent Conway semiring . . . 731
ω-language . . . 189
– initial congruence relation . . . 213
– parity index . . . 219
– Rabin index . . . 219
– regular . . . 194
– saturation . . . 215
– syntactic congruence . . . 214
ω-power . . . 190, 577
ω-product . . . 190
ω-regular expression . . . 195
ω-semigroup . . . 720
– free – . . . 721
– pointed – . . . 721
– syntactic – . . . 721
ω-word . . . 189–190
operation
– Boolean – . . . 502, 1190
– explicit – . . . 629
– implicit – . . . 629
– problem . . . 429, 444–445
– finite language . . . 435
– for finite automata . . . 430, 445
– for regular expressions . . . 440
– infinite language . . . 430
– unary language . . . 430, 435
– star – . . . 502
order
– problem . . . 842, 883
– -type . . . 696

ordered semigroup . . . 790, 1009
ordering
– elimination – . . . 424, 468
– genealogical – . . . 960
– linear – . . . 657
– military – . . . 210, 960
– radix – . . . 960
– term – . . . 467
ordinal number . . . 695
out-merge . . . 994
– labelled – . . . 1001
output alphabet . . . 81
out-split
– labelled . . . 1001
out-splitting map . . . 1012
—P—
Parikh map . . . 961
parity
– automaton . . . 1220
– game . . . 221, 274
– index
– of an ω-language . . . 219
– problem . . . 288
– recurrence condition . . . 193, 268
parse
– forest . . . 1387
– phrase . . . 1384
– tree . . . 1384
parsing . . . 1384
– strategy . . . 1386, 1401
– tabular – method . . . 1387
partial
– Conway semiring . . . 731
– signature . . . 356
partition . . . 339, 992
– coarser – . . . 339
– function . . . 1396
– homogeneous – . . . 324
– index . . . 339
– refinement . . . 339
path . . . 5
– accepting – . . . 5

– end . . . . . . . . . . . . . . . . 5 – final – . . . . . . . . . . . . . . 5 – in a graph . . . . . . . . . . . 989 – initial – . . . . . . . . . . . . . . 5 – left-recurring – . . . . . . . . . 207 – length . . . . . . . . . . . . . . 5 – origin . . . . . . . . . . . . . . 5 – successful – . . . . . . . . . . . 5 P-complete language . . . . . . . . 778 perfect field . . . . . . . . . . . . 934 period . . . . . . . . . . . . . 954, 992 periodic – array . . . . . . . . . . . . . . 974 – dynamical system . . . . . . . 972 – inside a subset of Nd . . . . . . 974 – language . . . . . . . . 1469, 1472 – locally – set . . . . . . . . . . 975 – tiling . . . . . . . . . . . . . . . 93 – word . . . . . . . . . . . . . . 954 – see also ultimately periodic Perron – number . . . . . . . . . . . . . 962 – theorem . . . . . . . . . . . . 961 Perron–Frobenius’ theorem . . . . 962 p -group . . . . . . . 499, 855, 888, 892 picture . . . . . . . . . . . . . . . 304 – bordered – . . . . . . . . . . . 304 – domain . . . . . . . . . . . . . 304 – homogeneous – . . . . . . . 304 – grammar . . . . . . . . . . . . 318 – language . . . . . . . . . . . . 304 – closure properties . . . . . . 312 – complement . . . . . . . . . 313 – local – . . . . . . . . . . . . 307 – logic formula . . . . . . . . 311 – recognisable – . . . . . . . . 308 – regional – . . . . . . . . . . . 324 pipeline . . . . . . . . . . . . . 1240 Pisot – number . . . . . . . . . . . . . 950 – numeration system . . . . . . . 950 plant . . . . . . . . . . . . . . . 1227 play . . . . . . . . . . . . . . . . 274

pointlike – conjecture . . . . . . . . . . . 638 – subset . . . . . . . . . . . . . 638 Polish space . . . . . . . . . . . . 698 polynomial – Artin–Schreier – . . . . . . . . 938 – closure . . . . . . . . . . . . . 678 – language . . . . . . . . . 678, 1477 – series . . . . . . . . . . . . . . . 64 position automaton . . . . . . . . . 421 positional strategy . . . . . . . 222, 275 positionally determined game . . . 275 positive example . . . . . . . . . . 379 postselection . . . . . . . . . . . 1482 power series . . . . . . . . . . . . . 64 – algebraic – . . . . . . . . . . . 932 – generalised – . . . . . . . . . . 938 – Hahn’s – . . . . . . . . . . . . 938 powerset construction . . . . . 412, 418 prebase . . . . . . . . . . . . . . . . 73 preclone . . . . . . . . . . . . . . 813 – finitary – . . . . . . . . . . . . 815 – free – . . . . . . . . . . . . . . 814 prefix – -closed . . . . . . . . . . . . . 376 – common – . . . . . . . . . . . 363 – metric . . . . . . . . . . . . . 853 – -suffix decomposition . . . . . 363 – tree acceptor . . . . . . . . . . 376 prefix code . . . . . . . . . . . . . 527 – maximal – . . . . . . . . . . . 527 – synchronised – . . . . . . . . . 527 pre-fixed point . . . . . . . . . . . 747 – see also fixed point preorder . . . . . . . . . . . . . 1009 preperiod . . . . . . . . . . . . . . 954 Presburger arithmetic 957, 1193, 1199, 1201 presentation – ! -automatic – . . . . . . . . . 227 prime language . . . . . 778, 788, 1468 primitive – matrix . . . . . . . . . . . . . 961

– substitution . . . . . . . . . 642, 961 Pringsheim’s theorem . . . . . . . 470 priority function . . . . . . . . . . 193 probabilistic – finite automaton (PFA) . . . . 1462 – pushdown automaton (PPDA) 1390 problem – bounded section – . . . . . . . 672 – boundedness – . . . . . . . . . 673 – for CFMs . . . . . . . . . 1153 – Church – . 1218–1220, 1222–1224, 1226–1227, 1234–1235 – conjugacy – . . . . . . . . . . 842 – decision – . . . . . . . . . 842, 873 – HD0L (ultimate) periodicity problem . . . . . . . . . . . . 977 – emptiness – 241, 1265, 1472–1473, 1480 – inclusion – . . . . . . . . . . 1284 – isomorphism – . . . . . . . 842, 883 – limitedness – . . . . . . . . . . 673 – membership – . . . . . . . . . 842 – rational subset – . . . . . . . 864 – model-checking – for CFMs . 1153 – NP-complete – . . . . . . . . 1471 – order – . . . . . . . . . . . 842, 883 – Post correspondence – . . . . . 864 – promise – . . . . . . . . . . 1482 – reachability – for CFMs 1152–1153, 1167 – universality – . . . . . . . . 1284 – word – 633, 842, 873, 880, 883, 1477 – generalised – . . . . . . . 842, 883 – in automata groups . . . . . . 890 – over a monoid . . . . . . . . 499 – submonoid . . . . . . . 862–863 product . . . . . . . . . . . . . . 3, 499 – 2-sided semidirect – . . . . . . 606 – concatenation – . . . . . . . . . 4 – deterministic – . . . . . . . . . 602 – direct – . . . . . . . . . . . . . 617 – Mal’cev- . . . . . . . . . . 602, 631 – prefix . . . . . . . . . . . . . . 499

– semidirect – . . . . . . . . 604, 640 – suffix . . . . . . . . . . . . . . 499 – unambiguous – . . . . . . . . . 602 – wreath – . . . . . . . . . . . . 605 – principle . . . . . . . . . . . 605 profinite – algebra . . . . . . . . . . . . . 624 – C-identity . . . . . . . . . . . 581 – ordered . . . . . . . . . . . 581 – distance . . . . . . . . . . . . 576 – equality . . . . . . . . . . . . 580 – equation . . . . . . . . . . . . 578 – free – monoid . . . . . . . . . 577 – Hopfian algebra . . . . . . . . 627 – identity . . . . . . . . . . . . . 629 – inequality . . . . . . . . . . . 580 – seft-free algebra . . . . . . . . 628 – topology . . . . . . . . . . . . 622 – uniformity . . . . . . . . . . . 622 program – branching . . . . . . . . . . . 501 – super-turtle – . . . . . . . . . . 520 – turtle – . . . . . . . . . . . . . 520 projection . . . . . . . . . 88, 507, 702 projective hierarchy . . . . . . . . 702 promise problems . . . . . . . . 1482 proof system – Arthur–Merlin – . . . . . . . 1481 – interactive – . . . . . . . . . 1480 property – decidable – . . . . . . . . . . . . 35 – geometric – . . . . . . . . . . 875 prophetic automaton . . . . . . . . 194 propositional dynamic logic (PDL) 1164 – global formula . . . . . . . . 1165 – local formula . . . . . . . . . 1165 – path expression . . . . . . . 1165 Prouhet–Thue–Morse substitution . 644 Průša grid grammar (PGG) . . . . . 323 pseudoidentity . . . . . . . . . . . 629 – basis . . . . . . . . . . . . . . 629 pseudometric . . . . . . . . . . . . 619 pseudoquasivariety . . . . . . . . . 618

pseudo-ultrametric . . . . . . . . . 619 – pro-Q – . . . . . . . . . . . . . 623 pseudovariety . . . . . 570, 618, 1027 – C-pseudovariety of stamps . . . 582 – decidable – . . . . . . . . . . . 630 – generated by . . . . . . . . . . 618 – has computable  -closures . . . 634 – of finite semigroups . . . . . . 575 – order computable – . . . . . . . 635 –  -full – . . . . . . . . . . . . . 634 – weakly – reducible – . . . . . . . . . . 634 – tame – . . . . . . . . . . . . 634 pseudoword . . . . . . . . . . . . 615 p -substitution . . . . . . . . . . . 975 Puiseux series . . . . . . . . . . . 470 pumping lemma . . . . . . . . . 6, 920 pure induction . . . . . . . . . . . 756 purely substitutive word . . . . . . 954 pushdown automaton (PDA) 1385, 1389 – deterministic – . . . . . . . . 1389 – probabilistic – (PPDA) . . . . 1390 – reduced – . . . . . . . . . . 1389 – tabulation . . . . . . . . . . 1390 – valid prefix property . . . . . 1395 – weighted – (WPDA) . . . . . 1389 puzzle grammar . . . . . . . . . . 319 —Q— quadtree . . . . . . . . quantale . . . . . . . . quantifier – 9>@0 . . . . . . . . – 9>@1 . . . . . . . . – 9mod . . . . . . . . – elimination . . . . . – generalised – . . . . – Ramsey – . . . . . – unary cardinality . . quantum – automaton – 1.5-way – . . . . – alternating . . . .

. . . . . . 323 . . . . . . 744 . . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . .

. . . . . . . . . .

1045 1045 1045 1032 1048 1048 1048

xlv

– computation . . . . . . 1457–1458 – fingerprints . . . . . . . . . . 1467 – finite automaton (QFA) – bistochastic – (BiQFA) . . 1465 – fully – (CiQFA) . . . . . . 1465 – Kondacs–Watrous – (KWQFA) . . . . . . . . . . 1465 – Latvian – (LaQFA) . . . . 1465 – Moore–Crutchfield – (MCQFA) . . . . . . . . . . 1465 – Nayak – (NaQFA) . . . . . 1465 – nondeterministic – (NQFA) 1471 – one-way nondeterministic – (1NQFA) . . . . . . . . . . 1471 – one-way – (1QFA) . . . . . 1463 – general – . . . . . . . . . 1465 – with quantum and classical states (1QCFA) . . . . . . . . . . . 1465 – two-way – (2QFA) . . . . . 1474 – with ancilla qubits (QFA-A) 1465 – with control language (QFA-CL) . . . . . . . . . . 1465 – Markov chain . . . . . . . . 1474 – measurement . . . 1459, 1463, 1465 – state . . . . . . . . . . . . . 1459 – superoperator . . . . . 1461, 1466 – system . . . . . . . . . . . . 1458 quasi-automatic function . . . . . . 940 quasi-convex subgroup . . . . . . . 881 quasi-Conway semiring . . . . . . . 63 quasi-geodesic . . . . . . . . . . . 883 quasi-identity . . . . . . . . . . . 742 quasi-isometry . . . . . . . . . 863, 875 quasi-variety . . . . . . . . . . 742, 753 queue-content decision diagram (QDD) . . . . . . . . . . . . 1154 quotient – of a language . . . . . . . . 58, 573 – of a series . . . . . . . . . . . . 70

—R— 1478 Rabin 1471 – index . . . . . . . . . . . . . . 219

– pair . . . . . . . . . . . . . . . 193 – recurrence condition . . . . 193, 268 – ’s basis theorem . . . . . . . 1042 – ’s theorem . . . . . . . . . . 1041 radius of convergence . . . . . . . 469 radix ordering . . . . . . . . . . . 960 Ramsey’s theorem . . . . . . . . . 215 random walk . . . . . . . . . 892, 1475 – self-similar – . . . . . . . . . . 892 rank – function . . . . . . . . . . . . 204 – of a D-class . . . . . . . . . 1024 – of a free profinite group . . 643, 645 rational – closure . . . . . . . . . . . . 45, 65 – composition . . . . . . . . . . . 89 – constraint . . . . . . . . . . . 865 – cross-section . . . . . . . . . . 877 – element . . . . . . . . . . . . 737 – expression . . . . . . . . . 41, 741 – linear – . . . . . . . . . . . . 27 – weighted – . . . . . . . . . . . 65 – formal power series . . . . . 1113 – function . . . . . . . . . . 84, 931 – identity . . . . . . . . . . . . . . 45 – language . . . . . . . . . 4, 41, 857 – monoid . . . . . . . . . . . . . . 73 – operation . . . . . . . . . . . . 737 – relation . . . . . . . . . . . . . . 86 – series . . . . . . . . . . 65, 120, 737 – subset . . . . . . . . . . 61, 72, 86 – transduction . . . . . . . . . . . 82 – weighted – expression . . . . . . 74 reachability relation . . . . . . . . 554 real – arithmetic . . . . . . . . . . 1205 – number automaton (RNA) . . . 972 – vector automaton (RVA) . . . . 972, 1204–1205, 1208–1209, 1211 realisability . . . . . . . . . . . 1171 – specification – MSO – . . . . . . . . . . . 1173

– PDL – . . . . . . . . . . . 1174 – sequential – . . . . . . . . 1172 realisation . . . . . . . . . . . . . . 87 recognisable – element . . . . . . . . . . . . 737 – language . . . . . . . . . 6, 43, 857 – tiling – . . . . . . . . . . . . 308 – monoid . . . . . . . . . . . . . . 37 – picture language . . . . . . . . 308 – series . . . . . . . . . . 66, 116, 737 – set . . . . . . . . . . . 62, 72, 949 – tree language . . . . . . . . . . 238 recognition . . . . . . 1388, 1391, 1400 recurrence condition . . . . . . . . 192 – Büchi – . . . . . . . . . . 193, 267 – co-Büchi – . . . . . . . . . 193, 267 – Muller – . . . . . . . . . . 193, 267 – parity – . . . . . . . . . . . 193, 268 – Rabin – . . . . . . . . . . 193, 268 – Streett – . . . . . . . . . . 193, 268 – transition – . . . . . . . . . . . 193 – weak – . . . . . . . . . . . . . 193 recursion scheme . . . . . . 1304, 1307 – Damm-safe – . . . . . . . . . 1335 – homogeneous – . . . . . . . 1334 – safe – . . . . . . . . . . . . 1334 recursive – language . 779, 782–789, 1481–1482 – Markov – chain (RMC) . . . . . . . . 1372 – decision process (MDP) . . 1377 recursively enumerable – language . . . 782–789, 1471, 1481 – pseudovariety – completely  -tame – . . . . . 634 –  -tame – . . . . . . . . . . . 634 reduced – automaton . . . . . . . . . . 1005 – expression . . . . . . . . . . 46, 66 – word . . . . . . . . . . . . . . 843 reduction – of automata . . . . . . . . . 1008 – relation . . . . . . . . . . . . . 701

refinement of a partition . . . . . . 339 region . . . . . . . . . . . . . . 1267 – automaton . . . . . . . . . . 1269 – graph . . . . . . . . . . . . . 1269 regional – picture . . . . . . . . . . . . . 324 – -tile grammar (RTG) . . . . . . 324 regular – D-class . . . . . . . . . . . . 1024 – determinacy . . . . . . . . . . 221 – expression . . . . 42, 411, 459, 502 – 2D – . . . . . . . . 306, 320, 322 – defined by grammar . . . . . 463 – length . . . . . . . . . . . . 461 – size . . . . . . . . . . . . . 461 – language . . . . . . . . . . . 4, 459 – sequence . . . . . . . . . . . . 971 – tree . . . . . . . . . . . . . . . 287 – grammar . . . . . . . . . 246, 258 – language . . . . . . 243, 268, 805 – uncollapsible – expression . . . 462 – winning condition . . . . . . . 221 Reiterman’s theorem . . . . . . . . 582 relation – congruence . . . . . . . . . . 1050 – logically definable – . . . . . 1035 – regular – . . . . . . . . . . . 1036 – synchronous rational – . . . . 1058 relational – morphism . . . . . . . . . . . 632 – structure . . . . . . . . . . . 1073 representation . . . . . . . . . . . . 66 – linear – . . . . . . . . . . . . . 118 – normal – . . . . . . . . . . . . 950 – of a series . . . . . . . . . . . 118 – S- – . . . . . . . . . . . . . . 959 – U - – . . . . . . . . . . . . . . 950 reset – threshold . . . . . . . . . . . . 534 – word . . . . . . . . . . . . . . 525 residual automaton . . . . . . 369, 1005 residually finite group . . . . . . . 855 residuation . . . . . . . . . . . . . 755

restarting automaton . . . . . . . . 427 retract . . . . . . . . . . . . . . . 644 return word . . . . . . . . . . . . 963 reversal automaton . . . . . . . . . 341 reverse polish length . . . . . . . . 416 reversible – automaton . . . . . . . . . 433, 442 – language . . . . . . . . . . . . 433 – Mealy automaton . . . . . 887, 897 rewriting – rule – isometric – . . . . . . . . . . 324 – system – confluent – . . . . . . . . . . 843 – length-reducing etc. . . . . 860 Ridout’s theorem . . . . . . . . . . 925 ring . . . . . . . . . . . . . . . . . 3 road colouring problem . . . 547–548 root . . . . . . . . . . . . . . . . 266 – of a vertex in a-graph . . . . . 550 rotating limited automaton . . . . . 429 Roth’s theorem . . . . . . . . . . . 923 run – DAG . . . . . . . . . . . . . . 203 – of a tree automaton . . . . . 267, 770 – of a TWA . . . . . . . . . . . 251 – of an alternating tree automaton 271 – of an NFHA . . . . . . . . . . 256 – of an NFTA . . . . . . . . . . 239 – tree – core of a – . . . . . . . . . . 208 – labelled compressed – . . . . 206 —S— safety properties . . . . . . . . . 1428 Safra’s construction . . . . . . . . 197 Sakarovitch conjecture . . . . . . . 862 Sakoda–Sipser problem . . . . . . 425 satisfiability . . . . . . . . 1424, 1448 saturated subset . . . . . . . . . . 339 Schreier graph . . . . 848, 861, 873, 891 Schützenberger’s theorem . . . . . 600

self dual . . . . . . . . . . . . . . 707 – class . . . . . . . . . . . . . . 709 – degree . . . . . . . . . . . . . 708 self-similarity biset . . . . . . . . . 895 semi-automaton . . . . . . . . . . 344 semidirect product . . . . . . . 604, 640 semigroup . . . . . . . . . . . . . . 3 – aperiodic – . . . . . . . . . . . 626 – automatic – . . . . . . . . . . 878 – completely regular – . . . . . . 639 – free – . . . . . . . . . . . . . . . 4 – inverse – . . . . . . . . . . . . 865 – left-zero – . . . . . . . . . . . 585 – locally finite – . . . . . . . . . 669 – morphism . . . . . . . . . . . 996 – of an automaton . . . . . . . . 885 – ordered – . . . . . . . . . 790, 1009 – projective profinite – . . . . . . 645 – simple – . . . . . . . . . . . . 790 – stable – . . . . . . . . . . . . . 586 – syntactic – . . . . . 575, 1009–1010 – torsion . . . . . . . . . . . . . 669 – transition – . . . . . . . . . . 1009 – zero . . . . . . . . . . . . . 1024 semilattice . . . . . . . . . . . . . 570 – order . . . . . . . . . . . . . . 746 semilinear sets . . . . . . . . . . . . 36 semimodule . . . . . . . . . . . . 125 seminearring . . . . . . . . . . . . 826 – induced – . . . . . . . . . . . 828 semiring . . . . . 3, 63, 115, 730, 1387 –  -semiring . . . . . . . . . . . 731 – left-handed inductive – . . . 751 – partial – . . . . . . . . . . . 731 – right-handed inductive – . . . 751 – symmetric inductive – . . . . 751 – Boolean – . . . . . . . 3, 730, 1388 – commutative – . . . . . . . . . 730 – complete – . . . . . . . . . . . 744 – complete iteration . . . . . . . 745 – completely idempotent – . . . . 744 – continuous – . . . . . . . . . . 746 – iteration . . . . . . . . . . . 747

– Conway – . . . . . . . 63, 72, 731 – countably complete – . . . . . 744 – countably complete iteration . . 745 – countably idempotent – . . . . 744 – dual – . . . . . . . . . . . 736, 743 – formal series . . . . . . . . 735, 743 – idempotent – . . . . . . . . . . 730 – iteration . . . . . . . . . . . . 741 – locally finite – . . . . . . . . . 134 – matrix – . . . . . . . . . . 733, 743 – max-plus – . . . . . . . . . . . 152 – min-plus – . . . . . . . . . 152, 669 – of binary relations . . . . . . . 730 – of languages . . . . . . . . . . 730 – ! -continuous – . . . . . . . . . 746 – ! -continuous iteration . . . . . 747 – ! -idempotent Conway – . . . . 731 – ! -idempotent iteration . . . . . 742 – ordered – . . . . . . . . . . . . 746 – partial – Conway – . . . . . . . . . . 731 – iteration – . . . . . . . . . . 741 – partial iterative – . . . . . . . . 732 – polynomial – . . . . . . . . . . 735 – quasi-Conway – . . . . . . . . . 63 – sum ordered – . . . . . . . . . 746 – symmetric partial iterative – . . 736 – tropical – . . . . . . . . . . 669, 730 – zero-sum free – . . . . . . . . 731 semistructured data . . . . . . . 1088 sentence . . . . . . . . . . . . 245, 592 sentential form . . . . . . . . . 319, 464 separator symbol . . . . . . . . . 1202 sequence – Beatty – . . . . . . . . . . . . . 94 – characteristic – . . . . . . . . . 954 – regular – . . . . . . . . . . . . 971 – Thue–Morse – . . . . . . . . . 914 sequential – automaton . . . . . . . . . . . 155 – series . . . . . . . . . . . . . . 155 – transducer . . . . . . . . . . . . 81 serialisation . . . . . . . . 1192, 1203

series . . . . . . . . . . . . . . . . . 64 – characteristic – . . . . . . . . . 735 – coefficient in a – . . . . . . . . . 64 – constant term . . . . . . . . . . . 65 – formal – . . . . . . . . . . 735, 743 – limited – . . . . . . . . . . . . 157 – locally finite – . . . . . . . . . 119 – max-plus – . . . . . . . . . . . 154 – polynomial – . . . . . . . . . . . 64 – proper – . . . . . . . . . . 65, 735 – quotient . . . . . . . . . . . . . 70 – rational – . . . . . . . . 65, 120, 737 – rational formal power – . . . 1113 – recognisable – . . . . . 66, 116, 737 – representation . . . . . . . . . 118 – sequential – . . . . . . . . . . 155 – support . . . . . . 64, 116, 138, 735 – unambiguous – . . . . . . . . . 155 set . . . . . . . . . . . . . . . . . 414 – analytic – . . . . . . . . . . . 702 – Borel – . . . . . . . . . . . . . 698 – clopen – . . . . . . . . . . . . 620 – complete . . . . . . . . . . . . 701 – compressible . . . . . . . . . . 550 – constraints . . . . . . . . . . . 770 – difference . . . . . . . . . . 1190 – exceptional – . . . . . . . . . . 471 – Higman–Haines – . . . . . . . 435 – hyper-arithmetical – . . . . . . 793 – incompressible- . . . . . . . . 550 – initialisable – . . . . . . . . . . 714 – Julia – . . . . . . . . . . . . . 897 – k -recognisable – . . . 947, 956, 972 – locally periodic – . . . . . . . . 975 – multidimensional automatic – . 918 – periodic inside a subset of Nd . 974 – projection . . . . . . . . . . 1199 – recognisable – . . . . . . . . . 949 – S -recognisable – . . . . . . . . 959 – substitutive – . . . . . . . . . . 959 – syndetic – . . . . . . . . . . . 953 – test – . . . . . . . . . . . . . . . 97 – ultimately periodic – . . . . . . 948

– U -recognisable – . . . . . . . . 950 – vanishing – . . . . . . . . . . . 467 – well-ordered – . . . . . . . . . 939 shift – conjugacy . . . . . . . . . . . 990 – edge – . . . . . . . . . . . . . 989 – even – . . . . . . . . . . . . . 989 – full – . . . . . . . . . . . . . . 988 – golden mean – . . . . . . . . . 989 – higher block . . . . . . . . . . 990 – sofic – . . . . . . . . . . . 642, 989 – space . . . . . . . . . . . . 642, 988 – almost finite type – . . . . 1022 – entropy . . . . . . . . . 643, 991 – finite type – . . . . . . . . . 988 – flow equivalent – . . . . . . 996 – forbidden factor . . . . . . . 988 – in-splitting – . . . . . . . . 1011 – irreducible – . . . . . . 642, 1007 – minimal – . . . . . . . . . . 642 – morphism . . . . . . . . . . 990 – periodic – . . . . . . . . . . 642 – recognised by an automaton . 998 – transformation . . . . . . . 973, 988 – see also subshift  -algebra . . . . . . . . . . . . . 617 †1 -sentence . . . . . . . . . . . . 573 sign – header . . . . . . . . . 1192, 1203 – symbol . . . . . . . . . 1191, 1203 signature – algebraic – . . . . . . . . . . . 617 – computable implicit – . . . . . 633 –  implicit – . . . . . . . . . . . 637 – of a state . . . . . . . . . . 355, 357 – partial – . . . . . . . . . . . . 356 – tree – . . . . . . . . . . . . . . 356 simple – automaton . . . . . . . . . . . 358 – semigroup . . . . . . . . . . . 790 – transducer . . . . . . . . . . . . 81 simulation – delayed – of an ! -automaton . . 226

– direct – of an ! -automaton . . . 225 – forward – simulation of an ! -automaton . . . . . . . . . . 225 – game for an ! -automaton . . . 225 – of limited automata . . . . . . 428 – of multi-head finite automata . . 426 – of restarting automata . . . . . 427 – of two-way finite automata . . . 425 – relation for ! -automata . . . . 225 singularity – analysis . . . . . . . . . . . . 469 – dominant – . . . . . . . . . . . 470 Skolem–Mahler–Lech theorem . . 929 sliding block – code . . . . . . . . . . . . . . 973 – map . . . . . . . . . . . . . . 990 small cancellation . . . . . . . 873, 879 Snake-DREC . . . . . . . . . . . 317 sofic shift . . . . . . . . . . . . . 989 space – Baire – . . . . . . . . . . . . . 698 – Boolean – . . . . . . . . . . . 626 – Cantor . . . . . . . . . . . . . 698 – compact – . . . . . . . . . . . 620 – hyperbolic – . . . . . . . . . . 881 – metric – . . . . . . . . . . . . 619 – Polish – . . . . . . . . . . . . 698 – uniform – . . . . . . . . 619–621 spanning tree . . . . . . . . . . . . 847 spectral vector . . . . . . . . . . . 160 split . . . . . . . . . . . . . . . . 659 – forward Ramsey – . . . . . . . 682 – normalised – . . . . . . . . . . 659 – of an automaton . . . . . . . . 724 – Ramsey – . . . . . . . . . . . 659 spoiler player . . . . . . . . . . . 226 stability relation . . . . . . . . . . 548 stable – matrix . . . . . . . . . . . . . 670 – pair . . . . . . . . . . . . . . . 548 – preorder . . . . . . . . . . . 1009 – subset . . . . . . . . . . . . . . 70 Stalling s’ construction . . 845–849, 864

stamp . . . . . . . . . . . . . . . 582 – ordered – . . . . . . . . . . . . 582 – quasi-aperiodic – . . . . . . . . 586 star – functorial- . . . . . . . . . . . 742 – height . . . . . . . . . . . 416, 440 – of a rational language . . . . . 72 – of an expression . . . . . . . . 53 – preserving homomorphism . 418 – problem . . . . . . . . . . 53, 72 – (restricted) . . . . . . . . . 417 – normal form – strong – . . . . . . . . . 422, 482 – operation . . . . . . . . . . . . . 36 starable element . . . . . . . . . . . 64 star-free language . 420, 433, 503, 600, 1080 star-normal form – of an expression . . . . . 57, 69, 73 state . . . . . . . . . . . . . 5, 81, 989 – accessible – . . . . . . . . . . . 5 – Büchi – . . . . . . . . . . . . 193 – coaccessible – . . . . . . . . . . 5 – co-Büchi – . . . . . . . . . . . 193 – complexity . 412, 1468–1469, 1477, 1483 – confluent – . . . . . . . . . . . 356 – final – . . . . . . . . . . . . 5, 81 – fusion . . . . . . . . . . . . . 355 – future . . . . . . . . . . . . . 339 – height . . . . . . . . . . . . . 357 – initial – . . . . . . . . . . . . 5, 81 – merge . . . . . . . . . . . . . 355 – mergeable – . . . . . . . . . . 355 – partial signature . . . . . . . . 356 – past . . . . . . . . . . . . . . 339 – separated – . . . . . . . . . . . 340 – signature . . . . . . . . . . 355, 357 – weak – . . . . . . . . . . . . . 193 stochastic – language . . 1463, 1471, 1473, 1477 – matrix . . . . . . . . . 1462, 1466

strategy . . . . . . . . . . . . . . 274 – positional – . . . . . . . . 222, 275 – winning – . . . . . . . . . . . 274 Streett pair . . . . . . . . . . . . . 193 strictly connected component . . . 714 strongly connected component . 152, 554 – minimal – . . . . . . . . . . 1007 structure . . . . . . . . . . . . . 1034 – automatic – . . . . . . 1036, 1038 – Boolean algebra . . . . 1040, 1052 – Büchi-automatic – . . . . . . 1038 – domain . . . . . . . . . . . . 1034 – elementary substructure . . . 1060 – free – group . . . . . . . . . . . 1052 – monoid . . . . . . . . . . 1063 – integral domain . . . . 1052, 1063 – isomorphic . . . . . . . . . . 1034 – ordinal – . . 1052, 1055–1056, 1063 – power – . . . . . . . . . . . 1044 – quotient . . 1050–1051, 1063–1064 – Rabin-automatic – . . . 1036, 1038 – random graph 1052, 1054, 1063–1064 – real arithmetic – . . . . 1031, 1064 – relational – . . . . . . . . . . 1073 – signature . . . . . . . . . . . 1034 – universal automatic – . 1039, 1045 – word – . . . . . . . . . . . . 1073 subgroup – finitely generated – . . . . . . . 841 – fixed point – . . . . . . . . . . 856 – index of a – . . . . . . . . . . 842 – intersection . . . . . . . . . . . 851 – normal – . . . . . . . . . . . . 850 – [p -]pure . . . . . . . . . . . . 853 – quasi-convex – . . . . . . . . . 881 subminimal element . . . . . . . . 824 subset – automaton . . . . . . . . . . . 530 – disjunctive rational – . . . . . . 862 – pointlike – . . . . . . . . . . . 638 – recognisable – . . . . . . . . . 619 – of topological algebras . . . . 622

– recognised by homomorphism . 619 subshift . . . . . . . . . . . . 642, 973 – entropy . . . . . . . . . . . . . 643 – generated – . . . . . . . . . . . 973 – irreducible – . . . . . . . . . . 642 – minimal – . . . . . . . . . . . 642 – periodic – . . . . . . . . . . . 642 – sofic – . . . . . . . . . . . . . 642 – see also shift substitution . . . . . . . 529, 642, 954 – block – . . . . . . . . . . . . . 990 – erasing – . . . . . . . . . . . . 963 – good – . . . . . . . . . . . . . 968 – growing – . . . . . . . . . . . 966 – irreducible – . . . . . . . . . . 961 – of constant length . . . . . . . 529 – !˛ - – . . . . . . . . . . . . . . 970 – periodic – . . . . . . . . . . . 644 – primitive – . . . . . . . . . 642, 961 – projection . . . . . . . . . . . 963 – proper – . . . . . . . . . . . . 644 – Prouhet–Thue–Morse – . . . . 644 – sub- – . . . . . . . . . . . . . 967 subtree . . . . . . . . . . . . . . . 238 subword . . . . . . . . . . . . . . 571 successor . . . . . . . . . . . . . . 957 suffix-closed . . . . . . . . . . . . 376 sum . . . . . . . . . . . . . . . . . 3 – order . . . . . . . . . . . . . . 746 superposition . . . . . 1459, 1463, 1478 supremum of a countable sequence 712 symbol – contraction – . . . . . . . . . . 996 – expansion . . . . . . . . . . . 996 – terminal – . . . . . . . . . 246, 258 symbolic – conjugacy of automata . . . . 1011 – grammar . . . . . . . . . . . . 319 – representation . . . . . . . . 1189 synchronised automaton . . . . . 1005 synchronising – ratio . . . . . . . . . . . . . . 556

– word . . . 1005
– of a code . . . 527
syndetic set . . . 953
syntactic
– congruence . . . 21, 618, 1009
– of an ω-language . . . 214
– disambiguation . . . 1385
– graph . . . 1024
– monoid . . . 20, 861
– morphism . . . 20
– ω-semigroup . . . 721
– ranked tree algebra . . . 805
– semigroup . . . 575, 1009–1010
—T—
tame
– graph . . . 637
– pseudovariety . . . 634
TC0 . . . 497
temporal logic . . . 228
term . . . 804
– applicative – . . . 1303
– ordering . . . 467
test set . . . 97
theory . . . 1035
– decidable – . . . 1032, 1035, 1037, 1039, 1063
Thompson construction . . . 422
Thue–Morse
– sequence . . . 914
– word . . . 955
tile
– Wang – . . . 92
tiling . . . 92
– aperiodic – . . . 93
– periodic – . . . 93
– recognisability . . . 314
– system . . . 308, 1090
– deterministic – . . . 316
– unambiguous – . . . 309
timed automaton . . . 1263
– deterministic – . . . 1288
– diagonal-free – . . . 1263
– with invariants – . . . 1263
topological
– algebra . . . 621
– generator . . . 621
– recognisable subset . . . 622
– residually in a class . . . 622
– self-free – . . . 628
– class . . . 701
– space
– compact – . . . 620
– totally disconnected – . . . 620
– zero-dimensional – . . . 620
topology
– compact-open – . . . 627
– pointwise convergence – . . . 627
– profinite – . . . 622
– pro-Q – . . . 622
– pro-V . . . 855
transducer
– deterministic – . . . 81
– normalised . . . 83
– sequential – . . . 81
– simple – . . . 81
– unambiguous – . . . 86
– weighted finite – (WFT) . . . 1135
transduction
– finite-state – . . . 82
– finite-valued – . . . 84
– rational – . . . 82
transformation between models . . . 418
transition . . . 5, 81
– consecutive – . . . 5
– matrix . . . 43, 999
– monoid . . . 844, 853
– recurrence condition . . . 193
– semigroup . . . 1009
– spontaneous – . . . 54
– system . . . 1156, 1300
transitive closure
– logic . . . 1079
– operator . . . 1079
tree . . . 266, 1300
– automaton . . . 804

– accepting run . . . . . . . . 268 – nondeterministic – (NFTA) . 239 – unambiguous – . . . . . . . 284 – bounded width . . . . . . . . 1091 – convolution – . . . . . 1036, 1038 – decomposition . . . . . . . . 1091 – domain . . . . . . . . . . . 237, 266 – factorisation – . . . . . . . . . 654 – height . . . . . . . . . . . . . 238 – history – . . . . . . . . . . . . 210 – language – NFTA-recognisable . . . . . 239 – recognisable – . . . . . . 238, 805 – regular – . . . . . . 243, 268, 805 – model property . . . . . . . . 1094 – parse – . . . . . . . . . . . . 1384 – prefix game . . . . . . . . . . 809 – ranked – . . . . . . . . . . . . 804 – regular – . . . . . . . . . . . . 287 – grammar . . . . . . . . . 246, 258 – signature . . . . . . . . . . . . 356 – spanning – . . . . . . . . . . . 847 – Sturmian – . . . . . . . . . . . 353 – unranked – . . . . . . . . . . . 819 – walking automaton (TWA) . . . 250 – width . . . . . . . . . . . . . 1091 – wreath product . . . . . . . . . 832 – see also quadtree – see also subtree trie – representation . . . . . . . . . 473 two-way – automaton . . . . . 520, 1462, 1474 – simulation . . . . . . . . . . 425 – classical head . . . . . . . . 1474 – quantum finite automaton (2QFA) . . . . . . . . . . . 1474 – quantum head . . . . . . . . 1474 type . . . . . . . . . . . . . . . 1302 – homogeneous – . . . . . . . 1334 type II conjecture . . . . . . . 636, 638

—U— U-equation . . . . . . . . . . . . . 631

ultimately periodic – array . . . . . . . . . . . . . . 974 – set . . . . . . . . . . . . . . . 948 – word . . . . . . . . . . . . 190, 954 unambiguous – automaton . . . . . . . . . . . 155 – ! -automaton . . . . . . . . . . 208 – series . . . . . . . . . . . . . . 155 – tiling system . . . . . . . . . . 309 – transducer . . . . . . . . . . . . 86 – tree automaton . . . . . . . . . 284 unary – algebra – heterotypical identity . . . . 530 – homotypical identity . . . . . 530 – language . . 419–420, 425–426, 428, 437, 440, 1463, 1469–1471, 1473–1474, 1483 – operation problem . . . . 430, 435 – term . . . . . . . . . . . . . . 529 undecidability . . . . . . . 1472, 1480 uniform – algebra . . . . . . . . . . . . . 622 – morphism . . . . . . . 80, 915, 954 – space . . . . . . . . . . . 619–621 – winning strategy . . . . . . . . 274 uniformity – basis . . . . . . . . . . . . . . 619 – discrete – . . . . . . . . . . . . 622 – DLOGTIME . . . . . . . . . . 496 – polynomial-time . . . . . . . . 496 – product – . . . . . . . . . . . . 621 – profinite – . . . . . . . . . . . 622 – quotient – . . . . . . . . . . . 620 – transitive – . . . . . . . . . . . 619 union . . . . . . . . . . . . . . . 1190 unitary transformation . . . 1459, 1465 universal – automaton . . . . . . . . . . . 781 – ! -automaton . . . . . . . . . . 222

– cover . . . 872
– witness language . . . 432
universality problem . . . 1284
unranked tree . . . 819

—V—
valid accepted language computation (VALC) . . . 776
valuation . . . 1262
vanishing set . . . 467
variable
– free – . . . 957
variety . . . 617, 730, 756
– C-positive . . . 581
– generated by
– a class of algebras . . . 617
– a set . . . 599
– of languages . . . 573
– positive – . . . 576
vertex . . . 989
– initial – . . . 989
– ramified – . . . 551
– terminal – . . . 989
visualisation . . . 165
Vorobets–Mariya–Yaroslav theorem . . . 901
—W—
Wadge
– class . . . 701
– degree . . . 708
– game . . . 705
– hierarchy . . . 700
– order . . . 701
– rank . . . 709
Wadge–Borel determinacy . . . 706
Wang
– system . . . 310
– tile . . . 309
weak
– alternating ω-automaton . . . 224
– automaton . . . 1205
– conjugation . . . 636

– monadic second-order logic (WMSO) . . . . . . . . . . . . 290 – recurrence condition . . . . . . 193 – state . . . . . . . . . . . . . . 193 – tameness . . . . . . . . . . . . 634 weight . . . . . . . . . . . 1386–1387 weighted – automaton . . . . . . 65, 115, 1477 – context-free grammar (WCFG) 1388 – finite automaton (WAF) . . . 1112 – average preserving – . . . . 1119 – faithful – . . . . . . . . . . 1119 – minimal – . . . . . . . . . 1122 – strongly continuous – . . . 1118 – finite transducer (WFT) . . . 1135 – monadic second-order logic . . 129 – pushdown automaton (WPDA) 1389 – rational expression . . . . . . 65, 74 – relation . . . . . . . . . . . . 1135 well quasi-order (wqo) . . . . 789–792 well-structured transition systems (WSTS) . . . . . . . . . . . 1156 winning – condition . . . . . . . . . . . . 274 – region . . . . . . . . . . . 221, 274 – strategy . . . . . . . . . . . . 274 wire . . . . . . . . . . . . . . . . 496 word . . . . . . . . . . . . . . . . . 4 – accepted – . . . . . . . . . . . . 6 – almost periodic – . . . . . . . . 978 – automatic – . . . . . . . . . . 953 – characteristic – . . . . . . . . . 954 – context . . . . . . . . . . . . 1009 – cyclically reduced – . . . . . . 843 – empty – . . . . . . . . . . . . . 4 – equation . . . . . . . . . . . . 789 – length . . . . . . . . . . . . . . 4 – Lyndon – . . . . . . . . . . . . 361 – maximal growth – . . . . . . . 966 – metric . . . . . . . . . . . . . 874 – !˛ -substitutive – . . . . . . . . 970 – periodic – . . . . . . . . . . . 954 – problem 633, 842, 873, 880, 883, 1477

– generalised – . . . 842, 883
– in automata groups . . . 890
– over a monoid . . . 499
– submonoid . . . 862–863
– reduced – . . . 843
– reset – . . . 525
– return – . . . 963
– structure . . . 1073
– substitutive – . . . 954, 962
– synchronising – . . . 1005
– of a code . . . 527
– Thue–Morse – . . . 955
– Tribonacci – . . . 962
– ultimately periodic – . . . 190, 954
– valid – . . . 1158
– see also subword
wreath product . . . 605, 887–888
– for trees . . . 832

—X—
XML . . . 1088
– Schema . . . 259
—Z—
zeta function . . . 992
Zielonka automaton . . . 1176, 1250

Handbook of Automata Theory Volume II. Automata in Mathematics and Selected Applications

Automata theory is a subject of study at the crossroads of mathematics, theoretical computer science, and applications. In its core it deals with abstract models of systems whose behaviour is based on transitions between states, and it develops methods for the description, classification, analysis, and design of such systems. The Handbook of Automata Theory gives a comprehensive overview of current research in automata theory, and is aimed at a broad readership of researchers and graduate students in mathematics and computer science. Volume I is divided into three parts. The first part presents various types of automata: automata on words, on infinite words, on finite and infinite trees, weighted and maxplus automata, transducers, and two-dimensional models. Complexity aspects are discussed in the second part. Algebraic and topological aspects of automata theory are covered in the third part. Volume II consists of two parts. The first part is dedicated to applications of automata in mathematics: group theory, number theory, symbolic dynamics, logic, and real functions. The second part presents a series of further applications of automata theory such as message-passing systems, symbolic methods, synthesis, timed automata, verification of higher-order programs, analysis of probabilistic processes, natural language processing, formal verification of programs and quantum computing. The two volumes comprise a total of thirty-nine chapters, with extensive references and individual tables of contents for each one, as well as a detailed subject index.

https://ems.press
ISBN Set 978-3-98547-006-8
ISBN Vol. II 978-3-98547-003-7