The concept of constraint is widely used in linguistics, computer science, and psychology. However, its implementation v
English Pages 325 [326] Year 2014
CONTENTS

List of Illustrations
List of Tables
Preface

I Foundations and Overview

1 Constraints in (Computational) Linguistics
  Philippe Blache, Jørgen Villadsen
  1.1 Introduction
  1.2 Constraints and programming
  1.3 Constraints, linguistics and parsing
    1.3.1 Constraints on trees: active constraints and parsing
    1.3.2 GPSG: the separation of information
    1.3.3 HPSG: the notion of satisfaction
    1.3.4 OT: relaxing constraints
    1.3.5 PG: constraints as syntactic structure
  1.4 Conclusion
  Bibliography

2 Constraints and Logic Programming in Grammars and Language Analysis
  Henning Christiansen
  2.1 Introduction
  2.2 Background
    2.2.1 Different Notions of Constraints in Grammars and Logic Programming
    2.2.2 Constraint Handling Rules, CHR
  2.3 Abductive Reasoning in Logic Programming with Constraints
  2.4 Using CHR with Definite Clause Grammars for Discourse Analysis
  2.5 CHR Grammars
  2.6 Conclusion
  Bibliography

3 Model-theoretic Syntax: Property Grammars, Status and Directions
  Philippe Blache, Jean-Philippe Prost
  3.1 Introduction
  3.2 Model Theory for Modelling Natural Language
  3.3 The Constructive Perspective: A Constraint Network for Representing and Processing the Linguistic Structure
    3.3.1 Generative-Enumerative vs. Model-Theoretic Syntax
    3.3.2 Generativity and hierarchical structures
    3.3.3 The Property Grammar Framework
  3.4 The Descriptive Perspective: A Constraint Network for Completing the Linguistic Structure
  3.5 Grammaticality Judgement
  3.6 Conclusion
  Bibliography

4 Constraints in Optimality Theory: Personal Pronouns and Pointing
  Helen de Hoop
  4.1 Introduction
  4.2 First and second versus third person pronouns
  4.3 Incremental optimisation of anaphoric third person pronouns
  4.4 OT semantic analysis of personal pronouns and pointing
  4.5 OT syntactic analysis of personal pronouns and pointing
  4.6 Conclusion
  Bibliography

II Recent Advances in Constraints and Language Processing

5 Constraint-driven Grammar Description
  Benoît Crabbé, Denys Duchier, Yannick Parmentier, Simon Petitjean
  5.1 Introduction
  5.2 Semi-automatic production of tree-adjoining grammars
    5.2.1 Lexical rules
    5.2.2 Description languages
  5.3 eXtensible MetaGrammar: constraint-based grammar description
    5.3.1 A language for describing tree fragments
    5.3.2 A language for combining tree fragments
    5.3.3 Towards a library of linguistic principles
  5.4 Cross framework grammar design using metagrammars
    5.4.1 Producing a lexical-functional grammar using a metagrammar
    5.4.2 Producing a property grammar using a metagrammar
    5.4.3 Towards extensible metagrammars
  5.5 Conclusion
  Bibliography

6 Extending the Constraint Satisfaction for better Language Processing
  Kilian A. Foth, Patrick McCrae, Wolfgang Menzel
  6.1 Introduction
  6.2 The Constraint Satisfaction Problem
  6.3 NLP formalisms and the CSP
    6.3.1 Constraints as value subsets
    6.3.2 Hard and soft Constraints
    6.3.3 Uniform and free-form Constraints
    6.3.4 Axiomatic and empirical grammars
  6.4 Dependency Grammar Modelling with Locally-Scoped Constraints
  6.5 Expressivity of WCDG
  6.6 Extending Local Constraints to Global Phenomena
    6.6.1 Supra-local Constraints
    6.6.2 Recursive Tree Traversal
    6.6.3 Localised Ancillary Constraints
    6.6.4 Cascading and Recursive Ancillary Constraints
  6.7 Conclusions
  6.8 Future work
  Bibliography

7 On Semantic Properties in Constraint-Based Grammars
  Verónica Dahl, Baohua Gu, J. Emilio Miralles
  7.1 Introduction
  7.2 Background on Property Grammars
  7.3 Semantic Property Grammars
  7.4 Our Parsing Methodology
    7.4.1 Background: HyProlog
    7.4.2 A Hyprolog Parser for Property Grammars
  7.5 Related Work
  7.6 Conclusion
  Bibliography
  Appendix

8 Multi-dimensional Type Theory: Rules, Categories and Combinators for Syntax and Semantics
  Jørgen Villadsen
  8.1 Introduction
    8.1.1 Background
    8.1.2 Arguments
    8.1.3 Formulas
    8.1.4 Strings
    8.1.5 Combinators
    8.1.6 Type Language and Type Interpretation
    8.1.7 Theory of Inhabitation and Theory of Formation
    8.1.8 Nabla
  8.2 The Rules
    8.2.1 Comments
  8.3 The Categories
    8.3.1 Comments
  8.4 The Combinators
    8.4.1 Comments
  8.5 Examples: Syntax and Semantics
    8.5.1 Step-by-Step Formula Extraction
    8.5.2 Further Examples
  8.6 Conclusion
  Bibliography

9 Constraint-based Sign Language Processing
  Annelies Braffort, Michael Filhol
  9.1 Introduction
  9.2 Linguistic description of Sign Languages
    9.2.1 Phonology
    9.2.2 Phonetics
    9.2.3 Lexicon
    9.2.4 Lexicon, syntax... and linguistic levels
  9.3 Language models
    9.3.1 Generative/categorical grammars
    9.3.2 Machine learning approaches
    9.3.3 SL-specific approaches
  9.4 AZee
  9.5 KAZOO
    9.5.1 SL Generation Module (SL Gene)
    9.5.2 Virtual Signer Animation Module (VS Anim)
  9.6 Conclusion
  Bibliography

10 Geometric Logics
  Hedda R. Schmidtke
  10.1 Introduction
  10.2 Geometric Semantics
  10.3 Expressiveness of Context Logic
    10.3.1 Binary Relations in Context Logic
    10.3.2 Perception and Reasoning
    10.3.3 Changing Perspectives
  10.4 Conclusions
  Bibliography

III Applications

11 Constraint-based Word Segmentation for Chinese
  Henning Christiansen, Bo Li
  11.1 Introduction
  11.2 Background and Related Work
    11.2.1 The Chinese Word Segmentation Problem
    11.2.2 CHR Grammars
  11.3 A Lexicon in a CHR Grammar
  11.4 Maximum Matching
  11.5 Maximum Ambiguous Segments
  11.6 Discussion
  11.7 Conclusion
  Bibliography

12 Supertagging with Constraints
  Guillaume Bonfante, Bruno Guillaume, Mathieu Morey, Guy Perrier
  12.1 Introduction
  12.2 The Companionship Principle in Brief
    12.2.1 Parsing with an AB-grammar
    12.2.2 Filtering lexical taggings with the Companionship Principle
    12.2.3 Implementation with Automata
  12.3 Lexicalised Grammars
  12.4 The Companionship Principle
    12.4.1 The statement of the Companionship Principle
    12.4.2 The “Companionship Principle” language
    12.4.3 Generalisation of the Companionship Principle to abstraction
    12.4.4 The Undirected Companionship Principle
    12.4.5 The Affine and Linear Companionship Principles
  12.5 Implementation of the Companionship Principle with automata
    12.5.1 Automaton to represent sets of lexical taggings
    12.5.2 Implementation of the Companionship Principle
    12.5.3 Approximation: the Rough Companionship Principle (RCP)
    12.5.4 Affine and Linear Companionship Principle (ACP)
    12.5.5 Implementation of Lexicalised grammars
  12.6 Application to Interaction Grammars
    12.6.1 Interaction Grammars
    12.6.2 Companionship Principle for IG
  12.7 Application to Lexicalised Tree Adjoining Grammars (LTAG)
  12.8 Conclusion
  Bibliography

Contributors
Index
LIST OF ILLUSTRATIONS

1.1 Example of a feature-structure satisfying an input description

3.1 Example of parse deemed ungrammatical through model checking
3.2 Tree model for a NP constituent
3.3 Tree model for a S constituent

4.1 Pointing to two imaginary addressees [you]1 and [you]2 (Zwets 2014)
4.2 Pointing to introduce a discourse referent (Zwets 2014)
4.3 Pointing at drawing while uttering sentence (4.4.3)
4.4 Pointing to an imaginary addressee while uttering sentence (4.4.4) (Zwets 2014)
4.5 Pointing at paper while uttering a sentence (4.4.5) (Zwets 2014)

5.1 Tree rewriting in TAG
5.2 TAG tree templates
5.3 Structural redundancy in TAG
5.4 Combining elementary tree fragments
5.5 Fragments described using the language L(∅)
5.6 Models for the combination of fragments of Figure 5.5
5.7 Fragments described using the language L(∅) (continued)
5.8 Models for the combination of fragments of Figure 5.7
5.9 Fragments described using the language L(g names)
5.10 Description of double PP complementation
5.11 Combination scheme for L(colours)
5.12 Fragments described using the language L(colours)
5.13 LFG grammar and c- and f-structures for the sentence “John loves Mary”
5.14 Fragment of a PG for French (basic verbal constructions)

6.1 AGENT assignment in German perfect tense active sentences
6.2 AGENT assignment in German passives
6.3 AGENT assignment based on German active/passive detection

7.1 New Category Inference
7.2 A sample ontology of biological concepts that have IS-A relation.

9.1 The sign BALL in its citation form (a) (source: Dictionnaire bilingue LSF, Editions IVT), and the parameters that can vary depending on the context (b).
9.2 The sign CAR PARK (source: Dictionnaire bilingue LSF, Editions IVT)
9.3 The four manual units composing S1 translated in LSF
9.4 Signed example sentence S1 modelled with P/C and null nodes
9.5 Rule time line illustrations (parameter arguments are in italics)
9.6 The sign HELLO, THANK YOU
9.7 An AZee output of type SCORE
9.8 Kazoo: module organisation.
9.9 Kazoo demo page, version 1.0.

12.1 The LTA of sentence (1) before and after filtering with the Companionship Principle
12.2 Automaton A for the last constraint of Table 12.4
12.3 The PT of Sentence (2)
12.4 PTDs for Sentence (2)
12.5 Polarity composition
12.6 Filtering in Interaction Grammars
12.7 Filtering in Tree Adjoining Grammars
LIST OF TABLES

4.1 Incremental optimisation of the interpretation of he, stage 1, sentence (4.3.1)
4.2 Incremental optimisation of the interpretation of he, stage 2, sentence (4.3.1)
4.3 Interpretive optimisation of pointing in sentence (4.4.3), Figure 4.3
4.4 Interpretive optimisation of pointing in sentence (4.4.4), Figure 4.4
4.5 Interpretive optimisation of pointing in sentence (4.4.5), Figure 4.5
4.6 Interpretive optimisation of pointing in sentence (4.4.2), Figure 4.2
4.7 Expressive optimisation of reference to the speaker in sign language context
4.8 Expressive optimisation of reference to the speaker in spoken language context
4.9 Expressive optimisation of reference to multiple addressees in spoken language context

6.1 Operators in WCDG

12.1 Toy lexicon of an AB-grammar
12.2 Types of the grammar redefined with a flat structure
12.3 Head companions
12.4 Argument companions
PREFACE

This book is addressed to scholars and students of linguistics and computational linguistics as well as others. Constraints are fundamental notions in the characterisation and processing of language. A model of language may consist of a generative and a constraining part, independently of whether the model concentrates on syntax or on the relationship between meanings and their expression, and some models are entirely based on constraints. Different sorts of constraints appear in studies of languages, in computational linguistics and in a variety of programming paradigms. This book does not claim to provide a unified view of constraints, but aims at creating mutual inspiration and a transfer of results between the different fields and directions it covers.

Constraint programming emerged from the need to solve complex, mathematically formulated problems, including optimisation problems, and, especially in its variants of constraint logic programming, has provided additional expressibility that is very useful for applications to language. Generative grammar formalisms, typically with a context-free backbone, have simple grammatical rules or complex attributes to capture semantic properties, and constraints can express concordance and formalise the flow of information between different subphrases. Completely constraint-based systems include Optimality Theory and Property Grammars, both of which are also described in this book.

Language is considered from a general perspective, ranging from human languages, whether written, spoken or signed, through biological sequence data to streams of sensor signals received by a robot or an ambient intelligent computer application. They are all systems of meanings encoded in some syntactic form, and they present analogous problems of characterising them as well as of concretely extracting meanings from expressions.

This book arises from the series of workshops on Constraint Solving and Language Processing, which started in 2004 and have been held at varying intervals since then. Proceedings were published in Lecture Notes in Computer Science in 2004, and from 2012 on they appear in the FoLLI sub-series of Lecture Notes in Computer Science. Proceedings from the intermediate years have been issued as technical reports that are available online; a complete list is maintained at http://www.ruc.dk/~henning/CSLP_AllWorkshops/.

The first workshop took place at Roskilde University, Denmark, as part of a research project funded by the Danish Natural Science Research Council, led by Henning Christiansen with the participation of Philippe Blache, Verónica Dahl, Jørgen Villadsen and the late Peter Rossen Skadhauge. Since then, these workshops have taken place in different cities of the world: Sitges in Spain, Sydney in Australia and Hamburg in Germany (with the ESSLLI summer school). After a pause of a few years, the series was resumed in 2012 on the initiative of Denys Duchier and Yannick Parmentier in Orléans, France.

The editors would like to thank the participants, program committee members, organisers and the long row of prominent invited speakers of these workshops, and especially those who contributed to this book. Finally, we are grateful to the Viking Ship Museum, Roskilde, Denmark, for allowing us to use the photo of its Viking longship copy, the Sea Stallion of Glendalough, for the book cover.

The chapters of this book are divided into three parts. Part I provides foundations and an overview of fields that are central to Constraints and Language; Parts II and III present a collection of recent results and applications.
HENNING CHRISTIANSEN
Roskilde, July 2014
Part I Foundations and Overview
CHAPTER ONE
CONSTRAINTS IN (COMPUTATIONAL) LINGUISTICS
PHILIPPE BLACHE, JØRGEN VILLADSEN
1.1 INTRODUCTION

The notion of constraints came to occupy a central position in linguistic theories with the introduction of unification in grammars. This evolution first took place implicitly with so-called logic grammars, closely tied to the history of Prolog: see for example Metamorphosis Grammars (Colmerauer, 1975), Definite Clause Grammars (Pereira and Warren, 1980) and Functional Unification Grammars (Kay, 1984). What is important about unification is that it was a major step towards the introduction of constraints both in programming and in linguistics. Basically, unification can be implemented with an equation system, as is the case in Prolog II (Colmerauer, 1986). Since variables may be any kind of object, unification becomes on the one hand a powerful mechanism for reducing the search space and on the other hand a way to represent high-level relations between objects or their characteristics.

The introduction of unification in grammars came with the representation of the properties of linguistic items by means of feature structures (see (Carpenter, 1992) for a precise description), enabling the implementation of relations not only between categories, but also between features. It became possible, for example, to represent directly different mechanisms such as sub-categorisation, agreement or lexical selection. GPSG (Gazdar et al., 1985) was among the first linguistic theories to make intensive use of feature structures and to integrate the above-mentioned processes. This theory constituted a major break with the dominant generative paradigm precisely for this reason: syntactic relations were no longer represented only in terms of rules, but also by means of other statements. The first major innovation in GPSG is the separate representation of linear precedence. The second is the possibility of directly expressing feature co-occurrence restrictions. These two mechanisms (among others) act as constraints on the structure, introducing the idea that building a syntactic structure is not only a matter of derivation (in other words, it does not rely only on rules), but also depends on other kinds of relations between linguistic objects. The introduction of constraints into linguistic theories then became an obvious step.

In this chapter we explore the evolution of the notion of constraints in syntactic theories and how the computational and linguistic perspectives have progressively converged. In the first part, we give an overview of constraint programming, showing how constraints not only introduce a way to control processing (typically by reducing the search space), but also constitute an alternative way of representing and processing information (renewing the notion of declarativity). In the second part, we detail the evolution of linguistic theories from unification to constraints, showing how the notion of satisfaction can become the core of the theory. We illustrate this evolution with the description of different theories, among which HPSG (Pollard and Sag, 1994), Optimality Theory (Prince and Smolensky, 1993) and Property Grammars (Blache, 2000). In the last section, we situate this evolution in the model-theoretic perspective, and discuss how constraints can deeply renew our view of linguistic theory.
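The role of unification over feature structures described in this section can be made concrete with a minimal sketch. It assumes a simplified encoding that is not taken from the chapter: feature structures are nested Python dictionaries with atomic string values, there is no structure sharing (reentrancy), and the function name `unify` and the feature names are hypothetical.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Assumption: a feature structure is a nested dict whose leaves are
    atomic strings; structure sharing (reentrancy) is not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values under the same feature
                result[feature] = merged
            else:
                result[feature] = value  # information contributed by b only
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical agreement features for a determiner and two nouns.
det_agr = {"num": "sg"}
noun_agr = {"num": "sg", "pers": "3"}
plural_agr = {"num": "pl", "pers": "3"}

agreement = unify(det_agr, noun_agr)    # compatible: information is merged
mismatch = unify(det_agr, plural_agr)   # clash on "num": unification fails
```

Seen this way, agreement is a relation between features rather than between categories: the two structures are compatible exactly when they unify, and the result accumulates the partial information contributed by each side.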
1.2 CONSTRAINTS AND PROGRAMMING

One way to present the notion of constraint relies on the state of the search space generated by a program. As classically presented (see e.g. (Saraswat and Rinard, 1990)), the state of a system is described by a store, which is the set of variables used in the program, and a valuation function assigning each variable a value. A constraint provides partial information on the possible values: it specifies a subset of values in the initial domain, thereby reducing the state space. The conjunction of two constraints is the intersection of the sets of values defined by each constraint. A typical constraint program refines the state of the search space at each step in a monotonic way: the set of possible values at one step is a subset of the possible values of the same variables at the prior step. A solution to a Constraint Satisfaction Problem (CSP, see (Jaffar and Lassez, 1986)) is an assignment of values such that all constraints are satisfied simultaneously.

An example over finite domains illustrates this process. Let $I_1$ denote an integer variable and $S_1$ a set variable. The following example shows constraints over the two kinds of variables:

$$I_1 \in \{1, 5\} \qquad \{4, 6, 8\} \subseteq S_1 \qquad (1.1)$$
Both constraints illustrate how the domains of the different variables can be reduced, implementing the representation of partial information about them. Describing a problem consists in stipulating the different constraints over the variables of interest in a constraint store. No value can be assigned to a variable without satisfying the constraint store. At each step of the process, the constraint store can be enriched with new constraints. Moreover, constraints interact as explained above: the same variable can be constrained by different stipulations, either directly or indirectly. Constraint propagation then consists in evaluating the intersection of the domains specified by the different constraints, as illustrated in the following example:

$$
\begin{array}{l}
\{4, 6\} \subseteq S_1 \\
S_2 \subseteq \{1, 2, 4\} \\
S_3 \subseteq \{4, 6, 8\} \\
S_2 \subseteq S_3 \\
S_1 \subseteq S_3
\end{array}
\quad \longrightarrow \quad
\begin{array}{l}
\{4, 6\} \subseteq S_1 \subseteq \{4, 6, 8\} \\
S_2 \subseteq \{4\} \\
S_3 \subseteq \{4, 6, 8\}
\end{array}
\qquad (1.2)
$$
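The propagation step in example (1.2) can be reproduced with a small sketch. It is only an illustration under stated assumptions: each set variable is represented by a pair of bounds (elements it must contain, elements it may contain), the universe of candidate elements is assumed to be 1 to 8, and the names `UNIVERSE`, `propagate`, `bounds` and `subset_constraints` are hypothetical rather than taken from any particular constraint solver.

```python
UNIVERSE = frozenset(range(1, 9))  # assumed universe of candidate elements


def propagate(bounds, subset_constraints):
    """Tighten (lower, upper) bounds until no subset constraint changes them.

    bounds maps a variable name to (lower, upper), meaning lower ⊆ S ⊆ upper.
    subset_constraints is a list of pairs (small, big), meaning small ⊆ big.
    """
    bounds = {name: (set(lo), set(hi)) for name, (lo, hi) in bounds.items()}
    changed = True
    while changed:
        changed = False
        for small, big in subset_constraints:
            lo_s, hi_s = bounds[small]
            lo_b, hi_b = bounds[big]
            new_hi_s = hi_s & hi_b  # small may only contain what big may contain
            new_lo_b = lo_b | lo_s  # big must contain what small must contain
            if new_hi_s != hi_s or new_lo_b != lo_b:
                bounds[small] = (lo_s, new_hi_s)
                bounds[big] = (new_lo_b, hi_b)
                changed = True
    for name, (lo, hi) in bounds.items():
        if not lo <= hi:
            raise ValueError(f"inconsistent constraints on {name}")
    return bounds


# Left-hand side of example (1.2).
bounds = {
    "S1": ({4, 6}, UNIVERSE),   # {4, 6} ⊆ S1
    "S2": (set(), {1, 2, 4}),   # S2 ⊆ {1, 2, 4}
    "S3": (set(), {4, 6, 8}),   # S3 ⊆ {4, 6, 8}
}
subset_constraints = [("S2", "S3"), ("S1", "S3")]

result = propagate(bounds, subset_constraints)
```

With these inputs the upper bound of S2 shrinks to {4} and S1 ends up between {4, 6} and {4, 6, 8}, matching the right-hand side of (1.2). The sketch additionally records the lower bound {4, 6} on S3, which follows from {4, 6} ⊆ S1 ⊆ S3 even though the example does not list it.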
In example (1.2), constraint propagation makes it possible to drastically reduce the definition domain of the different variables. In some cases, this process can lead to a unique variable assignment, which is a solution satisfying the constraint system. This is one of the major interests of constraints: their interaction specifies a priori a reduced definition domain for each variable, which simply means that no value can be chosen outside it. This characteristic illustrates how constraints can be active: they are applied a priori, before any processing is done (Van Hentenryck, 1989). This is another major interest of constraint programming: the classical strategy in imperative programming consists in enumerating the possible values to assign before verifying their properties. This process corresponds to the generate-and-test strategy. In constraint programming, active constraints as presented above make it possible to apply property verification before generating a value. Many other kinds of constraints can be stipulated, such as ordering, equality, arithmetic, etc.

To summarise, constraints are especially useful in two respects: information representation and control over the processes. In particular, they offer a direct way to represent partial information. Moreover, since constraint satisfaction is monotonic, all the different constraints are at the same level in the sense that they can be evaluated independently. Finally, constraint propagation provides an efficient way to control the processes by reducing the search space a priori. These different characteristics, plus the fact that there is a large variety of constraint types, make this approach well adapted to language processing.
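The contrast between generate-and-test and active constraints drawn in this section can be illustrated with a toy comparison. This is a simplified sketch, not a propagation engine: it assumes finite integer domains, applies only the unary constraints before enumeration, and all names (`generate_and_test`, `constrain_then_generate`, the variables `x` and `y` and their constraints) are hypothetical.

```python
from itertools import product


def generate_and_test(domains, unary, binary):
    """Enumerate every tuple of values first, then check the constraints."""
    names = list(domains)
    tested, solutions = 0, []
    for values in product(*(sorted(domains[n]) for n in names)):
        tested += 1
        assignment = dict(zip(names, values))
        unary_ok = all(check(assignment[n]) for n, check in unary)
        binary_ok = all(check(assignment[a], assignment[b])
                        for a, b, check in binary)
        if unary_ok and binary_ok:
            solutions.append(assignment)
    return solutions, tested


def constrain_then_generate(domains, unary, binary):
    """Apply unary constraints to the domains before generating any value."""
    reduced = {n: set(d) for n, d in domains.items()}
    for n, check in unary:
        # Conjunction of constraints on one variable: intersect the value sets.
        reduced[n] = {v for v in reduced[n] if check(v)}
    return generate_and_test(reduced, [], binary)


domains = {"x": set(range(1, 11)), "y": set(range(1, 11))}
unary = [
    ("x", lambda v: v % 2 == 0),  # x is even
    ("x", lambda v: v > 6),       # x is greater than 6
    ("y", lambda v: v < 4),       # y is smaller than 4
]
binary = [("x", "y", lambda x, y: x == y + 7)]

naive_solutions, naive_tested = generate_and_test(domains, unary, binary)
active_solutions, active_tested = constrain_then_generate(domains, unary, binary)
```

Both strategies find the same two solutions, but the first examines all 100 candidate pairs while the second examines only the 6 pairs left once the domains of x and y have been reduced to {8, 10} and {1, 2, 3}.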
1.3 CONSTRAINTS, LINGUISTICS AND PARSING

As said above, unification in linguistics opened the door to the introduction of constraints, both for representing information (most linguistic theories now use this notion) and in terms of computing: logic programming and the implementation of unification by means of an equation system constituted an adequate paradigm in which unification was considered as an active constraint. For both theoretical and computational reasons, unification progressively gave way to constraints. This shift from unification to constraints in linguistics has been detailed by Pollard (1996). In this presentation, Carl Pollard identified the main properties shared by Constraint-Based Grammars (hereafter CBG), founding a new theoretical paradigm. We highlight some of them in the following:

• Expressivity: “The language in which the grammatical theory is expressed does not impose constraints on the theory; it is the theory that imposes the constraints.”
This property is directly related to what is called declarativity in programming: statements in the theory should not describe mechanisms for building the structure, but have to be linguistically motivated. For example, the description of how feature values are propagated or of the kind of trees that are considered valid should not belong to the theory. Theory and formalisms should be clearly distinguished.

• Empirical Adequacy: “First write constraints that get the facts right, and worry later about which constraints are axioms and which are theorems. There are no deep principles.”
In many cases in linguistics, facts require some axioms or principles that cannot necessarily be deduced or proved from anything else. What a linguistic theory should do is describe as many facts as possible, even if axioms rather than theorems are required.

• Locality: “Constraints are local in the sense that whether or not they are satisfied by a candidate structure is determined solely by that structure, without reference to other structures.”
This property means that constraints have to be evaluated independently of other considerations. In other words, constraints need to be stipulated for themselves, without being part of an operational architecture.

• Psycholinguistic Responsibility: “Linguistic theories must be capable of interfacing with plausible cognitive models.”
In many cases, linguistic theories have been elaborated independently of the object they study (language) and its use by human subjects. Such theories became abstract formal objects. We will see that constraints can play an important role in the elaboration of cognitively grounded theories in the sense that they can represent linguistically motivated operations and can be evaluated independently. In other words, their role can be observed in human language processing.

• Radical Non-autonomy: “The grammar consists of assertions that mutually constrain several different domains. Some of these constraints may apply only to one domain. But typically, constraints are interface constraints.”
In this interpretation, the grammar is a set of constraints, whatever the domain (morphology, syntax, phonology, prosody, etc.). All domains are independent in the sense that none of them is the result of the transformation of another. Parsing then results from constraint interaction.
Another important view, bringing a broader theoretical perspective to constraints in linguistics, is the distinction made by Geoffrey Pullum between Generative-Enumerative Syntax (hereafter GES) and Model-Theoretic Syntax (hereafter MTS) (Pullum and Scholz, 2001; Pullum, 2007); see chapter 3 in this volume for a more precise presentation of MTS. In this work, Pullum underlines some of the main properties of generative approaches that make them problematic when adopting a broad perspective such as the one proposed by Pollard. One of these properties is that GES relies on the idea that a grammar is a recursive definition of a set, which is therefore computably enumerable. Language processing relies there on a finite set of primitive elements and a finite set of operations for composing them into larger complex units. This conception entails a specific relation between language and grammar, in which the language is the set of strings generated from the grammar by means of derivation. In this perspective, language is recursively enumerable, which as a side effect means that we cannot say anything about elements that do not belong to the set. In GES, parsing then consists in finding a derivation which makes it possible to build a tree. This approach is holistic in the sense that the entire system is required for finding a derivation: no GES rule can be evaluated in itself; each is just a step in the derivation process.

By contrast, a grammar in MTS does not recursively define a set of expressions: it merely states necessary conditions on the syntactic structure of expressions. The goal in MTS is to find an interpretation of grammatical information (not to build a structure) or, in other words, to describe the characteristics of an input (not to associate it with a structure). An MTS grammar is a set of independent assertions (formulas). Grammatical statements are here constraints on categories, and a model is a set of categories satisfying the grammatical constraints.

Pollard’s and Pullum’s papers propose a new perspective for linguistic theories, in which the description of linguistic facts occupies a central position in the architecture. Instead of describing how to build a structure compatible with the observations, the question is how to describe the facts by means of grammatical statements that are satisfied by their description.
In this theoretical framework, constraints are the grammatical statements and parsing, as described in the following sections, relies then on constraint satisfaction.
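The contrast between the two views can be made concrete with a toy example. The following Python sketch is only an illustration under our own assumptions: the one-rule grammar, the lexicon, the three constraints and the function names `generated` and `evaluate` are hypothetical and are not taken from Pollard's or Pullum's work.

```python
from itertools import product

# Hypothetical toy grammar: S -> Det N V, with a tiny lexicon.
LEXICON = {"Det": ["the"], "N": ["cat", "dog"], "V": ["sleeps"]}

def generated():
    """GES view: the language is the set of strings enumerated by the grammar."""
    return {" ".join(words)
            for words in product(LEXICON["Det"], LEXICON["N"], LEXICON["V"])}

# MTS view: independent assertions, each evaluated on its own
# against a sequence of categories.
CONSTRAINTS = {
    "Det precedes N": lambda cats: all(
        i < j
        for i, c in enumerate(cats) if c == "Det"
        for j, d in enumerate(cats) if d == "N"),
    "N requires Det": lambda cats: "N" not in cats or "Det" in cats,
    "V is obligatory": lambda cats: "V" in cats,
}

def evaluate(cats):
    """Return, for every constraint, whether the input satisfies it."""
    return {name: check(cats) for name, check in CONSTRAINTS.items()}
```

For the ill-formed input "cat the sleeps", the enumerative side can only report that the string is not in `generated()`, whereas `evaluate(["N", "Det", "V"])` still describes it: one constraint is violated and two are satisfied.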
1.3.1 Constraints on trees: active constraints and parsing
Denys Duchier, in several works with different colleagues, has explored the question of constraints on trees, their representation and their implementation as a constraint satisfaction problem (Duchier and Thater, 1999; Duchier, 2000; Duchier and Debusmann, 2001). In particular, he proposed a specification of dominance constraints with set operators, with the following abstract syntax (adapted from Koller and Niehren (2002)):

\[ \varphi ::= X : f(X_1, \ldots, X_n) \mid X \lhd^{*} Y \mid X \perp Y \mid X \neq Y \mid \varphi \wedge \varphi \tag{1.3} \]

In this representation, the variables $X$ and $Y$ denote nodes in a tree. A dominance constraint $\varphi$ is a conjunction of the different constraints shown above: labelling $X : f(X_1, \ldots, X_n)$, dominance $X \lhd^{*} Y$, inequality $X \neq Y$, and disjointness $X \perp Y$. As described by Koller and Niehren (2002), a labelling constraint expresses that $X$ denotes a node which has the label $f$ and whose children are denoted by $X_1, \ldots, X_n$. A dominance constraint $X \lhd^{*} Y$ expresses that $X$ denotes a node that is somewhere above (or equal to) the node denoted by $Y$ in the tree. An inequality constraint expresses that the two variables denote different nodes, and a disjointness constraint $X \perp Y$ expresses that neither of the two nodes dominates the other.

The proposal then consists in encoding this information in terms of set constraints. For each node, it is possible to identify different regions of the tree in terms of which constraints will be expressed: the node itself, all nodes above, all nodes below, and all nodes to the side (i.e. in disjoint subtrees). For a node $x$, these regions are denoted by the variables $\mathit{Eq}_x$, $\mathit{Up}_x$, $\mathit{Down}_x$ and $\mathit{Side}_x$ respectively.
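Other set variables can be defined in terms of set operations, for example:

\[ \mathit{Eqdown}_x = \mathit{Eq}_x \cup \mathit{Down}_x \qquad \mathit{Equp}_x = \mathit{Eq}_x \cup \mathit{Up}_x \tag{1.4} \]

Constraints are then encoded by means of these set variables, as in the following:

\[ x \lhd^{+} y \;\equiv\; \mathit{Eqdown}_y \subseteq \mathit{Down}_x \;\wedge\; \mathit{Equp}_x \subseteq \mathit{Up}_y \;\wedge\; \mathit{Side}_x \subseteq \mathit{Side}_y \tag{1.5} \]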
This constraint stipulates that nodes equal to or below y are below x, nodes equal to or above x are above y, and nodes disjoint from x are also disjoint from y. All other types of constraints on trees can be encoded as constraints on finite sets in the same manner. Parsing then becomes a constraint satisfaction problem: finding a parse consists in finding an assignment of the different variables that satisfies these constraints. The approach can be applied to simple phrase-structure grammars and has also been proposed for dependency grammars.
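The set encoding can be checked on a concrete tree. The sketch below is a minimal illustration, not Duchier's implementation: the five-node tree and the function names are our own assumptions, and the sets are computed from a known tree, whereas a constraint solver would treat them as finite-set variables to be determined.

```python
# A small tree given as child -> parent links (hypothetical example):
#        a
#       / \
#      b   c
#     / \
#    d   e
PARENT = {"b": "a", "c": "a", "d": "b", "e": "b"}
NODES = {"a", "b", "c", "d", "e"}

def up(x):
    """All nodes strictly above x."""
    result = set()
    while x in PARENT:
        x = PARENT[x]
        result.add(x)
    return result

def down(x):
    """All nodes strictly below x."""
    return {n for n in NODES if x in up(n)}

def eq(x):
    """The node itself."""
    return {x}

def side(x):
    """All nodes in subtrees disjoint from x."""
    return NODES - eq(x) - up(x) - down(x)

def strictly_dominates(x, y):
    """The three set inclusions used to encode strict dominance of y by x."""
    return (eq(y) | down(y) <= down(x)
            and eq(x) | up(x) <= up(y)
            and side(x) <= side(y))
```

On this tree, `strictly_dominates("a", "d")` and `strictly_dominates("b", "d")` hold, while `strictly_dominates("c", "d")` fails because d is not in the region below c.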
1.3.2 GPSG: the separation of information
GPSG (Gazdar et al., 1985) was the first theory to propose distinguishing the different types of information encoded implicitly in a tree: dominance and linearity. Dominance is encoded in GPSG with immediate dominance rules (ID rules) and linearity with linear precedence statements (LP rules). The following example gives a partial description of the NP in French:

\[ \begin{array}{l} NP \rightarrow_{id} N;\ Det;\ \{AP;\ PP\} \\ Det \prec *;\quad N \prec PP;\quad AP \prec PP \end{array} \tag{1.6} \]

The ID rule indicates the possible constituents of the NP, two of them being mandatory (Det and N), the others optional. The LP rules specify the possible positions of the constituents in a well-formed structure. Together, the two types of rules make it possible to generate the set of well-formed trees. In this interpretation, ID rules define the domain, which is the set of all possible trees, and LP rules act as constraints on this domain, restricting the space of possible solutions to the set of trees satisfying the LP constraints. This is a typical application of constraints, which here play the role of a filtering device.

Moreover, GPSG was also one of the first theories to make intensive use of feature structures, which make it possible to encode precise relations between categories and features. GPSG was thus a pioneering theory in bringing the unification process into the description of well-formedness. This was done by means of different principles propagating feature values through the tree nodes, but also with a specific mechanism called feature co-occurrence restriction (FCR; see also Petersen and Kilbury (2009)), as illustrated in the following example:

\[ [+INV] \supset [+AUX, FIN] \tag{1.7} \]
This FCR stipulates that if a verb occurs initially in a sentence containing a subject (feature [+INV]), then this verb must be a finite auxiliary. FCRs offer the possibility of stipulating a new type of constraint. Unlike the constraints on tree structure shown above, these constraints make it possible to specify relations at a fine-grained level, directly between feature values, which is one of the bases of lexicalist approaches.

These two major innovations, the ID/LP representation and constraints on feature values, are of deep importance in linguistic theory: they help in understanding what kind of information is used during the derivation process. Applying a phrase-structure rule then amounts to applying a complex mechanism that relies on complex and heterogeneous information. This kind of information (in particular LP statements and FCRs) typically acts as constraints on the structure. A constraint-based approach to GPSG parsing (see Blache (1992)) can then be seen as generating the set of possible trees and reducing this domain by applying the different constraints.
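The generate-and-filter reading of ID/LP rules can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the parser of Blache (1992): it works on flat sequences of categories rather than trees, it treats AP and PP as the optional constituents of the NP rule, and all names (`id_domain`, `satisfies_lp`, `np_orders`, `satisfies_fcr`) are hypothetical.

```python
from itertools import permutations

# ID rule for the NP: Det and N are mandatory, AP and PP are optional.
MANDATORY = ("Det", "N")
OPTIONAL = ("AP", "PP")

# LP statements as (before, after) pairs; "*" stands for any other category.
LP = [("Det", "*"), ("N", "PP"), ("AP", "PP")]

def id_domain():
    """The domain: every ordering of every constituent set licensed by the ID rule."""
    domain = []
    for mask in range(2 ** len(OPTIONAL)):
        chosen = [c for i, c in enumerate(OPTIONAL) if mask >> i & 1]
        domain.extend(permutations(MANDATORY + tuple(chosen)))
    return domain

def satisfies_lp(seq):
    """True if the sequence violates no LP statement."""
    for before, after in LP:
        if before not in seq:
            continue
        others = [c for c in seq if c != before] if after == "*" else [after]
        for c in others:
            if c in seq and seq.index(before) > seq.index(c):
                return False
    return True

def np_orders():
    """LP statements act as a filter on the domain defined by the ID rule."""
    return [seq for seq in id_domain() if satisfies_lp(seq)]

def satisfies_fcr(features):
    """The FCR on inverted verbs: [+INV] implies [+AUX, FIN]."""
    if features.get("INV"):
        return bool(features.get("AUX")) and bool(features.get("FIN"))
    return True
```

Under these assumptions the ID rule licenses 38 orderings, of which the LP statements keep 6, for instance `("Det", "AP", "N", "PP")`. The FCR is checked separately, directly on feature values.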
1.3.3 HPSG: the notion of satisfaction
The idea of considering parsing as a constraint satisfaction problem appeared only after constraints were introduced in linguistic theories. This question has been explored from the point of view of first-order logic by Johnson (1988, 1994), who describes how “grammaticality of an utterance corresponds to the satisfiability of a set of constraints, and ungrammaticality or ill-formedness corresponds to the unsatisfiability of that set”. This is exactly what has been applied in the theoretical elaboration of HPSG (Pollard and Sag, 1994; Sag et al., 2003). HPSG took the ideas of GPSG one step further by abandoning the notion of derivation rule and representing all information and syntactic relations by means of feature value propagation. Some schemata indicate the general hierarchical syntactic structure (which more or less corresponds to abstract trees), starting from which all constraints can be applied. More precisely, since information is encoded by means of feature structures (hereafter FS), constraints are stipulated in terms of partial FSs, each encoding a specific piece of information. Such partial FSs are called descriptions and stipulate different kinds of information such as agreement, restrictions, propagation by means of structure sharing, etc.

Description application follows the attribute-value logic described by Carpenter (1992). AV-logic proposes a formal definition of feature structures together with a description language: in this language, the well-formed formulas are the descriptions. Each description specifies particular properties of feature structures. A description thus corresponds to a constraint on feature structures, and a satisfaction relation is defined between feature structures and descriptions. From an operational point of view, a description implements specific relations between feature values (such as lexical restrictions) as well as syntactic principles.

Description relations: In addition to the classical logical connectives ∨ and ∧, AV-logic introduces four relations, written δ, ≐, : and @. They can be described as follows:
• δ(f, q): feature value function; denotes the value reached from the node q by following the path f.
• F@π: denotes the value of the feature structure F at the path π.
• π1 ≐ π2: means that the object reached by following the path π1 is token-identical to the object reached by following the path π2.
• π : φ: means that the value at the path π satisfies the description φ.

Satisfaction: The following properties are not a complete definition of the satisfaction relation, but give the interpretation of the previous relations:
• F ⊨ π : φ if F@π ⊨ φ
• F ⊨ π1 ≐ π2 if δ(π1, q) = δ(π2, q)
• F ⊨ φ ∧ ψ if F ⊨ φ and F ⊨ ψ
• F ⊨ φ ∨ ψ if F ⊨ φ or F ⊨ ψ

Figure 1.1 illustrates this satisfaction relation. In this case, the feature structure F satisfies the description:

F ⊨ (Subj ≐ Pred Agt) ∧ (Pred : (Pat : Mary))

Descriptions can be more or less specific according to the properties described: in this example, the description focuses on the predicative structure, but it could also involve any other feature. What is important in the HPSG perspective is that all linguistic properties are expressed in terms of descriptions or, in other words, in terms of constraints on the structure. HPSG still relies on a phrase-structure backbone: very general hierarchical structures are defined by means of ID-schemata (which are abstract trees). The constraints are applied to the structures built by combining these schemata.
F = [ SENTENCE: John loves Mary
      SUBJ:     [1] John
      OBJ:      [2] Mary
      PRED:     [ REL: loves
                  AGT: [1]
                  PAT: [2] ] ]

Figure 1.1: Example of a feature structure satisfying an input description

In this sense, there is no classical derivation as in other phrase-structure approaches (for example GPSG, in which ID rules need to be applied): the structure is built directly by means of a satisfaction process. We can say that HPSG is thus a truly constraint-based theory, in which constraints are not only a filtering device but the core of the process.
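The satisfaction relation can be illustrated with a small checker. The sketch below is a deliberately reduced assumption-laden version of attribute-value logic, not Carpenter's formalisation: feature structures are nested Python dictionaries, structure sharing is modelled by object identity, the SENTENCE feature of Figure 1.1 is left out, and the tuple-based description syntax and the names `Atom`, `value_at` and `satisfies` are hypothetical.

```python
class Atom:
    """An atomic value; two paths share a value only if they reach the same object."""
    def __init__(self, name):
        self.name = name

john, mary, loves = Atom("John"), Atom("Mary"), Atom("loves")

# The feature structure of Figure 1.1: reusing the same Atom object
# plays the role of the tags 1 and 2.
F = {"SUBJ": john, "OBJ": mary,
     "PRED": {"REL": loves, "AGT": john, "PAT": mary}}

def value_at(fs, path):
    """Follow a path (a tuple of feature names) from fs."""
    for feature in path:
        fs = fs[feature]
    return fs

def satisfies(fs, desc):
    """Check a description given as a nested tuple against a feature structure."""
    kind = desc[0]
    if kind == "atom":                      # atomic description
        return isinstance(fs, Atom) and fs.name == desc[1]
    if kind == "at":                        # path : description
        _, path, sub = desc
        try:
            return satisfies(value_at(fs, path), sub)
        except (KeyError, TypeError):       # the path does not exist
            return False
    if kind == "eq":                        # path equation (token identity)
        _, p1, p2 = desc
        try:
            return value_at(fs, p1) is value_at(fs, p2)
        except (KeyError, TypeError):
            return False
    if kind == "and":
        return satisfies(fs, desc[1]) and satisfies(fs, desc[2])
    if kind == "or":
        return satisfies(fs, desc[1]) or satisfies(fs, desc[2])
    raise ValueError(f"unknown description: {kind}")

# Subject and agent are shared, and the agent is John.
D1 = ("and",
      ("eq", ("SUBJ",), ("PRED", "AGT")),
      ("at", ("PRED",), ("at", ("AGT",), ("atom", "John"))))
# Subject and patient are not shared.
D2 = ("eq", ("SUBJ",), ("PRED", "PAT"))
```

Here `satisfies(F, D1)` holds and `satisfies(F, D2)` does not. Because a description is partial, it says nothing about features it does not mention, such as OBJ or REL.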
1.3.4 OT: relaxing constraints
Optimality Theory (Prince and Smolensky, 1993) also occupies a central position in the use of constraints in linguistic theory. More precisely, OT introduced, among other things, the notion of constraint violation: constraints stipulate any kind of linguistic information, but they can all be violated by some structures. OT has a specific architecture relying on two steps:

1. Generation (GEN): starting from an input representing an underlying form, generation of all the possible outputs, also called candidates.
2. Evaluation (EVAL): selection of the optimal candidate (the output) by means of a set of ordered constraints (in some presentations, the set of constraints is considered a component of the theory, like GEN and EVAL).

input
  ↓ (GEN)
{ candidate1, candidate2, ..., candidaten }
  ↓ (EVAL)
output
In OT, the lexicon contains the underlying forms, starting from which GEN generates all candidates. The set of candidates is then filtered (and ordered) by evaluating constraints. In OT, constraints are universal: all structures should satisfy them. But at the same time, as said above, they can be violated. Two types of constraints are used: faithfulness and markedness constraints. The former require the form of the output to be as close as possible to that of the input. In particular, segments (or constituents) of the input need to have correspondents in the output. Moreover, the output needs to keep the linear order of the input segments. Markedness constraints indicate whether the corresponding information is used to mark a variation in a given language (unmarked information being in some sense a default value). Markedness constraints indicate how the structure of the underlying form can be changed (there is thus a tension between the two types of constraints). The following table illustrates the two types of constraints in phonology:

Markedness
  ONSET: syllables must have an onset
  VOICED-CODA: obstruents must not be voiced in syllable coda position
Faithfulness
  IDENT-IO: the specification of the feature [voiced] of an input segment must be preserved in its output correspondent
Constraints are ranked by order of importance. In OT (at least in theory) all constraints are universal. The difference between languages is accounted for by different constraint orderings. In the previous example, the constraint ranking for German is the following:

VOICED-CODA ≫ IDENT-IO

This ranking explains final devoicing in German (obstruents at the end of a syllable must be devoiced). For example, the singular form of the word “Hand” is pronounced /hant/ while the plural “Hände” is pronounced /hεnde/. The underlying form of the word (in the lexicon) is /hand/. Starting from this input, the GEN function generates many different candidates, among which /hand/ and /hant/:

GEN(/hand/) = {/hand/, /hant/}

We can see that the form /hant/ satisfies the VOICED-CODA constraint but violates IDENT-IO: the input segment /d/, which is voiced, is realised as /t/ in the candidate form. The satisfaction and violation of the constraints can be summarised in a table:

Input: /hand/    | VOICED-CODA | IDENT-IO
☞ (a) /hant/     |             | *
  (b) /hand/     | *!          |

In this table, constraint violation is indicated with *. Constraint ranking is represented by column ordering: the constraint in a column is ranked higher than the constraints to its right. The table shows that both candidates violate a constraint. However, the constraint violated by the candidate /hant/ is “less important” in the hierarchy. Its rank in the set of candidates is therefore higher than that of the other form, which makes it the optimal candidate (indicated in the table with the sign ☞). More generally, the optimal candidate is the one that violates the smallest number of high-ranked constraints. The mechanism of constraint violation is of deep importance for representing variation, not only between languages but also within a language. It also makes it possible to explain different levels of grammaticality, as proposed by Keller (2000).
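The evaluation step can be sketched as a comparison of violation profiles. The Python code below is an illustrative simplification under stated assumptions: candidates are plain strings with one character per segment, the coda is approximated by the word-final segment, the two constraint functions are our own crude definitions, and ranking is interpreted as strict domination, that is, violation counts are compared constraint by constraint from the highest-ranked one down.

```python
VOICED_OBSTRUENTS = set("bdgvz")

def voiced_coda(inp, cand):
    """Markedness: one violation if the word-final segment is a voiced obstruent."""
    return 1 if cand[-1] in VOICED_OBSTRUENTS else 0

def ident_io(inp, cand):
    """Faithfulness: one violation per segment that differs from its input correspondent."""
    return sum(1 for i, o in zip(inp, cand) if i != o)

def evaluate(inp, candidates, ranking):
    """EVAL: pick the candidate with the best violation profile.

    Profiles are tuples ordered by constraint rank, so Python's tuple
    comparison implements strict domination.
    """
    def profile(cand):
        return tuple(constraint(inp, cand) for constraint in ranking)
    return min(candidates, key=profile)

GERMAN = [voiced_coda, ident_io]        # VOICED-CODA ranked above IDENT-IO
CANDIDATES = ["hand", "hant"]           # a hand-picked subset of GEN(/hand/)
```

With the German ranking, `evaluate("hand", CANDIDATES, GERMAN)` selects "hant". Reversing the ranking selects the faithful candidate "hand", which is how the sketch mimics the idea that languages differ only in constraint ordering.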
1.3.5 PG: constraints as syntactic structure
The reader can find in another chapter of this volume (chapter 3) a precise presentation of Property Grammars (hereafter PG). We briefly recall here its main characteristics before developing the main contribution of PG: a complete representation of syntactic information in terms of constraints. PG proposes to systematise the idea, introduced by GPSG, of representing explicitly the different kinds of syntactic information. GPSG was elaborated starting from an observation: rules in phrase-structure grammars encode both constituency information and linear order between the constituents. GPSG then proposed to split such information into two separate statements: immediate dominance and linear precedence. This is the ID/LP formalism. PG systematises this approach starting from the same observation: ID/LP rules still encode implicit information, such as the fact that some constituents cannot be repeated, that some must (or cannot) co-occur, and that some constituents are the minimal realisation of a phrase. PG then proposes to encode this heterogeneous information by means of different types of statements:

• Constituency: set of all the possible elements of a construction.
• Uniqueness: constituents that cannot be repeated within a construction.
• Linearity: linear order.
• Obligation: set of obligatory constituents, one of them (to the exclusion of the others) being realised.
• Requirement: obligatory co-occurrence between constituents within a construction.
• Exclusion: impossible co-occurrence between constituents within a construction.

These statements can all be considered as constraints, in the sense defined at the beginning of the chapter. The domain is the set of categories, and the statements stipulate constraints on the possible constituents of a phrase. More precisely, any syntactic category can be described by its set of possible constituents plus the set of constraints that stipulate restrictions on their realisation. In other words, a well-formed realisation of a phrase is a subset of its possible constituents satisfying the constraints describing it. The following example illustrates this representation and gives the properties describing the PP in Chinese (this information being acquired from the Chinese Treebank):

Constituency   {P, NP, LCP, IP, ADVP, PP, QP, DP, FLR, UCP, PU, CP, VV}
Linearity      P ≺ *
               ADVP ≺ PP
               FLR ≺ NP, LCP, IP
               PU ≺ NP, PP, LCP
               VV ≺ NP
Uniqueness     {NP, IP, QP, LCP, ADVP, DP}
Requirement    ADVP ⇒ PP
               VV ⇒ NP
Exclusion      NP ⊗ *{P, FLR, PU, VV}
               LCP ⊗ *{P, FLR}
               IP ⊗ *{PU, FLR}
               {QP, DP, UCP, CP} ⊗ *{P}
               FLR ⊗ *{P, NP, LCP, IP, PU}
               PU ⊗ *{P, NP, PP, IP, LCP}
               VV ⊗ *{NP}
In PG, no other information is needed to describe a syntactic structure. This is a major difference from other theories. We have seen for example that OT first needs to generate a set of structures (the candidates) to which constraints are applied as a filter. This requires a specific mechanism (GEN) which is in fact purely operational and does not bring any linguistic information in itself. The situation is similar in HPSG: this theory (because of the specific role of heads) still needs to build first a generic underspecified phrase-structure backbone (by means of ID-schemata) before applying constraints to such structures. In this case too, this amounts to a two-step mechanism that first generates a structure. In PG, no such external device is required: all information is encoded as constraints, which are applied not to structures but to sets of categories. The parsing process only consists in evaluating all the possible constraints of the grammar, given an input set of categories corresponding to the words of the sentence to parse. Parsing is then purely a matter of constraint satisfaction.

Two important consequences follow from this particularity. First, all constraints can be evaluated independently of the others: PG is non-holistic. This is a major difference from more classical or generative-based grammars, in which a structure has to be built first. In such a case, a specific rule cannot be applied alone: it is part of the derivation (or part of a more complex structure such as ID-schemata). By contrast, any constraint in PG can be evaluated without needing access to the rest of the grammar (as is necessary with derivation). Second, the parsing process consists in evaluating a set of constraints. As a result, these constraints and their evaluation (they can be satisfied or violated) form the syntactic description of the input. In other words, no specific structure is built: the set of evaluated constraints is the syntactic description, in the same way as a tree is the syntactic description of a sentence in phrase-structure grammars. Concretely, parsing a sentence consists in evaluating all the constraints that are relevant for the corresponding set of categories, and the state of the constraint system after evaluation is sufficient to describe the syntactic characteristics of the input.
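The idea that the evaluated constraints are themselves the syntactic description can be sketched as follows. This is a minimal illustration under our own assumptions and not an actual PG parser: the small NP property set is hypothetical (it is not the Chinese PP grammar given above), the input is a flat list of categories, and obligation is read as "exactly one of the listed categories is realised".

```python
def linearity(a, b):
    """Every a precedes every b."""
    return (f"linearity({a}, {b})",
            lambda cats: all(i < j
                             for i, x in enumerate(cats) if x == a
                             for j, y in enumerate(cats) if y == b))

def uniqueness(a):
    """a occurs at most once."""
    return (f"uniqueness({a})", lambda cats: cats.count(a) <= 1)

def obligation(*heads):
    """Exactly one of the obligatory categories is realised."""
    return (f"obligation({', '.join(heads)})",
            lambda cats: sum(c in heads for c in cats) == 1)

def requirement(a, b):
    """If a is realised, b must be realised too."""
    return (f"requirement({a}, {b})", lambda cats: a not in cats or b in cats)

def exclusion(a, b):
    """a and b cannot both be realised."""
    return (f"exclusion({a}, {b})", lambda cats: not (a in cats and b in cats))

# Hypothetical property set for a toy NP.
NP_PROPERTIES = [
    linearity("Det", "N"), linearity("N", "PP"),
    uniqueness("Det"), obligation("N", "Pro"),
    requirement("N", "Det"), exclusion("Pro", "Det"),
]

def describe(cats):
    """Evaluate each property independently; the result is the description."""
    return {label: check(cats) for label, check in NP_PROPERTIES}
```

For the input `["N", "Det", "Det"]`, `describe` reports two violated properties (the linearity of Det and N, and the uniqueness of Det) and four satisfied ones. No tree is built at any point, and each property is checked without reference to the others.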
1.4 Conclusion

The evolution of linguistic theories shows an increasing role of constraints in the representation of syntactic information. They have been used more and more intensively to filter syntactic structures, adding new kinds of information. A decisive step has been taken by recent theories such as OT and, more importantly, HPSG, for which constraint satisfaction plays a central role, to the point of being the sole process, as in PG. This shift is the same as the one observed in computer science: constraints were originally used as a complementary mechanism making it possible to control the search space. Then, constraint programming proposed to use constraint satisfaction as the sole processing mechanism. In the same way, linguistics started by including constraints in its representations, and now offers the possibility of using constraints as the sole representation.

This evolution in fact opens a new theoretical paradigm. Syntax was until now completely dependent on the generative view of grammar and language, in which a language is generated by a grammar, which is revealed (or implemented) by the derivation process. In other words, derivation was the core of the processing architecture, even if only implicitly in some cases. We have seen that constraints and satisfaction can completely replace derivation. We are then in a completely different relation between language and grammar, in which a language is modelled, not generated, by the grammar. This is an epistemological shift, opening the way to new architectures of language processing, with deep consequences from both computational and cognitive perspectives.
Bibliography

Blache, P. (1992). Using Active Constraints to Parse GPSG. In Proceedings of the 15th International Conference on Computational Linguistics (COLING’92), pages 81–86. Nantes, France.
Blache, P. (2000). Constraints, Linguistic Theories and Natural Language Processing. In D. Christodoulakis, editor, Natural Language Processing, volume 1835 of Lecture Notes in Artificial Intelligence (LNAI). Springer-Verlag.
Carpenter, B. (1992). The Logic of Typed Feature Structures. Cambridge University Press.
Colmerauer, A. (1975). Les grammaires de métamorphose. Technical report, Université d’Aix-Marseille (Groupe d’Intelligence Artificielle).
Colmerauer, A. (1986). Theoretical model of Prolog II. In M. van Caneghen and D. Wane, editors, Logic Programming and its Applications. Ablex Series in Artificial Intelligence.
Duchier, D. (2000). Constraint Programming for Natural Language Processing. Technical report, ESSLLI.
Duchier, D. and Debusmann, R. (2001). Topological Dependency Trees: A Constraint-Based Account of Linear Precedence. In Proceedings of 39th Annual Meeting of the Association for Computational Linguistics, pages 180–187, Toulouse, France. Association for Computational Linguistics.
Duchier, D. and Thater, S. (1999). Parsing with Tree Descriptions: a constraint-based approach. In Sixth International Workshop on Natural Language Understanding and Logic Programming (NLULP’99), pages 17–32, Las Cruces, New Mexico.
Gazdar, G., Klein, E., Pullum, G., and Sag, I. (1985). Generalized Phrase Structure Grammars. Blackwell.
Jaffar, J. and Lassez, J.-L. (1986). Constraint logic programming. Technical report, Department of Computer Science, Monash University, Victoria, Australia.
Johnson, M. (1988). Attribute Value Logic and the Theory of Grammar. CSLI Lecture Notes.
Johnson, M. (1994). Two ways of formalizing grammars. Linguistics and Philosophy, 17(3), 221–248.
Kay, M. (1984). Functional unification grammar: A formalism for machine translation. In Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics, pages 75–78, Stanford, California, USA. Association for Computational Linguistics.
Keller, F. (2000). Gradience in Grammar - Experimental and Computational Aspects of Degrees of Grammaticality. Ph.D. thesis, University of Edinburgh.
Koller, A. and Niehren, J. (2002). Constraint programming technology in computational linguistics. In D. Barker-Plummer, D. Beaver, J. van Benthem, and P. Scotto di Luzio, editors, Words, Proof and Dialog, pages 95–122. CSLI.
Pereira, F. and Warren, D. (1980). Definite clause grammars for language analysis - a survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence, 13, 231–278.
Petersen, W. and Kilbury, J. (2009). What feature co-occurrence restrictions have to do with type signatures. In J. Rogers, editor, Proceedings of the 10th conference on Formal Grammar, pages 123–137. CSLI.
Pollard, C. (1996). The nature of constraint-based grammar. In Pacific Asia Conference on Language, Information, and Computation (PACLIC), Kyung Hee University, Seoul, Korea.
Pollard, C. and Sag, I. (1994). Head-driven Phrase Structure Grammars. Center for the Study of Language and Information Publication (CSLI), Chicago University Press.
Prince, A. and Smolensky, P. (1993). Optimality Theory: Constraint Interaction in Generative Grammar. Technical report, TR-2, Rutgers University Cognitive Science Center, New Brunswick, NJ.
Pullum, G. (2007). The evolution of model-theoretic frameworks in linguistics. In J. Rogers and S. Kepser, editors, Proceedings of Model-Theoretic Syntax at 10 Workshop, pages 1–10, ESSLLI, Dublin, Ireland.
Pullum, G. and Scholz, B. (2001). On the Distinction Between Model-Theoretic and Generative-Enumerative Syntactic Frameworks. In P. de Groote, G. Morrill, and C. Rétoré, editors, Logical Aspects of Computational Linguistics: 4th International Conference, number 2099 in Lecture Notes in Artificial Intelligence, pages 17–43, Berlin. Springer Verlag.
Sag, I., Wasow, T., and Bender, E. (2003). Syntactic Theory. A Formal Introduction. CSLI.
Saraswat, V. A. and Rinard, M. C. (1990). Concurrent constraint programming. In Conference Record of the Seventeenth Annual ACM Symposium on Principles of Programming Languages (POPL), pages 232–245, San Francisco, California, USA.
Van Hentenryck, P. (1989). Constraint Satisfaction in Logic Programming. Logic Programming Series, The MIT Press, Cambridge, MA.
Chapter Two
Constraints and Logic Programming in Grammars and Language Analysis
Henning Christiansen
2.1 Introduction

Constraints are an important notion in grammars and language analysis, and constraint programming techniques have been developed concurrently for solving a variety of complex problems. In this chapter we consider the synthesis of these branches into practical and effective methods for language analysis. With a tool such as Constraint Handling Rules, CHR, to be explained below, the grammar writer or programmer working with language analysis can define his or her own constraint solvers specifically tailored for the linguistic problems at hand. We concentrate on grammars and language analysis methods that combine constraints with logic grammars such as Definite Clause Grammars and CHR Grammars, and also show a direct relationship to abductive reasoning.

Section 2.2 reviews background on different but related notions of constraints in grammars and programming, and gives a brief introduction to Constraint Handling Rules. The relation between abductive reasoning and constraint logic programming, most notably in the form of Prolog with CHR, is spelled out in section 2.3. Sections 2.4–2.5 show how this materialises into methods for language analysis together with Definite Clause Grammars and in the shape of CHR Grammars.
2.2 Background

We assume a basic knowledge of the logic programming language Prolog and its grammar notation Definite Clause Grammars, DCG. There are plenty of good standard books on logic programming and resources available on the internet; Christiansen (2010), for example, gives a brief introduction intended for linguistics students.
2.2.1 Different Notions of Constraints in Grammars and Logic Programming
The term constraints is used with many slightly different but overlapping meanings in computation and linguistics. Typically, constraints C appear together with a generative model G, where they serve as a filter to reduce the extension of G. For example, G may characterise a set of decorated syntax trees and C accepts only those trees whose decorations reflect correct inflection. In a Definite Clause Grammar, for example, the unification of attributes from different subtrees serves as constraints that limit the extension. (Typed) feature structures can be seen as extensions to the standard terms of Prolog and are considered more suited for modelling grammatical and semantic features of language; they require a separately programmed unification algorithm when used with, say, Prolog-based grammars. There is a flourishing tradition for many sorts of such unification-based grammars or constraint-based grammars (where constraints are typically realised through a sort of unification); we shall not go into more detail but refer to (Pereira and Shieber, 1987; Gazdar and Mellish, 1989; Shieber, 1992; Francez and Wintner, 2012) for more information. Formally, these grammars are special cases of Knuth’s Attribute Grammars (Knuth, 1968). However, attributes and the constraints imposed on them are informative in themselves and should not only be seen as devices to rule out syntactically wrong phrases or analyses thereof (or, strictly speaking, phrase-like structures). Syntactic features and semantic representations given as attributes may represent the desired results of an analysis.

Another notion of constraints comes from the tradition of constraint programming, in which a mathematical or logical problem concerns the assignment of values to variables that satisfies a number of constraints specified by a collection of predicates, where unification or its special case of syntactic equality are just examples of such predicates. For a general overview of this field, see (Apt, 2003), and for an introduction to its incarnation in logic programming, (Jaffar and Lassez, 1987). As a simple example, we may consider a constraint problem in the variables x and y in the domain of integer numbers, given as x ≤ y ∧ y ≤ x. This constraint problem has an infinite number of solutions, characterised by the reduced or solved constraint x = y. The device that performs this reduction is an algorithm called a constraint solver. In linguistic terms, we may consider a grammar for a language about mathematical entities, so that the analysis of two sentences produces the inequalities shown, one for each sentence, and we take x = y as the meaning of the discourse consisting of those two sentences: we show this below in example 2.2.1 using the combined notations of DCG and Constraint Handling Rules.

Whereas the aforementioned unification- or constraint-based grammars such as DCGs communicate meanings and features through the links and nodes of syntax trees, constraint logic programming allows communication through a global structure, typically called a constraint store. Thus an analysis may produce a syntax tree together with a constraint store in reduced form, through cooperation between a parsing algorithm and the constraint solver. If the constraint solver identifies constraint violations, it may force the parser to try another decomposition strategy, perhaps leading to a rejection of the entire phrase being analysed. While a constraint store can in principle be implemented as an attribute carried along the branches of a syntax tree, and thus provides no extension in a strict mathematical sense, the notion of a global store of information collected incrementally points in the direction of discourse analysis: the knowledge from one sentence is available for the analysis of the next one and so on, and the constraint solver serves to maintain the constraint store, which now takes the role of a knowledge base.
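The interplay between an incrementally growing store and a solver can be sketched outside Prolog as well. The following Python class is a hypothetical toy, not a rendering of any CHR implementation: it handles only the antisymmetry reduction discussed above, works on variable names rather than logical variables, and does not propagate a derived equality into other stored constraints.

```python
class Store:
    """A toy constraint store over variable names (for illustration only)."""

    def __init__(self):
        self.leq = set()      # pairs (x, y) meaning x <= y
        self.equal = set()    # frozensets {x, y} meaning x = y

    def tell(self, x, y):
        """Add the constraint x <= y, then let the solver reduce the store."""
        self.leq.add((x, y))
        self._solve()

    def _solve(self):
        # Antisymmetry: x <= y together with y <= x is replaced by x = y.
        for (x, y) in list(self.leq):
            if x != y and (y, x) in self.leq:
                self.leq -= {(x, y), (y, x)}
                self.equal.add(frozenset((x, y)))
```

Each call to `tell` plays the role of analysing one sentence of the discourse: after `tell("x", "y")` the store holds a single inequality, and after `tell("y", "x")` the solver has reduced the store to the single equality between x and y.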
2.2.2 Constraint Handling Rules, CHR

The programming language of Constraint Handling Rules (CHR) is an extension to Prolog that makes it possible to write specialised constraint solvers in a rule-based fashion. CHR consists of rewriting rules over constraint stores, and each time a new constraint is called, the given CHR program will accommodate it into the evolving constraint store (or perhaps produce a failure, leading to a shift of control in a driver process such as a parser). CHR is now available as an integrated part of several major Prolog systems. Here we give only a very brief introduction; a comprehensive account of CHR and its applications can be found in the book by Frühwirth (2009). CHR has three sorts of rules of the following forms.
    Simplification rules:   h1, ..., hn <=> Guard | b1, ..., bm
    Propagation rules:      h1, ..., hn ==> Guard | b1, ..., bm
    Simpagation rules:      h1, ..., hk \ hk+1, ..., hn <=> Guard | b1, ..., bm
The h’s are head constraints and the b’s body constraints, and Guard is a guard condition (typically testing values of variables found in the head). A rule can be applied when its head constraints are matched simultaneously by constraints in the store and the guard is satisfied. For a simplification rule, the matched constraints are removed and the suitably instantiated versions of the body constraints are added. The other rules execute in a similar way, except that for propagation, the head constraints stay in the store, and for simpagation, only those following the backslash are removed. Prolog calls inside the body are executed in the usual way. The indicated procedural semantics is consistent with a logical semantics based on a reading of simplifications as bi-implications and propagations as implications. A simpagation H1 \ H2 <=> G | B is logically considered equivalent to the simplification H1, H2 <=> G | H1, B, although it is executed in a different way.

Example 2.2.1. In section 2.2.1 we discussed informally a constraint solver for inequalities. In the following, we define a constraint solver with the described behaviour, using the constraint predicate leq(x,y) to represent x ≤ y.

    :- use_module(library(chr)).
    :- chr_constraint leq/2.
    leq(A,B), leq(B,A) <=> A=B.
Solving inequalities is a standard introductory example for CHR; see (Frühwirth, 2009) for the few additional rules necessary to handle all combinations of inequalities correctly. The CHR program can be tested from the Prolog system’s command line as follows. A collection of constraints is given as a query, and the interpreter returns the result.

    ?- leq(X,Y), leq(Y,X).
    X=Y ?
In case the execution of a rule body leads to failure, it means that the constraints in the query are considered to be inconsistent with respect to the knowledge embedded in the current program. This may happen if a unification fails, e.g., 1=2, or when the built-in predicate fail is called.
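The additional rules referred to in Example 2.2.1 are commonly presented along the following lines. This is a sketch of the version of the solver that usually appears in CHR tutorials, and not necessarily the exact formulation in (Frühwirth, 2009); the names before @ are rule labels only. It happens to contain at least one rule of each of the three kinds introduced above.

```prolog
:- use_module(library(chr)).
:- chr_constraint leq/2.

% Simplification: x =< x holds trivially, so the constraint is dropped.
reflexivity  @ leq(X,X) <=> true.
% Simplification: x =< y and y =< x are replaced by x = y.
antisymmetry @ leq(X,Y), leq(Y,X) <=> X = Y.
% Simpagation: a duplicate of a constraint already in the store is removed.
idempotence  @ leq(X,Y) \ leq(X,Y) <=> true.
% Propagation: both heads stay, and the implied constraint is added.
transitivity @ leq(X,Y), leq(Y,Z) ==> leq(X,Z).
```

With these rules, a cyclic query such as ?- leq(X,Y), leq(Y,Z), leq(Z,X). should unify all three variables and leave the constraint store empty.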
2.3 Abductive Reasoning in Logic Programming with Constraints
Abductive reasoning was formulated as a principle by C. S. Peirce (1839–1914) as a third fundamental form of reasoning to complement deduction and induction. Abduction has been studied in the context of logic programming, and it is also an important principle and a general metaphor for discourse analysis, so it is relevant to discuss this notion here in a little detail. There are different formulations of what abduction means, but here we stay with the simplest form, namely that abduction is the process of suggesting a suitable set of facts which serves as a hypothesis and which, when added to our current knowledge, can explain an unexpected observation. Unexpected means here: not explainable from our current knowledge base without additional hypotheses. Furthermore, the new hypothesis must not conflict with our current knowledge. In symbols, when our current knowledge base is called K and the observation O, we need to find a hypothesis or explanation H such that K ∪ H ⇒ O but not K ∪ H ⇒ false.

Discourse analysis can be understood as abduction, as best known from the seminal paper by Hobbs et al. (1993): a listener A wants to figure out the new (to A) knowledge embedded in a discourse produced by a speaker B. For A, the discourse is an observation O, and he concludes, based on his current knowledge K: “Aha, now I understand, B wants to convey the message or knowledge H – this makes sense and can explain why he is telling this story O.” This fits the general pattern of abduction shown above. One of the advantages of viewing discourse understanding as abduction – as opposed to a compositional assembly process from the bits of meanings embedded in each word – is that presupposed knowledge can be extracted: the utterance of the sentence “Now her husband is drunk again”, if assumed to be true, can only be explained if 1) the husband is drunk, and 2) this is a regularly recurring event.
A more detailed analysis might also add (if it was not known), that 3) the husband suffers from alcoholism, and 4) has access to alcohol. If, furthermore, the general setting is new to the listener, he may conclude further “hmmm, there are a female and a male character involved, and they happen to be married”. To understand abduction in logic programming, we may consider a situation where a person has formalised a knowledge base in terms of a logic program, call it P, and makes an observation O. He wants to check if O is really the case according to P by asking the query ?- O, and he receives
Prolog’s laconic answer no. However, it may be the case that, if the program is extended with a set of additional facts, which we call H to conform with the pattern above, O will succeed in the extended program P ∪ H; thus H may be a proposal for an abductive explanation. However, this argument may not be sufficient, as knowledge about the general setting puts some restrictions on which combinations of facts can co-exist. For example, if the problem is given by a detective story, i.e., O is an observed crime, a hypothesis such as did_it(the_butler) may be rejected if this implies was_at(misty_london, the_butler) and it is a known fact that was_at(sunny_brighton, the_butler). This conflicts with the world knowledge that a person cannot be in two different places at the same time. Thus, to do abduction in logic programming, it may be suggested to add a component of so-called integrity constraints IC to put restrictions on which sets of facts are acceptable as explanations. To fit the general characterisation of abduction above, we may set K = P ∪ IC.

An abductive logic program can be defined as a triplet ⟨P, IC, A⟩ where A is a collection of predicates called abducibles, from which explanations can be composed, and P and IC are as above. Abducible predicates are usually assumed not to occur in the head of any clause of P. Console et al. (1991) formalised in an elegant way how the execution of abductive logic programs can be understood as an extension to Prolog’s standard goal-directed recursive descent style: instead of failing when it runs into an abducible atom not mentioned in P, it simply adds it to the growing explanation. A consistency check with respect to IC needs to be incorporated, ideally in an incremental way in order to reduce the search space.
When Prolog is combined with CHR, this is exactly what CHR is doing: when a constraint is encountered during the execution of a query, it is added to the constraint store, and the constraint solver – i.e., the current set of rules – will apply, thus serving as an incremental test of consistency. This relationship between abduction and CHR was discovered by Abdennadher and Christiansen (2000), and its combination with Prolog, as just described, by Christiansen and Dahl (2005a). The idea is elucidated by the following example.

Example 2.3.1 (Christiansen (2009)). The following program consists of two Prolog rules describing different ways for someone to be happy, together with a CHR component that defines a constraint solver for three predicates which may be understood as abducible.
    happy(X):- rich(X).
    happy(X):- professor(X), has(X,nice_students).

    :- use_module(library(chr)).
    :- chr_constraint rich/1, professor/1, has/2.
    professor(X), rich(X) ==> fail.
The single CHR rule formalises the real world experience that the salary paid to professors does not make them rich. The following query asks how a certain professor may become happy; it is shown together with the answer produced by the combined Prolog and CHR interpreter.

    ?- happy(henning), professor(henning).
    professor(henning), has(henning,nice_students) ? ;
    no
The presence of the additional information in the query, that the person is a professor, triggers the CHR rule when the hypothesis rich(henning) is suggested by the Prolog program, thus rejecting this alternative. Above, the semicolon is typed by the user with the meaning of asking for alternative answers, and the system’s no assures there are no more solutions than the one reported. As it appears, the implementation of abductive logic programming with CHR requires no program transformation or additional interpretation overhead. The approach may be summarised as translation of terminology from one computational domain to another.

    Abductive logic programming    Constraint logic programming with CHR
    ---------------------------    ----------------------------------------
    Abductive logic programs       Prolog programs with a little bit of CHR
    Abducible predicate            Constraint predicate
    Integrity constraints          CHR Rules
    Program rules                  Program rules
    Explanation                    Final constraint store
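The detective story used earlier in this section can be recast in the same style. The sketch below is an illustration only and is not taken from the cited papers: it treats did_it/1 and was_at/2 as abducibles, and the two CHR rules are one assumed way of writing down the entailment and the integrity constraint described there.

```prolog
:- use_module(library(chr)).
:- chr_constraint did_it/1, was_at/2.

% Assumed piece of world knowledge: whoever did it was in London.
did_it(X) ==> was_at(misty_london, X).

% Integrity constraint: a person is in one place only. Two was_at
% constraints for the same person must agree on the place; if the
% places are different atoms, the unification fails.
was_at(P1, X) \ was_at(P2, X) <=> P1 = P2.
```

Under these assumptions, the query ?- was_at(sunny_brighton, the_butler), did_it(the_butler). should fail, since the hypothesis forces the butler to be in London as well. The same query with a different, hypothetical suspect, e.g. did_it(the_gardener), should succeed and leave the hypothesis and its consequence in the final constraint store.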
The precise details for this equivalence are spelled out in (Christiansen, 2009). Most other approaches to abductive logic programming are based on meta-interpreters written in Prolog; see (Denecker and Kakas, 2002) for an overview and a later approach by Mancarella et al. (2009). Several of these can handle negation, which is not possible in our CHR-based approach described above, but our method is favourable in terms of efficiency and the flexibility that comes from using an existing, fully instrumented programming system. The similarity between constraint programming and abduction was also observed in an early paper by Maim (1992) before the appearance of CHR.
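For contrast, the meta-interpreter style mentioned above can be sketched in a few lines of plain Prolog. The following is a deliberately naive illustration for ground goals, not the procedure of any of the cited systems; the predicates rule/2, abducible/1, violates/1 and solve/3 are names chosen here, and the object program is Example 2.3.1 re-encoded as facts.

```prolog
:- use_module(library(lists)).

% Object program P: rule(Head, Body), with the body as a list of goals.
rule(happy(X), [rich(X)]).
rule(happy(X), [professor(X), has(X, nice_students)]).

% Abducible predicates A.
abducible(rich(_)).
abducible(professor(_)).
abducible(has(_, _)).

% Integrity constraints IC, written as a test for a violation in a
% candidate explanation H (a list of ground atoms).
violates(H) :- member(professor(X), H), member(rich(X), H).

% solve(+Goals, +H0, -H): prove Goals, extending explanation H0 to H.
solve([], H, H).
solve([G|Gs], H0, H) :-
    abducible(G), !,
    ( memberchk(G, H0) -> H1 = H0 ; H1 = [G|H0] ),
    \+ violates(H1),                 % consistency check after each step
    solve(Gs, H1, H).
solve([G|Gs], H0, H) :-
    rule(G, Body),
    append(Body, Gs, Goals),
    solve(Goals, H0, H).
```

The query ?- solve([happy(henning), professor(henning)], [], H). is expected to yield the single explanation containing professor(henning) and has(henning,nice_students). Note that the consistency test is re-run over the whole explanation at every step, which is the kind of interpretation overhead the CHR-based formulation does without.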
2.4 Using CHR with Definite Clause Grammars for Discourse Analysis
Definite Clause Grammars, DCG, play well together with CHR for discourse analysis, as discussed very briefly in section 2.2.1. While, technically speaking, this way of using CHR to gradually collect the knowledge contained in a discourse is an application of abductive reasoning, its use with DCG is fairly easy to understand also for students without any knowledge about abduction. This approach to discourse analysis is demonstrated by a simple example adapted from (Christiansen, 2014b).

Example 2.4.1. We consider a constraint solver to be used for the analysis of stories about the students at a small university at some fixed moment of time. The university has a number of rooms and other places where students are allowed to go. A selection of those are two lecture halls, encoded in the program below as lectHall1 and lectHall2, a reading room rRoom, a student bar bar, a garden garden, etc. There are currently two courses going on, programming in lectHall1 and linguistics in lectHall2. The following constraint predicates are introduced: in(s,r) indicating that student s is in room r, attends(s,c) that student s attends course c, can_see(s1,s2) that student s1 can see student s2, and finally reading(s) indicating that student s is reading. A student can only be in one room at a time, and reading can take place in any room but the lecture halls. A constraint solver for this can be expressed in CHR as follows; the constraint diff(x,y) is a standard device indicating that x and y must be different (easily defined in CHR; left out for reasons of space).

    :- chr_constraint attends/2, in/2, reading/1.
    attends(St, programming) ==> in(St, lectHall1).
    attends(St, linguistics) ==> in(St, lectHall2).
    in(St, R1) \ in(St, R2) <=> R1=R2.
    reading(St) ==> in(St, R), diff(R, lectHall1), diff(R, lectHall2).
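The constraint diff/2 is left out of the example. One possible definition, given here only as a sketch and not as the one used by the author, decides the two cases that can be decided (identical arguments, and two distinct ground terms) and otherwise leaves the constraint in the store until its variables become bound.

```prolog
:- use_module(library(chr)).
:- chr_constraint diff/2.

% Identical arguments can never be different: the store is inconsistent.
diff(X, X) <=> fail.
% Two distinct ground terms are trivially different: nothing left to check.
diff(X, Y) <=> ground(X), ground(Y), X \== Y | true.
% Keep at most one copy of each pending constraint.
diff(X, Y) \ diff(X, Y) <=> true.
```

A pending constraint such as diff(R, lectHall1) is re-examined when R is later bound, so binding R to lectHall1 should then fail, while binding it to any other room should simply discharge the constraint.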
A rule body may provide alternative suggestions for explaining an observation, so for example for student x to see student y, they must be in the same room or they may see each other through a video call using Skype. Two additional constraints and a rule are introduced to capture this as follows; the semicolon stands for Prolog’s disjunction (which is implemented by backtracking).

    :- chr_constraint can_see/2, skypes/2.
    can_see(St1,St2) ==>
        in(St1,R), in(St2,R)
        ;
        skypes(St1,St2), in(St1,R1), in(St2,R2), diff(R1,R2).
The CHR declarations shown so far define a constraint solver that can be used together with any parsing algorithm in order to collect knowledge from a discourse. Here we show a DCG in which the constraints are called directly from within the grammar rules.

    story --> [] ; s, ['.'], story.
    s --> np(St1), [sees], np(St2), {can_see(St1,St2)}.
    s --> np(St), [is,at], np(C), {attends(St,C)}.
    s --> np(St), [is,reading], {reading(St)}.
    np(peter) --> [peter].
    np(mary) --> [mary].
    np(jane) --> [jane].
    np(programming) --> [the,programming,course].
    np(linguistics) --> [the,linguistics,course].
Consider now a query phrase(story,[peter,...]) for the analysis of the text “Peter sees Mary. Peter sees Jane. Peter is at the programming course. Mary is at the programming course. Jane is reading.” It yields the following answer, i.e., the final constraint store when the text has been traversed by the grammar.

    attends(mary,programming)     can_see(peter,jane)
    attends(peter,programming)    can_see(peter,mary)
    in(jane,X)                    reading(jane)
    in(mary,lectHall1)            skypes(peter,jane)
    in(peter,lectHall1)           diff(X,lectHall1)
                                  diff(lectHall2,X)
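The token list abbreviated in the query above is not spelled out in the text. One plausible encoding, assuming lower-case atoms for the words and the atom '.' for each full stop as the rule for story requires, is sketched below; example_text/1 and analyse_example/0 are helper names introduced here, and the grammar and solver of Example 2.4.1 (including some definition of diff/2) are assumed to be loaded.

```prolog
% Hypothetical helper: the five-sentence text as a list of tokens.
example_text([peter,sees,mary,'.',
              peter,sees,jane,'.',
              peter,is,at,the,programming,course,'.',
              mary,is,at,the,programming,course,'.',
              jane,is,reading,'.']).

% Run the grammar over the text; the constraints called from the
% grammar rules accumulate in the constraint store as a side effect.
analyse_example :-
    example_text(Tokens),
    phrase(story, Tokens).
```

Calling ?- analyse_example. at the top level should then report a final constraint store along the lines of the one shown above; the exact form of the diff constraints depends on how diff/2 is defined.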
Referring to the discussion of discourse interpretation as a special case of abduction, section 2.3 above, we may say that this constraint store is a knowledge base – or explanation – necessary for the discourse to be correctly produced. The variable written as “X” stands for Jane’s location which is not determined from the discourse; it is only known not to be one of the lecture halls. The example above displays an incremental analysis of the text, in which the knowledge learned up to a certain point is available for the analysis
of the next sentence. Furthermore, the use of constraint techniques delays choices that cannot be resolved currently, but may be resolved later when new knowledge is introduced. As an example of this methodology, we refer to Christiansen et al. (2007b), who applied DCG and CHR in a similar way to analyse use case text (use cases as applied in software development) for producing UML class diagrams. This involved an approach to pronoun resolution expressed as CHR rules based on which persons have been mentioned so far and at which distance from the pronoun under consideration. That paper also uses CHR to generalise properties mentioned for a specific person into properties for the class to which this person belongs.

This combination of CHR with Prolog was initially presented as a language called HYPROLOG (Christiansen and Dahl, 2005a) with special tools for the declaration of abducible predicates, which, in addition to the CHR constraint declarations shown above, also generate facilities for a weak form of negation of abducible predicates plus some other utilities. It also includes a notion of assumptions (Dahl et al., 1997) that are very much like abducibles, but are explicitly produced and possibly consumed; declarations of assumptions are also compiled into CHR.
2.5 CHR Grammars

A DCG as demonstrated above together with CHR executes as a top-down parser that uses backtracking when examining different alternative parses. CHR itself can be used in a straightforward way for bottom-up parsing which, as is well known, is more robust to errors and less restrictive on the context-free backbone of the grammars that can be used (e.g. Aho et al., 1988). Christiansen (2005) has introduced a grammar notation, CHR Grammars, that is compiled into CHR analogously to the way that DCGs are compiled into Prolog. CHR Grammars feature different kinds of context-dependent rules having an expressive power that goes far beyond DCGs and other logic programming-based grammar systems. The following example demonstrates how CHR can be used for bottom-up parsing. Each nonterminal in the grammar is represented as a constraint predicate with two additional attributes to indicate the location of each occurrence in the entire string to be analysed.

Example 2.5.1. We consider a small language of which “Peter likes Mary” is a typical sentence. The language has nonterminal symbols np, verb and sentence. The string shown can be encoded by the following set of CHR constraints.

    token(0,1,peter), token(1,2,likes), token(2,3,mary)
Each single token or phrase recognised carries indices corresponding to the point immediately before, respectively after, the token or phrase in the input string. The lexical part of the grammar that classifies each single token is given by the following rules.

    token(N0,N1,peter) ==> np(N0,N1).
    token(N0,N1,mary) ==> np(N0,N1).
    token(N0,N1,likes) ==> verb(N0,N1).
The rule to recognise a sentence is as follows.

    np(N0,N1), verb(N1,N2), np(N2,N3) ==> sentence(N0,N3).
In order to analyse a text, a set of token constraints as shown above is entered as a query, and the rules will apply as many times as possible, leaving, for this example, the following final constraint store, which describes the recognised phrases, including all subphrases; we do not repeat the token constraints, but they will also be present.

    np(0,1)    verb(1,2)
    np(2,3)    sentence(0,3)
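To run Example 2.5.1, the constraints also need to be declared, which the example leaves implicit. The sketch below adds such declarations and, as one possible variation that is an assumption of this illustration rather than part of the example, gives each nonterminal a third attribute that holds a syntax tree for the recognised phrase.

```prolog
:- use_module(library(chr)).
:- chr_constraint token/3, np/3, verb/3, sentence/3.

% Lexical rules: each token is classified and given a one-node tree.
token(N0,N1,peter) ==> np(N0,N1,np(peter)).
token(N0,N1,mary)  ==> np(N0,N1,np(mary)).
token(N0,N1,likes) ==> verb(N0,N1,verb(likes)).

% Sentence rule: the subtrees of adjacent phrases are combined.
np(N0,N1,S), verb(N1,N2,V), np(N2,N3,O) ==> sentence(N0,N3,s(S,V,O)).
```

For the query ?- token(0,1,peter), token(1,2,likes), token(2,3,mary). the final store should then contain sentence(0,3,s(np(peter),verb(likes),np(mary))) alongside the token, np and verb constraints.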
The example may be varied using simplification rules instead, leading to the removal of intermediate nonterminals. However, when propagation rules are used in case of an ambiguous grammar, all the different possible parses will automatically be produced. Notice also that it is straightforward to add additional attributes to each constraint (=nonterminal) symbol to hold other interesting syntactic, semantic or other properties associated with a phrase. The use of additional CHR constraints and rules for abductive interpretation, as demonstrated with DCGs in section 2.4 above, can be applied here by calling constraints corresponding to abducibles in the right-hand sides of the rules.

CHR Grammars support a wide range of grammatical patterns that can be translated into conditions on the position indices. This includes gaps between subphrases or requirements that certain subphrases must be present immediately before or after, or at an arbitrary distance from, the symbols being matched. For example, the condition that a nonterminal a followed immediately by a nonterminal b can be reduced into an ab if followed by a c at a distance between zero and ten positions can be expressed in CHR in the following way.
a(N0,N1), b(N1,N2), c(N3,_) ==> N3 >= N2, N3 =< N2+10 | ab(N0,N2).
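To try this rule in isolation, the nonterminals can be declared as constraints and a few positions entered by hand. The declaration line and the sample positions mentioned afterwards are assumptions made for illustration.

```prolog
:- use_module(library(chr)).
:- chr_constraint a/2, b/2, c/2, ab/2.

% The rule from the text: a immediately followed by b, and a c that
% starts between 0 and 10 positions after the end of b.
a(N0,N1), b(N1,N2), c(N3,_) ==> N3 >= N2, N3 =< N2+10 | ab(N0,N2).
```

The query ?- a(0,1), b(1,2), c(5,6). should add ab(0,2) to the store, whereas with c(20,21) in place of c(5,6) the guard does not hold and no ab is produced. The guard uses arithmetic comparison, so the indices must be bound to numbers when the rule is tried.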
However, as these indices are tiresome to write and easy to get wrong, CHR Grammars offer a high-level notation, and the CHR Grammar compiler translates this into the right constraints, index variables and guards to form CHR rules such as the one just shown. The CHR rule shown above can be written as the following CHR Grammar rule.

    a, b /- 0...10, c ::> ab.
The symbol written as three dots is a pseudo-nonterminal that can be given with or without limits for the length of the substring that it spans. The material to the right of the “/-” symbol indicates what is called a right context, and there is a similar marker “-\” for indicating left context. Such gaps are highly relevant for biological sequence analysis, e.g., for gene finding and protein structure prediction. Another of CHR Grammars’ features is parallel matching, indicated by an operator “$$”, so the following rule will recognise an a phrase as a special_a if it has a length between 10 and 20 and a b has been recognised inside the substring spanned by a.

    a $$ 10...20 $$ ...,b,... ::> special_a.
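Read as conditions on position indices, the parallel-matching rule above corresponds roughly to the plain CHR rule below. This is a hand-written reading of the informal description, not the output of the CHR Grammar compiler, whose actual translation may differ; the variable names M0 and M1 for the span of b are chosen here.

```prolog
:- use_module(library(chr)).
:- chr_constraint a/2, b/2, special_a/2.

% a spans N0..N1 with a length between 10 and 20, and some b, spanning
% M0..M1, lies inside that substring.
a(N0,N1), b(M0,M1) ==>
    N1-N0 >= 10, N1-N0 =< 20,
    N0 =< M0, M1 =< N1
    | special_a(N0,N1).
```

Since this is a propagation rule, an a that contains several b phrases would receive one special_a per b in this naive reading.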
A detailed description of all options can be found in (Christiansen, 2005) and at the CHR Grammar website (Christiansen, 2002), which has a comprehensive users’ guide, several example grammars and source code that runs under the SWI and SICStus Prolog systems. The CHR Grammar system includes the same utilities for abduction and assumptions as the HYPROLOG system explained at the end of section 2.4 above. CHR Grammars have been used for biological sequence analysis by, among others, Bavarian and Dahl (2006), and for modelling various phenomena in natural language, e.g., (Aguilar-Solis and Dahl, 2004; Dahl, 2004). CHR without the CHR Grammar notation has been used for variants of the bottom-up parsing method for analysis of hieroglyph inscriptions (Hecksher et al., 2002) and for analysis of time expressions in pre-tagged biographic texts (van de Camp and Christiansen, 2012). The latter used multiple indices for each token and recognised unit to indicate the specific document, paragraph and sentence in which it occurs and its placement within that sentence. The Chinese Word Segmentation Problem was approached with CHR Grammars by Christiansen and Li (2011); see also chapter 11 by these authors in the present volume.
2.6 Conclusion

The use of constraints as devices in grammars and as programming tools has been discussed with an emphasis on their amalgamation into practical tools for language analysis. The programming language of Constraint Handling Rules, CHR, was demonstrated as a tool for representation and evaluation of semantic and other information being extracted from a text under analysis. It has been argued elsewhere by Christiansen and Dahl (2005b) that this approach may provide an integration of the semantic and pragmatic aspects of language analysis. The relation to abductive reasoning was shown, in that a straightforward use of constraints together with Definite Clause Grammars and CHR Grammars is an instance of abduction. We have shown only fairly simple examples, but it should be emphasised that constraint solvers that model very complex semantic domains can be written in CHR; see (Frühwirth, 2009) and the growing literature on applications of CHR. As an example of this, we may mention a solver written in CHR for handling different sorts of calendric expressions, including resolving relative time expressions from partial information (Christiansen, 2014a).
Bibliography

Abdennadher, S. and Christiansen, H. (2000). An experimental CLP platform for integrity constraints and abduction. In Proceedings of FQAS2000, Flexible Query Answering Systems: Advances in Soft Computing series, pages 141–152. Physica-Verlag (Springer).

Aguilar-Solis, D. and Dahl, V. (2004). Coordination revisited - a constraint handling rule approach. In C. Lemaître, C. A. R. García, and J. A. González, editors, IBERAMIA, volume 3315 of Lecture Notes in Computer Science, pages 315–324. Springer.

Aho, A. V., Sethi, R., and Ullman, J. D. (1988). Compilers: Principles, Techniques and Tools. Addison-Wesley.

Apt, K. (2003). Principles of Constraint Programming. Cambridge University Press.

Bavarian, M. and Dahl, V. (2006). Constraint based methods for biological sequence analysis. Journal of Universal Computer Science, 12(11), 1500–1520.

Christiansen, H. (2002). CHR Grammar web site; released 2002. http://www.ruc.dk/~henning/chrg.

Christiansen, H. (2005). CHR Grammars. Theory and Practice of Logic Programming, 5(4-5), 467–501.

Christiansen, H. (2009). Executable specifications for hypothesis-based reasoning with Prolog and Constraint Handling Rules. J. Applied Logic, 7(3), 341–362.

Christiansen, H. (2010). Logic programming for linguistics: a short introduction to Prolog, and logic grammars with constraints as an easy way to syntax and semantics. TRIANGLE, 1, 31–64.

Christiansen, H. (2014a). Constraint logic programming for resolution of relative time expressions. In A. Beckmann, E. Csuhaj-Varjú, and K. Meer, editors, Computability in Europe 2014, Lecture Notes in Computer Science. Springer. To appear.

Christiansen, H. (2014b). Constraint programming for context comprehension. In P. Brézillon and A. Gonzalez, editors, Context in Computing. To appear.

Christiansen, H. and Dahl, V. (2005a). HYPROLOG: A new logic programming language with assumptions and abduction. In M. Gabbrielli and G. Gupta, editors, ICLP, volume 3668 of Lecture Notes in Computer Science, pages 159–173. Springer.

Christiansen, H. and Dahl, V. (2005b). Meaning in Context. In A. Dey, B. Kokinov, D. Leake, and R. Turner, editors, Proceedings of Fifth International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT-05), volume 3554 of Lecture Notes in Artificial Intelligence, pages 97–111.

Christiansen, H. and Li, B. (2011). Approaching the Chinese word segmentation problem with CHR grammars. In CSLP 2011: Proc. 4th Intl. Workshop on Constraints and Language Processing, volume 134 of Roskilde University Computer Science Research Report, pages 21–31.

Christiansen, H., Have, C. T., and Tveitane, K. (2007). Reasoning about use cases using logic grammars and constraints. In CSLP ’07: Proc. 4th Intl. Workshop on Constraints and Language Processing, volume 113 of Roskilde University Computer Science Research Report, pages 40–52.

Console, L., Dupré, D. T., and Torasso, P. (1991). On the relationship between abduction and deduction. Journal of Logic and Computation, 1(5), 661–690.

Dahl, V. (2004). An Abductive Treatment of Long Distance Dependencies in CHR. In H. Christiansen, P. R. Skadhauge, and J. Villadsen, editors, Proceedings of the First International Workshop on Constraints Solving and Language Processing, volume 3438 of Lecture Notes in Computer Science, pages 17–31. Springer.

Dahl, V., Tarau, P., and Li, R. (1997). Assumption grammars for processing natural language. In Proceedings of the Fourteenth International Conference on Logic Programming (ICLP), pages 256–270, Leuven, Belgium.

Denecker, M. and Kakas, A. C. (2002). Abduction in logic programming. In A. C. Kakas and F. Sadri, editors, Computational Logic: Logic Programming and Beyond, volume 2407 of Lecture Notes in Computer Science, pages 402–436. Springer.

Francez, N. and Wintner, S. (2012). Unification grammars. Cambridge University Press, New York, NY.

Frühwirth, T. (2009). Constraint Handling Rules. Cambridge University Press.

Gazdar, G. and Mellish, C. (1989). Natural Language Processing in Prolog: An Introduction to Computational Linguistics. Addison-Wesley Publishing Co., Reading, Massachusetts.

Hecksher, T., Nielsen, S. T., and Pigeon, A. (2002). A CHRG model of the ancient Egyptian grammar. Unpublished student project report, Roskilde University, Denmark.

Hobbs, J. R., Stickel, M. E., Appelt, D. E., and Martin, P. A. (1993). Interpretation as abduction. Artificial Intelligence, 63(1-2), 69–142.

Jaffar, J. and Lassez, J.-L. (1987). Constraint logic programming. In POPL, Conference Record of the Fourteenth Annual ACM Symposium on Principles of Programming Languages, Munich, Germany, January 21-23, 1987, pages 111–119.

Knuth, D. E. (1968). Semantics of context-free languages. Mathematical Systems Theory, 2(2), 127–145.

Maim, E. (1992). Abduction and constraint logic programming. In Proceedings of the 10th European Conference on Artificial Intelligence (ECAI), pages 149–153, Vienna, Austria.

Mancarella, P., Terreni, G., Sadri, F., Toni, F., and Endriss, U. (2009). The CIFF proof procedure for abductive logic programming with constraints: Theory, implementation and experiments. Theory and Practice of Logic Programming, 9(6), 691–750.

Pereira, F. C. N. and Shieber, S. M. (1987). Prolog and Natural-Language Analysis, volume 10 of CSLI Lecture Notes Series. Center for the Study of Language and Information.

Shieber, S. M. (1992). Constraint-Based Grammar Formalisms. MIT Press.

van de Camp, M. and Christiansen, H. (2012). Resolving relative time expressions in Dutch text with Constraint Handling Rules. In D. Duchier and Y. Parmentier, editors, CSLP, volume 8114 of Lecture Notes in Computer Science, pages 166–177. Springer.
Chapter Three

Model-theoretic Syntax: Property Grammars, Status and Directions

Philippe Blache, Jean-Philippe Prost
3.1 Introduction

The question of the logical modelling of natural language is concerned with providing a formal framework which enables representing and reasoning about utterances in natural language. The body of work in this area is organised around two different hypotheses, which yield significantly different notions of what the object of study is. Each of those two hypotheses is based on a different side of Logic: the proof-theoretic hypothesis and the model-theoretic one. The proof-theoretic hypothesis, on the one hand, considers that natural language can be modelled as a formal language. It sets the syntax of the observed natural language at the syntax level of the modelling formal language. All the works based on Generative Grammar rely on this hypothesis. The model-theoretic hypothesis, on the other hand, considers that natural language, along all its dimensions including Syntax, must be modelled through the semantic level of Logic. Underlying these are two fairly different notions of what natural language is (and is not), and of what should (or should not) be modelled.

We introduce and compare the main characteristics of both the Proof-Theoretic and the Model-Theoretic paradigms. We argue that representing the linguistic description of an utterance solely through a hierarchical syntactic structure is severely restrictive. We show evidence of these restrictions through the study of specific problems. We then argue that a Model-Theoretic representation of Syntax (MTS) does not show those restrictions, and provides a more informative linguistic description than Proof-Theoretic Syntax. We give an overview of a specific framework for MTS, called Property Grammar (PG), which we use to illustrate our point. We show, in particular, how to rely on a graph as a linguistic representation, in order to address various language problems.
3.2 Model Theory for Modelling Natural Language

The Proof-Theoretic and the Model-Theoretic approaches to modelling Natural Language differ in scope of modelling, i.e. with regard to the observations being modelled. They also differ with regard to the nature of the linguistic knowledge being captured and represented.

Proof-Theoretic Syntax

With the proof-theoretic approach, natural language is defined as the set L(G)¹ of all the strings licensed by the grammar G. What is meant here by licensed is proven: the strings in L(G) are those, and only those, which can be proven by a set of production rules from G. The proof itself captures the whole linguistic knowledge about any string s in L(G), in the form of a tree representation: the parse tree, or syntactic structure. Thus, a parse tree for the sentence s is merely a graphic (in the sense of Graph Theory) representation of the proof that s ∈ L(G). This isomorphism between linguistic structure and proof of membership has strong consequences on the modelling scope, since anything that cannot be proven cannot be represented either. Therefore, and assuming that a grammar G is available which captures all the observed canonical linguistic phenomena of human language, the set of all the objects being modelled under the proof-theoretic hypothesis is limited to the set of the grammatical strings, and only these. All ungrammatical strings are simply out of scope, and no knowledge can be represented about them. This notion is very restrictive, for it does not account for the extreme variability of language usages, including non-canonical or even ill-formed productions.

¹ More formally, the language L(G) is usually defined as the n-tuple ⟨L, C, S, G⟩, where L is a lexicon (terminal vocabulary), C is a set of morpho-syntactic categories (non-terminal vocabulary), S ∈ C is a start symbol (for labelling the tree root), and G (the grammar) is a set of production rules.
Furthermore, the linguistic description being represented is heavily driven by Syntax, and rarely, if ever, accounts for other linguistic dimensions. Even when it does, the information on syntax is required in order for information on other dimensions to be represented. Recent works, on the contrary, propose to consider all the different sources of information as interacting at the same level (see, in particular, works around Construction Grammar, e.g. (Goldberg, 2003; Sag, 2012)).

Model-Theoretic Syntax. The model-theoretic notion of language, on the other hand, does not show those limitations in scope. Here, no assumption is made as to which utterances ought to be described (and hence covered by the model) and which ones ought not to be. All observed utterances get a linguistic description of their structure, irrespective of their well-formedness. The description takes the form of a set of grammar statements, each of them being either verified or not by the structure. All those grammar statements together, instantiated for a given utterance, constitute a constraint network. The grammaticality of an utterance is then defined with respect to this description: the utterance is grammatical if and only if all the statements are verified. Descriptions which include failing statements may still exist; they are simply not deemed grammatical. Hence, scope-wise, any observed utterance may get a linguistic description.

Another advantage of a model-theoretic representation of grammar is that the knowledge at stake can be interpreted in different ways. For instance, in this chapter, we present two different (though not opposed) perspectives on what the representation of a linguistic structure should be like: the constructive perspective, and the descriptive perspective.
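The difference in scope can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any framework discussed in this chapter: the two-rule toy grammar, the three statements, and the names `licensed`, `describe` and `grammatical` are all hypothetical. The proof-theoretic side only answers whether a category sequence is licensed; the model-theoretic side returns a description for any sequence, grammatical or not.

```python
# Toy proof-theoretic side: a hypothetical grammar with two NP rules.
RULES = {"NP": [("Det", "N"), ("Det", "Adj", "N")]}


def licensed(cats):
    """Boolean membership: is the category sequence derivable as an NP?"""
    return tuple(cats) in RULES["NP"]


def positions(cats, label):
    """Indices at which a category label occurs in the sequence."""
    return [i for i, c in enumerate(cats) if c == label]


# Toy model-theoretic side: independent statements (hypothetical),
# each either verified or not by a given sequence.
STATEMENTS = {
    "Det is unique": lambda cats: cats.count("Det") <= 1,
    "N is realised once": lambda cats: cats.count("N") == 1,
    "Det precedes N": lambda cats: all(
        i < j for i in positions(cats, "Det") for j in positions(cats, "N")
    ),
}


def describe(cats):
    """Constraint network for the sequence: statement name -> verified?"""
    return {name: check(list(cats)) for name, check in STATEMENTS.items()}


def grammatical(cats):
    """Grammatical iff every statement in the description is verified."""
    return all(describe(cats).values())


if __name__ == "__main__":
    for seq in (["Det", "N"], ["N", "Det"]):
        print(seq, licensed(seq), describe(seq))
```

For the ill-formed sequence `["N", "Det"]`, `licensed` can only say no, whereas `describe` still reports which statements hold and which one fails.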
The Constructive Perspective on MTS. The constructive perspective considers that the constraint network instantiated for an utterance is self-sufficient for describing the linguistic knowledge about this utterance. The network replaces the conventional linguistic phrase and dependency structures. Yet, if required, those structures may be recovered by induction from the network. This perspective also makes use of the MT grammar for the parsing process, which therefore relies solely on the constraint grammar for building up the constraint network through an inference process. Section 3.3 elaborates further on various aspects of this perspective on MTS.

The Descriptive Perspective on MTS. The descriptive perspective sees the constraint network as a complement of the conventional parse structure (phrase or dependency). The parse structure merely serves the purpose of the structure, in the sense of Model Theory in Logic. According to it, the objects of study are models of theories expressed in a formal meta-language, where a theory is a set of statements specified in a formal language, and a model of a theory is a structure which satisfies all the statements of that theory. Section 3.4 elaborates further on the descriptive perspective on MTS.

Either way (constructive or descriptive), the MT representation, inclusive of the constraint network, is much more informative than the parse structure alone, and allows for more exact reasoning on the corresponding utterance. The next two sections introduce those two perspectives.
3.3 The Constructive Perspective: A Constraint Network for Representing and Processing the Linguistic Structure

The generative conception of grammar relies on the derivation process which, in turn, depends on a hierarchical representation of syntactic information. However, several works have shown the limits of such a representation. From a generative point of view, parsing an input corresponds to finding a set of derivation rules which makes it possible to generate the surface realisation of this input. This conception of grammar thus relies on a specific view of what language is: the set of surface forms that can be generated by the grammar. This conception is very restrictive for several reasons. One is the extreme variability of language usages, including non-canonical or even ill-formed productions. Another is the fact that this view is purely syntactically driven: only syntax is taken into account here, and when other sources of information such as prosody are considered (which is rarely the case), they are considered as “complementary” to syntax, giving syntax a preeminent position. Recent works propose, on the contrary, to consider all the different sources of information as interacting at the same level (see in particular works around Construction Grammar (Goldberg, 2003; Sag, 2012)). Even though many linguistic theories now challenge this way of considering the relationship between language and grammar, most of them remain more or less based on the generative framework, in the sense that what we can call the context-free backbone still occupies a central position. Model-Theoretic Syntax (Pullum and Scholz, 2003; Pullum, 2007) proposes a paradigmatic shift, making it possible to escape from this framework.

This section presents the main characteristics of these different conceptions of syntax. It describes more precisely the specific problems coming from the hierarchical conception of syntax, showing how it can constitute a severe limitation for linguistic description. We then propose an overview of a specific MTS framework, called Property Grammars, which meets these requirements. We specify formally the status of the constraints we use and how, in this approach, a syntactic description amounts to a graph. We explain in particular how it is possible to take advantage of such a representation in order to shift from the classical tree domain to a graph domain.
3.3.1 Generative-Enumerative vs. Model-Theoretic Syntax

There are two different approaches in logic: one is purely syntactic and only uses the form of the formulas in order to demonstrate a theorem; the second is semantic and relies on the interpretation of formulas. The same distinction also holds for natural language syntax. A first approach (the syntactic one in logic) consists in studying sentence well-formedness. The problem there consists in finding a structure adequate to the input. In this case, grammatical information is represented by means of a set of rules, the syntactic structure representing the set of rules used during parsing. An alternative approach consists in studying directly the linguistic properties of the sentence, instead of building a structure. Pullum and Scholz (2001) call these approaches respectively Generative-Enumerative Syntax (GES) and Model-Theoretic Syntax (MTS). The first corresponds to the generative theories and has been extensively explored. The latter remains less studied, and only a few works belong to this paradigm (Pullum, 2007). One of the reasons is that generativity has for years been almost the only view of formal syntax, and it is difficult to move from this conception to a different one. In particular, one of the problems comes from the fact that all approaches, even those in the second perspective, still rely on a hierarchical (tree-like) representation of syntactic information.

The generative conception of syntax relies on a particular relation between grammar and language: a specific mechanism, derivation, makes it possible to generate a language from a grammar. This basic mechanism can be completed with other devices (transformations, moves, feature propagation, etc.) but in all cases constitutes the core of all generative approaches. In such a case, establishing grammaticality consists in finding a set of derivations between the start symbol of the grammar and the sentence to be parsed.
As a side effect, since a derivation step amounts to a local tree, it is possible to build a syntactic structure, represented by a tree. It is then possible to reduce, in a certain sense, the question of grammaticality to the possibility of building a tree. This reminder seems trivial, but it is important to measure its consequences. The first is that grammaticality is reduced, as noticed by Chomsky (1975), to a boolean value: true when a tree can be built, false otherwise. This is a very restrictive view of grammaticality, as also noticed by Chomsky (1975) (without proposing a solution), which forbids a finer conception, capable of representing in particular a grammaticality scale (also called gradience, see (Keller, 2000) or (Aarts, 2007)).

This generative conception of syntax is characterised as being enumerative (see (Pullum and Scholz, 2001)) in the sense that derivation can be seen as an enumeration process, generating all possible structures and selecting them by means of extra constraints (as is typically the case in Optimality Theory, see (Prince and Smolensky, 1993)).

Model-Theoretic Syntax proposes an alternative view (Blackburn et al., 1993; Cornell and Rogers, 2000; Pullum and Scholz, 2001). In this conception, a grammar is a set of statements, and the problem consists in finding a model in a domain. From a logical perspective, generative approaches rely on a syntactic conception in the sense that parsing consists in applying rules depending on the form of the structures generated at each step. For example, a nonterminal is replaced with a set of constituents. By contrast, model-theoretic approaches rely on a semantic view in which parsing is based on the interpretation (the truth values) of the statements of the grammar. A grammar in MTS is a set of statements or, formally speaking, formulas. Each formula describes a linguistic property; its interpretation consists in finding whether this statement is true or false for a given set of values (the universe of the theory in logical terms).
When a set of values satisfies all the statements of the grammar (in other words, when the interpretation of all the formulas for this set of values is true), then this set is said to be a model. As far as syntax is concerned, formulas indicate relations between categories or, more precisely, between descriptions of categories. These descriptions correspond to the specification of a variable associated with several properties: they can be seen as formulas. For example, given $K$ a set of categories, the description of a masculine noun amounts to the formula:

$$\exists x\,[\mathrm{cat}(x, N) \wedge \mathrm{gend}(x, \mathit{masc})] \tag{3.1}$$

A category can be given a more or less precise description, according to the number of conjuncts. A grammatical statement is a more complex formula, adding relations over the descriptions of categories. For example, a statement indicating that a determiner is unique within a noun phrase amounts to the formula:

$$[\mathrm{cat}(x, \mathit{Det}) \wedge \mathrm{cat}(y, \mathit{Det}) \rightarrow x \approx y] \tag{3.2}$$
Parsing. Concretely, when parsing a given input, a set of categories is instantiated, making it possible to interpret all the atomic predicates corresponding to the features (category, gender, number, etc.), which in turn makes it possible to interpret the complex predicates formed by the grammar statements. In this perspective, we say that an instantiated category is a value, and finding a model consists in finding a set of values satisfying all the grammatical statements. For example, the set of words “the book” makes it possible to instantiate two categories with labels Det and N (these labels representing the conjunction of features). Intuitively, we can say that the set of values {Det, N} is a model for the category NP.

Various parsing strategies have been implemented in line with that approach. Balfourier et al. (2002) implement an incremental strategy, where the choice of a suitable assignment relies on a heuristic of shortest possible span. VanRullen (2005) implements a multi-graph representation of the constraint network, and a strategy which allows different granularities of solution, from chunks to deep parses. Prost (2008) revisits the CKY parsing algorithm, based on dynamic programming techniques, in order to optimise the proportion of properties (instead of probabilities) violated by the solution parse. More recently, Duchier et al. (2010) explore the possibility of modelling the parsing problem according to a model-theoretic grammar as a Constraint Optimisation Problem.

Finding a model is, then, completely different from deriving a structure. As stressed by Pullum and Scholz (2001), instead of enumerating a set of expressions, an MTS grammar simply states conditions on these expressions. MTS then moves from a classical tree domain to a graph domain for the representation of syntactic information. We show how constraints can be an answer to this problem: first, they can represent all kinds of syntactic information and second, they constitute a system where all constraints are at the same level and evaluated independently from each other (no order is enforced on the constraints for evaluation).
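The interpretation of formulas (3.1) and (3.2) over a set of instantiated categories can be sketched in Python as follows. This is only an illustration under stated assumptions: values are represented as feature dictionaries, the French pair “le livre” stands in for “the book” so that the gender feature of (3.1) is meaningful, the relation ≈ is read as identity of values, and all function names are hypothetical.

```python
from itertools import product

# Instantiated categories ("values"): bundles of atomic features.
# "le livre" is an assumed stand-in for "the book".
le = {"form": "le", "cat": "Det", "gend": "masc"}
livre = {"form": "livre", "cat": "N", "gend": "masc"}


def description_3_1(values):
    """Formula (3.1): some value is a noun with masculine gender."""
    return any(v.get("cat") == "N" and v.get("gend") == "masc" for v in values)


def statement_3_2(values):
    """Formula (3.2): any two Det values are one and the same value.

    The relation written x ≈ y is read here as object identity.
    """
    return all(
        x is y
        for x, y in product(values, repeat=2)
        if x.get("cat") == "Det" and y.get("cat") == "Det"
    )


def is_model(values, statements):
    """A set of values is a model iff it makes every statement true."""
    return all(statement(values) for statement in statements)


if __name__ == "__main__":
    grammar = [statement_3_2]
    print(is_model([le, livre], grammar))            # one determiner
    print(is_model([le, dict(le), livre], grammar))  # two distinct determiners
```

No statement is evaluated before another: `is_model` simply interprets each formula against the same set of values, which mirrors the absence of any evaluation order among constraints.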
3.3.2 Generativity and hierarchical structures

Geoffrey Pullum, during a lecture at ESSLLI in 2003, explained that “Model Theoretic Syntax is not Generative Enumerative Syntax with constraints”. In other words, constraints are not to be considered only as a control device (in the DCG sense, for example) but have to be part of the theory. Some theories (in particular HPSG) try to integrate this aspect, but it remains an issue for both theoretical and technical reasons. The problem comes in particular from the fact that the dominance relation usually plays a specific role in the representation of syntactic information: dominance structures first have to be built before other kinds of constraints can be verified. This is a problem when no such hierarchical relations can be identified. Moreover, we have known since GPSG that dominance constitutes only a part of the syntactic information to be represented in phrase-structure approaches, and is not necessarily to be considered as more important than the others.

Syntactic information is usually defined, especially in generative approaches, over tree domains. This is due to the central role played by the notion of dominance, and more precisely by the relation existing between the head and its direct ancestor. In theories like HPSG (see (Sag et al., 2003)), even though no rules are used (they are replaced with abstract schemata), this hierarchical organisation remains at the core of the system. As a consequence, constraints in HPSG can be evaluated only provided that a tree can be built: features can be propagated and categories can be instantiated only when the hierarchical skeleton is known. This means that one type of information, dominance, plays a specific role in the syntactic description.

However, in many cases, a representation in terms of a tree is not suitable, or not even possible. The following example illustrates this situation. It presents the case of a cleft element that is an adjunct of two coordinated verbs.

[Figure: phrase-structure tree with nodes S, Cleft, wh-S, NP, VP, Conj, V and NP over the sentence below; arrows not reproduced]

C’est   avec   colère   que    Jean   a posé   son livre   et    quitté   la salle
It is   with   anger    that   Jean   put      his book    and   left     the room
The arrows in this figure show in what sense the tree fails to represent the distribution of the cleft element over the conjuncts. Moreover, there also exist other kinds of relations, for example the obligatory co-occurrence in French between “c’est” and “que”.
The second example, presented in the following structure, illustrates the fact that in many cases, it is not possible to specify clearly what kind of syntactic relation exists between different parts of the structure:
[Figure: structure with two detached NP nodes (le piano, les doigts) followed by S (NP ça, VP a beaucoup d’importance); arrows not reproduced]

Le piano    les doigts    ça   a beaucoup d’importance
The piano   the fingers   it   has a lot of importance
This example illustrates a multiple detachment construction. In this case, the detached elements are not directly connected to the rest of the structure by classical syntactic relations: the two relations indicated by arrows are dependencies at the discourse level (plus an anaphoric relation). Many other examples can be given to illustrate this problem: it is not always possible to give a connected structure on the basis of syntactic relations. Moreover, when other kinds of relations are added, the structure is no longer a tree.

This conception has direct consequences on the notion of grammaticalness. First, since building a tree is a prerequisite, nothing can be said about the input when this operation fails. This is the main problem with generative approaches, which can only indicate whether or not an input is grammatical, but do not explain the existence of levels of grammaticality (the gradience phenomenon, see (Keller, 2000; Pullum and Scholz, 2001)). A second consequence concerns the nature of linguistic information, which is typically spread over different domains (prosody, syntax, pragmatics, and related domains such as gestures, etc.). An input, in order to be interpreted, does not necessarily need to receive a complete syntactic structure. Interpretation rather consists in bringing together pieces of information coming from these different domains. This means that interpreting an input requires taking into account all the different domains and their interaction, rather than building a structure for each of them and then calculating their interface. In this perspective, no specific relation plays a more important role than the others. This is also true within domains: as for syntax, the different properties presented in the previous section have to be evaluated independently from each other.
3.3.3 The Property Grammar Framework

A seminal idea in GPSG (see (Gazdar et al., 1985)) was to dissociate the representation of different types of syntactic information: dominance and linear precedence (forming the ID/LP formalism), but also some other kinds of information stipulated in terms of co-occurrence restrictions. This proposal is interesting not only in terms of syntactic knowledge representation (making it possible to factorise rules, for example), but also theoretically. Recall that one of the main differences between the GES and MTS frameworks lies in the relation between grammar and language: MTS approaches try to characterise an input starting from the available information, with no need to “overanalyse”, i.e. to re-build (or infer) information that is not accessible from the input. For example, GES techniques have to build a connected and ordered structure, representing the generation of the input. By contrast, nothing in MTS requires building a structure covering the input, which makes it possible, for example, to deal with partial or heterogeneous information.

Property Grammars (see (Blache, 2005)) systematise the GPSG proposal by specifying these different types. More precisely, they propose to represent separately the following properties:

• Constituency: set of all the possible elements of a construction.
• Uniqueness: constituents that cannot be repeated within a construction.
• Linearity: linear order.
• Obligation: set of obligatory constituents, exactly one of them (to the exclusion of the others) being realised.
• Requirement: obligatory co-occurrence between constituents within a construction.
• Exclusion: impossible co-occurrence between constituents within a construction.

This list is not closed and other types of information can be added, for example dependency (a syntactico-semantic relation between a governor and a complement) or adjacency (the juxtaposition of two elements). We focus in this paper on the six basic relations indicated above.
These relations make it possible to represent most of the syntactic information. We call these relations “properties”; they can also be considered as constraints on the structure. We adopt in the remainder of this paper the following notations: $x, y$ (lower case) represent individual variables; $X, Y$ (upper case) are set variables. We note $C(x)$ the set of individual variables in the domain assigned to the category $C$ (see (Backofen et al., 1995) for more precise definitions). We use the set of binary predicates for immediate domination ($\triangleleft$), linear precedence ($\prec$) and equality ($\approx$).

Let us now define the different properties more precisely. The first one (constituency) implements the classical immediate dominance relation. All six can be defined as follows:

• $\mathrm{Const}(A, B) : (\forall x, y)[(A(x) \wedge B(y)) \rightarrow x \triangleleft y]$
This is the classical definition of constituency, represented by the dominance relation: stating that a category B is a constituent of A stipulates that there is a dominance relation between the corresponding nodes.

• $\mathrm{Uniq}(A) : (\forall x, y)[A(x) \wedge A(y) \rightarrow x \approx y]$
If one node of category A is realised, there cannot exist other nodes with the same category A. Uniqueness stipulates the constituents that cannot be repeated in a given construction.

• $\mathrm{Prec}(A, B) : (\forall x, y)[(A(x) \wedge B(y)) \rightarrow y \not\prec x]$
This is the linear precedence relation as proposed in GPSG. If the nodes x and y are realised, then y cannot precede x.

• $\mathrm{Oblig}(A) : (\exists x)(\forall y)[A(x) \wedge A(y) \rightarrow x \approx y]$
There exists a node x of category A and there is no other node y of the same category. An obligatory category is realised exactly once.

• $\mathrm{Req}(A, B) : (\forall x, y)[A(x) \rightarrow B(y)]$
If a node x of category A is realised, a node y of category B has to be realised too. This relation implements co-occurrence, in the same way as GPSG does.

• $\mathrm{Excl}(A, B) : (\forall x)(\nexists y)[A(x) \wedge B(y)]$
When x exists, there cannot exist a sibling y. This is the exclusion relation between two constituents.

What is interesting in this representation of syntactic information is that all relations are represented independently from each other. They are all statements in the MTS sense, and they can be evaluated separately (which fits well with the non-holistic view of grammar information proposed by Pullum). In other words, there is no need to assign the dominance relation a specific role: it is one piece of information among others, and what is meaningful is the interaction between these relations. More precisely, a set of categories can lead to a well-formed structure when all these statements are satisfied together. We do not need first to build a structure relying on dominance and then to verify the other kinds of information represented by the rest of the relations. In other words, in this approach, “MTS is not GES with constraints” (Pullum and Scholz, 2003).

Parsing with Property Grammar. Concretely, when taking into consideration a set of categories (an assignment), building the syntactic structure amounts to evaluating the constraint system for this specific assignment, in order to infer new categories and build up the parse structure. The result of the evaluation indicates whether or not the assignment corresponds to a well-formed list of constituents. For example, given two nodes x and y, if they only verify a precedence relation, nothing else can be said. But when several other properties such as requirement, uniqueness and constituency are also satisfied, the assignment {x, y} becomes a model for an upper-level category. For example, if we have x and y such that Det(x) and N(y), this assignment verifies precedence, uniqueness, constituency and requirement properties. This set of properties makes it possible to characterise an NP.
By contrast, if we take x and y such that Det(x) and Adv(y), no constraint involving both constituents belongs to the system: they do not constitute a model, and no new category can be inferred.

In terms of representation, unlike the classical approaches, syntactic information is not represented by means of a tree (see (Huddleston and Pullum, 2002)), but with a directed labelled graph. Nodes are categories and edges represent constraints over the categories: dominance, precedence, requirement, etc. A non-lexical category is described by a set of constraints, which are relations between its constituents. It is possible to take into consideration only one type of property (in other words, one type of relation): this amounts to extracting a subgraph from the total one. For example, one can consider only constituency properties. In this case, the corresponding subgraph of dominance relations is (generally) a tree. But what is needed to describe an input precisely is the entire set of relations.

In the following, we represent the properties with the set of relations noted $\Rightarrow$ (requirement), $\otimes$ (exclusion), $\circ$ (uniqueness), $\triangleleft$ (constituency), $\uparrow$ (obligation), $\prec$ (precedence). A Property Grammar graph (noted PGgraph) is a tuple of the form:

$$G = \langle W, \Rightarrow, \otimes, \circ, \triangleleft, \uparrow, \prec, \theta \rangle$$

in which $W$ is the set of nodes and $\theta$ the set of terminal nodes. A model is a pair $\langle G, V \rangle$ where $V$ is a function from $W$ to $\mathrm{Pow}(W)$. We describe the use of such graphs in section 3.5.
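As an illustration of this section, the following Python sketch evaluates the six basic properties for an assignment and stores the result as labelled edges, from which a single-relation subgraph can be extracted. It is a simplified reading under explicit assumptions: the NP fragment `NP_GRAMMAR` is hypothetical, the assignment is reduced to an ordered list of category labels, and the functions `evaluate` and `subgraph` are names invented for this example rather than part of any Property Grammar implementation.

```python
# Hypothetical NP fragment: one entry per property type.
NP_GRAMMAR = {
    "const": {"Det", "Adj", "N", "Pro"},        # possible constituents
    "uniq": {"Det", "N"},                       # cannot be repeated
    "prec": {("Det", "N"), ("Det", "Adj")},     # first precedes second
    "oblig": {"N", "Pro"},                      # exactly one is realised
    "req": {("N", "Det")},                      # N requires Det
    "excl": {("Pro", "Det")},                   # Pro excludes Det
}


def evaluate(label, grammar, cats):
    """Evaluate every property independently on the ordered labels `cats`.

    Returns labelled edges (relation, source, target, satisfied).
    """
    present = set(cats)
    edges = []
    for c in sorted(present):
        edges.append(("const", label, c, c in grammar["const"]))
    for c in sorted(grammar["uniq"] & present):
        edges.append(("uniq", c, c, cats.count(c) == 1))
    for a, b in sorted(grammar["prec"]):
        if a in present and b in present:
            last_a = max(i for i, c in enumerate(cats) if c == a)
            first_b = min(i for i, c in enumerate(cats) if c == b)
            edges.append(("prec", a, b, last_a < first_b))
    realised = [c for c in cats if c in grammar["oblig"]]
    target = "|".join(sorted(grammar["oblig"]))
    edges.append(("oblig", label, target, len(realised) == 1))
    for a, b in sorted(grammar["req"]):
        if a in present:
            edges.append(("req", a, b, b in present))
    for a, b in sorted(grammar["excl"]):
        if a in present:
            edges.append(("excl", a, b, b not in present))
    return edges


def subgraph(edges, relation):
    """Keep only the edges of one relation type, e.g. dominance."""
    return [(src, tgt) for rel, src, tgt, _ in edges if rel == relation]


if __name__ == "__main__":
    edges = evaluate("NP", NP_GRAMMAR, ["Det", "Adj", "N"])
    print(all(ok for *_, ok in edges))   # is the assignment a model for NP?
    print(subgraph(edges, "const"))      # dominance subgraph only
```

In this reading the constituency subgraph is only one projection of the edge list; the precedence, requirement and other edges are stored at the same level and none of them needs the dominance edges to be computed first.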
3.4 The Descriptive Perspective: A Constraint Network for Completing the Linguistic Structure

According to the model-theoretic hypothesis, human language is represented on the semantic level of Logic: the objects of study are models of theories expressed in a formal language, where a theory is a set of statements specified in a formal meta-language, and a model of a theory is a structure which satisfies all the statements of that theory. Hence, applied to natural language:

• a theory is a set of grammar statements, specified by a conjunction $\Phi = \bigwedge_i \phi_i$, where every atom $\phi_i$ is a logical formula which puts elements of the structure in a relationship;
• a structure is a linguistic parse structure (e.g. phrase structure, dependency structure, or both).

A grammar is, then, a conjunctive formula, parameterised by the structure, and a theory is an instance of the grammar for a given structure. For instance, for a domain of phrase structures, the $\phi_i$ are relations which hold on constituents (e.g. in a Noun Phrase in English, the Determiner precedes the Noun). Duchier et al. (2009) formulate a Model-Theoretic semantics for Property Grammar along these lines.
Model checking. We first give a few definitions, in order to set the notations used in the following. Let $S$ be a set of words in the target language, and $\mathcal{L}$ a set of labels which denote morpho-syntactic categories; a lexicon is then a subset $V \subset \mathcal{L} \times S$ (which implicitly assumes that the terminals are POS-tagged words). Let $P_{\mathcal{L}}$ be the set of all the possible properties on $\mathcal{L}$; a PG grammar $\Phi$ is specified by a pair $(P_G, L_G)$, with $P_G \subseteq P_{\mathcal{L}}$ and $L_G \subseteq V$. Let $\tau : s$ be a (phrase structure) tree decorated with labels in $\mathcal{L}$, and whose surface realisation is the string of words $s$; let $\Phi_s$ be an instantiation of $\Phi$ for $\tau : s$; $\tau : s$ is a model for $\Phi_s$ iff $\tau : s$ makes $\Phi_s$ true. We denote by $\tau : s \models \Phi_s$ the satisfaction of $\Phi_s$ by $\tau : s$. The instantiation $\Phi_s$ is also called the constraint network associated with $\tau : s$ for the grammar $\Phi$.

Definition 1 (Grammaticality). $\tau : s$ is grammatical with respect to the grammar $\Phi$ iff $\tau : s \models \Phi_s$.
Since $\Phi_s = \bigwedge_i \phi_i^s$, Definition 1 means that every instance of property $\phi_i^s$ of $\Phi_s$ for the sentence $s$ must be satisfied for $s$ to be deemed grammatical with respect to the grammar $\Phi$. The model checking process involves:

• instantiating the grammar $\Phi$ for the parse tree $\tau : s$,
• building up the corresponding constraint network $\Phi_s$, and
• checking the truth of every atom $\phi_i^s$.

Processing-wise, the existence of a linguistic structure is required prior to checking it against a grammar, which involves constructing the constraint network. Figure 3.1 exemplifies a phrase structure deemed ungrammatical through model checking.

Under such an interpretation, the linguistic structure plays the role of a semantic object, which makes a theory true (in which case the structure is deemed a model of the theory), or not. That is, the parse structure is the object which makes the grammar (as a conjunction of statements) true, or not. Here, the conventional parsing process, seen as the generation process of the parse structure, is kept separate from the model checking process.

419: En_effet, sept projets sur quatorze, soit la moitié, ont un financement qui n’ est toujours pas assuré et dont le calendrier n’ est pas_encore arrêté.

( (SENT (ADV En_effet) (PUNC ,)
    (NP (DET sept) (NC projets)
      (PP (P sur) (NP (NC quatorze)))
      (PUNC ,)
      (COORD (CC soit) (NP (DET la) (NC moitié))))
    (PUNC ,)
    (VN (V ont))
    (NP (DET un) (NC financement)
      (Srel (NP (PROREL qui))
        (VN (ADV n’) (V est) (AdP (ADV toujours) (ADV pas)) (VPP assuré))))
    (COORD (CC et)
      (Sint (NP (NC dont) (DET le) (NC calendrier))
        (VN (ADV n’) (V est) (ADV pas_encore) (VPP arrêté))))
    (PUNC .)))

Figure 3.1: Example of parse deemed ungrammatical through model checking

Language cover. As far as knowledge representation is concerned, a formal framework for MTS shows the following properties:

1. the independence of the grammar statements allows for the definition of non-classical satisfaction, whereby a given structure only partially satisfies a subset of $\Phi$, and violates its complement;
2. how structures are generated is not specified in the grammar.

Property 1 means that any structure may be represented in the framework, whether grammatically satisfactory or not. The syntax of non-canonical language can, thus, be modelled by a structure which only loosely meets the grammar. Property 2 means that, processing-wise, the generation of candidate-model structures is formally distinct from the grammar check. It opens all kinds of perspectives with regard to the parsing process. We have already seen in section 3.3 that the entire parsing process, including the generation of the structure, may be driven by the PG grammar itself. Another option, in line with model theory, is to consider that the generation of the candidate structures is not of concern to the theory. In this case, different options are available.

Generation of model-candidates. Since a model-theoretic representation is independent from any processing aspects regarding the generation of candidate structures, the strategy for generating them may be conceived separately from model checking. Although an inference process may be designed in order to construct structures on the sole basis of the constraint grammar (see the Constructive perspective in section 3.3, and for instance
(Maruyama, 1990; Balfourier et al., 2002; Prost, 2008; Duchier et al., 2010) for parsing strategies), nothing makes it compulsory. It is, therefore, possible, for instance, to check likely structures generated by a probabilistic parser against the MT grammar.

Note, as well, that the type of linguistic structure concerned by a Model-Theoretic representation may take different forms, depending on the formal framework in use. The seminal work of Maruyama (1990), for instance, is concerned with dependency structure, while Optimality Theory (Prince and Smolensky, 1993) is mostly used for describing phonological structures. As for Property Grammar, the framework is essentially used with phrase structures, though a few works rely on it for multimodal annotation or the analysis of biological sequences.

Structure enrichment for solving language problems. A major advantage of the descriptive perspective on MTS, and of PG in particular, lies in its potential combination with other structures and processes in order to address a variety of language problems. Grammaticality judgement is one of those. The problem is encountered in contexts such as statistical machine translation (Zwarts and Dras, 2008), summarisation (Wan et al., 2005), or second language learning (Wong and Dras, 2011), where the grammaticality of a candidate solution is a non-trivial decision problem. A common approach to address it is to train a statistical classifier (Foster et al., 2008; Wong and Dras, 2011; Wagner, 2012) in order to determine the grammaticality of a candidate around a threshold of likelihood. The use of a constraint network, associated with the linguistic structure, eases the decision through exact model checking (Prost, forthcoming).

The graph-based representation also shares properties with the graph structure of semantic networks, such as that of Joubert and Lafourcade (2008). The combination of the two is expected to open avenues of research with respect to the Syntax-Semantics interface. The solving of problems concerned with both dimensions, such as grammar error detection, should be eased by the resulting enriched structure (Prost and Lafourcade, 2001).
3.5 Grammaticality Judgement

Classically, syntactic information is represented in terms of decorated ordered trees (see (Blackburn et al., 1993; Blackburn and Meyer-Viol, 1994)). In this approach, tree admissibility relies on a distinction between the dominance relation (which gives the structure) and other constraints on
the tree such as precedence, co-occurrence restriction, etc. In our view, all relations have to be at the same level. In other words, dominance does not play a specific role: a co-occurrence restriction, for example, can be expressed and evaluated independently of dominance. This means that each property represents a relation between nodes, dominance being one of them. When the entire set of relations is taken into consideration, the structure is a graph, not a tree. More precisely, each property specifies a set of relations between nodes: precedence relations, co-occurrence relations, dominance relations, etc. It can be the case that the dominance subset of relations (a subgraph of the graph of relations) is a tree, but this can be considered a side effect. No constraint, for example, stipulates a connectivity restriction on the dominance subgraph.

In PG, a grammar is then conceived as a constraint system, corresponding to a set of properties as defined above. Parsing an input consists in finding a model satisfying all the properties (or, more precisely, the properties involving the categories of an assignment). In this case, the input is said to be grammatical, its description being the set of such properties. However, it is also possible to find models that only partially satisfy the system. This means that some constraints can be violated. If so, the input is not grammatical, but the set of satisfied and violated properties still constitutes a good description. We call such a set a characterisation. This notion replaces that of grammaticalness (which is a particular case of characterisation in which no property is violated).

The following example (Figure 3.2) illustrates the case of an assignment A={NP, Det, Adj, N}. All properties are satisfied, each relation forms a labelled edge, and the set of relations is a graph. A phrase is characterised when it is connected to a graph of properties.

[Figure: graph over the nodes NP, Det, Adj and N, whose edges are labelled with property types such as ≺, ⇒ and ↑]

Figure 3.2: Tree model for a NP constituent
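The notion of characterisation can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the `Property` class, the two property kinds it handles (linearity and requirement) and the four NP properties in `NP_GRAMMAR` are hypothetical and are not taken from an actual Property Grammar. Each category is also assumed to occur at most once in the assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """One relation between two categories (hypothetical encoding)."""
    kind: str   # "linearity" (left precedes right) or "requirement" (left requires right)
    left: str
    right: str


# Illustrative NP properties, assumed for this sketch only.
NP_GRAMMAR = [
    Property("linearity", "Det", "Adj"),
    Property("linearity", "Det", "N"),
    Property("linearity", "Adj", "N"),
    Property("requirement", "N", "Det"),
]


def is_relevant(prop, categories):
    """A property is evaluated only if the assignment contains the categories it constrains."""
    if prop.kind == "linearity":
        return prop.left in categories and prop.right in categories
    if prop.kind == "requirement":
        return prop.left in categories
    raise ValueError(f"unknown property kind: {prop.kind}")


def holds(prop, categories):
    """Evaluate a relevant property against the ordered list of categories."""
    if prop.kind == "linearity":
        return categories.index(prop.left) < categories.index(prop.right)
    # requirement: the required category must be realised in the input
    return prop.right in categories


def characterise(grammar, categories):
    """Return (satisfied, violated), i.e. the characterisation of the input."""
    satisfied, violated = [], []
    for prop in grammar:
        if not is_relevant(prop, categories):
            continue
        (satisfied if holds(prop, categories) else violated).append(prop)
    return satisfied, violated


p_plus, p_minus = characterise(NP_GRAMMAR, ["Det", "Adj", "N"])
print(len(p_plus), len(p_minus))   # every evaluated property is satisfied

p_plus, p_minus = characterise(NP_GRAMMAR, ["Adj", "Det", "N"])
print(p_minus)                     # only the Det-before-Adj linearity property is violated
```

In this sketch an input is grammatical exactly when the second list is empty. Otherwise the pair of lists still describes the input, which is the point of replacing grammaticalness with characterisation.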
Figure 3.3 shows a more complete graph, corresponding to an entire sentence. Again, no relation in this graph plays a specific role. The information comes from the fact that this set of categories is linked by several relations. The set of relations forms a description: it tells us that linearity, requirement, obligation and constituency properties are satisfied, and that they characterise an S.

[Figure: graph over the nodes S, NP, VP, Det, Adj, N, Aux and V, whose edges are labelled with property types such as ≺, ⇒ and ↑]

Figure 3.3: Tree model for a S constituent

Theoretically, each node can be connected to any other node. Nothing prevents us, for example, from representing a relation of some semantic type between the adjective and the verb nodes. Conversely, when we take only the constituency relations from this graph, we obtain a dominance tree:

[Tree: S dominating NP and VP]
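Extracting the dominance tree from the graph of relations amounts to filtering edges by type. The following Python sketch shows one way to do this and to check whether the result happens to be a tree. The edge list `EDGES`, the label names and both helper functions are assumptions made for illustration; the list is not a transcription of Figure 3.3.

```python
# Labelled edges (source, relation type, target); illustrative values only.
EDGES = [
    ("S", "constituency", "NP"),
    ("S", "constituency", "VP"),
    ("NP", "constituency", "Det"),
    ("NP", "constituency", "N"),
    ("VP", "constituency", "V"),
    ("Det", "linearity", "N"),
    ("NP", "linearity", "VP"),
    ("N", "requirement", "Det"),
]


def dominance_subgraph(edges):
    """Keep only the constituency relations, as (parent, child) pairs."""
    return [(src, tgt) for (src, label, tgt) in edges if label == "constituency"]


def is_tree(dom_edges):
    """True if the edges form a rooted tree: a single root, one parent per
    other node, and every node reachable from the root."""
    if not dom_edges:
        return False
    parent, children, nodes = {}, {}, set()
    for src, tgt in dom_edges:
        nodes.update((src, tgt))
        if tgt in parent:          # a node with two parents cannot be in a tree
            return False
        parent[tgt] = src
        children.setdefault(src, []).append(tgt)
    roots = nodes - set(parent)
    if len(roots) != 1:            # no root (cycle) or several roots (forest)
        return False
    seen, stack = set(), [next(iter(roots))]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen == nodes


dominance = dominance_subgraph(EDGES)
print(dominance)
print(is_tree(dominance))
```

Because nothing in the constraint system requires the dominance subgraph to be connected, `is_tree` is a check applied after the fact, not a condition built into the grammar.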
Finally, insofar as a property can be satisfied or violated in a characterisation, we have to label relations with their type and their interpretation (true or false, represented by + or -). The following example presents a graph for the assignment A={NP, Adj, Det, N}, in which the determiner has been realised after the adjective.
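[Figure: graph over the nodes NP, Adj, N and Det. Every edge is labelled with its property type and the interpretation +, except the precedence edge between Det and Adj, which is labelled ≺−.]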
In this graph, all constraints but the precedence between Det and Adj have been satisfied, the corresponding relations being labelled with +. As a side effect, representing information in this way also makes it possible to rank inputs according to a grammaticalness evaluation. We present in this section how to use characterisations in order to quantify such information. The idea (see (Blache et al., 2006)) consists in analysing the form of the graph and its density, taking into account the interpretation of the relations. More precisely, the method consists in calculating an index from the cardinalities of $P^+$ and $P^-$ (respectively the sets of satisfied and violated properties). Let $N^+$ and $N^-$ denote the cardinalities of these sets. The first indication that can be obtained is the ratio of satisfied properties to the total number of evaluated properties $E$. This index is called the Satisfaction Ratio, calculated as $SR = \frac{N^+}{E}$. Going further, it is also possible to give an account of the coverage of the assignment by means of the ratio of evaluated properties to the total number of properties $T$ describing the category in the grammar. This coefficient is called the Completeness Coefficient: $CC = \frac{E}{T}$. A Precision Index can in turn be proposed, combining these two measures: $PI = \frac{SR + CC}{2}$. Finally, a general index can be proposed, taking into consideration the indexes of all the constituents. For example, a phrase containing only well-formed constituents has to be assigned a higher value than one containing ill-formed ones. This is done by means of the Grammaticalness Index, $d$ being the number of embedded constructions $C_i$: if $d = 0$ then $GI = PI$, else $GI = PI \times \frac{\sum_{i=1}^{d} GI(C_i)}{d}$.

In reality, these different figures need to be balanced with other kinds of information. For example, we can take into consideration the relative importance of constraint types in weighting them. Also, the influence of SR and CC over the global index can be modified by means of coefficients. This possibility of giving a quantified estimation of grammaticalness comes directly from the possibility of representing syntactic information in a fully constraint-based manner, which has been made possible by the MTS view of grammar.
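The four indexes can be computed recursively over a constituent structure. The Python sketch below is a minimal illustration, assuming that the number of evaluated properties is E = N+ + N− and that E and T are non-zero. The `Constituent` class and the property counts in the example are hypothetical, and the weighting coefficients mentioned above are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Constituent:
    n_plus: int     # N+: number of satisfied properties
    n_minus: int    # N-: number of violated properties
    total: int      # T: properties describing the category in the grammar
    children: list = field(default_factory=list)  # embedded constructions C_i


def evaluated(c):
    """E, assumed here to be N+ + N-."""
    return c.n_plus + c.n_minus


def satisfaction_ratio(c):
    """SR = N+ / E"""
    return c.n_plus / evaluated(c)


def completeness_coefficient(c):
    """CC = E / T"""
    return evaluated(c) / c.total


def precision_index(c):
    """PI = (SR + CC) / 2"""
    return (satisfaction_ratio(c) + completeness_coefficient(c)) / 2


def grammaticalness_index(c):
    """GI = PI if d = 0, else PI times the mean GI of the d embedded constructions."""
    pi = precision_index(c)
    d = len(c.children)
    if d == 0:
        return pi
    return pi * sum(grammaticalness_index(child) for child in c.children) / d


# Hypothetical counts: an NP with one violated property, a fully satisfied VP and S.
np_ = Constituent(n_plus=3, n_minus=1, total=4)
vp = Constituent(n_plus=2, n_minus=0, total=2)
s = Constituent(n_plus=4, n_minus=0, total=4, children=[np_, vp])

print(satisfaction_ratio(np_), completeness_coefficient(np_), precision_index(np_))
print(grammaticalness_index(s))
```

With these counts the NP has SR = 0.75, CC = 1 and PI = 0.875, so the sentence, although all of its own properties are satisfied, receives GI = 1 × (0.875 + 1) / 2 = 0.9375. The ill-formed embedded constituent lowers the score of the whole.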
3.6 Conclusion

The representation of syntactic information by means of constraints, as described in this paper, shows several advantages. First, it provides an elegant computational framework for MTS, where derivation does not play any role. In the Constructive perspective on MTS, the shift from a generative to a model-based conception of natural language syntax then becomes concrete: constraint satisfaction completely replaces derivation. This evolution becomes possible provided that we abandon a strict hierarchical representation of syntax in which dominance plays a central role. As a consequence, such a fully constraint-based approach offers the possibility of replacing the domain of ordered trees with that of constraint graphs. This is not only a matter of representation, but has deep consequences for the theory itself: different types of information are represented by different relations, all of them being at the same level.

The Descriptive perspective on MTS, where the constraint network complements and enriches the conventional linguistic structure, also shows interesting properties. By providing a finer-grained description of the linguistic properties of a sentence than the parse structure alone, it makes various language problems easier to address. Grammaticality judgement, for instance, or the interaction between the Syntax and Semantics dimensions, should benefit from such a graph-based representation.

The Property Grammar framework described in this paper represents the possibility of an actual MTS implementation, in which constraints are not only a control layer over the structure, but represent the structure itself: MTS is not GES plus constraints, provided that dominance is not represented separately from other information.
Bibliography

Aarts, B. (2007). Syntactic gradience: the nature of grammatical indeterminacy. Oxford University Press.
Backofen, R., Rogers, J., and Vijay-Shanker, K. (1995). A first-order axiomatization of the theory of finite trees. Journal of Logic, Language, and Information, 4(1), 5–39.
Balfourier, J.-M., Blache, P., and Rullen, T. V. (2002). From Shallow to Deep Parsing Using Constraint Satisfaction. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002).
Blache, P. (2005). Property grammars: A fully constraint-based theory. In H. C. et al., editor, Constraint Solving and Language Processing, volume LNAI 3438. Springer.
Blache, P., Hemforth, B., and Rauzy, S. (2006). Acceptability prediction by means of grammaticality quantification. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 57–64, Sydney, Australia. Association for Computational Linguistics.
Blackburn, P. and Meyer-Viol, W. (1994). Linguistics, logic and finite trees. Bulletin of the IGPL, 2, 3–31.
Blackburn, P., Gardent, C., and Meyer-Viol, W. (1993). Talking about trees. In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL’93), pages 21–29.
Chomsky, N. (1975). The Logical Structure of Linguistic Theory. Plenum Press.
Cornell, T. and Rogers, J. (2000). Model Theoretic Syntax. In C. L. Lai-Shen and R. Sybesma, editors, The GLOT International State-of-the-Article Book 1, pages 177–198. Mouton de Gruyter, Berlin.
Duchier, D., Prost, J.-P., and Dao, T.-B.-H. (2009). A model-theoretic framework for grammaticality judgements. In Proceedings of the 14th International Conference on Formal Grammar (FG 2009), pages 17–30, Bordeaux, France.
Duchier, D., Dao, T.-B.-H., Parmentier, Y., and Lesaint, W. (2010). Property grammar parsing seen as a constraint optimization problem. In Proceedings of the 15th International Conference on Formal Grammar (FG 2010), pages 82–96, Copenhagen, Denmark.
Foster, J., Wagner, J., and van Genabith, J. (2008). Adapting a WSJ-Trained Parser to Grammatically Noisy Text. In Proceedings of ACL-08: HLT, Short Papers, pages 221–224, Columbus, Ohio. Association for Computational Linguistics.
Gazdar, G., Klein, E., Pullum, G., and Sag, I. (1985). The Logic of Typed Feature Structures. Blackwell.
Goldberg, A. (2003). Constructions: A new theoretical approach to language. Trends in Cognitive Sciences, 7(5), 219–224.
Huddleston, R. and Pullum, G. K. (2002). The Cambridge Grammar of the English Language. Cambridge University Press.
Joubert, A. and Lafourcade, M. (2008). Evolutionary basic notions for a thematic representation of general knowledge. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2008), pages 305–309, Marrakech, Morocco.
Keller, F. (2000). Gradience in Grammar - Experimental and Computational Aspects of Degrees of Grammaticality. Ph.D. thesis, University of Edinburgh.
Maruyama, H. (1990). Structural Disambiguation with Constraint Propagation. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, pages 31–38, Pittsburgh, Pennsylvania, USA. Association for Computational Linguistics.
Prince, A. and Smolensky, P. (1993). Optimality Theory: Constraint Interaction in Generative Grammar. Technical report, TR-2, Rutgers University Cognitive Science Center, New Brunswick, NJ.
Prost, J.-P. (2008). Modelling Syntactic Gradience with Loose Constraint-based Parsing. Ph.D. thesis, Macquarie University, Sydney, Australia, and Université de Provence, Aix-en-Provence, France (cotutelle).
Prost, J.-P. and Lafourcade, M. (2001). Pairing Model-Theoretic Syntax and Semantic Network for Writing Assistance. In 6th International Workshop on Constraint Solving and Language Processing (CSLP’11), pages 56–68, Karlsruhe, Germany.
Pullum, G. (2007). The evolution of model-theoretic frameworks in linguistics. In J. Rogers and S. Kepser, editors, Proceedings of Model-Theoretic Syntax at 10 Workshop, pages 1–10, ESSLLI, Dublin, Ireland.
Pullum, G. and Scholz, B. (2001). On the Distinction Between Model-Theoretic and Generative-Enumerative Syntactic Frameworks. In P. de Groote, G. Morrill, and C. Rétoré, editors, Logical Aspects of Computational Linguistics: 4th International Conference, number 2099 in Lecture Notes in Artificial Intelligence, pages 17–43, Berlin. Springer Verlag.
Pullum, G. K. and Scholz, B. C. (2003). Model-Theoretic Syntax Foundations - Linguistic Aspects. Draft; ask for authors’ written consent prior to citation or quotation.
Sag, I. (2012). Sign-based construction grammar: An informal synopsis. In H. Boas and I. Sag, editors, Sign-Based Construction Grammar, pages 39–170. CSLI.
Sag, I., Wasow, T., and Bender, E. (2003). Syntactic Theory. A Formal Introduction. CSLI.
VanRullen, T. (2005). Vers une analyse syntaxique à granularité variable. Ph.D. thesis, Université de Provence, Aix-Marseille, France.
Wagner, J. (2012). Detecting Grammatical Errors with Treebank-Induced, Probabilistic Parsers. Ph.D. thesis, Dublin City University, Dublin, Ireland.
Wan, S., Dras, M., Dale, R., and Paris, C. (2005). Towards Statistical Paraphrase Generation: Preliminary Evaluations of Grammaticality. In Proceedings of The 3rd International Workshop on Paraphrasing (IWP2005), pages 88–95, Jeju Island, South Korea.
Wong, S.-M. J. and Dras, M. (2011). Exploiting parse structures for native language identification. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1600–1610, Edinburgh, Scotland, UK. Association for Computational Linguistics.
Zwarts, S. and Dras, M. (2008). Choosing the Right Translation: A Syntactically Informed Classification Approach. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 1153–1160, Manchester, UK.
CHAPTER FOUR

Constraints in Optimality Theory: Personal Pronouns and Pointing

Helen de Hoop
This chapter introduces the framework of Optimality Theory (OT) in order to show how interacting and (potentially) conflicting constraints of various nature can be used to analyse the use and interpretation of personal pronouns and pointing in context. I will discuss recent results of research on personal pronouns and pointing signs and gestures and present OT analyses of the data, following Zwets (2014) and de Schepper (2013). Also, I will address the semantic distinction between first and second person pronouns on the one hand and third person pronouns on the other.
4.1 Introduction

The framework of Optimality Theory is rooted in neural network modelling, which has optimisation as a core concept (cf. (Prince and Smolensky, 1997), (Smolensky and Legendre, 2006)). The connections between the nodes in a network can be conceived of as constraints, in the sense that connected units influence each other in an excitatory (positive) or inhibitory (negative) way. The Harmony of a pattern of activation is a numerical measure of how well this pattern of activation conforms to the constraints that are implicit in the connections of the network. These constraints are violable and typically conflicting. If a pattern of activation is maximally harmonic,
it is called optimal. Optimality Theory emerged as a linguistic theory in the early nineties upon realisation that this concept of optimisation in neural network modelling could be used for a theory of grammar as well (cf. (Prince and Smolensky, 1997, 2004)). Optimality Theory views grammar as a set of violable and potentially conflicting constraints that apply to linguistic elements. These constraints vary in (relative) strength. They are ranked in strict domination, which means that violation of a higher ranked constraint is always worse than violation of all lower constraints combined (Prince and Smolensky, 1997, 2004). Given a certain linguistic input, the optimal linguistic output is the one that best satisfies the ranked set of constraints in the grammar. In OT syntax and semantics, differences between production and comprehension result from the different directions of optimisation, either from meaning to form or from form to meaning (cf. (Hendriks and de Hoop, 2001), (Hendriks et al., 2010)). In Optimality Theoretic syntax the linguistic input is the message that a speaker wants to get across. The optimal output is computed as the linguistic form, a word or a sentence, that best expresses this input meaning. In Optimality Theoretic semantics the linguistic input is a specific form, a word or a sentence, and the optimal output is the interpretation that the hearer arrives at after evaluating the possible interpretations of the input against the ranked set of constraints. Thus, a speaker optimises syntactic structure with respect to a semantic input (the speaker’s ‘thought’ or intention), whereas a hearer optimises the interpretation of a certain utterance in a certain context (Hendriks and de Hoop, 2001; Hendriks et al., 2010). Two notoriously conflicting constraints in language use are Economy and Iconicity, which are widely acknowledged among linguists, although formulated differently in various approaches (cf. 
e.g., (Zipf, 1949; Horn, 1984; Haiman, 1985; Haspelmath, to appear)). Economy is a general markedness principle (or rather, a family of constraints) that favours the least complex linguistic outputs, whereas Iconicity is a general principle of faithfulness (again a family of constraints) that requires as many as possible of the distinctions present in the linguistic input to be represented in the linguistic output. One general assumption is that Economy is a principle for the speaker’s sake (ease of production), whereas Iconicity is a principle for the hearer’s sake (ease of comprehension), and because they are necessarily conflicting, languages can be understood never to stop changing as they try to resolve this eternal battle. Personal pronouns reflect this conflict between Economy and Iconicity as well, because they are syntactically reduced noun phrases (Economy)
which differ in the number of features they represent (Iconicity). That is, whereas the world’s 6000 languages or so almost all seem to have personal pronouns that refer to the speaker of the utterance, the addressee(s) of the utterance, and to some other referent, they differ greatly in other distinctions that they make. Semantic distinctions that might but need not be marked in the personal pronouns of a language are number, gender, grammatical function, politeness, and animacy (cf. (Siewierska, 2004; Heine and Song, 2010; de Schepper, 2013)). Moravcsik (to appear) discusses one example of the conflict between Economy and Iconicity in the domain of personal pronouns: the distinction of gender in third person plural pronouns. Gender differentiation in the plural seems less useful than in the singular, because groups of people are not necessarily homogeneous in gender; they are often mixed. Therefore, gender distinction in plural pronouns is less often attested than in singular pronouns in the languages of the world. Satisfaction of Iconicity would predict gender differentiation in third person plural pronouns, while Economy would prefer no distinction marked at all. In this case, two languages spring to mind in which the conflict between the two constraints is resolved differently. Economy outranks Iconicity in English, since this language only has one single form, the gender-neutral third person plural pronoun they, while in French Iconicity outranks Economy, since French differentiates between the feminine third person plural elles and its masculine counterpart ils (Moravcsik, to appear). Of course, the constraints Economy and Iconicity conflict with each other in many different linguistic areas and at many different levels of linguistic expression. We cannot say in general that in English Economy outranks Iconicity while in French it is the other way around. That will vary per construction. 
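Strict domination and re-ranking can be illustrated with a short sketch. In the Python fragment below, evaluation under strict domination is modelled as a lexicographic comparison of violation profiles, so that one violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones. The `Candidate` class, the two candidate inventories and the way violations are counted for Economy and Iconicity are assumptions made for this illustration only; they are not an analysis proposed in the literature cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A possible inventory of third person plural pronouns (hypothetical)."""
    name: str
    forms: tuple
    marks_gender: bool


def economy(cand):
    """Assumed violation count: one mark for every form beyond the first."""
    return len(cand.forms) - 1


def iconicity(cand):
    """Assumed violation count: one mark if the gender distinction is not expressed."""
    return 0 if cand.marks_gender else 1


def optimal(candidates, ranking):
    """Return the candidate with the best violation profile.

    Profiles are tuples ordered from the highest- to the lowest-ranked
    constraint; Python compares tuples lexicographically, which mirrors
    strict domination. Ties go to the first candidate listed.
    """
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))


neutral = Candidate("one gender-neutral form", ("NEUTRAL",), marks_gender=False)
gendered = Candidate("two gendered forms", ("MASC", "FEM"), marks_gender=True)
candidates = [neutral, gendered]

# Economy >> Iconicity: the single neutral form wins (compare English they).
print(optimal(candidates, [economy, iconicity]).name)

# Iconicity >> Economy: the gendered forms win (compare French ils and elles).
print(optimal(candidates, [iconicity, economy]).name)
```

The same two constraints and the same candidates yield different winners depending only on the ranking, which is how the contrast between the English and French pronoun systems is expressed in OT terms. As the text notes, such a ranking holds for this construction and cannot be generalised to the whole language.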
One important characteristic of Optimality Theory is that grammar is not taken to be a modular system (Prince and Smolensky, 2004; Smolensky and Legendre, 2006; Hendriks et al., 2010). A grammar consists of a set of constraints that can be of a different nature, i.e., phonological, morphological, syntactic, semantic, or pragmatic. These constraints are ranked with respect to each other, such that a given input, after evaluation against this set of ranked constraints, gives rise to a winning output candidate, which is the optimal solution to the puzzle presented by that input in view of the conflicting constraints. Also, there is no principled distinction between grammatical (internal) and extra-grammatical (external) constraints that may reinforce each other or compete with each other. This opens up the way for linguistic principles to interact and possibly conflict with other constraints coming from the linguistic and extra-linguistic context. I will illustrate this with three case studies in the domain of personal pronouns and pointing. But before I do so, I will discuss basic characteristics of different types of personal pronouns. In the next section I will focus on a major difference I assume exists between first and second person pronouns on the one hand and third person pronouns on the other.
4.2 First and second versus third person pronouns
Two persons who are necessarily present in face-to-face communication are the speaker and the addressee, whereas other persons are optional. First and second person are therefore the most important, but at the same time the most trivial or redundant in everyday communication. In many languages, pronouns can be omitted if they refer to first and second person subjects, because the meaning of first and second person is easy to recover. Even in English, the addressee is understood but not expressed in imperative constructions (‘Close the door, please!’), while the first person narrator is understood but not necessarily expressed in diary language (‘Went to the market yesterday’). These implicit subjects cannot be called anaphoric, since there is no linguistic antecedent on which they are referentially dependent. Perhaps we can call first and second person reference ‘topical’ in the sense that the speaker and the addressee, when referred to in the sentence, are also the topic of that sentence. In a diary context this seems quite plausible, but in other contexts it may be less obvious. Still, in many languages we see reflexes of a so-called person hierarchy in which first and second person pronouns are ranked higher than third person pronouns and full noun phrases on a scale of topicality or prominence. For example, in many of the world’s languages first and second person pronouns are less frequent or less preferred or sometimes even unacceptable in grammatically less important functions, such as the agent in a by-phrase of a passive construction. The reasons for such a person hierarchy may vary, however, and the dichotomy between first and second person pronouns on the one hand and third person pronouns on the other does not hold across the board (see e.g., (de Swart, 2005; de Schepper and de Hoop, 2012; de Schepper, 2013)).
In formal semantic theory not much attention is usually paid to differences between first and second person pronouns on the one hand and third person pronouns on the other. In formal discourse theory (Discourse Representation Theory), for example, a discourse referent is introduced as a variable in the discourse, e.g., x, which is identified (x = Robert) or further constrained
(‘x is a man’) and then predicated over (‘x started to quarrel’). Pronouns are treated as variables as well, and variables corresponding to pronouns usually come from syntactic antecedents. But as I have pointed out above, first and second person pronouns do not have linguistic antecedents. I will follow Maier and de Schepper (2014) who argue that first and second person pronouns are deictic (like proper nouns) whereas third person pronouns are anaphoric. Personal pronouns are called ‘pronouns’ because they are used instead of ‘nouns’ or more adequately, instead of full noun phrases. However, there are no noun phrases that are first or second person. Noun phrases are always third person. Of course, a noun phrase can be used to refer to the speaker or the addressee, but that does not make these noun phrases first or second person markers themselves. Examples of third person expressions referring to the speaker or the addressee are Mommy referring to the speaker of the question “Shall Mommy help you?” and silly cow, referring to the addressee of the utterance “Sit down, silly cow!” Pronouns are typically grammaticalised expressions originating from (most often) nouns, but also other types of pronouns or terms of spatial deixis (Heine and Song, 2010). In languages such as Indonesian in which politeness plays an important role, but grammatical function does not, we often find a wider range of terms that refer to the speaker or the addressee (cf. (Flannery, 2010)). Nouns may grammaticalise into ‘real’ first and second person pronouns, a process which is probably easier when a language lacks person agreement on the verb (Aalberse, 2009). Bresnan (2001) provides an Optimality Theoretic account of forms and meanings of personal pronouns. She pairs a range of personal pronominal forms – zero, bound, clitic, weak and strong or free pronouns – to their referential role and functions. 
She uses two Harmony constraints on pairings of forms and functions, as formulated in (4.2.1) (Bresnan, 2001, p. 119):

Example 4.2.1. Harmony constraints on pronominals
a. Reduced ⇔ TOP: Pronominals are reduced if and only if they are specialised for topic anaphoricity.
b. Overt ⇔ AGR: Pronominals are inherently specified for person/number/gender if and only if they are overt.

Constraint (4.2.1a) states that pronouns are reduced if and only if they are specialised for ‘topic anaphoricity’. Thus, while pronouns themselves are already reduced forms (Heine and Song, 2010), Bresnan models the fact
that when a language has two types of personal pronouns, one more reduced in form than the other, the reduced forms are used for ‘topic anaphoricity’, whereas the non-reduced forms are used for focus. The interaction between Economy and Iconicity is such that generally, the most reduced form is the least iconic. Therefore, reduced pronouns are typically used for referents which are already salient in the discourse because they are anaphoric and refer to the continuing topic in the conversation (Haiman, 1985). Bresnan (2001) provides OT syntactic analyses, in which a certain semantic input will lead to the use of a certain pronoun in a certain language. In her analyses the two Harmony constraints given in (4.2.1) above are ‘undominated’, so only lower ranked constraints dealing with forms or functions can make the difference in the choice for a certain type of pronoun. I do not think Bresnan’s account can be correct, for several reasons, which I discuss below. In Bresnan’s account personal pronouns share three major types of properties cross-linguistically: PRO that stands for shifting reference and anaphoricity, TOP that stands for the information-structural notion of reference to the topic of the discourse, and AGR representing morphological features including person, number and gender. Bresnan states that whereas all pronouns share the property of PRO, not all personal pronouns have AGR and TOP features (Bresnan, 2001, p. 116). Note, however, as already pointed out above, that whereas personal pronoun systems in the languages of the world may differ as to whether they make distinctions on the basis of number, gender, grammatical function, etcetera, they all make the distinction in person. One could even say that this is the reason why these pronouns are called personal pronouns. Personal pronouns are indeed person markers par excellence (Siewierska, 2004). 
Bresnan (2001) however subsumes the marking of person under the general label AGR, which is according to her not a property that pronouns necessarily have. Thus, according to Bresnan (2001) person marking is not a universal function of personal pronouns. By contrast, I assume that person marking is the basic function of personal pronouns. I follow Heine and Song (2010, p. 118) who define personal pronouns as “words whose primary or only function is to express distinctions of personal deixis.” I also disagree with Bresnan’s (2001) claim that anaphoricity is a universal property of personal pronouns. She argues that: “anaphoricity distinguishes pronominals from basic expressions that are pure deictics, like this and that: though pronominals often derive historically from deictics (Greenberg, 1986, p. xix), they must have anaphoricity as a synchronic property to be functioning as personal pronouns.” (Bresnan, 2001, p. 115)
First, if anaphoricity were indeed a basic property of personal pronouns and person marking was not, we would predict the possible existence of languages that would have only one personal pronoun shifting reference to the speaker (first person) of an utterance, the addressee (second person), or another individual (third person), just depending on the context. However, all languages appear to have personal pronouns that distinguish at least three persons (first, second, and third), which is also one of Greenberg’s (1963) language universals. This does not mean of course that all personal pronouns in a language unambiguously refer to first, second or third person. De Schepper (2013) discusses the neutral singular pronoun paradigm in Jambi City Malay, a language with four levels of politeness: friendly, neutral, respectful to family, respectful to non-family (Lukman, 2009). The personal pronoun awak at the neutral politeness level is ambiguous between first and second person singular, so it can mean either I or you depending on the context. However, at other politeness levels the language does have different pronouns for first and second person. Second, whereas third person pronouns are indeed inherently anaphoric, although they can still be interpreted deictically, first and second person pronouns are in fact not anaphoric at all. So, if anaphoricity were a universal property of personal pronouns, as Bresnan (2001) claims, then first and second person pronouns could not fall into the category of personal pronouns. Bresnan (2001) states that the second occurrence of I in the sentence “I said that I would come” is anaphoric in the sense that it is referentially dependent on the first occurrence of I. In my view, however, both occurrences of I refer to the speaker of the utterance (the main clause), so the second one is not referentially dependent on the first one. 
In indirect speech representation, first and second person pronouns refer to the main clause speaker and addressee, not to the embedded speaker and addressee. This can be seen when we replace the first I by Jane: “Jane said that I would come.” In this sentence there is no antecedent I for the second occurrence of I, but this is not necessary either, since it still refers to the speaker of the utterance. Of course, in a direct speech report, this is different. Whereas the pronouns I and you in the sentence “Jane said to Jacky that I love you” refer to the speaker and the addressee of the main clause respectively, they refer to the individuals Jane and Jacky in the sentence “Jane said to Jacky: ‘I love you’ ”, where Jane and Jacky are the speaker and the addressee of the embedded utterance. Maier and de Schepper (2014) convincingly argue in favour of the view that first and second person pronouns are deictic, while third person pronouns are typically anaphoric. When indeed first and second person pronouns are inherently deictic
and not anaphoric, as I have argued above, Bresnan’s Harmony constraint in (4.2.1a) might run into problems in light of languages that have reduced and non-reduced pronouns also in the first or second person, e.g., French reduced tu ‘you’ versus non-reduced toi ‘you’ or Dutch reduced je ‘you’ and ’k ‘I’ versus non-reduced jij ‘you’ and ik(ke) ‘I’. In these cases we cannot maintain that the reduced variants are specialised for topic-anaphoricity. However, we could maintain that the non-reduced ones are specialised for focus functions, that is, presenting new information or expressing contrast (Bresnan, 2001). In Yimas, for example, most pronominal forms are bound morphemes, and only first and second person pronouns may occur as true independent pronouns as well (Foley, 1991). Compared to English, these free pronouns for first and second person are used relatively infrequently and the semantic-pragmatic effect of using them is to express a contrast. In languages that do not have reduced versus non-reduced pronouns, interaction with prosody gives the same result, compare the English example “It is me (not you) who is to blame”. For non-reduced or stressed pronouns different readings can be obtained, as stress in (spoken) language is used for different reasons (the encoding of new information, the expression of contrast, or to indicate a shift in reference). In the case of third person pronouns in English, stress may result in a deictic reading (“He [speaker points to an unidentified person in the room] does not have a hand-out yet”) or a contrastive reading (“He is a very nice person, but she...”) (de Hoop, 2004). Since first and second person pronouns are inherently deictic, a stressed (or non-reduced) first or second person pronoun will not result in a deictic, but rather in a contrastive reading. If this indeed holds for first and second person pronouns, the same may be true for third person pronouns. 
The generalisation that thus arises is not that personal pronouns are reduced if and only if they are specialised for topic anaphoricity (because all personal pronouns are reduced and topical to a certain degree), but rather that if a language has more and less reduced pronominal forms, the non-reduced ones are specialised for semantic or pragmatic emphasis or stress, indicating focus or contrast in the discourse. In everyday conversation semantically and pragmatically unstressed pronouns are much more frequent than the stressed ones, which is why they are expected to grammaticalise into shorter, more economical (reduced) forms.
The claim that all languages have personal pronouns that distinguish between first, second and third person has sometimes been challenged in the literature by reference to sign languages (see Maier et al. (2013) for extensive discussion). In Sign Language of the Netherlands (NGT), as well as in other sign languages, pointings are taken to be the equivalents of personal pronouns. However, pointings are also used as gestures co-occurring with spoken Dutch and any other spoken language. Also, pointings do not necessarily refer to individuals (person marking), but they can also refer to locations. This is in fact similar to certain personal pronouns in spoken languages that can also find their origin in spatial deixis terms (cf. Heine and Song, 2010). Heine and Song (2010) discuss the evolution of the Late Old Japanese spatial term anata ‘over there’ that became a third person singular expression meaning ‘person over there’ in Early Modern Japanese before it shifted to a second person singular pronoun ‘you’ around 1750. Heine and Song (2010) also provide the following example from Standard Korean:
Example 4.2.2.
I jjog-eun gwaenchan-eunde geu jjog-eun eotteo-seyo
this side-NOM good-and that side-NOM how-END
“I am o.k., and how about you?”
But can we still assume that sign languages have separate pronouns that refer to the speaker (first person), the addressee (second person) and other individuals (third person), when they all seem to have the same form (pointing) that is also used for referring to locations? Maier et al. (2013) argue that, while first person can be characterised as pointing to the (chest of the) speaker, Sign Language of the Netherlands does not have different second and third person pointing signs. It has been argued in the literature on sign language that eye gaze is a grammatical feature that may be used to tell apart second and third person pointing (Berenz, 2002; Alibašić Ciciliani and Wilbur, 2006). In this view, alignment of eye gaze and hand orientation indicates second person pointing, while misalignment between the two indicates third person. Maier et al. (2013) provide the following example as counterevidence to this essential role of eye gaze: A signer working at her computer is interrupted by her officemate.
Unwilling to take her eyes off the screen she turns her body slightly, so that her colleague can see her hands, and she signs “Can’t you see I’m busy?”. Because the speaker does not look at her addressee, the pointing cannot be syntactically or lexically defined as second person, only pragmatically. However, I would like to argue that the distinction between second and third person pointing may not be a difference in syntax nor pragmatics, but a difference in semantics. Following Wechsler (2010) I propose that the interpretation of first and second person pronouns as well as pointing takes place via self-ascription by the speech-act participants, where “[t]he phrase self-ascription includes direct
pronoun interpretation by its self-ascriber (i.e., speaker for first person, addressee for second person) (...) as well as indirect pronoun interpretation by a non-self-ascriber, who makes an inference from another interlocutors’ self-ascription” (Wechsler, 2010, p. 349). Thus first person pointing involves self-ascription by the speaker who produces the pointing and second person pronoun pointing involves self-ascription by the addressee. The colleague in the example above is the addressee of the utterance, and therefore interprets the pointing sign via self-ascription, independently of the eye gaze of the signer. Thus, an important distinction between first and second person pronouns (including pointing) versus third person pronouns (including pointing) lies in their semantics: first and second person pronouns are deictic and interpreted via self-ascription by the speaker and the addressee respectively, while third person pronouns are typically anaphoric. In the next three sections, I will present three case studies in the domain of personal pronouns and pointing and provide OT analyses to explain certain patterns.
4.3 Incremental optimisation of anaphoric third person pronouns
People hardly ever make mistakes in the interpretation of anaphoric pronouns, so apparently their grammar leads them to the intended interpretation of a pronoun in context. I believe this mechanism is optimisation of interpretation (cf. (Hendriks and de Hoop, 2001; Smolensky and Legendre, 2006; Hendriks et al., 2010; de Hoop, 2013)). In Optimality Theoretic (OT) semantics, the input to the process of optimisation is a form (an utterance) and the output an interpretation. In principle there is always an infinite number of interpretations possible, but on the basis of a ranked set of constraints only one candidate interpretation will come out as the optimal one. For example, upon hearing sentence (4.3.1) below, which is example (1a) of Kehler and Rohde (2013), the optimal interpretation of the third person pronoun he will emerge on the basis of the content and the structure of the utterance as well as on our world knowledge, as all reflected in potentially conflicting constraints (Kehler and Rohde, 2013, p. 2): Example 4.3.1. Mitt narrowly defeated Rick, and he quickly demanded a recount. So, how do readers arrive at the optimal interpretation of the anaphoric third person pronoun he when there are two potential antecedents available in
the linguistic context? Kehler and Rohde (2013) argue on the basis of psycholinguistic evidence that pronoun interpretation is affected by probabilistic expectations about coherence relationships within the discourse on the one hand, and expectations about what entities will be mentioned next, on the other. They present a probabilistic model that is capable of explaining pronoun interpretation preferences incrementally. The arguments for their model come from the results of sentence completion studies. This is remarkable, because a sentence completion task is a language production task, whereas the proposed model is a model of language interpretation. De Hoop (2013) proposes an alternative OT analysis in which a constraint on world knowledge outranks a lexical constraint of implicit causality (Garvey et al., 1974; Cozijn et al., 2011) that comes with the verb defeat which in turn outranks a constraint that favours topic continuation. Topic continuation requires the pronoun to refer to the continuing topic, usually the subject of the preceding clause (de Hoop, 2004; Beaver, 2004). Thus, in (4.3.1) several constraints point into the direction of Mitt to be interpreted as the antecedent of he. However, after finishing reading the complete sentence the pronoun he in (4.3.1) is interpreted as referring to Rick, because of our world knowledge. Following the incremental optimisation of interpretation approach, developed by Lamers and de Hoop (2005) and de Hoop and Lamers (2006), and in accordance with the probabilistic account of Kehler and Rohde (2013), we can see that at the stage when the pronoun is encountered, the optimal interpretation of the pronoun is ‘Mitt’: Tableau 4.1: Incremental optimisation of the interpretation of he, stage 1, sentence (4.3.1)
Input: Mitt narrowly defeated Rick, and he...
Candidate           | WORLD KNOWLEDGE | IMPLICIT CAUSALITY | CONTINUING TOPIC
☞ ||he|| = Mitt’    |                 |                    |
  ||he|| = Rick’    |                 | *                  | *
Tableau 4.1 should be read as follows. The left upper cell contains the input, in this case an utterance. The other cells of the left column specify the (relevant) candidate outputs (interpretations) for the anaphoric third person pronoun he in the input. The constraints are ranked from left to right in the other three columns of the tableau. An asterisk indicates that a constraint is violated by the output candidate. It can be seen that interpreting he in (4.3.1) as referring to Rick violates two constraints, whereas interpreting it as Mitt does not violate any of the three constraints in this stage. Therefore, the first
candidate output is optimal in this stage of interpretation of sentence (4.3.1). The optimal output is indicated in the tableau by the pointing finger. Subsequently, when the utterance is completed as in sentence (4.3.1) above, the process of optimisation in stage 2 yields a different winner, as shown in Tableau 4.2. This is called a ‘jump’ to another interpretation by Lamers and de Hoop (2005) and de Hoop and Lamers (2006): Tableau 4.2: Incremental optimisation of the interpretation of he, stage 2, sentence (4.3.1)
Input: Mitt narrowly defeated Rick, and he... ...quickly demanded a recount
Candidate           | WORLD KNOWLEDGE | IMPLICIT CAUSALITY | CONTINUING TOPIC
  ||he|| = Mitt’    | *               |                    |
☞ ||he|| = Rick’    |                 | *                  | *
In the second (final) stage of interpreting sentence (4.3.1), world knowledge overrules the earlier syntactically and lexically based preference of interpreting the pronoun he. A recount is normally demanded by the person who was defeated, in this case Rick. Hence, interpreting he as referring to Mitt would violate this constraint. Because the constraint WORLD KNOWLEDGE outranks the other two constraints, this time the second output candidate becomes optimal. The fact that the optimal interpretation violates two constraints whereas the sub-optimal one violates only one does not play a role in determining the winner, since in Optimality Theory violation of a stronger constraint is worse than violations of all weaker constraints combined.
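The two-stage evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original analysis: the function names `profile` and `optimal` and the dictionary encoding of the tableaux are assumptions introduced here. Strict domination is modelled by comparing violation profiles lexicographically, from the strongest constraint to the weakest.

```python
# Ranking from strongest to weakest, as in Tableaux 4.1 and 4.2.
RANKING = ["WORLD KNOWLEDGE", "IMPLICIT CAUSALITY", "CONTINUING TOPIC"]


def profile(violations, ranking):
    """Return violation counts ordered from strongest to weakest constraint."""
    return tuple(violations.get(name, 0) for name in ranking)


def optimal(candidates, ranking):
    """Pick the candidate with the lexicographically smallest profile.

    Python compares tuples lexicographically, which mirrors strict
    domination: one violation of a stronger constraint outweighs any
    number of violations of weaker constraints.
    """
    return min(candidates, key=lambda name: profile(candidates[name], ranking))


# Stage 1: "Mitt narrowly defeated Rick, and he..."
stage1 = {
    "Mitt": {},
    "Rick": {"IMPLICIT CAUSALITY": 1, "CONTINUING TOPIC": 1},
}

# Stage 2: "...quickly demanded a recount"
stage2 = {
    "Mitt": {"WORLD KNOWLEDGE": 1},
    "Rick": {"IMPLICIT CAUSALITY": 1, "CONTINUING TOPIC": 1},
}

print(optimal(stage1, RANKING))  # Mitt
print(optimal(stage2, RANKING))  # Rick
```

In stage 2 the candidate Rick wins although it incurs two violations, because the single violation incurred by Mitt concerns the highest-ranked constraint; this is the ‘jump’ between the two stages.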
4.4 OT semantic analysis of personal pronouns and pointing
Although third person pronouns are typically anaphoric, they can have deictic readings too. But whereas first and second person pronouns in language are straightforwardly interpreted as referring to the speaker and the addressee of the utterance, third person pronouns cannot be directly interpreted deictically. If they are not anaphoric, they need extra-linguistic clues from the context to be interpreted correctly by the hearer. That is, in a room
full of men the sentence “He will go to Paris tomorrow” uttered out of the blue will not make much sense to the addressee, unless the speaker of the utterance points or nods or looks to a specific individual in the room. Only then the third person pronoun gets a deictic interpretation. Thus, whereas first and second person pronouns are inherently deictic (in the sense that they refer to the speaker and the addressee of the utterance), third person pronouns are inherently anaphoric, and need some sort of pointing in order to obtain a deictic interpretation. Pfau (2011, p. 144) observes that some utterances can indeed not be interpreted without taking into account the accompanying pointing gestures. Additional pointing can also be necessary when second person pronouns are used in a context with multiple addressees, as discussed by de Schepper (2013, p. 34), for example: “You [speaker points at person A] should stay and you [speaker points at person B] should go.” Another example of this is taken from Zwets’ (2014) study on differences and similarities between pointing signs (part of sign language) and pointing gestures (co-occurring with spoken language). While uttering the sentence given in (4.4.1), the speaker points to two imaginary addressees, as can be seen in Figure 4.1 (taken from Zwets (2014)). The speaker in this experiment discusses a dilemma in which she was the only eye-witness of a car accident involving two people, but she was in a hurry to catch her flight (the square brackets indicate co-occurrence with pointing). Example 4.4.1. Ik zou gewoon naar die twee toegaan, zo van ja ik heb gezien wat [jij]1 deed, dat [jij]2 onschuldig was... “I would approach those two, like ‘I have seen what [you]1 did, that [you]2 were innocent’...”
Figure 4.1: Pointing to two imaginary addressees [you]1 and [you]2 (Zwets 2014)
Thus, personal pronouns may be accompanied by pointing gestures for various reasons (Zwets, 2009, 2014). Sign languages, however, only have pointing and no additional linguistic forms that can be considered the equivalents of personal pronouns in spoken language. Consider the following example from Sign Language of the Netherlands (NGT), taken from Zwets (2014): Example 4.4.2. PT:colleague COLLEAGUE INFLUENCE #NGT SIGN “Colleagues influenced my NGT signing”
Figure 4.2: Pointing to introduce a discourse referent (Zwets 2014)
The signer who signs the utterance in (4.4.2), illustrated in Figure 4.2, first points to an empty arbitrary location, and then produces the sign for ‘colleague’, thereby introducing a discourse referent for her colleagues and localising this discourse referent in the surrounding space. The interpretation of the pointing in (4.4.2) thus crucially depends on the linguistic context. Because it precedes the sign for ‘colleague’, the pointing sign itself refers to the colleagues. Clearly, the pointing here is a linguistic element and it has the grammatical function of localising a referent in the surrounding space (cf. (Liddell, 2003)). This grammatical process of localising a referent makes it possible to refer back to this referent later on in the discourse by pointing again at the location (Liddell, 2003; Barberà and Zwets, 2013). That is, pointing to the same location later on in the discourse will be interpreted anaphorically. Hence, a pointing sign can be interpreted as an anaphoric third person pronoun indeed which usually takes as its antecedent the topic of the sentence (Crasborn et al., 2009). Suppose the signer in (4.4.2) would like to continue her utterance by another statement about her colleagues, for instance that they are lousy signers. In spoken language, such a statement could be expressed by using an anaphoric third person plural pronoun they, referring back to the antecedent (my) colleagues. However, there is no separate sign for they or any other personal pronoun in Sign Language of the Netherlands. Avoiding (uneconomical) repetition of the noun phrase (my) colleagues, the signer can simply point to the location at
which the discourse referent was introduced in order to refer to them again in the continuing discourse. Thus, pointing to an empty location that was previously assigned to a discourse referent gives rise to an anaphoric interpretation of the pointing sign (Barberà and Zwets, 2013; Crasborn et al., 2009). Zwets (2014) develops an OT semantic analysis of pointing, which I will summarise in this section. In her corpus data of pointing signs and pointing gestures she found four different types of pointing, to wit: (i) pointing to a location to refer to that location; (ii) pointing to an object to refer to that object; (iii) pointing to an object to refer to another object; (iv) pointing to an empty location to refer to an (absent) object. In order to model the interpretive optimisation of these four types of pointing, she proposes three constraints, which are all independently motivated (but also differently formulated) in the existing literature on sign language and gestures. The three constraints are listed in (1)-(3), ranked from strongest to weakest (Zwets, 2014, Chapter 2):
Definition 1. CONNECT: when pointing coincides with a linguistic element, it is interpreted as referring to the object referred to by the linguistic element.
Definition 2. REF OBJECT: pointing is interpreted as referring to an entity rather than to a place.
Definition 3. STAY LOCAL: pointing is interpreted as pointing to an actual location or object in the surrounding space.
The first constraint captures the fact that people will interpret a linguistic element and an accompanying pointing as referring to the same entity (Clark and Marshall, 1981; Liddell, 1995). This constraint outranks the other two constraints. Zwets (2014) shows that the weaker two constraints cannot be ranked with respect to each other, because typically when they are in conflict, the strongest constraint, CONNECT, will resolve the conflict.
The constraint REF OBJECT captures the fact that people interpret pointing as referring to something rather than to somewhere (Clark, 2003, 2005). The third constraint relates to a basic function of language, that is reference to the immediate context, the here and now, or as Zwets calls it, the surrounding space (cf. Hockett, 1960; Zwets, 2014). I will now show how Zwets accounts for the interpretations of pointing co-occurring with speech. The data below are all adopted from Zwets’ (2014) study. First, consider an example in which the participant in a gesture elicitation experiment is pointing
at a drawing in her lap, as can be seen in Figure 4.3, while uttering the sentence in (4.4.3) (as before, the square brackets indicate the co-occurrence with the pointing gesture): Example 4.4.3. [Janke], die is ook getrouwd en die heeft ook twee kinderen “[Janke], she is also married and she also has two children”
Figure 4.3: Pointing at drawing while uttering sentence (4.4.3)
In Figure 4.3 the speaker of sentence (4.4.3) is pointing at the drawing of a family tree, while uttering the proper noun Janke. The pointing could in principle refer to a location on the drawing, an object such as the drawing itself, or to a person, Janke, who is not present in the surrounding space. Zwets (2014) claims that the intended referent of the pointing in Figure 4.3 is indeed Janke. The OT semantic Tableau 4.3 shows how this interpretation becomes optimal, given the three constraints listed in (1)-(3) and their ranking. Note that the dotted line between two of the constraints (rendered as ¦ below) indicates that the ranking between these constraints is undetermined. The two candidate interpretations that satisfy the weakest constraint, STAY LOCAL, are references to the location on the drawing and to the drawing itself. Both these candidates, however, violate the strongest constraint CONNECT, because the speaker utters the name Janke while pointing, and this should be taken into account in finding the intended referent. Reference to a location on the drawing additionally violates REF OBJECT, the constraint that prefers pointing to refer to an object rather than to a location. Therefore, the constraint CONNECT ultimately determines the interpretation.
Tableau 4.3: Interpretive optimisation of pointing in sentence (4.4.3), Figure 4.3
Input form: “[Janke], she is also married and she also has two children”; [...] = pointing to drawing in lap (Figure 4.3)
Candidate     | CONNECT | REF OBJECT ¦ STAY LOCAL
  location’   | *       | *          ¦
  drawing’    | *       |            ¦
☞ Janke’      |         |            ¦ *
The addressee understands that the
speaker is referring to a person Janke, who is not present in the surrounding space, by pointing to the representation of this person on the drawing. The following example also illustrates referring to an absent individual, but in this case, there is no object in the surrounding space that represents this individual. By using a combination of speech and gesture the speaker is able to prevent you from referring to the addressee who is present in the local context. Instead the second person pronoun you refers to an imaginary addressee in the story the speaker is telling (Zwets 2014). Example 4.4.4. Ik zou dan niet heel gemakkelijk op die persoon afgaan en zeggen [jij steelt] “I would not easily approach that person and say [you’re stealing]” Note that while the second person pronoun you gets a deictic interpretation, it does not refer to the (local) addressee, but rather to the addressee of the embedded clause, the object of zeggen ‘say’ in sentence (4.4.4). De Schepper (2013) argues that the identities of the speaker and the addressee(s) are always fixed at the beginning of the sentence, while the identity of others is not. Because the identity of the speaker and the addressee is already fixed at the beginning of the sentence, the content of the sentence cannot influence or change who is the speaker or the addressee. I assume, following Wechsler (2010), that a second person pronoun is necessarily interpreted by the addressee via self-ascription. When the second person pronoun is embedded in direct speech or in the course of a story, it gets interpreted as an embedded addressee, which I will refer to as an ‘imaginary’ addressee. The process of self-ascription still takes place, but is overruled by the context in which you is used. This is what Wechsler (2010) calls ‘indirect’ pronoun interpretation
by a non-self-ascriber. I assume that indirect pronoun interpretation is triggered by a very general constraint that requires the interpretation of a lexical item to fit within the context, as argued for by Hogeweg (2009) and Zwarts (2004):
Definition 4. FIT: interpretations should not conflict with the (linguistic) context.
In order to arrive at the right optimal interpretation, the constraint FIT should be stronger than STAY LOCAL but weaker than CONNECT. This process of arriving at the optimal interpretation of you in context is illustrated in the Optimality Theoretic semantic Tableau 4.4. The constraint FIT is formulated very generally, and somehow we are expected to deduce from the input whether a certain reading is in conflict with the context. A detailed analysis as to which syntactic, semantic, and pragmatic factors exactly determine the reading of a certain pronoun in a certain context is however outside the intended scope of this paper.
Figure 4.4: Pointing to an imaginary addressee while uttering sentence (4.4.4) (Zwets 2014)
Tableau 4.4: Interpretive optimisation of pointing in sentence (4.4.4), Figure 4.4
Input form: “I would not easily approach that person and say [you’re stealing]”; [...] = pointing to an empty location
Candidate          | CONNECT | FIT | REF OBJECT ¦ STAY LOCAL
  location’        | *       |     | *          ¦
  you(local)’      |         | *   |            ¦
☞ you(imaginary)’  |         |     |            ¦ *
The third and final example of interpretive optimisation of pointing is given in sentence (4.4.5), Figure 4.5 (Zwets, 2014):
Example 4.4.5.
Je kan ’m ook [hierzo] doen
“You can also put it [over here]”
In this example the participants are discussing their family trees.
Figure 4.5: Pointing at paper while uttering sentence (4.4.5) (Zwets 2014)
The woman on the left is drawing a tree that does not fit on the paper, however, so the
other woman utters sentence (4.4.5) while pointing at a location on the piece of paper. Because she utters hierzo ‘over here’ while pointing, the intended referent of the pointing is taken to be a location on the paper, in accordance with the constraint CONNECT. Hence, in this instance STAY LOCAL is satisfied, whereas REF OBJECT is violated in order to satisfy the stronger constraint CONNECT. This is illustrated in Tableau 4.5. Again, CONNECT determines the eventual interpretation of the pointing. It leads the addressee to interpret the pointing as indicating a location, thus satisfying STAY LOCAL but violating REF OBJECT.
Tableau 4.5: Interpretive optimisation of pointing in sentence (4.4.5), Figure 4.5
Input form: “You can also do it [over here]”; [...] = pointing to paper (Figure 4.5)
Candidate        | CONNECT | REF OBJECT ¦ STAY LOCAL
☞ location’      |         | *          ¦
  paper’         | *       |            ¦
  family tree’   | *       |            ¦
Zwets (2014) distinguishes one other interpretation that she found for pointing signs but not for pointing gestures. This type of pointing was illustrated by the example in sentence (4.4.2), Figure 4.2 above. While the constraint CONNECT is about connecting pointing to a linguistic element, a
pointing sign is a linguistic element itself, which can be interpreted either deictically or anaphorically (unlike pointing gestures). If the location that is pointed at is empty and has not been used for introducing another discourse referent yet, there is no potential antecedent present in the input, and the pointing is interpreted as referring to the object that is expressed by the adjacent noun phrase. Hence, we arrive at the OT Tableau 4.6. Tableau 4.6: Interpretive optimisation of pointing in sentence (4.4.2), Figure 4.2 (page 74)
Input form: “[POINT] COLLEAGUES INFLUENCE #NGT SIGN”; [...] = pointing to empty location (Figure 4.2)
Candidate        | CONNECT | REF OBJECT ¦ STAY LOCAL
  location’      | *       | *          ¦
☞ colleagues’    |         |            ¦
The constraint STAY LOCAL is not violated in Tableau 4.6 because the pointing sign is actually used to indicate an arbitrary but important location in the surrounding space. In this location the discourse referent ‘colleagues’ is introduced, so that this location can be used in the subsequent discourse to refer anaphorically to the antecedent ‘colleagues’ again (Zwets, 2014). This concludes the discussion of interpretive optimisation of pointing as developed by Zwets (2014).
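The fact that REF OBJECT and STAY LOCAL are not ranked with respect to each other can be made concrete with a small Python sketch. It is offered as an illustration under stated assumptions, not as Zwets’ own formalisation: the names `STRATA`, `total_rankings` and `winners` are hypothetical, and the partial ranking is interpreted here as the set of total rankings compatible with it. The violation marks for Tableau 4.3 follow the text; for Tableau 4.5 only the profile of the location candidate is stated in the text, and the CONNECT violations of the other two candidates are assumed.

```python
from itertools import permutations

# Strata from strongest to weakest; constraints that share a stratum
# are not ranked with respect to each other.
STRATA = [["CONNECT"], ["REF OBJECT", "STAY LOCAL"]]


def total_rankings(strata):
    """Yield every total ranking compatible with the partial ranking."""
    if not strata:
        yield []
        return
    for head in permutations(strata[0]):
        for tail in total_rankings(strata[1:]):
            yield list(head) + tail


def winners(candidates, strata):
    """Return the candidates that are optimal under some compatible ranking.

    If two candidates had identical profiles, only the first one would be
    reported; that case does not arise in the tableaux encoded below.
    """
    result = set()
    for ranking in total_rankings(strata):
        best = min(
            candidates,
            key=lambda name: tuple(candidates[name].get(c, 0) for c in ranking),
        )
        result.add(best)
    return result


# Tableau 4.3: pointing at the drawing while uttering "Janke".
tableau_4_3 = {
    "location": {"CONNECT": 1, "REF OBJECT": 1},
    "drawing": {"CONNECT": 1},
    "Janke": {"STAY LOCAL": 1},
}

# Tableau 4.5: pointing at the paper while uttering "hierzo".
# The CONNECT marks for "paper" and "family tree" are assumptions.
tableau_4_5 = {
    "location": {"REF OBJECT": 1},
    "paper": {"CONNECT": 1},
    "family tree": {"CONNECT": 1},
}

print(winners(tableau_4_3, STRATA))  # {'Janke'}
print(winners(tableau_4_5, STRATA))  # {'location'}
```

In both cases the same candidate wins under either order of the two weaker constraints, which is in line with the observation that CONNECT resolves the conflict and the relative ranking of REF OBJECT and STAY LOCAL cannot be determined from these data.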
4.5 OT syntactic analysis of personal pronouns and pointing
De Schepper (2013) provides an Optimality Theoretic account of the use of personal pronouns versus pointing in spoken language and sign language. He introduces four independently motivated constraints to deal with this pattern. The four constraints are listed in (5)-(8), ranked from strongest to weakest (slightly adapted from de Schepper (2013, p. 34-35)):
Definition 5. *AMBIGUITY: make clear which contextual entity is referred to.
Definition 6. DOMINANCE: use the dominant medium when communicating.
Definition 7. *INNOVATIONS: do not invent new elements.
Definition 8. ECONOMY: reduce the number of elements used.
The constraint in (5) is the strongest one, and it is addressee-oriented: the addressee should be able to get the right interpretation for the given linguistic form. The constraint in (6) simply requires hearing people to use speech and deaf people to use signs, i.e., it requires everybody to use their dominant language. De Schepper (2013) uses the constraint in (7) to prefer pointing, which he assumes has “always been a part of modern man” (de Schepper, 2013, p. 35), over the use of personal pronouns, which he considers an innovation. Finally, the well-known principle Economy comes in many variations, but is defined in (8) as a matter of number of elements. When does the interaction of these constraints lead to the use of a personal pronoun or pointing or both? Consider the OT syntactic Tableau 4.7 of choosing between a personal pronoun and pointing for deictic reference to the speaker in case of a deaf speaker whose dominant language is Sign Language of the Netherlands (NGT) (de Schepper, 2013). When signers wish to refer to themselves, but do not use a linguistic element, neither a pointing nor a personal pronoun, an addressee will not arrive at the intended interpretation. Therefore, in order to avoid ambiguity, signers have to use a linguistic element to refer to themselves. Only the zero element will therefore violate the constraint *AMBIGUITY.
Tableau 4.7: Expressive optimisation of reference to the speaker in sign language context
Input meaning: 1SG; Context: NGT speaker
Candidate                       | *AMBIGUITY | DOMINANCE | *INNOVATIONS | ECONOMY
  ∅                             | *          |           |              |
☞ pointing                      |            |           |              | *
  personal pronoun              |            |           | *            | *
  pointing + personal pronoun   |            |           | *            | **
We assume that signers will not use spoken personal pronouns (although they usually can use speech and will of course do so in certain circumstances), therefore personal pronoun in the tableau above refers to the use of a separate linguistic
sign – not a pointing – that could be interpreted as the pronoun I. Therefore, none of the four output candidates violates the constraint DOMINANCE, as in all these cases the signers are assumed to use their dominant language (a sign) to refer to themselves. The use of pointing is ‘natural’, however, whereas using a separate sign, an invented personal pronoun, violates the ban against using new elements in language. Finally, the number of linguistic elements determines the number of violations of ECONOMY. The first candidate violates *AMBIGUITY, whereas the third and the fourth candidate violate *INNOVATIONS as well as ECONOMY. Hence, the second candidate, that only violates the weakest constraint ECONOMY, wins the competition. That is, signers are predicted to use pointing to refer to themselves. By contrast, speakers of Dutch will not end up using a pointing gesture to refer to themselves (unless under very specific circumstances, for instance when it is not allowed or considered impolite to speak). For speakers of Dutch the constraint that requires speakers to stick to the dominant modality – in this case speech – comes into play. Tableau 4.8 illustrates expressive optimisation of reference to the speaker in spoken Dutch (de Schepper, 2013, p. 36). Tableau 4.8 shows that the first candidate is again ruled out because of *AMBIGUITY, while the second candidate now loses the competition because of DOMINANCE. The third and fourth candidates both violate *INNOVATIONS and ECONOMY, but the fourth one, that consists of two linguistic elements, violates the latter constraint twice. Therefore, the third candidate output comes out as the winner. Speakers of Dutch will use a personal pronoun (ik ‘I’ or mij ‘me’) to refer to themselves.
Tableau 4.8: Expressive optimisation of reference to the speaker in spoken language context
Input meaning: 1SG; Context: Speaker of Dutch
Candidate                       | *AMBIGUITY | DOMINANCE | *INNOVATIONS | ECONOMY
  ∅                             | *          |           |              |
  pointing                      |            | *         |              | *
☞ personal pronoun              |            |           | *            | *
  pointing + personal pronoun   |            |           | *            | **
In the above two contexts it is fairly clear why signers use pointing and speakers use a personal pronoun to refer to themselves. But we have already come across sentence (4.4.1), an example where a personal pronoun and pointing co-occur. This optimal outcome in the case of multiple addressees as in (4.4.1) above (“I have seen what [you]1 did, that [you]2 were innocent”) is illustrated in Tableau 4.9.

Tableau 4.9: Expressive optimisation of reference to multiple addressees in spoken language context

Input meaning: 2SG1 + 2SG2
Context: Speaker of Dutch

Candidate                     | *AMBIGUITY | DOMINANCE | *INNOVATIONS | ECONOMY
∅                             | *          |           |              |
pointing                      |            | *         |              | *
personal pronoun              | *          |           | *            | *
pointing + personal pronoun   |            |           | *            | **
By now, readers can check for themselves that the fourth candidate, by which the speaker will use a personal pronoun you and an accompanying pointing gesture, wins the competition. Avoidance of ambiguity is clearly not the only reason why in Dutch and other languages gestures co-occur with speech; other factors may involve emphasis, contrast, discourse structure, and social relationships between the interlocutors (cf. Enfield et al., 2007; Zwets, 2014).
4.6 CONCLUSION

The aim of this chapter was to introduce the framework of Optimality Theory and to show its workings in the domain of personal pronouns and pointings, where constraints of various natures appear to interact and possibly conflict. I presented three case studies to illustrate the cross-modularity of Optimality Theory. First, in interpreting anaphoric personal pronouns incrementally, a constraint involving world knowledge was shown to outrank lexical and syntactic constraints. Second, in interpreting pointing gestures, a constraint bridging two modalities – speech and gesture – was shown to outrank semantic and pragmatic constraints. Third, in producing personal pronouns and/or pointing, a sociolinguistic constraint on using one’s dominant language was shown to outrank syntactic constraints on inventing new linguistic elements and on economy.

A major distinction was argued to exist between different types of personal pronouns: first and second person pronouns are typically deictic, whereas third person pronouns are typically anaphoric. Pointing gestures can refer to first, second, and third person, but they are like first and second person pronouns in the sense that they are typically deictic. Only in sign language can pointing also be interpreted anaphorically, which makes sense because sign language lacks a separate class of personal pronouns other than pointings (Zwets, 2014; de Schepper, 2013).
ACKNOWLEDGEMENTS

The research presented here was financially supported by the Netherlands Organisation for Scientific Research (grant 360-70-313), which is gratefully acknowledged. I would like to thank Thijs Trompenaars for his editorial help and Vera van Mulken for checking the English.
BIBLIOGRAPHY

Aalberse, S. (2009). Inflectional Economy and Politeness: Morphology-internal and morphology-external factors in the loss of second person marking in Dutch. Ph.D. thesis, University of Amsterdam.
Alibašić Ciciliani, T. and Wilbur, R. B. (2006). Pronominal system in Croatian Sign Language. Sign Language and Linguistics, 9, 95–132.
Barberà, G. and Zwets, M. (2013). Pointing in sign language and spoken language: anchoring vs. identifying. Sign Language and Linguistics, 13, 491–515.
Berenz, N. (2002). Insights into person deixis. Sign Language and Linguistics, 3, 137–142.
Bresnan, J. (2001). The emergence of the unmarked pronoun. In G. Legendre, J. Grimshaw, and S. Vikner, editors, Optimality-Theoretic Syntax, pages 113–142. The MIT Press, Cambridge, MA.
Clark, H. H. (2003). Pointing and placing. In S. Kita, editor, Pointing: where language, culture, and cognition meet, pages 243–268. Lawrence Erlbaum Associates, NJ.
Clark, H. H. (2005). Coordinating with each other in a material world. Discourse Studies, 7, 507–525.
Clark, H. H. and Marshall, C. R. (1981). Definite reference and mutual knowledge. In A. K. Joshi, B. Webber, and I. Sag, editors, Elements of discourse understanding, pages 10–63. Cambridge University Press, Cambridge.
Cozijn, R., Commandeur, E., Vonk, W., and Noordman, L. G. (2011). The time course of the use of implicit causality information in the processing of pronouns: A visual world paradigm study. Journal of Memory and Language, 64, 81–40.
Crasborn, O., van der Kooij, E., Ros, J., and de Hoop, H. (2009). Topic agreement in NGT (Sign Language of the Netherlands). The Linguistic Review, 26, 355–370.
de Hoop, H. (2004). On the interpretation of stressed pronouns. In R. Blutner and H. Zeevat, editors, Optimality Theory and Pragmatics, pages 25–41. Palgrave / Macmillan, New York.
de Hoop, H. (2013). Incremental optimization of pronoun interpretation. Theoretical Linguistics, 39, 87–93.
de Hoop, H. and de Swart, P. (2004). Contrast in discourse. Journal of Semantics, 21, 87–93.
de Hoop, H. and Lamers, M. (2006). Incremental distinguishability of subject and object. In L. Kulikov, A. Malchukov, and P. de Swart, editors, Case, Valency and Transitivity, pages 269–287. John Benjamins, Amsterdam / Philadelphia.
de Schepper, K. (2013). You and me against the world? First, second and third person in the world’s languages. Ph.D. thesis, Radboud University Nijmegen.
de Schepper, K. and de Hoop, H. (2012). Construction-dependent hierarchies. In W. Abraham and E. Leiss, editors, Modality and theory of mind elements across languages, pages 383–404. De Gruyter, Berlin.
de Swart, P. (2005). Cross-modularity in active to passive alternations. In J. Doetjes and J. van de Weijer, editors, Linguistics in the Netherlands 2005, pages 191–202. John Benjamins, Amsterdam.
Enfield, N. J., Kita, S., and de Ruiter, J. P. (2007). Primary and secondary pragmatic functions of pointing gestures. Journal of Pragmatics, 39, 1722–1741.
Flannery, G. (2010). Open and closed systems of self-reference and addressee-reference in Indonesian and English: a broad typological distinction. In Y. Treis and R. D. Busser, editors, Selected Papers from the 2009 Conference of the Australian Linguistic Society.
Foley, W. A. (1991). The Yimas Language of New Guinea. Stanford University Press, Stanford, CA.
Garvey, C. and Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5, 459–464.
Greenberg, J. (1963). Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg, editor, Universals of Language, pages 73–113. MIT Press, Cambridge, MA.
Greenberg, J. (1986). Introduction: some reflections on pronominal systems. In U. Wiesemann, editor, Pronominal Systems, pages xvii–xxi. Gunther Narr Verlag, Tübingen.
Haiman, J. (1985). Natural Syntax: Iconicity and Erosion. Cambridge University Press, Cambridge.
Haspelmath, M. (2014). On system pressure competing with economic motivation. In B. MacWhinney, A. Malchukov, and E. Moravcsik, editors, Competing motivations in grammar and usage. Oxford University Press, Oxford.
Heine, B. and Song, K.-A. (2010). On the genesis of personal pronouns: Some conceptual sources. Language and Cognition, 2, 117–147.
Hendriks, P. and de Hoop, H. (2001). Optimality theoretic semantics. Linguistics and Philosophy, 24, 1–32.
Hendriks, P., de Hoop, H., Krämer, I., de Swart, H., and Zwarts, J. (2010). Conflicts in Interpretation. Equinox, London.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 88–111.
Hogeweg, L. (2009). Word in progress. On the interpretation, acquisition, and production of words. Ph.D. thesis, Radboud University Nijmegen.
Horn, L. (1984). Toward a new taxonomy for pragmatic inference: Q-based and R-based implicature. In D. Schiffrin, editor, Meaning, Form, and Use in Context: Linguistic Applications, pages 11–42. Georgetown University Press, Washington, DC.
Kehler, A. and Rohde, H. (2013). A probabilistic reconciliation of coherence-driven and centering-driven theories of pronoun interpretation. Theoretical Linguistics, 39, 1–37.
Lamers, M. and de Hoop, H. (2005). Animacy information in human sentence processing: an incremental optimization of interpretation approach. In H. Christiansen, P. R. Skadhauge, and J. Villadsen, editors, Constraint Solving and Language Processing, Lecture Notes in Computer Science, volume 3438, pages 158–171. Springer Verlag, Berlin.
Liddell, S. K. (1995). Real, surrogate, and token space: grammatical consequences in ASL. In K. Emmorey and J. Reilly, editors, Language, Gesture, and Space, pages 19–41. Lawrence Erlbaum Associates, Hillsdale, NJ.
Liddell, S. K. (2003). Grammar, gesture and meaning in American Sign Language. Cambridge University Press, Cambridge.
Lukman (2009). The role of politeness in the competition of Jambi City Malay local pronouns. MA Thesis, Radboud University Nijmegen.
Maier, E. and de Schepper, K. (2014). Fake indexicals: morphosyntax, or pragmasemantics? Manuscript University of Groningen & Radboud University Nijmegen (submitted).
Maier, E., de Schepper, K., and Zwets, M. (2013). The pragmatics of person and imperatives in Sign Language of the Netherlands. Research in Language, 11(4), 359–376.
Moravcsik, E. (2014). Introduction. In B. MacWhinney, A. Malchukov, and E. Moravcsik, editors, Competing motivations in grammar and usage. Oxford University Press, Oxford.
Pfau, R. (2011). A point well taken: on the typology and diachrony of pointing. In D. Napoli and G. Mathur, editors, Deaf around the world. The impact of language, pages 144–163. Oxford University Press, Oxford.
Prince, A. and Smolensky, P. (1997). Optimality: from neural networks to universal grammar. Science, 14, 1604–1610.
Prince, A. and Smolensky, P. (2004). Optimality Theory: constraint interaction in generative grammar. Blackwell Publishing, Malden.
Siewierska, A. (2004). Person. Cambridge University Press, Cambridge.
Smolensky, P. and Legendre, G. (2006). The Harmonic Mind. From neural computation to Optimality-Theoretic Grammar. MIT Press, Cambridge, MA.
Wechsler, S. (2010). What ‘you’ and ‘I’ mean to each other: Person indexicals, self-ascription, and theory of mind. Language, 86, 332–365.
Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. New York.
Zwarts, J. (2004). Competition between word meanings: The polysemy of (a)round. In C. Meier and M. Weisgerber, editors, Proceedings of SuB 8, pages 349–360. University of Konstanz, Konstanz.
Zwets, M. (2009). Clusivity of Dutch ‘wij’: evidence from pointing. In B. Botma and J. van Kampen, editors, Linguistics in the Netherlands, volume 26, pages 139–148. John Benjamins, Amsterdam.
Zwets, M. (2014). Locating the difference. A comparison between Dutch pointing gestures and pointing signs in Sign Language of the Netherlands. Ph.D. thesis, Radboud University Nijmegen.
Part II Recent Advances in Constraints and Language Processing
CHAPTER FIVE
–SYNTAX 1–
CONSTRAINT-DRIVEN GRAMMAR DESCRIPTION
BENOÎT CRABBÉ, DENYS DUCHIER, YANNICK PARMENTIER, SIMON PETITJEAN
5.1 INTRODUCTION

For a large number of applications in Natural Language Processing (NLP) (e.g., Dialogue Systems, Automatic Summarisation, Machine Translation), one needs a fine-grained description of language. By fine-grained, it is generally meant that this description should precisely define the relations between the constituents of the sentence (often referred to as deep syntactic description) and also, when possible, contain information about the meaning of the sentence (semantic representation).1 For such a fine-grained description of language to be processed by a computer, it is usually encoded in a mathematical framework called Formal Grammar.

1 In this work, we focus on the sentence level, thus ruling out prosody, pragmatics, discourse and other levels of language that may also be useful for NLP applications.

Since Chomsky’s seminal work on formal syntactic descriptions (Chomsky, 1957), several grammar formalisms have been proposed to describe natural language’s syntax. These generally rely on a rewriting system, e.g., Context-Free Grammar (CFG). In such a system, one describes natural language syntax as a rewriting of strings, using two sets of symbols: non-terminal symbols (syntactic categories of syntactic constituents, such as NP for Noun Phrase, V for Verb, etc.), and terminal symbols (lexical items).2 One of the non-terminal symbols is distinguished and called the axiom of the grammar. It is the starting point of a derivation. A grammar rule is then defined as the rewriting of a non-terminal symbol into a list of (terminal or non-terminal) symbols. As an illustration, consider the toy example below (terminal symbols start with lowercase letters, the axiom is S, standing for Sentence).

S → NP VP
NP → Det N
VP → V NP
Det → the
N → cat
N → mouse
V → eats
Such a CFG can be used to derive sentences such as "the cat eats the mouse". CFG has been shown to suffer from a lack of expressivity. Indeed, for various syntactic phenomena, one cannot define linguistically-motivated rewriting rules which yield the expected sentences. One can for instance cite the case of cross-serial dependencies in Dutch (Bresnan et al., 1982). Several grammar formalisms have thus been proposed to deal with this lack of expressivity.3 Here, we will consider three grammar formalisms whose expressivity lies beyond that of CFG, making them suitable for the description of the syntax of several natural languages. These formalisms are Tree-Adjoining Grammar (TAG), Lexical Functional Grammar (LFG) and Property Grammar (PG). They have been used to describe several electronic grammars, e.g. for English (XTAG Research Group, 2001), Chinese (Fang and King, 2007), or French (Prost, 2008).

TAG will serve as a basis for illustrating one of the main issues raised when developing real-size grammatical resources, namely redundancy. Redundancy is the fact that grammar rules often share significant common substructures. This redundancy greatly impacts grammar development and maintenance (how can coherence between grammar rules be ensured?). LFG and PG will serve as support formalisms in the context of cross-framework grammar engineering.

2 Roughly speaking, lexical items correspond to words. To be more precise, we should also consider non-atomic lexical items (so-called Multi-Word Units).
3 The question of what expressivity is required to describe the syntax of natural languages is an open question.
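As a rough illustration of how the toy CFG above licenses a sentence, the following sketch encodes the seven rules as a Python dictionary and checks whether a word sequence can be derived from the axiom S. The dictionary encoding and the function names (derive, recognises) are assumptions of this sketch, not part of any grammar toolkit; the naive top-down search only terminates because the toy grammar has no left-recursive rule.

```python
# Toy CFG from the text: non-terminals are the dictionary keys,
# terminals (lexical items) are the lowercase strings that are not keys.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["cat"], ["mouse"]],
    "V": [["eats"]],
}


def derive(symbols, words):
    """Return True if the symbol list can be rewritten into exactly `words`."""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:
        # Terminal symbol: it must match the next word of the input.
        return bool(words) and words[0] == first and derive(rest, words[1:])
    # Non-terminal symbol: try every rewriting rule for it.
    return any(derive(rhs + rest, words) for rhs in GRAMMAR[first])


def recognises(sentence):
    """Check whether the sentence can be derived from the axiom S."""
    return derive(["S"], sentence.split())


print(recognises("the cat eats the mouse"))
print(recognises("the cat eats"))
```

The second call fails because VP requires an object NP in this toy grammar.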
Let us first introduce TAG.4 TAG is a tree rewriting system, where elementary trees can be combined via two rewriting operations, called substitution and adjunction. Substitution consists in replacing a leaf node labelled with ↓ with a tree whose root has the same syntactic category as this leaf node. Adjunction consists in replacing an internal node with a tree where both the root node and one of the leaf nodes (the foot node, labelled with ∗) have the same syntactic category as this internal node. As an illustration, consider Figure 5.1 below. It shows (i) the substitution of the elementary tree associated with the noun John into the elementary tree associated with the verb sleeps, and (ii) the adjunction of the elementary tree associated with the adverb deeply into the tree associated with sleeps.5

[Tree diagrams: the elementary trees for John (NP), sleeps (S over NP↓ and VP, with VP over V) and deeply (VP over VP∗ and ADV), followed by the derived tree for John sleeps deeply]
Figure 5.1: Tree rewriting in TAG
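The two rewriting operations can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions: the Node class, the lex helper and the functions substitute, adjoin and words are names invented for this sketch, and the feature structures that constrain substitution and adjunction via unification are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cat: str                                   # syntactic category or lexical item
    children: list = field(default_factory=list)
    subst: bool = False                        # substitution site, e.g. NP↓
    foot: bool = False                         # foot node of an auxiliary tree, e.g. VP∗
    word: str = None                           # set on lexical leaves only


def lex(w):
    """Build a lexical leaf."""
    return Node(cat=w, word=w)


def substitute(tree, initial):
    """Copy `tree`, replacing its leftmost matching substitution leaf by `initial`."""
    done = False

    def walk(n):
        nonlocal done
        if not done and n.subst and n.cat == initial.cat:
            done = True
            return initial
        return Node(n.cat, [walk(c) for c in n.children], n.subst, n.foot, n.word)

    return walk(tree)


def adjoin(tree, aux, target_cat):
    """Copy `tree`, adjoining `aux` at the first internal node of category `target_cat`."""
    done = False

    def plug(a, subtree):
        # Copy the auxiliary tree, hanging the excised subtree on its foot node.
        if a.foot:
            return subtree
        return Node(a.cat, [plug(c, subtree) for c in a.children], a.subst, a.foot, a.word)

    def walk(n):
        nonlocal done
        if not done and n.children and n.cat == target_cat == aux.cat:
            done = True
            return plug(aux, n)
        return Node(n.cat, [walk(c) for c in n.children], n.subst, n.foot, n.word)

    return walk(tree)


def words(n):
    """Read the lexical items off the frontier, left to right."""
    if n.word is not None:
        return [n.word]
    return [w for c in n.children for w in words(c)]


john = Node("NP", [lex("John")])
sleeps = Node("S", [Node("NP", subst=True), Node("VP", [Node("V", [lex("sleeps")])])])
deeply = Node("VP", [Node("VP", foot=True), Node("ADV", [lex("deeply")])])

derived = adjoin(substitute(sleeps, john), deeply, "VP")
print(words(derived))
```

Both functions return new trees rather than modifying their arguments, so the elementary trees can be reused in other derivations.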
Note that most TAG implementations are such that any tree of the grammar has at least one leaf node labelled with a lexical item (the head of the syntactic constituent described by the tree). Such a TAG is called Lexicalised TAG (LTAG).6 In Lexicalised TAG, the grammar can be seen as a lexicon mapping lexical items to their associated syntactic structures. Following the XTAG grammar for English (XTAG Research Group, 2001), redundancy is partly captured by considering tree templates (or tree schemata) instead of plain TAG trees. Such a tree template is a factorisation of TAG trees differing only in their lexical items. As examples of tree templates, consider Figure 5.2.7 When parsing sentences using tree templates, the appropriate lexical word is inserted dynamically by the parser as a child of the anchor (marked ⋄).

4 For a detailed introduction to TAG, see Joshi and Schabès (1997).
5 For sake of precision, note that the nodes labelled with syntactic categories (i.e., non-lexical nodes) are also associated with two feature structures constraining via unification the substitutions and adjunctions taking place during rewriting (Joshi and Schabès, 1997).
6 Throughout this chapter, we only consider Lexicalised TAG.
7 The trees depicted in this chapter are motivated by the French grammar of Abeillé (2002), who provides linguistic justifications in particular for not using the VP category and for using the category N at multiple bar levels instead of introducing the category NP in French.

[Tree diagrams: three tree templates, illustrated by Jean voit Marie (John sees Mary), Quelle fille Jean voit (Which girl John sees) and (Jean) qui voit Marie ((John) who sees Mary)]
Figure 5.2: TAG tree templates

Basically, a real-size TAG is made of thousands of elementary tree templates (XTAG Research Group, 2001; Crabbé, 2005). Due to TAG’s extended domain of locality, many of these trees share common sub-trees, as for instance the relation between a canonical subject and its verb, as shown in Figure 5.3.

[Tree diagrams: two trees sharing the subject-verb sub-tree, illustrated by Jean mange une pomme (John eats an apple) and La pomme que Jean mange (The apple that John eats)]
Figure 5.3: Structural redundancy in TAG

To deal with this redundancy, two main approaches have been considered in the last decades: (i) defining a set of canonical trees and extending these by applying transformation rules (so-called lexical rules), and (ii) defining a description language (so-called metagrammar) which allows the linguist to abstract over the grammar trees. These two approaches are introduced in Section 5.2. Then, in Section 5.3, we present a metagrammar formalism named eXtensible MetaGrammar (XMG), which is particularly interesting as it offers a collection of well-formedness constraints to be applied to the tree structures being described. In Section 5.4, we show how to extend this metagrammar formalism so that it can be used to describe other syntactic structures than TAG trees, namely structures of a Lexical Functional Grammar and of a Property Grammar. Using a common framework to work with different target syntactic formalisms makes it easier to share information between grammars (either grammars based on distinct formalisms or grammars describing distinct languages). Finally, in Section 5.5, we conclude on the current status of constraint-based metagrammars, and give some perspectives for future work.
5.2 SEMI-AUTOMATIC PRODUCTION OF TREE-ADJOINING GRAMMARS

In this Section we introduce the main ideas regarding the representation of the lexicon for tree-based grammars.

5.2.1 LEXICAL RULES
Let us start with a historical overview. In PATR-II, one of the first proposals for grammatical representation (Shieber, 1984), the lexicon is roughly a set of lexical entries equipped with a subcategorisation frame, such as:

love :
    = v
    = np
    = np

This entry specifies that the verb love takes two arguments: a subject noun phrase and an object noun phrase. This lexical entry can be used with an adequate grammar to constrain the syntactic structure where the word love can appear (e.g., to express the fact that love cannot be used intransitively). PATR-II comes with two devices to facilitate lexical description: templates and lexical rules. On the one hand, templates can be seen as macros, which permit us to easily share information between lexical entries. For instance, one can state that love and write are transitive verbs by writing:

love : transitiveVerb
write : transitiveVerb

transitiveVerb :
    = v
    = np
    = np
where transitiveVerb is a macro called in the descriptions of both love and write. On the other hand, lexical rules can be seen as a transformation device, used to describe alternative forms of a lexical entry. For instance, to express that a transitive verb such as love has an active and a passive variant, we can use the following lexical rule:

passive :
    = pp

Such a rule is used to define a new lexical entry out, to be derived from an initial entry in, using the following constraints: (i) the category of out is identical to the category of in, (ii) the category of the object in out is the category of the subject in in, and (iii) the subject category in out is a prepositional phrase. Lexical rules are used to allow for a dynamic expansion of the lexicon by deriving lexical variants from core lexical entries. For instance, the application of the passive lexical rule to the base entry of the verb love generates a new passive lexical entry.

This transformation system has been widely used for describing the lexicon in other syntactic frameworks, e.g. HPSG (Meurers and Minnen, 1995). It builds on two leading ideas: lexical description aims both at factoring out information (templates) and at expressing relationships between variants of a single lexical unit (lexical rules). Strikingly, when working with strongly lexicalised syntactic systems such as Lexicalised TAG, alternative solutions to that of Shieber have been searched for. The main reason is that the amount and the variety of lexical units are much greater, leading to a much larger number of templates and lexical rules. For instance, for the development of large TAGs such as the English XTAG, the grammar writer had to design complicated ordering schemes to deal with the large number of lexical rules (Prolo, 2002).
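The interplay of templates and lexical rules can be mimicked with plain dictionaries. The sketch below is not PATR-II syntax: the attribute names cat, subj and obj, and the function names expand and passive, are hypothetical labels chosen for the illustration, and the rule simply encodes the three constraints (i) to (iii) stated above.

```python
# Templates act as macros shared by several lexical entries.
# The attribute names "cat", "subj" and "obj" are hypothetical.
TEMPLATES = {
    "transitiveVerb": {"cat": "v", "subj": "np", "obj": "np"},
}

# Each lexical entry only calls a template.
LEXICON = {
    "love": "transitiveVerb",
    "write": "transitiveVerb",
}


def expand(word):
    """Expand the template called by a lexical entry into a full entry."""
    return dict(TEMPLATES[LEXICON[word]])


def passive(entry):
    """Lexical rule deriving the `out` entry from the `in` entry."""
    return {
        "cat": entry["cat"],    # (i) out keeps the category of in
        "obj": entry["subj"],   # (ii) the object of out takes the subject category of in
        "subj": "pp",           # (iii) the subject of out is a prepositional phrase
    }


active_love = expand("love")
passive_love = passive(active_love)
print(active_love)
print(passive_love)
```

With many such rules, the order in which they apply starts to matter, which is the ordering problem mentioned above for large lexicalised grammars.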
5.2.2 DESCRIPTION LANGUAGES
To overcome the ordering issues raised by strong lexicalisation, we propose a solution built on an idea first introduced in Construction Grammar (Koenig and Jurafsky, 1995).8 The idea is to describe the lexicon using a dynamic process: first, a core lexicon is described manually, then an extended lexicon is automatically built by combining elementary fragments of information. In other words, the design of a TAG grammar consists in describing trees made of elementary pieces of information (hereafter fragments). For instance, the tree on the left of Figure 5.2 is defined by combining a subtree representing a subject with another subtree representing an object, and with a subtree representing the spine of the verbal tree, as shown in Figure 5.4.

8 Besides strong lexicalisation, setting up a system representing a TAG lexicon raises another problem, that of the structures used. In Construction Grammar, Koenig and Jurafsky (1995) combine elementary fragments of information via feature structure unification. When working with TAG, however, one works with trees. Thus, as we shall see in Section 5.3, the elementary fragments of information for TAG consist in formulas of a Tree Description Logic.

[Tree diagrams: CanonicalSubject (S over N↓ and V; Jean / John) + ActiveVerb (S over V; voit / sees) + CanonicalObject (S over V and N↓; Marie / Mary)]
Figure 5.4: Combining elementary tree fragments

However, the design of an actual grammar requires defining trees with variants. That is, rather than merely describing a tree with a subject in canonical position, one also wants to describe a tree with e.g. a wh or a relative subject. More generally, while designing a grammar, one wants to define sets of trees having common properties (e.g., valency). For instance, one wants to define a transitive verb as being made of a subject, an object and an active verbal spine:

TransitiveVerb → Subject ∧ ActiveVerb ∧ Object

where Subject and Object are shortcuts for describing sets of variants:

Subject → CanonicalSubject ∨ RelativeSubject
Object → CanonicalObject ∨ WhObject

and where CanonicalSubject, WhSubject, … are defined as the core fragments of the grammar:

[Tree diagrams defining the core fragments CanonicalSubject, CanonicalObject, ActiveVerb, RelativeSubject and WhObject]
Given the above definitions, a description such as TransitiveVerb is meant to describe the tree templates depicted in Figure 5.2.9 That is, each variant description of the subject embedded in the Subject clause is combined with (i) each variant description of the object embedded in the Object clause, and with (ii) the description of the active verbal spine embedded in the ActiveVerb clause. Let us note that the representation system we have just introduced is built on two components: (i) a language for describing tree fragments and (ii) a language for controlling how fragments are to be combined. These two components will be detailed in the next Section, which formally introduces our representation system (called eXtensible MetaGrammar).
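The control language just described (conjunction and disjunction over fragment names) can be unfolded mechanically. The following sketch enumerates the fragment combinations licensed by TransitiveVerb, then filters them with a crude stand-in for the principle of extraction uniqueness mentioned in the footnote. The data layout, the EXTRACTED set and the function names are assumptions of this sketch; an actual metagrammar compiler solves tree descriptions rather than listing fragment names.

```python
from itertools import product

# Disjunctions: each abstraction lists its alternative fragments.
ALTERNATIVES = {
    "Subject": ["CanonicalSubject", "RelativeSubject"],
    "Object": ["CanonicalObject", "WhObject"],
    "ActiveVerb": ["ActiveVerb"],
}

# Conjunction: TransitiveVerb -> Subject AND ActiveVerb AND Object.
TRANSITIVE_VERB = ["Subject", "ActiveVerb", "Object"]

# Fragments realising an extracted argument (assumed for this sketch).
EXTRACTED = {"RelativeSubject", "WhObject"}


def expand(rule):
    """Enumerate every way of picking one alternative per conjunct."""
    return list(product(*(ALTERNATIVES[name] for name in rule)))


def extraction_unique(combo):
    """Keep combinations with at most one extracted argument."""
    return sum(fragment in EXTRACTED for fragment in combo) <= 1


candidates = expand(TRANSITIVE_VERB)
licensed = [combo for combo in candidates if extraction_unique(combo)]
for combo in licensed:
    print(" + ".join(combo))
```

Of the four raw combinations, three survive the filter, one for each of the three tree templates of Figure 5.2.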
5.3 EXTENSIBLE METAGRAMMAR: CONSTRAINT-BASED GRAMMAR DESCRIPTION
In this Section, we consider two questions: (1) how to conveniently describe tree fragments, (2) how to flexibly constrain how such tree fragments may be combined to form larger syntactic units.
5.3.1 A LANGUAGE FOR DESCRIBING TREE FRAGMENTS
Let us first introduce a language of tree descriptions, and show how it can be generalised to a family of formal languages parameterised by a node labelling system that further limits how elements can be combined.

The base language L. Let x, y, z, … be node variables. We write ◁ for immediate dominance, ◁∗ for its reflexive transitive closure (dominance), ≺ for immediate precedence (or adjacency) and ≺+ for its transitive closure (strict precedence). ℓ ranges over a set of node annotations (usually feature structures), and = refers to node identification (which causes unification over node annotations). A tree description φ is defined as follows:

φ := x ◁ y | x ◁∗ y | x ≺ y | x ≺+ y | x : ℓ | x = y | φ ∧ φ

9 The combination of a relativised subject and a questioned object is rejected by the principle of extraction uniqueness (see Section 5.3).

Descriptions built over L are interpreted over first-order structures of finite ordered trees, only considering minimal models with respect to the number of nodes. Throughout the chapter we use an intuitive graphical notation for representing tree descriptions. Though this notation is not sufficient to represent every L-expression, it nonetheless generally suffices for the trees we use to describe natural language syntax. Thus, in the figure below, the description D0 on the left is represented by the tree on the right:

D0 = x ◁∗ w ∧ x ◁ y ∧ x ◁ z ∧ y ≺+ z ∧ z ≺ w ∧ x : X ∧ y : Y ∧ z : Z ∧ w : W

[Tree diagram (D0): root X with daughters Y ≺+ Z, and W dominated by X and adjacent to Z]
where immediate dominance is represented by a solid line, dominance by a dashed line, precedence by the symbol ≺+ and adjacency is left unmarked.

A parametric family of languages. When using the language L to combine tree fragments, one needs to explicitly identify node variables which are defined in distinct fragments and which have to be interpreted as representing the same node in a model. In this paragraph, we introduce several node labelling systems which make it possible to control how tree fragments may be combined. Such a node labelling system is called a principle, and comes with a combination scheme C which can be used as a parameter of the description language L introduced above. In the remainder of this Section we first report on the use of L without any parameter (L(∅)). We then introduce a first extension of L where node variables are labelled with global names (L(g names)). Secondly, we introduce an extension of L where node variables are labelled with local names (L(l names)). Finally, we introduce a third extension of L where node variables are labelled with colours (L(colours)), which facilitates grammar writing by automatically identifying node variables according to their colour. Note that L(∅) has been used by Xia (2001), and L(g names) by Candito (1999), to describe large TAGs for English and French respectively. While introducing these languages, we show that none of them is appropriate for describing the lexicon of a large French TAG.
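To make the base language L more concrete, the sketch below checks a description such as D0 against a candidate ordered tree. It is an illustration under stated assumptions: constraints are encoded as tuples with the invented tags idom, dom, adj, prec and label; precedence is computed from the leaf spans of nodes, which is one possible reading of adjacency and strict precedence; node identification and the minimality of models are not handled.

```python
from itertools import product


def index(tree):
    """Index a tree given as (label, [subtrees]); nodes are numbered in preorder."""
    parent, label, span = {}, {}, {}
    counter = {"node": 0, "leaf": 0}

    def visit(t, mother):
        lab, kids = t
        nid = counter["node"]
        counter["node"] += 1
        parent[nid], label[nid] = mother, lab
        lo = counter["leaf"]
        if not kids:
            counter["leaf"] += 1
        for k in kids:
            visit(k, nid)
        span[nid] = (lo, counter["leaf"])  # leaves covered by the node

    visit(tree, None)
    return parent, label, span


def dominates(parent, a, b):
    """Reflexive transitive dominance: walk up from b looking for a."""
    while b is not None:
        if a == b:
            return True
        b = parent[b]
    return False


def holds(constraint, g, model):
    """Check one atomic constraint under the variable assignment g."""
    parent, label, span = model
    kind, x, y = constraint
    if kind == "label":
        return label[g[x]] == y
    a, b = g[x], g[y]
    if kind == "idom":
        return parent[b] == a
    if kind == "dom":
        return dominates(parent, a, b)
    if kind == "adj":
        return span[a][1] == span[b][0]
    if kind == "prec":
        return span[a][1] <= span[b][0]
    raise ValueError(kind)


def solutions(description, variables, model):
    """Enumerate assignments of tree nodes to variables satisfying the description."""
    nodes = list(model[1])
    for combo in product(nodes, repeat=len(variables)):
        g = dict(zip(variables, combo))
        if all(holds(c, g, model) for c in description):
            yield g


D0 = [
    ("dom", "x", "w"), ("idom", "x", "y"), ("idom", "x", "z"),
    ("prec", "y", "z"), ("adj", "z", "w"),
    ("label", "x", "X"), ("label", "y", "Y"), ("label", "z", "Z"), ("label", "w", "W"),
]
good = ("X", [("Y", []), ("Z", []), ("W", [])])
bad = ("X", [("Z", []), ("Y", []), ("W", [])])

print(list(solutions(D0, ["x", "y", "z", "w"], index(good))))
print(list(solutions(D0, ["x", "y", "z", "w"], index(bad))))
```

The exhaustive search is exponential in the number of variables, which is acceptable for a four-variable description but not a substitute for a proper constraint solver.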
Language L(∅). L(∅) does not use any labelling constraint. The combination scheme C is thus empty. In other words, when combining elementary fragments to describe trees, there is no other constraint than unification between node annotations (i.e., feature structures). With such a language, one can independently describe fragments such as those depicted in Figure 5.5,10 where (D0) describes a relative NP and (D1) a transitive construction. Their combination generates the two models depicted in Figure 5.6 (labelled M0 and M1 respectively).

[Tree diagrams: fragments (D0) and (D1)]
Figure 5.5: Fragments described using the language L(∅)

[Tree diagrams: models (M0) and (M1)]
Figure 5.6: Models for the combination of fragments of Figure 5.5
As it stands, this language faces an expressivity limit, for it does not allow one to constrain precisely the way fragments combine. For instance, in the French TAG, the combination of the fragments depicted in Figure 5.7, where (D2) represents a cleft construction and (D3) a canonical object construction, is badly handled since it yields, among others, the results depicted in Figure 5.8. In such a case, only (M2) is normally deemed linguistically valid. (M3) and (M4) represent cases where the cleft construction and the canonical object construction have been mixed.

10 These fragments and the related models are those used by Xia in the context of the XTAG English Grammar.
[Tree diagrams: fragment (D2), a cleft construction illustrated by C’est Jean qui mange (That is John who eats), and fragment (D3), a canonical object construction illustrated by la pomme (the apple)]
Figure 5.7: Fragments described using the language L(∅) (continued)

[Tree diagrams: models (M2), (M3) and (M4)]
Figure 5.8: Models for the combination of fragments of Figure 5.7
Language L(g names). Candito (1999) introduces an instance of L(C) that constrains combinations according to global names labelling node variables. Such a language makes it possible to avoid cases such as the one outlined above. The combination scheme C is defined as follows: (i) a finite set of names, such that each node of a tree description is associated with a unique name, and (ii) an interpretation scheme constraining two nodes sharing the same name to be interpreted as denoting the same node in a model. In other words, a model is valid iff (i) every node has exactly one name and (ii) there is at most one node with a given name.11 In this context, the only model resulting from combining (D4) with (D5) from Figure 5.9 (where node names are written next to categories) is (M2) from Figure 5.8 above.

11 To be complete, Candito uses additional operations to map multiple names onto a single node. However, this does not affect the content of our discussion.
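[Tree diagrams: fragment (D4), the cleft construction, with named nodes S:extr, V:vbar, N↓:arg-subj, S:m, Cl↓:ceCl, V:cleftV, C:comp, V:anchor and qui:complex; fragment (D5), the canonical object construction, with named nodes S:m and V:anchor ≺+ N↓:arg-obj]
Figure 5.9: Fragments described using the language L(g names)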
L(g names) thus corrects some of the shortcomings of L(0): / naming ensures here that the canonical argument (D5 ) cannot be realised within the cleft argument (D4 ). However, L(g names) is unsatisfactory for two main reasons: • first, the grammar writer has to manage naming by hand, and must handle the issues arising from name conflicts ; • second, the grammar writer cannot use the same tree fragment more than once in the same description. The second shortcoming makes this language inadequate to describe a French TAG . Indeed, in the case of a double PP complementation, as shown in Figure 5.10, one cannot use the fragment (D6 ) more than once to yield (M5 ), since identical names must denote identically the same nodes. S N
Figure 5.10: Description of double PP complementation (fragment (D6) and model (M5) for "Jean parle de Marie à Paul", John tells Paul about Mary)
Language L(l names). A first extension of L(g names) consists in restricting the scope of the names labelling node variables. We thus define a language where names are by default local to the fragment they appear in: L(l names). With such a language it becomes possible to handle cases such
as the double PP complementation above. The idea is the following: one can instantiate a given tree fragment (e.g., (D6) from Figure 5.10) twice, and associate each of these two instances with a local name (e.g., D6-1 and D6-2). Then, one can refer to, say, the root node of each of these two instances by using a specific access operator (written ".") as follows:¹² D6-1.PP and D6-2.PP. This makes it possible to specify how to combine these instances (e.g., D6-1.PP ≺ D6-2.PP). Handling local names improves on L(g names) in so far as it permits us to (i) deal with issues arising from name conflicts, and (ii) instantiate a given fragment more than once. It thus offers an interesting level of expressivity. Nonetheless, it is not fully satisfactory, because one may need to manage a large number of local variables when describing linguistic tree fragments.
Language L(colours). L(colours) was designed to overcome the shortcomings of languages L(∅), L(g names) and L(l names). It makes it possible to constrain the way fragments combine more precisely than language L(∅), and more concisely than language L(l names), while avoiding the naming conflicts of language L(g names). To do this, the combination scheme C used in L(colours) associates node variables with the following colour-based labels: black (•B), white (◦W), red (•R) or failure (⊥). This scheme also comes with (i) combination rules to control the way colour-labelled node variables can be identified (see Figure 5.11), and (ii) an additional condition on model admissibility, which is that each node must be either red or black.
        •B    •R    ◦W    ⊥
  •B    ⊥     ⊥     •B    ⊥
  •R    ⊥     ⊥     ⊥     ⊥
  ◦W    •B    ⊥     ◦W    ⊥
  ⊥     ⊥     ⊥     ⊥     ⊥
Figure 5.11: Combination scheme for L(colours)
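The combination scheme of Figure 5.11 and the admissibility condition can be sketched as a small lookup table. This is only an illustration, not the XMG implementation; the names `combine`, `identify` and `admissible` are ours and do not come from the chapter.

```python
from functools import reduce

BLACK, RED, WHITE, FAIL = "black", "red", "white", "fail"

# Figure 5.11 stored for unordered pairs of colours.
# Every pair that is not listed combines to failure.
_TABLE = {
    frozenset([BLACK, WHITE]): BLACK,  # black + white = black
    frozenset([WHITE]): WHITE,         # white + white = white
}


def combine(a, b):
    """Colour obtained when two colour-labelled node variables are identified."""
    if FAIL in (a, b):
        return FAIL
    return _TABLE.get(frozenset([a, b]), FAIL)


def identify(colours):
    """Colour of a node obtained by identifying several node variables."""
    return reduce(combine, colours)


def admissible(model_colours):
    """A model is admissible iff each of its nodes is red or black."""
    return all(c in (RED, BLACK) for c in model_colours)


# One black node may absorb any number of white nodes ...
merged = identify([WHITE, BLACK, WHITE])
# ... but two black nodes can never be identified.
clash = identify([BLACK, WHITE, BLACK])
```

Read this way, a white node behaves like a requirement that must be absorbed by exactly one black node, while a red node refuses any identification.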
As an illustration of the expressivity and control brought by L(colours), consider the colour-enriched descriptions in Figure 5.12. These yield only the desired model (M2) in Figure 5.8.¹³
12 Strictly speaking, the name PP in the definition of (D6) must have been exported to allow us to access it; see Section 5.3.2.
13 Colours can also be used to express double PP complementation (Figure 5.10).
Figure 5.12: Fragments described using the language L(colours)
Colours can be compared with systems based on resources and requirements, such as Interaction Grammars (Perrier, 2003). A tree is well-formed iff it is saturated. Saturation is represented by red or black nodes, while non-saturation is represented by white ones. This language L(colours) (together with L(l names)) was implemented within the eXtensible MetaGrammar (XMG) framework¹⁴ and used in the development of a large French TAG built on the analysis of Abeillé (2002).¹⁵ Alternative instances of L(C) might be suitable for other syntactic systems. For example, a combination scheme based on polarities could serve as a foundation for Polarised Unification Grammars (Kahane, 2006).
5.3.2 A language for combining tree fragments
For a grammar description language to be expressive enough, it must allow for the description of variant structures (alternatives such as active / passive). Here we introduce the language LC, whose role is to offer means to precisely control how fragments can be combined.
Controlling fragment combinations. LC offers three mechanisms to handle fragments: abstraction via parameterised classes (association of a name and zero or more parameters with a content), conjunction (accumulation of contents), and disjunction (non-deterministic accumulation of contents). A content can refer either to a tree fragment (represented by a tree description logic formula φ), to an abstraction (referred to by its name) or to a combination of contents. A formula in LC is thus built as follows:
Class := Name[p1, ..., pn] → Content
Content := φ | Name[...] | Content ∨ Content | Content ∧ Content
14 See http://sourcesup.renater.fr/xmg.
15 The current XMG framework is not restricted to the specific case of TAG; it has been adapted to other tree-based syntactic systems such as Interaction Grammars or Multi-Component TAG (Crabbé et al., 2013).
This language permits us, for instance, to describe the variants introduced in Section 5.2.2, which are copied below for convenience:
TransitiveVerb → Subject ∧ ActiveVerb ∧ Object
Subject → CanonicalSubject ∨ RelativeSubject
Object → CanonicalObject ∨ WhObject
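To make the reading of these three rules concrete, here is a minimal sketch that unfolds conjunctions and disjunctions into the sets of elementary fragments they accumulate. The encoding of contents as tuples and the function name `unfold` are assumptions made for this illustration; parameters and the actual combination of tree fragments are left out.

```python
from itertools import product

# A content is a name (str), ("and", [contents]) or ("or", [contents]).
GRAMMAR = {
    "TransitiveVerb": ("and", ["Subject", "ActiveVerb", "Object"]),
    "Subject": ("or", ["CanonicalSubject", "RelativeSubject"]),
    "Object": ("or", ["CanonicalObject", "WhObject"]),
}


def unfold(content):
    """Return every list of elementary fragments that `content` can accumulate."""
    if isinstance(content, str):
        if content in GRAMMAR:            # a class name: unfold its body
            return unfold(GRAMMAR[content])
        return [[content]]                # an elementary fragment
    operator, parts = content
    alternatives = [unfold(part) for part in parts]
    if operator == "or":                  # disjunction: choose one branch
        return [alt for alts in alternatives for alt in alts]
    # conjunction: accumulate one choice from every conjunct
    return [sum(choice, []) for choice in product(*alternatives)]


variants = unfold("TransitiveVerb")
```

With two alternatives for the subject and two for the object, the unfolding yields four accumulated descriptions, each of which would then be handed to the solver for the description language L.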
Note that this language LC can be understood as a Definite Clause Grammar (DCG) (Pereira and Warren, 1980), which characterises the different ways fragments can combine (the result of their actual combination depends on the parametric language L used to describe fragments).
Managing namespaces. When combining tree fragments defined using the language L(l names), one may want to refer to names defined elsewhere (i.e., in other fragments). Since names are by default local to a fragment, that is, local to a class, a name must be exported to be accessible from outside its fragment. The definition of a class in LC is thus refined as follows:
Class := Name[p1, ..., pn] → Content exports n1, ..., nk
where n1, ..., nk are names defined in Content. Once a local name n (labelling a node variable) is exported, it becomes accessible from outside its fragment. To access it, one needs to (i) assign a local name c to this fragment when it is instantiated, and then (ii) use a specific access operator (written ".").¹⁶ In this new fragment, n is then accessible as c.n. The definition of Content in LC is thus extended as follows:
Content := φ | c = Name[...] | Content ∨ Content | Content ∧ Content
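The interplay between local names, exports and the access operator can be sketched as follows. The dictionary encoding, the functions `instantiate` and `access`, and the node names chosen for D6 are hypothetical and serve only as an illustration of the double PP case.

```python
import itertools

_fresh = itertools.count()  # supplies a new node identifier on each call


def instantiate(cls):
    """Create a fresh instance of a class: every local name gets a new node."""
    return {
        "class": cls["name"],
        "nodes": {name: next(_fresh) for name in cls["names"]},
        "exports": set(cls["exports"]),
    }


def access(instance, name):
    """The '.' operator: c.n is defined only if n was exported."""
    if name not in instance["exports"]:
        raise KeyError(f"{name!r} is not exported by {instance['class']}")
    return instance["nodes"][name]


# Hypothetical rendering of fragment (D6): three local names, one exported.
D6 = {"name": "D6", "names": ["PP", "P", "N"], "exports": ["PP"]}

d6_1 = instantiate(D6)
d6_2 = instantiate(D6)

# Equivalent of the constraint D6-1.PP ≺ D6-2.PP
precedence = [(access(d6_1, "PP"), access(d6_2, "PP"))]
```

Because each instantiation allocates new nodes, the same fragment can be used twice without the two PP nodes being identified.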
5.3.3 Towards a library of linguistic principles
We would like to generalise the approach from Section 5.3.1 by defining additional node labellings, which could be combined with node colouring to provide the linguist with a more expressive description language. The idea is to define a library of linguistic principles, all based on the description language L introduced above. These principles make it possible to further restrict the way nodes can be identified when combining tree fragments.
Clitic ordering. Let us first introduce the clitic ordering principle. In French, clitics are non-tonic particles with two specific properties already identified by Perlmutter (1970): first, they appear in front of the verb in a fixed order according to their rank (1a-1b), which is a property of type integer, and second, two different clitics in front of the verb cannot have the same rank (1c). In the examples below, the clitic il has rank 1 and le, la rank 3.
(1) a. Il1 la3 mange (He eats it)
    b. *La3 il1 mange (*It he eats)
    c. *Il1 le3 la3 mange (*He eats it it)
Let us extend the description for transitive verbs given in Section 5.2.2 as follows:
TransitiveVerb → Subject ∧ ActiveVerb ∧ Object
Subject → CanonicalSubject ∨ RelativeSubject ∨ CliticSubject
Object → CanonicalObject ∨ WhObject ∨ CliticObject
16 Cf. the definition of L(l names) in Section 5.3.1.
Here, the Subject and Object clauses allow (among others) for a clitic subject and a clitic object whose definitions are as follows:
CliticSubject → a V node with two daughters, Cl↓[case=nom, rank=1] ≺+ V
CliticObject → a V node with two daughters, Cl↓[case=acc, rank=3] ≺+ V
When realised together, none of these clitic descriptions defines how these clitics are ordered relative to each other; therefore a combination of these two descriptions yields the following two models:
(M6) a V node with three daughters, in this order: Cl↓[case=nom, rank=1], Cl↓[case=acc, rank=3], V
(M7) a V node with three daughters, in this order: Cl↓[case=acc, rank=3], Cl↓[case=nom, rank=1], V
where (M7) is an undesirable solution in French. Here we describe grammatical structures by combining elementary fragments (i.e., local information), which interact when realised together. Such an issue is dealt with by using a tree well-formedness principle stating that sibling nodes of category Clitic have to be ordered according to their rank property (since such a rank property actually labels node variables, we are thus considering L(rank)). In our example, this principle ensures that only (M6) is a valid model.
Clitic uniqueness. To ensure clitic uniqueness, L(rank) is extended with a well-formedness principle stating that valid models cannot have two nodes labelled with the same value for the rank property.
Extraction uniqueness. We assume that, in French, only one argument of a given predicate may be extracted.¹⁷ Following this, the extraction principle is responsible for ruling out tree models where more than one node would be associated with the property extraction.
Like the node-colouring principle, the principles introduced above depend on a specific natural language (here French) or on a specific target formalism (here TAG).¹⁸ Principles are additional parametric constraints that can be used (or not) for constraining the described models. One could define a library of such principles that would be activated according to the target language or formalism.
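The idea of a library of principles that filter candidate models can be sketched as follows. Models are simplified here to the left-to-right list of daughters of a single V node; this flat encoding and the function names are assumptions made for the illustration, not the way XMG represents trees.

```python
from itertools import permutations


def ranks(model):
    """Ranks of the clitic daughters of a model, from left to right."""
    return [node["rank"] for node in model if node["cat"] == "Cl"]


def clitic_ordering(model):
    """Sibling clitics must appear in non-decreasing rank order."""
    r = ranks(model)
    return all(a <= b for a, b in zip(r, r[1:]))


def clitic_uniqueness(model):
    """No two clitics of a model may carry the same rank."""
    r = ranks(model)
    return len(r) == len(set(r))


# The principles activated for the target language (here French).
PRINCIPLES = [clitic_ordering, clitic_uniqueness]


def valid(model, principles=PRINCIPLES):
    return all(principle(model) for principle in principles)


il = {"cat": "Cl", "case": "nom", "rank": 1}
le = {"cat": "Cl", "case": "acc", "rank": 3}
la = {"cat": "Cl", "case": "acc", "rank": 3}
verb = {"cat": "V"}

# Candidate models for "il" and "la": the clitics in any order, then the verb.
candidates = [list(order) + [verb] for order in permutations([il, la])]
accepted = [model for model in candidates if valid(model)]
```

Only the candidate corresponding to (M6) survives, and a model such as example (1c), with two clitics of rank 3, is rejected by the uniqueness principle alone.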
5.4 Cross framework grammar design using metagrammars
The description languages introduced above capture linguistic generalisations and make it possible to reason about language at an abstract level.
17 While rare cases of double extraction have been reported in French, these are so unnatural that they are generally ruled out of grammar implementations.
18 See also Le Roux et al. (2006).
Describing language at an abstract level is not only interesting for structure sharing within a given framework, but also for information sharing between frameworks and / or languages. This observation was already made by Clément and Kinyon (2003a,b). In their papers, the authors showed how to extend an existing metagrammar for TAG so that both a TAG and a Lexical-Functional Grammar (LFG) could be generated from it. They annotated TAG metagrammatical elementary units (so-called classes) with extra pieces of information, namely (i) LFG’s functional descriptions and (ii) filtering information to distinguish common classes from classes specific to TAG or LFG. The metagrammar compilation then generated an extended TAG, from which LFG rules were extracted. To maximise the structure sharing between their TAG and LFG metagrammars, the authors defined classes containing tree fragments of depth one. These fragments were either combined to produce TAG trees or associated with functional descriptions to produce LFG rules. This cross-framework experiment was applied to the design of a French / English parallel metagrammar, producing both a TAG and an LFG. This work was still preliminary. Indeed, (i) it concerned a limited metagrammar (the target TAG was composed of 550 trees, and the associated LFG of 140 rules), and (ii) more importantly, there is no clear evidence that a generalisation to other frameworks and / or languages would be possible (metagrammar implementation choices, such as tree fragment depth, were not independent of the target frameworks).
Here, we introduce a more general approach by using a metagrammatical language (namely XMG), and show how it can handle an arbitrary number of distinct target frameworks. The linguist can thus use the same formalism to describe different frameworks and grammars.¹⁹ This Section is organised as follows. In Section 5.4.1, we briefly introduce LFG and present an extension of XMG to describe LFG grammars. In Section 5.4.2, we introduce Property Grammar (PG), and present a second extension of XMG to generate PG grammars. Finally, in Section 5.4.3, we generalise over these two extensions, and define a layout for cross-framework grammar engineering.²⁰
19 Nonetheless, if one wants to experiment with multi-formalism, e.g., by designing a parallel TAG / LFG grammar, nothing prevents her / him from defining “universal” classes, which contain metagrammatical descriptions built on a common sublanguage.
20 Note that, in this Section, we consider formalisms, namely Lexical-Functional Grammar and Property Grammar, whose expressivity goes beyond that of TAG. In other terms, they are not mildly context-sensitive.
5.4.1 Producing a lexical-functional grammar using a metagrammar
Lexical-Functional Grammar. A Lexical-Functional Grammar (LFG) consists of three main components:
1. context-free rules annotated with functional descriptions,
2. well-formedness principles, and
3. a lexicon.
From these components, two main interconnected structures can be built:²¹ a c(onstituent)-structure and an f(unctional)-structure. The c-structure represents a syntactic tree, and the f-structure represents grammatical functions in the form of recursive attribute-value matrices. As an example of LFG, consider Figure 5.13 below. It contains a toy grammar and the c- and f-structures for the sentence “John loves Mary”. In this example, one can see functional descriptions labelling context-free rules (see (1) and (2)). These descriptions are made of equations. For instance, in rule (1), the equation (↑ SUBJ) = ↓ constrains the SUBJ feature of the functional description associated with the left-hand side of the context-free rule to unify with the functional description associated with the first element of the right-hand side of the rule. In other words, these equations are unification constraints between attribute-value matrices. Nonetheless, these constraints may not provide enough control over the f-structures licensed by the grammar; LFG hence comes with three additional well-formedness principles (completeness, coherence and uniqueness) (Bresnan, 1982).
Extending XMG for LFG. In the previous Section, we defined the XMG language, and applied it to the description of TAG. Let us recall that one of the motivations of metagrammars in general (and of XMG in particular) is the redundancy which affects grammar extension and maintenance. In TAG, the redundancy is higher than in LFG. Still, as mentioned by Clément and Kinyon (2003a), in LFG there are redundancies at different levels, namely within the rewriting rules, the functional equations and the lexicon. Thus, the metagrammar approach can prove helpful in this context. Let us now see what type of language could be used to describe LFG.²²
21 This connection is often referred to as functional projection or functional mapping.
22 A specification language for LFG has been proposed by Blackburn and Gardent (1995), but it corresponds more to a model-theoretic description of LFG than to a metagrammar.
Toy grammar:
(1)  S      →  NP              VP
     ↑=↓       (↑ SUBJ)=↓      ↑=↓
(2)  VP     →  V               NP
     ↑=↓       ↑=↓             (↑ OBJ)=↓
(3)  John   NP, (↑ PRED) = ’JOHN’, (↑ NUM) = SG, (↑ PERS) = 3
(4)  Mary   NP, (↑ PRED) = ’MARY’, (↑ NUM) = SG, (↑ PERS) = 3
(5)  loves  V, (↑ PRED) = ’LOVE (↑ SUBJ) (↑ OBJ)’, (↑ TENSE) = PRESENT
c-structure:
S [↑=↓]
  NP [(↑ SUBJ)=↓]  John
  VP [↑=↓]
    V [↑=↓]  loves
    NP [(↑ OBJ)=↓]  Mary
f-structure:
f1: [ PRED   ’LOVE (↑ SUBJ) (↑ OBJ)’
      SUBJ   f2: [ PRED ’JOHN’, NUM SG, PERS 3 ]
      OBJ    f3: [ PRED ’MARY’, NUM SG, PERS 3 ]
      TENSE  PRESENT ]
Figure 5.13: LFG grammar and c- and f-structures for the sentence “John loves Mary”
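The way the equations of the toy grammar build the f-structure of Figure 5.13 can be sketched with a naive unification over nested dictionaries. This is an illustrative simplification: the encoding of an annotation as `None` (for ↑ = ↓) or as a feature name F (for (↑ F) = ↓), and the function names, are assumptions of this sketch, and the completeness and coherence principles are not checked.

```python
def unify(a, b):
    """Unify two attribute-value matrices (nested dicts) or atomic values."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            result[feature] = unify(result[feature], value) if feature in result else value
        return result
    if a == b:
        return a
    # Two different atomic values for the same feature: uniqueness is violated.
    raise ValueError(f"clash: {a!r} vs {b!r}")


LEXICON = {
    "John": {"PRED": "JOHN", "NUM": "SG", "PERS": 3},
    "Mary": {"PRED": "MARY", "NUM": "SG", "PERS": 3},
    "loves": {"PRED": "LOVE (SUBJ) (OBJ)", "TENSE": "PRESENT"},
}


def project(annotation, daughter):
    """Contribution of a daughter's f-structure to its mother's f-structure."""
    return daughter if annotation is None else {annotation: daughter}


def analyse(subject, verb, obj):
    """Apply rules (1) and (2) of the toy grammar to a three-word sentence."""
    vp = unify({}, project(None, LEXICON[verb]))        # V:  up = down
    vp = unify(vp, project("OBJ", LEXICON[obj]))        # NP: (up OBJ) = down
    s = unify({}, project("SUBJ", LEXICON[subject]))    # NP: (up SUBJ) = down
    return unify(s, project(None, vp))                  # VP: up = down


f1 = analyse("John", "loves", "Mary")
```

The resulting dictionary has the four top-level features PRED, SUBJ, OBJ and TENSE of f1, with the lexical entries of John and Mary embedded under SUBJ and OBJ.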
To describe LFG at an abstract level, one needs to describe its elementary units, which are context-free rules annotated with functional descriptions (e.g., equations) and lexical entries using attribute-value matrices. Context-free rules can be seen as trees of depth one. Describing such structures can be done in XMG using a description language similar to the one for TAG, i.e., using the → (dominance) and ≺ (precedence) relations. One can for instance define different context-free backbones according to the number of elements in the right-hand sides of the LFG rules. These backbones are encapsulated in parameterised XMG classes, where the parameters are used to assign a syntactic category to a given element of the context-free rule, such as in the class BinaryRule below.
BinaryRule[A, B, C] → (x[cat : A] → y[cat : B]) ∧ (x → z[cat : C]) ∧ (y ≺+ z) exports x, y, z
We also need to annotate the node variables x, y, z with functional descriptions. Let us see how these functional descriptions Fdesc are built:²³
Fdesc := ∃(g FEAT) | ¬(g FEAT) | (g* FEAT) | (g FEAT) CONST VAL | Fdesc ∨ Fdesc | (Fdesc) | Fdesc ∧ Fdesc
where g refers to an attribute-value matrix, FEAT to a feature, VAL to a (possibly complex) value, CONST to a constraint operator (= for unification, =c for constraining unification, ∈ for set membership, ≠ for difference), (Fdesc) to optionality, and * to LFG’s functional uncertainty. Note that g can be complex, that is, it can correspond to a path (relative, using ↑ and ↓, or absolute) pointing to a sub-attribute-value matrix. To specify such functional descriptions, we can extend XMG in a straightforward manner, with a dedicated dimension and a dedicated description language LLFG defined as follows:
DescLFG := x → y | x ≺ y | x ≺+ y | x = y | x : | x⟨Fd⟩ | DescLFG ∧ DescLFG
Fd := g | ∃g.f | g.f = v | g.f =c v | g.f ∈ v | ¬Fd | Fd ∨ Fd | (Fd) | Fd ∧ Fd
g, h := ↑ | ↓ | h.f | f* i
where g, h are variables denoting attribute-value matrices, f, i (atomic) feature names, v (possibly complex) values, and ⟨...⟩ corresponds to LFG’s functional mapping introduced above. With such a language, it now becomes possible to define an XMG metagrammar for our toy LFG as follows.²⁴
Srule → br = BinaryRule[S, NP, VP] ∧ br.x⟨↑ = ↓⟩ ∧ br.y⟨(↑.SUBJ) = ↓⟩ ∧ br.z⟨↑ = ↓⟩
VPrule → br = BinaryRule[VP, V, NP] ∧ br.x⟨↑ = ↓⟩ ∧ br.y⟨↑ = ↓⟩ ∧ br.z⟨(↑.OBJ) = ↓⟩
In this toy example, the structure sharing is minimal. To illustrate what can be done, let us have a look at a slightly more complex example taken from Clément and Kinyon (2003a):
VP  →  V      (NP)          PP                  (NP)
       ↑=↓    (↑ OBJ)=↓     (↑ SecondOBJ)=↓     (↑ OBJ)=↓
Here, we have two possible positions for the NP node, either before or after the PP node. Such a situation can be described in XMG as follows:
VPrule2 → br = BinaryRule[VP, V, PP] ∧ u[cat : NP] ∧ br.y ≺+ u ∧ br.y⟨↑ = ↓⟩ ∧ br.z⟨(↑.SecondOBJ) = ↓⟩ ∧ u⟨(↑.OBJ) = ↓⟩
Here, we do not specify the precedence between the NP and PP nodes. We simply specify that the NP node is preceded by the V node (denoted by y). When compiling this description with a solver such as the one for TAG, two solutions (LFG rules) will be computed. In other terms, the optionality can be expressed directly at the metagrammatical level, and the metagrammar compiler can directly apply LFG’s uniqueness principle. It is worth stressing that the metagrammar here not only allows for structure sharing via the (conjunctive or disjunctive) combination of parameterised classes, but also makes it possible to apply well-formedness principles to the described structures. In the example above with the two NP nodes, this well-formedness principle is checked on the constituent structure and indirectly impacts the functional structure (which is the structure concerned with these principles). If we see the functional structures as graphs and equations as constraints on these, one could imagine developing a specific constraint solver. This would make it possible to turn the metagrammar compiler into an LFG parser, which would, while solving tree descriptions for the constituent structure, solve graph-labelling constraints for the functional structure.
23 We do not consider here additional LFG operators, which have been introduced in specific LFG environments, such as shuffle, insert or ignore, etc.
24 Here, we do not describe the lexical entries; these can be defined using the same language as the LFG context-free rules, omitting the right-hand side.
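The two solutions obtained for the underspecified VP description can be reproduced with a brute-force enumeration of the linear orders of the daughters. This is only a sketch: the function `solve` is hypothetical and much simpler than a real tree description solver, which would also handle dominance and node identification.

```python
from itertools import permutations


def solve(nodes, precedes):
    """All total orders of `nodes` in which a comes before b for each (a, b)."""
    solutions = []
    for order in permutations(nodes):
        position = {node: index for index, node in enumerate(order)}
        if all(position[a] < position[b] for a, b in precedes):
            solutions.append(order)
    return solutions


# Daughters of VP: V and PP come from BinaryRule (V before PP),
# the extra NP is only required to follow V.
rules = solve(["V", "PP", "NP"], [("V", "PP"), ("V", "NP")])
```

Leaving the relative order of NP and PP unconstrained is what produces one rule with the NP before the PP and one with the NP after it.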
5.4.2 Producing a property grammar using a metagrammar
Property Grammar (PG). This formalism was introduced by Blache (2000). It differs from TAG or LFG in so far as it does not rely on a rewriting system. In PG, one defines the relations between syntactic constituents not in terms of rewriting rules, but in terms of local constraints (the so-called properties).²⁵ The properties licensed by the framework rely on linguistic observations, such as linear precedence between constituents, co-occurrence, mutual exclusion, etc. Here, we will consider the following six properties, which constrain the relations between a constituent (i.e., a node of a syntactic tree) with category A and its sub-constituents (i.e., the daughter nodes of A):²⁶
  Obligation     A : △B       at least one B child
  Uniqueness     A : B!       at most one B child
  Linearity      A : B ≺ C    B child precedes C child
  Requirement    A : B ⇒ C    if a B child, then also a C child
  Exclusion      A : B ⇎ C    B and C children are mutually exclusive
  Constituency   A : S        children must have categories in S
25 An interesting characteristic of these constraints is that they can be independently violated, and thus provide a way to characterise ungrammatical sentences.
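The six properties can be read as independent checks on the list of categories of the children of a node. The sketch below follows the informal glosses of the table; the function names and the sample property list, loosely inspired by French negation, are assumptions made for illustration and are not taken from an actual grammar.

```python
def obligation(children, b):
    return children.count(b) >= 1


def uniqueness(children, b):
    return children.count(b) <= 1


def linearity(children, b, c):
    # every B child occurs before every C child
    return all(i < j
               for i, x in enumerate(children) if x == b
               for j, y in enumerate(children) if y == c)


def requirement(children, b, c):
    return b not in children or c in children


def exclusion(children, b, c):
    return not (b in children and c in children)


def constituency(children, allowed):
    return all(x in allowed for x in children)


def violations(children, properties):
    """Names of the properties that the children of a node do not satisfy."""
    return [name for name, check, args in properties
            if not check(children, *args)]


# Hypothetical properties for a finite verb with negation.
PROPERTIES = [
    ("uniqueness(Adv-ng)", uniqueness, ("Adv-ng",)),
    ("uniqueness(Adv-np)", uniqueness, ("Adv-np",)),
    ("linearity(Adv-ng < V)", linearity, ("Adv-ng", "V")),
    ("linearity(Adv-ng < Adv-np)", linearity, ("Adv-ng", "Adv-np")),
]

good = violations(["Adv-ng", "V", "Adv-np"], PROPERTIES)   # ne donne pas
bad = violations(["Adv-np", "Adv-ng", "V"], PROPERTIES)    # *pas ne donne
```

Returning the list of violated properties, rather than a single yes or no, reflects the fact that properties can be violated independently of one another.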
In a real-size PG, such as the French PG of Guénot (2006), these properties are encapsulated (together with some syntactic features) within linguistic constructions, and the latter are arranged in an inheritance hierarchy.²⁷ An extract of the hierarchy of Guénot (2006) is presented in Figure 5.14 (fragment corresponding to basic verbal constructions).
V (Verb)
  INTR: ID|NATURE: SCAT 1.SCAT
  const.: V : 1 [CAT V, SCAT ¬(aux-etre ∨ aux-avoir)]
V-n (Verb with negation), inherits V
  INTR: SYN: NEGA: [RECT 1, DEP Adv-n]
  uniqueness: Adv-ng!, Adv-np!
  requirement: 1 ⇒ Adv-n
  linearity: Adv-ng ≺ 1; Adv-ng ≺ Adv-np; Adv-np ≺ 1.[MODE inf]; 1.[MODE ¬inf] ≺ Adv-np
V-m (Verb with modality), inherits V ; V-n
  INTR: SYN: INTRO: [RECT 1, DEP Prep]
  uniqueness: Prep!
  requirement: 1 ⇒ Prep
  linearity: 1 ≺ Prep
Figure 5.14: Fragment of a PG for French (basic verbal constructions)
Let us for instance have a closer look at the properties of the V-n construction of Figure 5.14. It says that in French, for verbs with a negation, this negation is made of an adverb ne (labelled with the category Adv-ng) and / or an adverb pas (or a related adverb such as guère, labelled with the category Adv-np). These adverbs, if they exist, are unique (uniqueness property) and linearly ordered (linearity property). When the verb is an infinitive, it comes after these adverbs (e.g., ne pas donner (not to give) versus je ne donne pas (I do not give)).
Extending XMG for PG. In order to describe PG, we need to extend the XMG formalism with linguistic constructions. These will be encapsulated within XMG’s classes. Following what has been done for LFG, we extend XMG with a dedicated dimension and a dedicated description language LPG. Formulas in LPG are built as follows:
DescPG := x | x = y | x ≠ y | [f : E] | {P} | DescPG ∧ DescPG
P := A : △B | A : B! | A : B ≺ C | A : B ⇒ C | A : B ⇎ C | A : B
where x, y correspond to unification variables, = to unification, ≠ to unification failure, E to some (possibly complex) expression to be associated with the feature f, and {P} to a set of properties. Note that E and P may share unification variables. With this language, it is now possible to define the above V, V-n and V-m constructions as follows:
Vclass → [INTR : [ID | NATURE : [CAT : X.SCAT]]] ∧ (V : X) ∧ (X = [CAT : V, SCAT : Y]) ∧ (Y ≠ aux-etre) ∧ (Y ≠ aux-avoir)
V-n → Vclass ∧ [INTR : [SYN : [NEGA : [RECT : X, DEP : Adv-n]]]] ∧ (V : Adv-ng!) ∧ (V : Adv-np!) ∧ (V : X ⇒ Adv-n) ∧ (V : Adv-ng ≺ X) ∧ (V : Adv-ng ≺ Adv-np) ∧ (V : Adv-ng ≺ Y) ∧ (V : Z ≺ Adv-np) ∧ (Y = inf) ∧ (Y = X.mode) ∧ ¬(Z = inf) ∧ (Z = X.mode)
V-m → (Vclass ∨ V-n) ∧ [INTR : [SYN : [INTRO : [RECT : X, DEP : Prep]]]] ∧ (V : Prep!) ∧ (V : X ⇒ Prep) ∧ (V : X ≺ Prep)
26 Here, we omit lexical properties, such as cat(apple) = N.
27 Note that this hierarchy is a disjunctive inheritance hierarchy, i.e., when there is multiple inheritance, the subclass inherits one of its super-classes.
Note that the disjunction operator from XMG’s control language LC allows us to represent Guénot (2006)’s disjunctive inheritance. Also, compared with TAG and LFG, there is relatively little redundancy in PG, for redundancy is already dealt with directly at the grammar level, by organising the constructions within an inheritance hierarchy based on linguistic motivations.
In the same way as for LFG, the metagrammar could be extended so that it solves the properties contained in the final classes, according to a sentence to parse. This could be done by adding a specific constraint solver such as that of Duchier et al. (2014) as a post-processor of the metagrammar compilation. In other words, the linguist would define grammar properties, which describe the relations between constituents. These properties would be defined hierarchically to maximise information sharing, using a metagrammar. The metagrammar compiler would then (i) produce the target property grammar (automatic generation of the redundancy underlying the input metagrammar) and then (ii) compute syntactic tree models whose leaf nodes would be the words of the input sentence. Note that these models would not need all properties to be satisfied, hence accounting for ungrammatical sentences (Duchier et al., 2009).
5.4.3 Towards extensible metagrammars
We have seen two extensions of the XMG formalism to describe not only TAG grammars, but also LFG and PG ones. These rely on the following concepts:
• The metagrammar describes a grammar by means of conjunctive and / or disjunctive combinations of elementary units (using a combination language LC).
• The elementary units of the (meta)grammar depend on the target framework, and are expressed using dedicated description languages (L, LLFG, LPG).
When compiling a metagrammar, the compiler executes the logic program underlying LC (i.e., unfolds the combination rules) while storing the elementary units of the description languages (LD, LLFG or LPG) in dedicated accumulators. The resulting accumulated descriptions may need some additional post-processing (e.g., tree description solving for TAG). Thus, to extend XMG into a cross-framework grammar engineering environment, one needs (i) to design dedicated description languages, and (ii) to develop the corresponding pre / post-processing modules (e.g., metagrammar parsing / description solving).
A first version of XMG (XMG 1) was developed in Oz-Mozart.²⁸ It implements the language described in Section 5.3, and supports tree-based formalisms, namely TAG and Interaction Grammar (Perrier, 2000). It has been used to design various large tree grammars for French, English and German.²⁹ The implementation of a new version of XMG (XMG 2) started in 2010, in Prolog (with bindings to the Gecode Constraint Programming C++ library),³⁰ with the goal of supporting a truly modular grammar description, which will in turn facilitate cross-framework grammar engineering as presented here.
28 See http://sourcesup.cru.fr/xmg and http://www.mozart-oz.org.
29 These are available online, see http://sourcesup.cru.fr/projects/xmg and http://www.sfs.uni-tuebingen.de/emmy/res-en.html.
30 See https://launchpad.net/xmg and http://www.gecode.org.
5.5 Conclusion
This chapter introduces a core abstract framework for representing grammatical information of tree-based syntactic systems. Grammatical representation is organised around two central ideas: (1) the lexicon is described by means of elementary tree fragments that can be combined, and (2) fragment combinations are handled by a control language, which turns out to be an instance of a DCG. The framework described here generalises the TAG-specific approaches of Xia (2001) and Candito (1999) by providing a parametric family of languages for tree composition as well as constraints on tree well-formedness. This collection of description languages makes it possible to describe different types of linguistic structures (TAG’s syntactic trees, LFG’s functional descriptions, PG’s linguistic constructions), these structures being combined either conjunctively or disjunctively via a common control language. The formalism also applies specific constraints on some of these structures to ensure their well-formedness (e.g., rank principle for TAG). Using a formalism that can describe several types of grammar frameworks offers new insights into grammar comparison and sharing. This sharing appears naturally when designing parallel grammars, but also appears when designing distinct grammars (e.g., reuse of the combinations of elementary units). The implementation of the formalism introduced here is ongoing work. The goal is to provide the linguist with an extensible formalism, offering a rich collection of predefined description languages, each one with a library of principles and constraint solvers to effect specific assembly, filtering, and verifications on the grammatical structures described by the metagrammar.
Bibliography
Abeillé, A. (2002). Une grammaire d’arbres adjoints pour le français. Editions du CNRS, Paris.
Blache, P. (2000). Constraints, Linguistic Theories and Natural Language Processing. In D. Christodoulakis, editor, Natural Language Processing, Lecture Notes in Artificial Intelligence Vol. 1835. Springer.
Blackburn, P. and Gardent, C. (1995). A Specification Language for Lexical Functional Grammars. In Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics (EACL’95), pages 39–45, Dublin, Ireland.
Bresnan, J. (1982). The passive in lexical theory. In The Mental Representation of Grammatical Relations. The MIT Press, Cambridge, MA.
Bresnan, J., Kaplan, R. M., Peters, S., and Zaenen, A. (1982). Cross-serial dependencies in Dutch. Linguistic Inquiry, 13(4), 613–635.
Candito, M.-H. (1999). Organisation Modulaire et Paramétrable de Grammaires Electroniques Lexicalisées. Ph.D. thesis, Université de Paris 7.
Chomsky, N. (1957). Syntactic Structures. Mouton, The Hague.
Clément, L. and Kinyon, A. (2003a). Generating parallel multilingual LFG-TAG grammars from a metagrammar. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 184–191, Sapporo, Japan. Association for Computational Linguistics.
Clément, L. and Kinyon, A. (2003b). Generating LFGs with a MetaGrammar. In M. Butt and T. Holloway King, editors, Proceedings of LFG-03 Conference, pages 106–125, Saratoga Springs, United States of America. CSLI Publications.
Crabbé, B. (2005). Représentation informatique de grammaires fortement lexicalisées : Application à la grammaire d’arbres adjoints. Ph.D. thesis, Université Nancy 2.
Crabbé, B., Duchier, D., Gardent, C., Le Roux, J., and Parmentier, Y. (2013). XMG : eXtensible MetaGrammar. Computational Linguistics, 39(3), 591–629.
Duchier, D., Prost, J.-P., and Dao, T.-B.-H. (2009). A model-theoretic framework for grammaticality judgements. In Proceedings of the 14th International Conference on Formal Grammar (FG2009), pages 17–30, Bordeaux, France. Lecture Notes in Computer Science, Volume 5591, Springer.
Duchier, D., Dao, T.-B.-H., and Parmentier, Y. (2014). Model-Theory and Implementation of Property Grammars with Features. Journal of Logic and Computation, 24(2), 491–509.
Fang, J. and King, T. H. (2007). An LFG Chinese Grammar for Machine Use. In Proceedings of the Grammar Engineering Across Frameworks (GEAF07) Workshop, pages 144–160, Stanford. CSLI Publications.
Guénot, M.-L. (2006). Éléments de grammaire du français pour une théorie descriptive et formelle de la langue. Ph.D. thesis, Université de Provence.
Joshi, A. K. and Schabès, Y. (1997). Tree adjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 3, pages 69–123. Springer Verlag, Berlin.
Kahane, S. (2006). Polarized unification grammars. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 137–144, Sydney, Australia. Association for Computational Linguistics.
Koenig, J.-P. and Jurafsky, D. (1995). Type underspecification and on-line type construction in the lexicon. In Proceedings of the West Coast Conference on Formal Linguistics (WCCFL 94), UC San Diego.
Le Roux, J., Crabbé, B., and Parmentier, Y. (2006). A constraint-driven metagrammar. In Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+8), pages 9–16, Sydney, Australia.
Meurers, W. D. and Minnen, G. (1995). A computational treatment of HPSG lexical rules as covariation in lexical entries. In Proceedings of the Fifth International Workshop on Natural Language Understanding and Logic Programming, Lisbon, Portugal.
Pereira, F. and Warren, D. (1980). Definite clause grammars for language analysis – a survey of the formalism and a comparison to augmented transition networks. Artificial Intelligence, 13, 231–278.
Perlmutter, D. (1970). Surface structure constraints in syntax. Linguistic Inquiry, 1, 187–255. Perrier, G. (2000). Interaction Grammars. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 600–606, Saarbrücken, Germany. Perrier, G. (2003). Les grammaires d’interaction. Habilitation à diriger les recherches en informatique. Université Nancy 2. Prolo, C. (2002). Systematic grammar development in the XTAG project. In Proceedings of the 19th International Conference on Computational Linguistics (COLING’02), Taipei, Taiwan. Prost, J.-P. (2008). Modelling Syntactic Gradience with Loose Constraintbased Parsing. Cotutelle Ph.D. Thesis, Macquarie University, Sydney, Australia, and Université de Provence, Aix-en-Provence, France. Shieber, S. M. (1984). The design of a computer language for linguistic information. In Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics, pages 362–366, Stanford, California, USA. Association for Computational Linguistics. Xia, F. (2001). Automatic Grammar Generation from two Different Perspectives. Ph.D. thesis, University of Pennsylvania. XTAG Research Group (2001). A Lexicalized Tree Adjoining Grammar for English. Technical Report IRCS-01-03, IRCS, University of Pennsylvania.
Chapter Six
– Syntax 2 –
Extending the Constraint Satisfaction for Better Language Processing
Kilian A. Foth, Patrick McCrae, Wolfgang Menzel
6.1 Introduction
The term “constraint” is often used in the description of systems for automatic language processing, but with various subtle and less subtle differences in meaning. This is not surprising, since its dictionary definition “any very general restriction on a sentence formation rule” is far too broad to be of any use in classification. Presumably, any solution to the NLP problem will allow some proposed solution structures and disallow others, or else it would be very uninstructive indeed; most if not all practitioners will have a much more specific working definition of the term. However, these understandings do not always exactly coincide. Therefore, before explaining a new refinement of our own constraint-based formalism, we want to take the time to review in what sense various other approaches that fall under the same umbrella term are based on constraints and what small and large differences are useful to keep in mind when comparing them. We hope that such a comparison may be of value
to practitioners in our modestly sized but relatively disparate community. We are particularly concerned with the more subtle differences in usage, because we feel that they cause more difficulties to communication between different sub-fields. A large and decisive homonymy, whose ambiguity is easily resolved by its pragmatic context, usually causes few problems of understanding. The potential for misunderstanding is much greater with nonobvious ambiguities, when both sides assume a specific meaning and neither is aware that an ambiguity even exists. Therefore we list some important dichotomies for distinguishing variants of constraints that a grammar might use, and which can influence the character of a system strongly even if their definitions appear to be very similar. When dealing with an ambiguous term, it is often helpful to ask with what other term it occurs in complementary distribution. In general use, a “constraint” is a compulsion or force that inhibits an otherwise natural or expected tendency: time constraints prevent an inspection tour from visiting locations that would clearly benefit from it, or social constraints delay the expected happy ending until book length has been achieved. On the face of it, such distinctions appear less appropriate for computer programs, which are entirely artificial and man-made: whatever a program accepts or rejects is supposed to depend entirely on the predilections of its author, and not on reconciling the myriad different conflicting pressures of the real world. Nevertheless, the dictionary does name two very important concepts: it speaks of rules and restrictions on them, and the idea of explicitly restricting possibilities that would otherwise be allowed does come up in computer programming. Operating systems provide a rich orthogonal set of operations, some of which are explicitly forbidden to particular users by means of access control lists or file permissions. 
Similarly, the TCP connections that underlie the Internet were cleverly programmed to allow any data whatsoever to be disseminated indifferently, but most endpoints now routinely restrict them again, in order to make sure that certain unwanted transmissions (e.g., spam e-mails or denial-of-service attacks) are not delivered after all. In the smaller scope of program development, the popular method of generate and test comes close to this arrangement. In its simplest form, one component of a system will systematically create solution candidates, which are then accepted or rejected by a second, independently programmed module. The tester may verify the plausibility of a structure, compute its cost or perform other checks to decide whether a given candidate is acceptable or not, or whether it is less preferred than another candidate. Often this is motivated by the general goal of modularity and reusability: for instance, it
may be much easier to check the well-formedness conditions on the finished result of the generator than to program well-formedness into the generating behaviour such that dispreferred structures are not created in the first place. With this division of labour, it is very easy to identify in a program the two concepts mentioned in the initial definition: the generator proceeds according to rules, while its output is subjected to restrictions by the testing component. A constraint is then simply one of the criteria by which testing occurs. However, rules and restrictions are often not as clearly separated from each other, and their respective import can be rather unequally distributed. At one extreme, a classical context-free grammar consists solely of a generating component with no additional conditions imposed: all strings that its rules generate are equally valid members of the described language. It was recognised early on that this model is far too simple to model human languages. Not every noun phrase combines equally well with every verb phrase to form a sentence, whether for reasons of concord or of semantic aptness or mere predominantly frequency-driven pragmatics: natural language is not context-free. But the insight that human languages behave similarly to the output of certain formal generating systems was too valuable to be abandoned altogether, so the basic model was developed further in various directions. Notably, most refinements were restrictions on the initial model, because it seemed to allow too many utterances that hearers find unacceptable. Although such additional restrictions play the role of filters in the generate-and-test model, they were not commonly called “constraints” in the early literature, and often they were not programmed (or posited) as a separate component. For instance, there is an obvious regularity in that a singular NP must associate with a singular VP and a plural NP with a plural VP. 
This could in theory be implemented as a filter that rejects entire sentences which violate this condition, but this might allow a huge proliferation of ultimately untenable solutions that a cleverer strategy could have “nipped in the bud”. The more appropriate solution was to enforce the regularity by extending the set of non-terminals from a generic NP token to at least a “singular NP” and a “plural NP” token, which would then combine respectively with a “singular VP” and “plural VP”. But this approach required writing multiple rules that differ only in subtle features (not just concord of number but various other properties), a redundancy abhorred by linguistic theory. At the same time, many of them seemed structurally similar because they all posited the same abstract condition of feature equality at various points in a rule. The solution was to
extend the non-terminals to generalised typed feature structures, in which the features are not just additional “decorators” of a normal noun or verb phrase, but constitute the major part of its identity (in fact, the non-terminal type of a phrase is commonly represented as just another feature). A production rule is then allowed to operate only if all of its constraints on feature attributes are satisfied. Although this approach still interleaves the generating and testing operations, it is a step towards the explicit testing of constraints that exist independently of the rules they restrict. Often the weight given to finer constraints rises at the cost of the generating component, which becomes much simpler. In the extreme case, as with Property Grammar (Blache and Prost, 2004), the generator is reduced to enumerating a combinatorial number of possible candidates, and all restrictions are imposed by constraints, which implements the concept of “generate and test” to the fullest. Note that linguistics practitioners often describe their grammars as “generative” whether the generating component is predominant or simplified, as long as it is there. In contrast, computational linguists use terms such as “constraint-based” more freely if their systems occupy that side of the spectrum. However, even in this case it would be useful to distinguish formalisms that use constraints as one mechanism among several from constraint-based formalisms in the strictest sense, i.e., those whose grammatical knowledge is essentially expressed in constraints, while the generator itself is linguistically trivial. In the first case, constraints supplement the grammar; they may rank, guide or restrict the output of a component that could, in theory, operate all on its own, albeit at the price of greater output ambiguity. In the latter case, the constraints constitute the grammar.
Whatever is used as input for testing will be so uniformly distributed that on its own it could not possibly qualify as a reasonable model of language.
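The division of labour between a generator and a tester described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the toy lexicon and the names generate and agrees are assumptions introduced here, not part of any formalism discussed in the chapter.

```python
from itertools import product

# Toy lexicon: each phrase carries a number feature ("sg" or "pl").
# All entries are illustrative assumptions.
NOUN_PHRASES = [("the dog", "sg"), ("the dogs", "pl")]
VERB_PHRASES = [("barks", "sg"), ("bark", "pl")]


def generate():
    """Generator: apply the rule S -> NP VP with no further conditions."""
    for np, vp in product(NOUN_PHRASES, VERB_PHRASES):
        yield np, vp


def agrees(np, vp):
    """Tester: the number features of NP and VP must be equal."""
    return np[1] == vp[1]


candidates = list(generate())
accepted = [f"{np[0]} {vp[0]}" for np, vp in candidates if agrees(np, vp)]

print(len(candidates))  # number of generated candidate sentences
print(accepted)         # candidates that survive the agreement filter
```

Here the agreement condition lives entirely in the tester. Splitting NP and VP into singular and plural non-terminals would instead move the same knowledge into the generator, at the cost of duplicated rules.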
6.2 The Constraint Satisfaction Problem
The most generally recognised use of the term “constraint” in computing is in connection with the Constraint Satisfaction Problem (Tsang, 1993). Formally, a CSP consists of a set of variables or domains D, a set of possible values V for each variable, and a set of constraints C that imposes restrictions on the joint assignment of values to variables. The go-to example for explaining the concept of “constraints” to the general reader used to be the crossword puzzle: every box can in principle hold any letter of the alphabet, but the combined assignments must form intelligible words. Nowadays the most popular example is the ubiquitous Sudoku puzzle: each of the rows,
columns and 3x3 subsquares must hold each of the digits 1–9 precisely once. The term “constraint” has a clearly defined formal meaning in this context: it is simply a subset of the combinatorially defined set of joint assignments to one or more domains. For instance, the top row in a Sudoku puzzle could, in theory, be filled in any of 387,420,489 (9^9) ways, but only 362,880 (9!) of them actually satisfy the uniqueness constraint on rows. Note that this constraint is a much smaller subset of the total set of possibilities. The number of domains to which the constraint jointly applies (9) is called the arity of the constraint; the arity of a CSP is the maximal arity of any of its constraints. If the arity of a CSP exceeds 1, i.e., as soon as not all combined assignments are allowed, it is NP-complete. This means that many other combinatorial problems can be transformed equivalently into instances of the CSP, for instance the travelling salesman problem (find the shortest round trip connecting all nodes in a large graph) or satisfiability (can the Boolean variables in an arbitrary formula be chosen so that it evaluates to true?). In our context, the importance of the CSP is that it is an almost perfect fit for the generate-and-test paradigm: the uniform expression of structure as variable assignment allows a generator to be defined trivially, while the test consists in verifying that the joint assignment of values to any set of domains is a member of the subset defined by the constraint on those domains. This means that it is trivially easy to implement a correct solution algorithm. Unfortunately, NP-complete problems probably cannot be solved efficiently: so far, no solution method is known that is substantially faster than checking every combination individually. However, in practice it is often possible to exploit features of a concrete problem to reduce the effort.
For instance, whenever a digit in a Sudoku puzzle is given in advance, none of the other cells in the same row, column or sub-square can be assigned that digit. This means that it can be removed from the 20 other domains altogether before we even start to generate joint assignments. Effectively, this imposes a new unary constraint (subset) on the value sets of these variables, since only a subset of the originally possible values remains. This method of combining the constraints given in the problem description into new, stronger constraints (smaller subsets) is called consistency propagation. A skilled Sudoku solver will usually aim to solve the puzzle exclusively through consistency propagation (the better ones deliberately use ink rather than pencils to demonstrate that they do not intend to use backtracking at all). More generally, the goal of propagation is to convert weak constraints with large arities into stronger constraints with smaller arities. A completely solved Sudoku puzzle has only one possible value left in each of the 81 domains, i.e., the original constraints have been reduced to 81 unary, trivial constraints: the solution can be read off immediately without searching at all. In practice, consistency propagation is often used in combination with heuristic search, which tries to search the more promising parts of the search space first. For instance, it can be useful to rearrange the domains by the size of their value sets, or such that variables restricted by many constraints are assigned first.
A peculiar property of a CSP is that it is often not obvious from the problem statement whether a solution exists or not. A problem may be under-constrained (have several solutions) or over-constrained (have no solution). From the perspective of natural language processing, this would correspond to ambiguous or unacceptable utterances, respectively. The declarative nature of constraint programming can also be an advantage, in that the description of properties of a valid linguistic structure via constraints does not require any particular assumption about the mechanism that humans or machines use to operate on them. For instance, the CSP is susceptible to parallel execution on modern vector architectures in a way that many other algorithms are not.
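The Sudoku figures used in this section, and the effect of a single given digit, can be checked with a short Python sketch. The representation of domains as sets and the helper names peers and propagate_given are assumptions made for illustration; the code performs only the one propagation step described above, not a full solver.

```python
from math import factorial

# A 9x9 grid; cells are (row, column) pairs, each with a domain of digits 1-9.
CELLS = [(r, c) for r in range(9) for c in range(9)]


def peers(cell):
    """All other cells sharing a row, column or 3x3 sub-square with `cell`."""
    r, c = cell
    return {
        (r2, c2)
        for (r2, c2) in CELLS
        if (r2, c2) != cell
        and (r2 == r or c2 == c or (r2 // 3 == r // 3 and c2 // 3 == c // 3))
    }


def propagate_given(domains, cell, digit):
    """Fix `cell` to `digit` and remove that digit from all peer domains."""
    domains[cell] = {digit}
    for p in peers(cell):
        domains[p].discard(digit)


row_assignments = 9 ** 9       # unconstrained ways to fill one row
row_solutions = factorial(9)   # fillings that satisfy the row constraint

domains = {cell: set(range(1, 10)) for cell in CELLS}
propagate_given(domains, (0, 0), 5)
shrunk = [cell for cell in CELLS if len(domains[cell]) == 8]

print(row_assignments, row_solutions)
print(len(peers((0, 0))), len(shrunk))
```

Each shrunken domain corresponds to a new unary constraint derived from the original 9-ary row, column and sub-square constraints.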
6.3 NLP Formalisms and the CSP
6.3.1 Constraints as Value Subsets
To begin with, constraints in a formal model of language can be either explicit or implicit. For instance, a classical context-free grammar rule such as “S → NP VP” expresses constraints on both type and ordering within the same rule that licenses a production. As discussed above, this corresponds to a tight integration of rules and restrictions. The constraints “a surface subject must be of nominal type” and “surface subjects immediately precede their verb phrase” are not first-class elements of the formalism. In a more declarative formalism such as Property Grammar, both of these would be individual, explicit constraints. One advantage of this is that refining the set of categories or even restructuring it completely (perhaps in order to port a grammar to a different, related human language) is possible without disturbing the other organising principles of what is likely to be a complex system. We feel that a major distinction should be made between constraint-based formalisms whose constraints are eliminative in nature, i.e., they simply prevent some partial solutions that would otherwise be well-defined, and
those that are constructive, i.e., that are part of the mechanism that builds up the solutions from simpler components in the first place. In the former case the constraints defined by a grammar author often essentially constitute the entire grammar, while in the latter they only contribute to the definition of grammaticality. A precondition on the application of a generative rule such as the one mentioned at the start of Section 6.3.1 is a good example of a constraint that plays a contributory role: it expresses the knowledge that the NP and the VP must either be both plural or both singular, so it can be viewed as a subset of two of the four theoretically possible derivations. But it applies only to a specific production rule that exists independently of the constraint; other productions may or may not operate under similar constraints. Also, the constraint alone could not construct a parse structure without the rule: since the number of elements in the structure that a derivational grammar assigns to an input string can vary depending on how many internal nodes are postulated (i.e., the size of the set of domains D is not a constant), constraints cannot filter the set of possible parses (combined assignments) until at least their size has been decided. In contrast, in a strict dependency grammar allowing only direct word-to-word subordinations, the number and the size of all domains in a given word problem is known in advance, so that only a trivial combinatorial generator is needed. In this case, constraints can constitute the entire grammatical knowledge expressed in a formalism, and it can be modelled completely as an instance of the general Constraint Satisfaction Problem. The parsing problem could then be solved with heuristics and solution methods developed by the greater constraint programming community.
The CSP can be solved by a complete search of all possible assignments, but not all formalisms whose solutions are computed via complete search are instances of the CSP. The decisive distinction is between those that assign labels, structures etc. to a predetermined set of variables and those that construct a complex solution whose size and structure is not known from the outset. For instance, part-of-speech taggers assign a category to each of the words in their input, and the possible categories are typically well-known small sets. The rules on which tags can constitute a solution are often simple lexicon lookup and combinations of adjacent tags, i.e., constraints of arity 1, 2 or 3. The POS tagging problem is thus an almost perfect instance of a basic CSP, and was solved via lexicon lookup and independent low-arity tests very early, e.g. by Klein and Simmons (1963). In contrast, an approach such as Sign-Based Construction Grammar (Sag et al., 2012) does not model the parsing problem as the assignment of values to separate predefined variables. Instead, a single complex solution is constructed in which
typed feature structures are combined via unification. Exactly how the solution is structured depends on the possible combination rules and the available lexical items. While dynamic CSPs exist (see Tsang, 1993) in which the domain and value sets change over time (e.g., an ongoing scheduling problem might model new jobs arriving after the preliminary assignment has been made, or new processing stations coming online), this approach would correspond to a CSP with only one domain, but an open-ended set of values that depend on previous derivation steps. This is different enough from the prototypical definition that most constraint programming methods are not applicable. The type and compatibility constraints used here therefore correspond to “constraints” only in the more general sense. Although it is justified to call the approach “constraint-based”, it is not useful to model them as elements of the set C in an actual CSP. Other formalisms correspond more closely to the value-assignment model, but not always to its most basic form. The CSP as introduced above has a rather clear-cut definition of a constraint: it defines a particular subset of the entire set of assignment possibilities for one or more domains of the problem. This does not always coincide with the term “constraint” in general usage. Also, even in the context of CSPs the original definition has been extended in various ways that require qualifications to the term.
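The view of part-of-speech tagging as a basic CSP can be made concrete with a small generate-and-test sketch in Python. The mini-lexicon (the unary constraints) and the set of forbidden adjacent tag pairs (the binary constraints) are invented for illustration; they are not taken from Klein and Simmons (1963) or from any real tagger.

```python
from itertools import product

SENTENCE = ["the", "old", "man", "sleeps"]

# Unary constraints: the lexicon restricts each word's tag domain.
LEXICON = {
    "the": ("DET",),
    "old": ("ADJ", "NOUN"),
    "man": ("NOUN", "VERB"),
    "sleeps": ("NOUN", "VERB"),
}

# Binary constraints: tag pairs that may not be adjacent (toy assumption).
FORBIDDEN_BIGRAMS = {
    ("DET", "VERB"),
    ("ADJ", "VERB"),
    ("VERB", "VERB"),
    ("NOUN", "NOUN"),
}


def satisfies_all(tags):
    """Test: no pair of adjacent tags is forbidden."""
    return all(pair not in FORBIDDEN_BIGRAMS for pair in zip(tags, tags[1:]))


# Generator: the Cartesian product of the (already restricted) domains.
candidates = list(product(*(LEXICON[word] for word in SENTENCE)))
solutions = [tags for tags in candidates if satisfies_all(tags)]

print(len(candidates))
for tags in solutions:
    print(tags)
```

With these toy constraints more than one assignment survives, so the problem is under-constrained in the sense of Section 6.2, which corresponds to an ambiguous input.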
6.3.2 Hard and Soft Constraints
In common parlance, “constraint” is sometimes used as a weaker term than “rule” or “law”: conflicting with someone’s time constraint does not sound as unacceptable as breaking a rule. But the CSP as originally defined makes no such distinction: if a value assignment is not covered by a constraint, then it is simply impossible, just as if this value did not exist. Often it is useful to consider joint assignments even if they do not satisfy all the constraints. This turns the classical CSP into a Partial CSP or PCSP (Freuder and Wallace, 1992), which aims to satisfy the constraints as well as possible. Although the PCSP does not occupy a higher class of formal complexity, solving a PCSP poses additional difficulties when compared to solving the corresponding CSP: for instance, consistency propagation cannot simply remove a value from a domain even if it contradicts some constraint, because it might still be part of the best solution. Also, even when a solution is found, it is not immediately clear whether processing can terminate, because further search might find an even better one. What constitutes the preferred solution in a PCSP can be defined in different ways. It could be the solution that violates the fewest constraints
(Minton et al., 1992), the least important constraints as in Optimality Theory (Prince and Smolensky, 1993), or the constraints with the lowest combined importance according to some additional metric (Blache et al., 2008). All such variants involve extending the definition of C: a constraint is no longer merely a subset of the Cartesian product of value sets, but also has a valuation attached which defines how it behaves in this preference metric. If the goal is to violate the fewest constraints, then all constraints can be considered to bear the same score, and the combined metric is simply an addition. If some constraints are more important than others, they can receive different ranks, and solutions are judged by a minimum function. In the most general case, each constraint bears its own score, and solutions are ranked by a metric such as addition or multiplication of the scores of violated constraints. Note that even in a PCSP there are often constraints that absolutely must hold, which are called hard constraints. In contrast, soft constraints can be violated if necessary. They correspond to the looser general usage of “constraint” as opposed to an “impossibility”. It is possible to solve the PCSP correctly using variants of complete search algorithms that keep track of the quality of the partial solution. Depending on whether a “perfect” solution (with all constraints satisfied) is expected to exist or not, it can be useful to start out with a normal CSP and only retract certain constraints in case no solution is found, like Maruyama (1990). Alternatively, constraints can be considered soft from the beginning.
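The three ways of valuating constraint violations described above can be compared side by side. The following Python sketch is an illustration under stated assumptions: the constraint names, ranks and penalties are invented, rank 1 is taken to be the most important, and a penalty of 0.0 is used to mark a hard constraint under multiplicative scoring.

```python
from math import prod

# Hypothetical constraints: name -> (rank, penalty).
# Rank 1 is the most important; penalty 0.0 marks a hard constraint,
# penalties close to 1.0 mark weak preferences.
CONSTRAINTS = {
    "no_cycles": (1, 0.0),
    "subject_agreement": (2, 0.1),
    "prefer_short_edges": (3, 0.9),
}


def violation_count(violated):
    """Uniform scores: simply count the violated constraints."""
    return len(violated)


def worst_rank(violated):
    """Ranked constraints: rank of the most important violation, if any."""
    return min((CONSTRAINTS[name][0] for name in violated), default=None)


def product_score(violated):
    """Individual scores: multiply the penalties of violated constraints."""
    return prod(CONSTRAINTS[name][1] for name in violated)


candidate_a = {"prefer_short_edges"}
candidate_b = {"subject_agreement"}
candidate_c = {"no_cycles", "prefer_short_edges"}

for violated in (candidate_a, candidate_b, candidate_c):
    print(violation_count(violated), worst_rank(violated), product_score(violated))
```

Candidates a and b are indistinguishable by violation count, while the ranked and the multiplicative metric both prefer a. Candidate c violates a hard constraint and receives a score of zero regardless of its other properties.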
6.3.3 Uniform and Free-form Constraints
Although a constraint in a CSP is formally a set of possible variable assignments, this is not necessarily the best way of expressing them. Clearly, it is more useful to formulate the row constraint in the Sudoku puzzle as a generalised rule such as “No two boxes in a row may contain the same number” than to list the 362,880 elements of the set. Similarly, computer programs can work with a direct representation of possible combinations if the sets are relatively small, but for problems with indefinite sizes it is more useful to write formulas to evaluate or program verifying routines to be run on candidate assignments. Even though these define subsets just as much as an explicit list does, determining membership in that subset may require more effort. After all, every empty box in a Sudoku could be said to allow “only those values that appear in the complete solution”, but this condition would be as hard to resolve as solving the entire puzzle. Therefore, it is important in what way a formalism allows constraints to be expressed. The usual trade-off here is that more complex formulations
allow for some generalisations to be expressed more easily and more elegantly, but on the downside may require more processing effort. As a counterbalance, theories sometimes stress the importance of expressing the entire knowledge codified in a grammar as one and only one type of construct, e.g., typed feature structures serving as lexical entries, derivation rules and as parsing output (Copestake, 2002).
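The difference between listing a constraint as a set and stating it as a rule can be shown on a scaled-down row constraint. The sketch below uses a row of four cells instead of nine, purely so that the explicit set stays small; this size and the names ALLOWED and all_different are assumptions made for illustration.

```python
from itertools import permutations, product

DIGITS = range(1, 5)  # a 4-cell row keeps the explicit set small


def all_different(assignment):
    """Rule form: a predicate evaluated on a candidate assignment."""
    return len(set(assignment)) == len(assignment)


# Set form: the constraint as an explicit list of allowed tuples.
ALLOWED = set(permutations(DIGITS))

# Both forms pick out the same subset of the 4**4 joint assignments.
same_subset = all(
    (row in ALLOWED) == all_different(row)
    for row in product(DIGITS, repeat=4)
)

print(len(ALLOWED), same_subset)
```

The predicate stays the same size for a 9-cell row, whereas the explicit set would grow to the 362,880 elements mentioned above.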
6.3.4 Axiomatic and Empirical Grammars
An advantage of constraints that take a simple consistent form is that they lend themselves well to automatic grammar acquisition: a meta-generator component can systematically enumerate all possible constraints, and those that restrict solutions in an adequate way are retained. Alternatively, all constraints of a particular form can be assumed and their different weights estimated by observing how often they are satisfied on a training corpus of solved problems. Note that this procedure is not restricted to systems on the “testing” end of the generate-and-test spectrum: it can be used just as well for introducing defeasible rules into a purely generative grammar, e.g., when a CFG is turned into a probabilistic CFG by weighting all of its rules. Axiomatic (hand-written) grammars are produced by an expert postulating generally valid rules of a language while relying on their own language competence, and only indirectly on empirical input. The classical example would be Noam Chomsky demonstrating that some aspects of English syntax could be captured by context-free production rules, by writing down these rules explicitly. In contrast, empirical (automatically computed) grammars are extracted automatically from a large amount of strings or syntax structures in that language, as when a probabilistic syntax analyser is trained on a tree bank to create a new PCFG. Although this is not a strict theoretical necessity, axiomatic grammars tend to allow for more complex and varied rules or constraints than empirical ones. Linguists often feel that their expertise cannot be notated accurately with just uniform atomic pieces of knowledge, while machine learning algorithms need rigidly defined search spaces in order to terminate. For efficient machine learning, we often want a space of possibilities that is not just finite, but also small. 
As a compromise, both approaches can be combined into a hybrid system by manually refining or pre-specifying an automatically computed language model. For instance, a PCFG could deduce both the probability of the production rules and the rules themselves from its input, or it could take the production rules as an input and merely estimate each probability.
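How rule probabilities might be estimated from a tree bank can be sketched as follows. The estimator shown is plain relative frequency, which is an assumption made here for illustration (the text does not commit to a particular method), and the toy list of rule occurrences stands in for a real tree bank.

```python
from collections import Counter

# Hypothetical toy tree bank, flattened to the rules used in its trees.
TREEBANK_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
    ("NP", ("DET", "N")),
    ("S", ("NP", "VP")),
    ("NP", ("PRON",)),
    ("VP", ("V",)),
]


def estimate_pcfg(rule_occurrences):
    """Relative-frequency estimate: count(A -> beta) / count(A)."""
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)
    return {
        (lhs, rhs): count / lhs_counts[lhs]
        for (lhs, rhs), count in rule_counts.items()
    }


pcfg = estimate_pcfg(TREEBANK_RULES)
for (lhs, rhs), probability in sorted(pcfg.items()):
    print(lhs, "->", " ".join(rhs), round(probability, 3))
```

In this sketch both the rules and their probabilities are read off the data. In the hybrid setting described above, the rule set would instead be fixed by hand and only the counts taken from the corpus.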
6.4 Dependency Grammar Modelling with Locally-Scoped Constraints
Weighted constraint dependency grammar (WCDG) models the parsing problem as a weighted CSP, with constraints expressed as logical formulas on dependency structures. WCDG resides near the “testing” end of the generate-and-test spectrum: it defines a completely uniform solution space where in principle, any word might modify any other with any label. Both fundamental properties, such as projectivity (and indeed, non-circularity), and more linguistically informed rules are notated alike as explicit constraints on the combinatorial possibilities. The soft constraints rank the solution candidates and define the preferred solution. Given sufficient time, a complete search will find the global optimum as defined by the given set of constraints. Since the complexity of constraint evaluation grows exponentially with the highest constraint arity in the grammar, real-world applications are often restricted in their arity by performance considerations. WCDG, for instance, is limited to the evaluation of unary and binary constraints. A challenge also arises from the very nature of the formalism: while for unification-based parsing approaches such as HPSG (Pollard and Sag, 1994) or LFG (Bresnan, 2001) propagating feature information through the solution structure constitutes a fundamental principle, WCDG, as a constraint-based dependency system with a primarily localised view, cannot “transport” a feature from one place in a solution to another; it can only predefine multiple versions of a lexical item or a dependency label. Constraints must then be used to ensure that such variants co-occur consistently. In the following we refer to any property which cannot be tested for by a single localised constraint as supra-local. Properties requiring knowledge of the entire solution structure we refer to as global.
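The uniform solution space and the ranking by unary and binary constraints can be illustrated with a deliberately tiny Python model. It is not the WCDG implementation: the three-word sentence, the label set, the constraints and their penalties are all assumptions, penalties are combined by multiplication (one of the metrics mentioned in Section 6.3.2), and projectivity and non-circularity checks are omitted for brevity.

```python
from itertools import combinations, product

WORDS = {1: "NOUN", 2: "VERB", 3: "ADV"}  # "dogs bark loudly"; 0 is the root
LABELS = ["S", "SUBJ", "ADV"]


def domain(dep):
    """Uniform solution space: any regent (or the root) with any label."""
    return [(dep, reg, lab) for reg in [0, *WORDS] if reg != dep for lab in LABELS]


def cat(position):
    return WORDS.get(position, "ROOT")


# Each constraint is (penalty, predicate); an edge is (dependent, regent, label).
# Penalty 0.0 makes a constraint hard, anything above 0.0 makes it soft.
UNARY = [
    (0.0, lambda e: (e[1] == 0) == (e[2] == "S")),  # root edge <=> label S
    (0.0, lambda e: e[2] != "S" or cat(e[0]) == "VERB"),
    (0.0, lambda e: e[2] != "SUBJ" or (cat(e[0]), cat(e[1])) == ("NOUN", "VERB")),
    (0.0, lambda e: e[2] != "ADV" or cat(e[0]) == "ADV"),
    (0.5, lambda e: e[2] != "ADV" or cat(e[1]) == "VERB"),  # soft preference
]
BINARY = [
    (0.0, lambda a, b: not (a[1] == b[1] and a[2] == b[2] == "SUBJ")),
]


def score(edges):
    """Multiply the penalties of all violated unary and binary constraints."""
    total = 1.0
    for penalty, holds in UNARY:
        for e in edges:
            if not holds(e):
                total *= penalty
    for penalty, holds in BINARY:
        for a, b in combinations(edges, 2):
            if not holds(a, b):
                total *= penalty
    return total


candidates = list(product(*(domain(dep) for dep in WORDS)))
best = max(candidates, key=score)

print(len(candidates))
print(best, score(best))
```

Even for three words and three labels the generator produces several hundred candidates, almost all of which are eliminated by hard constraints; the soft constraint then decides between the remaining attachments of the adverb.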
Unlike the original Constraint Dependency Grammar, the similar eXtended Dependency Grammar (XDG) (Duchier, 1999) supports supra-local constraints by running on top of a general constraint programming system (Oz). We now outline how the two WCDG predicates, is and has, open up supra-local modelling options for constraint-based systems that, in combination with certain solution procedures, permit access to complex feature information as if it were actually propagated along dependency edges, similar to feature propagation in formalisms like LFG. The expressivity extension thus achieved provides solid grounds for challenging the traditional view of constraint-based formalisms as adopting a strictly localised view. Motivated by a range of German language modelling challenges, we present four applications of these predicates and illustrate how they can be employed to re-formulate constraints whose complexity otherwise would exceed that of a classical weighted constraint dependency formalism based on unary and binary constraints.
Table 6.1: Operators in WCDG
Operators recoverable from Table 6.1: X↑from, X↓word, X↑case, X.label, &, |, ->, ~, =, !=, > [...]
[...] Y↓from < X↑from;
This constraint can be read as follows: when a relative clause (Y.label = REL) modifies a word (Y↑from = X↓from) that is a subject (X.label = SUBJ), and the relative clause is right-modifying (Y\) while the subject is left-modifying (X/), then the relative clause itself (Y↓from) must occur to the left (<) of the subject's regent (X↑from).
[...]
[...] -> has(X↓id, SUBJ)
(Finite verbs need subjects.)
X↓cat = VVFIN & X↓transitive = yes -> has(X↓id, OBJA)
(Transitive verbs need objects.)
Although the obvious application of this operator is to enforce verb valency conditions, it lends itself to many related uses. For instance, German infinitive constructions can occur in the form of an infinitive with a particular marker word (of the category PTKZU), or as a special verb form that incorporates this marker (category VVIZU). Assuming that an external marker always bears the special-purpose label ZU, the has operator allows this condition to be expressed concisely:
X↓cat = VVIZU | (X↓cat = VVINF & has(X↓id, ZU))
(X↓ is a valid infinitival construction.)
The WCDG engine deduces the supra-local nature of a formula automatically by scanning its body, and applies such constraints only if a complete solution candidate is available. Thus, conditions are still expressed over single edges or pairs of edges at a time, but during evaluation they can also examine additional neighbouring edges as required.
While the has operator expresses conditions on the dependants of a word, the similar is operator tests the label of the dependency edge above a given word. For instance, German main clauses generally place exactly one constituent in front of the finite verb. This can be expressed by a constraint that forbids two dependencies modifying the same verb from the left. However, this condition only holds in main clauses (in which the verb itself is labelled as S), but not for subclauses or relative clauses (labelled e.g. NEB or REL). Therefore, three edges would have to be tested to detect such an illegal configuration: the two left-modifying dependencies under the verb and the edge directly above. This would require a ternary constraint, which WCDG does not support. With the supra-local is operator, however, a single binary constraint suffices:
{X/SYN/\Y/SYN} : Vorfeld : 0.1 : X↑cat = VVFIN -> ~is(X↑id, S);
In fact, this constraint is considerably faster to evaluate than an all-quantified ternary constraint would be, because WCDG effectively has to check only one additional edge (the one above the finite verb), and that only when the premise of the constraint actually holds. Thus, the operator allows for easier grammar development and more efficient evaluation than a grammar limited to strictly local constraints.
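The behaviour of has and is can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions and not the WCDG implementation: the Word class, the helper names has, is_ and vorfeld_violations, and the toy analyses are all hypothetical, constraint weights are ignored, and the Vorfeld condition is checked directly on a complete tree rather than on pairs of edges.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    form: str
    cat: str              # part-of-speech category, e.g. "VVFIN"
    head: Optional[int]   # index of the regent; None for the root
    label: str            # label of the dependency edge above this word


def has(words: List[Word], i: int, label: str) -> bool:
    """has(i, label): some dependant of word i is attached with `label`."""
    return any(w.head == i and w.label == label for w in words)


def is_(words: List[Word], i: int, label: str) -> bool:
    """is(i, label): the edge above word i carries `label`."""
    return words[i].label == label


def vorfeld_violations(words: List[Word]) -> List[Tuple[int, List[int]]]:
    """Finite verbs labelled S that have two or more dependants to their left."""
    found = []
    for v, verb in enumerate(words):
        if verb.cat != "VVFIN" or not is_(words, v, "S"):
            continue
        left = [i for i, w in enumerate(words) if w.head == v and i < v]
        if len(left) >= 2:
            found.append((v, left))
    return found


def analysis(verb_label: str) -> List[Word]:
    """Verb-final order "Der Mann die Frau sieht" with a chosen verb label.
    The attachment of the clause itself is omitted in this toy example."""
    return [
        Word("Der", "ART", 1, "DET"),
        Word("Mann", "NN", 4, "SUBJ"),
        Word("die", "ART", 3, "DET"),
        Word("Frau", "NN", 4, "OBJA"),
        Word("sieht", "VVFIN", None, verb_label),
    ]


main_clause = analysis("S")    # analysed as a main clause
subclause = analysis("NEB")    # analysed as a subclause

print(has(main_clause, 4, "SUBJ"))       # does the finite verb have a subject?
print(vorfeld_violations(main_clause))   # left dependants under an S verb
print(vorfeld_violations(subclause))     # same word order, subclause label
```

The sketch checks the label above the verb only after the cheap category test has succeeded, which mirrors the observation that the additional edge is inspected only when the premise holds.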
6.6.2 Recursive Tree Traversal
One limitation of the supra-local operators is and has described so far is that they operate only on direct neighbours of the dependency edges to which they are applied. This is often sufficient, but there are phenomena which require the presence of structurally more distant features. For instance, a subclause should be analysed as a relative clause (REL instead of NEB) exactly if it is marked by the presence of a relative pronoun, but this pronoun does not always modify the finite verb directly:
"Es soll eine Art Frühwarnsystem eingerichtet werden, in dessen Zentrum der IWF steht."
(An early-warning system is planned whose center will be constituted by the IWF.)
Similar cases of remote markers abound in German: the conjunction sondern is only used for phrases containing a negation somewhere, a genitive modifier must contain at least one overt genitive form, etc. To check such conditions, it is necessary to extend the semantics of the supra-local operators so that optionally they can also find indirect dependants or regents.
In such cases it is useful to restrict the extended search in some way, both for operational and for linguistic reasons. For instance, when a subclause is modified by a nested relative clause, the subclause itself should not be labelled REL, even though the corresponding dependency subtree contains a relative pronoun further down. Similarly, in coordinated sentences the finite verb is labelled as KON (in asyndetic coordination) or CJ (in normal coordination) rather than S or NEB, so that even a lookup via is cannot determine whether main-clause or subclause ordering should be enforced; what counts is the label of the topmost finite verb in a coordination, which can be several edges away.
Therefore, the notion of 'scope' has been implemented for the extended versions of the non-local operators: when used with four arguments, the search is extended across a specific set of labels, i.e. those which are subsumed by a particular pseudo-label in a special-purpose hierarchy. For instance, the actual test for sentence type in the Vorfeld constraint is closer to the following version:
is(X↑id, S, Label, Konjunkt)
(X↑ is eventually labelled S)
where Konjunkt subsumes both KON and CJ in the hierarchy 'Label'. This construct effectively ascends the tree from a finite verb until a label other than KON or CJ is found, and compares this label to the main-clause marker S. The has operator has been extended in the corresponding way; for instance, it can be programmed to descend into a sentence labelled REL to detect a relative pronoun, but only until another subclause indicator such as REL, NEB or S intervenes.
This use of a label-delimited semi-global search resembles the notion of barriers in Government and Binding theory (Chomsky, 1986), but it does not claim to be a fundamental principle. Indeed, by varying the set of labels to traverse, the search can be restricted more or less; for instance, it can operate only upon an NP, or upon the entire tree structure.
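The scoped ascent described above can be sketched in a few lines of Python. This is a minimal illustration, not the WCDG code: the function name is_scoped, the dictionary-based tree encoding and the KONJUNKT set (standing in for the pseudo-label Konjunkt of the 'Label' hierarchy) are assumptions made for the example.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical stand-in for the 'Label' hierarchy: the pseudo-label Konjunkt
# subsumes the coordination labels KON and CJ.
KONJUNKT: FrozenSet[str] = frozenset({"KON", "CJ"})


def is_scoped(label: Dict[int, str], head: Dict[int, Optional[int]],
              word: int, target: str, traverse: FrozenSet[str]) -> bool:
    """Ascend from `word` while the edge above carries a label in `traverse`,
    then compare the first label outside that set with `target`."""
    current: Optional[int] = word
    while current is not None and label[current] in traverse:
        current = head[current]
    return current is not None and label[current] == target


# Toy chain: word 2 -CJ-> word 1 -KON-> word 0, and word 0 is labelled S.
label = {0: "S", 1: "KON", 2: "CJ"}
head = {0: None, 1: 0, 2: 1}

print(is_scoped(label, head, 2, "S", KONJUNKT))     # ascends through CJ and KON
print(is_scoped(label, head, 2, "S", frozenset()))  # unscoped: looks at CJ only
```

Passing an empty traversal set reduces the scoped test to the plain two-argument is, so the extension can be read as a generalisation rather than a separate operator.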
6.6.3 Localised Ancillary Constraints
The syntax for the is and has predicates introduced so far only permits testing for static attributes of the edges above or below the dependency under consideration. A useful extension to the concepts of is and has is therefore to include a check for the most general edge property expressible: the satisfaction of an arbitrary constraint. Since is and has are evaluated in the context of a normal constraint, we refer to their argument constraint as an ancillary constraint.
To motivate this extension linguistically, consider thematic role assignment in German non-modal perfect tense active sentences:
Der Mann [AGENT] hat die Frau mit dem Fernrohr gesehen.
(The man [AGENT] has seen the woman with the telescope.)
In all of the following examples we assume the full verb to be agentive. SUBJ and AGENT dependencies then originate from the same node in the constraint net. Non-modal perfect tense active in German is a composite tense formed by a finite auxiliary in combination with a full verb's past participle. In constraint terms, this tense can be characterised by a dependency with the following properties: an AUX edge (X.label=AUX) links a finite auxiliary verb form (X↑cat=VAFIN) of haben or sein (X↑base=haben | X↑base=sein) as regent with a full verb's past participle (X↓cat=VVPP) as dependent.
Figure 6.1 illustrates that constraining the origin of the AGENT dependency to the origin of the SUBJ dependency (dotted) in a non-modal perfect tense active sentence requires a ternary constraint involving the SUBJ, AGENT and AUX (dashed) edges. Moreover, formulating this constraint requires imposing restrictions on the AUX edge as well as on the nodes linked by it. By requiring satisfaction of an ancillary constraint via is or has, the origin of the AGENT dependency in German perfect tense active sentences can be formulated elegantly as follows: a SUBJ edge (X.label=SUBJ) meeting an AUX edge that marks a perfect active sentence (has(X↑id, 'Detect perfect tense active')) must have an edge originating from its bottom node (X↓id = Y↓id) which bears the label AGENT (Y.label = AGENT). It is this use of has in combination with the ancillary constraint that allows us to express a genuinely ternary supra-local relation as a WCDG-licensed binary constraint.
The ancillary constraint to be satisfied by the edge meeting the SUBJ dependency enforces exactly the set of properties previously identified for the detection of a perfect tense active sentence:
{X:SYN} : 'Detect perfect tense active' : ancillary : 1 :
  X.label = AUX & X↑cat = VAFIN & (X↑base = haben | X↑base = sein) & X↓cat = VVPP;
By employing the ancillary constraint as an argument to has, we have effectively extended the scope of properties accessible on a neighbouring dependency, from access to a single static edge property to the full range of edge and node properties available. Constraint expressivity is enhanced because we can now create general custom predicates that neighbouring edges need to fulfil. Clearly, the conjunction of features X↑cat=VAFIN & (X↑base=haben | X↑base=sein) & X↓cat=VVPP could not be expressed with the static arguments to is or has presented in the previous sections. The elegance of this approach lies in the fact that ancillary constraints of arbitrary complexity can now be employed as re-usable functional blocks to perform checks for linguistically intuitive, yet formally complex properties over and over again. From a performance point of view, it is notable that the WCDG implementation caches the result of an ancillary constraint once it has been evaluated for a given edge, so that the result is available for repeated use at no extra computational cost.
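One way to picture an ancillary constraint is as a named, cached predicate over a single edge that is handed to has as an argument. The Python sketch below makes that idea concrete under stated assumptions: the Word class, the function names and the use of functools.lru_cache as a stand-in for the caching described above are illustrative choices, not the WCDG implementation.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Word:
    form: str
    cat: str
    base: str
    head: Optional[int]   # index of the regent; None for the root
    label: str            # label of the edge above this word


Sentence = Tuple[Word, ...]


@lru_cache(maxsize=None)
def detect_perfect_active(s: Sentence, i: int) -> bool:
    """Ancillary check on the edge above word i (X↓ = word i, X↑ = its regent)."""
    dep = s[i]
    if dep.head is None:
        return False
    regent = s[dep.head]
    return (dep.label == "AUX" and regent.cat == "VAFIN"
            and regent.base in ("haben", "sein") and dep.cat == "VVPP")


def has(s: Sentence, i: int,
        ancillary: Callable[[Sentence, int], bool]) -> bool:
    """Some edge below word i satisfies the ancillary check."""
    return any(w.head == i and ancillary(s, j) for j, w in enumerate(s))


def subj_requires_agent(s: Sentence, i: int) -> bool:
    """Premise of the binary constraint: the edge above word i is SUBJ and
    its regent has a perfect-active AUX edge below it."""
    dep = s[i]
    return (dep.label == "SUBJ" and dep.head is not None
            and has(s, dep.head, detect_perfect_active))


# "Der Mann hat die Frau gesehen" (determiners omitted for brevity).
sentence: Sentence = (
    Word("Mann", "NN", "Mann", 1, "SUBJ"),
    Word("hat", "VAFIN", "haben", None, "S"),
    Word("Frau", "NN", "Frau", 3, "OBJA"),
    Word("gesehen", "VVPP", "sehen", 1, "AUX"),
)

print(subj_requires_agent(sentence, 0))
print(detect_perfect_active.cache_info())  # repeated edge checks hit the cache
```

When the premise holds, the surrounding binary constraint would go on to demand an AGENT edge from the same bottom node on the semantic level; that second level is left out here to keep the sketch short.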
Figure 6.1: AGENT assignment in German perfect tense active sentences
6.6.4 Cascading and Recursive Ancillary Constraints
The ancillary constraints presented so far are localised unary constraints, and as such provide full access to the properties of the next-neighbour edges above and below a given dependency in the syntax tree. As we will now illustrate, the syntax-semantics interface exhibits phenomena whose modelling requires even higher expressivity than is provided by the extended localised unary ancillary constraints. We proceed to describe an additional expressivity enhancement that utilises cascading and recursive calls to ancillary constraints. This enables us to model properties spanning arbitrarily large sections of the dependency tree, e.g. global properties, with just binary constraints.
As an example, consider AGENT thematic role assignment in German passive sentences. The AGENT in a German passive sentence is typically embedded as the PP filler noun (X.label=PN) in a von-PP (X.label=PP & X↓word=von) which modifies the past participle of a full verb (X↑cat = VVPP) (see Figure 6.2).
Der Mann wird von der Frau [AGENT] mit dem Fernrohr gesehen.
(The man is being seen by the woman [AGENT] with the telescope.)
The full verb past participle, in turn, must be correctly embedded in the lowest-lying AUX dependency in order for the sentence to be in passive voice. We can therefore formulate the constraint on the origin of the AGENT dependency in German passive sentences with the following cascade of ancillary constraints (edges refer to Figure 6.2).

Binary invocation constraint.
For the pair of edges X (SEM) and Y (dotted) which share the same origin node (X↓id = Y↓id) we demand: if Y is a PN edge and the edge above it (dashed) satisfies the ancillary constraint 'Detect full-verb modifying von-PP in passive', then X must be an AGENT dependency.
{X:SEM,Y:SYN} : X↓from = Y↓from & Y.label = PN
  & is(Y↑id, 'Detect full-verb modifying von-PP in passive')
  -> X.label = AGENT;

Ancillary constraint #1: 'Detect full-verb modifying von-PP in passive'.
The edge above PN must be a full-verb modifying von-PP. This is tested for by ancillary constraint #3. The edge above the PP edge must be the lowest-lying AUX edge in a passive construction, which is tested for by ancillary constraint #2.
is(X↓id, 'Detect full-verb modifying von-PP')
  & is(X↑id, 'Detect passive bottom-up');

Ancillary constraint #2: 'Detect passive bottom-up'.
The edge above the full-verb modifying PP must be a passive-marking AUX edge. Passive sentences are identified based on their lowest-lying AUX edge, which connects a past participle dependent with its auxiliary regent of base form werden. The regent's category depends on tense.
X.label = AUX & X↓cat = VVPP & ~has(X↓id, AUX)
  & (X↑cat = VAFIN                        // Present Tense, Simple Past Tense
     | (X↑cat = VAPP & is(X↑id, AUX))     // Perfect Tenses, Fut II, Subjun II
     | (X↑cat = VAINF & is(X↑id, AUX)))   // Fut I, Subjun I
  & X↑base = werden;

Ancillary constraint #3: 'Detect full-verb modifying von-PP'.
A PP is of relevance to AGENT assignment in a passive construction if it contains the preposition von and attaches to the full verb's past participle.
X.label = PP & X↑cat = VVPP & X↓word = von;
Figure 6.2: AGENT assignment in German passives
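The cascade of ancillary constraints can be mirrored by ordinary functions that call one another. The following Python sketch is a simplified reading of the three ancillary constraints and the invocation premise; the Word class, the function names and the toy analyses (with determiners left out) are assumptions for illustration, and only the syntactic premise is modelled, not the SEM level.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Word:
    form: str
    cat: str
    base: str
    head: Optional[int]   # index of the regent; None for the root
    label: str            # label of the edge above this word


Sentence = Tuple[Word, ...]


def has_label(s: Sentence, i: int, label: str) -> bool:
    """Some dependant of word i is attached with `label`."""
    return any(w.head == i and w.label == label for w in s)


def passive_bottom_up(s: Sentence, i: int) -> bool:
    """Ancillary #2 on the edge above word i: lowest AUX edge of a passive."""
    dep = s[i]
    if dep.head is None:
        return False
    reg = s[dep.head]
    tense_ok = reg.cat == "VAFIN" or (reg.cat in ("VAPP", "VAINF")
                                      and reg.label == "AUX")
    return (dep.label == "AUX" and dep.cat == "VVPP"
            and not has_label(s, i, "AUX")
            and tense_ok and reg.base == "werden")


def von_pp(s: Sentence, i: int) -> bool:
    """Ancillary #3 on the edge above word i: von-PP attached to a participle."""
    dep = s[i]
    return (dep.head is not None and dep.label == "PP"
            and s[dep.head].cat == "VVPP" and dep.form == "von")


def von_pp_in_passive(s: Sentence, i: int) -> bool:
    """Ancillary #1: the edge is a von-PP and the edge above its regent is passive."""
    dep = s[i]
    return (dep.head is not None and von_pp(s, i)
            and passive_bottom_up(s, dep.head))


def pn_requires_agent(s: Sentence, i: int) -> bool:
    """Premise of the invocation constraint for the PN edge above word i."""
    dep = s[i]
    return (dep.label == "PN" and dep.head is not None
            and von_pp_in_passive(s, dep.head))


# "Der Mann wird von der Frau gesehen"
passive: Sentence = (
    Word("Mann", "NN", "Mann", 1, "SUBJ"),
    Word("wird", "VAFIN", "werden", None, "S"),
    Word("von", "APPR", "von", 4, "PP"),
    Word("Frau", "NN", "Frau", 2, "PN"),
    Word("gesehen", "VVPP", "sehen", 1, "AUX"),
)
# "Der Mann hat von der Frau geträumt" (active perfect with a von-PP)
active: Sentence = (
    Word("Mann", "NN", "Mann", 1, "SUBJ"),
    Word("hat", "VAFIN", "haben", None, "S"),
    Word("von", "APPR", "von", 4, "PP"),
    Word("Frau", "NN", "Frau", 2, "PN"),
    Word("geträumt", "VVPP", "träumen", 1, "AUX"),
)

print(pn_requires_agent(passive, 3))
print(pn_requires_agent(active, 3))
```

Each function inspects only one edge and delegates to a neighbour, which is how the cascade covers four edges while every individual check stays unary.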
Again, use of an ancillary constraint permits us to express in a binary constraint a condition which would otherwise have required a quaternary constraint construction relating the PN, PP, AUX, and AGENT dependencies.
A related, though again slightly more complex modelling task is to constrain the origin of the AGENT dependency in German active sentences. Due to the large number of structurally diverse active constructions in German it can be more convenient to model an active voice sentence as a sentence which is not in passive voice.¹ As mentioned above, German passives can be identified based on their lowest-lying AUX edge. Since the actual location of this edge depends on tense and mode, the constraint for its detection needs to be flexible. We employ a constraint which moves down the dependency tree by recursively invoking itself until it either finds an AUX edge satisfying the bottom-up criteria for passive detection or until it cannot descend further and fails altogether.
The AGENT dependency in an active voice sentence may originate from the origin of the SUBJ dependency, while in a passive voice sentence it originates from the origin of the PN dependency contained in a full-verb modifying PP. Note that these conditions include the global properties active and passive voice. We can now conveniently formulate this complex requirement in the following recursive ancillary constraint invocation: given an AGENT dependency, it originates either from a SUBJ edge in an active sentence or from a PN edge in a full-verb modifying von-PP in a passive construction.
X.label = AGENT ->
  (Y.label = SUBJ & ~has(Y↑id, 'Detect passive top-down'))
  | (Y.label = PN & is(Y↑id, 'Detect full-verb modifying von-PP in passive'));
While the detection procedure for the full-verb modifying von-PP in a passive construction has been outlined above, the detection of the active construction merits further explanation. Starting from the SUBJ edge in Figure 6.3, the ancillary constraint first checks the OBJA edge for satisfaction of the ancillary constraint 'Detect passive bottom-up'. Since this is unsuccessful, it then continues to descend down the right-hand side of the dependency tree, progressing edge by edge, recursively re-invoking itself until it either finds an AUX edge satisfying the ancillary constraint and terminates, or until no further alternatives are available and it fails. This formulation is an expressive extension to the recursive tree traversal introduced in Section 6.6.2.
{X:SYN} : 'Detect passive top-down' : ancillary : 1 :
  X.label = AUX
  & (is(X↓id, 'Detect passive bottom-up')
     | has(X↓id, 'Detect passive top-down'));
Figure 6.3: AGENT assignment based on German active/passive detection
The increase in constraint expressivity required in this modelling scenario arises from the fact that the dependencies constrained are farther apart in the dependency tree and thus are no longer contiguous. So, although our approach for extending access to edge and node properties of neighbouring dependency edges is based on is and has, it is by no means limited in applicability to neighbouring edges. Before, supra-local properties would have needed to be tested for by a higher arity constraint, which was unavailable in WCDG's formalism. Now, such a higher arity constraint can be re-formulated as a suitably expressive binary constraint operating on a contiguous dependency structure that contains all edges we wish to predicate over. Since an ancillary constraint can only extend access to one neighbouring dependency above or below, there is a linear relationship between the number of invocations of an ancillary constraint and the distance between the considered edges in the dependency tree.
¹ This modelling decision may require justification beyond the scope of this chapter. Suffice it here to say that replacing the indirect detection of active voice by a direct detection has no impact on our line of argument. The ancillary constraints merely need to be re-formulated to detect the structural features of an active voice sentence.
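The recursive descent used for passive detection can be sketched as a function that calls itself on the edges one level further down. This is an illustrative Python reading, assuming a well-formed (acyclic) dependency tree; the Word class, the function names and the example analyses are hypothetical and only the SUBJ branch of the AGENT constraint is shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Word:
    form: str
    cat: str
    base: str
    head: Optional[int]   # index of the regent; None for the root
    label: str            # label of the edge above this word


Sentence = Tuple[Word, ...]


def passive_bottom_up(s: Sentence, i: int) -> bool:
    """Lowest-lying AUX edge of a passive (edge above word i)."""
    dep = s[i]
    if dep.head is None:
        return False
    reg = s[dep.head]
    tense_ok = reg.cat == "VAFIN" or (reg.cat in ("VAPP", "VAINF")
                                      and reg.label == "AUX")
    no_aux_below = not any(w.head == i and w.label == "AUX" for w in s)
    return (dep.label == "AUX" and dep.cat == "VVPP" and no_aux_below
            and tense_ok and reg.base == "werden")


def passive_top_down(s: Sentence, i: int) -> bool:
    """AUX edge that is itself passive-marking, or has such an edge further
    down. Each recursive call moves exactly one edge deeper."""
    dep = s[i]
    if dep.head is None or dep.label != "AUX":
        return False
    return passive_bottom_up(s, i) or any(
        w.head == i and passive_top_down(s, j) for j, w in enumerate(s))


def subj_may_be_agent(s: Sentence, i: int) -> bool:
    """SUBJ branch: the SUBJ edge above word i sits in a non-passive clause."""
    dep = s[i]
    if dep.label != "SUBJ" or dep.head is None:
        return False
    return not any(w.head == dep.head and passive_top_down(s, j)
                   for j, w in enumerate(s))


# "Der Mann hat die Frau gesehen" (active perfect)
active: Sentence = (
    Word("Mann", "NN", "Mann", 1, "SUBJ"),
    Word("hat", "VAFIN", "haben", None, "S"),
    Word("Frau", "NN", "Frau", 3, "OBJA"),
    Word("gesehen", "VVPP", "sehen", 1, "AUX"),
)
# "Der Mann ist gesehen worden" (perfect passive, AUX chain of length two)
perfect_passive: Sentence = (
    Word("Mann", "NN", "Mann", 1, "SUBJ"),
    Word("ist", "VAFIN", "sein", None, "S"),
    Word("gesehen", "VVPP", "sehen", 3, "AUX"),
    Word("worden", "VAPP", "werden", 1, "AUX"),
)

print(subj_may_be_agent(active, 0))
print(subj_may_be_agent(perfect_passive, 0))
```

Because every call descends by a single edge, the number of invocations grows with the length of the AUX chain, in line with the linear relationship noted above.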
6.7 Conclusions
In this chapter we have described a systematic extension to the expressive power of the constraint-based grammar WCDG. To better convey the extent and impact of a seemingly minor change to an already obscure formalism, we have attempted to give an overview of the landscape of constraint-based formalisms in natural language analysis and the most important differences between the meanings of the term "constraint" as used by various practitioners. An important distinction is between variants of the CSP, where constraints are essentially subsets of the set of joint assignments of values to a predefined set of domains, and approaches with constraints in the looser sense, which can operate on a more general level of representation. WCDG falls into the former class and employs hand-written, declarative, defeasible weighted constraints.
While all constraint satisfaction problems share the same underlying fundamental structure, the way in which the constraints that form a grammar are notated and evaluated can nevertheless have decisive effects on the expressive power of a formalism. We have outlined the gain in constraint expressivity achieved with the introduction of two additional predicates is and has into a notation based on predicate logic; although syntactically conservative, these predicates extend the pattern of constraint access in the dependency tree and thus open up paths to the effective and efficient handling of supra-local constraints within the formal and operational limitations of the WCDG formalism. Notably, in the absence of the is and has predicates these supra-local constraints could only have been expressed as ternary and higher arity constraints, thus exceeding the limits of the WCDG implementation.
Motivated by examples from the syntax-semantics interface, we illustrated that the consecutive extension of the predicate syntax for is and has, in combination with cascading and recursive invocations of ancillary constraints, produces a significant increase in constraint expressivity. Most notably, we have demonstrated how global syntactic properties such as active or passive voice in German can be made accessible to evaluation in WCDG within the limits of binary constraint dependency formulation.
6.8 Future Work
Our work so far has focused on implementations involving unary ancillary constraints. With few changes the WCDG formalism can be extended to support the evaluation of binary ancillary constraints as well. A systematic investigation into the effects of this is pending. From a theoretical point of view, a formal analysis of the expressivity enhancements achieved with is and has appears challenging and rewarding. While we have focused on the use of is and has to solve specific modelling tasks, we conjecture that the full expressive potential resulting from the use of these predicates in combination with ancillary constraints has not yet been exhausted.
Bibliography
Blache, P. and Prost, J.-P. (2008). A quantification model of grammaticality. In Proceedings of the 5th International Workshop on Constraints and Language Processing (CSLP'08), pages 5–19, Hamburg, Germany.
Blache, P. and Prost, J.-P. (2004). Gradience, constructions and constraint systems. In H. Christiansen, P. R. Skadhauge, and J. Villadsen, editors, CSLP, volume 3438 of Lecture Notes in Computer Science, pages 74–89. Springer.
Bresnan, J. (2001). Lexical-Functional Syntax. Blackwell Publishers, Oxford.
Charniak, E. and Johnson, M. (2005). Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. In Proc. ACL 2005, pages 173–180, Ann Arbor, Michigan. Assoc. for Computational Linguistics.
Chomsky, N. (1986). Barriers. MIT Press.
Copestake, A. (2002). Implementing typed feature structure grammars. CSLI lecture notes. CSLI Publications, Stanford, CA.
Duchier, D. (1999). Axiomatizing dependency parsing using set constraints. In Proceedings of The Sixth Meeting on Mathematics of Language, pages 115–126, Orlando/USA.
Foth, K. A. (2007). Hybrid Methods of Natural Language Analysis. Ph.D. thesis, Universität Hamburg.
Foth, K. A., Menzel, W., and Schröder, I. (2000). A Transformation-based Parsing Technique with Anytime Properties. In 4th Int. Workshop on Parsing Technologies, IWPT-2000, pages 89–100.
Freuder, E. C. and Wallace, R. J. (1992). Partial constraint satisfaction. Artificial Intelligence, 58(1–3), 21–70.
Klein, S. and Simmons, R. F. (1963). A Computational Approach to Grammatical Coding of English Words. Journal of the Association for Computing Machinery, 10, 334–347.
Maruyama, H. (1990). Structural disambiguation with constraint propagation. In Proc. 28th Annual Meeting of the ACL (ACL-90), pages 31–38, Pittsburgh, PA.
Menzel, W. and Schröder, I. (1998). Decision procedures for dependency parsing using graded constraints. In S. Kahane and A. Polguère, editors, Proc. Coling-ACL Workshop on Processing of Dependency-based Grammars, pages 78–87, Montreal, Canada.
Minton, S., Johnston, M. D., Philips, A. B., and Laird, P. (1992). Minimizing conflicts: a heuristic repair method for constraint satisfaction and scheduling problems. Artificial Intelligence, 58(1–3), 161–205.
Pollard, C. and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. The University of Chicago Press, Chicago.
Prince, A. and Smolensky, P. (1993). Optimality theory: Constraint interaction in generative grammar. Technical Report 2, Center for Cognitive Science, Rutgers University.
Sag, I. A., Boas, H. C., and Kay, P. (2012). Introducing sign-based construction grammar. In Sign-Based Construction Grammar, pages 1–30, Chicago/USA. University of Chicago Press.
Schulz, M., Hamerich, S., Schröder, I., Foth, K., and By, T. (2005). [X]CDG User Guide. Natural Language Systems Group, Hamburg University, Hamburg, Germany.
Tsang, E. (1993). Foundations of Constraint Satisfaction. Academic Press, London and San Diego.
Chapter Seven
– Semantics –
On Semantic Properties in Constraint-Based Grammars
Verónica Dahl, Baohua Gu, J. Emilio Miralles
7.1 Introduction
New approaches to parsing include substantial efforts to obtain useful results even for incomplete or ungrammatical input. More flexible models than the traditional, hierarchical parsing models have emerged around such efforts, largely motivated by new developments in related areas: speech recognition systems have matured enough to be helpfully embedded in spoken command systems, e.g. for people with disabilities; the unprecedented explosion of text available online since the Internet's advent calls for ultraflexible text processing schemes; geographically distributed enterprises are evolving controlled languages as a kind of interlingua in which their employees from different countries can communicate despite possible errors and imperfections.
Even outside such specialised needs, a parser ideally should model human cognitive abilities for parsing by being able to extract meaning from text produced in real life conversation, which typically is incomplete, often not perfectly grammatical, and sometimes erroneous. Imperfections can result from normal human error in actual speech, or be introduced by machines, as in the case of text produced from speech recognition systems, which, while evolved enough to be usable, are notoriously error-prone.
Many of the new approaches to flexible parsing sacrifice completeness: so-called shallow parsing (Abney, 1991), for instance, identifies syntactic phrases or chunks (e.g. noun phrases) derived by flattening down a sentence's parse tree, but typically loses much of the connection among chunks which a parse tree would exhibit.
For more than a decade, partly in order to provide robust yet flexible parsing, views of grammar have been shifting from the traditional Chomskyan focus on hierarchical analysis to a view of grammar as a set of conditions simultaneously constraining and thus defining the set of possible utterances. Several linguistic theories make intensive use of this notion, in particular HPSG (Pollard and Sag, 1994; Sag and Wasow, 1999), minimalism (Chomsky, 1995) and Optimality Theory (Prince and Smolensky, 1993). As in the last two, we aim at very few general operations: we work with just one rule at the heart of parsing, and unlike minimalism, need not filter candidate structures.
Our work, although useful for other parsing tasks as well, centres on the Property Grammar, or PG, approach. The idea of representing a language's grammar solely through properties between constituents was first proposed as a theoretical formalism by Bès and Hagège (2001), and as a practical and extended formalism by Blache (2005). Computationally, it relates to Gazdar and Pullum's dissociation of phrase structure rules into the two properties of Immediate Dominance (called constituency in the PG literature) and Linear Precedence (called either precedence or linearity in PG) (Gazdar, 1983). In Property Grammars, syntactic structure is expressed by means of relations between categories rather than in terms of hierarchy.
For instance, a Property Grammar parse of the noun phrase "every blue moon" results in a set of satisfied properties (e.g. linear precedence holds between the determiner and the noun, between the adjective and the noun, and between the determiner and the adjective; the noun's requirement for a determiner is satisfied, etc.) and a set of unsatisfied properties, which is empty for this example. In contrast, "every moon blue" would yield a violation of linear precedence between the adjective and the noun, indicated by placing this relationship in the set of unsatisfied properties.
In its original formulation, Property Grammars already provided full rather than shallow parsing, but produced no parse tree, just a list of satisfied and unsatisfied properties which together characterised the sentence fully. In one of the authors' first computational renditions of Property Grammars (Dahl and Blache, 2004), we provided parse trees as well, not only because linguists are used to thinking in terms of parse trees, but also because having them available allows us to check linguistic constraints that refer to trees.
In this paper we make two further contributions to flexible parsing with the Property Grammar formalism: a) we extend property-based parsing to include semantic information, so that selected phrases incorporating syntax and semantics can be extracted automatically as a side effect of parsing, and b) we provide a high-level implementation of the new model in terms of HyProlog. We use the application of concept and relation extraction from biomedical texts to exemplify throughout.
Section 7.2 presents background information on Property Grammars. Section 7.3 extends the Property Grammar model to include semantics in view of information extraction. Section 7.4 presents our parsing methodology, after an intuitive presentation of the programming tool used, HyProlog. Section 7.5 discusses related work and extensions, and Section 7.6 presents our concluding remarks.
7.2 Background on Property Grammars
Property-based Grammars (Blache, 2005) define any natural language in terms of a small number of properties: linear precedence (e.g. within a verb phrase, a transitive verb must precede the direct object); dependency (e.g. a determiner and a noun inside a noun phrase must agree in number); constituency (e.g. a verb phrase can contain a verb, a direct object, ...); requirement (e.g. a singular noun in a noun phrase requires a determiner); exclusion (e.g. a superlative and an adjectival phrase cannot coexist in a noun phrase); obligation (e.g. a verb phrase must contain a verb); and uniqueness (e.g. a prepositional phrase contains only one preposition).
The user defines a grammar through these properties instead of defining hierarchical rewrite rules as in Chomskyan-based models. In addition, properties can be relaxed by the user in a simple modular way. For instance, we could declare "precedence" as relaxable, with the effect of allowing ill-formed sentences where precedence is not respected, while pointing out that they are ill-formed (this feature is useful for instance in language tutoring systems).
The result of a parse is, then, not a parse tree per se (although we do provide one, just for convenience, even in the case of ill-formed input), but a list of satisfied and a list of unsatisfied properties. The table below shows a toy example, for an anomalous noun phrase with more than one noun. The first argument in the "cat" symbol output represents the category recognised, in this case a noun phrase (np). Its features are singular and masculine (second argument). The third argument shows the parse tree, which is constructed despite the anomaly, and includes the three nouns ("cellules", "immunotoxines" and "peptides"). The fourth argument shows the list of properties that are satisfied within the noun phrase, and the fifth and last argument shows the only unsatisfied property: uniqueness of head noun.

Input:
les cellules endothéliales immunotoxines peptides proapoptotiques (the endothelial ... cells)
Output:
cat(np, [sing, masc],
    sn(det(les), n(cellules), ap(adj(endothéliales)), n(immunotoxines), n(peptides), ap(adj(proapoptotiques))),
    [prec(det,n), dep(det,n), requires(n, det), exclude(name, det), excludes(name, n), dep(sa, n), excludes(name, sa), excludes(sa, sup)],
    [uniqueness(n)])
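To make the split into satisfied and unsatisfied properties concrete, here is a small Python sketch for a toy English noun phrase grammar. It is an assumption-laden illustration, not the authors' parser: the property inventory in PRECEDENCE, the tuple encoding of properties and the function names check_np and acceptable are invented for the example, and only precedence, requirement, obligation and uniqueness are covered.

```python
from typing import List, Set, Tuple

Prop = Tuple[str, ...]

# Hypothetical toy grammar: category a must precede category b inside an NP.
PRECEDENCE = [("det", "adj"), ("det", "n"), ("adj", "n")]


def check_np(cats: List[str]) -> Tuple[List[Prop], List[Prop]]:
    """Return (satisfied, unsatisfied) properties for a sequence of categories."""
    satisfied: List[Prop] = []
    unsatisfied: List[Prop] = []

    def record(prop: Prop, ok: bool) -> None:
        (satisfied if ok else unsatisfied).append(prop)

    for a, b in PRECEDENCE:
        pos_a = [i for i, c in enumerate(cats) if c == a]
        pos_b = [i for i, c in enumerate(cats) if c == b]
        if pos_a and pos_b:  # precedence only applies if both are present
            record(("prec", a, b), max(pos_a) < min(pos_b))
    if "n" in cats:
        record(("requires", "n", "det"), "det" in cats)
    record(("obligation", "n"), "n" in cats)
    record(("uniqueness", "n"), cats.count("n") <= 1)
    return satisfied, unsatisfied


def acceptable(unsatisfied: List[Prop], relaxable: Set[str]) -> bool:
    """An analysis is tolerated if every violated property type is relaxable."""
    return all(prop[0] in relaxable for prop in unsatisfied)


good_sat, good_unsat = check_np(["det", "adj", "n"])   # "every blue moon"
bad_sat, bad_unsat = check_np(["det", "n", "adj"])     # "every moon blue"

print(good_unsat)
print(bad_unsat)
print(acceptable(bad_unsat, {"prec"}))  # precedence declared relaxable
```

Declaring a property type relaxable changes only the final acceptance decision; the violation is still reported in the unsatisfied list, which is the behaviour described for tutoring applications.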
Property Grammars allow us to extract syntactic constructs from text, as first proposed by Dahl and Blache (2005). This functionality is useful for dealing with parts of the input for which no complete analysis can be built. This is the case, for example, with possibly long lists of juxtaposed NPs, which are frequent in spoken language but for which no specific syntactic relation can be given. It is also of interest for applications in which the entire syntactic structure or, in other words, the description of all possible syntactic categories is not necessary. This is the case for question-answering, information extraction, or terminological applications based on NP recognition.
7.3 Semantic Property Grammars
The original Property Grammar formalism focuses on syntactic information. Only one of its properties, dependency, is meant to include semantics, but there is no clear specification of how, or of how the semantics included could serve to construct appropriate meaning representations for sentences being parsed. In the present article we propose a novel yet simple, natural and efficient way of incorporating semantic information which can be used for concept and relation extraction: we construct, in addition to the lists of satisfied and unsatisfied syntactic properties corresponding to a category being analysed, a list of Semantic Properties associated with the category.
We next introduce this extension by means of examples taken from biomedical text — one of the applications we have worked on. Uses of the Semantic Properties list for further semantic extensions than the ones proposed here are of course possible, according to the semantic domain addressed and the needs of different specific applications. The extensions described below are sufficient for our purposes of information extraction from biomedical text, and serve in particular to choose appropriately between ambiguous readings in automatic fashion. They rely on a type hierarchy of concepts, or ontology, of the domain (in our case, biomedical) having been made available for the parser to consult. Extracting concepts and relations within noun phrases. A noun is interpreted by our parser as the relational symbol of a semantic relationship to be extracted, and the arguments of this relation are constructed from the noun’s various complements, appropriately typed after consultation of the domain-dependent (in this case, biomedical) ontology. For instance, the noun phrase: The activation of NF-kappa-B via CD-28
parses semantically into the list of properties:

    [protein('NF-kappa-B'), gene('CD-28'), activation('NF-kappa-B', 'CD-28')]
which shows the relationship obtained, i.e., activation('NF-kappa-B', 'CD-28'), together with the types that our concept hierarchy associates with each of the arguments of the relationship (i.e., 'NF-kappa-B' is of type protein, whereas 'CD-28' is of type gene).

Extracting concepts and relations within verb phrases. Just as nouns induce relationships, verbs also induce relationships whose arguments are the semantic representations of the verb’s syntactic arguments. For instance, consider the sentence:

    retinoblastoma proteins regulate transcriptional activation.
The verb regulate marks a relation between two concepts, retinoblastoma proteins and transcriptional activation. Our parser constructs a list of properties which identifies the semantic types of the relationship’s arguments as well as the relationship itself, just as was done for noun phrases:

    [protein('retinoblastoma proteins'), process('transcriptional activation'), regulate('retinoblastoma proteins', 'transcriptional activation')]
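The construction of a Semantic Properties list can be illustrated with a minimal Python sketch. It is not the authors' Hyprolog implementation: the dictionary ONTOLOGY stands in for the domain ontology, and the function names semantic_properties and show are hypothetical, introduced only for this illustration.

```python
# Hypothetical toy ontology: maps an entity name to its semantic type.
# The four entries are the ones used in the examples of this section.
ONTOLOGY = {
    "NF-kappa-B": "protein",
    "CD-28": "gene",
    "retinoblastoma proteins": "protein",
    "transcriptional activation": "process",
}


def semantic_properties(relation, arguments, ontology=ONTOLOGY):
    """Return the typed arguments followed by the relation itself."""
    typed = [(ontology[arg], (arg,)) for arg in arguments]
    return typed + [(relation, tuple(arguments))]


def show(properties):
    """Render a property list in the bracketed notation used in the text."""
    return "[" + ", ".join(
        "{}({})".format(name, ", ".join("'{}'".format(a) for a in args))
        for name, args in properties
    ) + "]"


if __name__ == "__main__":
    print(show(semantic_properties("activation", ["NF-kappa-B", "CD-28"])))
    print(show(semantic_properties(
        "regulate", ["retinoblastoma proteins", "transcriptional activation"])))
```

The sketch treats nouns and verbs alike: the head word supplies the relation symbol and the ontology supplies one type property per argument, which mirrors the two examples above.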
Disambiguation on the fly. Our parser’s consultation of a biomedical ontology is useful not only for gathering type information about the entities involved in the relationships being constructed, but also for disambiguating according to context. For instance, binding site usually refers to a DNA domain or region, but sometimes it refers to a protein domain or region. Catching the latter meaning is not trivial, since both c-Myc and G28-5 are protein molecules. However, our parser looks for semantic clues from surrounding words in order to disambiguate: in sentence (1) below, promoters points to the DNA-region reading of binding site, whereas in sentence (2), ligands points to its protein reading. Our parser can thus calculate an entity’s appropriate type by consulting domain-specific clues. More details regarding disambiguation in biological texts can be found in (Dahl and Gu, 2007).

(1) Transcription factors USF1 and USF2 up-regulate gene expression via interaction with an E box on their target promoters, which is also a binding site for c-Myc.

(2) The functional activity of ligands built from the binding site of G28-5 is dependent on the size and physical properties of the molecule both in solution and on the cell surfaces.
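As a rough illustration of clue-based disambiguation, the following Python sketch chooses a reading for binding site from the surrounding words. The clue table CLUES and the function disambiguate are assumptions made for this example; the actual parser consults the ontology rather than a flat word list.

```python
# Hypothetical clue table: a context word suggests a reading of "binding site".
CLUES = {
    "promoter": "dna",
    "promoters": "dna",
    "ligand": "protein",
    "ligands": "protein",
}


def disambiguate(sentence, default="dna", clues=CLUES):
    """Return the reading suggested by the first clue word in the sentence.

    The default reflects the observation that "binding site" usually
    refers to a DNA domain or region.
    """
    for raw in sentence.split():
        word = raw.strip(".,;()").lower()
        if word in clues:
            return clues[word]
    return default
```

Applied to sentences (1) and (2), the sketch returns the DNA and protein readings respectively, because promoters and ligands are the only clue words that occur in them.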
7.4 Our Parsing Methodology

7.4.1 Background: Hyprolog

Our parser’s programming tool, Hyprolog, is an extension of Prolog with assumptions and abduction, useful for hypothetical reasoning and other applications. It runs on top of SICStus Prolog, from which it can use all features and libraries, including Constraint Handling Rules, or CHR (Frühwirth, 1998). Here we describe Hyprolog in its three main components (assumptions, CHR and abduction), in as intuitive a fashion as possible.

Assumptions. Assumptions were developed by Dahl et al. (1997) and further adapted by Christiansen and Dahl (2004) as a logic programming incarnation of linear and intuitionistic implications (Girard, 1987). They have made their way into contemporary systems based on logic programming, such as BinProlog (Tarau, 1997), Hyprolog (Christiansen and Dahl, 2005) and CHRGs (Christiansen, 2005). They also include a new type of implication called timeless assumptions, which have been found particularly useful for, among other things, several difficult parsing problems such as anaphora resolution. Here we will only be concerned with linear and intuitionistic assumptions, which we present next in an intuitive manner.
Assumptions are facts¹ that are added to a Prolog program dynamically, making them available from the point at which they are called and during the entire continuation. Linear assumptions can be used at most once², whereas intuitionistic assumptions can be used as many times as needed. To add, or call, an assumption, all we need to do is precede it with the “+” sign if linear, or the “∗” sign if intuitionistic, at the point where we are adding or calling it. To use, or consume, either type of assumption, we just precede it with the sign “−”. For instance, a list of English words with an indication of word boundaries can be given as:

    sentence :- +eng(the,0), +eng(blue,1), +eng(moon,2).
(A sentence is the English word “the” starting at point 0, followed by the English word “blue” starting at point 1, and the English word “moon” starting at point 2.) If we wanted to (naively) translate this list of words into French, for instance, we could now define:

    dictionary(the,la).
    dictionary(blue,bleue).
    dictionary(moon,lune).

and recurse through the main translating action, which removes (consumes) a word in English, and replaces it with (assumes) its French counterpart:

    translate_word :- -eng(W,Point), dictionary(W,Translation), +french(Translation,Point).

Calling translate_word three times leaves us with just three assumptions (our original ones having been consumed once and for all, being linear):

    +french(la,0), +french(bleue,1), and +french(lune,2)
which can now be consumed in turn by some pretty-printing procedure. Notice that explicitly stating word boundaries allows us to express the same information regardless of the order of the assumptions, e.g.,

    sentence :- +eng(blue,1), +eng(the,0), +eng(moon,2).

also characterises the string “the blue moon”.

¹ They can also be full clauses, but the subset containing only facts is quite enough for our purposes in this paper.
² The type of linear assumption we use is more rigorously called linear affine implication in the literature, and it differs from linear implication proper in that it can either be consumed once or not at all, whereas linear implication proper must be consumed exactly once.

The difference between linear and intuitionistic assumptions can be exemplified by the following calls: the first one succeeds, binding X to “the”; the second one fails, since linear assumptions once consumed are no longer there to be consumed again; and the third one succeeds, binding both X and Y to “the”, since the fact that there is a word “the” at point 0 has been assumed intuitionistically and can therefore be reused as many times as needed:

    example :- +word(the,0), -word(X,0).
    example :- +word(the,0), -word(X,0), -word(Y,0).
    example :- *word(the,0), -word(X,0), -word(Y,0).
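The behaviour of linear and intuitionistic assumptions described above can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions: the class AssumptionStore and the helper translate_word are hypothetical names, facts are plain tuples, None plays the role of an unbound variable, and backtracking (which Hyprolog supports) is not modelled.

```python
class AssumptionStore:
    """Toy store for linear (+) and intuitionistic (*) assumptions."""

    def __init__(self):
        self.linear = []          # each entry can be consumed at most once
        self.intuitionistic = []  # entries survive consumption

    def assume_linear(self, fact):
        self.linear.append(fact)

    def assume_intuitionistic(self, fact):
        self.intuitionistic.append(fact)

    def consume(self, pattern):
        """Return the first fact matching pattern, or None on failure.

        A None inside the pattern matches anything (an unbound variable).
        """
        def matches(fact):
            return len(fact) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, fact))

        for i, fact in enumerate(self.linear):
            if matches(fact):
                return self.linear.pop(i)   # a linear fact is used up
        for fact in self.intuitionistic:
            if matches(fact):
                return fact                 # an intuitionistic fact stays
        return None


DICTIONARY = {"the": "la", "blue": "bleue", "moon": "lune"}


def translate_word(store):
    """Consume one English word and assume its French counterpart."""
    fact = store.consume(("eng", None, None))
    if fact is None:
        return False
    _, word, point = fact
    store.assume_linear(("french", DICTIONARY[word], point))
    return True
```

With this store, consuming ("word", None, 0) twice fails the second time if the fact was assumed linearly and succeeds both times if it was assumed intuitionistically, which is the contrast drawn by the three example calls.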
Assumptions and consumptions are similar to the Prolog primitives “assert” and “retract”, except that they are available during the entire continuation of the computation, and that they are backtracked upon.

Constraint Handling Rules, or CHR. A CHR program is a finite set of rules of the form {Head ==> Guard | Body}, where Head and Body are conjunctions of atoms and Guard is a test constructed from built-in predicates; the variables in Guard and Body also occur in Head; if the Guard is the logical constant “true”, it is omitted together with the vertical bar. The logical meaning of a rule is the formula ∀(Guard → (Head → Body)), and the meaning of a program is given by conjunction. A derivation starting from an initial state of ground constraints, called a query, is defined by applying rules as long as doing so adds new constraints to the store. A rule as above applies if it has an instance (H==>G|B) with G satisfied and H in the current store, and it does so by adding B to the store. Note that if the application of a rule would add a constraint c which is already in the store, no additional rules are triggered; e.g., p==>p does not loop, as it is not applied in a state that includes p. There are three types of CHR rules:

• Propagation rules, which add new constraints (the body) to the constraint set while keeping the head constraints in the constraint store for further simplification.
• Simplification rules, which also add the constraints in the body as new constraints, but in addition remove the ones in the head of the rule.
• Simpagation rules, which combine propagation and simplification traits, and allow us to select which of the constraints mentioned in the head of the rule should remain and which should be removed from the constraint set.
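The derivation scheme for propagation rules over ground constraints can be sketched in Python as a fixpoint computation. This is an illustration only, not how a CHR system is implemented: constraints are plain strings, each rule is a (head, guard, body) triple invented for the example, and simplification and simpagation rules are left out.

```python
def propagate(query, rules):
    """Apply ground propagation rules until no rule adds a new constraint.

    Each rule is a triple (head, guard, body): head and body are sets of
    ground constraints and guard is a zero-argument test.
    """
    store = set(query)
    changed = True
    while changed:
        changed = False
        for head, guard, body in rules:
            if head <= store and guard():
                new = body - store      # constraints not yet in the store
                if new:                 # a rule adding nothing does not fire
                    store |= new
                    changed = True
    return store


def always():
    return True


# Hypothetical example rules; the first one is p ==> p from the text.
RULES = [
    ({"p"}, always, {"p"}),
    ({"p"}, always, {"q"}),
    ({"p", "q"}, always, {"r"}),
]

if __name__ == "__main__":
    print(sorted(propagate({"p"}, RULES)))
```

Because a rule only counts as applied when it contributes a constraint that is not yet in the store, the rule p ==> p never causes another iteration, and the loop stops once the store is saturated.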
Abduction. Abduction is the unsound but useful rule of inference which concludes (or abduces) a from the knowledge of b and the rule that a implies b. Abducibles can be simply incorporated through CHR by declaring certain predicates as abducible; when generated and not resolvable, these simply remain in the constraint store. For example, if every time it rains I go to the cinema, and going to the cinema has been declared as abducible, then on querying cinema, there being no definition for it, it will remain in the constraint store, marked as abduced thanks to the declaration stating that it is an abducible predicate. More details can be found in (Christiansen and Dahl, 2009). An example of using abducibles within our parser is shown in the next section.
7.4.2 A Hyprolog Parser for Property Grammars
A Hyprolog program is written as a Prolog program with additional declarations of assumptive and abductive predicates. In this paper we omit these declarations, since it is apparent from our discussion which predicates are assumptions and which are abducibles. Our parsing method is applicable to grammatical formalisms other than Property Grammars. The following simple example illustrates its workings for the recognition of sentences in the scrambled aⁿbⁿcⁿ language, where the input “words” are entered as linear assumptions, the topmost call is to the predicate recognise, and all_consumed is a system predicate that checks that no assumptions remain unconsumed:

    input :- +a, +b, +b, +c, +c, +a.
    recognise :- input, apply_rules, all_consumed.
    apply_rules :- apply_rule, !, apply_rules.
    apply_rules.
    apply_rule :- -a, -b, -c.
The problem-dependent definition of apply_rule in this case simply consumes one a, one b, and one c. After the second iteration no more assumptions remain, so the second rule of apply_rules triggers, and given that all assumptions have been consumed, the program stops with success. Applying this parser to property grammars involves changing the definitions of input and of apply_rule. We record all categories (including lexical categories) as assumptions, and rule application in this case combines two categories (one of which is
a phrase or a phrase head) after checking the properties between them, and constructs a new category from both of these, also recorded as an assumption, until no more categories can be inferred. Syntactic categories are described in the notation:

    +cat(Cat,Features,Graph,Sat,Unsat,Semantics,Start,End)
where Cat names a syntactic category stretching between the start and end points Start and End, respectively; Features contains the list of its syntactic features, such as gender and number; Graph represents the portion of the parse tree corresponding to this category and is automatically constructed by the parser; Sat and Unsat are the lists of syntactic properties respectively satisfied and unsatisfied by this category; and Semantics holds the Semantic Properties list. The sentence’s words and related information, thus expressed as assumptions, can be viewed as resources to be consumed in the process of identifying higher-level constituents. Lists of satisfied and unsatisfied properties are created by our single rule, so that incorrect or incomplete input is admitted, but the anomalies are pointed out in the list of unsatisfied properties. Our single rule’s form is described in the Appendix.

As an example of using abducibles to express user-defined constraints, let us consider an alternative way of checking that all words in the input sentence have been used by the end of the parse (earlier we achieved the same effect through the primitive all_consumed): after calling the parser, we leave a trace that the parse is “finished” with an abducible done:

    g :- input, parse, done.
and we add the constraint:

    +cat(C,_,_,_,_,_,_,_), done ==> word(C) | fail.
The guard in this constraint checks that the category C is of type word, and if so, its coexistence with the abducible done, which indicates that the parsing is finished, makes the parse fail.

Termination. User-defined constraints (expressed as CHR constraints) can also serve to retrieve results of interest, express linguistic constraints, etc. Note that the parser’s single rule consumes two resources before creating a new one, so each rule application decreases the (initially finite) number of resources available by one. The process stops when no more rule applications are possible, leaving, if successful, a category “sentence” stretching between the start and end points of the input sentence, and containing its full characterisation (satisfied and unsatisfied properties, semantics, etc.).
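The resource-consuming loop of the recogniser for the scrambled aⁿbⁿcⁿ language, together with the termination argument, can be sketched in Python as follows. The function recognise below is a hypothetical stand-in for the Hyprolog predicates input, apply_rules and all_consumed; a multiset of symbols plays the role of the linear assumptions, and the step counter is added only to make the decreasing measure visible.

```python
from collections import Counter


def recognise(words):
    """Recognise scrambled a^n b^n c^n by consuming one a, b and c per step.

    Returns a pair (accepted, steps). Every step strictly decreases the
    finite number of remaining resources, so the loop must terminate.
    """
    resources = Counter(words)   # linear assumptions; input order is irrelevant
    steps = 0
    while all(resources[s] > 0 for s in "abc"):   # apply_rule can fire
        for s in "abc":
            resources[s] -= 1                     # consume -a, -b, -c
        steps += 1
    all_consumed = sum(resources.values()) == 0
    return all_consumed, steps
```

In this toy recogniser each step removes three resources, whereas the Property Grammar rule consumes two categories and assumes one new category; in both cases the number of available resources shrinks with every rule application, which is what guarantees termination.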
7.5 Related Work

Another formalism that shares the aims and some of the features of Property Grammars is that of Dependency Grammars (cf. Tesnière (1959) and, on this point, Mel’čuk (1988)), a purely equational system in which the notion of generation, or derivation between an abstract structure and a given string, is also absent. However, whereas in Dependency Grammars, as their name indicates, the property of dependence plays a fundamental role, in the framework we are considering it is but one of the many properties contributing to a category’s characterisation.

Morawietz (2000) implements deductive parsing (Shieber et al., 1995) in CHR, and proposes different types of parsing strategies (including one for Property Grammars) as specialisations of a general bottom-up parser. Efficiency, however, is not addressed beyond a general discussion of possible improvements, so while theoretically interesting, this methodology is in practice unusable due to combinatorial explosion. Moreover, it produces all properties that apply for each pair of categories without keeping track of how these categories are formed in terms of their subcategories, so there is no easy way to make sense of the output in terms of a complete analysis of a given input string. Our parser, as we saw, keeps track of the output in the lists of properties (syntactic and semantic) that it constructs, and it reduces the problem of combinatorial explosion, since every rule application condenses two assumptions into one.

It is interesting to compare methodologies for Property Grammars from the point of view of direct implementation. Because of their sole reliance on constraints, Property Grammars are a prime candidate for direct implementation in terms of constraint solving. However, the only works which incorporate direct implementation to some extent are those of Blache and Morawietz (2000), Dahl and Blache (2004), Duchier et al. (2014) and Womb Parsing (Dahl and Miralles, 2012).
Of these, the only one that does not need to calculate all constraints between every pair of constituents is that of Dahl and Miralles (2012). Instead, it checks constraints only for failure. This works well in the context of grammar induction, resulting in particular in great search-space reductions and hence greater efficiency. It has also recently been shown (Dahl et al., 2013) to work for parsing sentences rather than inducing grammars, while retaining both the search-space reduction obtained by the failure-driven focus and the direct implementation character (in the constraint-solving sense) of Dahl and Miralles (2012). We are presently extending this approach with semantics, under the hypothesis that semantic contributions can be combined by a single semantic rule, in
a similar way as we have a single (modulo symmetry) syntactic rule in the present work.

Among other recent approaches to parsing through constraints, McCrae et al. (2014) also aim at declarativeness, modularity, and robustness in the face of errors. Derivations are also replaced by well-formedness conditions expressed as constraints, and instead of a constituent-based parse tree, the output is a labelled dependency tree. We do not need to construct a parse tree either, since we too decouple into modular properties the information that would appear lumped together in a phrase structure rule (e.g. dominance and precedence); however, because we do not completely give up the notion of constituent, we can produce a parse tree as well as the lists of satisfied and unsatisfied properties. Linguists are used to looking at parse trees, and it does not cost much to produce one as a side effect of parsing. Our parse trees can be constructed even for erroneous input, thus providing visual clues in addition to explicitly signalling mistakes within the list of unsatisfied properties; for instance, if a determiner is typed twice by mistake, the parse tree will show two determiners, even though these cannot be derived from any implicit or explicit rule. In a sense, this amounts to having some context-free backbone, but instead of driving the analysis, it is inferential in the sense that it accommodates itself to the input received. In this manner, our constraints regarding constituency, far from blocking the parse, can serve to better pinpoint errors. Flexibility regarding which constraints are used is obtained by McCrae et al. (2014) through defeasible constraints; in our case the user can declare in which cases some constraints can be relaxed.
7.6 Conclusion

We have argued, from both a cognitive-science and a practical point of view, for the usefulness of PG for parsing natural language, and extended it to incorporate semantic information, useful for instance, as shown, for extracting concepts and relationships from biomedical texts. In effect, this gives PGs further plausibility as a model of human thought processing, beyond that already present in their ability to process incomplete or incorrect input. We have provided a novel and minimalistic parsing scheme for our extension of PGs, which can also be used for parsing other grammatical frameworks where the focus is on flexibility and cognitive skills: in some cases, a sentence’s characterisation only contains satisfied constraints, but it can also be the case that some constraints are violated, especially when parsing real-life corpora. In most cases, such violations do not have consequences on the
acceptability of the input. With this work we hope to stimulate further research into the many ramifications of the proposed formalism and parsing methodology.
Bibliography

Abney, S. (1991). Parsing by chunks. In Principle-Based Parsing. Kluwer Academic Publishers.
Bès, G. and Hagège, C. (2001). Properties in 5P. Technical report, GRIL/LPL.
Blache, P. (2005). Property Grammars: A Fully Constraint-Based Theory. In H. Christiansen, P. R. Skadhauge, and J. Villadsen, editors, Constraint Solving and Language Processing, volume 3438 of Lecture Notes in Computer Science, pages 1–16. Springer.
Blache, P. and Morawietz, F. (2000). Some aspects of natural language processing and constraint programming. Technical report, Universität Stuttgart, Universität Tübingen, IBM Deutschland.
Chomsky, N. (1995). The Minimalist Program. MIT Press.
Christiansen, H. (2005). CHR grammars. International Journal on Theory and Practice of Logic Programming, special issue on Constraint Handling Rules, 4-5, 467–501.
Christiansen, H. and Dahl, V. (2004). Assumptions and abduction in Prolog. In MultiCPL’04 - Third International Workshop on Multiparadigm Constraint Programming Language, pages 87–102, Saint Malo, France.
Christiansen, H. and Dahl, V. (2005). Hyprolog: A new logic programming language with assumptions and abduction. In M. Gabbrielli and G. Gupta, editors, ICLP, volume 3668 of Lecture Notes in Computer Science, pages 159–173. Springer.
Christiansen, H. and Dahl, V. (2009). Abductive logic grammars. In H. Ono, M. Kanazawa, and R. J. G. B. de Queiroz, editors, 16th International Workshop on Logic, Language, Information and Computation (WoLLIC), volume 5514 of Lecture Notes in Computer Science, pages 170–181. Springer.
Dahl, V. and Blache, P. (2004). Directly Executable Constraint Based Grammars. In F. Mesnard, editor, Programmation en logique avec contraintes, JFPLC 2004, 21, 22 et 23 Juin 2004, Angers, France. Hermes.
Dahl, V. and Blache, P. (2005). Extracting Selected Phrases through Constraint Satisfaction. In H. Christiansen and J. Villadsen, editors, Constraint Solving and Language Processing 2nd International Workshop, CSLP 2005, pages 3–17, Sitges, Spain.
Dahl, V. and Gu, B. (2007). A CHRG analysis of ambiguity in biological texts. In H. Christiansen and J. Villadsen, editors, Proceedings of Fourth International Workshop on Constraints and Language Processing (CSLP), pages 53–64, Roskilde, Denmark.
Dahl, V. and Miralles, J. (2012). Womb grammars: Constraint solving for grammar induction. In J. Sneyers and T. Frühwirth, editors, Proceedings of the 9th Workshop on Constraint Handling Rules, Technical Report CW 624, pages 32–40, Budapest, Hungary.
Dahl, V., Tarau, P., and Li, R. (1997). Assumption grammars for natural language processing. In Proceedings of the Fourteenth International Conference on Logic Programming, pages 256–270. MIT Press.
Dahl, V., Eğilmez, S., Martins, J., and Miralles, J. E. (2013). On failure-driven constraint-based parsing through CHRG. In Tenth International Workshop on Constraint Handling Rules, pages 13–24, Berlin, Germany.
Duchier, D., Dao, T.-B.-H., and Parmentier, Y. (2014). Model-Theory and Implementation of Property Grammars with Features. Journal of Logic and Computation, 24(2), 491–509.
Frühwirth, T. (1998). Theory and practice of constraint handling rules. The Journal of Logic Programming, 37(1–3), 95–138.
Gazdar, G. (1983). Phrase structure grammars and natural languages. In A. Bundy, editor, IJCAI, pages 556–565, Karlsruhe, West Germany.
Girard, J.-Y. (1987). Linear logic. Theoretical Computer Science, 50, 1–102.
McCrae, P., Foth, K., and Menzel, W. (2014). Extending the Constraint Satisfaction for better Language Processing. In P. Blache, H. Christiansen, V. Dahl, D. Duchier, and J. Villadsen, editors, Constraints and Language Processing. Cambridge Scholars Publishing.
Mel’čuk, I. (1988). Dependency Syntax. SUNY Press.
Morawietz, F. (2000). Chart Parsing as Constraint Propagation. In Proceedings of The 18th International Conference on Computational Linguistics (COLING-2000), pages 551–557, Saarbrücken, Germany.
Pollard, C. and Sag, I. (1994). Head-driven phrase structure grammars. CSLI. Chicago University Press.
Prince, A. and Smolensky, P. (1993). Optimality theory: Constraint interaction in generative grammars. Technical report.
Sag, I. and Wasow, T. (1999). Syntactic Theory: A Formal Introduction. CSLI.
Shieber, S., Schabes, Y., and Pereira, F. (1995). Principles and implementation of deductive parsing. 24, 3–36.
Tarau, P. (1997). Binprolog 5.75 user guide. Technical report, Université de Moncton.
Tesnière, L. (1959). Eléments de Syntaxe Structurale. Klincksieck.
Appendix

The single rule described in Section 7.4.2 is given in Figure 7.1.

    combine :-
        % consume two consecutive categories
        -cat(Cat,Features1,Graph1,Sat1,Unsat1,Sem1,Start1,End1),
        -cat(Cat2,Features2,Graph2,Sat2,Unsat2,Sem2,End1,End2),
        % constituency: Cat2 is an XP or its head, and Cat may occur in XP
        xp_or_obli(Cat2,XP),
        ok_in(XP,Cat),
        % check the remaining properties, threading the Sat/Unsat lists
        precedence(XP,Start1,End1,End2,Cat,Cat2,Sat1,Unsat1,SP,UP),
        dependency(XP,Start1,End1,End2,Cat,Features1,Cat2,Features2,SP,UP,SD,UD),
        build_tree(XP,Graph1,Graph2,Graph,ImmDaughters),
        uniqueness(Start1,End2,Cat,XP,ImmDaughters,SD,UD,SU,UU),
        requirement(Start1,End2,Cat,XP,ImmDaughters,SU,UU,SR,UR),
        exclusion(Start1,End2,Cat,XP,ImmDaughters,SR,UR,Sat,Unsat),
        semantics(Sem1,Sem2,Sem),
        % assume the new category spanning both
        +cat(XP,Features2,Graph,Sat,Unsat,Sem,Start1,End2).
Figure 7.1: New Category Inference
This rule combines two consecutive categories into a third. The first call after consumption of the two categories to be combined tests that one of the two categories is of type XP (a phrase category) or obligatory (i.e., the head of an XP), and that the other category is an allowable constituent for that XP. Then the guard successively tests each of the PG properties among those categories (constituency, precedence, dependency, uniqueness, requirement and exclusion), building the parse tree before testing uniqueness, and incrementally building as it goes along the lists of satisfied and unsatisfied properties. Finally, it infers a new category of type XP spanning both these categories, with the finally obtained Sat and Unsat lists as its characterisation, and semantics Sem built from the semantics of the two categories being combined. In practice, we use another rule symmetric to this one, in which the XP category appears before the category Cat which is to be incorporated into it.
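The overall shape of this rule can be conveyed by a much simplified Python sketch. It is not a translation of Figure 7.1: the Category class, the combine function and the way property tests are passed in as (name, test) pairs are all assumptions made for illustration, features and parse-tree construction are omitted, and the satisfied and unsatisfied lists of both daughters are simply concatenated.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    cat: str
    start: int
    end: int
    sat: list = field(default_factory=list)    # satisfied properties
    unsat: list = field(default_factory=list)  # unsatisfied properties
    sem: list = field(default_factory=list)    # Semantic Properties list


def combine(left, right, xp, properties):
    """Merge two adjacent categories into a new category of type xp.

    `properties` is a list of (name, test) pairs; each test receives the two
    categories and returns True when the property is satisfied. Violated
    properties are recorded rather than blocking the combination.
    """
    if left.end != right.start:
        return None                       # only consecutive categories combine
    sat = left.sat + right.sat
    unsat = left.unsat + right.unsat
    for name, test in properties:
        (sat if test(left, right) else unsat).append(name)
    return Category(xp, left.start, right.end, sat, unsat, left.sem + right.sem)
```

The point of the sketch is the design choice it shares with the rule above: a failed property test does not abort the combination but is accumulated in the list of unsatisfied properties, so erroneous input still receives a characterisation.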
Examples

Lexicon Rules

    % name(T-Entity) --> [Entity], {ner(Entity, T)}.
    name(protein-IL2)         --> ['IL2'],        {ner('IL2', protein)}.
    name(protein-NFkappaB)    --> ['NFkappaB'],   {ner('NFkappaB', protein)}.
    name(dna-bcl2)            --> ['bcl2'],       {ner('bcl2', dna)}.
    name(dna-promoter)        --> ['promoter'],   {ner('promoter', dna)}.
    name(rna-mRNA)            --> ['mRNA'],       {ner('mRNA', rna)}.
    name(celltype-monocytes)  --> ['monocytes'],  {ner('monocytes', celltype)}.
    name(celltype-leukocytes) --> ['leukocytes'], {ner('leukocytes', celltype)}.
    name(cellline-HL60)       --> ['HL-60'],      {ner('HL60', cellline)}.

    % another format
    %name(protein,'IL-2')        :-['IL-2'],       {ner('IL-2'),protein}.
    %name(protein,'NF-kappaB')   :-['NF-kappaB'],  {ner('NF-kappaB'),protein}.
    %name('DNA','bcl-2')         :-['bcl-2'],      {ner('bcl-2'),'DNA'}.
    %name('DNA','promoter')      :-['promoter'],   {ner('promoter'),'DNA'}.
    %name('RNA','mRNA')          :-['mRNA'],       {ner('mRNA'),'RNA'}.
    %name(cell_type,'monocytes') :-['monocytes'],  {ner('monocytes'),cell_type}.
    %name(cell_type,'leukocytes'):-['leukocytes'], {ner('leukocytes'),cell_type}.
    %name(cell_line,'HL-60')     :-['HL-60'],      {ner('HL-60'),cell_line}.
+cat(name, [singular], name(N), [], [], T-N, Start, End)
Sentence Rules

IS-A Ontology Graph

Gene Ontology terms can be linked by five types of relationships: is_a, part_of, regulates, positively_regulates and negatively_regulates³. In immunology, activation is the transition of leucocytes and other cell types involved in the immune system⁴. The Gene Ontology can be browsed at the Ontology Lookup Service (OLS), an open source project that provides a centralised query interface for ontology and controlled vocabulary lookup.

Constraint Lists

    regulation(Regulator, Regulatee) ==> protein(Regulatee) | fail.
    regulation(Regulator, Regulatee) ==> gene(Regulatee) | fail.
    regulation(Regulator, Regulatee) ==> bio-source(Regulatee) | fail.
    regulation(Regulator, Regulatee) ==> bio-substance(Regulatee) | fail.
    regulation(Regulator, Regulatee) ==> bio-process(Regulatee) | true.
    regulation(Regulator, Regulatee) ==> bio-function(Regulatee) | true.

³ An Introduction to the Gene Ontology, http://www.geneontology.org/GO.doc.shtml#biological_process
⁴ http://en.wikipedia.org/wiki/Activation#Immunology
Figure 7.2: A sample ontology of biological concepts that have IS-A relation.
Chapter Eight
– Syntax/Semantics Interface –
Multi-dimensional Type Theory: Rules, Categories and Combinators for Syntax and Semantics

Jørgen Villadsen
8.1 Introduction

We model the syntax and semantics of natural language by constraints, or rules, imposed by the multi-dimensional type theory Nabla (Villadsen, 2010). The only multiplicity we explicitly consider here is two, namely one dimension for the syntax and one dimension for the semantics, but we find the general perspective to be important. For example, issues of pragmatics could be handled as additional dimensions by taking into account direct references to language users and, possibly, other elements of the situation in which expressions are used. We note that it is possible to combine many dimensions into a single dimension using Cartesian products. Hence there is no theoretical difference between a one-dimensional type theory and a multi-dimensional type theory. However, we think that in practice the gain can be substantial.

Nabla is a linguistic system based on categorial grammars (Buszkowski et al., 1988) and with so-called lexical and logical combinators (Villadsen,
1997) inspired by work in natural logic (Sánchez, 1991). The original goal was to provide a framework in which to do reasoning involving propositional attitudes like knowledge and beliefs (Villadsen, 2001, 2004a). One of the main problems addressed is the rather complicated repertoire of operations that exists besides the notion of categories in traditional Montague grammar. For the syntax we use a categorial grammar along the lines of Lambek. For the semantics we use so-called lexical and logical combinators inspired by work in natural logic. Nabla provides a concise interpretation and a sequent calculus as the basis for implementations.
8.1.1 Background

In computational linguistics, work on Nabla (Villadsen, 2001, 2010) has previously focused on the logical semantics of propositional attitudes, replacing the classical higher order (intensional) logic (Montague, 1973) with an inconsistency-tolerant, or paraconsistent, (extensional) logic (Villadsen, 2004b) by introducing some kind of partiality, or indeterminacy (Muskens, 1995). These ideas have also found applications outside natural language semantics as such, in particular in advanced databases and multi-agent systems (Villadsen, 2002, 2004a).
8.1.2 Arguments

In Nabla we can specify a grammar, that is, a definition of a set of well-formed expressions. Given the grammar, we can also specify a logic in Nabla. The logic defines the notion of a correct argument. We use the mark √ for correct arguments and the mark ÷ for incorrect arguments:

    John is a man. Victoria is a woman. John loves Victoria.
    ──────────────────────────────────────────────────────── √
    John loves a woman. A man loves Victoria.

    John loves Victoria.
    ───────────────────────────────────────── ÷
    John loves a woman. A man loves Victoria.
The sentences above the line are the premises of the argument; the sentences below the line are the conclusions of the argument. By allowing multiple conclusions we obtain a nice symmetry around the line.
Note that the mark indicates what the logic says about the correctness; it does not say what is possible or not possible to infer in any particular situation. For instance, if one does not understand English, or just does not understand the single word ‘is’ (one might take it to be synonymous with ‘hates’), then it might be more appropriate to take the first argument to be incorrect; or if one presupposes knowledge about the conventions for male and female names, then it might be more appropriate to take the second argument to be correct. To sum up: in Nabla we specify both a grammar and a logic; these define the set of all arguments (and hence the set of all sentences and other subexpressions) and the set of correct arguments (the set of incorrect arguments being the remaining arguments).
8.1.3 Formulas

The logic for the formulas is here first order logic, also known as predicate logic (van Benthem, 1991). The meaning of the first argument is the following formula:

    MJ ∧ WV ∧ LJV ⇒ ∃y(Wy ∧ LJy) ∧ ∃x(Mx ∧ LxV)

We use a rather compact notation. We use lowercase letters for variables and uppercase letters for constants (both for ordinary constants like J and V for ‘John’ and ‘Victoria’ and for predicate constants like M for ‘man’, W for ‘woman’ and L for ‘love’). Note that conjunction ∧ has higher priority than implication ⇒ and that the quantifier ∃ has even higher priority (hence we need the parentheses to get the larger scope).
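The two marks can be checked mechanically for the small arguments of Section 8.1.2. The Python sketch below is only an illustration and is unrelated to Nabla's sequent calculus: it enumerates every interpretation of J, V, M, W and L over a two-element domain and searches for a countermodel. The function names and the choice of domain are assumptions of the example, and the absence of a countermodel over so small a domain is in general only evidence of correctness, not a proof.

```python
from itertools import product


def interpretations(domain):
    """Enumerate all values of J, V, M, W and L over a finite domain."""
    pairs = list(product(domain, repeat=2))
    bools = [False, True]
    for j, v in product(domain, repeat=2):
        for m_bits in product(bools, repeat=len(domain)):
            for w_bits in product(bools, repeat=len(domain)):
                for l_bits in product(bools, repeat=len(pairs)):
                    M = dict(zip(domain, m_bits))
                    W = dict(zip(domain, w_bits))
                    L = dict(zip(pairs, l_bits))
                    yield j, v, M, W, L


def conclusions(j, v, M, W, L, domain):
    """∃y(Wy ∧ LJy) ∧ ∃x(Mx ∧ LxV)"""
    return (any(W[y] and L[(j, y)] for y in domain)
            and any(M[x] and L[(x, v)] for x in domain))


def has_countermodel(premises, domain=(0, 1)):
    """True if some interpretation satisfies the premises but not the conclusions."""
    for j, v, M, W, L in interpretations(domain):
        if premises(j, v, M, W, L) and not conclusions(j, v, M, W, L, domain):
            return True
    return False


def first_premises(j, v, M, W, L):
    return M[j] and W[v] and L[(j, v)]   # MJ ∧ WV ∧ LJV


def second_premises(j, v, M, W, L):
    return L[(j, v)]                     # LJV only
```

For the first argument no countermodel exists, since J and V themselves witness the two existential conclusions; for the second, any interpretation in which John loves Victoria but nothing is a woman is a countermodel, which matches the ÷ mark.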
8.1.4 Strings

Traditionally, the map from arguments to formulas would consist of a map from the individual sentences of the argument to formulas and a procedure describing the assembling of the final formula from the separate formulas. The map from the sentences would have to deal with various inflections and possibly minor items like punctuation and rules of capitalisation. Consider again the argument:

    John is a man. Victoria is a woman. John loves Victoria.
    ──────────────────────────────────────────────────────── √
    John loves a woman. A man loves Victoria.
170
C HAPTER E IGHT
In Nabla we make a single pass over the argument to obtain a string, which is a sequence of tokens (a token is to be thought of as a unit representing a word or a phrase):

    John be a man also Victoria be a woman also John love Victoria so John love a woman also a man love Victoria
Note the tokens so and also, as well as the changes to the verbs (the person and tense information is discarded since we only consider present tense, third person). We emphasise that the map from arguments to strings is a simple bijection. Only quite trivial manipulations are allowed and the overall word order must be unchanged.
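As a rough illustration of such a single pass, the following Python sketch turns premises and conclusions into a token string. It is not Nabla's tokeniser: the lemma table, the list of names and the function names are assumptions made for this example, and the use of ok for an empty side follows the convention introduced later in the chapter.

```python
import re

# Hypothetical lemma table (an assumption, not part of Nabla): only present
# tense, third person forms are handled, as in the text.
LEMMAS = {"is": "be", "loves": "love", "runs": "run", "smiles": "smile"}
NAMES = {"John", "Nick", "Gloria", "Victoria"}

def tokenise_sentence(sentence):
    """Turn one sentence into a list of tokens, keeping the word order."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", sentence):
        if word not in NAMES:
            word = word.lower()          # undo sentence-initial capitalisation
        tokens.append(LEMMAS.get(word, word))
    return tokens

def tokenise(premises, conclusions):
    """Join sentences with 'also' and the two sides of the line with 'so'."""
    def side(sentences):
        if not sentences:
            return ["ok"]                # 'ok' stands for an omitted side
        tokens = []
        for i, sentence in enumerate(sentences):
            if i > 0:
                tokens.append("also")
            tokens.extend(tokenise_sentence(sentence))
        return tokens
    return " ".join(side(premises) + ["so"] + side(conclusions))

print(tokenise(
    ["John is a man.", "Victoria is a woman.", "John loves Victoria."],
    ["John loves a woman.", "A man loves Victoria."],
))
```

The sketch only illustrates how little work the tokeniser is meant to do; a real tokeniser would need a proper treatment of inflection.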
8.1.5 Combinators

We provide a brief introduction to combinators and the λ-calculus (Schönfinkel, 1967; Stenlund, 1972; Hindley and Seldin, 1986). The combinators and the λ-calculus can be either typed or untyped (Barendregt, 1984); we only consider the typed variant here as it is used to extend classical first order logic to higher order logic (van Benthem, 1991). By f a we mean the application of a function f to an argument a. It is possible to consider multiple arguments, but we prefer to regard f a b as (f a) b and so on (also known as currying, named after Curry though it was invented by Schönfinkel). A combinator, say x or y, can manipulate the arguments:

    x f g ⇝ g f
    y a b c ⇝ c b b

The manipulations are swap (f and g), deletion (a), duplication (b), and permutation (c). We can define the combinators using the so-called λ-abstraction:

    x ≡ λab(ba)
    y ≡ λabc(cbb)

Hence for example (the numbers are treated as constants):

    x1(x2y(x34)5) ⇝ x2y(x34)51 ⇝ y2(x34)51 ⇝ 5(x34)(x34)1 ⇝ 5(43)(43)1

The λ-abstraction binds the variables (they were free before). We call a combinator pure if it is defined without constants. We always use uppercase letters for constants and lowercase letters for variables. The combinators x and y are pure. The following combinator send is not pure (R is a constant):

    send ≡ λabc(Rcba)
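The reduction of x1(x2y(x34)5) can be replayed mechanically. The Python sketch below treats x and y as rewrite rules on applicative terms; the term encoding (nested pairs) and the helper names app, spine, normalise and show are choices made for this illustration only.

```python
# Terms: an atom is a string or a number; an application (f a) is the pair (f, a).

def app(head, *args):
    """Build the left-nested application head a1 a2 ... an."""
    for a in args:
        head = (head, a)
    return head

def spine(term):
    """Split a term into its head atom and the list of its arguments."""
    args = []
    while isinstance(term, tuple):
        term, a = term
        args.append(a)
    return term, args[::-1]

# Rewrite rules for the two combinators of the text: (arity, right-hand side).
RULES = {
    "x": (2, lambda f, g: app(g, f)),        # x f g   ⇝ g f
    "y": (3, lambda a, b, c: app(c, b, b)),  # y a b c ⇝ c b b
}

def normalise(term):
    """Rewrite head combinators first, then normalise the arguments."""
    head, args = spine(term)
    if head in RULES and len(args) >= RULES[head][0]:
        arity, rule = RULES[head]
        return normalise(app(rule(*args[:arity]), *args[arity:]))
    return app(head, *[normalise(a) for a in args])

def show(term, top=True):
    """Print a term with explicit spaces and minimal parentheses."""
    head, args = spine(term)
    if not args:
        return str(head)
    text = " ".join([str(head)] + [show(a, False) for a in args])
    return text if top else f"({text})"

example = app("x", 1, app("x", 2, "y", app("x", 3, 4), 5))
print(show(normalise(example)))
```

Under these assumptions the printed normal form corresponds to the last term of the reduction sequence given above.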
The combinator send means ‘sends … to’ and R means ‘receives … from’ (it is also possible to use the combinator receive and the constant S), as in ‘Alice sends the box to Charlie’ or ‘Charlie receives the box from Alice’. We use the λ-calculus with the following rules (observe that we write f a rather than f(a) for the application of a function f to an argument a):

• α-conversion (y not free in α and y free for x in α): λxα ⇝ λyα[x := y]
• β-reduction (β must be free for x in α): (λxα)β ⇝ α[x := β]
• η-reduction (x not free in α): λx(αx) ⇝ α

We use ⇝λ for evaluation using these three rules (λ-conversion). We use the (typed) λ-calculus (van Benthem, 1991) in formulas (and combinator definitions). The higher order logic in Montague grammar is also based on the λ-calculus, but the usual rules of λ-conversion do not hold unrestricted, due to the intensionality present (Muskens, 1995). Our use of combinators is inspired by work in natural logic (Purdy, 1991, 1992; Sánchez, 1991; Villadsen, 1997) and differs from previous uses in computer science, mathematical logic and natural language semantics (Curien, 1986; Curry et al., 1958, 1972; Simons, 1989; Steedman, 1988).
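The three rules can be made concrete with a small normaliser. The following Python sketch is an untyped illustration under assumptions of our own: terms are strings (variables and constants) or tagged tuples, fresh names v0, v1, … are assumed not to occur in the input, and termination is only expected for terms that have a normal form (as the typed terms used in this chapter do).

```python
import itertools

_fresh = (f"v{i}" for i in itertools.count())   # assumed not to clash with input names

def lam(*args):
    """lam('a', 'b', body) builds λab(body)."""
    *names, body = args
    for name in reversed(names):
        body = ("lam", name, body)
    return body

def app(f, *args):
    """app(f, a, b) builds (f a) b."""
    for a in args:
        f = ("app", f, a)
    return f

def free_vars(t):
    if isinstance(t, str):
        return {t}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, s):
    """Capture-avoiding substitution t[x := s]."""
    if isinstance(t, str):
        return s if t == x else t
    if t[0] == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    _, y, body = t
    if y == x:
        return t                                  # x is rebound: nothing to do
    if y in free_vars(s):                         # α-conversion before substituting
        z = next(_fresh)
        body, y = subst(body, y, z), z
    return ("lam", y, subst(body, x, s))

def normalise(t):
    """Apply β- and η-reduction until no rule applies."""
    if isinstance(t, str):
        return t
    if t[0] == "lam":
        _, x, body = t
        body = normalise(body)
        if (isinstance(body, tuple) and body[0] == "app"
                and body[2] == x and x not in free_vars(body[1])):
            return body[1]                        # η-reduction
        return ("lam", x, body)
    f = normalise(t[1])
    if isinstance(f, tuple) and f[0] == "lam":
        return normalise(subst(f[2], f[1], t[2])) # β-reduction
    return ("app", f, normalise(t[2]))

send = lam("a", "b", "c", app("R", "c", "b", "a"))
print(normalise(app(send, "Alice", "Box", "Charlie")))
```

With this encoding, applying send to Alice, the box and Charlie should reduce to the application of R to Charlie, the box and Alice, which mirrors the ‘sends … to’ and ‘receives … from’ reading.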
8.1.6 Type Language and Type Interpretation

The basic idea is closely related to the type theory of Morrill (1994). We take a type theory to consist of a type language and a type interpretation. A type language T is given by a set of basic types T0 ⊆ T and a set of rules of type construction. There is a rule of type construction for each type constructor. Each type constructor makes a type out of subtypes. A type interpretation consists of an interpretation function [[·]] with respect to a universe. A universe is a set of objects. A subset of the universe is called a category (for example the empty category and the universal category). The interpretation function maps types to categories. We may call a
type a category name (or even just a category), and the interpretation of the type the category content. A universe together with a type interpretation for basic types [[A]] (A ∈ T0) is a model. The type interpretation for arbitrary types is defined compositionally; that is, the interpretation of a type is a composition of the interpretations of its subtypes (we have to stay within the universe, of course). Hence we extend a basic type interpretation [[A]] (A ∈ T0) to a type interpretation [[A]] (A ∈ T). It is essential that we do not think of objects as atomic. They can have components; hence we get a multi-dimensional type theory. Let n be the number of dimensions. Each type is interpreted as a category; its members are called inhabitants. An inhabitation is a category for each type. An inhabitation extends another inhabitation if and only if (iff), for each type, the category of the former includes the category of the latter. With respect to the type interpretation, an initial inhabitation determines a final inhabitation as its minimal extension satisfying the interpretation of types (we assume that such a minimal extension exists). Note that the interpretation of types is a precise definition of the inhabitants of a type based on the inhabitants of its subtypes.
8.1.7 Theory of Inhabitation and Theory of Formation
We emphasise that an inhabitation is not (just) a basic type interpretation (this holds for initial inhabitations too). An arrow is a component-wise operation on objects labelled by types. An inhabitation satisfies an arrow iff it is closed under the arrow. A theory of inhabitation is a set of arrows. An inhabitation satisfies a theory of inhabitation iff it satisfies every arrow in the theory of inhabitation. An initial inhabitation together with a theory of inhabitation determines a final inhabitation, which is the minimal extension of the initial inhabitation satisfying the theory of inhabitation. In order to represent objects and arrows (and inhabitations and theories of inhabitation) we introduce representation languages (let a_i range over terms of the representation language for dimension i). An entry is a sequence of terms and a type, written as a_1 − … − a_n : A (where n is the number of dimensions). A formation is a set of entries. A sequent or a statement of formation is a configuration and an entry, written as Δ ⊢ a_1 − … − a_n : A (where the left side contains the antecedents and the right side contains the succedents). A configuration is a finite set of sequences of variable declarations {x^1_1 − … − x^1_n : A_1, …, x^m_1 − … − x^m_n : A_m}. A statement of formation gives
a formation as all instantiations of variables. A theory of formation is a set of statements of formation. An initial formation plus a theory of formation give a final formation in the same way as an initial inhabitation plus a theory of inhabitation give a final inhabitation. We provide a theory of formation by a set of rules of formation, which defines the theory of formation inductively.
8.1.8 Nabla

The main task of Nabla is then to define a total set of strings and for each string a set of formulas. If the set of formulas for a string is empty it indicates that the string does not map to an argument. If the set of formulas for a string has more than one member then it shows that the string maps to an ambiguous argument (one or more of the sentences are ambiguous). In Nabla the grammar is completely given by a lexicon (there are no rules specific for the particular fragment of natural language). The lexicon has a set of entries for each token; the set of tokens is called the vocabulary. The grammar determines the string / formula association and the logic determines the validity of the formula. Besides the grammar and the logic we also need a tokeniser, which is a quite simple device that turns arguments into strings.
8.2 The Rules

We define a multi-dimensional type theory with the two dimensions: syntax and semantics. We use a kind of the so-called Lambek calculus with the two type constructors / and \, which are right- and left-looking functors (Lambek, 1958; Moortgat, 1988; Morrill, 1994). We assume a set of basic types T0, where • ∈ T0 is interpreted as truth values. The set of types T is the smallest set of expressions containing T0 such that if A, B ∈ T then A/B, B\A ∈ T. A structure consists of a vocabulary and a set of bases, S ≡ ⟨V, B⟩, where V is finite and B(A) ≠ ∅ for all A ∈ T0. We define three auxiliary functions on types (the first for the syntactic dimension and the second for the semantic dimension; symbol ≐ is used for
such “mathematical” definitions, in contrast with ≡ for literal definitions):

    ⌈A⌉ ≐ V⁺,  A ∈ T
    ⌊A⌋ ≐ B(A),  A ∈ T0
    ⌊A/B⌋ ≐ ⌊B\A⌋ ≐ ⌊B⌋ → ⌊A⌋
    |A| ≐ ⌈A⌉ × ⌊A⌋

By V⁺ we mean the set of (non-empty) sequences of elements from V (such sequences correspond to strings and for the sake of simplicity we call them strings). The universe is ⋃_{A∈T} |A| (which depends only on the structure S). With respect to S we extend a basic type interpretation [[A]] ⊆ |A| (A ∈ T0) to a type interpretation [[A]] ⊆ |A| (A ∈ T) as follows (the concatenation of the strings x and x′ is written xˆx′):

    [[A/B]] ≐ { ⟨x, y⟩ | for all x′, y′, if ⟨x′, y′⟩ ∈ [[B]] then ⟨xˆx′, y y′⟩ ∈ [[A]] }
    [[B\A]] ≐ { ⟨x, y⟩ | for all x′, y′, if ⟨x′, y′⟩ ∈ [[B]] then ⟨x′ˆx, y y′⟩ ∈ [[A]] }

We use a so-called sequent calculus (Prawitz, 1965) with an explicit semantic dimension and an implicit syntactic dimension. The implicit syntactic dimension means that the antecedents form a sequence rather than a set and that the syntactic component for the succedent is the concatenation of the strings for the antecedents. It should be observed that all rules work unrestricted on the semantic component from the premises to the conclusion. We refer to the resulting sequent calculus as the Nabla calculus. We use Γ (and Δ) for sequences of categories A1 … An (n > 0). The rules have sequents of the form Γ ⊢ A. The sequent means that if a1, …, an are strings of categories A1, …, An, respectively, then the string that consists of the concatenation of the strings a1, …, an is a string of category A. Hence the sequent A ⊢ A is valid for any category A. Rules are displayed starting with the conclusion and the premises indented below. There are two rules for / (a left and a right rule) and two rules for \ too. The left rules specify how to introduce a / or a \ at the left side of the sequent symbol ⊢, and vice versa for the right rules (observe that the introduction is in the conclusion and not in the premises).
The reason why we display the rules in this way is that sequents tend to get very long, often as long as a whole line, and hence the more usual tree format would be problematic. Also the conclusion is usually longer than each of the premises.
We note that only the right rule of λ (where α ⇝λ α′ is λ-conversion) is possible, since only variables are allowed on the left side of the sequent symbol.

    x : A ⊢ x : A                                =

    Δ ⊢ α′ : A                                   λ   (α ⇝λ α′)
        Δ ⊢ α : A

    Δ[Γ] ⊢ β[x ↦ α] : B                          Cut
        Γ ⊢ α : A
        Δ[x : A] ⊢ β : B

    Δ[Γ z : B\A] ⊢ γ[x ↦ (z β)] : C              \L
        Γ ⊢ β : B
        Δ[x : A] ⊢ γ : C

    Γ ⊢ λyα : B\A                                \R
        y : B Γ ⊢ α : A

    Δ[z : A/B Γ] ⊢ γ[x ↦ (z β)] : C              /L
        Γ ⊢ β : B
        Δ[x : A] ⊢ γ : C

    Γ ⊢ λyα : A/B                                /R
        Γ y : B ⊢ α : A
8.2.1 Comments

The order of the premises does not matter, but we adopt the convention that the minor premises (the premises that “trigger” the introduction of / or \) come first and the major premises (the premises that “circumscribe” the introduction of / or \) come second. The rule /R is to be understood as follows: if we prove that (the syntactic components for the types in) Γ with (the syntactic component for the type) B to the right yield (the syntactic component for the type) A, then we conclude that (. . . ) Γ (alone) yields (. . . ) A/B; furthermore if the variable y represents (the semantic component for the type) B and the term α represents (the
semantic component for the type) A, then the λ -abstraction λ yα represents (. . . ) A/B (we do not care about the semantic components for the types in Γ since these are being taken care of in α ). In the same manner the rule /L is to be understood as follows: if we prove that Γ yields B and also prove that Δ with A inserted yields C, then we conclude that Δ with A/B and Γ (in that order) inserted (at the same spot as in the premise) yields C; furthermore if the term β represents B and the term γ represents C (under the assumption that the variable x represents A), then γ with the application (z β ) substituted for all free occurrences of the variable x represents C (under the assumption that the variable z represents A/B).
8.3 The Categories

As basic categories for the lexicon we have N, G, S and the top category • corresponding to the whole argument (do not confuse the basic category N with the constant N for ‘Nick’ and so on). Roughly we have that N corresponds to “names” (proper nouns), G corresponds to “groups” (common nouns) and S to “sentences” (discourses). Consider the following lexical category assignments:

    John Nick Gloria Victoria  :  N
    run dance smile            :  N\S
    find love                  :  (N\S)/N
    man woman thief unicorn    :  G
    popular quick              :  G/G
    be                         :  (N\S)/N
    be                         :  (N\S)/(G/G)
    a every                    :  (S/(N\S))/G   ((S/N)\S)/G
    not                        :  (N\S)/(N\S)
    nix                        :  S/S
    and or                     :  S\(S/S)
    and or                     :  (N\S)\((N\S)/(N\S))   (G/G)\((G/G)/(G/G))
    ok                         :  S
    also                       :  S\(S/S)
    so                         :  S\(•/S)
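To see how a category determines the shape of the corresponding semantic component, the following Python sketch parses category expressions and computes their semantic type, using the equation ⌊A/B⌋ ≐ ⌊B\A⌋ ≐ ⌊B⌋ → ⌊A⌋ from Section 8.2. The parser, the tuple encoding and the function names are assumptions made for this illustration; the bases B(N), B(G), B(S) and B(•) are left symbolic.

```python
def parse_category(text):
    """Parse a fully parenthesised category such as '(N\\S)/N' into a tree.

    A basic category is a one-character string; A/B becomes ('/', A, B) and
    B\\A becomes ('\\', B, A), i.e. operands are kept in their written order.
    """
    s = text.replace(" ", "")
    pos = 0

    def atom():
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            tree = expr()
            assert s[pos] == ")", "missing closing parenthesis"
            pos += 1
            return tree
        pos += 1
        return s[pos - 1]                # a basic category: N, G, S or •

    def expr():
        nonlocal pos
        left = atom()
        if pos < len(s) and s[pos] in "/\\":
            op = s[pos]
            pos += 1
            return (op, left, atom())
        return left

    tree = expr()
    assert pos == len(s), "trailing input"
    return tree

def semantic_type(cat):
    """Semantic type of a category: argument type → result type."""
    if isinstance(cat, str):
        return f"B({cat})"
    op, left, right = cat
    # A/B takes its argument B on the right, B\A takes its argument B on the left.
    arg, res = (right, left) if op == "/" else (left, right)
    return f"({semantic_type(arg)} → {semantic_type(res)})"

for token, category in [("love", r"(N\S)/N"), ("so", r"S\(•/S)")]:
    print(token, ":", semantic_type(parse_category(category)))
```

For instance, the category of love yields a function that takes two arguments from B(N) and returns a member of B(S), which matches the two-place predicate constants used for transitive verbs.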
8.3.1 Comments

The order of the tokens is the same as for the lexical combinators to come. Together the lexical category assignments and the lexical combinator definitions constitute a set of lexical entries. A lexicon consists of a skeleton and a set of lexical entries. For the lexicon the skeleton is just the three basic categories N, G and S (we omit the top category, which is always •). Note that it is not a mistake that there is no ′ for, say, the token be (compared with the combinators be and be′). There simply is just one token (with two meanings).
8.4 The Combinators

We introduce the following so-called logical combinators (Stenlund, 1972):

    Q̇ ≡ λxy(x = y)          Equality
    Ṅ ≡ λa(¬a)              Negation
    Ċ ≡ λab(a ∧ b)          Conjunction
    Ḋ ≡ λab(a ∨ b)          Disjunction
    Ȯ ≡ λtu∃x(tx ∧ ux)      Overlap
    İ ≡ λtu∀x(tx ⇒ ux)      Inclusion
    Ṫ ≡ ⊤                   Triviality
    Ṗ ≡ λab(a ⇒ b)          Preservation
After having introduced the logical combinators we introduce the so-called lexical combinators. There are one or more combinators for each token in the vocabulary, for example the combinator John for the token John, be and be′ for be, and so on (tokens and combinators are always spelled exactly the same way except for the (possibly repeated) ′ at the end). In order to display the lexicon more compactly we introduce two placeholders (or “holes”) for combinators and constants, respectively: □ is the placeholder for logical combinators (if any) and ◦ is the placeholder for (ordinary and predicate) constants (if any); the combinators and constants to be inserted are shown after the | as in the following lexicon:

    John Nick Gloria Victoria  ≡  ◦                   | J N G V
    run dance smile            ≡  λx(◦x)              | R D S
    find love                  ≡  λyx(◦xy)            | F L
    man woman thief unicorn    ≡  λx(◦x)              | M W T U
    popular quick              ≡  λtx(□(◦x)(tx))      | Ċ | P Q
    be                         ≡  λyx(□xy)            | Q̇
    be′                        ≡  λfx(f λy(□xy)x)     | Q̇
    a every                    ≡  λtu(□tu)            | Ȯ İ
    not                        ≡  λtx(□(tx))          | Ṅ
    nix                        ≡  λa(□a)              | Ṅ
    and or                     ≡  λab(□ab)            | Ċ Ḋ
    and′ or′                   ≡  λtux(□(tx)(ux))     | Ċ Ḋ
    ok                         ≡  □                   | Ṫ
    also                       ≡  λab(□ab)            | Ċ
    so                         ≡  λab(□ab)            | Ṗ
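The lexicon can be read almost directly as a collection of curried functions. The following Python sketch does so for a few tokens, with combinators that build formula strings rather than truth values. The Python names (Qd, Cd, Od and so on, where the trailing d stands for the dot), the generated variable names x1, x2, … and the string layout are assumptions of this example, not part of Nabla.

```python
import itertools

_counter = itertools.count(1)

def fresh():
    """Return a new bound-variable name (x1, x2, ...); naming is an assumption."""
    return f"x{next(_counter)}"

# Logical combinators, curried as in the λ-calculus; they return formula strings.
Qd = lambda x: lambda y: f"{x} = {y}"          # Equality
Nd = lambda a: f"¬{a}"                         # Negation
Cd = lambda a: lambda b: f"({a} ∧ {b})"        # Conjunction
Dd = lambda a: lambda b: f"({a} ∨ {b})"        # Disjunction
Pd = lambda a: lambda b: f"({a} ⇒ {b})"        # Preservation
Td = "⊤"                                       # Triviality

def Od(t):                                     # Overlap: λtu∃x(tx ∧ ux)
    def with_u(u):
        x = fresh()
        return f"∃{x}({t(x)} ∧ {u(x)})"
    return with_u

def Id(t):                                     # Inclusion: λtu∀x(tx ⇒ ux)
    def with_u(u):
        x = fresh()
        return f"∀{x}({t(x)} ⇒ {u(x)})"
    return with_u

# Lexical combinators for a fragment of the lexicon.
John, Nick, Victoria = "J", "N", "V"
run = lambda x: f"R{x}"                        # λx(Rx)
man = lambda x: f"M{x}"
woman = lambda x: f"W{x}"
love = lambda y: lambda x: f"L{x}{y}"          # λyx(Lxy): object first
be = lambda y: lambda x: Qd(x)(y)              # λyx(Q̇xy)
a, every = Od, Id
also, so, ok = Cd, Pd, Td

# so (also (run John) (be Nick John)) (run Nick)
print(so(also(run(John))(be(Nick)(John)))(run(Nick)))
# a man (love Victoria)
print(a(man)(love(Victoria)))
```

Note how love and be take the object before the subject, so that love Victoria John corresponds to the formula LJV.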
8.4.1 Comments

It might be possible to display the lexicon in an even more compact way by avoiding the remaining repetitions (for the token be), but we have not found it worthwhile. We have put dots above the logical combinators in order to distinguish them from the more advanced logical combinators previously used (Villadsen, 2001) (the first four are the same; we will not go into details about the remaining ∗-marked combinators, which also use the special predicates E for existence and I for integrity in a paraconsistent logic):

    Q ≡ λxy(x = y)                      Equality
    N ≡ λa(¬a)                          Negation
    C ≡ λab(a ∧ b)                      Conjunction
    D ≡ λab(a ∨ b)                      Disjunction
    E ≡ λit∃x(Eix ∧ tx)                 Existentiality∗
    U ≡ λit∀x(Eix ⇒ tx)                 Universality∗
    O ≡ λitu∃x(Eix ∧ (tx ∧ ux))         Overlap∗
    I ≡ λitu∀x(Eix ⇒ (tx ⇒ ux))         Inclusion∗
    T ≡ λi(Ii)                          Triviality∗
    F ≡ λip(Ii ∧ pi)                    Filtration∗
    P ≡ λpq∀i(pi ⇒ qi)                  Preservation∗
Note that even though we use the λ-calculus of higher order logic we have not used any higher-order quantifications. The overlap combinator takes two sets and tests for overlap (analogously for the inclusion combinator). The triviality combinator is used in case of no premises or no conclusions in an argument. The preservation combinator is used between the premises and the conclusions in an argument. Some of the combinators are discussed elsewhere, cf. Hindley and Seldin (1986), in particular the ‘restricted generality’ combinator Ξ corresponding to our logical combinator İ (see also Curry et al. (1958)), but usually using the untyped λ-calculus with a definition of so-called canonical terms in order to avoid a paradox discovered by Curry. Let us return to the formula we considered in the introduction:

    MJ ∧ WV ∧ LJV ⇒ ∃y(Wy ∧ LJy) ∧ ∃x(Mx ∧ LxV)

Using the logical combinators we obtain the formula:

    Ṗ (Ċ (MJ) (Ċ (WV) (LJV)))
      (Ċ (Ȯ λy(Wy) λy(LJy)) (Ȯ λx(Mx) λx(LxV)))
Due to the η-rule in the λ-calculus there is no difference between W and λy(Wy), between M and λx(Mx), or between LJ and λy(LJy), but there is no immediate alternative for λx(LxV). We do not have to list the types of constants since either the type of a constant is ε or the type can be determined by the types of its arguments. At first glance it may appear as though the use of combinators just makes the formula look more complicated, but we have really added much more structure to the formula. Also, we are so used to the usual formulas of predicate logic that any change is problematic. As soon as we leave the lexicon and turn to string / formula associations, the use of logical combinators is easier to accept. Let us add even more structure to the formula by using the equality combinator (it is triggered by the word ‘is’ in the first two sentences):

    Ṗ (Ċ (Ȯ λx(Mx) λx(Q̇xJ)) (Ċ (Ȯ λx(Wx) λx(Q̇xV)) (LJV)))
      (Ċ (Ȯ λy(Wy) λy(LJy)) (Ȯ λx(Mx) λx(LxV)))
Finally we would like to emphasise that it is not in any way a goal to get rid of all variables although this is surely possible by introducing a series of pure combinators, since the pure combinators in general do not add any useful structure to the formula. We think that the challenge is to find the best balance between the use of combinators and the use of λ -abstractions. Let us return to the previous formula with the logical combinators:
    Ṗ (Ċ (Ȯ λx(Mx) λx(Q̇xJ)) (Ċ (Ȯ λx(Wx) λx(Q̇xV)) (LJV)))
      (Ċ (Ȯ λy(Wy) λy(LJy)) (Ȯ λx(Mx) λx(LxV)))
Using the lexical combinators we obtain the formula:

    so (also (a man λx(be x John)) (also (a woman λx(be x Victoria)) (love Victoria John)))
       (also (a woman λx(love x John)) (a man (love Victoria)))
We find this formula remarkably elegant. What remains is the association with the original string:

    John be a man also Victoria be a woman also John love Victoria so John love a woman also a man love Victoria
This is taken care of by the Nabla calculus. We now turn to some examples.
8.5 Examples: Syntax and Semantics

Consider the tiny argument (where √ indicates that the argument is correct):

    John is a popular man.
    ────────────────────── √
    John is popular.
The lexical category assignments to tokens give us the following string / formula association using the sequent calculus:

    John be a popular man so John be popular
    ⇝ so (a (popular man) λx(be x John)) (be′ popular John)
    ⇝ Ṗ (Ȯ λx(Ċ (Px) (Mx)) λx(Q̇Jx)) (Ċ (PJ) (Q̇JJ))
    ⇝ PJ ∧ MJ ⇒ PJ
It is really an impressive undertaking, since not only does the order of the combinators not match the order of the tokens, but there is also no immediate clue in the string on how to get the structure of the formula right (“the parentheses”). As expected the resulting formula is valid.
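The validity claim can be sanity-checked by brute force. The following Python sketch evaluates a closed, quantifier-free formula in every interpretation over small finite domains; it is only a bounded check written for this illustration (the function name valid_small and the encoding of formulas as Python functions are assumptions), not the logic component of Nabla.

```python
from itertools import product

def valid_small(formula, names, preds, max_size=3):
    """Check a formula in all interpretations over domains of size 1..max_size.

    `names` are ordinary constants (interpreted as domain elements) and
    `preds` are one-place predicate constants (interpreted as subsets).
    """
    for n in range(1, max_size + 1):
        dom = range(n)
        subsets = [frozenset(d for d, keep in zip(dom, bits) if keep)
                   for bits in product([False, True], repeat=n)]
        for values in product(dom, repeat=len(names)):
            const = dict(zip(names, values))
            for extensions in product(subsets, repeat=len(preds)):
                pred = dict(zip(preds, extensions))
                if not formula(const, pred):
                    return False          # counter-model found
    return True

implies = lambda a, b: (not a) or b

# PJ ∧ MJ ⇒ PJ
f1 = lambda c, p: implies(c["J"] in p["P"] and c["J"] in p["M"], c["J"] in p["P"])
# RJ ⇒ RN (not valid: John and Nick may differ)
f2 = lambda c, p: implies(c["J"] in p["R"], c["N"] in p["R"])

print(valid_small(f1, ["J"], ["P", "M"]))
print(valid_small(f2, ["J", "N"], ["R"]))
```

A bounded search of this kind can refute a formula but does not in general prove validity; it is meant only as a quick check of the small formulas in this section.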
8.5.1 Step-by-Step Formula Extraction

We consider the following tiny argument with one premise and no conclusion (rather special, but good enough as an example):

    John smiles.
    ──────────── √
We show that the derivations for this argument yield a formula reducible to ⊤ (and hence that the argument is a correct argument, as every argument with no conclusions is). The argument corresponds to the following string:

    John smile so ok
The token so corresponds to the line in the argument and the token ok corresponds to the omitted conclusions. The string corresponds to the following sequent:

    N  N\S  S\(•/S)  S ⊢ •
Note that • is the top category (arguments). The other categories are given by the lexical category assignments. By using the rules of the Nabla calculus we obtain the following derivation:

    N  N\S  S\(•/S)  S ⊢ •          \L
        N  N\S ⊢ S                  \L
            N ⊢ N                   =
            S ⊢ S                   =
        •/S  S ⊢ •                  /L
            S ⊢ S                   =
            • ⊢ •                   =
Step 1. For simplicity we use numbers 1, 2, 3, . . . as variables. We start from the last line in the derivation, introduce the variables 1 and 2, and use the rule /L to get the term 3 2 (the variable 3 is a fresh variable at the position where the / is introduced):

    3 2 ⊢ 3 2                       /L
        2 ⊢ 2                       =
        1 ⊢ 1                       =
Step 2. We reuse the variable 1 and introduce the variable 4, and use the rule \L to get the term 5 4 (the variable 5 is a fresh variable at the position where the \ is introduced). At last we use the rule \L to get the term 1 (5 4) 2 (the variable 2 can be reused):

    4 5 1 2 ⊢ 1 (5 4) 2             \L
        4 5 ⊢ 5 4                   \L
            4 ⊢ 4                   =
            1 ⊢ 1                   =
        3 2 ⊢ 3 2                   /L
            2 ⊢ 2                   =
            1 ⊢ 1                   =
Step 3. The tokens of the string correspond to the variables 4, 5, 1 and 2, respectively, and the lexical combinators are inserted yielding the extracted formula. Using the logical combinators the formula is then finally reduced to ⊤ as promised:

    John smile so ok
    ⇝ so (smile John) ok
    ⇝ λab(Ṗ a b) (λx(Sx) J) Ṫ
    ⇝ Ṗ (SJ) Ṫ
    ⇝ λab(a ⇒ b) (SJ) ⊤
    ⇝ SJ ⇒ ⊤
    ⇝ ⊤
This completes the step-by-step example.
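The extraction just described can be mechanised by a naive backward proof search over the rules =, /L, \L, /R and \R (without Cut and λ). The Python sketch below is such a search under assumptions of our own: terms are plain strings with explicit parentheses, the antecedents carry their terms directly instead of using the substitution γ[x ↦ (z β)], fresh variables are named by search depth, and only a fragment of the lexicon is included. It is an illustration of the idea, not the Nabla implementation.

```python
from functools import lru_cache
from itertools import product

def over(a, b):
    """The category A/B (looks to the right for a B)."""
    return ("/", a, b)

def under(b, a):
    """The category B\\A (looks to the left for a B)."""
    return ("\\", b, a)

@lru_cache(maxsize=None)
def prove(ants, goal, depth=0):
    """Return all terms t with ants ⊢ t : goal; ants is a tuple of (term, category)."""
    terms = set()
    if len(ants) == 1 and ants[0][1] == goal:            # rule =
        terms.add(ants[0][0])
    if isinstance(goal, tuple):                           # right rules
        y = f"v{depth}"
        if goal[0] == "/":                                # /R: Γ y:B ⊢ α:A
            _, a, b = goal
            bodies = prove(ants + ((y, b),), a, depth + 1)
        else:                                             # \R: y:B Γ ⊢ α:A
            _, b, a = goal
            bodies = prove(((y, b),) + ants, a, depth + 1)
        terms |= {f"λ{y}.{body}" for body in bodies}
    for i, (z, cat) in enumerate(ants):                   # left rules
        if not isinstance(cat, tuple):
            continue
        if cat[0] == "/":                                 # /L: Γ stands right of z
            _, a, b = cat
            splits = [(i, j, ants[i + 1:j]) for j in range(i + 2, len(ants) + 1)]
        else:                                             # \L: Γ stands left of z
            _, b, a = cat
            splits = [(j, i + 1, ants[j:i]) for j in range(i)]
        for lo, hi, gamma in splits:
            for beta in prove(gamma, b, depth):           # minor premise Γ ⊢ β:B
                new = ants[:lo] + ((f"({z} {beta})", a),) + ants[hi:]
                terms |= prove(new, goal, depth)          # major premise
    return frozenset(terms)

# A fragment of the lexical category assignments.
LEXICON = {
    "John": ["N"], "Nick": ["N"],
    "smile": [under("N", "S")], "run": [under("N", "S")],
    "be": [over(under("N", "S"), "N")],
    "ok": ["S"],
    "also": [under("S", over("S", "S"))],
    "so": [under("S", over("•", "S"))],
}

def extract(string):
    """All terms of the top category • for a token string."""
    tokens = string.split()
    terms = set()
    for cats in product(*(LEXICON[t] for t in tokens)):
        terms |= prove(tuple(zip(tokens, cats)), "•")
    return terms

print(extract("John smile so ok"))
```

The search terminates because every rule application removes one type constructor from the sequent; it is exponential in general, which is acceptable only for small examples like the ones in this chapter.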
8.5.2 Further Examples

We first consider the argument using the string from the tokeniser:

    John runs. John is Nick.
    ──────────────────────── √
    Nick runs.

    John run also John be Nick so Nick run
Here John has category N, run has category N\S, also has category S\(S/S) and so on. The string has the top category •, since it is an argument. By using the rules of the Nabla calculus we obtain the following derivation:

    N  N\S  S\(S/S)  N  (N\S)/N  N  S\(•/S)  N  N\S ⊢ •      \L
        N  N\S  S\(S/S)  N  (N\S)/N  N ⊢ S                   \L
            N  N\S ⊢ S                                       \L
                N ⊢ N                                        =
                S ⊢ S                                        =
            S/S  N  (N\S)/N  N ⊢ S                           /L
                N  (N\S)/N  N ⊢ S                            /L
                    N ⊢ N                                    =
                    N  N\S ⊢ S                               \L
                        N ⊢ N                                =
                        S ⊢ S                                =
                S ⊢ S                                        =
        •/S  N  N\S ⊢ •                                      /L
            N  N\S ⊢ S                                       \L
                N ⊢ N                                        =
                S ⊢ S                                        =
            • ⊢ •                                            =
We extract the following formula for the derivation:

    John run also John be Nick so Nick run
    ⇝ so (also (run John) (be Nick John)) (run Nick)
    ⇝ Ṗ (Ċ (RJ) (Q̇JN)) (RN)
    ⇝ RJ ∧ J = N ⇒ RN
Observe the reverse order of John and Nick in the formula with the lexical combinators. All transitive verbs and the copula (token be) have the object before the subject in formulas with lexical combinators. In the final formula the order is not reversed. Only left rules were used in the derivation above. The following argument requires a right rule due to the existential quantifier (token a) in the object position of the copula (token be with combinators be and be′):

    John is a popular man.
    ────────────────────── √
    John is popular.

    John be a popular man so John be popular
    N  (N\S)/N  ((S/N)\S)/G  G/G  G  S\(•/S)  N  (N\S)/(G/G)  G/G ⊢ •      \L
        N  (N\S)/N  ((S/N)\S)/G  G/G  G ⊢ S                                /L
            G ⊢ G                                                          =
            N  (N\S)/N  ((S/N)\S)/G  G ⊢ S                                 /L
                N  (N\S)/N  (S/N)\S ⊢ S                                    \L
                    N  (N\S)/N ⊢ S/N                                       /R
                        N  (N\S)/N  N ⊢ S                                  /L
                            N ⊢ N                                          =
                            N  N\S ⊢ S                                     \L
                                N ⊢ N                                      =
                                S ⊢ S                                      =
                    S ⊢ S                                                  =
                G ⊢ G                                                      =
        •/S  N  (N\S)/(G/G)  G/G ⊢ •                                       /L
            N  (N\S)/(G/G)  G/G ⊢ S                                        /L
                G/G ⊢ G/G                                                  =
                N  N\S ⊢ S                                                 \L
                    N ⊢ N                                                  =
                    S ⊢ S                                                  =
            • ⊢ •                                                          =
    John be a popular man so John be popular
    ⇝ so (a (popular man) λx(be x John)) (be′ popular John)
    ⇝ Ṗ (Ȯ λx(Ċ (Px) (Mx)) λx(Q̇Jx)) (Ċ (PJ) (Q̇JJ))
    ⇝ PJ ∧ MJ ⇒ PJ
Here the use of lexical and logical combinators is more substantial.
8.6 Conclusion

The multi-dimensional type theory Nabla provides a concise interpretation and a sequent calculus as the basis for implementations. Of course other calculi are possible for the same interpretation:

• Further type constructions for a larger natural language coverage, cf. the treatment of propositional attitudes by Villadsen (2001, 2004a), who also replaces the classical logic with a paraconsistent logic.
• Other constraint solving technologies, cf. work on glue semantics (Dalrymple, 1999), XDG (Extensible Dependency Grammar) (Debusmann et al., 2004), CHRG (Constraint Handling Rules Grammar) (Christiansen, 2002) as well as categorial grammars (Moot, 1999; de Groote, 2001; Kuhlmann, 2002).
We also consider integrations of work concerning the ontology underlying natural language, to be specified in the lexicon, cf. Dölling (1995) as a starting point. We are interested in a description of the syntax, semantics and pragmatics of natural language. As a brief illustration of the kind of semantic / pragmatic problems we have in mind we quote the famous “fallacy of accent” story:

    Even the literal truth can be made use of, through manipulation of its placement, to deceive with accent. Disgusted with his first mate who was repeatedly inebriated on duty, the captain of a ship noted in the ship’s logbook, almost every day, “The mate was drunk today.” The angry mate took his revenge. Keeping the log himself on a day when the captain was ill, the mate recorded, “The captain was sober today.”

    I. M. Copi & C. Cohen (2002) Introduction to Logic (11th ed.) Prentice Hall, p. 167.
Bibliography

Barendregt, H. P. (1984). The Lambda Calculus, Its Syntax and Semantics. North-Holland, revised edition.
Buszkowski, W., Marciszewski, W., and van Benthem, J., editors (1988). Categorial Grammar. John Benjamins Publishing Company.
Christiansen, H. (2002). Logical grammars based on constraint handling rules. In P. J. Stuckey, editor, 18th International Conference on Logic Programming, page 481. Springer-Verlag. LNCS 2401.
Curien, P.-L. (1986). Categorical Combinators, Sequential Algorithms and Functional Programming. Pitman.
Curry, H. B., Feys, R., and Craig, W. (1958). Combinatory Logic, Volume I. North-Holland.
Curry, H. B., Hindley, J. R., and Seldin, J. P. (1972). Combinatory Logic, Volume II. North-Holland.
Dalrymple, M., editor (1999). Semantics and Syntax in Lexical Functional Grammar: The Resource Logic Approach. MIT Press.
de Groote, P. (2001). Towards abstract categorial grammars. In 39th Annual Meeting of the Association for Computational Linguistics, pages 148–155, Toulouse, France.
Debusmann, R., Duchier, D., Koller, A., Kuhlmann, M., Smolka, G., and Thater, S. (2004). A relational syntax-semantics interface based on dependency grammar. In Proceedings of the 20th International Conference on Computational Linguistics (Coling 2004), pages 176–182, Geneva, Switzerland. COLING.
Dölling, J. (1995). Ontological domains, semantic sorts and systematic ambiguity. International Journal of Human-Computer Studies, 43, 785–807.
Hindley, J. R. and Seldin, J. P. (1986). Introduction to Combinators and λ-Calculus, volume 1 of London Mathematical Society Student Texts. Cambridge University Press.
Kuhlmann, M. (2002). Towards a Constraint Parser for Categorial Type Logics. Master’s thesis, Division of Informatics, University of Edinburgh.
Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly, 65, 154–170.
Montague, R. (1973). The proper treatment of quantification in ordinary English. In J. Hintikka et al., editors, Approaches to Natural Language, pages 221–242. D. Reidel.
Moortgat, M. (1988). Categorial Investigations: Logical and Linguistic Aspects of the Lambek Calculus. Foris Publications.
Moot, R. (1999). Grail: An interactive parser for categorial grammars. In R. Delmonte, editor, VEXTAL, pages 255–261. Venice International University.
Morrill, G. (1994). Type Logical Grammar. Kluwer Academic Publishers.
Muskens, R. (1995). Meaning and Partiality. CSLI Publications, Stanford, California.
Prawitz, D. (1965). Natural Deduction, volume 3 of Stockholm Studies in Philosophy. Almqvist & Wiksell.
Purdy, W. C. (1991). A logic for natural language. Notre Dame Journal of Formal Logic, 32, 409–425.
Purdy, W. C. (1992). Surface reasoning. Notre Dame Journal of Formal Logic, 33, 13–36.
Sánchez, V. (1991). Studies on Natural Logic and Categorial Grammar. Ph.D. thesis, University of Amsterdam.
Schönfinkel, M. (1967). On the building blocks of mathematical logic. In J. van Heijenoort, editor, From Frege to Gödel: A Source Book in Mathematical Logic (1879–1931). Harvard University Press. Original 1924.
Simons, P. (1989). Combinators and categorial grammar. Notre Dame Journal of Formal Logic, 30, 242–261.
Steedman, M. (1988). Combinators and grammars. In R. T. Oehrle, E. Bach, and D. Wheeler, editors, Categorial Grammars and Natural Language Structures, pages 417–442. D. Reidel.
Stenlund, S. (1972). Combinators, λ-Terms and Proof Theory. D. Reidel.
van Benthem, J. (1991). Language in Action: Categories, Lambdas and Dynamic Logic. North-Holland.
Villadsen, J. (1997). Using lexical and logical combinators in natural language semantics. Consciousness Research Abstracts, 1, 51–52.
Villadsen, J. (2001). Combinators for paraconsistent attitudes. In P. de Groote, G. Morrill, and C. Retoré, editors, Logical Aspects of Computational Linguistics, pages 261–278. Lecture Notes in Computer Science 2099, Springer-Verlag.
Villadsen, J. (2002). Paraconsistent query answering systems. In T. Andreasen, A. Motro, H. Christiansen, and H. L. Larsen, editors, Flexible Query Answering Systems, pages 370–384. Lecture Notes in Computer Science 2522, Springer-Verlag.
Villadsen, J. (2004a). Paraconsistent assertions. In G. Lindemann, J. Denzinger, I. J. Timm, and R. Unland, editors, Multi-Agent System Technologies, pages 99–113. Lecture Notes in Computer Science 3187, Springer-Verlag.
Villadsen, J. (2004b). A paraconsistent higher order logic. In B. Buchberger and J. A. Campbell, editors, Artificial Intelligence and Symbolic Computation, pages 38–51. Lecture Notes in Computer Science 3249, Springer-Verlag.
Villadsen, J. (2010). Nabla: A Linguistic System Based on Type Theory. LIT Verlag. Foundations of Communication and Cognition (New Series).
CHAPTER NINE
– SIGN LANGUAGE –
CONSTRAINT-BASED SIGN LANGUAGE PROCESSING
ANNELIES BRAFFORT, MICHAEL FILHOL
9.1 Introduction

Sign Languages (SLs) are languages used to communicate with and among the deaf communities. They are natural languages whose origins and evolution cannot be traced any more easily than those of spoken languages (SpLs). SL research is still in its infancy, all the more so in the computer science field of natural language processing (NLP), where SL has only appeared very recently. Three major issues are researched: SL recognition, generation, and translation. Recognition, the first to appear some 20 years ago, involves capturing devices such as cameras or motion capture systems, and signal processing to extract the relevant data. SL generation, alternatively called SL synthesis, began a few years later, following the increasing capabilities of computers. It involves a virtual character model to be animated, named virtual signer or signing avatar (see Figure 9.9 page 213), and computer graphics programming techniques to render a video output. The most recent research topic is translation, for which results of both former fields can be the input and/or output. All these studies need to integrate linguistic knowledge to a certain
degree, together with other models to support image processing and animation tasks. SLs are less-resourced languages. In other words, there are very few reference books describing them (grammar rules, etc.), a limited number of dictionaries, and mainly small-sized corpora. Because of the lack of corpora in the early days of SL linguistic studies, linguistic descriptions were often built from the intuitive knowledge of researchers, complemented by interviews with native signers. With the emergence of SL corpora, the methodologies have evolved towards corpus-based studies allowing researchers to create statistically informed models.

This chapter gives an overview of the main trends and of our current investigations. Section 9.2 sketches the main linguistic properties of SL. Section 9.3 presents the main approaches to SL linguistics and the major trends in SL modelling, and section 9.4 the alternative approach that we are exploring, implemented in a web-based application described in section 9.5. Section 9.6 concludes on the current limitations and gives a few considerations for the future.
9.2 Linguistic description of Sign Languages

Spoken and gestural languages use different channels: SpLs are audio-phonatory, whereas SLs are visuo-gestural. As a consequence, one may question whether SLs operate the same way SpLs do, or whether they follow specific rules induced by the use of corporal articulators and visual perception. The visuo-gestural modality seems to promote the simultaneous use of a number of articulators, the linguistic use of the space in front of the signer, the so-called signing space, and the omnipresence of iconicity at all levels of the language. The main point is the role of the modality in the structuring of language and in the organisation of its functions.

International research has tended to adopt one of two diametrically opposed positions with respect to iconicity. Some researchers have set aside the iconic dimension in order to focus on the dimensions traditionally explored in structuralist or generative linguistics (particularly phonology and syntax), from a structural or a formal point of view respectively, with minimal consideration for the implications of iconicity. Conversely, other authors have constructed theories specific to sign languages, producing very different models. Finally, a third, intermediate category proposes parametric models that take iconic factors into account.

In this section, we report on some consequences of these epistemological positions on SL linguistics, for the phonology, phonetics, and lexicon levels, and conclude on the problems raised by describing SL with the classical linguistic levels.
9.2.1 Phonology

The first linguistic studies on SLs, initiated by Stokoe, a linguist studying American Sign Language, focused on the description of the lexicon at the phonological level (Stokoe, 1960; Klima and Bellugi, 1979). The sign, most often considered as the structural equivalent of the word, is divided into four manual parameters: hand shape, location, movement, and orientation of the palm. These parameters were initially considered as gestural equivalents of phonemes. Then, the hand shapes were described in terms of features. Most of the phonological models (Boyes-Braem, 1981; Liddell and Johnson, 1989; Brentari, 1998) refer, most of the time implicitly, to the gestural phoneme as the basic unit. These authors have adopted an approach aiming to identify in SLs what was already known and described in SpLs, leaving aside the difference in modality.

However, conventional phonological models do not take into account the iconic dimension observable in the sign form. They consider that these elements are involved at a different level from the phonological one and are therefore not likely to constrain the sign form. Some authors attempt to reconcile double articulation and iconicity. Thus, for Stokoe himself (Stokoe, 1991), there is "no compelling reason why meaning and form cannot have that meaningful relationship in phonology". He has even proposed a semantic phonology, which "ties the last step to the first, making a seamless system of this pitty-pat progression". His proposal is close in some respects to that of Cuxac (Cuxac, 2004) in France, who proposes to reverse the double articulation for SLs. In Cuxac’s model (Cuxac, 2000, 2004), the parametric units are not phonological units, and therefore there is either no duality of patterning or an inverted duality of patterning, with a morphological level which constrains the phonological level.
9.2.2 Phonetics

Unlike SpLs, for which the first phonological descriptions were fuelled by a phonetic description (e.g. the International Phonetic Alphabet), in SLs the search for abstract categories came first (Prillwitz et al., 1989; Johnson and Liddell, 2010). It is the need to note accurate descriptive elements that led to the development of phonetic approaches that describe the realisation of gestural elements. These notation systems have always been based on Stokoe’s parametric model and represent a perceptual standpoint. Articulatory descriptions are almost nonexistent for SLs. Phonetic and articulatory approaches are therefore generally lacking in SL linguistics. The phonetic level is mainly studied in the context of SL processing, for recognition or generation (Braffort, 1996; Elliott et al., 2004; Filhol, 2008).
9.2.3 Lexicon

The question raised by many authors is how to define the extent of the SL lexicon, and which categories should be defined, according to what criteria, without enforcing any equivalence between a sign and a word. In other words, one needs to define what should be found in an SL dictionary that defines concepts, rather than translating signs into words or vice versa as bilingual dictionaries do (Cuxac, 2004; Boutora, 2008). When building dictionaries, authors generally consider what is called the citation form of a sign, i.e. the form of a sign used in isolation, as opposed to continuous signing. Indeed, in continuous signing, one or several of the constituting parameters can be modified, with an infinite number of possibilities. Figure 9.1 shows an example of such a phenomenon. The sign meaning BALL is shown in its citation form as given in a dictionary (a), and annotated with the elements and measures whose values can vary depending on the context (b). For example, to express a big ball, the radius (Rad in the figure) of the circle drawn by the hands’ paths may change in size. To express the spatial relationship between a ball and a previously signed entity such as a table, the location of the virtual circle (Loc in the figure) may take a value geometrically consistent with that of the table.

Also, not all signs encountered in SLs have a citation form. Signs vary in degrees of conventional specification and stabilisation. Some authors distinguish three degrees of conventionalisation (Johnston, 2010): fully-lexical signs, which are highly conventionalised and for which both form and meaning are (relatively) stable or consistent across contexts (they can easily be listed in a dictionary); partly-lexical signs, which are combinations of conventional and contextual elements, described in the SL linguistics literature as depicting (also known as classifier or polymorphemic) signs and indexing (or pointing) signs.
They are infinite in number and cannot be enumerated and listed in a dictionary in any straightforward way; non-lexical signs, which are gestures that can be culturally shared or idiosyncratic and occur commonly in signed discourse just as they do in spoken discourse, non-manual gestures that occur when the signer is actually performing some action of a character in a role, or fingerspelling, that is, the representation of the letters of a writing system using a manual code.

Figure 9.1: The sign BALL in its citation form (a) (source: Dictionnaire bilingue LSF, Editions IVT), and the parameters that can vary depending on the context (b).

For other authors, the core of SLs is not the lexical signs but rather some highly iconic structures (Cuxac, 2000, 2004). These structures are referred to using various terminologies, depending on the authors. They include the depicting signs mentioned before but also other structures called role shifts, constructed actions, transfers, etc. They can be used to express utterances without using any dictionary-listed sign. They are also highly combinable with other iconic structures, lexical signs or pointings. These constructions are used, for example, for lexical creation, and when annotating SL corpora, it is sometimes not easy to decide on the degree of lexicality of such units. Moreover, this lexicalisation can be reversed when needed, to fit a different context. In this case, it is difficult, and often impossible, to decide and freeze the lexical status of such units. For example, Figure 9.2 shows the sign meaning CAR PARK in its citation form. Its manual structure is identical to that of the depicting sign expressing “a countable set of vehicles in a row”. The signer’s weak hand establishes a locative reference in space, while the strong hand performs the linear and regular arrangement starting near this reference.
Figure 9.2: The sign CAR PARK (source: Dictionnaire bilingue LSF, Editions IVT)
9.2.4 Lexicon, syntax... and linguistic levels
Describing SL syntax should include three aspects: multilinearity of the articulators, use of the signing space, and iconicity. But here too, the opposed positions on the role of iconicity, non-manual articulators and spatial constraints produce models that are more or less close to those usually encountered for Indo-European vocal languages (VLs), whether spoken or written. Some studies are mainly centred on “word order” and do not account for the other kinds of constraints (Neidle et al., 2000), and conversely, others do not consider analysis at the syntactic level (Cuxac, 2000). Nevertheless, there is an organisation at the discourse level that is not only temporal but rather spatio-temporal, and that integrates other kinds of constraints, such as the spatial organisation of the discourse using the signing space. It is thus arguably difficult to consider a syntactic level in the classical sense. It may also be difficult to distinguish a lexical sign from a syntactic construction in SL: for example, productive signs can be equivalent to complex sentences in SpL, yet form a single sign unit in terms of parametric description (a handshape, start and end locations, and a movement that reflects the action). Moreover, the fact that several articulators can be active simultaneously allows the signer to express different content with various linguistic roles using non-manual and manual elements at the same time.

In summary, the separation into distinct analysis levels raises a number of problems. Several researchers have criticised the transfer of VL models to the description of SL, or consider that the concepts inherited from the description of spoken/written languages must be handled with care (Miller, 2000). This transfer phenomenon is not specific to SL. The inadequacy of the models developed for the Indo-European languages also exists
with VLs belonging to different families of languages and based on different categories (Slobin, 2013). The sign-word equivalence in particular is an unstable foundation, on which many assumptions at the other levels of analysis seem to be based. This assumption of structural equivalence is maintained by the practices of SL corpus annotation at all levels of analysis, practices that fail to account for the form-meaning relationship in these languages. In the next section, we explain the impact of this situation on the nature of the computational models that are proposed in SL processing.
9.3 Language models

This section presents the various approaches that are currently explored for the linguistic modelling of SL. The first two parts present two perspectives from the world of VL that are often opposed: rule-based grammars and machine learning approaches. A third part describes SL-specific approaches, i.e. new proposals not based on transfers from existing models designed for written or spoken languages. In the last part, we provide an inventory of representations, whether existing, still requiring significant research, or not really tried out at all.
9.3.1 Generative/categorical grammars
This approach relies on the definition of a lexical level composed of the terminal nodes of a grammar, and on the grammar itself, which defines the syntax. Much of the syntactic structure of written languages can be captured by generative grammar, and when the first SL synthesis projects started, it was still the most advanced and widespread formal theory for language at the phrase level. Incidentally, the generative field was very much tied to the universalist school of thought, according to which most of the human faculty for language is innate and only a few language-specific parameters need be learnt by a child to acquire a first language. In other words, generative grammars should be able to cover all languages, whether well-known or unexplored. Though it is not clear whether SL enthusiasts really agreed with this hypothesis, there were reasons for which SL processing began using generative grammar.

As we have seen, structure grammars rely on a lexicon (and usually some compositional structure for its units) and a set of syntactic rules (together with unification constraints). Therefore, a prerequisite for using generative
grammar with SL is to define equivalents in the chosen SL. Invariably, the generative approaches to SL used a lexicon of citation forms, and Stokoe-style manual parameters for the lower levels. To some extent, citation forms can be made flexible by adding, say, a sub-lexical feature “location” to be unified with that of one of a verb’s arguments. Unification constraints on the “location” parameter can account for “spatial agreement”, e.g. when a verb is directed to a symbolic point in space.

This approach was first demonstrated by the ViSiCAST and eSign projects and their generative HPSG framework (Marshall and Safar, 2004), in which a full text-to-SL pipeline was designed. Other projects have followed taking similar approaches, some dedicated to SL generation or text-to-sign translation (San-Segundo et al., 2012), sometimes combined with data-driven approaches (Lombardo et al., 2010); others to SL recognition or translation (Wu et al., 2007).

We call these models “linear”, because they consider SL utterances as sequences of gestural units, systematically identified as lexical and comparable to the string of words that compose a sentence. All signed components or added features then align with those lexical boundaries, whether manual or non-manual, whether composing a sign like phonemes or morphemes or of syntactic origin. However, as explained in section 9.2, many specificities in SL involve a lot more than sequences of signs in their citation form and possible parametric agreement.
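The unification of a “location” feature described in this section can be illustrated with a minimal Python sketch. It is not taken from ViSiCAST, eSign or any HPSG implementation: the flat feature dictionaries, the `unify` helper and the point names `p1` and `p2` are assumptions made purely for illustration.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat feature structures; return None on a value clash."""
    merged = dict(a)
    for name, value in b.items():
        if name in merged and merged[name] != value:
            return None  # incompatible values: unification fails
        merged[name] = value
    return merged


# Hypothetical entries: a noun already anchored at point "p1" of the
# signing space, and the object slot of a directional verb.
car = {"gloss": "CAR", "location": "p1"}
verb_object_towards_p1 = {"location": "p1"}
verb_object_towards_p2 = {"location": "p2"}

agreeing = unify(car, verb_object_towards_p1)   # same point: succeeds
clashing = unify(car, verb_object_towards_p2)   # different point: None
```

In this toy model, only a verb slot whose location matches that of the noun can take the noun as its argument, which is the sense in which such constraints express “spatial agreement”.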
9.3.2 Machine learning approaches
Once able to integrate statistical approaches, speech recognition systems improved significantly. This was made possible by the collection of very large corpora. Due to the lack of comparable-sized data, it is not presently possible to follow the same path with SLs (Cooper and Bowden, 2009) and expect the same results. Despite this problem, some studies focus on SL recognition using Hidden Markov Models, and variants attempting to adjust the algorithms and take simultaneity into account (Vogler and Metaxas, 2001). Even given a well-designed system built on a representative corpus, accounting for as much variability as possible, these approaches face strong limitations due to the linguistic specificities of SL:

1. signs must be broken down into smaller units of phonetic nature, whose relevance, definition and detection are still problematic (Theodorakis et al., 2010);
2. lexical signs can bear strong modification of their constituents depending on the context, and modelling all possible variations can require too many different training examples to keep the categories consistent;
3. the depicting signs are built on the fly and are not listable in a dictionary, which makes them extremely difficult to model with a classical example-based machine learning technique.

Other phenomena, specific to SL, must be considered. As explained before, an SL production involves manual and non-manual articulators, more or less synchronised on very different spatial and temporal scales. Moreover, the signer uses the signing space to support and structure the discourse. Signs, whether lexical or productive, are (topologically) arranged in this space, and many pointing and referencing operations and constraints are observed. This spatial and multi-component property makes the speech recognition tools developed for linear SpL inadequate for SL (Dalle, 2006).

By design, machine-learning approaches are still linear models, where an SL utterance is considered describable as sequences of units. Consequently, they suffer from the same limitations as those raised for the previous approach. Other models, built specifically for SL, were developed to overcome some of these limitations. We discuss these in the next section.
9.3.3 SL-specific approaches
In parallel with these classical approaches, researchers advocate the need to develop new approaches that are specific to SLs. The main criticisms are that linear models, whether grammar- or learning-based, cannot represent the multilinearity and the complex synchronisation patterns involving all (manual and non-manual) articulators of SL productions, and that the machine learning approach relies on huge amounts of data, which are not available for SL. The main challenges in SL modelling and processing are: the relevant use of space in SL discourse, the iconicity that is present at various levels (iconic morphemes, spatial agreement...), and the multilinearity of the language.
Multilinearity

Multilinearity enables parallel gestural events, carried by different articulators over different time spans. A first account of multi-track description was given by the P/C formalism (Huenerfauth, 2006), also used by López-Colino in his thesis work (López-Colino and Colás, 2012). It defines two ways of conjoining parts of a signing activity: constituting and partitioning, respectively represented by C- and P-nodes of a recursive structure terminated with leaf nodes (often lexical) or “null” nodes (padding out the tracks where necessary):

• C-nodes build sequences of two or more child nodes. For instance, if all that is needed is a sequence of lexical signs, they are constituted in order under a parent C-node, which is represented by a single timeline on which the constituted signs are aligned in sequence;
• P-nodes are used to create parallel tracks sharing the same time span (i.e. interval boundaries) for their respective contents;
• “null” nodes (Ø) make it possible to give precedence or different durations to partitioned time tracks.

Such a recursive structure accounts for the simultaneity of parts of signing, without considering the non-manual gestures as sub-lexical features of an otherwise manual lexical sequence.

S1: “the child approaches the car”

For example, sentence S1, translated into French Sign Language (LSF), is composed of:

• a sequence of the four manual units shown in Figure 9.3:
  1. the two-handed sign CAR;
  2. a weak-hand posture with a specific handshape for vehicles, establishing its location and orientation in the signing space (we call this handshape a classifier or proform);
  3. the one-handed sign CHILD, during which the weak hand classifier is sometimes held in place;
  4. a strong-hand movement of a person classifier depicting the displacement of the child towards the car’s location, anchored in space by the weak hand;
• an eye gaze directed to the weak hand, starting before the second unit in our example, and another one to the strong hand on the last one.
Figure 9.3: The four manual units composing S1 translated in LSF
Figure 9.4: Signed example sentence S1 modelled with P/C and null nodes
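The recursive structure of the P/C formalism can be sketched as a small Python data type. The class names, the durations and the encoding of S1 below are illustrative assumptions; they are not taken from Huenerfauth’s implementation, and they do not claim to reproduce Figure 9.4 exactly.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    label: str        # a signed unit, often lexical
    duration: float


@dataclass
class Null:
    duration: float   # padding on a track (the Ø node)


@dataclass
class C:
    children: List["Node"]   # constituted: played in sequence


@dataclass
class P:
    tracks: List["Node"]     # partitioned: parallel tracks, same span


Node = Union[Leaf, Null, C, P]


def duration(node: Node) -> float:
    """Time span of a node; P-node tracks must all have the same span."""
    if isinstance(node, (Leaf, Null)):
        return node.duration
    if isinstance(node, C):
        return sum(duration(child) for child in node.children)
    spans = [duration(track) for track in node.tracks]
    if max(spans) - min(spans) > 1e-9:
        raise ValueError("P-node tracks must share the same time span")
    return spans[0]


# Hypothetical encoding of S1 with made-up durations (arbitrary units).
manual = C([Leaf("CAR", 1.0), Leaf("vehicle classifier placed", 1.0),
            Leaf("CHILD", 1.0), Leaf("person classifier moves to car", 1.0)])
gaze = C([Null(0.5), Leaf("gaze to weak hand", 1.0),
          Null(1.5), Leaf("gaze to strong hand", 1.0)])
s1 = P([manual, gaze])
```

In this sketch the null nodes are what let the first eye gaze begin before the second manual unit while both tracks still fill the same overall span.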
Figure 9.4 illustrates how S1 is represented using the P/C model. While P/C does overcome limitations of the linear models, it does not generalise well to a higher grammatical level. Looking at SL corpora, we see for example that negating head shakes generally run on after the accompanying sign sequence has ended, and that their duration is fairly constant regardless of the sequence duration. We also note that with sequences longer than a couple of signs, they hardly ever start at the same time as the sequence but rather towards the last sign, and with single negated signs, they are even likely to start before the sign. Depending on the negated sign sequence and the consequent overlap or inclusion of the tracks, the nesting scheme of the C- and Ø-nodes in the parent P-node may change dramatically, which makes it impossible to generalise a production rule for negating head shakes.

For others, to enable linguistic generalisation, every relevant interval of time (illustrated with elongated boxes in the example diagrams above) in which a part of the signing activity takes place must be able to float freely on its track, unrestricted by the boundaries of the parallel intervals (Filhol, 2012). The next section describes such an approach.
De-levelling

Most of the approaches above imply or assume that separate lexical and syntactic levels can be identified, each having its own modelling system, and as explained in section 9.2, this is problematic.

For us, the lack of established theory and formal knowledge on less-studied languages is a chance for SL processing not to miss out on SL specificities, and to formalise SL without the bias of other language structures. To do so, one possibility is to fall back on weaker hypotheses and a methodology whose starting point is the search for links between signed forms and linguistic functions, i.e. between the visible features, namely the states and movements of the language’s articulators (e.g. “eyelids closed”), and the interpreted purpose of the production (its meaning), whether rhetorical, semantic, lexical or unidentified (e.g. “topic change” or “add pejorative judgement on person/object”). Starting with either a form or a function, a refining search process is repeated back and forth until either:

• invariants in form can be found and formalised for many occurrences of an identified function; this yields a production rule that can be animated by SL synthesis software; or
• a definite function can be positively interpreted for every occurrence of a certain form criterion; this makes an interpretation rule, to be triggered in SL recognition tasks.

For example in LSF, it was observed that placements of named objects in the signing space (such as the first part of sentence S1) all had the same pattern: the object is signed first, then a classifier (arm, hand or finger) is established at the chosen location with a little downward movement, immediately preceded by an eye gaze directed at the same location (Braffort and Dalle, 2008).

We make three working hypotheses towards a formal grammatical system applicable to SLs for automatic synthesis.
• The first one is that rules established in such a manner can equally pertain to:
  – the lexical level, if the rule specifies a stabilised correspondence between an articulated form and a concept such as would be found in a dictionary;
  – syntax, when ordering units, as is sometimes necessary;
  – a grammatical structure in a wider sense, when sequence alone is not sufficient and finer synchronisation rules of overlapping gestural parts are needed;
  – discourse structure, if the rule describes how to outline a talk, emphasise titles, add the right pauses between the sections...
  The example rule above (placement in the signing space) would probably be situated somewhere between syntax and discourse, without perfectly fitting either.
• The second hypothesis is that rules are parameterisable, in the sense that the form features described for a given function can:
  – wrap around a generic placeholder for which a yet unknown signing specification should be given, e.g. the classifier in our example rule;
  – depend on a context-dependent value, e.g. the target point for the placement rule example, on which both the eye gaze and hand location depend.
• The third hypothesis is that a sufficient set of nestable production rules can be found to constitute a fully compositional SL production grammar, able to derive any utterance. This is based on the observation that many rules can be composed, one using another as a parameter argument. For example, given our rule above, illustrated in Figure 9.5a, and one whose function is to make signed objects smaller, illustrated in Figure 9.5b, the object placeholder of the former can be filled with a call to the latter to represent (and produce) the placement of a smaller object. The object to be used in the call to 9.5b will make use of another rule (say, one for a lexical item), which further demonstrates the underlying recursion of the proposed model.

Note that in Figures 9.5a and 9.5b:

• the boxes bound the time intervals during which the contained form description occurs;
• the arrangement on the vertical axis is arbitrary: the diagrams are not to be read as annotation tiers where one would find one articulator per tier;
• italics in the boxes identify the rule arguments (i.e., boxes to be filled).
Figure 9.5: Rule time line illustrations (parameter arguments are in italics)
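The second and third hypotheses (parameterisable, nestable rules) can be sketched in Python by treating each production rule as a function whose arguments may themselves be the results of other rules. The rule names, the `Form` class and the `rule_names` helper are hypothetical, as are the classifier and location values; the actual form descriptions of Figures 9.5a and 9.5b are not modelled here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Form:
    rule: str              # name of the rule that produced this form
    args: Dict[str, Any]   # its arguments, possibly other forms


def lexical(gloss: str) -> Form:
    """Rule for a dictionary-like item."""
    return Form("lexical", {"gloss": gloss})


def smaller(obj: Form) -> Form:
    """Rule whose function is to make a signed object smaller."""
    return Form("smaller", {"obj": obj})


def place(obj: Form, classifier: str, loc: str) -> Form:
    """Rule placing an object in the signing space.

    `obj` and `classifier` are placeholders to be filled; `loc` is the
    context-dependent point shared by the eye gaze and the hand location.
    """
    return Form("place", {"obj": obj, "classifier": classifier, "loc": loc})


def rule_names(form: Form) -> List[str]:
    """List the rules used to build a form, outermost first."""
    names = [form.rule]
    for value in form.args.values():
        if isinstance(value, Form):
            names.extend(rule_names(value))
    return names


# Placement of a smaller object: one rule is the argument of another.
small_ball_placed = place(smaller(lexical("BALL")), "round-object", "p1")
nesting = rule_names(small_ball_placed)
```

Here `nesting` lists the placement rule, then the size rule, then the lexical rule, mirroring the recursion described in the third hypothesis.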
Such a de-levelled approach to SL description allows but does not require the identification and systematic use of, for instance, a syntactic or a lexical layer; rather, all phonetic productions or parameterisable generalisations are described in the same way. In this view, SL productions are recursively generated from higher-level entries, through the finer and finer descriptions they contain, all the way down to the articulator movements (i.e. the phonetics). Verifying this hypothesis would validate quite a novel type of formal grammar for SLs, and could shed new light on Natural Language Processing altogether. To describe forms, researchers at the LIMSI laboratory are developing a language called AZee (Filhol, 2012), which is described in the next section.
9.4 AZee

AZee is a model to describe and synchronise articulatory forms, built with the aim of synthesising signed productions with a virtual signer. Its philosophy is to describe the set of necessary and sufficient constraints of any kind, whether articulatory, contextual or temporal. AZee makes it possible:

• to make exclusive use of necessary and sufficient articulatory constraints, i.e. no Stokoe-like parametric value is mandatory, only the required articulations are to be specified; for example, the thumb position and the angle between the flat fingers and the palm are not constrained in HELLO, THANK YOU (Figure 9.6);
• to define generic functional rules (whether or not lexical) and their associated forms, including invariant and context-dependent specifications (in the BALL example, the symmetry in the manual path is invariant and the location is context-dependent);
• to specify any articulation (whether or not manual) at any time relative to another, for general specification on the time axis; cf. our classifier placement example, where eye gaze systematically precedes the hand’s movement.
Figure 9.6: The sign HELLO, THANK YOU
The basic instruments of the model are a set of native types and a set of typed operators and constants used to build expressions, normally resulting in XML specifications of animations to be synthesised with a software avatar engine. The full set of types is: NUM, BOOL, LIST, VECT, POINT, SIDE, BONE, CSTR, SCORE, AZOP. The first three are all-purpose value types for constraint expressions. VECT and POINT allow for the geometric specification of signing space objects like locations and orientations. SIDE and BONE refer to the signer’s skeleton and are used in the articulatory constraints. The main three are detailed below:

CSTR: Constraints that may apply at a point in time, of three main types: bone orientation and placement (forward/inverse kinematics), morphs (for non-skeleton articulators like facial muscles), and eye gaze direction.

SCORE: Animation specifications, normally the result of an expression to be used as synthesis input. This is the only type to cover time, CSTR being articulatory but instantaneous. An XML description excerpt is given in Figure 9.7. It basically specifies a list of time-stamped keyframes in a first section, and a list of articulations and morph values to be reached at given keyframes, or held between given keyframes. The basic idea is that any articulator not given a morph value or a joint rotation may be interpolated to reach its next state, or simply take a rest or comfort-selected position.

AZOP: Equivalent to functions in functional programming languages. They are to be applied to named argument expressions and result in new expressions. They are most useful to write production rules with non-frozen signed output. For instance, while a shoulder shrug gesture or some non-modifiable sign may be frozen and thus described as a SCORE directly, most grammatical rules will be AZOPs with named arguments (such as duration) and a SCORE output, whose expression depends on the arguments.
Figure 9.7: An AZee output of type SCORE
Here is a selection of AZee operators of various argument and result types, which should give an idea of a few things possible with AZee.

orient: orientation constraint
Type: str, BONE, str, VECT → CSTR
Articulatory constraint to orient skeleton bones in the signing space. The first argument is either ‘DIR’ or ‘NRM’, depending on whether the bone axis to be oriented is the direction bone (to make it point in a direction) or the normal bone (to lay it in a plane). The second string argument is usually ‘along’, to align the bone axis with the given vector direction, but ‘//’ is possible to allow the opposite direction.

place: placement constraint
Type: site, POINT → CSTR
Articulatory constraint placing a body site at a point in space. The first parameter is a POINT expression, but is not evaluated to the 3D coordinates of the point. It must be a body site expression, i.e. one referring to a point on the skin, to be placed at the point given by the second parameter.

morph: morph constraint
Type: str, NUM → CSTR
Articulatory constraint to control non-skeletal articulators such as facial muscles. Morphs have ID names, and can be combined with weights. The first argument is the ID of the morph to be used; the second is its weight in [0, 1].

key: hold constraints
Type: NUM, CSTR → SCORE
This operation creates the most basic score. A “key(D, C)” expression returns a score of duration D, made of two animation keyframes between which the enclosed constraint specs C will be held. D can be zero, and C can hold any set of constraints: morphs, orientation constraints, placement constraints...

sync: synchronise scores
Type: name, SCORE, list of (name, SCORE, synctype) → SCORE
This operator is used to synchronise a list of scores. Each score has a name, referred to by the other scores to specify the way they synchronise with the others. A name can be any identifier string; a synctype is a string from the list below, followed by the appropriate boundaries or durations:

• ‘start-at’, ‘end-at’: score is untouched and merged starting or ending at a given time position;
• ‘start/end’, ‘start/duration’: added score is stretched or compressed to fit the specification;
• ‘start/kfalign’: score geometry is abandoned and keyframes are aligned with those of the current score...
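The time arithmetic behind “key” and two of the synchronisation types can be sketched as follows. This is a simplified Python model written for illustration, not AZee’s implementation: the `Score` class, its fields and the helper names are assumptions, constraints are reduced to plain strings, and the merging of several scores into one is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Score:
    keyframes: List[float]              # keyframe time stamps
    holds: List[Tuple[str, int, int]]   # (constraint, first kf, last kf)

    @property
    def duration(self) -> float:
        return self.keyframes[-1] - self.keyframes[0]


def key(d: float, constraint: str) -> Score:
    """Two keyframes, d apart, between which the constraint is held."""
    return Score([0.0, float(d)], [(constraint, 0, 1)])


def start_at(score: Score, t: float) -> Score:
    """'start-at': timing untouched, score shifted to begin at time t."""
    shift = t - score.keyframes[0]
    return Score([k + shift for k in score.keyframes], list(score.holds))


def start_end(score: Score, start: float, end: float) -> Score:
    """'start/end': score stretched or compressed to fit [start, end]."""
    if score.duration == 0:
        raise ValueError("cannot stretch a zero-duration score")
    scale = (end - start) / score.duration
    origin = score.keyframes[0]
    return Score([start + (k - origin) * scale for k in score.keyframes],
                 list(score.holds))


gaze = key(1, "lookat(loc)")
shifted = start_at(gaze, 2.0)          # same duration, moved in time
stretched = start_end(gaze, 0.5, 2.5)  # duration doubled to fit the span
```

How a zero-duration score should behave under ‘start/end’ is not stated in the text; raising an error here is a choice made for the sketch only.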
azop: create an AZee operator
Type: list of (str, AZexpr), AZexpr → AZOP
The result is an azop that can be applied to a context of named argument expressions, which will produce a result typed according to the last AZexpr given. This last expression generally contains references to the argument names, as would any parameterised function in a programming language. Alternatively, the ‘nodefault’ string can be given if no default expression makes sense; the argument then becomes mandatory when applying the azop.

apply: apply an AZOP to a context
Type: AZOP, list of (str, AZexpr) → returned by azop
The first argument is the azop to be applied. An azop comes with a list of optional or mandatory named arguments, which together form a context for the azop. The return value and type are given by the azop specification. If the azop is a production rule, it will result in a SCORE.

For example, the expression below describes the azop that models the rule “Place classifier in space”, which appeared in the LSF version of sentence S1 (see the second picture in Figure 9.3). Indentation denotes a parameter under its operator.

    1. azop
    2. 'classifier' # type: AZOP (h:SIDE -> CSTR)
    3. 'nodefault'
    4. 'loc' # type: POINT
    5. translate
    6. site
    7. 'ABST'
    8. w
    9. scalevect
    10. fwd
    11. medium
    12. sync
    13. 'classmvt' # BOX 1
    14. sequence
    15. key
    16. 0
    17. place
    18. translate
    19. site
    20. 'PA'
    21. w
    22. @loc
    23. scalevect
    24. up
    25. small
    26. transition
    27. 1
    28. path
    29. 'straight/accel'
    30. site
    31. 'PA'
    32. w
    33. key
    34. 1
    35. place
    36. site
    37. 'PA'
    38. w
    39. @loc
    40. 'classcfg' # BOX 2
    41. key
    42. 1
    43. apply
    44. @classifier
    45. 'h'
    46. w
    47. 'start/end'
    48. 'classmvt:0:0'
    49. 'classmvt:-1:0'
    50. 'eyegaze' # BOX 3
    51. key
    52. 1
    53. lookat
    54. @loc
    55. 'start/end'
    56. 'classmvt:0:-.5'
    57. 'classmvt:-1:-1'
Lines 2 to 11 contain the two argument declarations of the azop, including their names (classifier and loc) and their default values if absent on azop application (or nodefault, e.g. line 3). Argument classifier is expected to be an azop itself, to be applied to a side (left, right, weak, strong) and returning a set of articulatory constraints (type CSTR) shaping and orienting the specified hand. Each of lines 13, 40 and 50 names a part of the full signing activity, all to be synchronised by the sync operation. The word “box” here is a reference to those in Figure 9.5a. Lines 47 and 55 are sync types, i.e. specify the way in which the containing box is to be synchronised with the previous ones. “start/end” means that the two following arguments specify the start and end time position of the score. All four box:kf:off formatted strings are relative time specifications,
creating a new keyframe for insertion if none is present at the specified time stamp. In such a string, kf is the keyframe number of the identified box, from which to offset the time stamp. As with Python list indices, keyframes are numbered 0 and up from the first to the last, and -1 and down from the last to the first. Line 49 refers to the final keyframe of the score contained in box classmvt; line 56 specifies a negative time offset of .5 from the beginning of the same box.

This azop can be saved under the reference "Place classifier in space" and stored as a production rule capable of turning any (C, P) pair of type (AZOP, POINT) into a resulting score, combining all boxed features to fully animate the placement of a classifier hand in space at the given target point, provided ‘C’ is an azop shaping a hand ‘h’ into the right classifier form. The expression for it is a simple application of the azop with both of its arguments set:

    apply @"Place classifier in space" 'classifier' [expression for C] 'loc' [expression for P]
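The relative time strings described above can be illustrated with a short Python sketch. It is not part of AZee or KAZOO: the function name `resolve_time`, the `boxes` dictionary and the keyframe times in it are hypothetical, chosen only to show how a box:kf:off string might be resolved with Python-style indexing.

```python
def resolve_time(spec, boxes):
    """Resolve a 'box:kf:off' string to an absolute time stamp.

    `boxes` maps a box name to the ordered list of its keyframe times
    (a hypothetical representation, assumed for this sketch only).
    """
    box, kf, off = spec.split(":")
    keyframes = boxes[box]
    # int(kf) may be negative: -1 is the last keyframe, as in Python lists.
    return keyframes[int(kf)] + float(off)


# Assumed keyframe times for the box named 'classmvt' (illustrative values).
boxes = {"classmvt": [0.0, 1.0, 2.0]}

start = resolve_time("classmvt:0:-.5", boxes)  # offset -.5 from the first keyframe
end = resolve_time("classmvt:-1:-1", boxes)    # offset -1 from the last keyframe
```

The sketch only computes the time stamp; inserting a new keyframe when none exists at that time is left out.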
What is new and interesting about this model is that the sync operation works with any set of scores and any contained articulation specification, except for anatomically impossible constraints. Nothing forced us to animate the hands, and no lexical base stream was needed for the description. Evaluating this expression produces an XML specification of joint and morph articulations, as presented in Figure 9.7, to be animated directly. Overall, this means we produce animations directly from semantically relevant rule entries and their contextual arguments. AZee is currently implemented in KAZOO, a web application for SL generation using a virtual signer, described in the next section.
9.5 KAZOO

KAZOO is a web-based platform for French Sign Language (LSF) content synthesis and animation using a virtual signer. It integrates:
• a web server that lets the user enter a request and displays the animation;
• a database that stores AZee descriptions in a Python format;
• SL Gene, a module that computes body postures;
• VS Anim, a module that computes the complete animation data.
The overall architecture is illustrated in Figure 9.8. We describe below the features of the SL Gene and VS Anim modules (more details can be found in Braffort et al., 2013).
Figure 9.8: Kazoo: module organisation.
9.5.1 SL GENERATION MODULE (SL GENE)

This module is in charge of the automatic computation of the animation description from the linguistic specification. It is the heart of the framework, and proceeds in three steps:
1. from the input, it builds a complete and consistent AZee description combining all the linguistic constraints on all the articulators. To do so, the module queries the database a number of times to retrieve the descriptions needed by the input specification;
2. it applies constraints on the skeleton for each key-posture, which together define the problems to be solved;
3. it outputs a low-level XML description of the animation (e.g. Figure 9.7).

A significant component of the solving process is the Inverse Kinematics (IK) module, used every time a body point must be placed at a target location. An animated character is modelled with a skeleton of rigid segments connected by joints, called a kinematic chain. IK formulas allow calculation
of the joint parameters that position the character’s end effectors at a given position. In our module, this process works in three steps:
• first, the system builds a large number of random solutions within the limits of the skeleton. The N best solutions are kept for the next step. They are selected according to their overall score (how close they are to the posture we want them to achieve) and their distance to the other solutions (the further, the better);
• the second step performs a gradient descent on each of the N solutions, normally converging to the nearest optimum;
• finally, each of these solutions is given a comfort score defining which posture is the most comfortable for the skeleton. The solution with the best comfort score is considered to be the most realistic one. The comfort score is computed using a statistical comfort model built in a prior study based on motion capture corpora (Delorme, 2011).
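The three steps above can be sketched in Python on a toy problem. This is only an illustration under strong assumptions, not the KAZOO implementation: the two-segment planar arm, its joint limits, the separation threshold `min_sep`, and above all the `comfort` function (a simple distance to an assumed rest posture, standing in for the statistical comfort model) are all hypothetical.

```python
import math
import random

# Hypothetical two-segment planar arm; lengths and joint limits are assumptions.
LENGTHS = (1.0, 1.0)
LIMITS = ((-math.pi, math.pi), (-math.pi, math.pi))
REST = (0.0, math.pi / 2)  # assumed "comfortable" rest posture


def end_effector(q):
    """Forward kinematics: position of the arm tip for joint angles q."""
    a, b = q[0], q[0] + q[1]
    return (LENGTHS[0] * math.cos(a) + LENGTHS[1] * math.cos(b),
            LENGTHS[0] * math.sin(a) + LENGTHS[1] * math.sin(b))


def error(q, target):
    """Squared distance between the arm tip and the target point."""
    x, y = end_effector(q)
    return (x - target[0]) ** 2 + (y - target[1]) ** 2


def select_candidates(samples, target, n, min_sep=0.3):
    """Step 1: keep the n best samples that are also far from each other."""
    kept = []
    for q in sorted(samples, key=lambda s: error(s, target)):
        if all(math.dist(q, k) >= min_sep for k in kept):
            kept.append(q)
        if len(kept) == n:
            break
    return kept


def descend(q, target, steps=500, rate=0.05, h=1e-6):
    """Step 2: gradient descent on the error, using a numerical gradient."""
    q = list(q)
    for _ in range(steps):
        base = error(q, target)
        grad = []
        for i in range(len(q)):
            shifted = list(q)
            shifted[i] += h
            grad.append((error(shifted, target) - base) / h)
        # Move downhill and stay within the joint limits.
        q = [min(max(a - rate * g, lo), hi)
             for a, g, (lo, hi) in zip(q, grad, LIMITS)]
    return q


def comfort(q):
    """Step 3 stand-in: postures closer to REST count as more comfortable."""
    return -sum((a - r) ** 2 for a, r in zip(q, REST))


def solve_ik(target, samples=500, n=5, seed=0):
    rng = random.Random(seed)
    pool = [[rng.uniform(lo, hi) for lo, hi in LIMITS] for _ in range(samples)]
    refined = [descend(q, target) for q in select_candidates(pool, target, n)]
    # Prefer solutions that actually reach the target; fall back to all of them.
    reached = [q for q in refined if error(q, target) < 1e-6] or refined
    return max(reached, key=comfort)


posture = solve_ik((1.0, 1.0))
```

Keeping several well-separated candidates in step 1 is what lets step 3 choose between distinct postures (for this arm, the two elbow configurations) rather than between near-identical copies of the same one.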
9.5.2 VIRTUAL SIGNER ANIMATION MODULE (VS ANIM)

This module is in charge of producing the graphical virtual signer animation output. It:
• takes the output of the SL Gene module (the low-level XML description) as input;
• computes a complete animation, including:
  – the facial morphs (which are ready-made at this moment), which can be weighted and combined (e.g. one for the eyebrows, one for the cheeks, etc.);
  – the coarticulation in case of a sequence of animations;
• outputs the specification of the animation to be displayed on the web page (animation matrix).

Figure 9.9 shows the current version of the KAZOO web page. It is an ongoing project, but the current version already offers automatic animation of a virtual signer.
Figure 9.9: Kazoo demo page, version 1.0.
9.6 CONCLUSION

This chapter has given an overview of the main trends in linguistic and computational modelling and of our current investigations. We have explained why the use of methods designed for Indo-European languages is not satisfactory for modelling all the phenomena encountered in SL productions, especially the multilinear use of manual and non-manual articulators, the signing space, and iconicity at all levels of the language. Moreover, we have seen that the division into analysis levels poses a number of problems. Our proposition, based on SL corpus analysis, is applied regardless of the level, from the sub-lexical level to the discourse structure. The AZee model allows formalisation of the production rules through the expression of articulator synchronisation with time constraints, and of body postures and movements with geometric constraints. A part of this model is already well assessed (2,000+ LSF dictionary signs described), and we have explored several discourse-level structures. A dozen function-to-form production rules concerning lists and time have been published (Filhol et al., 2013). A number of issues remain to be considered, among which are the use of the signing space and the incorporation of the fully iconic and productive units.
BIBLIOGRAPHY

Boutora, L. (2008). Fondements historiques et implications théoriques d’une phonologie des langues des signes. Etude de la perception catégorielle des configurations manuelles en LSF et réflexion sur la transcription des langues des signes. Ph.D. thesis, Université Paris 8.
Boyes-Braem, P. K. (1981). Features of the Handshape in American Sign Language. Department of Psychology, University of California.
Braffort, A. (1996). Reconnaissance et compréhension de gestes, application à la langue des signes. Ph.D. thesis, Université Paris-Sud.
Braffort, A. and Dalle, P. (2008). Sign language applications: preliminary modeling. Universal Access in the Information Society, Emerging Technologies for Deaf Accessibility in the Information Society, 6(4), 393–404.
Braffort, A., Filhol, M., Delorme, M., Bolot, L., Choisier, A., and Verrecchia, C. (2013). Kazoo: A sign language generation platform based on production rules. In Third International Symposium on Sign Language Translation and Avatar Technology, Chicago, USA.
Brentari, D. (1998). A prosodic model of sign language phonology. Cambridge, MA: MIT Press.
Cooper, H. and Bowden, R. (2009). Sign language recognition: Working with limited corpora. In 5th International Conference on Universal Access in Human-Computer Interaction. Part III: Applications and Services (UAHCI ’09), pages 472–481, Berlin, Heidelberg. Springer-Verlag.
Cuxac, C. (2000). La Langue des Signes Française (LSF) : les voies de l’iconicité, volume 15–16 of Faits de Langues. Paris-Gap, Ophrys.
Cuxac, C. (2004). Phonétique de la lsf : une formalisation problématique. Silexicales, 4, 93–113.
Dalle, P. (2006). High level models for sign language analysis by a vision system. In Workshop on the Representation and Processing of Sign Language: Lexicographic Matters and Didactic Scenarios (LREC), pages 17–20, Genoa, Italy. Evaluations and Language resources Distribution Agency (ELDA).
Delorme, M. (2011). Modélisation du squelette pour la génération réaliste de postures de la langue des signes française. Ph.D. thesis, Paris-Sud University.
Elliott, R., Glauert, J. R. W., Jennings, V., and Kennaway, J. R. (2004). An overview of the sigml notation and sigmlsigning software system. In Sign Language Processing Satellite Workshop of the Fourth International Conference on Language Resources and Evaluation, LREC 2004, pages 98–104.
Filhol, M. (2008). Modèle descriptif des signes pour un traitement automatique des langues des signes. Ph.D. thesis, Université Paris-Sud.
Filhol, M. (2012). Combining two synchronisation methods in a linguistic model to describe sign language. In E. Efthimiou, G. Kouroupetroglou, and S.-E. Fotinea, editors, Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, volume 7206 of LNCS/LNAI. Springer Berlin Heidelberg.
Filhol, M., Hadjadj, M., and Testu, B. (2013). A rule triggering system for automatic text-to-sign translation. In Third International Symposium on Sign Language Translation and Avatar Technology, Chicago, USA.
Huenerfauth, M. (2006). Generating American Sign Language classifier predicates for English-to-ASL machine translation. Ph.D. thesis, University of Pennsylvania.
Johnson, R. E. and Liddell, S. K. (2010). Toward a phonetic representation of signs: Sequentiality and contrast. Sign Language Studies, 11(2), 241–274.
Johnston, T. (2010). From archive to corpus: Transcription and annotation in the creation of signed language corpora. International Journal of Corpus Linguistics, 15(1), 106–131.
Klima, E. S. and Bellugi, U. (1979). The Signs of Language. Cambridge, MA: Harvard University Press.
Liddell, S. K. and Johnson, R. E. (1989). American sign language: The phonological base. Sign Language Studies, 64, 195–278.
Lombardo, V., Nunnari, F., and Damiano, R. (2010). A virtual interpreter for the italian sign language. In 10th International Conference on Intelligent Virtual Agents (IVA’10), pages 201–207, Berlin, Heidelberg. Springer-Verlag.
López-Colino, F. and Colás, J. (2012). Hybrid paradigm for spanish sign language synthesis. Universal Access in the Information Society, 11(2), 151–168.
Marshall, I. and Safar, E. (2004). Sign language generation in an ALE HPSG. In 11th International Conference on Head-Driven Phrase Structure Grammar (HPSG-2004), pages 189–201, Leuven, Belgium.
Miller, C. R. (2000). La phonologie dynamique du mouvement en langue des signes québécoise. Fides, Champs linguistiques.
Neidle, C., Kegl, J., MacLaughlin, D., Bahan, B., and Lee, R. G. (2000). The Syntax of American Sign Language: Functional Categories and Hierarchical Structure. Cambridge, MA: The MIT Press.
Prillwitz, S., Leven, R., Zienert, H., Hanke, T., and Henning, J. (1989). HamNoSys. Version 2.0. Hamburg Notation System for Sign Languages. An introductory guide, volume 5 of International studies on sign language and the communication of the deaf. Signum Press.
San-Segundo, R., Montero, J. M., Córdoba, R., Sama, V., Fernández, F., D’Haro, L. F., López-Ludeña, V., Sánchez, D., and García, A. (2012). Design, development and field evaluation of a spanish into sign language translation system. Pattern Analysis and Applications, 15(2), 203–224.
Slobin, D. I. (2013). Typology and channel of communication. Where do signed languages fit in. In Typology and channel of communication: In honor of Johanna Nichols, pages 47–68. John Benjamins Publishing Company.
Stokoe, W. (1991). Semantic phonology. Sign Language Studies, 71, 107–114.
Stokoe, W. C. (1960). Sign language structure: An outline of the visual communication systems of the american deaf. Journal of Deaf Studies and Deaf Education, 10(1), 3–37.
Theodorakis, S., Pitsikalis, V., and Maragos, P. (2010). Model-level data-driven sub-units for signs in videos of continuous sign language. In IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pages 2262–2265, Dallas, USA.
Vogler, C. and Metaxas, D. (2001). A framework for recognizing the simultaneous aspects of american sign language. Computer Vision and Image Understanding, 81, 358–384.
Wu, C.-H., Su, H.-Y., Chiu, Y.-H., and Lin, C.-H. (2007). Transfer-based statistical translation of taiwanese sign language using PCFG. 6(1), 1–18.
CHAPTER TEN
REPRESENTING CONTEXT – GEOMETRIC LOGICS

HEDDA R. SCHMIDTKE
Classical logical theories that are able to represent context-dependent information, such as the meaning of indexicals, build context-dependency upon an a-contextual logical language. Indexicals, such as I, you, here, now, are interpreted by attaching coordinates and reference frames indicating speaker, hearer, location, and time of an utterance to the classically interpreted natural language or logical expression. From a cognitive science point of view, context-dependent thinking seems to be both computationally simpler and evolutionarily earlier, and one may ask whether it could be the other way around: de-contextualised reasoning being built upon reasoning in context. This article explores this idea. Following the principle that indexicals should be interpreted similarly to coordinates and reference frames, the idea is taken one step further: can we also interpret logical or natural language expressions as coordinates? Following this idea, we can see the outlines of a semantic framework that is geometric by nature rather than set-theoretic, and of a reasoning architecture composed of several layers of complexity, with de-contextualisation or re-contextualisation as separate steps of processing on the highest cognitive layer.
10.1 INTRODUCTION

Context is key to understanding cognition: in cognitive psychology experiments, context influence is known to crucially affect subjects’ memory (Kokinov et al., 2007); in natural languages, context is known to influence the meaning of any utterance, both as co-text, the text surrounding a certain utterance, and in the form of indexicals (I, we, here, now, yesterday) and demonstratives (Forbes, 1989). Where, when, by whom, and to whom an utterance is spoken are crucial factors for understanding the semantics and the pragmatics of a sentence. Moreover, context influences the meaning of adjectives (Gärdenfors, 2000), such as red – compare the hue of red wine with that of red hair – or tall – compare the size of a tall elephant with that of a tall human being.

Novel context-awareness technologies, e.g. on mobile phones, promise a new way to process natural language in context. Location, time, and speaker, for instance, can be provided by most mobile phones. In order to process language in context, semantic language processing can thus be grounded in a sensing system. In order to make information from such a system available for logical reasoning, a contextual reasoning mechanism can be used. The context as detected from sensors can thus be used to identify the meaning of indexicals or the right way to utter a request in a polite manner, if we understand the basic structures of terms representing context information as they logically relate to each other.

Following Benerecetti et al. (2000), contextual reasoning can regard three aspects:
• a context can be part of a reference context: here can mean the country I am in or the top of my palm.
• a context can be in a certain direction of a reference context with respect to some reference frame: tomorrow, today will be yesterday.
• a context can be of higher or lower levels of granularity or approximation with respect to another: I am writing on my thesis will be true for the speaker over a temporally extended interval consisting of many small temporal intervals in which the speaker is not writing on their thesis.

Benerecetti et al. (2000) distinguish between three parts of a contextualised expression: the context parameters, e.g. space, time, and speaker; the values they take, e.g. “March 3rd, 2014 4pm”; and the expression to be contextualised, e.g. “it is cloudy today” or “Cloudy(today).” The result is a contextualised expression, such as: “time = March 3rd, 2014 4pm: Cloudy(today).” Context in this framework, as in others (Forbes, 1989), has two distinct characteristics: first, it consists of a parameter – or, as we will say in the following, a dimension – and a value – a position in that dimension; and, second, it is distinct from the logical part, the actual expression. This article follows the model in the first point, but diverges from it in the second point, seeing the logical expression, including individuals and predication, as contextual entities.

An analysis of animal cognition (Gärdenfors, 2005) suggests that context-dependent knowledge is, evolutionarily and in terms of cognitive complexity, an intermediate step between purely reactive systems and cognitive systems able to plan, anticipate, explain, and imagine. This article explores the idea that logical reasoning is based on context-dependent reasoning, i.e., that abstract logical reasoning extends context-dependent reasoning. The resulting framework is close to labelled deductive systems (Gabbay, 1996). A key similarity is the notion of a context or state that is modified through operations that make up the meaning of arriving information. In particular, we study how the continuous influx of perceptual information to the reasoning system can influence the reasoning process. Moreover, particular operations that change the perspective of a reasoning agent – in this approach, geometric transformations – are important concepts, similar to those of dynamic language processing frameworks.

The chapter explores the key idea of geometric reasoning in two steps. We first explore how a fragment of classical first order logic can be interpreted in terms of dimensions and positions along dimensions. The resulting framework is for brevity called geometric semantics and contrasted with set theoretic semantics.
The expressiveness of this framework is explored with respect to a simple logical language that allows only one type of entities, called contexts. De-contextualisation and change of perspective are then explored as operations outside the core framework, either invoked by perception or by a mental operation.
10.2 GEOMETRIC SEMANTICS

The core idea of a context-based semantics is the idea of a multi-dimensional universe. Set theoretic semantics map predicates to subsets of the universe of discourse. Logics able to represent indexicals interpret sentences, predicates and/or entities with respect to a tuple consisting, e.g., of a speaker, a time, and a space. The proposed geometric semantics carry this idea one step further and map every entity, including predicates and individuals, to tuples in a universe of discourse that is a high-dimensional coordinate space. The main difference is in the handling of predication. Geometric semantics treat individuals and predicates in the same manner, whereas classical set theoretic semantics interpret variables and constants as elements and predicates as subsets of the universe of discourse.

For any finite universe of discourse $U$, we can generate a corresponding geometric semantics model in an intuitive manner. Assuming a universe of discourse $U$, where $|U| = n$, we assume one dimension for each element in $U$. The geometric universe $G = \{0, 1\}^n$ is thus an $n$-dimensional coordinate space corresponding to $U$. The interpretation mapping $I$ in set theoretic first order logic semantics can then be replaced with a mapping $J$:

\begin{align}
& J : \mathit{Pred} \cup \mathit{Vars} \to G \tag{10.1}\\
& P(x) \text{ is true iff } J(x) \le J(P) \tag{10.2}
\end{align}

and the ordering $\le$ is an ordering of $n$-dimensional tuples that can be derived from one-dimensional orderings $\le_i$ for each dimension: $a \le b$ iff $a_i \le_i b_i$ for all $0 \le i < n$.

Example: assume a structure $U = \{2, 3, 4\}$ and $I(\mathit{Even}) = \{2, 4\}$, $I(a) = 2$, and $I(b) = 3$; then a corresponding geometric structure would be $G = \{0, 1\}^3$ and $J(\mathit{Even}) = (1, 0, 1)$, $J(a) = (1, 0, 0)$, and $J(b) = (0, 1, 0)$. The structure $(U, I)$ is a classical model of $\mathit{Even}(a)$ as $I(a) \in I(\mathit{Even})$; and $(G, J)$ is a geometric model of $\mathit{Even}(a)$ since $J(a) = (1, 0, 0) \le J(\mathit{Even}) = (1, 0, 1)$.

In the following, we call this simple mapping of sets to spaces $\{0, 1\}^n$ the bit-set mapping, alluding to the well-known bit-set data structure for representing sets. It clearly does not have representational advantages over the set-theoretic version. More interesting structures result when we consider spaces $\mathbb{R}^n$, which can represent dimensions such as colours, or space and time, that is, conceptual spaces (Gärdenfors, 2000). Moreover, using $\mathbb{C}^n$, we can represent a two-dimensional space, as we will do below, or a linear dimension together with extension, as in the diagrammatic reasoning method of Kulpa (1997).¹
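The bit-set mapping and the truth condition (10.2) can be written out as a minimal Python sketch for the example with U = {2, 3, 4}. The helper names `bitset`, `leq` and `holds` are illustrative and do not come from the chapter.

```python
def bitset(subset, universe):
    """Map a subset of the (ordered) universe to a tuple in {0, 1}^n."""
    return tuple(1 if u in subset else 0 for u in universe)


def leq(a, b):
    """Component-wise ordering: a <= b iff a_i <= b_i in every dimension."""
    return all(x <= y for x, y in zip(a, b))


U = (2, 3, 4)
J = {
    "Even": bitset({2, 4}, U),  # (1, 0, 1)
    "a": bitset({2}, U),        # (1, 0, 0)
    "b": bitset({3}, U),        # (0, 1, 0)
}


def holds(pred, term):
    """P(x) is true iff J(x) <= J(P), as in (10.2)."""
    return leq(J[term], J[pred])
```

Here `holds("Even", "a")` succeeds and `holds("Even", "b")` fails, mirroring I(a) ∈ I(Even) and I(b) ∉ I(Even) in the set-theoretic model.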
In general, this allows us analogous semantics for physical dimensions, such as space, time, or colour hue, which can be used, e.g., to mirror conceptual spaces. Mathematically, such dimensions can be described as fields: $\mathbb{R}$ with the conventional addition and multiplication operations is a field, and so is $\mathbb{C}$, the complex plane with the usual complex number multiplication and addition operations. The multiplication and addition operations of a field are associative and commutative, and multiplication distributes over addition. Both operations have an identity element, and the existence of additive and multiplicative inverses is ensured:

\begin{align}
(g_1 \cdot g_2) \cdot g_3 &= g_1 \cdot (g_2 \cdot g_3) \tag{10.3}\\
g_1 \cdot g_2 &= g_2 \cdot g_1 \tag{10.4}\\
g \cdot 1 &= g \tag{10.5}\\
g \cdot g^{-1} &= 1 \quad (g \neq 0) \tag{10.6}\\
(g_1 + g_2) + g_3 &= g_1 + (g_2 + g_3) \tag{10.7}\\
g_1 + g_2 &= g_2 + g_1 \tag{10.8}\\
g + 0 &= g \tag{10.9}\\
g + (-g) &= 0 \tag{10.10}\\
g_1 \cdot (g_2 + g_3) &= g_1 \cdot g_2 + g_1 \cdot g_3 \tag{10.11}
\end{align}

An ordered field, such as $\mathbb{R}$, is endowed with an ordering relation. However, not every field needs to be ordered. $\mathbb{C}$, for instance, is not ordered. In order for $\le$ to be meaningful, we need to define a suitable ordering relation.²

Since predicates and individuals are not distinguished semantically in the framework, a simpler logical language can be chosen. We call the points in $G$ contexts and define a fragment of a context logical language built upon context terms (Schmidtke, 2012) to be interpreted by contexts. Starting from a set of atomic context terms $CV$, we define the language $CT$ of context terms.
1. All atomic context terms and the special symbols $\top$ (called: the maximal context) and $\bot$ (the impossible context) are context terms.
2. If $c$ and $d$ are context terms, then $(c + d)$ (summation) and $(c \sqcap d)$ (intersection) are also context terms.

¹ Representing a linear dimension, like time, with extensions, we can perform reasoning in an interval algebra instead of a point algebra. A point $c \in \mathbb{C}$ can then be understood as an interval starting at position real($c$) with a diameter of im($c$), that is, extending to real($c$) + im($c$), where real($a$) is the real part of $a$ and im($a$) is the imaginary part of $a$ (Kulpa, 1997, proposes a very similar technique for diagrammatic reasoning). Alternatively, we can represent center points in one dimension and size in another (cf. the granularity dimensions outlined in Schmidtke, 2005), or assume a separate dimension for concentric circles of different sizes (Schmidtke and Beigl, 2010, 2011).
² It may even be useful to employ different types of ordering; however, we do not explore this variant further here, in order to keep the description of the approach as simple as possible.
The only difference to Schmidtke (2012) lies in the omission of a complement. An atomic context formula is formed from two context terms with the sub-context relation; in addition, similar to hybrid logics, a context term can modify a formula:
1. If $c$ and $d$ are context terms, then $[c \sqsubseteq d]$ is an atomic formula.
2. If $\varphi$ is a formula and $c$ is a context term, then $c : \varphi$ is also a formula.
3. If $\varphi$ is a formula, then $\neg\varphi$ is also a formula.
4. If $\varphi$ and $\psi$ are formulae, then $(\varphi \lor \psi)$ and $(\varphi \land \psi)$ are formulae.

We interpret this language with a structure $(G, J, \le)$. The function $J : CV \to G$ maps atomic context terms to contexts. The relation $\le$ is a partial order relation that establishes the fundamentals of logical implication by extending the fields $F_i = (G_i, +, \cdot)$ that form the dimensions of $G$ into lattice structures $L_i = (G_i, \le_i)$, upon which the semantics of context logic can be built. With $\le_i$ defined, operators min and max denoting the greatest lower bound and least upper bound with respect to $\le$ can then be derived. The key idea of geometric semantics is to define:

\begin{align}
J(c \sqcap d) &= \min(J(c), J(d)) \tag{10.12}\\
J(c + d) &= \max(J(c), J(d)) \tag{10.13}
\end{align}

We obtain min and max from the partial ordering $\le$:

\begin{align}
x = \min(a, b) \text{ iff } & x \le a \text{ and } x \le b \text{ and there is no } x' \neq x \text{ such that } x \le x' \text{ and } x' \le a \text{ and } x' \le b \tag{10.14}\\
x = \max(a, b) \text{ iff } & a \le x \text{ and } b \le x \text{ and there is no } x' \neq x \text{ such that } x' \le x \text{ and } a \le x' \text{ and } b \le x' \tag{10.15}
\end{align}

For a concrete model $G$, we derive min and max from operations $\min_i$, $\max_i$ making dimension $G_i$ into a lattice. In general, a lattice supports the closure property, associativity, commutativity, and the absorption laws:

\begin{align}
\min(\min(g_1, g_2), g_3) &= \min(g_1, \min(g_2, g_3)) \tag{10.16}\\
\min(g_1, g_2) &= \min(g_2, g_1) \tag{10.17}\\
\max(g_1, \min(g_1, g_2)) &= g_1 \tag{10.18}\\
\max(\max(g_1, g_2), g_3) &= \max(g_1, \max(g_2, g_3)) \tag{10.19}\\
\max(g_1, g_2) &= \max(g_2, g_1) \tag{10.20}\\
\min(g_1, \max(g_1, g_2)) &= g_1 \tag{10.21}
\end{align}
Supporting the intuition of minimum and maximum operations, we can demand that $(L_i, \min_i, \max_i)$ be a bounded and distributive lattice, that is, we require the existence of identity elements and distributivity:

\begin{align}
\max(g, 0_\le) &= g \tag{10.22}\\
\min(g, 1_\le) &= g \tag{10.23}\\
\min(g_1, \max(g_2, g_3)) &= \max(\min(g_1, g_2), \min(g_1, g_3)) \tag{10.24}\\
\max(g_1, \min(g_2, g_3)) &= \min(\max(g_1, g_2), \max(g_1, g_3)) \tag{10.25}
\end{align}

where $0_\le$ and $1_\le$ are special elements for $G$ as a lattice. Notice that these may or may not be different from 1 and 0, the identity elements of $G$ as a field. $\mathbb{R}$ as a field, for instance, has identity elements 1 and 0, while $0_\le$, with the usual interpretation of min as the minimum, would be the minimal element of $\mathbb{R}$. Since $\mathbb{R}$ does not have a minimal element, we would need to extend it to include, e.g., negative infinity. Positive infinity could then be added to also have a largest element. With a bounded lattice, we can define the semantics of $\top$ and $\bot$:

\begin{align}
J(\top) &= 1_\le \tag{10.26}\\
J(\bot) &= 0_\le \tag{10.27}
\end{align}

Since each of the dimensions of $G$ is a field, we can easily generate min, max by applying for each dimension the respective $\min_i$, $\max_i$ operations, with bounds $1_\le$ and $0_\le$ as the tuples derived from the respective dimension bounds $1_{\le_i}$ and $0_{\le_i}$. We limit the discussion here and in the following to minimum and maximum operations. Notice, however, that we would not need to treat every dimension of $G$ in the same manner: $\min_i$ could for instance be the intersection in one dimension, the greatest common divisor in another, and the minimum in a third dimension, and min would still be a greatest lower bound making $G$ into a lattice. However, different types of greatest lower bounds have different properties. For the purposes of this article, we remain with the intuition of a minimum operation, featuring the above additional properties of boundedness and distributivity. This will give us the opportunity to easily define betweenness. However, minimum and maximum, in contrast to other possible lattice operators, do not support the definition of a complement.

In order for $(G, J, \le)$ to be meaningful for the semantics of formulae, we need meaningful partial orderings $\le_i$ for each dimension $i$, which may
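diverge from the standard orderings of the field,³ and we need interpretations for the context terms $\top$ and $\bot$. With the above characterisation, we obtain the following formulae as tautologies:

\begin{align}
& [\bot \sqsubseteq \top] \tag{10.28}\\
& [c \sqcap \top = c] \tag{10.29}\\
& [c + \bot = c] \tag{10.30}\\
& [c \sqcap c = c] \tag{10.31}\\
& [c + c = c] \tag{10.32}\\
& [c \sqcap (c + d) = c] \tag{10.33}\\
& [c + (c \sqcap d) = c] \tag{10.34}
\end{align}

In summary, we obtain the full semantics for context terms:

\begin{align}
J(c \sqcap d) &= \min(J(c), J(d)) \tag{10.35}\\
J(c + d) &= \max(J(c), J(d)) \tag{10.36}\\
J(\top) &= 1_\le \tag{10.37}\\
J(\bot) &= 0_\le \tag{10.38}
\end{align}

We interpret context formulae with respect to (wrt) a current context $m \in G$. If no restriction is given, we assume $m = 1_\le$:

\begin{align}
[c \sqsubseteq d] \text{ is true wrt } m &\text{ iff } \min(J(c), m) \le J(d) \tag{10.39}\\
c : \varphi \text{ is true wrt } m &\text{ iff } \varphi \text{ is true wrt } \min(m, J(c)) \tag{10.40}\\
\neg\varphi \text{ is true wrt } m &\text{ iff } \varphi \text{ is not true wrt } m \tag{10.41}\\
\varphi \land \psi \text{ is true wrt } m &\text{ iff } \varphi \text{ and } \psi \text{ are true wrt } m \tag{10.42}\\
\varphi \lor \psi \text{ is true wrt } m &\text{ iff } \varphi \text{ or } \psi \text{ is true wrt } m \tag{10.43}
\end{align}

With these semantics, the following equivalences hold in the framework:

\begin{align}
x : [c \sqsubseteq d] &\equiv [x \sqcap c \sqsubseteq d] \tag{10.44}\\
\top : [c \sqsubseteq d] &\equiv [c \sqsubseteq d] \tag{10.45}
\end{align}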
10.3 EXPRESSIVENESS OF CONTEXT LOGIC

In order to show how this framework relates to other proposals, we explore its expressiveness with several examples in this section. The first step is to show that the approach is capable of expressing binary predicates, not only monadic predicates. We then explore the boundaries of the approach, discussing how change of perspective can be represented as an operation external to the logical language.

³ Also, the complex numbers, for instance, are not a linearly ordered field; that is, we have to define a partial ordering $\le_i$ in any case if the field is $\mathbb{C}$.
10.3.1 Binary Relations in Context Logic
We saw in the above example that the approach is able to represent monadic predicate logic. However, the logic would not go beyond the expressiveness of propositional logic if it were not able to represent binary predicates. Context logic, however, can be seen as founded upon a single partial ordering relation $\sqsubseteq$, interpreted by $\leq$. This mechanism has been shown to have surprising expressiveness (Schmidtke, 2012; Schmidtke and Beigl, 2011), being capable of modelling sets of pre-order relations, including the reflexive hulls of the directional before ($B$) and the topological in or part of ($P$), which are key to the theory of conceptual spaces. This holds if a set-theoretic semantics is assumed, but it may be unclear how the geometric semantics can also mirror such ordering relations. We therefore explore these examples in more depth.

Forming the reflexive hulls $B_R$ and $P_R$, both relations can be extended into pre-orders. A pre-order is transitive and reflexive. Transitivity is key to inference with these relations: if we know $B_R(x, y)$ (x is before or at the same time as y) and $B_R(y, z)$ (y is before or at the same time as z), then we can infer $B_R(x, z)$ (x is before or at the same time as z). Likewise, if we know $P_R(x, y)$ (x is in or at the same location as y) and $P_R(y, z)$ (y is in or at the same location as z), then we can infer $P_R(x, z)$ (x is in or at the same location as z). Can this inference be reflected with the above semantics? In order to validate this claim, we need to show that context logic can express the above propositions, that any given geometric model supports the inference, and that there are non-trivial geometric semantics models for these relations. The last point will be a key point for validating that geometric semantics represent an interesting and non-trivial alternative to other semantics frameworks.

Context logic can represent $B_R(x, y)$ as $[x \sqcap B_R \sqsubseteq y]$. This formula would be interpreted as $\min(J(x), J(B_R)) \leq J(y)$.
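The above transitivity inference then runs as follows:

\begin{align}
[x \sqcap B_R \sqsubseteq y] &\qquad \min(J(x), J(B_R)) \leq J(y) \tag{10.46}\\
[y \sqcap B_R \sqsubseteq z] &\qquad \min(J(y), J(B_R)) \leq J(z) \tag{10.47}\\
[x \sqcap B_R \sqsubseteq z] &\qquad \min(J(x), J(B_R)) \leq J(z) \tag{10.48}
\end{align}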
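One way to spell out the step from (10.46) and (10.47) to (10.48) is given below. It is only a sketch of the reasoning, using the abbreviation $a$ (introduced here for convenience) and the fact that $\min$ is a greatest lower bound.

```latex
\begin{align*}
a &:= \min(J(x), J(B_R)) \\
a &\leq J(y) && \text{by (10.46)} \\
a &\leq J(B_R) && \text{since } \min \text{ is a lower bound of its arguments} \\
a &\leq \min(J(y), J(B_R)) && \text{since } \min \text{ is the greatest lower bound} \\
  &\leq J(z) && \text{by (10.47)}
\end{align*}
```

Transitivity of $\leq$ then yields $\min(J(x), J(B_R)) \leq J(z)$, which is the interpretation of (10.48).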
We can see that this is true in general, since the minimum of two entities is smaller than or equal to each of them and $\leq$ is transitive. The question now is: under which conditions can a context term $B_R$ give rise to a meaningful relation? The above transitivity inference is trivially true in dimension $i$ if $J_i(B_R) = 0_{\leq_i}$. We can use this and single out a certain dimension $t$, interpreting $J_t(B_R) = 1_{\leq_t}$, with $J_i(B_R) = 0_{\leq_i}$ for all other dimensions $i \neq t$. With this interpretation, before can be interpreted by the linear ordering on the temporal dimension. The relation part of $P_R$ over a raster space of $n$ pixels could be interpreted by representing the space with $n$ dimensions as a bit-set space. $P_R$ singles out exactly the $n$ dimensions in question, so that $J_i(P_R) = 0_{\leq_i}$ for any dimension $i$ irrelevant for $P_R$ and $J_j(P_R) = 1_{\leq_j}$ for any relevant dimension $j$. With this interpretation, we can locate an event $e$ between a time $t_1$ and a time $t_2$ and in place $p$ with the statement:

\[ [e \sqcap P_R \sqsubseteq p] \wedge [t_1 \sqcap B_R \sqsubseteq e] \wedge [e \sqcap B_R \sqsubseteq t_2] \tag{10.49} \]

10.3.2 Perception and Reasoning
We can model stages of perception in this framework, with layers of rules of the form:

\[ [x \sqcap R_1 \sqsubseteq y] \wedge [y \sqcap R_1 \sqsubseteq z] \wedge \ldots \rightarrow [x \sqcap R_2 \sqsubseteq y] \tag{10.50} \]

where relation $R_1$ corresponds to parts of the geometric model that have been perceived or inferred, while $R_2$ consists of parts that can be inferred next. Intuitively, this idea can be expanded to describe the operation of a neural network logically. The neural network itself can be seen as a geometric structure, with relations $R_1$, $R_2$, etc. as activating and deactivating areas in the course of processing information. Moreover, a neural network can be seen as a reasoning mechanism, where rule application requires some sort of constraint satisfaction mechanism to trigger the spreading activation caused by a stimulus. To give a concrete example, a simple colour predicate can be understood as a convex region in colour space with the rule:

\[ [c_1 \sqcap \mathit{Hue} \sqsubseteq p] \wedge [p \sqcap \mathit{Hue} \sqsubseteq c_2] \rightarrow [\mathit{Red} \sqcap \mathit{Colour} \sqsubseteq p] \tag{10.51} \]
As facts from perception arrive, further parts of the geometric neural reasoner are activated. Detecting hue between values $c_1$ and $c_2$, the rule would activate a classification “red” as a consequence. As part of this constraint satisfaction process, the reasoner could also trigger actions as mechanisms to fulfil a certain constraint. Activation/deactivation of certain parts can be represented in the context logical language by setting the context. Assuming that Hue and Colour, belonging to different layers, do not activate the same dimensions, we could express a similar statement as above with:

\[ \mathit{Hue} \sqcup \mathit{Colour} : [c_1 \sqsubseteq p] \wedge [p \sqsubseteq c_2] \rightarrow [\mathit{Red} \sqsubseteq p] \tag{10.52} \]
More precisely, example (10.51) would be identical to:

\[ (\mathit{Hue} : [c_1 \sqsubseteq p] \wedge [p \sqsubseteq c_2]) \rightarrow \mathit{Colour} : [\mathit{Red} \sqsubseteq p] \tag{10.53} \]
highlighting the context switching between layers. This mechanism can also be used to reflect different notions of colours in different contexts. Representing hue in a bit-vector manner with a number of dimensions, each representing different perceivable colours, the context Wine would suppress certain types of colours, which are impossible for wine, while the context Hair suppresses other types of colours, which are impossible for hair.
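To make the colour rule (10.51) concrete, the following sketch evaluates it over a two-dimensional toy space. Everything specific in it is an assumption made for illustration: the two dimensions (a hue value normalised to [0, 1] and a single "red" classification unit), the numeric bounds `C1` and `C2`, and in particular the reading of "activation" as replacing `p` by the smallest context above it that satisfies the consequent. The text itself does not fix any of these choices.

```python
def meet(a, b):
    """Dimension-wise minimum (interpretation of the meet of two context terms)."""
    return tuple(min(x, y) for x, y in zip(a, b))


def join(a, b):
    """Dimension-wise maximum (interpretation of the join of two context terms)."""
    return tuple(max(x, y) for x, y in zip(a, b))


def leq(a, b):
    """Ordering on the toy space: below in every dimension."""
    return all(x <= y for x, y in zip(a, b))


def holds(x, r, y):
    """Truth of the formula [x meet r, below y] wrt the default current context."""
    return leq(meet(x, r), y)


# Hypothetical dimensions: (hue, red classification unit).
HUE = (1.0, 0.0)      # context term singling out the perceptual layer
COLOUR = (0.0, 1.0)   # context term singling out the classification layer
RED = (0.0, 1.0)      # the concept "red" occupies the classification unit
C1 = (0.95, 0.0)      # assumed lower hue bound of the red region
C2 = (1.0, 0.0)       # assumed upper hue bound of the red region


def apply_colour_rule(p):
    """If the antecedent of (10.51) holds for p, activate the consequent.

    Activation is modelled (as an assumption of this sketch) by joining p
    with the part of RED selected by COLOUR.
    """
    if holds(C1, HUE, p) and holds(p, HUE, C2):
        return join(p, meet(RED, COLOUR))
    return p


percept = (0.97, 0.0)              # hue inside the assumed red region
classified = apply_colour_rule(percept)
print(classified, holds(RED, COLOUR, classified))

other = (0.5, 0.0)                 # hue outside the region: rule does not fire
print(apply_colour_rule(other), holds(RED, COLOUR, apply_colour_rule(other)))
```

Because `HUE` and `COLOUR` are zero outside their own layer, each atomic formula only constrains the dimensions of one layer, which is the effect of the context switching described above.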
10.3.3 Changing Perspectives

The interpretation of contexts as points in a potentially high-dimensional space $G$ allows us to represent changes in the world as perceived through the senses of an agent as coordinate transforms over $G$ (or parts of $G$). Likewise, we can represent learning as transforms over $G$. Moreover, we need to assume that these operations can be invoked by the cognitive system as mental actions that are part of thinking, leading to the simulation capability that evolutionarily allowed the detachment of thought (Gärdenfors, 2005). It should be emphasised that these operations are not part of the logical language, but might be understood as controlled by the reasoner aiming to fulfil constraints.

Taking our original starting point to allow tuples of $G$ as the only entities represented by the system, we can ask which types of coordinate transforms the system can initiate. Mathematically, a coordinate transform can be represented by a matrix that converts coordinates of the source coordinate system into coordinates of the goal coordinate system. A rotation by an angle $\phi$ around the z-axis in 3D space, for instance, is performed by multiplying the rotation matrix with the origin coordinates:

\[
\begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} x\cos\phi - y\sin\phi \\ x\sin\phi + y\cos\phi \\ z \end{bmatrix}
\tag{10.54}
\]
If we assume that the cognitive system employs the same structure, that is, tuples of $G$, to internally represent coordinate transforms, it is clear that one context alone, as a tuple of $G$, cannot represent an arbitrary matrix. Under this assumption, the geometric framework cannot handle combining different dimensions of $G$, which we would need for setting $x\cos\phi - y\sin\phi$ as an x-coordinate for the resulting context. Only those transformations can be handled that act on each dimension separately, that is, those that can be expressed with a diagonal matrix or that correspond to vector addition.

We can, however, represent rotation around the z-axis if x and y coordinates are represented not as separate dimensions but as one dimension in $G$. This can be achieved by representing both together as one complex-valued position $c \in \mathbb{C}$ in the complex plane. Rotation in the complex plane corresponds simply to multiplication. In order to represent three dimensions, however, we would need one more dimension, around which we then cannot rotate. This creates an asymmetry between the dimensions, but in fact, human spatial cognition and perception are well known to be 2.5-dimensional rather than 3-dimensional. Rotation around the z-axis is represented by the following matrix, where the first coordinate, here in polar coordinates $(A, \alpha)$, is the complex coordinate representing x and y coordinates together, and $z$ is a real-valued coordinate:

\[
\begin{bmatrix} (1, \phi) & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} (A, \alpha) \\ z \end{bmatrix}
=
\begin{bmatrix} (A, \alpha + \phi) \\ z \end{bmatrix}
\tag{10.55}
\]

We can also scale a context $c$ by a factor $s$ simply by multiplying $c$ with a context. In contrast to rotation, now $\phi = 0$ and the real-valued $s$ is the scaling factor:

\[
\begin{bmatrix} (s, 0) & 0 \\ 0 & s \end{bmatrix}
\begin{bmatrix} (A, \alpha) \\ z \end{bmatrix}
=
\begin{bmatrix} (sA, \alpha) \\ sz \end{bmatrix}
\tag{10.56}
\]

Translation can be represented easily as addition of two vectors. Addition in the complex plane corresponds to separate addition of real and imaginary components. Representing complex values now in terms of their real and imaginary parts as $x + iy$, we can represent a translation by some value $t_x$ along the x-axis (represented by the real part), $t_y$ along the y-axis (represented by the imaginary part), and $t_z$ along the z-axis as:

\[
\begin{bmatrix} x + iy \\ z \end{bmatrix}
+
\begin{bmatrix} t_x + i t_y \\ t_z \end{bmatrix}
=
\begin{bmatrix} t_x + x + i(t_y + y) \\ t_z + z \end{bmatrix}
\tag{10.57}
\]

Looking at the set of transformations we can perform under these conditions, we obtain three transformations crucial to mental image manipulation (Kosslyn, 1980): translation (panning), scaling (zooming), and rotation. Performing such transformations allows a reasoner to change perspectives, and ultimately to imagine changes in the world. This ability, together with the ability to infer truth of goal propositions in such imagined worlds, considerably increases the cognitive ability of the perceiving and reasoning agent. The goal proposition considered as a constraint may be true under certain transformations; finding these is the key step to finding the actions that help achieve the goal.
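The three transformations can be illustrated with Python's built-in complex numbers. This is a sketch under the assumption that a context is stored as a pair `(xy, z)` of one complex and one real coordinate, as in (10.55) to (10.57); the function names are hypothetical, and `rotate_with_matrix` is included only to compare against the ordinary rotation matrix (10.54).

```python
import cmath
import math


def rotate(ctx, phi):
    """Rotation around the z-axis as dimension-wise multiplication, cf. (10.55)."""
    xy, z = ctx
    return (xy * cmath.rect(1.0, phi), z * 1.0)


def scale(ctx, s):
    """Scaling by a real factor s as dimension-wise multiplication, cf. (10.56)."""
    xy, z = ctx
    return (xy * cmath.rect(s, 0.0), z * s)


def translate(ctx, t):
    """Translation as dimension-wise addition, cf. (10.57)."""
    (xy, z), (txy, tz) = ctx, t
    return (xy + txy, z + tz)


def rotate_with_matrix(x, y, z, phi):
    """Reference result from the 3x3 rotation matrix (10.54)."""
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi),
            z)


point = (complex(1.0, 2.0), 3.0)          # x = 1, y = 2, z = 3
turned = rotate(point, math.pi / 2)
reference = rotate_with_matrix(1.0, 2.0, 3.0, math.pi / 2)

# Both routes agree up to floating-point rounding.
print(abs(turned[0].real - reference[0]) < 1e-12,
      abs(turned[0].imag - reference[1]) < 1e-12,
      turned[1] == reference[2])
print(scale(point, 2.0))
print(translate(point, (complex(1.0, -1.0), 0.5)))
```

Each function touches every dimension of the tuple independently, which is exactly the restriction discussed above: mixing the complex coordinate with `z` would require an operation that is not dimension-wise.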
10.4 Conclusions

This article presented a context logical language together with a geometric semantics suitably powerful to represent simple natural language expressions in a context. In contrast to the conventional approach of enriching a logic reasoning system with context, the idea for the logical framework was to move from a representation of indexicals and perceptual input towards a full contextual logic reasoning system. The resulting approach has a crucial property that makes it attractive for further studies: reasoning within context is represented easily while de-contextualising or imagining situations and worlds distinct from the current situation requires transformations of the model, and thus considerably higher effort. This is in line with a point of view assumed in cognitive science that context-dependent cognition seems to be more fundamental: the ability to imagine or anticipate different contexts has been argued to be both more complex and evolutionarily later than cognition in context.

The proposed approach sheds light on the interface between analog and logical reasoning from a new angle, opening a range of new questions. Further work is needed, for instance, to understand whether operations for decontextualisation and change of perspective should interact with the logic or be kept apart, as this article did. Do we need to increase expressiveness so as to allow an agent to reflect about and control its imagination process or can the logic handle this? How could a logic reasoning system and proof theory for an integrated full geometric logic be defined? The proposed approach is expected to reflect more properly the ease with which human beings reason within and about context, thus aiming to contribute towards a better understanding of the role of context in human cognition.
Bibliography

Benerecetti, M., Bouquet, P., and Ghidini, C. (2000). Contextual reasoning distilled. Journal of Experimental and Theoretical Artificial Intelligence, 12(3), 279–305.

Forbes, G. (1989). Indexicals. In D. Gabbay and F. Guenther, editors, Handbook of Philosophical Logic, volume IV, pages 463–490. D. Reidel.

Gabbay, D. (1996). Labelled deductive systems, volume 1. Clarendon Press Oxford.

Gärdenfors, P. (2000). Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge, MA.

Gärdenfors, P. (2005). The detachment of thought. In C. Erneling and D. Johnson, editors, The mind as a scientific subject: between brain and culture, pages 323–341. Oxford University Press.

Kokinov, B., Petkov, G., and Petrova, N. (2007). Context-sensitivity of human memory: Episode connectivity and its influence on memory reconstruction. In B. N. Kokinov, D. C. Richardson, T. Roth-Berghofer, and L. Vieu, editors, International Conference on Modeling and Using Context, pages 317–329.

Kosslyn, S. (1980). Image and Mind. The MIT Press, Cambridge, MA.

Kulpa, Z. (1997). Diagrammatic representation of interval space in proving theorems about interval relations. Reliable Computing, 3(3), 209–217.

Schmidtke, H. R. (2005). Granularity as a parameter of context. In A. K. Dey, B. N. Kokinov, D. B. Leake, and R. M. Turner, editors, International Conference on Modeling and Using Context, volume 3554 of LNCS, pages 450–463. Springer.

Schmidtke, H. R. (2012). Contextual reasoning in context-aware systems. In Workshop Proceedings of the 8th International Conference on Intelligent Environments, page 82. IOS Press.

Schmidtke, H. R. and Beigl, M. (2010). Positions, regions, and clusters: Strata of granularity in location modelling. In R. Dillmann, J. Beyerer, U. D. Hanebeck, and T. Schultz, editors, KI 2010, volume 6359 of LNAI, pages 272–279. Springer.

Schmidtke, H. R. and Beigl, M. (2011). Distributed spatial reasoning for wireless sensor networks. In Modeling and Using Context, pages 264–277. Springer.
Part III Applications
Chapter Eleven

Constraint-based Word Segmentation for Chinese

Henning Christiansen, Bo Li
11.1 Introduction

Written Chinese text has no separators between words in the way that European languages use space characters, and this creates the Chinese Word Segmentation Problem, CWSP: given a text in Chinese, divide it in a correct way into segments corresponding to words. Good solutions are in demand for virtually any nontrivial computational processing of Chinese text, ranging from spellchecking over internet search to deep analysis. Isolating the single words is usually the first phase in the analysis of a text, but as for many other language analysis tasks, doing that perfectly essentially requires insight into the syntactic and pragmatic content of the text. While this parallelism is easy for competent human language users, computer-based methods tend to be separated into phases with little or no interaction. Accepting this as a fact means that CWSP introduces a playground for a plethora of different ad-hoc and statistically based methods.

In this paper, we show experiments of implementing different approaches to CWSP in the framework of CHR Grammars (Christiansen, 2005), which provide a constraint solving approach to language analysis. CHR Grammars are based upon Constraint Handling Rules, CHR (Frühwirth, 1998, 2009), which is a declarative, high-level programming language for specification and implementation of constraint solvers. These grammars feature highly flexible sorts of context-dependent rules that may integrate with semantic and pragmatic analyses. The associated parsing method works bottom-up and is robust to errors and incomplete grammar specifications, as it delivers the chunks and (sub-)phrases that have been recognised also when the entire string could not be processed.

The main contribution of this paper is to demonstrate how different approaches to CWSP can be expressed in CHR Grammars in a highly concise way, and how different principles can complement each other in this paradigm. CHR Grammars may not be an ideal platform for high throughput systems, but can serve as a powerful and flexible system for experimental prototyping of solutions to CWSP.

Section 11.2 gives a brief introduction to the intricacies of CWSP and to CHR Grammars, including a background of related work. Next, we begin the applications of CHR Grammars by showing the representation of a lexicon in section 11.3, and section 11.4 demonstrates a rudimentary, lexicon-based CWSP method based on a maximum match principle. A splitting of a character sequence into smaller portions, called maximum ambiguous segments, to be analysed separately is shown in section 11.5. In section 11.6, we discuss further ideas for approaching CWSP that seem to fit into CHR Grammars, and section 11.7 gives a short summary and a conclusion. This paper is a revised version of (Christiansen and Li, 2011).
11.2 Background and Related Work

11.2.1 The Chinese Word Segmentation Problem

Chinese text is written without explicit separation between the different words, although periods are unambiguously delineated using the special character “。” which serves no other purpose. The Chinese Word Segmentation Problem, CWSP, is the problem of finding a correct or at least a good splitting of the text into units that, in some reasonable way, can be interpreted as words. However, the Chinese language does not possess the same clear distinction between syntax and morphology as European languages normally are assumed to have, and what is considered a semantic unit, a word or a standard phrase is not always obvious.¹ A notion of “natural chunk” has been suggested by Huang et al. (2013) as a replacement for “word” together with machine learning techniques for identifying such chunks.

As for other languages, analysis is also made difficult by the fact that certain parts may be left out of a sentence; as opposed to most European languages, even the verb may be left out when obvious from the context, so for example, verbs corresponding to “have” or “be” are seldom needed in a Chinese text. Also, Chinese has almost no inflectional markers. These facts make it even more difficult to use syntactic constraints or cues to guide or validate a given segmentation. The recent textbook by Wong et al. (2010) contains a good introduction to these difficulties also for non-Chinese speakers; see also the analysis given by Li (2011).

Fully satisfactory solutions to CWSP have not been seen yet. The state of the art among “fairly good” systems uses lexicon-based methods, complemented by different heuristics and statistically based methods as well as specialised tools such as named entity recognisers; see, e.g., (Wong et al., 2010) and (Li, 2011) for a more detailed overview. A good source of primary literature is the web repository containing all proceedings from the CIPS-SIGHAN Joint Conferences on Chinese Language Processing and previous workshops (CIPS-SIGHAN repository, 2000–). Controlled competitions between different Chinese word segmentation systems have been arranged together with the CIPS-SIGHAN conferences. Reports from the 2010 and 2012 competitions (Zhao and Liu, 2010; Duan et al., 2012) indicate precision and recall figures up to around 0.95 for tests on selected corpora, but it is unlikely that these results will hold on arbitrary unseen (types of) text. Some general systems for Internet search such as Google² and Baidu³ use their own word segmentation algorithms which are not publicly available; Li (2011) provides some tests and discussion of these approaches. A few more details of related work are discussed in section 11.6, below.

¹ It may be claimed that the main reason why CWSP is an apparent problem for natural language processing software may be that the current foundations for such software reflect traditional views of European languages, rooted in studies of Latin.
² http://www.google.com
³ http://www.baidu.com; the biggest Chinese web search engine.

11.2.2 CHR Grammars

CHR Grammars (Christiansen, 2005) add a grammar notation layer on top of Constraint Handling Rules (Frühwirth, 1998, 2009), CHR, analogous to the way Definite Clause Grammars (Pereira and Warren, 1980) are added on top of Prolog. CHR itself was introduced in the early 1990s as a rule-based, logical programming language for writing constraint solvers for traditional constraint domains such as integer or real numbers in a declarative way, but has turned out to be a quite general and versatile forward-chaining reasoner suited for a variety of applications; see (Frühwirth, 1998; Christiansen, 2009, 2014b). The CHR Grammar system and a comprehensive Users’ Guide are available on the internet (Christiansen, 2002). We assume the terminology and basic concepts of CHR and Prolog to be known, but the following introduction to CHR Grammars may also provide sufficient insight to readers without this detailed background.

Grammar symbols (terminals and non-terminals) are represented as constraints, decorated with integer numbers that refer to positions in the text, although these are normally kept invisible for the grammar writer. Rules work bottom-up: when certain patterns of grammar symbols are observed in the store, a given rule may apply and add new grammar symbols in the same way as a constraint solver may combine and simplify a group of constraints into other constraints. Consider the following example of a grammar given as its full source text.

    :- chrg_symbols noun/0, verb/0, sentence/0.
    [dogs] ::> noun.
    [cats] ::> noun.
    [hate] ::> verb.
    noun, verb, noun ::> sentence.
    end_of_CHRG_source.

Notice that the information on the left- and right-hand sides of the rules is opposite to the usual standard for grammar rules. This is chosen to indicate the bottom-up nature of CHR Grammars and to resemble the syntax of CHR. Symbols in square brackets are terminal symbols, and non-terminals are declared as shown in the first source line above and can be used in the grammar rules as shown. Given the query

    ?- parse([dogs,hate,cats]).

constraints corresponding to the three terminal symbols are created and entered into the constraint store; the three “lexical” rules will apply and add new grammar symbols representing the recognition of two nouns and a verb in a suitable order, such that the last rule can apply and report the recognition of a sentence. The answer is given as the final constraint store which includes the following constraint; notice that the positions (or boundaries) in the string are shown here.

    sentence(0,3)
The rules shown above are propagation rules, that work by adding new instances of grammar symbols to those already existing in the constraint store.
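The bottom-up, propagation-only reading of such a grammar can be mimicked in a few lines of Python. This is a rough sketch of the principle, not of the CHR Grammar system itself: the store is a set of `(symbol, start, end)` triples, terminals are written with square brackets in their names, and the helper names `matches` and `parse` are chosen here for illustration.

```python
# Rules in "body ::> head" orientation; bracketed names stand for terminals.
RULES = [
    (("[dogs]",), "noun"),
    (("[cats]",), "noun"),
    (("[hate]",), "verb"),
    (("noun", "verb", "noun"), "sentence"),
]


def matches(body, store, start=None):
    """Yield (start, end) spans where the symbols of body occur side by side."""
    first, rest = body[0], body[1:]
    for symbol, s, e in store:
        if symbol == first and (start is None or s == start):
            if not rest:
                yield (s, e)
            else:
                for _, end in matches(rest, store, e):
                    yield (s, end)


def parse(tokens):
    """Apply all rules as propagation rules until nothing new can be added."""
    store = {(f"[{tok}]", i, i + 1) for i, tok in enumerate(tokens)}
    while True:
        new = {(head, s, e)
               for body, head in RULES
               for s, e in matches(body, store)}
        if new <= store:          # fixpoint reached: no rule adds anything
            return store
        store |= new              # propagation only adds, never removes


final_store = parse(["dogs", "hate", "cats"])
print(("sentence", 0, 3) in final_store)
```

Since nothing is ever removed, all analyses of an ambiguous input would coexist in the final store, which is the behaviour described for a strict use of propagation rules.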
When the arrow in a rule is replaced by <:>, the rule becomes a simplification rule which will remove the symbols matched on the left-hand side; there is also a form of rules, called simpagations, that allow removing only some of the matched symbols. When simplification and simpagation rules are used, the actual result of a parsing process may depend on the procedural semantics of the underlying CHR system (i.e., its principles for which rules are applied when), and a knowledge about this is recommended for the grammar writer who wants to exploit the full power of CHR Grammars. A strict use of propagation rules implies a natural handling of ambiguity as all possible analyses are generated in the same constraint store, while simplification rules may be applied for pruning or sorting out among different (partial) solutions.

In some cases, it may be relevant to order the application of rules into phases such that firstly all rules of one kind apply as much as possible, and then a next sort of rules is allowed to apply. This can be done by embedding non-grammatical constraints in the left-hand side of a grammar rule, declared as ordinary CHR constraints. We can illustrate this principle by a modification of the sample grammar above.

    ...
    :- chr_constraint phase2/0.
    {phase2}, noun, verb, noun ::> sentence.

Notice the special syntax with curly brackets, which is inspired by Definite Clause Grammars (Pereira and Warren, 1980). This means that, in this rule, the constraint phase2 does not depend on positions in the text, but must be present for the rule to apply. The query for analysis should then be changed as follows.

    ?- parse([dogs,hate,cats]), phase2.

This means that first, the lexical rules will apply as long as possible (as they are not conditioned by the constraint phase2), and when they are finished, the sentence rule is allowed to be tried. In this particular example, this technique does not change the result, but we give examples below where it is essential.

CHR Grammar rules allow an extensive collection of patterns on the left-hand side for how grammar symbols can be matched in the store: context-sensitive matching, parallel matching, gaps, etc.; these facilities will be explained below when they are used in our examples. As in a Definite Clause Grammar (Pereira and Warren, 1980), grammar symbols may be extended with additional arguments that may store arbitrary information of syntactic and semantic kinds.

CHR, often in the shape of CHR Grammars, has been used for a variety of language processing tasks until now, but to our knowledge, not for Chinese until the work reported here. Hecksher et al. (2002) used a predecessor of CHR Grammars for analysing hieroglyph inscriptions; Christiansen et al. (2007a,b) used it for interpreting use case text and converting it into UML diagrams; van de Camp and Christiansen (2012) and Christiansen (2014a) have used CHR for resolving relative and other time expressions in text into absolute calendric references; Christiansen and Dahl (2003) have made grammatical error detection with error correction; Bavarian and Dahl (2006) have analysed biological sequence data.
11.3 A Lexicon in a CHR Grammar

We begin the applications of CHR Grammars by introducing a lexicon. As in most other grammar formalisms, a lexicon for testing can be represented by a collection of small rules, one for each lexeme. The following sample lexicon is used in the examples to follow.

    [中] ::> word([中]).                                        % centre, middle
    [中,华] ::> word([中,华]).                                  % Chinese (adjective), China
    [华,人] ::> word([华,人]).                                  % Chinese people
    [人] ::> word([人]).                                        % people, human
    [人,民] ::> word([人,民]).                                  % people
    [国] ::> word([国]).                                        % country
    [国,中] ::> word([国,中]).                                  % high school
    [共,和] ::> word([共,和]).                                  % republic
    [共,和,国] ::> word([共,和,国]).                            % republic, country
    [中,华,人,民,共,和,国] ::> word([中,华,人,民,共,和,国]).    % People's Republic of China
    [中,央] ::> word([中,央]).                                  % central
    [政,府] ::> word([政,府]).                                  % government
    [民,政] ::> word([民,政]).                                  % civil administration
    [中,央,人,民,政,府] ::> word([中,央,人,民,政,府]).          % People's Central Government
Notice that the grammar contains two rather large words that look like compounds, but which will be included in any dictionary as words as they are known and fixed terms with fixed meanings. The word grammar symbol may be extended with syntactic tags, but for now we will do with the simplest form as shown.
11.4 Maximum Matching

A first naive idea for approaching CWSP may be to generate all possible words from the input, followed by an assembly of all possible segmentations that happen to include the entire input, and then a final phase selecting a best segmentation according to some criteria. Obviously, this is of exponential or worse computational complexity, so more efficient heuristics have been developed. One such heuristic is the maximum matching method, which has been used in both forward and backward versions; here we show the forward method; see Wong et al. (2010) for background and references. The sentence is scanned from left to right, always picking the longest possible word; then the process continues this way until the entire string has been processed. Three CHR Grammar rules are sufficient to implement this principle. The first one, which needs some explanation, will remove occurrences of words that are proper prefixes of other word occurrences.

    !word(_), ... $$ word(_) <:> true.

The “$$” operator is CHR Grammar’s notation for parallel match: the rule applies whenever both of the indicated patterns match grammatical constraints in the store for the same range of positions (i.e., substring). The symbol “...” refers to a gap that may match any number of positions in the string, from zero and upwards, independently of whatever grammar symbols might be associated with those positions.⁴ In other words, the pattern “word(_), ...” matches any substring that starts with a word. So when this is matched in parallel with a single word, it applies in exactly those cases where two words occur, one being a (not necessarily proper) prefix of the other. The exclamation mark in front of the first word indicates that the grammar symbol matched by this one is not removed from the store as is the standard for simplification rules. This is an example of a so-called simpagation rule having the arrow <:> (which otherwise signifies simplification), in which all grammar symbols and constraints appearing on the left-hand side marked with “!” are kept in the store and all others removed. The true on the right-hand side stands for nothing, meaning that no new constraints or grammar symbols are added. Thus, when a string is entered, this rule will apply as many times as possible, each time a lexicon rule adds a new word, thus keeping only the longest words.

In a second phase, we compose a segmentation from left to right, starting from the word starting after position 0. The first rule applies an optional notation in “:(0,_)”, which makes the word boundaries explicit, here used to indicate that this rule only applies for a leftmost word. The compose constraint is used as described above to control that these rules cannot be applied before all short words have been removed by the rule above.

⁴ Gaps are not implemented by matching, but affect how their neighbouring grammar symbols are matched, putting restrictions on their word boundaries. In the example shown, it must hold that r1 ≥ r2 for the rule to apply, where r1 and r2 designate the right boundary of the first, resp., the second word in the rule head.
Assuming the lexicon given above, we can query this program as follows, shown also with the answer found (with constraints removed that are not important for our discussion).

    ?- parse([中,华,人,民,共,和,国,中,央,人,民,政,府]), compose.
    segmentation([[中,华,人,民,共,和,国],[中,央,人,民,政,府]])
Here the method actually produces the right segmentation, meaning “The Central People’s Government of the People’s Republic of China”; the “of” being implicit in the Chinese text. Notice that there is actually a word spanning over the split, namely the word for high school that happens to be in the lexicon, cf. section 11.3. This example also shows the advantage of combining the maximum match principle with having common terms or idioms represented as entries in the lexicon.

We can show another example that demonstrates how maximum matching can go wrong. We extend the lexicon with the following rules.

    [明,确] ::> word([明,确]).        % definitude, clearly
    [确,实] ::> word([确,实]).        % really, actually
    [实,在] ::> word([实,在]).        % honest, actually
    [考,虑] ::> word([考,虑]).        % consider
    [将,来] ::> word([将,来]).        % future
    [将,来,的] ::> word([将,来,的]).  % future related
    [李] ::> word([李]).              % Li (family name)
    [李,子] ::> word([李,子]).        % plum
    [明] ::> word([明]).              % bright, Ming (given name)
    [将] ::> word([将]).              % will
    [在] ::> word([在]).              % at
    [来] ::> word([来]).              % come
    [事] ::> word([事]).              % thing

The sample sentence we want to check is “李明确实在考虑将来的事”, which can be translated into English as “Li Ming is really considering the future things”, corresponding to the correct segmentation as follows.

    李明 确实 在 考虑 将来的 事

Querying the maximum matching program as shown above for this sentence gives the segmentation

    李 明确 实在 考虑 将来的 事

that does not make sense to a Chinese reader. The problem is that the first two characters together represent a person’s name that is not included in the lexicon. Thus, the first character is taken as a word, the second and third characters are then taken as the next word, and so on. In the middle of the sentence, the program accidentally gets on the right track again and gets the remaining words right. Due to the high frequency of two-character words in Chinese, it is easy to produce quite long sentences where one wrong step in the beginning makes everything go wrong for the maximum matching method.

If instead, in the example above, the two characters for the personal name Li Ming are treated as one unit, everything would go right. This could suggest that a specialised algorithm for identifying personal names will be useful as an auxiliary for CWSP, as has been suggested among others by Chen et al. (2010). We can simulate such a facility by adding a rule for this specific name as follows.

    [李,明] ::> word([李,明]).        % Li Ming, person name
Finally, we mention that combinations of forward and backward maximum segmentation have been used, and in those regions where the two disagree, more advanced methods are applied; see, e.g., Zhai et al. (2009).
11.5 MAXIMUM AMBIGUOUS SEGMENTS

Another principle that may be used in algorithms for CWSP is to run a first phase, identifying the maximum ambiguous segments of a text. We have distilled the principle from a variety of methods that apply similar principles; we have not been able to trace it back to a single source, but Wong et al. (2010) may be consulted for a detailed review. An ambiguous segment is defined as a contiguous segment s in which
• any two contiguous characters are part of a word,
• there are at least two words that overlap, and
• the first and last character are each part of a word entirely within s.
For example, if abcd and def are words, then the substring abcdef will form an ambiguous segment, but not necessarily cdef or abcdefg. An ambiguous segment is maximal, a MAS, whenever it cannot be expanded in any direction to form a larger ambiguous segment. For example, if abc, cde, def, defg are words, then the substring abcdefg may form a maximal ambiguous segment. Thus, if no unknown words occur in a text, the splits between the MASs will be definitive. Except in contrived cases, the lengths of the MASs are reasonable, which means that we can subsequently apply more expensive methods within each MAS, perhaps even exponential methods that enumerate and evaluate all possible segmentations.

Identifying these MASs can be done by very few CHR rules. For simplicity, we introduce a grammar symbol maxap which covers MASs as well as single words that can only be recognised as such. The following two CHR Grammar rules and an additional Prolog predicate are sufficient to identify the maxaps.
The second rule uses the auxiliary Prolog predicate overlap as a guard. The presence of a guard, between the arrow and the vertical bar, means that the rule can only apply in those cases where the guard is true. When this rule applies, the variables R1 and R2 will be instantiated to pairs of indices indicating the beginning and end of the two given maxaps. The overlap predicate tests, as its name indicates, whether the two segments in the string occupied by the two input maxaps overlap. This grammar rule will gradually put together ambiguous segments and, via repeated applications, merge them so that only maximal ones remain. We can test this program on the previous example, “Li Ming is really ...”, by parsing the string 李明䉯在考将来的事; the answer consists of four maxap constituents.
This corresponds to splitting the sequence into the substrings 李明䉯在考将来的事 ,
which then can be analysed separately.
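The merging behaviour of the maxap rules can be imitated outside CHR. The following Python sketch is only an approximation under stated assumptions: every lexicon word occurring in the text is recorded as a character span, and spans that share at least one character are merged until no overlaps remain. The function names and the Latin-letter lexicons are hypothetical, and characters not covered by any lexicon word are simply left out of the result.

```python
def find_word_spans(text, lexicon):
    """Return (start, end) for every substring of text that is a lexicon word."""
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in lexicon:
                spans.append((start, end))
    return spans


def merge_overlapping(spans):
    """Merge spans that share a character (half-open intervals)."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlap with the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def maxaps(text, lexicon):
    """Substrings covered by maximal groups of overlapping words."""
    spans = merge_overlapping(find_word_spans(text, lexicon))
    return [text[start:end] for start, end in spans]


print(maxaps("abcdefg", {"abc", "cde", "def", "defg"}))  # ['abcdefg']
print(maxaps("abcdefxy", {"abcd", "def", "xy"}))         # ['abcdef', 'xy']
```

The first call reproduces the abcdefg example from the text; in the second, the word xy does not overlap anything and is therefore returned as a unit of its own, which corresponds to a maxap covering a single word.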
11.6 DISCUSSION

Our main sources on CWSP research (CIPS-SIGHAN repository, 2000-) also report statistically based methods of different sorts, possibly combined with part-of-speech tagging. Part-of-speech tagging can be implemented in a CHR Grammar, but for realistic applications with a huge lexicon, the right solution may be to preprocess the text by a part-of-speech tagger, and then let a CHR Grammar use the tags. Named entity recognisers can be integrated in a similar way. CHR Grammars do not themselves support machine learning, but it is straightforward to integrate probabilities or other weighting schemes (found by other means) into a CHR Grammar: each constituent has an associated weight, and when a rule applies, it calculates a new weight for the compound. Additional rules can be added that prune partial segmentations of low weight. A recent approach to CWSP (Zhang et al., 2013) first maps a text into a binary tree that represents alternative segmentations based on a lexicon, and then this tree is pruned based on statistically learned weights. Comprehensive statistics concerning ambiguity phenomena in Chinese text are reported by Qiao et al. (2008), which appear to be very useful for further research into CWSP.

More refined analyses involving particular knowledge about the Chinese language may be incorporated in a CHR Grammar approach to CWSP. For example, the sign “的” (pronounced “de”) normally serves as a marker that converts a preceding noun into an adjective; in fact, most adjectives are constructed in this way from nouns which often have no direct equivalent in European languages, e.g., the adjective “red” is constructed from a noun for “red things”. Thus, what comes before “的” should preferably be a noun.⁵ There are of course lots of such small pieces of knowledge that can and should be employed, and we may hope that the modular rule-based nature of CHR can make it possible to add such principles in an incremental way, one by one.
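The idea of attaching weights to constituents and computing a weight for each compound can be illustrated as follows. This is a Python sketch under stated assumptions, not a CHR Grammar: the weight of a segmentation is taken to be the product of its word weights, the weights themselves are invented for illustration, and all segmentations are enumerated exhaustively instead of being pruned along the way.

```python
from math import prod


def segmentations(text, lexicon):
    """Yield every way of splitting text into lexicon words."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in segmentations(text[end:], lexicon):
                yield [head] + rest


def best_segmentation(text, weights):
    """Pick the segmentation whose product of word weights is largest."""
    candidates = list(segmentations(text, weights))
    return max(candidates,
               key=lambda words: prod(weights[w] for w in words),
               default=None)


# Hypothetical weights, e.g. relative frequencies found by other means.
weights = {"ab": 0.2, "abc": 0.5, "c": 0.1, "cd": 0.4, "d": 0.3}
all_candidates = list(segmentations("abcd", weights))
print(all_candidates)  # [['ab', 'c', 'd'], ['ab', 'cd'], ['abc', 'd']]
print(best_segmentation("abcd", weights))  # ['abc', 'd']
```

Exhaustive enumeration is exponential in the worst case, which is why it is only reasonable on short stretches of text such as a single maximal ambiguous segment.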
The out-of-vocabulary (OOV) problem, which we did not approach here, may be the obstacle that makes any method that works well for an isolated and controlled corpus useless in practice. OOV words are often proper names and it is obvious that a module for recognising proper names should be included. We have already referred to Chen et al. (2010), who suggest an approach to recognise person names, and Wong et al. (2010) list several characteristics that may also be applied in identifying place names, transcriptions of foreign names, etc. We may also refer to an interesting approach to OOV in CWSP that incorporates web searches (Qiao and Sun, 2009). Li (2011) suggests a method that involves web searches to evaluate alternative suggestions for segmentations, which may also improve performance in case of OOV. A recent proposal by Tian et al. (2013) applies machine learning techniques to produce a sort of abstract grammar for Chinese words, which thus also handles OOVs. To reduce the complexity induced by the large character sets, characters are mapped into classes based on semantic features, and then the “word grammar” is expressed in terms of identifiers for those classes.

⁵ There are a few additional usages of “的” (where it is pronounced “di”), but these are in special words that are expected to be included in any Chinese dictionary.
11.7 CONCLUSION

It has been demonstrated how different approaches to the Chinese Word Segmentation Problem can be realised in a concise way in the framework of CHR Grammars, which may serve as a flexible platform for experimenting with and testing new approaches to the problem. There is a high demand for efficient and precise solutions due to the vast presence of the Chinese language on the Internet, as well as for Chinese language processing in general. It is also an interesting test case for the versatility of CHR Grammars. The straightforward lexicon-as-grammar-rules approach that we have applied here, which is perfect for small prototypes, does not scale well to full dictionaries. However, it is easy to get around this problem using an external dictionary and other resources such as a named entity recogniser as preprocessors. So in addition to entering the text as a character sequence as shown in our examples, it may be accompanied with constraints that represent all possible word occurrences in the text. With this extension in mind, CHR Grammar based approaches to CWSP may scale reasonably to larger texts due to the unambiguous indication of periods which can be analysed one by one. CHR Grammars’ flexibility may be utilised to incorporate handling of lots of special cases based on linguistic insight. An important next step is to incorporate methods for handling OOV words.
BIBLIOGRAPHY

ACL Anthology: A Digital Archive of Research Papers in Computational Linguistics (2000–2013). Special Interest Group on Chinese Language Processing (SIGHAN). Web archive with all articles of CIPS-SIGHAN Joint Conference on Chinese Language Processing 2010, Proceedings of the nth SIGHAN Workshop on Chinese Language Processing, n = 1, . . . , 7, 2002–2013, Second Chinese Language Processing Workshop, 2000. http://www.aclweb.org/anthology/sighan.html. Link checked March 2013.

Bavarian, M. and Dahl, V. (2006). Constraint based methods for biological sequence analysis. Journal of Universal Computing Science, 12(11), 1500–1520.

Chen, Y., Jin, P., Li, W., and Huang, C.-R. (2010). The Chinese persons name disambiguation evaluation: Exploration of personal name disambiguation in Chinese news. In CIPS-SIGHAN Joint Conference on Chinese Language Processing 2010. Online proceedings, http://aclweb.org/anthology/W/W10/W10-4152.pdf.

Christiansen, H. (2002). CHR Grammar web site; released 2002. http://www.ruc.dk/~henning/chrg.

Christiansen, H. (2005). CHR Grammars. Int’l Journal on Theory and Practice of Logic Programming, 5(4-5), 467–501.

Christiansen, H. (2009). Executable specifications for hypothesis-based reasoning with Prolog and Constraint Handling Rules. J. Applied Logic, 7(3), 341–362.

Christiansen, H. (2014a). Constraint logic programming for resolution of relative time expressions. In A. Beckmann, E. Csuhaj-Varjú, and K. Meer, editors, Computability in Europe 2014, Lecture Notes in Computer Science. Springer. To appear.

Christiansen, H. (2014b). Constraint programming for context comprehension. In P. Brézillon and A. Gonzalez, editors, Context in Computing. To appear.

Christiansen, H. and Dahl, V. (2003). Logic grammars for diagnosis and repair. International Journal on Artificial Intelligence Tools, 12(3), 227–248.

Christiansen, H. and Li, B. (2011). Approaching the Chinese word segmentation problem with CHR grammars. In CSLP 2011: Proc. 4th Intl. Workshop on Constraints and Language Processing, volume 134 of Roskilde University Computer Science Research Report, pages 21–31.

Christiansen, H., Have, C. T., and Tveitane, K. (2007a). From use cases to UML class diagrams using logic grammars and constraints. In RANLP ’07: Proc. Intl. Conf. Recent Adv. Nat. Lang. Processing, pages 128–132.

Christiansen, H., Have, C. T., and Tveitane, K. (2007b). Reasoning about use cases using logic grammars and constraints. In CSLP ’07: Proc. 4th Intl. Workshop on Constraints and Language Processing, volume 113 of Roskilde University Computer Science Research Report, pages 40–52.

Duan, H., Sui, Z., Tian, Y., and Li, W. (2012). The CIPS-SIGHAN CLP 2012 Chinese word segmentation on MicroBlog corpora bakeoff. In Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing, pages 35–40, Tianjin, China. Association for Computational Linguistics.

Frühwirth, T. (2009). Constraint Handling Rules. Cambridge University Press.

Frühwirth, T. W. (1998). Theory and practice of Constraint Handling Rules. Journal of Logic Programming, 37(1-3), 95–138.

Hecksher, T., Nielsen, S. T., and Pigeon, A. (2002). A CHRG model of the ancient Egyptian grammar. Unpublished student project report, Roskilde University, Denmark.

Huang, Z., Xun, E., Rao, G., and Yu, D. (2013). Chinese natural chunk research based on natural annotations in massive scale corpora - exploring work on natural chunk recognition using explicit boundary indicators. In Sun et al. (2013), pages 13–24.

Li, B. (2011). Research on Chinese Word Segmentation and proposals for improvement. Master’s thesis, Roskilde University, Computer Science Studies, Roskilde, Denmark. Available at http://rudar.ruc.dk/handle/1800/6726.

Pereira, F. C. N. and Warren, D. H. D. (1980). Definite clause grammars for language analysis - a survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence, 13(3), 231–278.

Qiao, W. and Sun, M. (2009). Incorporate web search technology to solve out-of-vocabulary words in Chinese word segmentation. In Proceedings of 11th Pacific Asia Conference on Language, Information and Computation (PACLIC’2009), pages 454–463.

Qiao, W., Sun, M., and Menzel, W. (2008). Statistical properties of overlapping ambiguities in Chinese word segmentation and a strategy for their disambiguation. In P. Sojka, A. Horák, I. Kopecek, and K. Pala, editors, TSD, volume 5246 of Lecture Notes in Computer Science, pages 177–186. Springer.

Sun, M., Zhang, M., Lin, D., and Wang, H., editors (2013). Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - 12th China National Conference, CCL 2013 and First International Symposium, NLP-NABD 2013, Suzhou, China, October 10-12, 2013. Proceedings, volume 8202 of Lecture Notes in Computer Science. Springer.

Tian, L., Qiu, X., and Huang, X. (2013). Chinese word segmentation with character abstraction. In Sun et al. (2013), pages 36–43.

van de Camp, M. and Christiansen, H. (2012). Resolving relative time expressions in Dutch text with Constraint Handling Rules. In D. Duchier and Y. Parmentier, editors, CSLP, volume 8114 of Lecture Notes in Computer Science, pages 166–177. Springer.

Wong, K.-F., Li, W., Xu, R., and Zhang, Z.-S. (2010). Introduction to Chinese Natural Language Processing. Morgan and Claypool publishers.

Zhai, F.-W., He, F.-W., and Zuo, W.-L. (2009). Chinese word segmentation based on dictionary and statistics. Journal of Chinese Computer Systems, 9(1), (No page numbers given).

Zhang, K., Wang, C., and Sun, M. (2013). Binary tree based Chinese word segmentation. CoRR, abs/1305.3981.

Zhao, H. and Liu, Q. (2010). The CIPS-SIGHAN CLP 2010 Chinese Word Segmentation Bakeoff. In Proceedings of the Joint Conference on Chinese Language Processing, pages 199–209. Association for Computational Linguistics.
CHAPTER TWELVE

SUPERTAGGING WITH CONSTRAINTS

GUILLAUME BONFANTE, BRUNO GUILLAUME, MATHIEU MOREY, GUY PERRIER
12.1 INTRODUCTION

In the traditional workflow of grammatical analysis, one first finds a decomposition of the text into lexemes, followed by the grammatical analysis itself. In Natural Language Processing, between the lexer and the parser, one inserts a step called part-of-speech (POS) tagging, whose aim is to simplify the grammatical analysis task. POS tagging maps the words of a natural language sentence to tags, themselves interpreted by the analyser in the subsequent step. POS tagging follows from the fact that, contrary to computer languages, a word in a natural language may serve in very different contexts ("I saw it" vs "Take a saw"). Due to the induced ambiguity, the search space of POS tagging for a sentence is a priori exponential in the number of words. To fix ideas, let us suppose that each word in the dictionary can be mapped to 4 tags¹; then a short sentence of length 15 can be tagged in $4^{15} \approx 10^9$ different ways. Considering that the parsing step works on each tagging individually, this amount is prohibitive. The role of POS tagging is either to filter out "wrong" taggings, or to pick "good" candidates.

¹ A quite conservative hypothesis.

POS tagging has been thoroughly considered in the past. Generally speaking, methods fall in two categories, the rule-based ones and the stochastic ones. The basic idea behind rule-based taggers is to apply (hand written) patterns locally on tag sequences to rule out incorrect parts-of-speech. This line of research, pioneered by Harris (1962) among others, is still alive as illustrated by the Constraint Grammars of Karlsson et al. (1995) or the chapter on Handcrafted rules by van Halteren (1999). Among stochastic methods, a successful approach is based on Hidden Markov Models, see for instance (Kupiec, 1992; Merialdo, 1994). One selects sequences which optimise locally the most common n-grams. The key feature of the method is to learn likely tag sequences from a (possibly large) dataset. An intermediate view on the problem, in between rule-based and stochastic methods, was provided by Brill. His tagger "normalises" tag sequences according to a (partially learned) transformation process (Brill, 1995). The reader will find a more exhaustive overview of POS methods in (Jurafsky and Martin, 2009).

In most NLP systems, POS tagging and parsing are two consecutive but distinct processes. This separation harms accuracy, which has prompted several works on joint models for POS tagging and parsing (Rush et al., 2010; Li et al., 2011; Bohnet and Nivre, 2012). An alternative is to complement POS tagging with, or replace it by, a tagging step that uses a richer tagset whose information is directly linked to that used for the syntactic analysis. The elements of such a tagset are called supertags and, among other things, they usually contain subcategorisation information. For instance, supertags for verbs can contain information about their valency. This idea, called supertagging, was proposed by Bangalore and Joshi (1999) to speed up the parsing process of LTAG. It has since been successfully applied to CCG (Clark and Curran, 2004) and HPSG (Ninomiya et al., 2006). Most of the work on supertagging relies on adaptations of the statistical methods used for POS tagging.
A drawback of these methods is that they do not guarantee a perfect recall: even methods that output n-best lists of solutions can prune the correct supertagging, which then prevents the syntactic analyser from finding the correct parse. Boullier (2003) experimented with an exact and non-statistical approach. By exact, we mean that it does not discard supertags corresponding to a successful parse. The approach is non-statistical in the sense that instead of using annotated training corpora, it relies on properties of the grammar. Supertags result from an abstraction of the elementary structures of the grammar, and supertagging reduces to parsing with the abstract grammar. Boullier (2003) successfully experimented with this approach on LTAG with two steps of abstraction, from LTAG to CFG and from CFG to Regular Grammar.

We propose a supertagging method based on the companionship principle, a form of constraints on sentences. In this framework, supertagging amounts to a constraint propagation technique. The companionship principle states that some tags ask for some companions. A determiner needs a noun, a transitive verb an object, an adverb a verb. That is, in any (parsable) sentence, any determiner will correspond to some noun, any transitive verb to some object, any adverb to some verb. Relationships may be multiple: for instance, a relative pronoun is linked both to a noun phrase and to a verb. Companions may be disjunctive: a pronoun may be the subject of either a transitive verb or an intransitive one. No symmetry is required: a determiner asks for a noun, but a noun is not necessarily linked to a determiner. As the second part of the principle, one distinguishes companions on the left from those on the right, thus taking into account the linear order of sentences. E.g. the determiner comes before the noun. Sometimes, the relation may be bilateral, for instance a complementiser with respect to the principal verb.

Since companions are a necessary condition on sentences, to perform supertagging, one may simply remove all those taggings for which the principle is not respected. For instance, there is no reason to keep a lexical tagging involving an orphan determiner, that is a determiner without nouns on its right. Like the work done by Boullier (2003), the companionship principle is not based on statistics, nor on heuristics, but on necessary conditions of parsing. Consequently, we accept having more than one lexical tagging for a sentence, as long as we can ensure that the good ones are among them (when they exist!). This property is particularly useful to ensure that the deep parsing will not fail because of an error at the disambiguation step. On the other hand, for the parser, the set of candidates should be as small as possible. To evaluate this feature, we consider mean ambiguity per word.
As a second criterion of success, the procedure must be efficient compared to the analysis step. These are the rules of the game we are playing. Our thesis is the following. First, companions may be computed directly on the grammar, not on sentences. Thus, the cost of the method can be partially pushed to a pre-compilation phase, and only a minimal part is done at parsing time. Second, supertagging can be performed using an automaton technique working on the entire set of potential supertaggings. This partially ensures its efficiency.

Companionship constraints appear more or less transparently in grammatical frameworks. Consider the case of Dependency Grammars. They explicitly link words by pairs, and these are good companion candidates. Dependency Grammars are due to Lucien Tesnière. Tesnière (1959) developed a formal and sophisticated theory with dependencies. In computational linguistics, there are a few explicit dependency grammars (Debusmann et al., 2004). Most grammars are implicit grammars learnt from training corpora and used for statistical parsing. Therefore, this framework has a limited interest for our purpose.

Nevertheless, many current grammatical formalisms give rise to explicit grammars while relying more or less explicitly on the notion of dependencies between words. This is true of the phrase structure based formalisms which consider that words introduce incomplete syntactic structures which must be completed by other words. This idea is at the core of Categorial Grammars (CG) (Lambek, 1958) and all their variants such as Abstract Categorial Grammars (ACG) (de Groote, 2001) or Combinatory Categorial Grammars (CCG) (Steedman, 2000), being mostly encoded in their type system. Dependencies in CG were studied by Moortgat and Morrill (1991) and for CCG by Clark et al. (2002); Koller and Kuhlmann (2009). Other formalisms can be viewed as modelling and using dependencies, such as Tree Adjoining Grammars (TAG) (Joshi, 1987) with their substitution and adjunction operations. Dependencies for TAG were studied by Joshi and Rambow (2003).

Another more recent concept, polarity, can be used in grammatical formalisms to express that words introduce incomplete syntactic structures. Interaction Grammars (IG) (Guillaume and Perrier, 2009) directly use polarities to describe these structures, but it is also possible to use polarities in other formalisms in order to make explicit the more or less implicit notion of incomplete structures: for instance, in CG (Lamarche, 2008) or in TAG (Kahane, 2006; Bonfante et al., 2004; Gardent and Kow, 2005). In this regard, Marchand et al. (2009) showed that it is also possible to extract a dependency structure from a syntactic analysis in IG.
This encourages us to say that in many respects dependencies and polarities are two sides of the same coin. The aim of this paper is to show that dependencies can be used to express companionship constraints on the taggings of a sentence and hence that these constraints can be used to partially disambiguate the words of a sentence.

There is a final point we want to discuss, the question of efficiency. As mentioned earlier, tagging must be done in time negligible compared to parsing. Thus, we have to face the exponential number of lexical taggings of a sentence. It is not reasonable to treat them individually. To avoid this, it is convenient to use an automaton to represent the set of all lexical taggings. This automaton has linear size with regard to the length of the sentence. The idea of using automata is not new. In particular, methods based on
Hidden Markov Models (HMM) use such a technique for part-of-speech tagging (Kupiec, 1992; Merialdo, 1994). Using automata, we benefit from dynamic programming procedures, and consequently from an exponential speed-up in both time and space.
12.2 THE COMPANIONSHIP PRINCIPLE IN BRIEF

In this section, we present informally the principle on a toy AB-grammar. We will see the companionship principle in action and we will show that it can be computed on the grammar.
12.2.1 PARSING WITH AN AB-GRAMMAR
Let us consider a simple grammatical formalism: AB-Grammar (Bar-Hillel, 1953), which is a restriction of Categorial Grammar. An AB-grammar relies on a set of syntactic types recursively built from atomic types with two operators: left division (\) and right division (/). A constituent of type A\B (resp. B/A) expects a constituent of type A immediately on its left (resp. right) to build a constituent of type B. A toy grammar may use the following types based on the atomic types N, NP and S.

name of type   value                    description
Det            NP/N                     determiner
LAdj           N/N                      left adjective
RAdj           N\N                      right adjective
CN             N                        common noun
Clit           (NP\S)/((NP\S)/NP)       object clitic pronoun
TrV            (NP\S)/NP                transitive verb
IntrV          NP\S                     intransitive verb
An AB-grammar maps any word of a vocabulary to a (finite) set of types. We call the mapping a lexicon. Let us consider a very small vocabulary V = { “la”, “belle”, “ferme”, “porte” } and let us consider a toy grammar on V . The lexicon G is given by Table 12.1 below. Each × corresponds to an element in G . For instance, the French word “porte” can be a common noun (“door”), a transitive verb (“hangs”) or an intransitive verb (“to have a far reach”).
Table 12.1: Toy lexicon of an AB-grammar

         la    belle    ferme    porte
Det      ×
LAdj           ×        ×
RAdj           ×        ×
CN       ×     ×        ×        ×
Clit     ×
TrV                     ×        ×
IntrV                   ×        ×
Example 1. “La belle ferme la porte”

The parsing of Sentence (1) with the grammar G is done in two steps. In the first step, we select an entry in the lexicon for each word of the sentence to build a lexical tagging of the sentence. For instance, let us consider the lexical tagging:

[Det, LAdj, CN, Clit, TrV]

In the second step, we reduce the lexical tagging according to the two reduction rules:

A, A\B −→ B (left division)
B/A, A −→ B (right division)

If the computation ends with the unique tag S, then parsing succeeds. With the tagging example above, the following successful reduction holds:

NP/N, N/N, N, (NP\S)/((NP\S)/NP), (NP\S)/NP
−→ NP/N, N/N, N, NP\S
−→ NP/N, N, NP\S
−→ NP, NP\S
−→ S
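The two reduction rules can be checked mechanically. The following Python sketch is our own illustration, not part of the method of this chapter: types are encoded as nested tuples (an encoding chosen here for convenience), reduction to S is tested with a CYK-style chart, and the per-word entries in LEXICON are our reading of Table 12.1 and should be treated as an assumption.

```python
from itertools import product

# A type is an atom ("N", "NP", "S") or a triple:
#   ("/", B, A)   stands for B/A: expects an A on its right and yields B
#   ("\\", A, B)  stands for A\B: expects an A on its left and yields B
VP = ("\\", "NP", "S")
TYPES = {
    "Det": ("/", "NP", "N"),
    "LAdj": ("/", "N", "N"),
    "RAdj": ("\\", "N", "N"),
    "CN": "N",
    "TrV": ("/", VP, "NP"),
    "IntrV": VP,
}
TYPES["Clit"] = ("/", VP, TYPES["TrV"])


def combine(left, right):
    """Apply the two reduction rules to a pair of adjacent types."""
    results = set()
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        results.add(left[1])        # B/A, A -> B
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        results.add(right[2])       # A, A\B -> B
    return results


def parses(tagging):
    """CYK-style check: can the tagging be reduced to the single type S?"""
    n = len(tagging)
    chart = {(i, i + 1): {TYPES[tag]} for i, tag in enumerate(tagging)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            cell = set()
            for k in range(i + 1, i + width):
                for l, r in product(chart[(i, k)], chart[(k, i + width)]):
                    cell |= combine(l, r)
            chart[(i, i + width)] = cell
    return "S" in chart[(0, n)]


# Assumed lexicon (our reading of Table 12.1).
LEXICON = {
    "la": ["Det", "CN", "Clit"],
    "belle": ["LAdj", "RAdj", "CN"],
    "ferme": ["LAdj", "RAdj", "CN", "TrV", "IntrV"],
    "porte": ["CN", "TrV", "IntrV"],
}
sentence = ["la", "belle", "ferme", "la", "porte"]
taggings = list(product(*(LEXICON[w] for w in sentence)))
parsable = [t for t in taggings if parses(t)]
print(len(taggings), len(parsable))  # 405 4
```

Under this reading of the lexicon, Sentence (1) has 405 lexical taggings with the original (nested) types, of which 4 reduce to S, including the tagging [Det, LAdj, CN, Clit, TrV] reduced above.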
12.2.2 FILTERING LEXICAL TAGGINGS WITH THE COMPANIONSHIP PRINCIPLE

A naive way to perform parsing is to do it on all potential lexical taggings of the sentence, one after the other. This is not efficient due to the exponential number of taggings, as shown in the introduction. For Sentence (1),
among several hundreds of possible taggings, 4 correspond to a solution and the remaining ones are not productive. A first solution is [Det_la, LAdj_belle, LAdj_ferme, CN_la, IntrV_porte]. It must be discarded because it does not respect the agreement in gender between the feminine adjective belle and the masculine noun la. The AB-grammar ignores agreement features. We give the English translation of the sentences corresponding to the 3 other successful lexical taggings:
• [Det_la, CN_belle, TrV_ferme, Det_la, CN_porte]: “The nice girl closes the door”
• [Det_la, LAdj_belle, CN_ferme, Clit_la, TrV_porte]: “The nice farm hangs it”
• [Det_la, CN_belle, RAdj_ferme, Clit_la, TrV_porte]: “The firm nice girl hangs it”

This example shows that a filtering step is crucial to keep the number of lexical taggings small enough to be tractable. We propose filtering based on the Companionship Principle. In our case, this principle amounts to saying that in a parsable lexical tagging, every type must find a companion, that is another type which will combine with it in the parsing process. If a lexical tagging does not verify this principle, it may be discarded because it is not parsable. What is interesting is that it is possible to pre-compute all companions of any type from the lexicon G and even to foresee whether they are left or right companions, according to the sentence order. Indeed, according to the syntactic composition rules of our grammatical framework, we are sure that an initial type A/B will combine either on its right with another initial type having B as its head or as an argument of a type X/(A/B) or (A/B)\X. For instance, a transitive verb has the type (NP\S)/NP. This type can combine on its right with the type NP of a noun phrase that is the direct object of the verb, but it can also combine on its left with the type (NP\S)/((NP\S)/NP) of an object clitic pronoun.
In a symmetrical way, an initial type B\A will combine either on its left with another initial type having B as its head or as an argument of a type X/(B\A) or (B\A)\X. To avoid the usage of a type A/B or A\B as an argument of a more complex type, we consider an equivalent grammar without nested types. In our grammar, only one type has such a complex argument, the type Clit, which has the form (NP\S)/((NP\S)/NP). A new atomic type T is introduced to replace the type (NP\S)/NP of transitive verbs, when they are used as arguments of clitics. Two types are now associated to transitive verbs, depending on whether they are used with a canonical object (TrV_1) or with a clitic object (TrV_2). Table 12.2 gives the list of the syntactic types redefined in this way.

Table 12.2: Types of the grammar redefined with a flat structure

name of type   value          description
Det            NP/N           determiner
LAdj           N/N            left adjective
RAdj           N\N            right adjective
CN             N              common noun
Clit           (NP\S)/T       object clitic pronoun
TrV_1          (NP\S)/NP      transitive verb
TrV_2          T              transitive verb
IntrV          NP\S           intransitive verb
S              S              sentence

Therefore, without nested types, it is possible to associate any type headed by B with a set of left companions gathering all types of the form A/B and with a set of right companions of the form B\A. There is an exception if B = S, because S does not need to combine with another type. Table 12.3 gives the set of companionship constraints resulting from this property.

Table 12.3: Head companions

name of type   value    left companions    right companions
Det            NP/N     TrV_1              Clit, TrV_1, IntrV
LAdj           N/N      Det, LAdj          RAdj
RAdj           N\N      Det, LAdj          RAdj
CN             N        Det, LAdj          RAdj
TrV_2          T        Clit               ∅
Note that it is also possible to consider dual constraints: for instance the fact that an intransitive verb (with type NP\S) is waiting to be saturated by a type headed by NP on its left. This produces another set of constraints, described in Table 12.4; they are handled in the same way as the previous ones.

Table 12.4: Argument companions

name of type   value          left companions     right companions
Det            NP/N           ∅                   LAdj, RAdj, CN
LAdj           N/N            ∅                   LAdj, RAdj, CN
RAdj           N\N            LAdj, RAdj, CN      ∅
Clit           (NP\S)/T       Det                 ∅
Clit           (NP\S)/T       ∅                   TrV_2
TrV_1          (NP\S)/NP      Det                 ∅
TrV_1          (NP\S)/NP      ∅                   Det
IntrV          NP\S           Det                 ∅

Now, let us apply the Companionship Principle to all possible lexical taggings of Sentence (1) with the companionship constraints. With the 5 constraints of Table 12.3, among the 648 taggings, only 148 verify the Companionship Principle and the other ones are discarded. With the 13 constraints of Tables 12.3 and 12.4, only 18 taggings verify the principle.
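These counts can be reproduced by brute force. The Python sketch below is an illustration under stated assumptions, not the automaton-based implementation discussed next: the lexicon is our reading of Table 12.1 with TrV split into TrV_1 and TrV_2, and a constraint is taken to be satisfied when every occurrence of the constrained tag has one of its left companions somewhere before it or one of its right companions somewhere after it.

```python
from itertools import product

# Assumed flat lexicon: Table 12.1 with TrV split as in Table 12.2.
LEXICON = {
    "la": ["Det", "CN", "Clit"],
    "belle": ["LAdj", "RAdj", "CN"],
    "ferme": ["LAdj", "RAdj", "CN", "TrV_1", "TrV_2", "IntrV"],
    "porte": ["CN", "TrV_1", "TrV_2", "IntrV"],
}
NOUNISH = {"LAdj", "RAdj", "CN"}

# Each constraint: (tag, left companions, right companions).
HEAD = [  # Table 12.3
    ("Det", {"TrV_1"}, {"Clit", "TrV_1", "IntrV"}),
    ("LAdj", {"Det", "LAdj"}, {"RAdj"}),
    ("RAdj", {"Det", "LAdj"}, {"RAdj"}),
    ("CN", {"Det", "LAdj"}, {"RAdj"}),
    ("TrV_2", {"Clit"}, set()),
]
ARGUMENT = [  # Table 12.4
    ("Det", set(), NOUNISH),
    ("LAdj", set(), NOUNISH),
    ("RAdj", NOUNISH, set()),
    ("Clit", {"Det"}, set()),
    ("Clit", set(), {"TrV_2"}),
    ("TrV_1", {"Det"}, set()),
    ("TrV_1", set(), {"Det"}),
    ("IntrV", {"Det"}, set()),
]


def satisfies(tagging, constraints):
    """Check that every constrained tag finds a companion on the proper side."""
    for tag, left, right in constraints:
        for i, t in enumerate(tagging):
            if t != tag:
                continue
            before, after = set(tagging[:i]), set(tagging[i + 1:])
            if not (left & before or right & after):
                return False
    return True


sentence = ["la", "belle", "ferme", "la", "porte"]
taggings = list(product(*(LEXICON[w] for w in sentence)))
after_head = [t for t in taggings if satisfies(t, HEAD)]
after_all = [t for t in taggings if satisfies(t, HEAD + ARGUMENT)]
print(len(taggings), len(after_head), len(after_all))  # 648 148 18
```

Enumerating all taggings is only feasible because the sentence is short; the point of the sketch is to make the filtering condition explicit, not to suggest that taggings should be treated one by one.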
12.2.3 IMPLEMENTATION WITH AUTOMATA
Even after filtering with the Companionship Principle, the number of remaining lexical taggings is high and this number grows exponentially with the length of the sentence. A way of solving this problem is to represent the set of lexical taggings for a sentence in the compact form of an acyclic automaton. For instance, the set of the 648 initial lexical taggings for Sentence (1) will be represented with what we call a lexical tagging automaton (LTA) with 6 states and 19 transitions. Then, it is more efficient to apply the Companionship Principle directly on the LTA and to build a new LTA representing the 18 taggings that verify this principle. These automata are shown on Figure 12.1.
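The compactness of the automaton representation can be made concrete. In the Python sketch below, which is ours and only illustrative, the unfiltered LTA of Sentence (1) is built as a list of transitions, one per (word, tag) pair, and the number of taggings it represents is counted by dynamic programming without enumerating them. The per-word tag lists are our reading of Tables 12.1 and 12.2 and are an assumption.

```python
# Tags per word of Sentence (1), with the flat types of Table 12.2
# (assumed reading of the lexicon).
TAGS = [
    ["Det", "CN", "Clit"],                              # la
    ["LAdj", "RAdj", "CN"],                             # belle
    ["LAdj", "RAdj", "CN", "TrV_1", "TrV_2", "IntrV"],  # ferme
    ["Det", "CN", "Clit"],                              # la
    ["CN", "TrV_1", "TrV_2", "IntrV"],                  # porte
]

# State i sits before word i; each tag of word i is a transition i -> i + 1.
transitions = [(i, tag, i + 1) for i, tags in enumerate(TAGS) for tag in tags]
states = {0} | {target for _, _, target in transitions}


def count_paths(transitions, start, final):
    """Count accepted paths by dynamic programming over the acyclic automaton."""
    paths = {start: 1}
    # Sorting by source state gives a topological order for this automaton.
    for source, _, target in sorted(transitions, key=lambda t: t[0]):
        paths[target] = paths.get(target, 0) + paths.get(source, 0)
    return paths.get(final, 0)


n_paths = count_paths(transitions, 0, len(TAGS))
print(len(states), len(transitions), n_paths)  # 6 19 648
```

The automaton grows linearly with the sentence (one state per word boundary, one transition per lexicon entry used), whereas the number of paths grows as the product of the per-word ambiguities.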
Figure 12.1: The LTA of sentence (1) before and after filtering with the Companionship Principle

12.3 Lexicalised Grammars

In this section, we formally present the notion of lexicalised grammar to which our filtering method applies. The definition is very general: it covers almost every notion of lexicalised grammar currently considered in the literature.

Definition 9 (Lexicalised Grammar). A Lexicalised Grammar is a 5-tuple (V, S, G, F, p) where:
• V is a vocabulary: a finite set of words,
• S is a set of syntactic structures,
• G ⊂ S × V is a finite set called the lexicon,
• p is a partial function p : S* → P(F), where F denotes a set² of parsing solutions.

² Which can be completely arbitrary in the present context.

The process of parsing is achieved as follows. Given an input sentence w = w1, ..., wn ∈ V*, choose a finite list L = [(S1, w1), ..., (Sn, wn)] of elements of G; we say that L is a lexical tagging of the sentence w. Then, feed S = S1, ..., Sn to the parser. If p(S) is non-empty, the sentence is considered to be parsed, and the elements of p(S) are witnesses of its grammatical structure. As stated, our notion fits with so-called Lexicalised grammars. Let (w) = {S ∈ S | (S, w) ∈ G} be the function mapping words to the set of their corresponding syntactic structures. The string language generated by the grammar is the set of sentences having a successful tagging for the parsing function p.

The grammar of the introductory example of Section 12.2.1 can be formalised in the light of the definition above as C1 = (V, S1, G1, F1, p1):
• The vocabulary V is the set {“la”, “belle”, “ferme”, “porte”}.
• The set S1 of syntactic structures is the set of types {Det, LAdj, RAdj, CN, Clit, NP, TrV_1, TrV_2, IntrV, S}.
• The lexicon G1 is given by the table of the previous section.
• F1 = {S}.
• The function p1 returns {S} if the parsing of the lexical tagging succeeds; otherwise, it returns the empty set.

A naive way of performing lexical disambiguation is to run the parser on each lexical tagging and to keep only the successful ones. Naturally, such a process is not efficient, but it can be largely improved if one works on a simplified version of the grammar. This idea is central to our thesis: part-of-speech tagging can be seen as parsing, but on a "simplified" grammar. Hence the notion of grammar abstraction.

Definition 10 (Grammar abstraction). Given C = (V, SC, GC, FC, pC), the concrete grammar, and A = (V, SA, GA, FA, pA), the abstract one, which share the same vocabulary, an abstraction from C to A is a function f from SC to SA such that
1. ∀(S, w) ∈ GC, (f(S), w) ∈ GA;
2. for any sequence S1, ..., Sn ∈ SC, if pA(f(S1), ..., f(Sn)) is empty, then pC(S1, ..., Sn) is empty.

The conjunction of the two items guarantees that the string language generated by the grammar C is a subset of the language generated by the grammar A. Suppose that [(R1, w1), ..., (Rn, wn)] is a lexical tagging of a sentence w in A. If it cannot be parsed, any choice S1, ..., Sn such that f(Si) = Ri will fail. Thus, to remove lexical taggings at the concrete level, it is sufficient to remove the corresponding ones at the abstract level. The idea of using abstraction for lexical disambiguation can already be found in (Boullier, 2003)³.

³ Our definition of morphism must be slightly extended for embedding the proposal of Boullier (2003).

Example of simple counting abstraction

To illustrate grammar abstraction, let us come back to the example above. Let us consider the grammar C1 as the concrete grammar. The abstraction that we define here consists in forgetting the tree structure of a complex type and considering it as a tuple counting atomic types, labelled positively or negatively depending on whether they are available or expected resources. The abstract grammar is A = (V, SA, GA, FA, pA) with:
• SA = ℤ⁴, that is, 4-tuples of integers (n, m, p, q) representing a positive or negative counting of the respective types NP, N, S and T.
• The lexicon GA is defined by the table below.

name          value             la   belle   ferme   porte
Det           (1, −1, 0, 0)     ×
LAdj, RAdj    (0, 0, 0, 0)           ×       ×
CN            (0, 1, 0, 0)      ×    ×       ×       ×
Clit          (−1, 0, 1, −1)    ×
TrV_1         (−2, 0, 1, 0)                  ×       ×
TrV_2         (0, 0, 0, 1)                   ×       ×
IntrV         (−1, 0, 1, 0)                  ×       ×

We remark that the abstract grammar does not differentiate left adjectives from right adjectives. Both are totally neutral in parsing.
• The set FA is {(0, 0, +1, 0)}.
• The parsing function pA sums the 4-tuples of a lexical tagging. If the result is equal to (0, 0, +1, 0), pA returns {(0, 0, +1, 0)}. Otherwise, it returns the empty set.

The abstraction count from C1 to A maps every type to a 4-tuple of integers according to the following algorithm (where subtraction of 4-tuples is performed pointwise):

function count(t)
  if t == NP then return (1, 0, 0, 0)
  else if t == N then return (0, 1, 0, 0)
  else if t == S then return (0, 0, 1, 0)
  else if t == T then return (0, 0, 0, 1)
  else if t == t1/t2 or t == t2\t1 then return count(t1) − count(t2)

For instance, count((NP\S)/NP) = (−2, 0, 1, 0). It is easy to verify that count is an abstraction from C1 to A. With grammar A, for Sentence (1), there are 3 × 2 × 5 × 3 × 4 = 360 lexical taggings⁴. Among them, 7 are successful taggings. One of the advantages of the abstraction is to reduce the number of lexical taggings one works with: the 648 concrete lexical taggings are mapped to 360 abstract taggings. This improves the efficiency of filtering procedures. If we go back to the concrete level, the 7 successful abstract taggings correspond to 22 (out of 648) concrete taggings. A direct consequence of the fact that count is an abstraction is that all other lexical taggings may be discarded. Naturally, among the 22 taggings, there are the 4 successful taggings for the sentence “la belle ferme la porte”. We remark that filtering with this abstraction is a little less efficient than the filtering of Section 12.2 using the companionship principle: instead of keeping 18 taggings, we keep 22. The reader may observe that, compared to the companionship principle stated for the concrete grammar, we miss the word order constraints. At the same time, however, at the abstract level a companion may be used only once. This is the core idea of what we call the affine companionship principle, described in Section 12.4.5.
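The counting abstraction can be tried out with the following Python sketch. It is an illustration under assumptions, not the authors' code: types are encoded as nested tuples (a hypothetical encoding), the lexicon is reconstructed from the tables of this section, and p_abstract is a hypothetical name for pA.

```python
from itertools import product

# Atomic types are strings; ("/", t1, t2) encodes t1/t2 and
# ("\\", t2, t1) encodes t2\t1 (hypothetical encoding).
NP_S = ("\\", "NP", "S")
TYPES = {
    "Det": ("/", "NP", "N"), "LAdj": ("/", "N", "N"), "RAdj": ("\\", "N", "N"),
    "CN": "N", "Clit": ("/", NP_S, "T"), "TrV_1": ("/", NP_S, "NP"),
    "TrV_2": "T", "IntrV": NP_S,
}
# Lexicon reconstructed from the tables of this section (an assumption).
LEXICON = {
    "la": ["Det", "CN", "Clit"],
    "belle": ["LAdj", "RAdj", "CN"],
    "ferme": ["LAdj", "RAdj", "CN", "TrV_1", "TrV_2", "IntrV"],
    "porte": ["CN", "TrV_1", "TrV_2", "IntrV"],
}
ATOMS = ("NP", "N", "S", "T")

def count(t):
    """4-tuple counting NP, N, S, T: +1 if available, -1 if expected."""
    if isinstance(t, str):
        return tuple(int(t == a) for a in ATOMS)
    if t[0] == "/":
        _, result, arg = t
    else:
        _, arg, result = t
    return tuple(r - a for r, a in zip(count(result), count(arg)))

def p_abstract(tuples):
    """Sum the 4-tuples; succeed only if the total is (0, 0, 1, 0)."""
    total = tuple(sum(col) for col in zip(*tuples))
    return {total} if total == (0, 0, 1, 0) else set()

sentence = ["la", "belle", "ferme", "la", "porte"]
taggings = list(product(*(LEXICON[w] for w in sentence)))
kept = [tg for tg in taggings if p_abstract([count(TYPES[x]) for x in tg])]
print(count(TYPES["TrV_1"]), len(taggings), len(kept))  # (-2, 0, 1, 0) 648 22
```

With this reconstructed lexicon, the filter keeps 22 of the 648 concrete taggings, in line with the figure quoted in the text.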
⁴ To be compared with 3 × 3 × 6 × 3 × 4 = 648 taggings for the concrete grammar.

For instance, the tagging [Det, CN, CN, Clit, TrV_2] is kept by the application of the companionship constraints of Section 12.2. We remark that both tags CN share the same left companion Det, whereas the counting performed by the abstraction forbids this sharing: two available Ns cannot fill the same expected N. Therefore, the tagging is discarded by the abstraction filtering. Conversely, consider the tagging [Clit, LAdj, CN, Det, TrV_2]. It is kept by the abstraction because the balance between available and expected atomic types is perfectly respected. On the other hand, it is discarded by application of the companionship constraints of Section 12.2 because the tag Det does not find one of its right companions LAdj, RAdj or CN on its right.

Example of left-right counting abstraction on a particular type

Consider again the Lexicalised grammar C1 as our concrete grammar. The abstraction that we define here consists in considering only the atomic components NP of types that are active in syntactic composition according to the AB-grammar rules. They can be filed in three classes: they may be available resources as heads of types, as in NP/N; they may be expected resources on the left of the head, as in NP\S; or they may be expected resources on the right of the head, as in S/NP. We assign a triple (h, l, r) to any type with the following meaning: h is the number of head NPs (it can take the value 1 or 0), l is the number of left NPs and r is the number of right NPs. For instance, type (NP\S)/NP is abstracted into the triple (0, 1, 1). To define the abstraction formally, we consider an abstract Lexicalised grammar ANP with the same vocabulary as C1.
• The set SANP of its syntactic structures consists of triples (h, l, r) representing a counting of the respective head, left and right NPs.
• The lexicon GANP is defined by the table below.

name                      value        la   belle   ferme   porte
Det                       (1, 0, 0)    ×
LAdj, RAdj, CN, TrV_2     (0, 0, 0)    ×    ×       ×       ×
TrV_1                     (0, 1, 1)                 ×       ×
Clit, IntrV               (0, 1, 0)    ×            ×       ×
We remark that the abstract grammar does not differentiate adjectives from common nouns or from the use of a transitive verb with a clitic. All are totally neutral in parsing. Moreover, it identifies intransitive verbs and clitics.
• The set FANP of final syntactic structures reduces to {(0, 0, 0)}.
• The parsing function pANP results from the application of the following rewriting rules to a lexical tagging:

  (0, 0, 0) −→ ε
  (1, 0, 0), (h2, l2, r2) −→ (h2, l2 − 1, r2)    if l2 ≥ 1
  (h1, l1, r1), (1, 0, 0) −→ (h1, l1, r1 − 1)    if r1 ≥ 1

The first rule means that words labelled with (0, 0, 0) do not count and their triple is removed. The second rule represents the combination of constituents with left arguments and the last rule does the same for right arguments. If there exists a computation starting from a lexical tagging, using the rewriting rules above and ending in (0, 0, 0), pANP returns {(0, 0, 0)}; otherwise, it returns the empty set.

The abstraction countNP from C to ANP maps every type to a triple of natural numbers according to the following algorithm:

function countNP(t)
  if t == NP then return (1, 0, 0)
  else if t ∈ {N, S, T} then return (0, 0, 0)
  else if t == t1/t2 and countNP(t1) == (h1, l1, r1) and countNP(t2) == (h2, 0, 0) then return (h1, l1, r1 + h2)
  else if t == t2\t1 and countNP(t1) == (h1, l1, r1) and countNP(t2) == (h2, 0, 0) then return (h1, l1 + h2, r1)
It is easy to verify that countNP is an abstraction, because it has the two features characterising an abstraction. With lexicon GANP, for sentence (1), there are 3 × 1 × 3 × 3 × 3 = 81 lexical taggings in ANP. Among them, there are 7 successful taggings. On the concrete level, this corresponds to 87 lexical taggings kept out of 648. A similar abstraction can be defined with the basic types N and T instead of NP (results are given in the table below). Then, by intersecting the lists of paths obtained with each of the three filters, we find that only 4 lexical taggings are kept (last line of the table).
Abstraction                                 LT (abstract level)   LT (concrete level)
GANP                                        7/81                  87/648
GAN                                         10/216                36/648
GAT                                         5/16                  261/648
Intersection of the 3 abstractions above                          4/648
This filtering is more efficient than the filtering using the simple abstraction count because it distinguishes between left and right composition. It is also more efficient than the application of the companionship principle done in Section 12.2 because it takes into account the fact that the same companion cannot be shared by several tags.
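The left-right counting abstraction on NP can be sketched in Python as follows. The sketch rests on several assumptions: the tuple encoding of types and the reconstructed lexicon are hypothetical, argument NPs are added to the left or right counter, the side condition countNP(t2) == (h2, 0, 0) is not checked, and a tagging is taken to succeed when every non-neutral triple can be consumed by the rewriting rules.

```python
from functools import lru_cache
from itertools import product

NP_S = ("\\", "NP", "S")   # ("\\", argument, result) encodes argument\result
TYPES = {
    "Det": ("/", "NP", "N"), "LAdj": ("/", "N", "N"), "RAdj": ("\\", "N", "N"),
    "CN": "N", "Clit": ("/", NP_S, "T"), "TrV_1": ("/", NP_S, "NP"),
    "TrV_2": "T", "IntrV": NP_S,
}
LEXICON = {   # reconstructed from the tables of this section (an assumption)
    "la": ["Det", "CN", "Clit"],
    "belle": ["LAdj", "RAdj", "CN"],
    "ferme": ["LAdj", "RAdj", "CN", "TrV_1", "TrV_2", "IntrV"],
    "porte": ["CN", "TrV_1", "TrV_2", "IntrV"],
}

def count_np(t):
    """Triple (head, left, right) counting the NP components of a type."""
    if isinstance(t, str):
        return (1, 0, 0) if t == "NP" else (0, 0, 0)
    if t[0] == "/":                       # t1/t2: argument expected on the right
        h1, l1, r1 = count_np(t[1])
        return (h1, l1, r1 + count_np(t[2])[0])
    h1, l1, r1 = count_np(t[2])           # t2\t1: argument expected on the left
    return (h1, l1 + count_np(t[1])[0], r1)

@lru_cache(maxsize=None)
def reducible(seq):
    """True if the rewriting rules can consume every non-neutral triple."""
    seq = tuple(x for x in seq if x != (0, 0, 0))        # first rule
    if not seq:
        return True
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a == (1, 0, 0) and b[1] >= 1:                 # second rule
            if reducible(seq[:i] + ((b[0], b[1] - 1, b[2]),) + seq[i + 2:]):
                return True
        if b == (1, 0, 0) and a[2] >= 1:                 # third rule
            if reducible(seq[:i] + ((a[0], a[1], a[2] - 1),) + seq[i + 2:]):
                return True
    return False

def abstract(tagging):
    return tuple(count_np(TYPES[x]) for x in tagging)

sentence = ["la", "belle", "ferme", "la", "porte"]
concrete = list(product(*(LEXICON[w] for w in sentence)))
abstract_taggings = {abstract(tg) for tg in concrete}
ok_abstract = {s for s in abstract_taggings if reducible(s)}
ok_concrete = [tg for tg in concrete if abstract(tg) in ok_abstract]
print(len(ok_abstract), len(abstract_taggings), len(ok_concrete), len(concrete))
# 7 81 87 648
```

Under these assumptions the sketch reproduces the first row of the table above (7/81 at the abstract level, 87/648 at the concrete level).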
12.4 The Companionship Principle

We have stated in the previous section the framework and the definitions required to describe our principle. First, we give a formal definition of the Companionship Principle. Second, we show its use for lexical disambiguation. Third, we show how, from a practical point of view, the Companionship Principle can be approximated by means of grammar abstractions. Finally, we discuss some variations of the definition of the Principle.
12.4.1 The statement of the Companionship Principle
The intuition guiding the Companionship Principle is that it is possible, from a grammar, to foresee the syntactic structures that are required to interact with a given structure in the process of syntactic composition. Such structures are called its companions. Given a grammar, we say that a pair (L, R) of sets of syntactic structures is a companionship constraint for a syntactic structure S if, for each lexical tagging L = [(S1, w1), ..., (Sn, wn)] such that p(S1, ..., Sn) ≠ ∅ and S = Si for some i:
• either there is some j < i such that Sj ∈ L,
• or there is some j > i such that Sj ∈ R.
In other words, (L, R) lists the potential companions of S, respectively on the left and on the right.
A system of companionship constraints for a grammar G is a function constr that associates a finite set of companionship constraints to each syntactic structure of G. The Companionship Principle is an immediate consequence of the definition of companionship constraints. It can be stated as the following necessary condition:

The Companionship Principle. A sequence S1, ..., Sn can be parsed only if for all i and for all companionship constraints (L, R) ∈ constr(Si):
• either {S1, ..., Si−1} ∩ L ≠ ∅,
• or {Si+1, ..., Sn} ∩ R ≠ ∅.

By extension, we say that a lexical tagging [(S1, w1), ..., (Sn, wn)] respects the Companionship Principle if and only if S1, ..., Sn respects it.
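The principle translates almost literally into a brute-force check. The Python sketch below is illustrative only: the lexicon and the head-companion constraints (standing in for Table 12.3) are reconstructed from the tables and descriptions of Sections 12.3, 12.4.3 and 12.4.4, which is an assumption; the argument companions follow Table 12.4; respects and merge are hypothetical helper names.

```python
from itertools import product

LEXICON = {   # reconstructed from the tables of Section 12.3 (an assumption)
    "la": ["Det", "CN", "Clit"],
    "belle": ["LAdj", "RAdj", "CN"],
    "ferme": ["LAdj", "RAdj", "CN", "TrV_1", "TrV_2", "IntrV"],
    "porte": ["CN", "TrV_1", "TrV_2", "IntrV"],
}
# Head companions as (L, R) pairs: a reconstruction of Table 12.3 (assumption).
HEAD = {
    "Det": [({"TrV_1"}, {"Clit", "TrV_1", "IntrV"})],
    "LAdj": [({"Det", "LAdj"}, {"RAdj"})],
    "RAdj": [({"Det", "LAdj"}, {"RAdj"})],
    "CN": [({"Det", "LAdj"}, {"RAdj"})],
    "TrV_2": [({"Clit"}, set())],
}
NOUN_LIKE = {"LAdj", "RAdj", "CN"}
# Argument companions, as in Table 12.4.
ARG = {
    "Det": [(set(), NOUN_LIKE)],
    "LAdj": [(set(), NOUN_LIKE)],
    "RAdj": [(NOUN_LIKE, set())],
    "Clit": [({"Det"}, set()), (set(), {"TrV_2"})],
    "TrV_1": [({"Det"}, set()), (set(), {"Det"})],
    "IntrV": [({"Det"}, set())],
}

def respects(tags, constr):
    """Every occurrence must find a companion on the left or on the right."""
    for i, s in enumerate(tags):
        for left, right in constr.get(s, []):
            if not (left & set(tags[:i]) or right & set(tags[i + 1:])):
                return False
    return True

def merge(*systems):
    """Union of several systems of companionship constraints."""
    merged = {}
    for system in systems:
        for tag, constraints in system.items():
            merged.setdefault(tag, []).extend(constraints)
    return merged

sentence = ["la", "belle", "ferme", "la", "porte"]
taggings = list(product(*(LEXICON[w] for w in sentence)))
with_head = [t for t in taggings if respects(t, HEAD)]
with_all = [t for t in taggings if respects(t, merge(HEAD, ARG))]
print(len(taggings), len(with_head), len(with_all))  # 648 148 18
```

With these reconstructed constraints, the counts agree with the 148 and 18 taggings reported in Section 12.2.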
12.4.2 The “Companionship Principle” language
In this section, we show that the set of sequences verifying the Companionship Principle is a regular language. Let us consider the set SG of the syntactic structures occurring in a grammar G. On SG seen as an alphabet, we can consider the following three languages:
1. The language SG* of all possible sequences of syntactic structures.
2. The set C of sequences S1 ... Sn such that p(S1 ... Sn) ≠ ∅, that is, the successfully parsed sequences.
3. Between the two previous languages, the set P of strings S1 ... Sn which respect the Companionship Principle.
The fact that C ⊆ P ⊆ SG* follows from the Companionship Principle: all lexical taggings that are parsed successfully verify the Companionship Principle. Remarkably, the language P can be described as a regular language. Since C is presumably not a regular language (at least for natural languages!), P is a better regular approximation than the trivial SG*.
Let us consider one syntactic structure S and a companionship constraint (L, R) ∈ constr(S). Then, the set of strings of syntactic structures verifying this constraint can be described as

$$L_{S:(L,R)} = \overline{(\overline{L})^{*}\, S\, (\overline{R})^{*}}$$

where $\overline{X}$ denotes the complement of a set X. Then, P is a regular language defined by:

$$P = \bigcap_{S \in S_G}\ \bigcap_{(L,R) \in constr(S)} L_{S:(L,R)}$$
From the Companionship Principle, we derive a lexical disambiguation principle which simply tests tagging candidates against P. Notice that P can be statically computed (at least in theory) from the grammar itself. A rough approximation of the size of the automaton corresponding to P can be easily computed. Since each automaton LS:(L,R) has 4 states, P has at most 4^m states, where m is the number of atomic constraints. For instance, the grammar used in the experiments contains more than one companionship constraint for each lexical entry, and m > |SG| > 106. Computing P by brute force is then intractable. In the coming sections, the principle is refined to cope with that issue.
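To make the regular description concrete, here is a small Python sketch for a single constraint. It assumes a hypothetical one-letter coding of tags (d = Det, a = LAdj, r = RAdj, c = CN, l = Clit) and checks, on all short strings, that the complement of the violation pattern coincides with a direct check of the constraint. It illustrates the construction only and is not the authors' implementation.

```python
import re
from itertools import product

ALPHABET = "dalrc"   # hypothetical codes: d=Det, a=LAdj, l=Clit, r=RAdj, c=CN

def outside(symbols):
    """Regex for one symbol that is not in `symbols` (any symbol if empty)."""
    return "[^" + "".join(sorted(symbols)) + "]" if symbols else "."

def violation_regex(s, left, right):
    """Strings with an occurrence of s having no L before it and no R after it."""
    return re.compile(outside(left) + "*" + s + outside(right) + "*")

def satisfies(word, s, left, right):
    """Direct check of the constraint (L, R) for every occurrence of s."""
    return not any(
        x == s and not (set(word[:i]) & left) and not (set(word[i + 1:]) & right)
        for i, x in enumerate(word)
    )

# Constraint for CN: Det or LAdj on the left, or RAdj on the right.
left, right = {"d", "a"}, {"r"}
bad = violation_regex("c", left, right)
words = ["".join(w) for n in range(6) for w in product(ALPHABET, repeat=n)]
agree = all((bad.fullmatch(w) is None) == satisfies(w, "c", left, right) for w in words)
print(agree)  # True
```

A string belongs to the language of the constraint exactly when the violation pattern fails to match it, which is the complement taken in the formula above.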
12.4.3 Generalisation of the Companionship Principle to abstraction
As we have just seen, the application of the Companionship Principle for filtering lexical taggings may be very costly in space. A solution is to apply it not directly to the grammar concerned but to an abstraction of this grammar. Since the abstraction is simpler, the application of the Companionship Principle is expected to be less costly. Then, by combining the application of this principle to different abstractions, one may hope to get an efficient filtering at a moderate cost. The soundness of the transformation comes from the property that any lexical tagging that can be parsed is transformed by the abstraction into a lexical tagging that can be parsed. It can be formalised in the Generalised Companionship Principle, which is stated as follows:
The Generalised Companionship Principle. Let f be an abstraction from a concrete grammar C to an abstract grammar A. A sequence S1 ... Sn of structures of C has a solution only if for all i and for all companionship constraints (L, R) ∈ constr(f(Si)):
• either {f(S1), ..., f(Si−1)} ∩ L ≠ ∅,
• or {f(Si+1), ..., f(Sn)} ∩ R ≠ ∅.
Generalised Companionship Principle on the basic type abstraction example

We consider the abstraction countNP from C to ANP, which maps every syntactic type to a triple of natural numbers. The set of companionship constraints that we propose follows from the semantics of triples: (head NPs, left NPs, right NPs). We know that, in the process of parsing, every head NP must encounter a left or right NP. Therefore, any triple (1, l, r) can be associated with a set of left companions, which are triples with a non-null third component, and with a set of right companions, which are triples with a non-null second component. With lexicon GANP, this property entails one companionship constraint for Det-Clit, which has value (1, 0, 0): it has one left companion, TrV_1 with value (0, 1, 1), and two right companions, TrV_1 and IntrV with value (0, 1, 0). The application of the Generalised Companionship Principle with this constraint to the sentence “la belle ferme la porte” tagged with lexicon GC entails the selection of 420 taggings among the 648 possible taggings on the concrete level; at the abstract level, 31 out of 36 lexical taggings are kept. Again, a similar filtering can be done, replacing the atomic type NP by N and T. For N, we consider the abstract Lexicalised grammar AN with a new lexicon GAN defined by the table below.

name                          value        la   belle   ferme   porte
Det                           (0, 0, 1)    ×
LAdj                          (1, 0, 1)         ×       ×
RAdj                          (1, 1, 0)         ×       ×
CN                            (1, 0, 0)    ×    ×       ×       ×
Clit, TrV_1, TrV_2, IntrV     (0, 0, 0)    ×            ×       ×
From this grammar, we define a set of companionship constraints in the same way as for NP. The three triples LAdj, RAdj and CN share the same companionship constraint, with LAdj and Det as left companions and RAdj as right companion. The table below gives the numbers of lexical taggings kept at the abstract and concrete levels when the Generalised Companionship Principle is applied with each of the atomic types NP, N and T; it also gives the filtering obtained by combining the three filters.

Abstraction                                 LT (abstract level)   LT (concrete level)
GANP                                        74/81                 534/648
GAN                                         84/216                240/648
GAT                                         11/16                 516/648
Intersection of the 3 abstractions above                          148/648
We obtain the same result as with the direct application of the companionship principle performed in Section 12.2 with the constraints of Table 12.3. In Section 12.2, we apply exactly the three filters above, but simultaneously.
12.4.4 The Undirected Companionship Principle

The Companionship Principle includes a constraint on the order of the companions with respect to the syntactic structure concerned. Sometimes, it is interesting to relax this constraint to simplify filtering. Sometimes, this order is not relevant with respect to the grammar and the parsing function. In both cases, a way of formalising it is to consider companionship constraints (L, R) such that L = R. We call them undirected companionship constraints; each is defined by a single set of companions.

The Undirected Companionship Principle. If a sequence S1 ... Sn can be parsed then, for all i and for all undirected companionship constraints C ∈ constr(Si): {S1, ..., Si−1, Si+1, ..., Sn} ∩ C ≠ ∅.
Undirected companion example

Let us take again the first example of Section 12.2. We replace the five directed companionship constraints with five undirected constraints: for each directed constraint, we take the union of its left and right companions to build the set of companions defining the undirected constraint. The resulting set of constraints is given by the table below.

name of type   value   companions
Det            NP/N    Clit, TrV_1, IntrV
LAdj           N/N     Det, LAdj, RAdj
RAdj           N\N     Det, LAdj, RAdj
CN             N       Det, LAdj, RAdj
TrV_2          T       Clit
With this set of constraints, if we apply the Undirected Companionship Principle to the lexical taggings of the sentence “la belle ferme la porte”, 312 of the 648 initial taggings remain. If we compare this result with the 148 taggings obtained by the application of the directed Companionship Principle in the example of Section 12.2, we remark that it is worse, as expected, since the constraints have been relaxed.
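The undirected variant is easy to check by enumeration. The Python sketch below uses the undirected constraints of the table above and a lexicon reconstructed from the tables of Section 12.3 (an assumption); respects_undirected is a hypothetical helper name.

```python
from itertools import product

LEXICON = {   # reconstructed from the tables of Section 12.3 (an assumption)
    "la": ["Det", "CN", "Clit"],
    "belle": ["LAdj", "RAdj", "CN"],
    "ferme": ["LAdj", "RAdj", "CN", "TrV_1", "TrV_2", "IntrV"],
    "porte": ["CN", "TrV_1", "TrV_2", "IntrV"],
}
# Undirected constraints: one set of companions per constrained tag.
UNDIRECTED = {
    "Det": {"Clit", "TrV_1", "IntrV"},
    "LAdj": {"Det", "LAdj", "RAdj"},
    "RAdj": {"Det", "LAdj", "RAdj"},
    "CN": {"Det", "LAdj", "RAdj"},
    "TrV_2": {"Clit"},
}

def respects_undirected(tags):
    """Each constrained tag needs a companion at some other position."""
    for i, s in enumerate(tags):
        others = set(tags[:i] + tags[i + 1:])
        if s in UNDIRECTED and not (UNDIRECTED[s] & others):
            return False
    return True

sentence = ["la", "belle", "ferme", "la", "porte"]
taggings = list(product(*(LEXICON[w] for w in sentence)))
kept = [t for t in taggings if respects_undirected(t)]
print(len(kept), len(taggings))  # 312 648
```

Dropping the left/right distinction keeps 312 taggings instead of 148, which matches the loss of precision described in the text.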
12.4.5 The Affine and Linear Companionship Principles
Previously, we relaxed the companionship constraints; now, we strengthen them to take the following property into account: the same companion cannot be shared by two different tags. Consider the companionship constraints of Table 12.3 and the tagging [Det, CN, CN, Clit, TrV_2], which would correspond to the sentence "La fille fille la porte". This tagging respects the companionship principle in the sense that every tag finds its companion, but the two tags CN share the same left companion Det. From a linguistic point of view, it means that two common nouns share the same determiner, which is not allowed. We model this constraint with the Affine Companionship Principle. The principle is expressed in an abstract grammar which we call the Affine Companionship Grammar, denoted AAFF. The companionship principle on the affine companionship grammar induces an affine companionship principle on any grammar which can be abstracted to AAFF. In other words, to apply the affine companionship principle on G, it is sufficient to find one (or possibly several) abstractions to AAFF.

One may refine the affine companionship principle to take the following property into account: all potential companions must be used in the pairing with tags. Let us come back to the companionship constraints of Table 12.3 and to the tagging [Det, CN, IntrV, Clit, IntrV]. It respects the affine companionship principle because CN is paired with Det and Det can be paired with Clit or one of the two tags IntrV. There is no constraint on IntrV and Clit. However, in all cases, two potential companions are not paired. From a linguistic point of view, it means that an intransitive verb or an object clitic is used without a subject, which is not allowed. The refinement of the principle is described by a restriction of affine companionship grammars, namely linear companionship grammars.

The Affine Companionship Principle. First, we define the Affine Companionship Grammar, denoted AAFF, as follows.
• The syntactic structures are strings on the alphabet {X, L, R, U}. Intuitively, L, R and U respectively represent left, right and undirected companions of X.
• The parsing function p returns all final strings without X resulting from the application of the following (non-confluent) system of derivation rules:

  w1 L w2 X w3 −→ w1 w2 w3        w1 X w2 R w3 −→ w1 w2 w3
  w1 U w2 X w3 −→ w1 w2 w3        w1 X w2 U w3 −→ w1 w2 w3

The meaning of a successful parsing is that it is possible to associate every occurrence of X with one occurrence of L on its left, of R on its right or of U in any position, by an injective function: a companion cannot be used twice, hence the name affine. For instance, the string XLXR is in the language defined by AAFF (indeed XLXR → XR → ε). The string XLL is not in AAFF (none of the four rules applies); the string LXX is not in AAFF (LXX → X and no further rule can apply).

The Affine Companionship Principle follows immediately from the definition above.

The Affine Companionship Principle. Let fAFF be an abstraction from a grammar C to AAFF. If a sequence S1 ... Sn of C has a solution then its abstraction fAFF(S1) ... fAFF(Sn) in AAFF has a solution.

Let us take again the concrete grammar C1 (Section 12.3). There are many possible abstractions. To begin with, let us focus on NP's constraints. We define an abstraction which is similar to the abstraction countNP but with AAFF as abstract grammar. Head NPs are mapped to X, left NPs to R and right NPs to L, hence the abstract grammar:
name                            value   la   belle   ferme   porte
Det                             X       ×
LAdj, RAdj, CN, Clit, TrV_2     ε       ×    ×       ×       ×
TrV_1                           RL                   ×       ×
IntrV                           R                    ×       ×
Then, if we apply the Affine Companionship Principle with this abstraction to the sentence “la belle ferme la porte”, 510 of the 648 initial lexical taggings are selected. Now, we do the same for N instead of NP. The principle of the abstraction is the same and we obtain the following abstract grammar.

name                          value   la   belle   ferme   porte
Det                           L       ×
LAdj                          XL           ×       ×
RAdj                          RX           ×       ×
CN                            X       ×    ×       ×       ×
Clit, TrV_1, TrV_2, IntrV     ε       ×            ×       ×
Applying the Affine Companionship Principle with this abstraction to the sentence “la belle ferme la porte” selects 125 of the 648 lexical taggings. The table below sums up the results for the Affine Companionship Principle:

Abstraction                                 LT (abstract level)   LT (concrete level)
NP                                          72/81                 510/648
N                                           33/216                125/648
T                                           10/16                 510/648
Intersection of the 3 abstractions above                          65/648
The Linear Companionship Principle. The Linear Companionship Grammar, denoted ALIN, is the Affine Companionship Grammar AAFF with the additional constraint that the only successful final structure is the empty string. Intuitively, it means that it is possible to associate every occurrence of X with an occurrence of L, R or U by an injective function, and that this function is bijective: no occurrence of L, R or U is left aside. For instance, the string LXLXR is not in the language defined by ALIN (although it is in the language defined by AAFF), whereas LXLXRX is in ALIN. From this grammar follows the Linear Companionship Principle, which is stated in a similar way as the Affine Companionship Principle.
Abstraction                                 LT (abstract level)   LT (concrete level)
NP                                          7/81                  87/648
N                                           14/216                44/648
T                                           5/16                  261/648
Intersection of the 3 abstractions above                          5/648
Apart from the 4 successful lexical taggings, one wrong tagging is kept: [Det, LAdj, RAdj, CN, IntrV]. The reason is that the linear companionship principle does not take into account the distance between a tag and its companion: LAdj is the companion of the tag CN and there is a tag RAdj between them, which is not possible from a linguistic point of view. The second abstraction of Section 12.3 does not have this flaw and keeps exactly the successful taggings.
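Membership in the languages of AAFF and ALIN can be tested on small strings with the following Python sketch. It explores the four derivation rules exhaustively with memoisation, which is only practical for short strings; the names derivable, in_affine and in_linear are hypothetical, and this is not the streaming algorithm of Section 12.5.4.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def derivable(w):
    """All strings reachable from w (w included) with the four rules."""
    out = {w}
    for i, a in enumerate(w):
        for j in range(i + 1, len(w)):
            b = w[j]
            # Companion on the left (L..X, U..X) or on the right (X..R, X..U).
            if (a in "LU" and b == "X") or (a == "X" and b in "RU"):
                out |= derivable(w[:i] + w[i + 1:j] + w[j + 1:])
    return frozenset(out)

def in_affine(w):
    """Some derivation ends in a string without X."""
    return any("X" not in s for s in derivable(w))

def in_linear(w):
    """Some derivation ends in the empty string."""
    return "" in derivable(w)

for w in ["XLXR", "XLL", "LXX", "LXLXR", "LXLXRX"]:
    print(w, in_affine(w), in_linear(w))
```

The five strings are the examples discussed in this section: XLXR and LXLXR are affine, XLL and LXX are not, and LXLXRX is linear while LXLXR is not.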
12.5 Implementation of the Companionship Principle with automata

In this section we show how to use the Companionship Principle for disambiguation. We propose two implementations based on the principle, an exact one and an approximate one. The latter is very fast and can be used as a first step before applying the former.
12.5.1 Automaton to represent sets of lexical taggings
The number of lexical taggings grows exponentially with the length of sentences. To cope with this, we represent sets of lexical taggings as the sets of paths of acyclic automata whose transitions are labelled by elements of G. We call such an automaton a lexical tagging automaton (LTA). Generally speaking, such automata save a lot of space. For instance, given a sentence [w1, ..., wn], the number of lexical taggings to consider at the beginning of the parsing process is ∏1≤i≤n |(wi)|. This set of taggings can be efficiently represented as the set of paths of the automaton with n + 1 states s0, ..., sn and with a transition from si−1 to si with the label t for each t ∈ (wi). This automaton has ∑1≤i≤n |(wi)| transitions. With the data of Section 12.2, for the sentence “la belle ferme la porte”, we have the following initial automaton⁵:

⁵ To improve readability, only the categories are given on the edges, while the French words can be inferred from the position in the automaton.

12.5.2 Implementation of the Companionship Principle

In this section, we describe an implementation of the (basic) Companionship Principle, denoted BC. Suppose we have an LTA A for a sentence [w1, ..., wn]. For each transition t and for each atomic constraint (L, R) ∈ C(t), we construct an automaton At,L,R in the following way. Each state of At,L,R is labelled with a triple composed of a state of the automaton A and two booleans. The intended meaning of the first boolean is that each path reaching this state does not fulfil the constraint. In other words, the path passes through some transition t which does not follow a transition in L and which is not followed by a transition in R. The second boolean means that each path reaching this state contains a transition in L. The initial state is labelled (s0, F, F), where s0 is the initial state of A, and the other states are labelled as follows: if s −u→ s′ in A then, in At,L,R, we have:

1. (s, b, F) −u→ (s′, T, F) if u = t
2. (s, b, T) −u→ (s′, F, T) if u = t
3. (s, b, b′) −u→ (s′, F, b′) if u ∈ R
4. (s, b, b′) −u→ (s′, b, T) if u ∈ L
5. (s, b, b′) −u→ (s′, b, b′) in other cases

where b, b′ ∈ {T, F}. It is then routine to show that, for each state labelled (s, b1, b2):
• b1 is F iff for all paths p reaching this state:
  – either there is no transition t in the path,
  – or there is some u ∈ L before the first transition t of the path,
  – or there is some u ∈ R after the last transition t of the path;
• b2 is T iff all paths from the initial state to s contain a transition u ∈ L.

In conclusion, if sf is a final state of A, a path ending with (sf, T, b) contains t but no transition able to fulfil the constraint; it can be safely removed. Conversely, paths ending with (sf, F, b) are either paths without a t transition or paths with a t transition which can find a companion on the same path; both must remain in the parsing process. Since two booleans are used, the size of these automata is bounded by 4n, where n is the size of A; that is, the size of the companionship automata is linear w.r.t. the initial automaton. In Figure 12.2, we give the automaton A for the last constraint of Table 12.4, which states that an intransitive verb is waiting for a determiner on its left. The dotted part of the graph in Figure 12.2 corresponds to the part of the automaton that can be safely removed. After minimisation, we finally obtain the following automaton:
Figure 12.2: Automaton A for the last constraint of Table 12.4 (page 261)
This automaton contains 516 paths (132 lexical taggings are removed by this constraint). For each transition t of the lexical tagging automaton and for each constraint (L, R) ∈ C(t), we construct the atomic constraint automaton At,L,R. The intersection of these automata represents all the possible lexical taggings of the sentence which respect the Companionship Principle. That is, we output:

$$A_{CP} = \bigcap_{1 \le i \le n,\ t \in A;\ (L,R) \in C(t)} A_{t,L,R}$$

It can be shown that this automaton is the same as the one obtained by intersection with the regular language defined in Section 12.4.2: ACP = A ∩ P. In our example, the intersection of the 14 automata built for the atomic constraints given in Tables 12.3 and 12.4 gives the 18-path automaton of Figure 12.1.
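The construction can be simulated without building the product automata explicitly, by propagating the boolean labels along the LTA and counting paths. The Python sketch below does this for the last constraint of Table 12.4 (IntrV waits for a Det on its left). It assumes a lexicon reconstructed from the tables of Section 12.3 and identifies a transition t by its position and tag; step and count_paths are hypothetical helper names.

```python
from collections import Counter

# Tags labelling the transitions from state i to state i + 1
# (lexicon reconstructed from Section 12.3, an assumption).
LTA = [
    ["Det", "CN", "Clit"],
    ["LAdj", "RAdj", "CN"],
    ["LAdj", "RAdj", "CN", "TrV_1", "TrV_2", "IntrV"],
    ["Det", "CN", "Clit"],
    ["CN", "TrV_1", "TrV_2", "IntrV"],
]

def step(flags, u, is_t, left, right):
    """Update the two booleans (b1, b2) when reading transition u."""
    b1, b2 = flags
    if is_t:                 # rules 1 and 2: violation unless an L was seen
        return (not b2, b2)
    if u in right:           # rule 3: a right companion repairs the violation
        return (False, b2)
    if u in left:            # rule 4: remember that a left companion was seen
        return (b1, True)
    return (b1, b2)          # rule 5

def count_paths(lta, atoms):
    """atoms: list of (position, tag, L, R), one per automaton A_{t,L,R}.
    Counts the paths accepted by all of them, i.e. ending with b1 false."""
    states = Counter({tuple((False, False) for _ in atoms): 1})
    for pos, tags in enumerate(lta):
        nxt = Counter()
        for flags, n in states.items():
            for u in tags:
                new = tuple(
                    step(f, u, (pos, u) == (p, t), left, right)
                    for f, (p, t, left, right) in zip(flags, atoms)
                )
                nxt[new] += n
        states = nxt
    return sum(n for flags, n in states.items() if not any(b1 for b1, _ in flags))

# One atomic automaton per IntrV transition of the LTA.
atoms = [(pos, "IntrV", {"Det"}, set()) for pos, tags in enumerate(LTA) if "IntrV" in tags]
total, kept = count_paths(LTA, []), count_paths(LTA, atoms)
print(total, kept, total - kept)  # 648 516 132
```

Under these assumptions the sketch recovers the 516 remaining paths (132 removed) mentioned for this constraint.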
12.5.3 Approximation: the Rough Companionship Principle (RCP)
The issue with the previous algorithm is that it involves a large number of automata, namely O(n) where n is the size of the input sentence, and each of these automata has size O(n). The theoretical complexity of the intersection is then O(n^n), so the computation may be exponential in the worst case. We therefore provide an algorithm which approximates the Principle without increasing the size of the automata. The idea is to consider at the same time all the paths that contain a given transition.
We consider an LTA A and write ≺_A for the precedence relation on transitions in A. We define l_A(t) = {u ∈ G, u ≺_A t} and r_A(t) = {u ∈ G, t ≺_A u}. For each transition s −t→ s′ and each constraint (L, R) ∈ C(t), if l_A(t) ∩ L = ∅ and r_A(t) ∩ R = ∅, then none of the lexical taggings which use the transition t has a solution, and the transition t itself can be safely removed from the automaton.
This can be computed by a double for-loop: for each atomic constraint of each transition, verify that either the left context or the right context of the transition contains some structure to solve the constraint. The cost of this algorithm is O(n^2), where n is the size of the input automaton.
Note that one must iterate this algorithm until a fixpoint is reached. Indeed, removing a transition which serves as a potential companion may invalidate checks that succeeded earlier. Nevertheless, since each step before the fixpoint removes at least one transition, the double loop is iterated at most O(n) times. The complexity of the whole algorithm is then O(n^3). In practice, we have observed that the complexity is closer to O(n^2): 2 or 3 iterations are enough to reach the fixpoint.
Our small example is too simple for the RCP to filter any path. On more realistic examples, however, it is useful to speed up the filtering process in IG parsing.
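The fixpoint iteration described above can be sketched as follows. This is a minimal illustration, assuming the simplest shape of lexical taggings automaton (one set of candidate structures per word, so every candidate at position i precedes every candidate at position j > i); the names `rough_companionship_filter`, `positions` and `constraints` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (L, R): left companions and right companions of an atomic constraint.
Constraint = Tuple[FrozenSet[str], FrozenSet[str]]


def rough_companionship_filter(
    positions: List[Set[str]],
    constraints: Dict[str, List[Constraint]],
) -> List[Set[str]]:
    """Remove candidates with an unsolvable constraint until a fixpoint."""
    current = [set(p) for p in positions]  # work on a copy of the input
    changed = True
    while changed:
        changed = False
        for i, candidates in enumerate(current):
            # Structures available strictly before / after position i.
            left_context = set().union(*current[:i])
            right_context = set().union(*current[i + 1:])
            for t in list(candidates):
                for left_companions, right_companions in constraints.get(t, []):
                    no_left = not (left_context & left_companions)
                    no_right = not (right_context & right_companions)
                    if no_left and no_right:
                        # No tagging using t can satisfy this constraint.
                        candidates.discard(t)
                        changed = True
                        break
    return current


# Hypothetical example: "c" can never find its companion "x", and "a"
# depends on "c", so "a" is only removed in the second iteration.
positions = [{"a", "b"}, {"c", "d"}]
constraints = {
    "a": [(frozenset(), frozenset({"c"}))],
    "c": [(frozenset({"x"}), frozenset())],
}
filtered = rough_companionship_filter(positions, constraints)
```

The example shows why a single pass is not enough: the removal of "c" is what makes the constraint of "a" unsolvable.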
12.5.4 Affine and Linear Companionship Principle (ACP)
We propose here streaming algorithms, i.e. one-pass algorithms running in linear time and logarithmic space, to check whether a supertagging satisfies the Affine Companionship Principle or the Linear Companionship Principle. To each word in {X, L, R, U}∗, we associate a vector in N^4 that can be computed incrementally, with constant cost at each step, using the function f defined below. The four components of each vector correspond to:
• the number of pending X (as the word is read from left to right, the only way to saturate such an X is to find an R on its right);
• the number of pairings LX assembled so far;
• the number of L still available;
• the number of U read since the beginning.
We define below four functions f_L, f_U, f_X and f_R from N^4 to N^4 which describe how the vector is computed (function f) step by step from left to right.
\begin{align}
f_L(j, k, l, m) &= (j, k, l+1, m) \tag{12.1}\\
f_U(j, k, l, m) &= (j, k, l, m+1) \tag{12.2}\\
f_X(j, k, l+1, m) &= (j, k+1, l, m) \tag{12.3}\\
f_X(j, k, 0, m) &= (j+1, k, 0, m) \tag{12.4}\\
f_R(j+1, k, l, m) &= (j, k, l, m) \tag{12.5}\\
f_R(0, k+1, l, m) &= (0, k, l+1, m) \tag{12.6}\\
f_R(0, 0, l, m) &= (0, 0, l, m) \tag{12.7}
\end{align}
\[ f(w) = f(w_1 \ldots w_k) = f_{w_k}(f_{w_{k-1}}(\ldots(f_{w_1}(0, 0, 0, 0))\ldots)) \]
At the end, we only have to check that there are enough occurrences of the letter U (the last component) to saturate the pending X (the first component). Formally, let w ∈ {X, L, R, U}∗; then w ∈ A_AFF iff f(w) = (i, j, l, m) with m ≥ i.⁶ For instance, with w = LXRX ∈ A_AFF, the algorithm produces:
f(L) = (0, 0, 1, 0) using Eq. 12.1
f(LX) = (0, 1, 0, 0) using Eq. 12.3
f(LXR) = (0, 0, 1, 0) using Eq. 12.6
f(LXRX) = (0, 1, 0, 0) using Eq. 12.3
The key point of the algorithm is illustrated in the example above: when an X appears in the sequence, it can be linked to an L if one is available (second step, Eq. 12.3), but this linking can be modified later on. When (i) an R is read, (ii) there is no pending X and (iii) there is a linked pair (L, X) in the left context, the R and the X are linked and the L becomes available again (third step, Eq. 12.6).
For the linear case, the algorithm is the same, but we have to check that there is no pending (or unused) companion L, R or U. A pending L corresponds to a third component that is not 0 at the end; a pending U corresponds to a last component that is strictly greater than the first one; and a pending R corresponds to the use of the last equation (Eq. 12.7) in the computation of the vector. Hence, in this case, we take f to be the partial function from {X, L, R, U}∗ to N^4 defined by the equations above except the last one. Then, for w ∈ {X, L, R, U}∗, w ∈ A_LIN iff f(w) = (i, j, 0, i).
⁶ A complete proof of this fact can be found in (Morey, 2011).
In the implementation of the Affine Companionship Principle, starting from an automaton A, we consider a new automaton A_AFF:
• states are pairs made of a state of A and a vector of N^4;
• the initial state is (s0, (0, 0, 0, 0)), where s0 is the initial state of A;
• for each transition s −w→ s′ in A with w ∈ {X, L, R, U}, a transition (s, v) −w→ (s′, v′) is added to A_AFF with v′ = f_w(v);
• a state (s_f, (i, j, l, m)) is final iff s_f is final in A and m ≥ i.
The Linear Companionship Principle is implemented in the same way. However, as the function f is partial in that case, some states of the new automaton can be removed even if they are not final.
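The vector update of Eqs. 12.1 to 12.7 and the two acceptance tests can be sketched as a one-pass function over a word of {X, L, R, U}∗. This is a minimal illustration, not the authors' implementation: it works on a single word rather than on an automaton, and the names `step`, `run`, `is_affine` and `is_linear` are hypothetical.

```python
from typing import Optional, Tuple

# (pending X, LX pairings, available L, number of U)
Vector = Tuple[int, int, int, int]


def step(v: Vector, letter: str, partial: bool = False) -> Optional[Vector]:
    """Apply f_letter to v. With partial=True, Eq. 12.7 is left undefined
    and None is returned instead (linear case)."""
    j, k, l, m = v
    if letter == "L":
        return (j, k, l + 1, m)              # Eq. 12.1
    if letter == "U":
        return (j, k, l, m + 1)              # Eq. 12.2
    if letter == "X":
        if l > 0:
            return (j, k + 1, l - 1, m)      # Eq. 12.3
        return (j + 1, k, 0, m)              # Eq. 12.4
    if letter == "R":
        if j > 0:
            return (j - 1, k, l, m)          # Eq. 12.5
        if k > 0:
            return (0, k - 1, l + 1, m)      # Eq. 12.6
        return None if partial else (0, 0, l, m)  # Eq. 12.7
    raise ValueError(f"unexpected letter: {letter!r}")


def run(word: str, partial: bool = False) -> Optional[Vector]:
    """Read the word from left to right, starting from (0, 0, 0, 0)."""
    v: Optional[Vector] = (0, 0, 0, 0)
    for letter in word:
        v = step(v, letter, partial)
        if v is None:
            return None
    return v


def is_affine(word: str) -> bool:
    """Affine test: enough U (last component) for the pending X (first)."""
    i, _, _, m = run(word)
    return m >= i


def is_linear(word: str) -> bool:
    """Linear test: f is defined and the final vector has the form (i, j, 0, i)."""
    v = run(word, partial=True)
    return v is not None and v[2] == 0 and v[3] == v[0]
```

Each step touches only four counters, which is what gives the constant cost per letter; in the automaton version, the same `step` function would label the states of the product automaton.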
12.5.5 Implementation of Lexicalised Grammars
In implementations of large-coverage linguistic resources, it is very common to have, first, the description of the set of “generic” structures needed to describe the modelled natural language and, second, an anchoring mechanism which compiles the vocabulary into a lexicon. We call unanchored grammar the set U of generic structures (not yet related to words) that are needed to describe the grammar. In this context, the lexicon splits into two parts:
• a selection function from V to subsets of U,
• an anchoring function α : V × U → S which builds a concrete syntactic structure from a word w ∈ V and an element of U.
In the following, we suppose that U, the selection function and α are given. In this context, we define the grammar as the codomain of the anchoring function:
\[ G = \bigcup_{w \in V,\; u \in (w)} \alpha(w, u) \]
where u ranges over the generic structures selected for w.
To give some intuition, for TAGs, the generic structures are trees without an anchor (instead, a special symbol is often used to mark the place where the anchor should be). The function α replaces this symbol by the actual word. In some way, the generic structures can be seen as an abstraction of the concrete ones. Hence the idea of applying the Companionship Principle at the unanchored level; in this way, the Principle applies independently of the lexicon.
12.6 Application to Interaction Grammars
In this section, we apply the Companionship Principle to the Interaction Grammars formalism. We first give a short and simplified description of IG and an example to illustrate them at work; we refer the reader to (Guillaume and Perrier, 2009) for a complete and detailed presentation.
12.6.1 Interaction Grammars
We illustrate key features of Interaction Grammars on the example sentence (2) below. In this sentence, “la” is an object clitic pronoun which is placed before the verb, whereas the canonical place for the (non-clitic) object is on the right of the verb.
Example 2. “Jean la demande.” [John asks for it]
In IGs, the set F of final structures, used as outputs of the parsing process, contains ordered trees called parse trees (PT). An example of a PT for sentence (2) is given in Figure 12.3. Each node of a PT describes a constituent; morpho-syntactic properties of the constituent are described in a feature structure (in the figure, feature structures are not shown, only the category is given). Leaves of a PT are either words of the language or the empty word (ε). The left-to-right order of the non-empty leaves of the tree follows the left-to-right order of the words in the input sentence. Empty words are used to mark traces in case of extraction or to replace elided constituents.
As IGs follow the Model-Theoretic Syntax (MTS) framework, a PT is defined as the model of a set of constraints. Constraints are defined at the word level: syntactic structures are polarised tree descriptions (PTD). A PTD is a set of nodes provided with relations between these nodes. Figure 12.4 shows three syntactic structures, the ones used to produce the PT of Figure 12.3 for sentence (2). The relations used in the PTDs are immediate dominance (lines) and immediate sisterhood (arrows). Nodes represent syntactic constituents and relations express structural dependencies between these constituents.
Moreover, nodes carry a polarity: the set of polarities is {+, −, =, ∼}. A + (resp. −) polarity represents an available (resp. needed) resource; a ∼ polarity describes a node which is unsaturated. Each + must be associated with exactly one − (and vice versa), and each ∼ must be associated with at least one other polarity different from ∼.
Figure 12.3: The PT of Sentence (2)
Now, we define a PT to be a model of a set of PTDs if there is a surjective function I from nodes of the PTDs to nodes of the PT such that:
• relations in the PTDs are realised in the PT: if M is a daughter (resp. immediate sister) of N in some PTD, then I(M) is a daughter (resp. immediate sister) of I(N);
• the feature structure of a node N in the PT is the unification of the feature structures of the nodes in I^{-1}(N);
• each node N in the PT is saturated: with the associative and commutative rule given in Figure 12.5, the composition of the polarities of the set of nodes I^{-1}(N) is the = polarity.
One of the strong points of IG is the flexibility given by the MTS approach: PTDs can be partially superposed to produce the final tree (whereas superposition is limited in standard CG or in TAG, for instance). In our example, the four grey nodes in the PTD which contains “la” are superposed on the four grey nodes in the PTD which contains “demande” to produce the four grey nodes in the model.
In order to give an idea of the full IG system, we briefly give here the main differences between our presentation and the full system.
Figure 12.4: PTDs for Sentence (2)
    | ∼ | − | + | =
  ∼ | ∼ | − | + | =
  − | − |   | = |
  + | + | = |   |
  = | = |   |   |
Figure 12.5: Polarity composition
• Dominance relations can be underspecified: for instance, a PTD can require a node to be an ancestor of another one without constraining the length of the path in the model. This is mainly used to model unbounded extraction.
• Sisterhood relations can also be underspecified: when the order on sub-constituents is not total, it can be modelled without using several PTDs.
• Polarities are attached to features rather than nodes: this sometimes gives more freedom to the grammar writer when the same constituent plays different roles in some linguistic construction.
• Feature values can be shared between several nodes: once again, this is a way to factorise the unanchored grammar.
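For the reduced system (polarities attached to nodes), the composition rule of Figure 12.5 and the saturation condition on a PT node can be sketched as follows. This is an illustration under stated assumptions: polarities are encoded as ASCII strings ("~" stands for ∼, "-" for −), undefined compositions are represented by `None`, and the names `compose` and `is_saturated` are hypothetical.

```python
from functools import reduce
from typing import Dict, Iterable, Optional, Tuple

TILDE, MINUS, PLUS, EQUAL = "~", "-", "+", "="

# Defined entries of the composition table of Figure 12.5 (one orientation
# only, since the rule is commutative); every other pair is undefined.
_TABLE: Dict[Tuple[str, str], str] = {
    (TILDE, TILDE): TILDE,
    (TILDE, MINUS): MINUS,
    (TILDE, PLUS): PLUS,
    (TILDE, EQUAL): EQUAL,
    (MINUS, PLUS): EQUAL,
}


def compose(p: Optional[str], q: Optional[str]) -> Optional[str]:
    """Compose two polarities; None means the composition is undefined."""
    if p is None or q is None:
        return None
    return _TABLE.get((p, q), _TABLE.get((q, p)))


def is_saturated(polarities: Iterable[str]) -> bool:
    """A PT node is saturated iff the polarities of the PTD nodes mapped
    onto it compose to the = polarity."""
    pols = list(polarities)
    if not pols:
        return False
    return reduce(compose, pols) == EQUAL
```

For instance, a + node merged with a − node and any number of ∼ nodes is saturated, whereas two + nodes cannot be merged at all.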
The application of the Companionship Principle is described for the reduced IG, but it can be straightforwardly extended to full IG, up to inessential technical details. Following the notation given in Section 12.5.5, an IG is made of:
• a finite set V of words;
• a finite set U of unanchored PTDs (without any word attached to them);
• a lexicon function from V to subsets of U.
The anchored PTD α(w, u) for a word w and an unanchored PTD u is u where one node, called the anchor, is labelled by w. In practice, the anchoring process helps to refine some features. For instance, in the French IG grammar, the same unanchored PTD can be used for masculine or feminine common nouns, and the gender is specified during the anchoring to produce distinct anchored PTDs. We recall that:
\[ G = \bigcup_{w \in V,\; u \in (w)} \alpha(w, u) \]
The set of parsing solutions of a lexical tagging [(S1, w1), …, (Sn, wn)], where Si ∈ α(wi, ui) for some ui ∈ (wi), is the set of PTs that are models of the sequence of PTDs described by the lexical tagging:
p(S1, …, Sn) = {m ∈ F | m is a model of S1, …, Sn}
With the definitions of this section, an IG is a special case of Lexicalised grammar as defined in Section 12.3.
12.6.2 Companionship Principle for IG
In order to apply the Companionship Principle, we have to explain how the generalised atomic constraints are built for a given grammar. One way is to look at dependency structures, but in IG polarities are built in, so dependency information can be read directly from polarities. A requirement for building a model is the saturation of all the polarities. Each time a PTD contains an unsaturated polarity +, − or ∼, we have to find some other compatible dual polarity somewhere else in the grammar to saturate it.
From the general MTS definition of IG above, we can define a step-by-step process to build models of a lexical tagging. The idea is to build the interpretation function I incrementally with an atomic operation of node merging. In this atomic operation, we choose two nodes with compatible features and we make the hypothesis that they have the same image through I and hence that they can be identified.
The atomic operation of node merging can be used on the structures of the unanchored grammar to build constraints. Suppose that the unanchored PTD u contains some unsaturated polarity p in a node N. We build the set L_p of left companions of the polarity p, i.e. PTDs that can be used on the left of u to saturate p: u′ ∈ L_p iff there is some node N′ in u′ such that:
• the two nodes N and N′ can be merged;
• N′ contains a polarity p′ which saturates p;
• the anchor of u′ is on the left of the anchor of u in the tree description obtained by node merging.
For the first point, we have to check that the two nodes have compatible features, but we can go further. Any model which merges N and N′ must also merge the mother node of N and the mother node of N′ (if any); the same remark also holds for the immediate sisters of the two nodes. If one of these additional mergings fails, one can conclude that u′ is not a companion of u. Of course, we can define in a similar way the set R_p of right companions of a polarity.
With the definition above, for each unsaturated polarity p in a PTD u, the pair (L_p, R_p) is a generalised atomic constraint in C(u).
From Figure 12.5, we can observe that, when building a model, several ∼ polarities can be saturated by the same polarity. This is the case, for instance, for adjectives (with a node ∼N) that are waiting for a noun (with a node =N): the same noun can be used for several adjectives. In this case, the polarity p does not consume its companion, and the Boolean Companionship Principle can be used with the constraint (L_p, R_p).
For the case where p is one of the two unsaturated polarities + and −, the situation is different: the saturation of the polarity p does consume its companion, so the Affine Companionship Principle can be applied.
Note that it cannot be excluded that some polarity p′ which is able to saturate p is not used for p (it will be saturated with some other polarity p″), and so the Linear Companionship Principle does not apply.
We report here the experiments that were conducted on the French grammar built for IG (named Frigram). The experiments are performed on sentences taken from the newspaper Le Monde. Figure 12.6 gives, for a given sentence length, the number of lexical taggings left after the application of several combinations of filters. As said earlier, the RCP filter is efficient; it is therefore used systematically before any other filter.
Figure 12.6: Filtering in Interaction Grammars
The y-values are the number of lexical taggings left after the filtering. The first curve (with the highest values) corresponds to the input of the parsing process, i.e. the output of the lexicon and the grammar. The 5 other curves (from top to bottom) represent the successive outputs of the filtering processes (each filter is applied to the result of the previous one); the filters we consider are:
• the RCP filter;
• the deterministic polarity filter described in (Bonfante et al., 2004);
• again the RCP filter and the Boolean Companionship Principle applied only to the ∼ polarities of the grammar;
• again the RCP filter and the complete polarity filter (including non-deterministic filters) described in (Bonfante et al., 2004);
• again the RCP filter and the Affine Companionship Principle.
All the curves have a linear shape. Observe that the slope of each line actually corresponds to the mean ambiguity, i.e. the mean number of syntactic structures that are associated with a word. In the experiment above, this ambiguity is 7 before the filtering and, after the complete set of filters, the remaining ambiguity amounts to 1.6.
12.7 Application to Lexicalised Tree Adjoining Grammars (LTAG)
Tree Adjoining Grammar (Joshi, 1987) is a grammatical formalism in which syntactic structures are ordered trees built on a finite set of grammatical categories.⁷ Syntactic structures may contain a special kind of leaf, called a substitution node and written A↓, which corresponds to an unsaturated part of the tree where something is missing. Among the structures, some trees (called auxiliary trees) have a special leaf (called the foot node, written A∗) which shares the same category A as the root. Trees of the grammar which are not auxiliary are called initial trees. As for the previous cases of AB-grammar and of IG, we suppose that our grammars are lexicalised; this means that each elementary tree has one leaf (called the anchor) that contains a word w of the language. We use the notation anc(t) to refer to the word w which is the anchor of a tree t.
⁷ In real TAG grammars, feature structures are added to nodes, but here we consider a simplified version where nodes contain only a syntactic category.
Starting from a sequence of syntactic structures, parsing is done with two operations that are applied recursively to the syntactic structures and to the structures built during previous steps (called derived trees). Substitution consists in grafting a tree with root A onto a substitution node A↓ of some other tree. Adjunction consists in inserting an auxiliary tree inside another tree; more precisely, an auxiliary tree (with root A and foot A∗) replaces an internal node A of some other tree (either elementary or derived).
Following Definition 9, we can consider a TAG grammar as a Lexicalised grammar G = (V, S, G, F, p) where:
• S is the set of initial and auxiliary trees,
• V is the finite set of words {anc(t) | t ∈ S},
• the lexicon G is the set {(t, w) ∈ S × V | w = anc(t)},
• F is the set of derived trees with root S and without substitution nodes,
• the function p is a partial function defined as follows: let f ∈ F and S1, …, Sn ∈ S∗; f ∈ p(S1, …, Sn) iff f can be obtained from the sequence S1, …, Sn with the two operations of substitution and adjunction.
The substitution operation is directly controlled by the substitution nodes: each substitution node is used exactly once in a substitution operation. At the same time, each root node of an initial tree (except the main tree rooted at S for the whole sentence) must also be used exactly once in a substitution operation.
In our setting, the remark above means that it is possible to build some abstractions from a TAG grammar to the Linear Companionship Grammar. For each syntactic category A, consider the abstraction f_A such that f_A(t) = R^i X^j L^k, where j is 1 if the root of t is A and 0 otherwise, and i (resp. k) is the number of substitution nodes A↓ that are on the left (resp. on the right) of the anchor of the tree t. It follows from the definitions above that the function f_A is an abstraction, and so that we can apply the Linear Companionship Principle to TAG grammars. Of course, as in our introductory example, the abstraction can be done for each category which is involved in the substitution process, and the list of LTAs obtained can be intersected to filter the set of lexical taggings.
In large TAG grammars, there is another kind of node in elementary structures, called a co-anchor; with co-anchors, the same elementary structure can introduce more than one lexical unit at the same time. It is used, for example, for the fixed preposition linked to a particle verb (“off” in “take off”, “up” in “take up”) or for other lexical items that appear in frozen expressions (“drink” in “take a drink”, “rest” in “take a rest” …).
Fortunately, it is easy to encode a TAG grammar with co-anchors into an equivalent TAG grammar without co-anchors using the substitution mechanism: for instance, in a tree t with a co-anchor “up”, the co-anchor node is replaced by a substitution node UP↓, and a new initial tree with two nodes is added: one root with category UP and one leaf with “up”, which is now an anchor and not a co-anchor. Using the transformation above, co-anchors are completely handled by the Affine Companionship Principle, and only lexical taggings that are well balanced in terms of co-anchors are kept by the filtering process.
At first sight, one may imagine that adjunction could also be used in some abstraction to filter out taggings, following the idea that an auxiliary tree with foot A∗ and root A cannot be used if the context does not contain an A node where the adjunction can be performed. Unfortunately, this idea filters out very few lexical taggings, so we do not use it in the experiments.
The experimental results given below are computed on sentences of the PennTreeBank corpus. The TAG grammar we used (Alahverdzhieva, 2008; Narayan and Gardent, 2012) contains a set of unanchored trees organised into families and a lexicon mapping lemmas to families of the grammar. To avoid problems with tokenisation or lemmatisation, we start our filtering process from the lemma level of the PTB. For instance, for the sentence “There’s no need for such concessions.”, we feed the filtering process with the lemma sequence {“there”, “be”, “no”, “need”, “for”, “such”, “concession”, “.”}.
The table below gives the number of lexical taggings obtained with different applications of the filtering steps. Mean ambiguity is the n-th root of lt, i.e. lt^(1/n), where lt is the number of lexical taggings and n the length of the sentence (n = 8 in the example). The first line corresponds to the initial state: each lemma is associated with each TAG tree of the grammar given by the lexicon. The line [CO] gives the results after the co-anchor filtering. The next 3 lines ([CO+A], [CO+NP], [CO+S]) are 3 examples of the 10 independent filterings starting from the [CO] sets with abstractions relative to A, NP and S. The last line corresponds to the intersection of the 10 lexical tagging sets: the 3 previous ones and the 7 relative to the other categories ADV, AP, DET, N, PP, PREP and PUNCT.
Filter  | Lexical taggings | Mean word ambiguity
Initial | 1.4954 × 10^10   | 18.70
CO      | 1.27549 × 10^10  | 18.33
CO+A    | 1.16881 × 10^10  | 18.13
CO+NP   | 2.71802 × 10^8   | 11.33
CO+S    | 1.26465 × 10^9   | 13.73
Final   | 6.01914 × 10^6   | 7.04
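The relation between the two columns of the table can be checked with a few lines of Python. The sketch below simply recomputes the n-th root for the figures reported above, with n = 8 as stated for the example sentence; the function name `mean_ambiguity` and the dictionary `reported` are hypothetical.

```python
from typing import Dict, Tuple


def mean_ambiguity(lexical_taggings: float, sentence_length: int) -> float:
    """Mean word ambiguity: the n-th root of the number of lexical taggings."""
    return lexical_taggings ** (1.0 / sentence_length)


# (number of lexical taggings, mean ambiguity) as reported in the table.
reported: Dict[str, Tuple[float, float]] = {
    "Initial": (1.4954e10, 18.70),
    "CO": (1.27549e10, 18.33),
    "CO+A": (1.16881e10, 18.13),
    "CO+NP": (2.71802e8, 11.33),
    "CO+S": (1.26465e9, 13.73),
    "Final": (6.01914e6, 7.04),
}

SENTENCE_LENGTH = 8  # number of lemmas in the example sentence

recomputed = {
    name: mean_ambiguity(taggings, SENTENCE_LENGTH)
    for name, (taggings, _) in reported.items()
}
```

Because of the n-th root, a large drop in the number of taggings translates into a much smaller drop in mean ambiguity: the final filter divides the number of taggings by more than a thousand while the ambiguity per word goes from about 18.7 to about 7.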
Figure 12.7 gives the number of lexical taggings observed on sentences of the PennTreeBank for each sentence length. The first curve (plain line) corresponds to the input of the filtering; the second one (dashed line) corresponds to the lexical taggings that are left after the application of the first filter, which removes the paths where the co-anchor principle is not satisfied. The last curve (dotted line) corresponds to the output of the filtering process using the Linear Companionship Principle to filter out lexical taggings for which substitution nodes and roots of initial trees are not well balanced.
Figure 12.7: Filtering in Tree Adjoining Grammars
12.8 Conclusion
We have presented a lexical disambiguation method based on constraints which filters out many wrong lexical taggings before deep parsing starts. These filtering methods were originally introduced for lexical disambiguation in Interaction Grammars, but we have shown that they can also be applied to other grammatical formalisms such as Tree Adjoining Grammars. In fact, the main requirement is the lexicalisation of the grammar; we have therefore defined a general notion of Lexicalised grammar. Based on this, we have also defined a notion of constraints that can be used in different settings. Of course, the way these constraints can be expressed strongly depends on the formalism to which the method is applied. But, once the constraints are computed, the disambiguation remains independent of the underlying formalism.
As this method relies on the computation of static constraints on the linguistic data and not on a statistical model, we can be sure that we will never remove any correct lexical tagging. Moreover, we have applied our methods to an interesting set of data and shown that they are efficient for large-coverage grammars and not only for toy grammars.
With the notion of grammar abstraction, it is possible to express the lexical disambiguation process in some grammar as a parsing process in a more abstract grammar. This way of expressing lexical disambiguation is very powerful, but there is a trade-off to handle. If the abstraction is strong (i.e. only little information from the input structure is kept), then the filtering is computationally efficient but it does not rule out a large number of lexical taggings. Conversely, a finer abstraction (like the one used for IG, which takes into account constraints computed on the whole grammar) is able to filter out many wrong taggings but is computationally more expensive.
Based on the results presented here, we can imagine several ways to push further the usage of the Companionship Principle. First, we have seen that, for IG, our principle cannot be computed on the whole grammar and that its implementation considers unanchored structures. We would like to explore the possibility of computing finer constraints (relative to the full grammar) on the fly for each sentence. We believe that this can eliminate some more taggings before deep parsing starts. Another challenging task we would like to investigate is to use the Companionship Principle not only as a disambiguation method but also as a guide for deep parsing. Indeed, in the IG case, we have observed that, for at least 20% of the words, dependencies are completely determined by the filtering methods. If deep parsing can be adapted to use this observation, this can be of great help.
Bibliography
Alahverdzhieva, K. (2008). XTAG using XMG. A Core Tree-Adjoining Grammar for English. Master’s thesis, Universität des Saarlandes, Germany and Université Nancy 2, France.
Bangalore, S. and Joshi, A. K. (1999). Supertagging: an approach to almost parsing. Computational Linguistics, 25(2), 237–265.
Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language, 29, 47–58.
Bohnet, B. and Nivre, J. (2012). A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing (EMNLP 2012) and Computational Natural Language Learning (CoNLL 2012), pages 1455–1465.
Bonfante, G., Guillaume, B., and Perrier, G. (2004). Polarization and abstraction of grammatical formalisms as methods for lexical disambiguation. In Proceedings of the 20th international conference on Computational Linguistics (Coling 04), pages 303–309, Geneva, Switzerland.
Boullier, P. (2003). Supertagging: A non-statistical parsing-based approach. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 2003), pages 55–65, Nancy, France.
Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics, 21, 543–565.
Clark, S. and Curran, J. R. (2004). The importance of supertagging for wide-coverage CCG parsing. In Proceedings of the 20th international conference on Computational Linguistics (Coling 04), pages 282–288, Morristown, NJ, USA. Association for Computational Linguistics.
Clark, S., Hockenmaier, J., and Steedman, M. (2002). Building Deep Dependency Structures with a Wide-Coverage CCG Parser. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL 02), pages 327–334, Philadelphia, PA.
de Groote, P. (2001). Towards abstract categorial grammars. In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics (ACL 01), pages 252–259.
Debusmann, R., Duchier, D., and Kruijff, G.-J. M. (2004). Extensible dependency grammar: A new methodology. In Proceedings of the Workshop on Recent Advances in Dependency Grammar (Coling 2004), Geneva.
Gardent, C. and Kow, E. (2005). Generating and selecting grammatical paraphrases. Proceedings of the ENLG.
Guillaume, B. and Perrier, G. (2009). Interaction Grammars. Research on Language and Computation, 7(2-4), 171–208.
Harris, Z. S. (1962). String analysis of sentence structure, volume no. 1. Mouton, The Hague.
Joshi, A. (1987). An Introduction to Tree Adjoining Grammars. Mathematics of Language.
Joshi, A. and Rambow, O. (2003). A Formalism for Dependency Grammar Based on Tree Adjoining Grammar. In Proceedings of the Conference on Meaning-Text Theory (MTT 2003).
Jurafsky, D. and Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall series in artificial intelligence. Prentice Hall, Pearson Education International, Englewood Cliffs, NJ, 2nd edition (Pearson international edition).
Kahane, S. (2006). Polarized unification grammar. In Proceedings of the 21st International Conference on Computational Linguistics (Coling 06) and 44th Annual Meeting of the Association for Computational Linguistics (ACL 06), pages 137–144, Sydney.
Karlsson, F., Voutilainen, A., Heikkila, J., and Anttila, A. (1995). Constraint Grammar, A Language-independent System for Parsing Unrestricted Text. Mouton de Gruyter.
Koller, A. and Kuhlmann, M. (2009). Dependency trees and the strong generative capacity of CCG. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2009), Athens, Greece.
Kupiec, J. (1992). Robust Part-of-Speech Tagging Using a Hidden Markov Model. Computer Speech and Language, 6(3), 225–242.
Lamarche, F. (2008). Proof Nets for Intuitionistic Linear Logic: Essential Nets. Technical report, INRIA.
Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly, pages 154–170.
Li, Z., Zhang, M., Che, W., Liu, T., Chen, W., and Li, H. (2011). Joint models for Chinese POS tagging and dependency parsing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), pages 1180–1191. Association for Computational Linguistics.
Marchand, J., Guillaume, B., and Perrier, G. (2009). Analyse en dépendances à l’aide des grammaires d’interaction. In Proceedings of Traitement Automatique de la Langue Naturelle (TALN 09), Senlis, France.
Merialdo, B. (1994). Tagging English Text with a Probabilistic Model. Computational Linguistics, 20, 155–157.
Moortgat, M. and Morrill, G. (1991). Heads and phrases. Type calculus for dependency and constituent structure. In Journal of Language, Logic and Information.
Morey, M. (2011). Étiquetage grammatical symbolique et interface syntaxe-sémantique des formalismes grammaticaux lexicalisés polarisés. Ph.D. thesis, Université de Lorraine.
Narayan, S. and Gardent, C. (2012). Structure-driven lexicalist generation. In Proceedings of the 24th International Conference in Computational Linguistics (Coling 2012) - Technical Papers, pages 2027–2042, Mumbai, India.
Ninomiya, T., Matsuzaki, T., Tsuruoka, Y., Miyao, Y., and Tsujii, J. (2006). Extremely lexicalized models for accurate and fast HPSG parsing. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pages 155–163, Sydney, Australia. Association for Computational Linguistics.
Rush, A. M., Sontag, D., Collins, M., and Jaakkola, T. (2010). On dual decomposition and linear programming relaxations for natural language processing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), pages 1–11. Association for Computational Linguistics.
Steedman, M. (2000). The Syntactic Process. MIT Press.
Tesnière, L. (1959). Éléments de syntaxe structurale. Klincksieck.
van Halteren, editor (1999). Syntactic Wordclass Tagging. Springer.
Contributors
Philippe Blache is Director of Research at CNRS (the French National Center for Scientific Research). He works in the Laboratoire Parole et Langage, one of the most important labs in France working on interdisciplinary studies of speech and language production and perception. The LPL members are linguists, psycholinguists, computer scientists, neuroscientists and doctors. Since 2012, he has been Director of the cluster of excellence Brain and Language Research Institute. His scientific interests center on natural language processing and formal linguistics. He is working on the development of a new constraint-based theory, Property Grammars, making it possible to represent all kinds of information by means of constraints.
Guillaume Bonfante is Associate Professor at the Ecole des Mines de Nancy (France). His main research topic is computational models and complexity theory. He sees Natural Language Processing as a main domain of application of such techniques.
Annelies Braffort is Senior Researcher at CNRS (the French National Center for Scientific Research). She leads the Sign Language Processing team at the LIMSI-CNRS laboratory, located on the Paris-Saclay campus (France). Her main research topics on French Sign Language concern corpus analysis, linguistic modelling and automatic processing. Current efforts in the team are directed towards LSF resources and automatic generation via the animation of signing avatars.
Henning Christiansen is Professor of Computer Science at Roskilde University. He was born in Aarhus, Denmark, and holds a master's degree from Aarhus University (1981) and a PhD from Roskilde (1988). His main interests include programming techniques, artificial intelligence, logic and constraint programming, abduction and language analysis. He has recently also started working with interactive installations for art presentation and interactive theatre. (email: [email protected])

Benoît Crabbé is Associate Professor at Paris-Diderot University (Paris 7) in the Linguistics department. He worked at LORIA in Nancy until 2005 and at Edinburgh University in 2006. He is now a member of the Alpage team at INRIA Rocquencourt. His research activities mainly focus on modelling syntax for natural languages. He is particularly interested in the syntax of French. (email: [email protected])

Verónica Dahl is an Argentine/Canadian Computer Scientist, who is recognised as one of the 15 founders of the field of logic programming for her pioneering contributions, particularly in human language processing and deductive knowledge bases. Her book on Logic Grammars was used throughout the world to help discover the human genome. Her work on computational molecular biology had great practical impact in the areas of agriculture, forestry, marine sciences and entomology. She was awarded the Calouste Gulbenkian Award for Sciences and Technology (1994) and the Marie Curie Chair of Excellence 2008–2011 from the European Commission, and was included in 2011 in CHR’s Hall of Fame. She is also an award-winning literary writer, and a musician and composer specialising in Latin American music. (email: [email protected])

Helen de Hoop is Professor of Theoretical Linguistics at Radboud University Nijmegen, The Netherlands. She has co-authored several articles and two books on Optimality Theoretic semantics and bidirectional Optimality Theory. She is the principal investigator of the research group “Grammar and Cognition” at the Centre for Language Studies (CLS) of the Radboud University Nijmegen, a group which aims to arrive at a better understanding of the relation between form (syntax) and meaning (semantics) in context (pragmatics). (email: [email protected])
Denys Duchier has been Professor of Computer Science at Université d’Orléans (France) since 2006. He received his PhD from Yale University (United States) in 1991, with a thesis on Logic Programming. He then moved to Canada, where he worked at the University of Ottawa and later at Simon Fraser University in Vancouver. In 1996, he moved to Saarbrücken (Germany), where he worked on the design and implementation of the Oz programming language. His research interests focus on constraint programming, the application of constraints in computational linguistics, and the design and implementation of programming languages. (email: [email protected])

Michael Filhol’s background in computer science is artificial intelligence and NLP. He has always had a strong interest in codes, languages and linguistics, and a particular taste for Sign Language. Since the start of his PhD in 2005, he has worked to propose linguistically informed models that formally capture the core properties of these gestural languages. As a CNRS Researcher at LIMSI today, he focuses on text-to-sign machine translation. (email: [email protected])

Kilian A. Foth studied Computer Science and Linguistics at Hamburg University and obtained his Ph.D. in Computational Linguistics in 2006. He has worked as a research assistant, organist and software developer. His main interests are natural language analysis and electronic group decision support systems. (email: [email protected])

Baohua Gu is an IT expert with a Mathematics and Computer Science background. He holds a PhD degree in Computing Science. He is a fan of Java programming and web technologies. His interests include data mining, information extraction and natural language processing. He is now a software developer with Infinite Source Systems Corp. (email: [email protected])

Bruno Guillaume is Researcher at Inria Nancy Grand-Est (France) in the Sémagramme team. He studies symbolic methods for processing the syntax and semantics of Natural Languages. He also develops tools and linguistic resources in the same area. (email: [email protected])
Bo Li was born in Henan, China. He lived in Denmark from 2004 to 2013, during which time he received his bachelor's and master's degrees in Computer Science & Communication from Roskilde University. He moved back to Zhengzhou, the capital city of Henan, in 2013, and is now the founder and managing director of Creval, a newly founded Zhengzhou-based IT company. (email: [email protected])

Patrick McCrae is the founder and managing director of LangTec, a technology start-up for semantic text mining based in Hamburg. Prior to LangTec, Patrick was a Senior Consultant for web-based systems with IBM. After years in consulting he returned to academia in 2006 to pursue a research project on semantic parsing at the University of Hamburg, where he obtained his PhD in Informatics in 2010. His primary research interests are automated text understanding, ontology integration in natural language processing and automated textual reasoning. (email: [email protected])

Wolfgang Menzel is Professor of Informatics at Hamburg University. His research interests cover architectures and methods for natural language processing with a special emphasis on robustness, multi-modal information fusion and constraint-based diagnosis. After obtaining his doctoral degree in electrical engineering from Technische Universität Dresden, he worked on different projects in computational linguistics, including text-to-speech synthesis, parsing, machine translation, spoken language processing and intelligent tutoring systems. (email: [email protected])

J. Emilio Miralles worked on simulating biological systems during his Bachelor of Science in Biophysics at Simon Fraser University in Vancouver, B.C. His interests eventually led him to pursue the areas of bioinformatics, language processing, and constraint programming. (email: [email protected])

Mathieu Morey is a Computer Scientist working on syntactic and semantic parsing. He is currently a postdoctoral fellow at Aix-Marseille Université (France). He graduated from Université de Lorraine (France) in 2011 and spent 10 months at Nanyang Technological University (Singapore) as an Erasmus Mundus MULTI postdoctoral fellow. (email: [email protected])
Yannick Parmentier has been Associate Professor at Université d’Orléans (France) since 2009. He holds a PhD in Computer Science, which he obtained from the Université Henri Poincaré, Nancy (France) in 2007. During his PhD, he worked on metagrammars and semantic calculus with Tree-Adjoining Grammar. In 2008, he worked as a post-doctoral research fellow at the University of Tübingen (Germany), on syntactic parsing with tree-based grammars. His research interests include formal grammar design and implementation, syntactic parsing, and semantic construction. (email: [email protected])

Guy Perrier is Professor at the Université de Lorraine (France). His research focuses on modeling the syntax and the semantics of natural languages. His main contribution is the introduction of the Interaction Grammar formalism and the development of FRIGRAM, a French interaction grammar with a large coverage. (email: [email protected])

Simon Petitjean has been a Ph.D. candidate in Orléans (France) since 2010, under the supervision of Denys Duchier and co-supervision of Yannick Parmentier. His research interests include Natural Language Processing and Constraint Solving. He has mainly contributed to grammar engineering techniques through his involvement in the development of eXtensible MetaGrammar (XMG). The current development steps lead towards a modular framework, allowing a high level of flexibility in the description of linguistic resources. (email: [email protected])

Jean-Philippe Prost is Senior Lecturer in Computer Science at Université Montpellier 2. He was granted his Ph.D. in Computational Linguistics in 2008 by Macquarie University (Sydney, Australia) and Université de Provence (Aix-en-Provence, France) under a cotutelle supervision. His interests include Model-Theoretic Syntax, syntactic parsing, the representation of graded grammaticality, and logico-stochastic approaches to language modelling. (email: [email protected])

Hedda R. Schmidtke is Assistant Teaching Professor in the Department of Information and Communications Technology at Carnegie Mellon University in Rwanda. Her main research interest is in representations of context in distributed cognitive systems and cognitively motivated systems. She publishes in the areas of Cognitive Science, Artificial Intelligence, Ubiquitous/Pervasive Computing, Wireless Sensor Networks, and Geographic Information Systems. Hedda holds a PhD in Computer Science (Dr. rer. nat.) from the University of Hamburg. After her PhD, she worked as a research associate and later research professor at Gwangju Institute of Science and Technology in South Korea. Before joining CMU, she was research director of the TecO lab at Karlsruhe Institute of Technology in Germany. (email: [email protected])

Jørgen Villadsen is Associate Professor at the Department of Applied Mathematics and Computer Science of the Technical University of Denmark (DTU). His research is in logic, computational linguistics and artificial intelligence. He is area co-chair for logic and computation at the European Summer School in Logic, Language and Information (ESSLLI 2014). (email: [email protected])
INDEX
A
AB-Grammar, 257
abduction, 25, 154
abductive logic program, 26
abductive reasoning, 25
active detection (in German), 139
ambiguous segment, 245
anaphoricity, 66
ancillary constraint, 139
articulatory constraints, 204
assumption, 154
atomic context formula, 224
Attribute Grammar, 22
Attribute Value (AV), 11
AZee, 204
AZOP, 205

B
binary constraint, 133
binary relations, 227
biomedical text, 153

C
categorial grammar, 167
characterisation, 53
Chinese, 237
Chinese Word Segmentation Problem, 237
CHR Grammar, 30, 239
CIPS-SIGHAN repository, 247
citation form, 194
CKY parsing algorithm, 43
clitic ordering, 108
colour-based description language, 106
combinator, 167
companion, 255
companionship constraints, 255
companionship principle, 257
    affine companionship principle, 273
    generalised companionship principle, 270
    linear companionship principle, 275
    rough companionship principle, 280
    undirected companionship principle, 272
completeness coefficient, 55
concept and relationship extraction, 153
conflicting constraint, 61
constraint arity, 127, 133
constraint expressivity, 140
Constraint Handling Rules (CHR), 23, 154
Constraint Optimisation Problem (COP), 43
constraint programming, 17, 22
constraint satisfaction, 17
Constraint Satisfaction Problem (CSP), 5, 126
constraint store, 5
constraint-based formalisms, 123
Constraint-Based Grammar (CBG), 6
Construction Grammar, 40
context, 220
context logic expressiveness, 226
context-dependent specification, 204
Context-Free Grammar (CFG), 94, 125
cross-framework grammar design, 109

D
de-contextualisation, 219
de-levelled grammar, 202
declarativity in programming, 6
decorated ordered trees, 52
deep syntactic description, 93
defeasible weighted constraint, 146
Definite Clause Grammar (DCG), 3, 22, 28
deixis, 65
dependency grammar, 10
depicting signs, 195
description language, 100
diagrammatic reasoning, 222
discourse analysis, 28
double articulation, 193
E
economy, 62, 81
empirical adequacy, 6
expressivity, 6
eXtended Dependency Grammar (XDG), 133
eXtensible MetaGrammar (XMG), 96
extraposition of relative clauses (in German), 134

F
Feature Co-occurrence Restriction (FCR), 4, 10
feature structures, 10
features, 125
First Order Logic (FOL), 169
flexible parsing, 150
formal expressivity, 94
formal grammar, 93
forms, 202
fully-lexical signs, 194
Functional Unification Grammar (FUG), 3
functions, 202

G
Generalised Phrase-Structure Grammar (GPSG), 10, 46
generate and test, 5, 126
generative grammar, 37
Generative-Enumerative Syntax (GES), 7, 41
generator, 127
geometric reasoning, 221
geometric semantics, 221
gestures, 69
global phenomena, 135
Government and Binding theory, 139
gradience, 45
grammar abstraction, 293
grammaticality, 42, 50
grammaticality judgement, 52
grammaticalness index, 55

H
has-operator, 137
Head-driven Phrase-Structure Grammar (HPSG), 4, 133
heuristic search, 128
higher order logic, 170
highly iconic structures, 195
Hyprolog, 154

I
iconicity, 62, 192
immediate dominance (ID), 15, 47
in-context reasoning, 219
inconsistency-tolerant, 168
incremental optimisation, 70
indexicals, 219
inhabitation, 172
integrity constraints, 26
Interaction Grammar (IG), 256
interpretation rule, 202
is-operator, 137

K
KAZOO, 210

L
labelled deductive system, 221
less-resourced languages, 192
lexical rules, 97
lexical tagging, 258
lexical tagging automaton, 261
Lexical-Functional Grammar (LFG), 111, 133
lexicalised grammar, 261
lexicon, 50
linear precedence (LP), 15, 47
linearity, 198
linguistic construction, 115
linguistic principle, 107
local constraint, 133
locality, 7
logic grammar, 3

M
manual parameters, 193
maximum ambiguous segments, 245
maximum matching, 243
metagrammar, 96
metagrammar compilation, 117
Metamorphosis Grammar, 3
model, 49
model theory, 37
Model-Theoretic Syntax (MTS), 7, 37, 39, 42
Montague grammar, 168
multi-dimensional type theory, 167
multilinearity, 196

N
Nabla, 173
namespace, 107
natural logic, 168
necessary and sufficient constraints, 204
NLP problem, 123
non-lexical signs, 195
O
obligatory co-occurrence, 44
Optimality Theory (OT), 13, 131
orientation constraint, 206
OT semantics, 62
over-constrained CSP, 128

P
paraconsistent, 168
parameterised classes, 106
parsing strategies, 43
partial CSP, 130
partly-lexical signs, 194
passive construction (in German), 143
passive detection (in German), 142
PATR II, 97
perception, 228
personal pronoun, 62, 65
PG parsing, 48
PG-graph, 49
phoneme, 193
phrase-structure grammar, 15
placement constraint, 207
pointing, 68
polarity, 256
precision index, 55
production rule, 202
Prolog, 3, 22
pronoun
    first person pronoun, 64
    second person pronoun, 64
    third person pronoun, 64
Proof-Theoretic Syntax (PTS), 37
proof-theory, 37
propagation, 24, 127
property
    constituency, 15, 46, 115, 151
    dependency, 151
    exclusion, 16, 46, 115, 151
    linearity, 15, 46, 115, 151
    obligation, 15, 46, 115, 151
    requirement, 16, 46, 115, 151
    uniqueness, 15, 46, 115, 151
Property Grammar (PG), 4, 46, 114, 126, 150
propositional attitudes, 168
psycholinguistic responsibility, 7

R
radical non-autonomy, 7
ranked constraints, 136
re-contextualisation, 219
recursive ancillary constraints, 141
reference, 220
reference frame, 219
restrictions, 124, 125
rewriting system, 93
rotation, 231
rules, 124, 125

S
saturation, 106
scaling, 231
score, 205
semantic properties, 152
sequent calculus, 174
sign, 193
Sign Language of the Netherlands (NGT), 68
Sign Languages, 68, 191
Sign-Based Construction Grammar, 129
sign-word equivalence, 197
signing space, 196
simpagation, 24
simplification, 24
solution candidates, 133
space, 222
    conceptual space, 222
store, 4
supertagging, 254
supra-local constraint, 133, 135
synchronisation, 199, 207

T
ternary constraint, 137
testing component, 125
tokeniser, 173
translation, 231
tree template, 95
(Lexicalised) Tree-Adjoining Grammar (TAG), 95, 290
type interpretation, 171
type language, 171
type theory, 167
typed feature structures, 126
U
unary constraint, 127
under-constrained CSP, 128
unification, 3, 6, 102

V
valency, 135
variable name
    global variable name, 103
    local variable name, 104
virtual signer, 191, 204, 210, 212
Vorfeld, 139

W
Weighted Constraint Dependency Grammar (WCDG), 133
well-formedness conditions, 97, 125

X
XTAG project, 95