Symbols: An Evolutionary History from the Stone Age to the Future 9783031268083, 9783031268090, 3031268083

For millennia humans have used visible marks to communicate information. Modern examples of conventional graphical symbo


English Pages 248 [240] Year 2023


Table of contents:
Preface
Contents
1 Introduction
1.1 What's in a Symbol?
1.2 Syntax
1.3 What this Book Is About
2 Semiotics
2.1 Introduction
2.2 The Field of Semiotics
2.3 Iconicity
2.4 Syntax
2.5 Articulation
3 A Taxonomy of Non-linguistic Symbol Systems
3.1 Introduction
3.2 A Brief History of Non-linguistic Symbols
3.3 A Preliminary Taxonomy of Non-linguistic Symbol Systems
3.4 Examples of Systems
3.5 An in Depth Comparison of Two Non-linguistic Symbol Systems: Japanese kamon and European Heraldry
3.5.1 Kamon
3.5.2 British Heraldry
3.5.3 Structural Differences: Summary
3.6 Survey of a Variety of Nonlinguistic Symbol Systems
3.6.1 Vinča Symbols
3.6.2 Uruk Accounting
3.6.3 Kudurrus
3.6.4 Central Asian Tamgas
3.6.5 Pictish Symbols
3.6.6 European Heraldry
3.6.7 Kamon
3.6.8 Alchemical Symbols
3.6.9 Symbols of Guild
3.6.10 House Marks
3.6.11 Gaunerzinken and Hobo Signs
3.6.12 Khipu: The Accounting System
3.6.13 Totem Poles
3.6.14 Naxi Pictography
3.6.15 Pennsylvania Barn Stars
3.6.16 Dakota Winter Counts
3.6.17 Australian Message Sticks
3.6.18 Silas John's System
3.6.19 Lukasa Memory Boards
3.6.20 Tupicochan Staff Code
3.6.21 Dance Notation
3.6.22 Weather Icons
3.6.23 Scouting Merit Badges
3.6.24 Traffic Signs
3.6.25 Car Logos or “Hood Ornaments”
3.6.26 “Asian” Emoticons
3.6.27 Other Non-linguistic Symbol Systems
3.7 Detailed Statistical Analysis of kamon
4 Writing Systems
4.1 Introduction
4.2 Writing
4.2.1 Preliminaries
4.2.2 Types of Writing Systems
4.2.3 A Side Note on Alphabets and Typewriters
4.2.4 Blissymbolics: An Attempt at a “Semasiographic” Writing System
4.3 Limitations of Writing
4.3.1 Inclusiveness
4.3.2 Graphocentrism
Limitations of Writing in Representing Speech
The Two-Dimensional Aspect of Writing
4.3.3 Summary
4.4 Writing: A Summary
5 Symbols in the Brain
5.1 Relevant Areas of the Brain
5.2 Meaning in the Brain
5.3 Reading Written Language
5.3.1 The Letterbox
5.3.2 Summary: The Evolution of the Letterbox
5.4 Reading Non-linguistic Symbols
5.5 A Hypothesis
6 The Evolution of Writing
6.1 What Is Known About the Evolution of Writing?
6.2 A Hypothesis
6.3 Scribal Schools
7 Simulating the Evolution of Writing
7.1 Previous Work on Computational Modeling of the Evolution of Writing
7.2 A New Computational Simulation
7.2.1 Description of the Model
7.2.2 Simulation of Evolution
7.2.3 Summary and Discussion
7.3 What Types of Symbol Systems Could Have Evolved into Writing?
7.4 Summary
7.5 Details of the Model
7.5.1 Data Generation
7.5.2 Model
7.6 Semantic-Phonetic Compounds from Experiments
7.6.1 Monosyllabic Cases
7.6.2 Sesquisyllabic Cases
7.6.3 Disyllabic Cases
8 Confusions and Misrepresentations
8.1 Introduction
8.2 What Does Writing “Look Like”?
8.3 The Statistical Analysis of Symbol Distributions?
8.3.1 Statistical Analysis of the Indus Valley Inscriptions
8.3.2 More on Structure in the Indus Inscriptions
8.3.3 Variations of Distributions of Symbols
8.4 Summary
9 The Future of Graphical Symbols
9.1 The Dream of a Universal Written Language
9.2 A Fully Expressive Semasiographic System?
9.3 The Social Status of Writing
9.4 Final Thoughts
Figure Credits
Bibliography
Author Index
Index


Richard Sproat

Symbols: An Evolutionary History from the Stone Age to the Future


Richard Sproat
Google Japan
Shibuya City, Tokyo, Japan

ISBN 978-3-031-26808-3
ISBN 978-3-031-26809-0 (eBook)
https://doi.org/10.1007/978-3-031-26809-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover illustration: The image on the cover is based on an idea of the author and represents the Ancient Egyptian hieroglyph for “scribe”. It depicts the scribe’s equipment consisting of a tube for holding reeds, a leather bag for holding ink and a palette for mixing ink.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

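Todo lenguaje es un alfabeto de símbolos cuyo ejercicio presupone un pasado que los interlocutores comparten. (Every language is an alphabet of symbols whose use presupposes a past that the interlocutors share.) Jorge Luis Borges, “El Aleph”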

Preface

Nearly 20 years ago I was contacted by Steve Farmer, a comparative pre-modern historian. I had recently published a book (Sproat, 2000), where I developed a formal computational theory of writing and its relation to the language and speech it encoded. He had found that and a previous book of mine useful in helping him understand computational linguistics and formal models of language, and he had a question for me. His question was deceptively simple: was it “possible to distinguish statistically any linguistic character string represented by a fairly large corpus of texts from non-linguistic symbol chains?” Steve’s interest in this question stemmed from his collaboration with Harvard Indologist Michael Witzel. They had recently published an article (Witzel and Farmer, 2000) which debunked a recent, much-heralded “decipherment” of the so-called Indus Valley Script, the to-date largely uninterpretable collection of short cryptic “texts” from the Bronze Age Indus Valley Civilization (3rd Millennium BCE). After his work on that article, in which he and Witzel largely accepted the standard assumption that the Indus Valley had a full-blown writing system, Steve started to consider various aspects of the Indus Valley culture and came to suspect that the symbols were not writing at all, but some sort of non-linguistic system or at best a “proto-writing system.” For one thing, the only surviving texts (on non-perishable materials such as steatite) are exceedingly short. Despite claims that there must have been long texts (indeed a whole lost literature) on perishable materials, no evidence of the non-perishable paraphernalia required to produce such texts, such as writing utensils or ink pots, has been forthcoming. Nor, despite 700 years of use, do the Indus symbols seem to show the kind of evolutionary shape changes found in writing systems that are in wide scribal use (Kelly et al., 2021).
Such a situation, stable over 700 years, flew in the face of historical developments found in all known literate civilizations (Farmer et al., 2002). Hence his question to me: nobody knew what the symbols meant, but would it be possible to tell, just from the distribution of the symbols in texts, whether they represented language or not? To put it in modern terms, if you didn’t know either written English or mathematical symbology, could you tell that the former represented a language and the latter was instead a formal non-linguistic system? At the time, I didn’t think there were any reliable tests, and I still do not, a topic I take up later in this book. But that initial email led to a collaboration that resulted in a paper (Farmer et al., 2004) that managed to convince people or offend them in roughly equal measure. This led to further controversy about the Indus symbols in particular, and more generally about whether statistical methods are informative about the kinds of things graphical symbols denote, a topic which, again, I take up later in the book. But it also got me thinking about non-linguistic symbol systems and their status. Written language is prominent in our minds because it is so ubiquitous in the modern world, and we tend to think much less about the various graphical notation systems that communicate information without reference to language. Yet these systems are also ubiquitous, also communicate various kinds of information, and, most importantly for one of the themes we will develop in this book, long predate writing in the history of human civilization. They are also, apparently, largely misunderstood. Many of the critiques of our 2004 paper involved the presumption that non-linguistic systems are simple and structureless. As we shall see in this book, nothing could be further from the truth. Yet it is this misunderstanding that has led people, as we shall see, to assume that evidence of structure in a symbol system is ipso facto evidence of linguistic structure. One purpose of this book is to try to put such misconceptions to rest, and one way to achieve that goal is to examine in some detail a variety of non-linguistic symbol systems with different symbol set sizes, different ways of combining symbols, and different functions, and, in the process, to develop a taxonomy of such systems. It will become clear as a result that the structure in a symbol system simply relates to the complexity of the domain that the system is used to represent.
Another purpose of the book is to better understand the relationship between non-linguistic symbol systems and a type of symbol system that has a special place in the history of humankind: true writing. To this end I delve in some depth into how writing works, how it relates to the speech that it encodes, and how speech and writing differ in terms of what they can and cannot easily express. But there is another aspect of the relationship between non-linguistic systems and writing that needs to be understood: evolution. It is widely accepted that writing evolved out of formerly non-linguistic systems. In Mesopotamia, where the evidence is clearest, the non-linguistic system was an accounting system. Also widely accepted is that the key point in that evolution was the realization that symbols could be used not just for what they mean but also for the sound of the words for the concepts they originally encoded. This allowed symbols to be transferred to represent other words that sounded similar. In this book, I explore what this must have meant in neurological terms, and I offer a hypothesis as to the institutional context in which the symbol-sound correspondences would naturally have been trained. I offer a computational simulation in support of this hypothesis. If I am successful at these various quests, it will have been in large measure because of the many scholars I have interacted with over the years, who have helped me understand the limits of my thoughts and helped me revise them.

Most people who have come to the study of writing systems, and of graphical symbol systems more generally, have come to it from a background in the humanities, often with a specialization in one or another writing system, ancient or modern. My background is different. Trained as a formal linguist, I took up computational linguistics when I moved to AT&T in the mid-1980s, and thence moved into an area that was a research topic of interest at Bell Labs at the time: text-to-speech synthesis. I became interested in the problem of language processing with a view to having a system (a computer) read text. This in turn drew me to the relation between written language and the speech that one can generate from it, which led to my interest in writing systems and to the 2000 book referenced above. That first email from Steve Farmer was the impetus for me to try to understand more about graphical symbol systems that were not tied to language, and the relation between these non-linguistic systems and written language. In this journey I have benefited from discussions with and feedback from many people including, at various times over the years, Michael Witzel, William Boltz, Christopher Woods, Edward Shaughnessy, and Suyoun Yoon; as well as audiences at talks I have given on related topics at Empirical Methods in Natural Language Processing, the Berkeley Linguistics Society, King’s College London, York University, the Max Planck Institute for Evolutionary Anthropology in Leipzig, Saarland University, Johns Hopkins University, Carnegie Mellon University, and the Signs of Writing Conferences in Chicago (2014) and Beijing (2015). I have benefited from extremely detailed comments on previous versions of this work by Jacob Dahl, Kyle Gorman, Alexander Gutkin, Steve Farmer, Zoltan Somogyi, and Brian Roark. I also thank Sven Osterkamp and one anonymous reviewer for feedback and suggestions. My interactions with Rajesh Rao and Rob Lee and their colleagues (see Chap. 8), while often heated, have proved very useful in that they forced me to think more deeply about the issues surrounding putative statistical tests for whether a symbol system is or is not true writing. As is often the case, the people who disagree with you the most are the ones who most force you to understand the issues better. Finally, I would like to thank Alexandru Ciolan, my editor at Springer, for his support during the production process.

Shibuya City, Japan

Richard Sproat


Chapter 1

Introduction

1.1 What’s in a Symbol?

Most people in the world will be familiar with the octagonal red stop sign. While the sign is typically accompanied by a written message (e.g. STOP), that message is not really needed for the sign to convey its meaning. To see this, one need only consider the stop sign in Fig. 1.1: even if you do not read Arabic, you would very likely know what this sign means if you saw it at an intersection. The red octagon is a conventional symbol for the notion “stop”. This of course is a simple case: a single symbol on its own conveying a simple piece of information. The stop sign is in turn one of a set of more or less conventional symbols used to convey information along roads. Typically these are also simple symbols, conveying relatively simple information. Sometimes such symbols may be combined: one may see a sign indicating a filling station and, next to it, a sign with a knife and fork indicating a restaurant. In this case the “message” is that nearby one can fill one’s vehicle and get a meal. The messages conveyed by road signs are invariably simple like this, and the ways of combining them (e.g. just lining them up) are also simple. More on this in what follows. But not all symbol systems are simple. Let us consider an example of a system that involves far more complicated combinations, namely medieval British heraldry. Let us say that I would like to have a coat of arms. Of course, to get one I would have to apply to the College of Arms,¹ an application that would almost certainly be turned down. But let us ignore that for now. As a child I always liked the color blue, so I have decided that my background will be blue, or azure in the terminology of the formal language blazon. (See Sect. 3.5 for more on blazon.) I want to keep the design simple, so apart from the azure field, I’ll just add a single ordinary charge, namely a bend. According to the first rule of heraldry, since my field is a color, my bend must be of one of the two designated metals (it could also be a fur), namely gold (or) or silver (argent). I am going to pick argent, because or invariably renders as yellow, and yellow on blue is a bit garish, in my view. Finally, being addicted to bad puns, I am going to design my arms to be canting arms, which typically involve a pun on the name of the bearer. In my case I will pick brussels sprouts: sprout is similar to Sproat (in my experience some people never seem able to make the distinction), and, owing to this resemblance, my nickname in elementary school was brussels. I will pick three brussels sprouts, since three is a common number for a charge, and I will place them on the bend. Since brussels sprouts would otherwise look just like green circles, I am going to pick a realistic-looking rendition for them, in blazon terminology proper. And there we have it: Azure, on a bend argent three brussels sprouts proper. See Fig. 1.2.

¹ www.college-of-arms.gov.uk.

Fig. 1.1 An Arabic stop sign. Source: Wikipedia. https://en.wikipedia.org/wiki/Stop_sign#/media/File:Saudi_Arabia_-_Road_Sign_-_Stop_(Arabic).svg, Author: Qrmoo3. License: CC BY-SA 4.0

Fig. 1.2 The shield of my hypothetical coat of arms: Azure, on a bend argent three brussels sprouts proper

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_1

The point of this bit of fiction is to illustrate some of the features of symbols. In the unlikely event that I applied to the College of Arms for this shield and my application were granted, these arms would, first of all, represent me and would in effect mean me: I could have them printed on household items or on letterhead, and they would identify those items as being related to me. More technically, the denotation of the arms would be me, though denotation cannot generally be thought of in such simple terms as a symbol being related directly to an object in the real world. (After all, the denotation of a symbol may not even exist in the real world, as is the case for the denotations of words like dragon or unicorn.) The symbol itself is really a message of sorts composed of simpler symbols, some of which are abstract, such as the bend and the colors and metals (colors and metals being basic parts of the symbol system of heraldry), and some of which are iconic, such as the sprouts, which are depictions that clearly evoke an object. As with many complex symbol systems, the rules of heraldry restrict how things may be combined, hence the limitation on how I can color the bend, noted previously. So much for the denotation and a crude characterization of the basic symbols making up this complex symbol. There are in addition the various connotations evoked by my choice of symbols. Most obviously there is the pun on my name, along with a bit of personal history embedded in the design. There are also my color preferences. The pun will likely be obvious to the observer, the other choices perhaps not so obvious, but all of them are part of the history or, to couch it in linguistic terminology, the etymology of the symbol.

All symbols carry with them a set of denotations and a set of connotations. The denotations are simply what they are used for. The ‘A’ at the beginning of this paragraph is used to represent, in its most basic linguistic use, a particular vowel (or, in English at least, a set of vowels depending on the context). In other uses it has other denotations: for example, it can denote the highest letter grade in an academic setting, a particular set of musical notes, or a particular train in the New York City subway system.
But depending on the particular use case, the other denotations effectively become connotations, insofar as they are typically irrelevant in the given context: if I am reading the letter ‘A’ in the word ‘insofar’, the academic-grade sense of that symbol is irrelevant. And there are connotations that may arise because of the history of a symbol. In the case of ‘A’, the letter descended over nearly 4000 years from what was originally, in the ancestral Proto-Sinaitic scripts, a picture of an ox’s head (compare Phoenician ’lp ‘ox’). It originally represented a consonant, a glottal stop, and was only later reinterpreted as a vowel by the Greeks when they borrowed their alphabet from Phoenician (Gnanadesikan, 2009). If you happen to know this history, that connotation is lurking in there somewhere. It is fair to say that a lot of the perceived ‘magic’ of symbols relates to their connotation as much as to their denotation. Symbols have often been believed to have special powers and hidden meanings. Kabbalists such as Franciscus Mercurius van Helmont (1614–1699) found hidden meanings in the letters of the Hebrew alphabet far beyond the mundane use of those symbols to represent the sounds of the Hebrew language. See Sect. 3.3. Symbols are a technology, where most of the time the purpose of the technology is to communicate some sort of information. As James Burke showed in his 1970s-era book and TV series Connections (Burke, 1978), any given piece of technology has a history that can involve many unexpected twists and turns: the (now thoroughly antiquated) Hollerith card used in computing had its origins in cards used to control the operation of the Jacquard loom. In similar fashion, symbols often have a complex history and thus a lot of excess baggage, to which is often added imagined baggage of the kind that the Kabbalists were fond of. But ultimately symbols are just a technology. We explore some of the properties of that technology in this book. One of those properties is that many symbol systems have clear rules on how one can combine basic symbols into more complex messages, like the color restriction (“first rule of heraldry”) noted for my imagined coat of arms described previously. We turn now to a brief introduction to the topic of syntax.

1.2 Syntax

I have five different signs in the top row of Fig. 1.3, represented as five differently colored hexagons. What, if anything, can we say about the fact that these five different signs are ordered in the particular way they are? If you think about this for a moment you will quickly realize that the answer must depend upon what the signs are, i.e. what they represent. Suppose for example that the black hexagon represents the English word the, the white hexagon the word dog, the red hexagon is, the blue hexagon very and the yellow hexagon hairy. Then the five signs make up the sentence the dog is very hairy. In this case one can say a lot about the order, namely that in this particular order the signs make up a meaningful and grammatically correct English sentence, and that in most other orders they do not. In other words, the words are combined into a sentence according to the rules of the syntax of English. Move now to the second row: here we have a row of five road signs telling us that at this place one can buy fuel, find a telephone, get a meal, get a room for the night, or find a hospital.

Fig. 1.3 Messages with five signs

Fig. 1.4 A typical arrangement of informative signs for fuel, food and lodging

Like the English sentence, these signs also communicate information. But in this case the order does not matter: one could reorder these signs in any way, and the “message” would be the same. There is no syntax here. Or is there? While it is true that it would actually make no difference if the signs were in a different order, there are certainly conventions on how these signs tend to get placed. Fuel signs tend to be placed first, perhaps because buying fuel is one of the most common things one does on highways, followed by getting a meal or needing a telephone. People tend to look for lodging only at certain times of the day, and with any luck, most people will never need a hospital while they are traveling. The arrangement in Fig. 1.4 is fairly typical. So convention tends to place these signs in certain orders. This is not really syntax: it would not be wrong to place them in another order. But imagine for a moment that one did not know what the signs meant, and that one merely observed that these signs tend to recur again and again in the same order. On purely statistical grounds one might well conclude that there is a syntax here, even when in fact there is not. But those issues aside for the moment, linguistic messages such as the dog is very hairy have a definite syntax, whereas non-linguistic signs such as our road signs do not. So is syntax a sign of language? If I can definitely ascertain that there is a syntax to a set of messages, can I conclude that this means we are dealing with something that relates to language? The final example should disabuse the reader of that notion. The arithmetic expression 2 + 3 = 5 also has a definite syntax. I can change the 2 and 3 around without violating the syntax or the meaning. I can swap the 5 and the 3 and have something that is syntactically well formed, but of course will now be false.
But, under the familiar arithmetic notation system being assumed here, I cannot, for example, swap the 3 and the + or the 5 and the = and still have a well-formed equation. The rules of the syntax of mathematical expressions are of course very different from those of natural language, at least in part because the former is a formal, artificial system developed by convention, whereas English is a natural language that evolved over time. But mathematical expressions have syntax nonetheless. The presence of syntax or structure does not, by itself, show that something is natural language. Perhaps, though, we need to clear up one possible misunderstanding about expressions like 2 + 3 = 5. Obviously I can read such an expression in English (two plus three equals five), so one might be inclined to think that this is after all just language. But while 2 + 3 = 5 can be read as a linguistic message, one can in fact do this in practically any language (or at least in languages of cultures that have non-trivial counting and some notion of arithmetic), so that one could render this expression into French or Japanese as easily as into English. The fact that one can read a non-linguistic message using language is not the same as saying that the message represents language or linguistic information.2

All of this may seem trivial and obvious, but it is surprising how often this point is misunderstood. In this book we will make a clear distinction between linguistic symbol systems, i.e. true writing systems, and non-linguistic symbol systems, which are any other graphical symbol systems that convey some sort of information but are not tied to language. Indeed we will use the term writing throughout as synonymous with a linguistic symbol system. We acknowledge that there are those who use the term writing to denote any system of conventionalized graphical marks, but that is not the way we will use it in this book. It is important to be clear about this point since much confusion has been sown by vague uses of the term writing. We take up this issue again in Chaps. 4 and 8. Returning to the main theme, one of the most common misunderstandings about non-linguistic systems is the assumption that such systems are ipso facto structureless and without syntax. While many are indeed structureless, many are not. One of the goals of this book will be to convince the reader that non-linguistic systems may have structure, often rich structure. Whether or not a system displays syntax relates to the kind of message it is designed to convey, not to whether or not it is tied to language. But does the syntactic structure of some non-linguistic systems still depend on language in a different way? Is it because of the structure that natural languages have evolved over tens or hundreds of thousands of years that it is possible for humans to construct other, non-linguistic, communication systems that themselves have structure? Probably, though it would be hard to demonstrate that this must be the case.
The evolution of language itself, and what precisely it was that evolved (a distinct mental “module” devoted to language, as nativists have argued, or simply a complex use of cognitive functions adapted for other purposes), is a contentious issue. How writing itself evolved from prior non-linguistic systems is also unclear, though we will suggest some possible mechanisms in Chaps. 6 and 7. As to how more general graphical symbol systems evolved, and to what extent their evolution depended on the prior existence of a developed language faculty, we can only speculate. We do know that non-linguistic icons are processed differently in the brain from written language, in a way more akin to the interpretation of pictures (Huang et al., 2015), which at least suggests a more tenuous connection to language for non-linguistic systems. But those experiments considered only non-linguistic symbols in isolation, not non-linguistic symbols from a syntactically complex symbol system being used in complex constructions: such cases might involve processing more akin to what happens in the brain when spoken language is processed. We return to this theme in Chap. 5.
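The contrast drawn earlier in this section between syntactic well-formedness and truth for 2 + 3 = 5 can be made concrete with a short Python sketch. It assumes a deliberately tiny, hypothetical grammar that admits only expressions of the form NUMBER + NUMBER = NUMBER, with tokens separated by spaces. Real arithmetic notation is of course far richer, and the function names here are illustrative only.

```python
import re

# Hypothetical toy grammar: exactly NUMBER "+" NUMBER "=" NUMBER.
NUMBER = re.compile(r"\d+")


def is_well_formed(tokens: list[str]) -> bool:
    """Syntax only: does the token sequence fit NUMBER + NUMBER = NUMBER?"""
    if len(tokens) != 5:
        return False
    a, plus, b, eq, c = tokens
    return (
        all(NUMBER.fullmatch(t) for t in (a, b, c))
        and plus == "+"
        and eq == "="
    )


def is_true(tokens: list[str]) -> bool:
    """Semantics: evaluate a well-formed expression; ill-formed ones have no truth value."""
    if not is_well_formed(tokens):
        raise ValueError("cannot evaluate an ill-formed expression")
    a, _, b, _, c = tokens
    return int(a) + int(b) == int(c)


def swap(tokens: list[str], i: int, j: int) -> list[str]:
    """Return a copy of tokens with positions i and j exchanged."""
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


original = "2 + 3 = 5".split()
variants = {
    "original": original,
    "swap 2 and 3": swap(original, 0, 2),  # 3 + 2 = 5
    "swap 3 and 5": swap(original, 2, 4),  # 2 + 5 = 3
    "swap 3 and +": swap(original, 1, 2),  # 2 3 + = 5
    "swap 5 and =": swap(original, 3, 4),  # 2 + 3 5 =
}

for label, toks in variants.items():
    if is_well_formed(toks):
        verdict = "true" if is_true(toks) else "false"
    else:
        verdict = "ill-formed"
    print(f"{label:14} {' '.join(toks):12} {verdict}")
```

Run as is, it reports the original and the 2/3 swap as true, the 3/5 swap as well formed but false, and the two swaps that move + or = as ill-formed. Keeping the syntax check separate from evaluation mirrors the distinction in the text: a sequence can be grammatical and still be false.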

2 We will however argue later that the reading of non-linguistic symbols aloud was probably critical in the evolution of the first writing systems developed.

1.3 What this Book Is About

This book is about graphical symbols, what sorts of things they denote and how, in some systems, the symbols can be combined into complex messages. It is not about signs more generally. That is the domain of semiotics, which has been characterized by one prominent semiotician as being “concerned with everything that can be taken as a sign” (Eco, 1976, page 7), and which includes such things as smoke signals, or even rashes as symptoms of underlying diseases. The domain of this work is man-made graphical symbols,3 and in particular ones that are conventionalized in that their form and meaning are agreed upon by a large community of users. We will define this notion more formally in what follows. And since semiotics as a field would seem to relate to the topic of this book, I will briefly review that field (Chap. 2). I will point out where the topic of this investigation differs from, and in many ways goes beyond, what semioticians have typically dealt with. We will see that semiotics and the topic of this book have, at most, a non-null intersection: there are areas of semiotics we will not cover and, in contrast, we will be concerned with areas that have at best received lip service from semioticians. One of the distinctions we will introduce briefly in Chap. 2 is the important distinction between non-linguistic symbol systems and writing, which encodes linguistic information. Writing is perhaps the most familiar symbol system since it is one that most of us use every day. Less familiar are the many varieties of non-linguistic systems. Thus, in Chap. 3, I will present a taxonomy of non-linguistic graphical symbols, according to what they denote and what the possibilities are, in the given system, of combining the symbols into more complex messages.
In order to understand the workings of a couple of these in greater detail, I will also present in that chapter an in-depth comparison of two systems of heraldry, European heraldry and Japanese kamon, which served much the same function in the two cultures but differed in their syntactic combinatorics. Writing is a special case of a conventionalized graphical symbol system that has complex syntax, formally defined as a symbol system that encodes information from natural language.

3 Since I use the term “graphical” throughout this book, one might wonder where tactile systems (the most notable example being Braille) fit into this schema. At the risk of overextending the meaning of “graphical”, I will assume that tactile systems are also instances of graphical systems. As a practical matter, systematic tactile systems have been rare in history. Traditional symbol systems do include cases that probably at least had a tactile component: khipu (Sect. 3.6.12) may have been such a system. But the widespread use of a conventional symbolic system based on symbols that one could sense by touch seems to have been relatively modern. Braille itself was first developed in the nineteenth century by Louis Braille and was inspired by an earlier system by Charles Barbier (Barbier, 1815; Henri, 1952). Braille has of course spawned a whole family of tactile systems based on arrangements of dots, used to represent not only a large number of written languages, but also numerical and mathematical information, among others. But again, this is a quite modern phenomenon. In any case, it seems reasonable to assume that tactile systems fall under the rubric of graphical systems more generally. If nothing else, as we will mention in Chap. 5, Footnote 4, the processing of Braille by blind users seems to make use of the same areas of the brain as the processing of standard written forms by sighted users.

The literature on writing systems is significant, with quite a few books and other works having been added in just the last few years. Chap. 4 will merely review the main issues, focusing on the question of how writing systems encode linguistic information, what linguistic information is encoded, and what it takes for something to be a full writing system. As has been pointed out many times elsewhere, we will see that full writing systems must encode phonology. That is, they must have some way to represent the sounds of the language: they cannot just represent meanings. Unfortunately this empirically derived observation that all full writing systems must encode phonology has been misinterpreted as “speech centric” by those who prefer to emphasize the commonality between writing and other graphical systems. But as we will see, there is really nothing to argue about here. How are symbols processed in the brain? In Chap. 5 we review some of the literature on that topic, and point out some of the differences between how writing is processed and how other symbols are processed. What is crucial in the processing of writing, as opposed to non-linguistic symbol systems, is the involvement of the brain’s language-processing areas, in particular those related to phonology. Writing systems are special, but they evidently evolved from non-linguistic systems. While hundreds of writing systems have been developed throughout history, as far as we know the pristine invention of writing happened independently in only three, or at most five, places (see Sect. 6.1). How did it happen? Unfortunately the archaeological evidence on that point is almost non-existent in all but one culture (Mesopotamia), and sparse even in that case, so the best we can do is speculate. But we can at least do one thing: we can simulate the evolution of written language from non-linguistic symbol systems using computational models.
The evolution of writing will be the topic of Chaps. 6 and 7. Chapter 8 will be about a topic that seems to be a source of confusion. Given a symbol system where we do not know the meaning of the symbols (say, a symbol system from an ancient civilization), what can we say about that symbol system before we have established what the symbols mean? One of the most common assumptions when faced with an unknown system is that it must have been some form of written language, especially if it superficially “looks like” writing. (We will also discuss what people seem to think it means for something to “look like” writing, and some of the pitfalls in that assumption.) But can we ascertain this short of a decipherment? The past couple of decades have seen claims that one can determine the status of a system on the basis of statistical properties of the distribution of symbols in extant “texts” of the system. The little demonstration in Sect. 1.2, Fig. 1.3, ought to give a clue as to how successful such approaches are likely to be. They are more or less the computational equivalent of trying to determine, from our sequence of colored hexagons alone, what sort of system one is dealing with. In any event, we review some of this recent work and point out some of the difficulties it faces. Finally, in Chap. 9, we will look at one question that seems to be a recurrent theme in the popular press, namely whether emoji or some similar form of non-linguistic symbol system could replace written language as an effective and complete form of communication. In Neal Stephenson’s The Diamond Age, a large segment of the population’s written communication was via mediaglyphics, animated glyphs that are supposed to be able to communicate the same sort of information that conventional writing systems do. Then there are jocular exercises like Emoji Dick,4 which attempts to translate Herman Melville’s Moby Dick into emoji. It has in turn led others to wonder just how much of written communication can be replaced by sequences of little icons (WNYC, 2014). We close with some thoughts on these ideas.
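Two earlier points can be illustrated together with a toy Python sketch: the worry about judging a system from symbol statistics alone, and the road-sign conventions of Sect. 1.2. The panels below are invented for illustration and are not observed data. The conventional order (fuel, food, phone, lodging, hospital) is an assumption. As the text describes, a panel's meaning is modeled as the unordered set of services it advertises. A simple count of which sign precedes which then finds perfectly consistent ordering, even though, by construction, reordering a panel never changes its message.

```python
from collections import Counter
from itertools import combinations

# Assumed conventional order of highway service signs (hypothetical, for illustration).
CONVENTION = ["fuel", "food", "phone", "lodging", "hospital"]

# Invented panels, each listing its signs in the conventional order with some omitted.
panels = [
    ["fuel", "food", "lodging"],
    ["fuel", "food", "phone"],
    ["fuel", "lodging"],
    ["food", "lodging", "hospital"],
    ["fuel", "food", "phone", "lodging", "hospital"],
    ["fuel", "phone", "hospital"],
]


def follows_convention(panel: list[str]) -> bool:
    """True if the panel's signs appear in the assumed conventional order."""
    positions = [CONVENTION.index(sign) for sign in panel]
    return positions == sorted(positions)


def meaning(panel: list[str]) -> frozenset[str]:
    """A panel's message is the set of services it advertises; order plays no role."""
    return frozenset(panel)


def precedence_counts(sequences: list[list[str]]) -> Counter:
    """Count, for each ordered pair (a, b), how often a appears before b in a sequence."""
    counts: Counter = Counter()
    for seq in sequences:
        # combinations() yields pairs in their original left-to-right order
        for a, b in combinations(seq, 2):
            if a != b:
                counts[(a, b)] += 1
    return counts


def consistency(counts: Counter) -> float:
    """Share of co-occurrences in which each pair of signs appears in its majority order."""
    majority = total = 0
    for pair in {frozenset(p) for p in counts}:
        a, b = sorted(pair)
        ab, ba = counts[(a, b)], counts[(b, a)]
        majority += max(ab, ba)
        total += ab + ba
    return majority / total if total else 0.0


counts = precedence_counts(panels)
print(consistency(counts))  # every pair of signs is always seen in the same order

reordered = list(reversed(panels[0]))
print(meaning(reordered) == meaning(panels[0]))  # reordering leaves the message unchanged
```

The consistency score comes out at 1.0, the same kind of score one might expect from a language with rigid word order. That is the point: this statistic alone cannot tell convention from syntax.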

4 https://www.kickstarter.com/projects/fred/emoji-dick.

Chapter 2

Semiotics

The topic of this chapter is the field of semiotics, which broadly concerns signs and their meanings. As such it would seem to be closely related to the theme of this book. However, as we shall see, there are large areas of semiotics that are not directly relevant to the current study and, contrariwise, the current study goes well beyond semiotics in some ways. Indeed, the main reason for including this chapter is to explain how the present work differs in focus from much of what most semioticians are interested in. It is likely that semioticians will not be happy with my explanations, but I hope that the explanations will at least seem defensible. That said, the reader who is not particularly interested in semiotics (or who already knows the field but is not particularly interested in how the present work differs from it) may safely skip most of this chapter. However, such a reader will still want to look at Sect. 2.5, which introduces the semiotic notion of articulation, which we use elsewhere in the book.

2.1 Introduction

There are many ways of communicating information. The communication may be intentional, as in my writing this text with a view towards explaining a particular approach to symbol systems to a hopefully interested audience. The communication may be unintentional, as with an overt collection of symptoms that can be thought of as “communicating” the presence of an underlying disease. Or it may be intentional or unintentional depending on circumstance: when people speak they typically accompany their speech with facial expressions and gestures, of which they may or may not be aware depending on the case. In addition to intentionality, another dimension is conventionality. The written symbols used to write this book are conventional. As readers and writers of English, we as a community agree that this set of symbols assembled in a particular way can be used to encode messages that are to be read and understood in English.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_2

But if two people develop a secret code that only they understand, that is not conventional. Yet another dimension is graphicality. Again, the written symbols I am using to write this text are graphical insofar as they are (or at least show up as) marks on a surface. But speech is not graphical. Neither is music, though musical notation is. Consider, then, the following issues that one might study as forms of communication:

1. Written language and, more specifically, how writing encodes language.
2. Graphical signs that conventionally represent information but are not tied to language: mathematical notation, chess notation, traffic signs, guild signs, heraldic/emblematic signs …
3. Spoken languages and sign languages.
4. Spoken argots (codes), such as Boonville jargon (Rawles, 1966).
5. One-offs: you and I agree that an apple on the stoop means I am home but not to be disturbed. Or a one-time pad, which in cryptography denotes a secret key, known only to the sender and receiver, that is used to encrypt and decrypt a single message and is never used again.
6. Birdsong. Frog mating calls. Dogs marking locations with urine.
7. Slime mold chemical signals.
8. Genetic codes.
9. Clear mucus and a chronic cough are symptoms of pertussis.
10. Various more or less indirect ways of communicating information: metaphor, metonymy (“the ham and eggs wants more coffee”), irony.
11. Myth and what it communicates.
12. The connection between texts.
13. The main character Sen in Miyazaki Hayao’s Spirited Away, who is entrapped into working in a spirit world onsen, has to be understood in the context of child sexual slavery (Info, 2020).

All of these topics would fall somewhere in the broad field of semiotics, relating as they do to signs of one kind or another and what they communicate. In this work we will be interested in exactly two of these, 1 and 2, both of which are intentional, conventional, graphical forms of communication.
We will be largely unconcerned with the other examples, though to understand writing and how it works one needs to understand how it relates to the third item, language. Of course a semiotician might argue that by limiting ourselves in this way we are automatically handicapping ourselves, since we are not considering the full range of possible ways in which information may be communicated. True, perhaps, but the first two items on our list above form an important natural class. Writing in particular has been termed the “technology of civilization” (Powell, 2009), and there is a lot of merit to that view. Without writing, detailed record keeping, beyond what one could record using accounting symbols or other non-linguistic symbol systems, would depend on human memory. There is much reason to believe that in most purely oral cultures (cultures that lack writing), records are in general much more fluid than in cultures that possess writing (Goody & Watt, 1968; Goody, 1977). And writing in turn depends on non-linguistic symbol systems, our second item above, since it is generally believed that writing evolved from a prior non-linguistic system. Thus our object of study will be those semiotic systems that might, in principle, have evolved into writing. In any case, as we shall see, the first two items already form a rich set of cases, from which we can glean a lot about information and how it is communicated. Since we have introduced the term semiotics, it behooves us to say a little more about that field, which we do in the next section. We will in particular focus on why for the present study I am not making an active attempt to align my story strongly with that much broader field. In the subsequent two sections, we take up two themes, iconicity and syntax, which respectively have and have not been the main focus of semioticians. Finally we will end the chapter by discussing articulation, a semiotic notion that will prove useful in our subsequent foray into a taxonomy of non-linguistic symbol systems.

2.2 The Field of Semiotics

The field of semiotics, broadly construed, dates back millennia, at least to Plato and Aristotle, but the modern incarnation of what we recognize as the field starts with the work of Charles Sanders Peirce, an American philosopher of the nineteenth century (Peirce, 1868, 1934). More recent figures have notably included Umberto Eco (1976) and Thomas Sebeok (1977; 2001), who is credited with being one of the founders of the field of biosemiotics. Good reviews of the field can be found in Bouissac (1998) and Chandler (2002). In its broadest, most “imperialistic” charter, the field is “concerned with everything that can be taken as a sign” (Eco, 1976, page 7), and is in principle the theoretical discipline that deals with sign and symbol systems. Is the present study not, therefore, simply an instance of semiotics? There are several reasons, however, why we will not adhere too closely to the theoretical notions developed in semiotics. First, as we noted in the previous section, the notion of sign that we intend is much narrower than what semioticians usually consider. To further support the previous discussion on this point, consider the entry for sign in Bouissac’s Encyclopedia of Semiotics, which tells us that:

In scholarly writing, the term sign might include, for example, words, sentences, marks on paper that represent words or sentences, computer programs … pictures, ideograms, graphs, chemical and physical formulas, fingerprints, ideas, concepts, mental images, sensations, money, postures and gestures, manners and customs, costumes, rules and values, the orienting dance of the honeybee, avian display, fishing lures, DNA, objects made of other signs … and also nonrepresentational objects (perhaps in music or mathematics) that have types of structure characteristic of other signs. (Bouissac, 1998, page 572)

Clearly the graphical symbols we are interested in represent a far narrower concept than what Bouissac lays out. Second, semiotics provides no formal theory of the combination of signs in text, in other words the syntax, as laid out in the previous chapter. Look in any text on semiotics, and one will be hard pressed to find any serious discussion of the combinatorics of signs even though every semiotician recognizes that this topic is important. Only in linguistics has the issue received serious scholarly attention. But this is also an important topic for us, since one of the interests here is in combinatoric sign systems that might be mistaken for (written) language, or that might evolve into written language under the right set of sociological, political and linguistic circumstances. Syntax depends on formal mathematical models for its proper characterization. Part of the problem with much recent work in semiotics is that, to put it bluntly, the field has been effectively hijacked by deconstructionists, who are not particularly interested in formal mathematical models. But even non-deconstructionist semioticians, such as Eco, are not typically well versed in mathematical models of information. Cf. the following (incorrect) definition by Eco: “[according to the mathematical theory of information] information is only the measure of the probability of an event within an equi-probable system” (Eco, 1976, page 42). This appeals to the Shannon notion of information (Shannon, 1951), which, however, does not depend upon equiprobability.

We will also not make much use of distinctions, due originally to Peirce, between symbols, indices and icons. Briefly, an icon is a (graphical or auditory) device that resembles its intended referent. An index is a device that somehow points out its intended referent (a canonical example is a pointing finger). A symbol is something that has an arbitrary but conventional relation to its referent, like the word cheese in English, which in no way “resembles” cheese and has the meaning and form it does in Modern English only by accident of history. We will have more to say about icons specifically in the next section, but it should be pointed out that not all semioticians agree with the distinction.
Eco (1976), page 178, notably does not, rejecting the distinction largely for reasons internal to his own theory of signs, though he also points to the fact that even iconicity is conventional, thus nullifying some of the basis for distinguishing symbols from icons.1 For our purposes, the distinctions are largely irrelevant. As we shall see, many of the signs we consider would be considered iconic, and all early writing systems made use of iconic symbols. We do not deny that one may subclassify signs in this way; we simply will not make much use of the distinctions. In this work we will thus restrict ourselves to using the term symbol, possibly incurring the wrath of semioticians like Sebeok (2001), who notes (page 56) that:

‘symbol’ is the most abused term of those under consideration here. In consequence, it has either tended to be grotesquely overburdened, or, on the contrary, reduced to more general kinds of behavioural phenomena, or even to absurd nullity.

But here we will give it a rather specific meaning: a man-made graphical form, or more generally a set of such forms (since any given symbol may have variant forms), that is itself a member of a set, termed a symbol system (in semiotic terminology, a code), that has a well-defined cultural function. To anticipate some systems we will consider in the ensuing discussion: Mathematical symbols are each a member of the mathematical symbol system. Boy Scout merit badges are individually members of a set of symbols that has a well-defined function in scouting. Mesopotamian deity symbols were individually members of a set of symbols that had a recognized function of representing favored deities. And letters of the Roman script, as used for, say, English, are elements of the set of symbols used in the English writing system.

1 But see Sebeok (2001) for some arguments against Eco on this point.
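The point about Eco's definition earlier in this section can be made concrete. In Shannon's theory an outcome with probability p carries −log₂ p bits of information. The entropy of a source is the probability-weighted average of that quantity, H = −Σ p log₂ p. Neither definition requires the outcomes to be equiprobable; equiprobability is simply the special case in which entropy reaches its maximum, log₂ n for n outcomes. The Python sketch below computes both quantities for two made-up distributions.

```python
import math


def surprisal(p: float) -> float:
    """Shannon self-information of an outcome with probability p, in bits."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def entropy(dist: list[float]) -> float:
    """Shannon entropy in bits: the expected surprisal over a distribution."""
    if not math.isclose(sum(dist), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * surprisal(p) for p in dist if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]  # the equiprobable special case
skewed = [0.5, 0.25, 0.125, 0.125]  # made-up, non-equiprobable source

print(entropy(uniform))   # maximum for four outcomes, log2(4)
print(entropy(skewed))    # well defined, just lower
print(surprisal(0.125))   # a rarer outcome carries more information
```

The skewed source yields 1.75 bits of entropy against 2 bits for the uniform one. This is a perfectly well-defined measure of information for a system that is not equiprobable.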

2.3 Iconicity

As noted previously, many signs are iconic in that their graphical form evokes what they denote (Peirce, 1934), as opposed to being merely symbolic, where there is no particular association between the form of the sign and what it denotes. Thus an emoji such as the basic smiley face ☺ is iconic since it evokes a smiling face. The Mesopotamian pre-cuneiform symbol representing ‘sheep’, ⊞, by contrast, is not particularly iconic and would therefore be considered symbolic instead. Icons need not be pictures: a diagram is also iconic insofar as its components correspond in some way to components of what the diagram depicts. Indeed, Peirce considered diagrams to be a particularly important category of iconic sign. A computer program flowchart (for those of us old enough to remember when such things were commonly used) does not “look like” the computer program it represents, but the components of the diagram correspond to logical components of the program. Furthermore, as Sebeok (2001) stresses repeatedly, icons need not even be visual. There are many cases in which it is important that a symbol be iconic. In our discussion of heraldry and kamon in the next chapter, we will see many iconic symbols, and in that context iconicity is certainly important. A lion in heraldry should look at least like a stylized lion in order to evoke that animal and the connotations associated with it. A bear in canting arms for someone named Bearham should look like a bear in order to elicit the pun on the family name. As already noted, in all early writing systems the written symbols were to a large degree iconic, though in most cases the iconicity was lost over time: it is far from obvious, for example, that the letter “A” was originally a picture of an ox head, or “B” a house.
The iconicity originally grounded the symbols in objects in the real world, which they either denoted or evoked, but that grounding and the very reasons for it were lost to the mists of time. Some writing systems preserve iconicity, most notably Egyptian hieroglyphs, but also Mayan hieroglyphs. In these, iconicity in the written symbols can interplay in interesting ways with pictorial art. In Egyptian, for example, one can find cases where hieroglyphs are used both for their linguistic meaning and as elements of artistic design (Baines, 1989, 2017, 2021); see Fig. 2.1. Crucially, this integration of writing and art was restricted to the highly iconic hieroglyphs, and did not involve the cursive hieratic or demotic scripts.


2 Semiotics

Fig. 2.1 Left: example of integrating hieroglyphs in Egyptian art. 19th dynasty (thirteenth century BCE) example of personification of three hieroglyphs ’nkh djd ws 𓋹 𓊽 𓌀, a common phrase meaning literally ‘life, stability and power’. Temple of Seti I, Abydos. (Source: Baines 1985, Figure 19, based on Calverley and Broome 1935, used with permission.) A more canonical hieroglyphic inscription with this sequence is shown on the right. Source: www.worldhistory.org. License: CCA. Left panel: Baines (1985), Figure 19. Used with permission of the author. Right panel: https://www.worldhistory.org/image/4572/ankh-djed--was/, “Original image by Kyera Giannini. Uploaded by Mark Cartwright, published on 02 March 2016. The copyright holder has published this content under the following license: Creative Commons Attribution (CCA).”

But it is also easy to overstress iconicity, and overattention to it can be misleading. Egyptian hieroglyphs, because they are apparently iconic, are often lumped in people’s minds into the same bin as, e.g., petroglyphs, such as those of pre-contact North America, even though the latter are for the most part depictions of some person or animal, whereas the former, in their function in the Egyptian writing system, represent some element of the Egyptian language: either some semantic component or, more commonly, the pronunciation of the word. Thus the fact that the owl hieroglyph 𓅓 was clearly a picture of an owl should not make one lose sight of the fact that its basic function in Egyptian was to write the sound /m/. As is commonly said, looks can be deceiving. Or, to take another, perhaps more directly evocative example, consider the Modern Korean alphabet, hangul, the basic letters of which were designed to evoke something about the production of the sound that the letter writes. This was an explicit design principle laid out in the hunmin jeongeum (訓民正音), the “Correct Sounds for the Instruction of the People”, promulgated by King Sejong in 1446. Thus the /s/ letter ㅅ represents the teeth, which are involved in the production of the sound (the tip of the tongue being placed before the teeth); the /m/ letter ㅁ represents the lips, which are closed for the production of this sound; and the /k/ letter ㄱ represents the raising of the body of the tongue to contact the soft palate at the back of the mouth, part of the articulation of sounds like /k/. All of these, and many other aspects of hangul, are clearly iconic, since they stand in a direct relationship to the articulation of the relevant sounds. Furthermore, unlike the “A” and “B” examples above, the shapes, though they have changed somewhat since Sejong’s period, still retain this iconicity, so that it is easy to point out to students why the symbols have the shape they do.
Still, in the everyday use of the writing system, this iconicity is largely irrelevant. I am unaware of any evidence suggesting that Korean speakers, while reading or writing in hangul, are actively making use of the underlying iconicity of the symbols they are manipulating. Returning momentarily to the Egyptian case, to get the kind of artistic use of written symbols that Baines (2021) discusses, one does not necessarily even need a writing system whose basic symbols are iconic. The Arabic script, which is highly cursive, has long been used in elaborate designs such as the example in Fig. 2.2. More generally, the fact that one can incorporate written symbols into graphical designs in a way that has no counterpart in speech has led to some fundamental intellectual dissonance on the question of whether it is correct to argue that writing systems are able to function as full communication systems precisely because they encode linguistic information, more specifically phonological information. We return to a discussion of that issue in Chap. 4. Therefore, while there is clearly a legitimate basis for distinguishing between symbolic and iconic signs, as we have said, we will not concern ourselves too much with that distinction in this work, and will refer to both with the cover term symbol. Again, semioticians may not be happy with this (ab)use of the term symbol, but at least I hope it will be clear what I mean by it.

Fig. 2.2 Arabic calligraphic text in the shape of an ostrich. Source: Wikipedia. Image is public domain. Source: https://commons.wikimedia.org/w/index.php?curid=4177202 Author: GYassineMrabet, vectorised with inkscape from Caligrafia arabe pajaro.jpg. Image is released into the public domain

2.4 Syntax

Treatments of semiotics emphasize the two dimensions along which signs can relate to one another: paradigmatic and syntagmatic. To take an example from Chandler (2002), page 84, in a sentence like the man cried, the syntagmatic dimension refers to the arrangement of the words in the sentence, whereas the paradigmatic dimension relates to the fact that for any given word in the sentence, some other word could be substituted: thus for man one could substitute woman, for cried one could substitute laughed, and for the one could substitute a. While any treatment of semiotics will discuss these two dimensions, few have much to say about them, and about syntax in particular, other than to say that they exist and to illustrate what they entail. Chandler attributes the two dimensions to Saussure (1916), and of course within the field of linguistics itself, studies of syntax have been extensive. But there is little formal treatment of the syntax of non-linguistic systems. Writing systems, since they encode language, also typically inherit the syntactic properties of the languages they encode,2 but as we shall discuss in Chap. 3, many non-linguistic systems also exhibit syntax, and the existence or non-existence of a (non-trivial) syntax may be the key distinction between two otherwise similarly functioning systems. Again, syntax and combinatorics, which have at best played a secondary role in the field of semiotics, are central to the purposes of the present discussion. It is only the ability of symbols to combine into more complex messages that makes many symbol systems what they are. If mathematical notation only allowed for the expression of single symbols in isolation, it would be useless for representing mathematical information. And it is only because of combinatorics that people might believe that some ancient symbol system was writing. True, claims of “writing” have been made about even trivial inscriptions involving one or two symbols (Kammerzell, 2009). But even those who make such claims realize that there would be no hope of a decipherment into some language without the discovery of longer, syntactically complex texts.

2 Though the relation with the encoded language’s syntax is not always trivial: when the Chinese script was first adopted for Korean and Japanese, there was a difference between the word order found in the written form and the word order of the spoken language. Korean and Japanese both have Subject-Object-Verb order, whereas Chinese is Subject-Verb-Object. In both Korean and Japanese, notational devices were used to indicate how the Chinese written form should be read in the native language; see Handel (2019). A similar mismatch was found in early written Akkadian, which adopted Subject-Object-Verb order in its written form from Sumerian (Jacob Dahl, p.c.).

2.5 Articulation

One useful notion from semiotics is the notion of articulation (Chandler, 2002). Articulation simply concerns the number of levels of information found in a code. Starting at the top, the most familiar code, language, is doubly articulated. Language, whether spoken or written, consists of a level of signs that can be said to be the minimal units of meaning. In linguistics, these are typically identified as morphemes: units that are in general smaller than a word, but still carry meaning. A word may consist of a single morpheme, like dog, but all languages also have morphologically complex words consisting of multiple morphemes: dogs, for example, consists of two morphemes, the dog part and the marker s, which carries the meaning of plurality. This is the first level of articulation. The second level consists of the meaningless building blocks out of which the larger units are built. In spoken language these would be phonological units, typically the basic segments of the language, which linguists dub phonemes. In written language these would be the basic symbols of the writing system, often termed graphemes. This is not as trivial as it sounds, since in many writing systems the graphemes themselves may sensibly be decomposed further into units that are also meaningless. And in general we shall see that the assumption that the most basic layer in a system is uniformly meaningless is an oversimplification. Codes that have a single articulation lack the lower level of meaningless elements: all of the basic symbols have meaning, but the symbols still form a coherent system, so that they relate to one another in a conventional fashion. Chandler (2002) gives a system of traffic signs as an example of a singly articulated code (page 261). Finally, in unarticulated codes, the symbols do not form a coherent system. Depauw (2009) gives as an example the set of familiar automobile company logos. There is no systematicity: the BMW logo does not relate in any way to the Mercedes-Benz logo, and “[w]hen a new brand of cars comes on the market, a new logo is made, not by combining recurrent compositional elements, but by creating something completely new” (Depauw, 2009, page 208). This is in marked contrast to systems such as heraldry, where basic elements in the system are used over and over again to make new emblems. Language is often considered to be the only instance of a doubly articulated code (Chandler, 2002).
We will have reason to dispute this characterization: for one thing, it makes sense to think of many complex systems in terms of a set of primitives, many of which are meaningless in and of themselves, that are combined into larger units that do carry meaning. To understand articulation better it will be useful to consider another conception of the project of semiotics, proposed by Eco (1976), who argued (page 7):

A sign is everything which can be taken as significantly substituting for something else. This something else does not necessarily have to exist or to actually be somewhere at the moment in which a sign stands in for it. Thus semiotics is in principle the discipline studying everything which can be used in order to lie. If something cannot be used to tell a lie, conversely it cannot be used to tell the truth: it cannot in fact be used ‘to tell’ at all. I think that the definition of a ‘theory of the lie’ should be taken as a pretty comprehensive program for a general semiotics.

To see how this relates to articulation, consider the following statement which, if written by me, would be a lie: I am a citizen of Slovakia. While the sentence as a whole, taken at the first level of articulation, is a lie, and this in turn depends on the meanings of the individual words (also at the first level of articulation), there is no sense in which any of the individual letters that make up the words can be said to be either true or false. If I misspell Slovakia as Slovackia, the intrusive ‘c’ has no effect on the truth value. This is still a lie, just a misspelled one. It is important to understand, though, that Eco’s notion of lie here cannot in general be equated with truth or falsehood. If it were, there would be no way to account for, among other things, performative speech acts like “I dub thee Sir Walter”. Such locutions, first studied in detail by J. L. Austin (1955), are notable for not having truth values. Thus, as Levinson (1983) points out (page 229), if someone says “I dub thee Sir Walter”, one cannot felicitously respond “Too true”, since what the speaker said is not a statement, but rather a performative utterance, made in the act of conferring knighthood, and generally accompanied by other acts, such as tapping the recipient on the shoulders with a sword. But while such utterances cannot be true or false, they can be inappropriate: as Austin argues, a sentence like “I dub thee Sir Walter” can only be felicitously uttered by a person, usually a sovereign, who has the conventional constitutional power to grant knighthood. My utterance of the phrase, along with the accompanying theatrics, would not be an appropriate speech act. So if lying includes such infelicitous uses of utterances, or of symbols more generally, then many symbols can be used to lie, and thus meet the requirements of Eco’s conception above. I can, of course, lie directly, in the sense of truth or falsehood, using arithmetic expressions: 2 + 2 = 5 would be an instance of such a lie. But consider a case more or less directly parallel to the speech-act cases just considered. If, following on a theme introduced in the previous chapter, I create and use a shield as part of a coat of arms, I am by means of this shield implying that I am armigerous, that is, that I have the right to bear arms. Since I am not, this use would constitute a lie of sorts. Or, to anticipate an example we will introduce when we discuss heraldry in the next chapter, if I am armigerous, and I place upon my shield an escutcheon of pretense suggesting that I am married to an heiress when in fact I am not, that is also a lie. Clearly if I use the arms of another individual as my own, that is also a misrepresentation.
But note that the use of some of the basic charges of heraldry—again see the next chapter—such as my use of the bend in my “arms” in Fig. 1.2, is not generally something I can lie with, any more than a single letter in a sentence is generally something that one can lie with. These basic units are themselves meaningless and only take on meaning in the context of the larger message, as embodied in the notion of double articulation introduced previously.

Chapter 3

A Taxonomy of Non-linguistic Symbol Systems

3.1 Introduction

The basis of any natural science is a good understanding of the domain that one is studying and what inhabits it, and taxonomies serve as a starting point for organizing the diversity. Modern biology would have been unthinkable without Linnaeus, modern chemistry without Mendeleyev, modern linguistics without the understanding of how languages may relate to each other afforded by the work of Edward Lhuyd, William Jones and others. Lack of appreciation of diversity can often lead to naive beliefs. For example, a sizeable portion of the American public (38% according to a 2017 Gallup Poll; Swift, 2017) believes in Biblical myths of divine creation and Noah’s Flood. In order to believe that a wooden boat could house a pair of each biological kind, one would have to believe that the diversity of species is well represented by, for example, what one can find at one’s local zoo, rather than what one actually finds in nature. Similarly, many of the misunderstandings that we will take up in Chap. 8 about the difference between written language and non-linguistic symbol systems stem in part from an underappreciation of the diversity of non-linguistic systems, including their complexity and the kinds of information they can encode. To be fair, it is not as if there have been many systematic taxonomies of non-linguistic systems. There are, to be sure, in-depth analyses of individual systems, such as detailed work over several decades on Inkan khipu (Urton, 1998, 2017; Hyland, 2014, 2017). And in works about writing systems one sees discussion of selected non-linguistic systems, such as Harris’s (1995) treatment of symbols for knitting patterns, or musical and dance notation in the collection edited by Daniels and Bright (1996). But there is little that gives a coherent picture of the scope of non-linguistic systems, and the main thing that the selected systems have in common is that they are not writing.
In this chapter I attempt to rectify this situation somewhat by presenting an overview of non-linguistic systems and the dimensions along which they vary. This is based in part on my earlier work (Sproat, 2014) as well as later work by others (Morin et al., 2020). I do not pretend that this taxonomy is in any sense complete. Nor do I claim to have covered more than a fraction of the non-linguistic systems that have been created by humans over thousands of years. But I believe this taxonomy is a useful step in the right direction, and will serve as a starting point for future work in this area. In addition to laying out the criteria for classification, and presenting a taxonomy including a number of systems, I will also present an in-depth comparison of two systems, European heraldry and Japanese kamon, that served very similar functions in the cultures that used them but, due to cultural differences, ended up with very different syntactic behavior. As we shall argue, this difference is important, insofar as properties such as the syntactic complexity of a symbol system are related both to its function and to the circumstances under which it is used. Syntactic complexity need not imply that the system encodes linguistic information. But before we turn to the criteria for the taxonomy, and then the taxonomy of selected systems, we need to set the context by briefly examining the history of the use of non-linguistic symbol systems, to the extent that this is understood.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_3

3.2 A Brief History of Non-linguistic Symbols

Non-linguistic graphical symbols long predate writing, but nobody really knows for sure by how much. A recent study reported in Sehasseh et al. (2021) suggests that shell beads used as long as 142,000 years ago in what is now Morocco may have served their wearers as “signals of identity”. At least under the broadest possible interpretation of semiotics (see Chap. 2), where almost anything can be seen as conveying meaning, it is hard to show that this suggestion is wrong. More familiar cases, such as the famous Cro-Magnon cave art from perhaps 35,000 years ago, could possibly have been symbolic in a similar sense. Lewis-Williams (1997) and Clottes and Lewis-Williams (1998) argue that the palaeolithic cave paintings were deeply associated with shamanic rituals and represented a connection (a portal, if you will) between the present world and the spirit world, which was believed to exist on the other side of the cave walls. While the often highly realistic depictions of animals are the most famous manifestations of this culture, one also sees more or less standardized geometrical figures that seem to correspond to the kinds of geometrical arrangements of lines that are commonly associated with stage one of trance, and which are also found in shamanic rock art of much later cultures. In the three stages of trance, whether drug-induced or not, subjects ‘see’ different types of visions. The first stage is dominated by geometrical forms involving lines, dots, zig-zag figures, and so forth. The second stage involves trying to make sense of the geometrical figures in the first stage by turning them into everyday objects: a beast of prey, or a snake, for example. In the third and deepest phase, the subject enters the ‘spirit world’ and ‘sees’, and at the same time feels to be very real, fantastical creatures, and the subject him- or herself also has the ability to transmute into another guise, often that of an animal. See Fig. 3.1 and Clottes and Lewis-Williams (1998), pages 16–17, for more details.

Fig. 3.1 Stages of trance from Palaeolithic cave paintings. Stage one depicts geometric figures seen at the beginning of entering a trance state, stage two realistic animals associated with one’s transition into deep trance, and stage three fantastical beings representing spirit companions ‘seen’ in deep trance. Cf. Clottes and Lewis-Williams (1998), page 92, Figure 92. Source: R. McLean, Rock Art Research Unit, University of the Witwatersrand. Used with permission

Religious and magical uses seem to have been among the earliest functions of pictorial representations, possibly including what one might call symbols (Clottes & Lewis-Williams, 1998; Whitley, 2011). But what we are particularly interested in here are conventionalized graphical symbols, where a community of users agrees upon the denotation as well as what constitutes good and bad use of the symbols. Is there evidence of anything from the Paleolithic that fits that description? Of course, humans, for at least tens of thousands if not hundreds of thousands of years, certainly did have a conventionalized symbol system in everyday use: language. But what about graphical symbols? Most prominently, von Petzinger (2017) has argued that 32 abstract signs recur again and again in Paleolithic art across many locations, and may represent some form of early conventionalized symbol system. Also attested are some (rare) cases of such symbols being strung together in “texts”. Von Petzinger has even suggested that because such symbols are so ancient, and seem to have been in use even when people first arrived in Europe, the signs may have been carried with our ancestors when they first left Africa.1 The recurrence of the same symbols would seem to suggest that the system was conventionalized. Indeed, repetition of symbols across “texts” seems to be a minimal requirement for considering a symbol system to have become conventionalized: if each symbol in the system occurs only once, then it is hard to see how the system could be so considered. But what is also noteworthy about von Petzinger’s results is that it is the same thirty-two symbols that seem to have been in use for about 30,000 years throughout the Paleolithic and across a wide swath of territory. Some of the symbols may be argued to be “natural”, in the sense that their use would follow naturally from human cognitive predilections. Symbols in von Petzinger’s set that might fall into that category include two variants of hand prints, penniform symbols that look like feathers or trees, and a few forms involving lines crossing or joining (“X” and “Y” figures, for example) that have also been argued to recur again and again in linguistic scripts (see Sect. 5.3.1). But others involve a more motley collection of forms that do not obviously seem “natural”. One might interpret this as evidence of an exceedingly rigid conventionalization, set down early in human evolution. But such a static conventionalization across such a wide range of space and time is, of course, unparalleled in any other known symbol system.2

1 https://ideas.ted.com/what-the-mysterious-symbols-made-by-early-humans-can-teach-us-about-how-we-evolved/.

Just as I was preparing this manuscript for final submission to the publisher, an article appeared that claimed to have found an interpretation for a subset of these signs from the Upper Paleolithic (ca. 50kY–12kY BP).3 Bacon et al. (2023) argue that sequences of dots or strokes, and an accompanying “Y” symbol sometimes inserted into these sequences, which are associated with depictions of various types of animals, are to be interpreted as a counting system. Specifically, they argue that the numbers of dots or strokes represent the number of lunar months after the end of winter when important events such as mating occur in the life cycles of various game animals.4 The position of the “Y” symbol in the sequence is argued to correspond to the post-winter lunar month in which birthing occurs. They speculate that the “Y” symbol is iconic for birth, either because it represents one line becoming two, or because it is a depiction of legs parted in the act of giving birth.
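Because the proposed reading amounts to a simple decoding rule, it can be sketched in a few lines of Python. This is a minimal sketch of the interpretation as summarized above, not Bacon et al.'s own procedure: the string transcription ('.' for a dot or stroke, 'Y' for the “Y” sign), the names `read_marks` and `CountReading`, and the assumption that the “Y” occupies one month position and is counted in the total are all hypothetical choices made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountReading:
    event_month: int            # total marks = lunar months after the bonne saison
    birth_month: Optional[int]  # 1-based position of the "Y" sign, if present


def read_marks(marks: str) -> CountReading:
    """Decode a hypothetical transcription such as '..Y..'.

    ASSUMPTION (not from the source): the "Y" fills one month slot and
    counts toward the total number of marks.
    """
    if not marks or set(marks) - {".", "Y"}:
        raise ValueError("expected a non-empty string of '.' and 'Y'")
    if marks.count("Y") > 1:
        raise ValueError("expected at most one 'Y'")
    # Position of the "Y" (1-based) is read as the month of birthing.
    birth = marks.index("Y") + 1 if "Y" in marks else None
    return CountReading(event_month=len(marks), birth_month=birth)


print(read_marks("...."))   # CountReading(event_month=4, birth_month=None)
print(read_marks("..Y.."))  # CountReading(event_month=5, birth_month=3)
```

If the “Y” were instead taken as an extra sign outside the count, only the `event_month` computation would change, which is one reason the reading depends so heavily on how the original sequences are transcribed.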
If Bacon and colleagues’ analysis can be sustained, then these signs would be by far the earliest conventionalized symbol system to which we can assign a definite meaning.5 But again, these symbols would seem to have been rigid over a vast area and period of time.

2 In Sect. 3.6.4, in our discussion of Central Asian Tamgas, we suggest that because of the very large geographic range and duration of the system, it seems unlikely that they represented a single system.
3 I thank Tomás Melka for drawing my attention to this.
4 More specifically, they assume the starting point for the counting is the bonne saison, “the time at the end of winter when rivers unfreeze, the snow melts, and the landscape begins to green” (page 7), which they observe “varies by several weeks from the south to the north of Europe, but corresponds approximately to late spring”. Oddly, the paper makes no mention of climate change. The data they study date from between 35kY and 11kY BP, a period in the Ice Age when we know there was massive variation in temperature (Woodward, 2014), which, one would have thought, would significantly affect when the bonne saison occurred.
5 Note that while Bacon et al. (2023) venture to call this a “proto-writing system”, they also acknowledge that the system does not represent language. Rather, by “proto-writing” they seem to mean merely that the system was a conventional way of representing information. As we shall see later on (Chap. 4, and in particular Sect. 4.3.1), such a system can only be lumped with writing in the most inclusivist sense of the term. Certainly if by “writing” one means the kind of symbol system you are currently reading (which we argue in Chap. 4 is the common-language notion of the term), they provide no evidence to support the rather bold final claim that “a form of writing existed tens of thousands of years before the earliest Sumerian writing system” (page 15). As we argue in Chap. 8, this looseness of terminology about what one means by “writing” leads to all sorts of confusion.

Fig. 3.2 Animal motifs at Göbekli Tepe: left panel, a fox; right panel, ducks. Source: Wikipedia. Left panel: https://en.wikipedia.org/wiki/Göbekli_Tepe#/media/File:Göbekli_Tepe_Pillar.JPG. Author: Zhengan. License CC BY-SA 4.0. Right panel: https://upload.wikimedia.org/wikipedia/commons/2/26/Göbekli2012-3.jpg. Author: Klaus-Peter Simon. License: CC BY-SA 3.0

To find novel, but apparently conventionalized, symbols, one has to look much later, to the Neolithic. The unique Neolithic megalithic site at Göbekli Tepe (modern-day Turkey), dated to approximately 9000 years before present, affords a possible early instance of such a conventionalized graphical symbol system. Many of the motifs involve stylized (though sometimes realistic) depictions of animals (Fig. 3.2), as well as some more abstract figures, many of which occur on different megaliths at multiple locations. Peters and Schmidt (2004) list about 15 symbol types. They suggest as one possible interpretation that the animals were used as totems, representing particular groups of people (pages 209–212). Roughly contemporaneous with Göbekli Tepe, dating to about 10,000 years ago, were numerical tokens used to record commodities in Mesopotamia. Later on this system evolved to include “complex tokens” representing specific commodities. We will discuss this system more fully in Sect. 6.1, since it has been argued to be the precursor to the first writing (Oppenheim, 1959; Schmandt-Besserat, 1992, 1996). Again, the symbol system seems to have been small, with just a few token types. Yet another early and roughly contemporaneous system was the Vinča system (Sect. 3.6.1) from the Danube region of Southeastern Europe, dating to as early as 8000 years ago, and consisting of a somewhat larger system of about 200 symbol types (Winn, 1973, 1981). The function of these symbols is unknown, though Winn, at least in his earlier work, characterized them as being related to religious practice. So three systems that seem to have been somewhat conventionalized, with apparently different functions, evolved within a couple of thousand years of each other.
They all have one thing in common: they seem to have been associated with that period of the Neolithic when humans started to transition to more settled lives in villages, and started to practice agriculture. Very likely this more settled and structured life, and the need to have more or less permanent marks to indicate things ranging from property to spiritual functions, were the main impetus for the development of these conventionalized systems. It has been hypothesized that concomitant with the transition to settlement was the development of a more hierarchical power structure, which in turn led to the need for devices that could help reinforce that new structure. As Benz and Joachim (2013), page 20, suggest, as people became more settled:

the public display of symbols and new forms of rituals became necessary to convince group members to accept new rules …

The development of standardized symbols may have been a consequence of the increase in social hierarchies and inequality (Benz, 2017), which itself was afforded by the development of agriculture. Grain in particular is much less perishable than meat, and the introduction of large-scale grain farming made it easier for a smaller group of people to hoard resources than was the case in earlier hunter-gatherer societies (Mayshar et al., 2022). Most of the systems that we will discuss long postdate this period. There are a couple of practical reasons for this. First of all, systems that were invented within the last few thousand years are much better preserved than systems that date to many thousands of years ago. Second, in order to say anything much about a system, it is usually important to have some understanding of what the symbols denoted. While this is not in fact known for all of the systems we will discuss in this chapter, it is understood at least reasonably well for most of them. And again, such well-understood systems tend to be newer, and most of them postdate the invention of writing. For systems that are not in current use, understanding of what kinds of things the symbols denoted has depended, to a greater or lesser extent, on written sources. Even systems parts of which have been “deciphered” in modern times, such as khipu (Inka civilization, South America, see Sect. 3.6.12), have benefited from contemporaneous written sources that at least described some aspects of how the system was used. So in the case of khipu, it was known from early post-contact Spanish-language sources that the khipu were used to encode information important in administration, and Urton (2010) describes how Spanish colonial administrators, with the help of native khipu lenguas (“interpreters”), “translated” the contents of khipu accounts into Spanish.
All of which, of course, is a useful thing to know if one is interested in understanding what at least some of the khipu were about. Obviously for preliterate cultures from thousands of years before the invention of writing, such clues are not available, so there is often little beyond theorizing that can be said about the non-linguistic systems these cultures possessed. Before we turn to the taxonomy of symbol systems, one other point needs to be mentioned, and that is the “magic” power that symbols had for pre-modern people. So-called “sympathetic magic” denotes to the power that symbols were believed to have over the real-world objects they denoted. Bégouen (1924), for example, refers to the fear often manifest in hunter-gatherer cultures when an anthropologist attempts to draw or photograph them. The depiction is believed to allow the bearer

3.3 Preliminary Taxonomy


to exert influence over the subject. Reinach (1903) originally proposed the notion of “hunting magic” as an explanation of the function of Cro-Magnon cave art, and while Clottes and Lewis-Williams (1998) argue against this interpretation, they do not deny that early depictions could be viewed by their creators as having a magical force. Indeed, it is not just graphical symbols that could exert this force: (spoken) words also have an influence, which helps explain, for example, why many cultures have taboos about naming living people after dead people, or for that matter the avoidance of the number “4” in East Asian cultures because of the homophony of the Chinese-derived readings of the words for “4” and “death”. But the spoken word is of course evanescent: for it to have an influence it must be repeated. A graphical symbol, on the other hand, is more or less permanent: carve a symbol on a rock, and that symbol will be there every time you pass it. How powerful graphical symbols must have seemed to ancient peoples who believed in the power of symbols to control reality.

3.3 A Preliminary Taxonomy of Non-linguistic Symbol Systems

For any taxonomy one must start with the dimensions along which the items are to be classified. For graphical symbol systems the following dimensions suggest themselves. First, there is the function of the system, meaning the kind of information the system encodes. Second, there is the size of the system, in terms of the number of distinct elements; to a large extent this is determined by the function. Another dimension may be termed degree of multivalence: as we discussed previously, the degree to which a symbol’s connotations may be prominent relative to its primary denotation(s). As with size, this is to some degree predictable from the kind of information the system is intended to convey. A formal system is typically not well served if the symbols are highly multivalent. Religious symbology, on the other hand, often depends on multivalence, and in such symbol systems the many denotations and connotations of a symbol are central to the symbol’s power. Next one must consider the syntax of the system, that is, the constraints, if any, on combining symbols in the system into longer ‘messages’. If there is a non-trivial syntax, what is its dimensionality? Modern writing systems are in general linear (1D), except that, locally, basic symbols may combine in ways that make use of a second dimension (Sproat, 2000); such systems may thus be said to be 1.5D. Non-linguistic symbol systems may also be fully two-dimensional. Finally, what is the articulation of the system, in the sense of the notion from semiotics introduced in Sect. 2.5? Each of these issues is described in more detail in what follows.


3 A Taxonomy of Non-linguistic Symbol Systems

Function
What kinds of things do the symbols of the system denote? What kind of information is the system used to convey? We can distinguish at least the following categories (Sproat, 2014; Morin et al., 2020):

• Simple informative systems. In simple informative systems, the symbols by and large convey a single piece of information. For example, the graphics used in weather reports are icons depicting states of the weather, such as sunny, partly cloudy, raining, etc. Further examples: traffic information signs, and at least some uses of hobo signs and Gaunerzinken (Berendsohn, 2020; Streicher, 1928; Praßl, 2017); see Sect. 3.6.11.
• Emblematic systems. Symbols that indicate that the bearer is a member of a particular institution or rank, or has the right to bear this particular kind of symbol, etc. Examples include symbols of military rank and distinction, scouting merit badges, Phi Beta Kappa keys, and symbols of other scholarly fraternities. One might also consider grades on academic assignments or in courses as falling into this category. Special subclasses of emblematic systems include:
  – Heraldic systems. Heraldic systems usually represent a particular set of features of the bearer, possibly including marks of distinction. But such systems can often be highly combinatoric, involving “texts” built of many symbols, often with a quite rigid syntax. Examples: European heraldry, Japanese kamon, kudurrus, some functions of totem poles (Barbeau, 1950).
  – Guild symbols. Familiar examples include barber poles, the three balls denoting a pawnbroker, and red lanterns indicating houses of prostitution. See Hunt (2012) on guild signs in German-speaking areas of Europe.
• Religious iconography. Religious symbols could also be characterized as a simple informative system, except that here the notion of a “single piece of information” is much less clear, since such symbols are often intentionally highly multivalent in the meanings they evoke. Examples: Christian cross, star of David, star and crescent, dharmachakra, swastika (in its original Buddhist usage).
• Formal systems. In a formal system, the individual symbols have well-defined meanings, and there are generally strict rules on how the symbols may be combined. Examples: mathematical symbols, alchemical symbols, chemical notation, Feynman diagrams (Kaiser, 2005), programming flowcharts and Systems Biology Graphical Notation (Le Novère, 2009).
• Performative systems. Performative systems indicate a sequence of actions to be taken to perform a particular task. Perhaps the most familiar to many people today are the wordless assembly instructions that come with furniture from Ikea. Other examples: Silas John’s system for notating Apache prayers (Basso & Anderson, 1973), musical notation (McCawley, 1996), dance notation and other movement notation systems (Farnell, 1996), chess notation and other systems that can be used to indicate the sequence of plays in a game, knitting patterns (Harris, 1995), and programming languages. A particular subcategory of performative systems consists of mnemonic systems, where the point of the symbols is to evoke a narrative, such as a message to be transmitted, on the part of the carrier of the message. Example: Australian message sticks (Kelly, 2020).
• Narrative systems or “prompt” texts. Narrative systems are used to recount stories and as such are in one sense the most language-like of the non-linguistic systems. In narrative systems, the symbols typically represent actors or events in the story in an iconic way. Examples: Dakota winter counts (Mallery, 1883), (probably) Naxi symbology (Li, 2001), Yukaghir iconography (Sampson, 1985; DeFrancis, 1989).
• Purely decorative systems. Some systems that involve what are commonly thought of as symbols nonetheless seem to be purely decorative. In such systems, the symbols may derive historically from symbols that had meanings or ranges of meanings, but those meanings are quite irrelevant to their current use. A clear modern example is the use of Chinese characters in body tattoos, worn by people who may be completely unaware of a character’s original meaning and who use it solely because it looks “cool”. Other examples are Pennsylvania German barn stars (Graves, 1984) and Asian emoticons (see, e.g., Bedrick et al., 2012). In the latter, symbols from various scripts, along with other symbols, are combined into a “text” that represents an image, usually a face. The face itself may convey some sort of emotion (e.g. sadness, via a depiction of crying), but otherwise has no real meaning and mostly functions to decorate the surrounding text.

In reality, symbol systems rarely belong to just one category. Symbols of guild, for example, serve the dual function of indicating that the bearer is a member of a particular profession while also simply informing the viewer that a particular set of services can be had at a particular place.

Size
What is the size of the symbol set? Taking this in broad terms, one can ask whether it is small (fewer than 50 symbols), large (hundreds or thousands of symbols), or medium (between these two).
The size of the set can often be related to the function of the system and the kinds of information it encodes.

Degree of Multivalence
As already noted, any given symbol can carry a range of meanings, some of which may properly be called denotations, and others connotations: the extra baggage that a symbol may bear because of its history. Most symbols are therefore multivalent in the sense that they carry many meanings in addition to whatever may be considered the main function of the symbol. But beyond this, symbol systems differ, depending on their function, in how heavily they rely on multivalence. A formal system, such as mathematics, is ill-served if the meanings of the symbols are not precise. In religious symbology, on the other hand, multivalence may be the key function. The multivalent interpretation of the Christian cross, for example, as a symbol of Christ’s crucifixion, as a symbol of power, or through its association with death, can be directly traced to the fact that the symbol,


respectively, derives from a Roman instrument of torture, was used as a symbol of an all-powerful church, and has frequently been used to adorn grave markers. These multiple meanings (Christ’s sacrifice, the power of the church, death, and so forth) are all immanent in the Christian cross, and are critical to its function in the religion. The cross does not simply mean “Christian”: it carries all these other connotations with it. The “mystique” frequently associated with symbols in the popular imagination relates to this multivalence, and it has often been transferred, inappropriately, to symbols that in their basic function are anything but mystical. Thus Kabbalists like Franciscus Mercurius van Helmont (1614–1699) sought hidden truths in the Hebrew text of the Bible, and Helmont’s theory in particular found hidden meanings in Hebrew letters. Helmont’s basic theory was that the Hebrew letters represent the shapes the tongue takes on when making the sounds associated with the letters,6 but beyond this, Helmont (1667) thought he had found other hidden meanings, as in this example with the Hebrew letter he (ה):

When, therefore, the mouth opens, it must close again when it seeks rest; and then the tongue rises perceptibly, as is suitable for producing of the next letter. The name He is a definite article, meaning “this” or “that.” A certain mystical meaning concerning generation seems to be hidden in this letter, for all animals produce this sound when panting from the heat of lust. And for this reason it is probable that a He, but no other letters, was added to the names of Abraham and Sarah because many people were descended from them. (Translation by Coudert and Corse, pages 115–117.)

Thus, in Helmont’s view, he is the symbol for a sound (/h/), but in addition the symbol for a morpheme (the definite article, written with he), as well as a symbol with a deeper meaning related to generation. Another, genuine, case of multivalence is the Ndembu “Mystery of the Three Rivers” as described by Turner (1967), who uses the term “multivocal” with the same meaning as “multivalent” here:

This mystery (mpang’u) is exhibited at circumcision and funerary cult association rites. Three trenches are dug in a consecrated site and filled respectively with white, red, and black water. These “rivers” are said to “flow from Nzambi,” the High God. The instructors tell the neophytes, partly in riddling songs and partly in direct terms, what each river signifies. Each “river” is a multivocal symbol with a fan of referents ranging from life values, ethical ideas, and social norms, to grossly physiological processes and phenomena. (Turner, 1967, page 107)

Symbols, due to their origins, their supposed origins, and their accreted history, come to us as a package of all those past associations, real or imagined. The mystique lies in the notion that their use invokes that history and brings it to life, as it were. In the Ndembu Three Rivers example, the origin of the symbol is emphasized

6 Nonsense, of course, in the case of the Hebrew alphabet, as amusingly pointed out by Kempelen (1792), who noted the absurdity of claiming that any of the Hebrew letters resemble the shapes taken on by the articulators when making the corresponding sounds. He wondered why Helmont could not have undertaken the necessary introspection to realize that his own theory was ludicrous.


in its use, but normally the everyday use of a symbol is much more mundane and the symbol’s etymology can be effectively forgotten. To what extent a symbol’s multivalence is critical thus depends on the function of the symbol system in which it is used.

Syntax
While in some systems symbols tend to occur alone, one often finds “messages” consisting of several symbols arranged into a “text”. As we saw in the examples in Chap. 1, these combinations may be either trivial or meaningful. I can arrange a set of traffic information signs into a “text”, but that text will have no meaning beyond the combination of the individual signs. So the “fuel”, “phone”, “accommodation” and “food” signs may be arranged together on a sign post, but the meaning is no more than that here (i.e. at the upcoming exit) one can find fuel, phones, accommodation and food. Even though we can arrange the signs linearly, the ordering carries no information, and there is therefore no syntax here. We are simply dealing with the linear arrangement of a set of symbols (taken from a larger set), and we know it is a set in this case since the symbols do not repeat: it would be unusual to find two instances of a “fuel” symbol in such a message. Indeed, repetition of symbols may indicate that the arrangement of symbols into a message is conveying more than just the union of the meanings of the individual symbols. In five-day weather forecasts, icons can be used to represent the weather state for each day: sunny, raining, cloudy, and so forth. In this case the symbols can repeat, and the syntax actually conveys some information, namely the sequence of weather states as they are predicted to unfold over time. In formal language terms one could notate the trivial syntax of weather expressions as a simple regular expression: if Σ is the set of symbols, then a “message” in this symbol system is simply Σ+, in other words a string of one or more symbols from the set.
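The contrast between the two cases can be made concrete in a few lines of Python. The sketch below is ours, not part of the original analysis: the symbol inventories are hypothetical, and representing a traffic-sign message as an unordered set and a forecast as an ordered sequence matching Σ+ is just one way of modeling the distinction.

```python
import re

# Hypothetical symbol inventories (the alphabet Σ of each system).
TRAFFIC_SYMBOLS = {"fuel", "phone", "accommodation", "food"}
WEATHER_ICONS = {"sunny", "cloudy", "raining", "snowing"}

# Σ+ over the weather icons: one or more icons, separated by spaces.
_ICON = "(?:" + "|".join(map(re.escape, sorted(WEATHER_ICONS))) + ")"
FORECAST = re.compile(rf"{_ICON}(?: {_ICON})*")


def read_traffic_sign(signs):
    """No syntax: the message is only the set of services at the exit."""
    if len(set(signs)) != len(signs):
        raise ValueError("service symbols are not expected to repeat")
    if not set(signs) <= TRAFFIC_SYMBOLS:
        raise ValueError("unknown traffic sign")
    return frozenset(signs)  # ordering is discarded


def read_forecast(icons):
    """Trivial syntax: any string in Σ+, where position i stands for day i."""
    if FORECAST.fullmatch(" ".join(icons)) is None:
        raise ValueError("not a string in Σ+ over the weather icons")
    return tuple(icons)  # ordering and repetitions are kept
```

With this model, `read_traffic_sign(["fuel", "food"])` and `read_traffic_sign(["food", "fuel"])` compare equal, whereas `read_forecast(["sunny", "cloudy"])` and `read_forecast(["cloudy", "sunny"])` do not, and a forecast such as `["sunny", "sunny", "raining"]` is accepted with its repetition intact.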
Unlike in the traffic sign case, in weather forecasts symbols can repeat and order matters: fuel-accommodation means the same thing as accommodation-fuel, but sunny-cloudy does not mean the same thing as cloudy-sunny. The repetition of symbols is critical for distinguishing these two cases: in the case of traffic signs, any given “message” is nothing more than a set of symbols corresponding to the set of things one can find at a given exit. One can arrange them linearly and, as we noted in Chap. 1, there may even be conventions for placing certain elements of the set before others. But the linear arrangement itself carries no meaning. With weather symbols, on the other hand, the linear arrangement does carry meaning. In the first case there is really no syntax; in the second there is a simple (trivial) syntax. Going further, in mathematical notation the arrangement of symbols matters. Restricting ourselves for the moment to arithmetic expressions, one could describe the syntax of such expressions by saying, first, that an equation E consists of two arithmetic expressions joined by an equals sign, and, second, that an arithmetic expression X is either a number n, or involves one or another of


the operations O of addition, subtraction, multiplication or division applied to two expressions.7 In formal terms:

\[
\begin{aligned}
E &\to X = X\\
X &\to n \mid X\,O\,X\\
O &\to + \mid - \mid \times \mid \div
\end{aligned}
\]

In the above, the symbol ‘→’ means that the expression on the left can be rewritten as that on the right, and ‘x|y’ means ‘x or y’. For example, one expression that fits the grammar is: 1 + 3 × 4 = 10 + 3

An expression that does not is: 1 3 4 × + = 10 3 +
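The grammar can be turned into a small recognizer. This is a sketch under our own assumptions: tokens are separated by spaces, as in the two examples, numbers are strings of ASCII digits, and the subtraction sign is the Unicode minus (−) used in the text. Since every application of X → X O X eventually bottoms out in n, the strings that X generates are exactly n (O n)*, and that is the pattern the code checks.

```python
import re

# O → + | − | × | ÷  (the minus here is U+2212, as printed in the text)
OPERATORS = {"+", "−", "×", "÷"}
NUMBER = re.compile(r"[0-9]+")


def is_expression(tokens):
    """X → n | X O X.

    Every derivation of X bottoms out in numbers, so X generates exactly
    n (O n)*: an odd number of tokens, numbers at even positions and
    operators at odd positions.
    """
    if len(tokens) % 2 == 0:
        return False
    for i, token in enumerate(tokens):
        if i % 2 == 0 and not NUMBER.fullmatch(token):
            return False
        if i % 2 == 1 and token not in OPERATORS:
            return False
    return True


def is_equation(text):
    """E → X = X: exactly one '=' with a well-formed X on each side."""
    tokens = text.split()
    if tokens.count("=") != 1:
        return False
    eq = tokens.index("=")
    return is_expression(tokens[:eq]) and is_expression(tokens[eq + 1:])
```

Here `is_equation("1 + 3 × 4 = 10 + 3")` is true and `is_equation("1 3 4 × + = 10 3 +")` is false. Note that this only decides membership: the grammar as given is ambiguous (1 + 3 × 4 can be derived with either operator applied last) and says nothing about operator precedence.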

This grammar covers only simple arithmetic: the full syntax of all possible mathematical expressions is much more complicated, and would involve some two-dimensional syntactic operations to describe the layout of matrices and so forth. So one can ask whether the symbol system has a syntax. If it does not, the value for the syntax dimension is none. If there is a syntax, it could be trivial, as with weather icons, or complex. While the syntax of a symbol system is in part derivable from the function of the system, it is not always trivially derivable, since various factors may be at play. For example, since the individual symbols of Korean hangul designate phonological segments, there is no a priori reason to arrange them in anything other than a linear fashion, as one finds in Greek and Greek-derived alphabets. But hangul is not completely linear: the segmental symbols are arranged into syllable-sized blocks (see directly below). This partly non-linear arrangement into blocks serves a couple of other functions. First, it allows one to indicate syllables (or, in Modern Korean, more properly morphemes), since each block corresponds to one of these. Second, at the time of its invention in the fifteenth century, it made the system look less strange from the vantage point of someone used to reading Chinese characters, since these also group their components into blocks. This brings us to the next point about syntax.

7 We assume for the current discussion that all expressions are binary, so that 3 + 3 is an expression but 3 + 3 + 3 is syntactically one expression, 3 + 3, nested inside another, . . . + 3.


The English writing system is one-dimensional.

Fig. 3.3 The strictly one-dimensional arrangement of letters in English

Fig. 3.4 Examples of the arrangement of symbols in syllable-sized blocks in Hangul for the sentence ‘Hangul is a 1.5-dimensional writing system’. The overall text is linear, but within each syllable block the individual letters (jamo) are arranged not only left to right, but also vertically. Thus, for example, in the first block 한 the individual letters ㅎ and ㅏ are written left to right, but both are above ㄴ. In the syllable 원 there are four letters, with ㅇ above ㅜ, the combination of these to the left of ㅓ, and this total combination above ㄴ.
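The block arrangement described in the caption has a direct counterpart in Unicode, which encodes each precomposed hangul block as a single code point computed arithmetically from the positions of its initial consonant, vowel and optional final consonant (19 × 21 × 28 possible blocks, starting at U+AC00). The sketch below follows that composition formula; it is our illustration rather than anything from the book, and it takes compatibility jamo as input for readability. Note that Unicode treats ㅝ as a single vowel, even though, as the caption describes, it is drawn as ㅜ combined with ㅓ.

```python
# Compatibility jamo, listed in the order Unicode uses for syllable composition.
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")   # 19 initial consonants
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")  # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28; "" = none

SYLLABLE_BASE = 0xAC00  # 가, the first precomposed syllable
BLOCK_COUNT = len(INITIALS) * len(MEDIALS) * len(FINALS)  # 11172 blocks


def compose(initial, medial, final=""):
    """Pack the jamo of one two-dimensional block into one code point."""
    lead = INITIALS.index(initial)
    vowel = MEDIALS.index(medial)
    tail = FINALS.index(final)
    return chr(SYLLABLE_BASE + (lead * len(MEDIALS) + vowel) * len(FINALS) + tail)


def decompose(syllable):
    """Inverse of compose: recover (initial, medial, final) from a block."""
    offset = ord(syllable) - SYLLABLE_BASE
    if not 0 <= offset < BLOCK_COUNT:
        raise ValueError("not a precomposed hangul syllable")
    lead, rest = divmod(offset, len(MEDIALS) * len(FINALS))
    vowel, tail = divmod(rest, len(FINALS))
    return INITIALS[lead], MEDIALS[vowel], FINALS[tail]
```

For instance, `"".join(compose(*jamo) for jamo in [("ㅎ", "ㅏ", "ㄴ"), ("ㄱ", "ㅡ", "ㄹ")])` yields 한글, the word hangul itself: the text stays a linear string of blocks, while the 2D layout inside each block is recoverable from the arithmetic.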

Fig. 3.5 A simple case of quartering in Heraldry illustrating two-dimensional arrangement. This shows a common technique whereby arms A and B are combined into an arrangement with AB in the first row and BA in the second. See also Sect. 3.5.2. Source: Wikipedia. https://commons.wikimedia.org/w/index.php?curid=7637232. Author: Balmung0731. License: CC BY-SA 3.0

Dimensionality of Syntax
If there is a syntax, is it strictly linear, so that one can describe it in terms of the linear concatenation of symbols? Is it 1.5-dimensional, so that messages in the system are macroscopically linear, but at a finer-grained level symbols may be arranged in two dimensions? Or is it fully two-dimensional, so that to describe the syntax one must consider the layout in 2D space? These options are illustrated in Figs. 3.3, 3.4, and 3.5: two with examples from writing systems, and one with an example of quartering from European heraldry. As we shall note, quartering is in principle recursive, meaning that one can requarter arms that are already quartered. Complex examples of multiple quarterings do exist: see Fig. 3.23 later in this chapter.
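Recursive quartering can be modeled as a tree whose leaves are laid out in two dimensions. The following toy sketch is a simplification under our own assumptions, not a full account of how arms are marshalled: simple arms are placeholder strings such as "A", quartered arms have four sub-arms, `quarter` builds the common AB/BA pattern of Fig. 3.5, and `render` lays any nesting out on a grid. Real heraldry also allows, for example, more than four quarterings, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Quartered:
    """Arms divided into four quarters, listed row by row:
    top-left, top-right, bottom-left, bottom-right."""
    tl: "Arms"
    tr: "Arms"
    bl: "Arms"
    br: "Arms"


# A plain string such as "A" stands for simple, unquartered arms.
Arms = Union[str, Quartered]


def quarter(a: Arms, b: Arms) -> Quartered:
    """The common pattern of Fig. 3.5: A B in the first row, B A in the second."""
    return Quartered(a, b, b, a)


def depth(arms: Arms) -> int:
    """Number of nested levels of quartering (0 for simple arms)."""
    if isinstance(arms, str):
        return 0
    return 1 + max(depth(q) for q in (arms.tl, arms.tr, arms.bl, arms.br))


def render(arms: Arms, size: int = 0) -> List[List[str]]:
    """Lay the arms out on a size x size grid; by default the grid is just
    fine enough (2 ** depth) to show every level of quartering."""
    if size == 0:
        size = 2 ** depth(arms)
    if isinstance(arms, str):
        return [[arms] * size for _ in range(size)]
    if size < 2:
        raise ValueError("grid too small to show this quartering")
    half = size // 2
    top = [l + r for l, r in zip(render(arms.tl, half), render(arms.tr, half))]
    bottom = [l + r for l, r in zip(render(arms.bl, half), render(arms.br, half))]
    return top + bottom
```

`render(quarter("A", "B"))` gives `[["A", "B"], ["B", "A"]]`, while `render(quarter(quarter("A", "B"), "C"))` gives a 4 × 4 grid whose first row is A B C C: the already quartered AB/BA arms reappear, at half scale, in the top-left and bottom-right quarters. A linear concatenation of symbols cannot express this kind of nested layout.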


Table 3.1 Summary of the taxonomic dimensions and possible values. Note that some values may also be marked in the systems detailed in Sect. 3.6 as “NA” if the value is unclear or not applicable, e.g. the dimensionality of the syntax in a system with no syntax.

Dimension        Possible values
Function         Decorative, emblematic, formal, heraldic, narrative, performative, religious symbols, simple informative
Size             Small (< 50 symbols), medium, large (hundreds or thousands)
Multivalence     None to high
Syntax           None, trivial, complex
Dimensionality   1, 1.5, 2
Articulation     0, 1, 2

Articulation
Articulation was discussed in Sect. 2.5, and relates to the number of levels of information found in a code. To recap: if the basic symbols in the code are themselves meaningless, and combine to form meaningful symbols, the code is said to be doubly articulated. If the basic symbols themselves carry the meaning, the code is singly articulated. Finally, unarticulated codes are those where the basic symbols carry the meaning but do not themselves form a coherent system, so that if one needs a new symbol, it is invented from scratch. The dimensions discussed above are summarized in Table 3.1.

3.4 Examples of Systems

Section 3.6 gives an analysis of twenty-six symbol systems of a variety of types, with a summary of their taxonomic features according to the previous discussion (as far as these can be determined), along with further details on each system. In addition, a further sixteen systems are summarized. In each case we give a taxonomic table that itemizes the dimensions. For example, traffic signs have the following entries:

Function         Simple informative
Size             Medium
Multivalence     None
Syntax           None
Dimensionality   NA
Articulation     1
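The dimensions in Table 3.1 can be captured as a small record type. In the sketch below, which is illustrative only, the allowed values are transcribed from the table, None stands in for “NA”, multivalence is left as free text because the table gives only a range (“none to high”), and the one consistency check enforced (no dimensionality without syntax) comes from the table's note. A single function is recorded per system, although, as noted above, real systems often combine several.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values, transcribed from Table 3.1.
FUNCTIONS = {"decorative", "emblematic", "formal", "heraldic", "narrative",
             "performative", "religious symbols", "simple informative"}
SIZES = {"small", "medium", "large"}
SYNTAX_VALUES = {"none", "trivial", "complex"}
DIMENSIONALITIES = {1, 1.5, 2}
ARTICULATIONS = {0, 1, 2}


@dataclass(frozen=True)
class TaxonomicEntry:
    """One system's values on the six dimensions of Table 3.1.
    None is used for "NA" (unclear or not applicable)."""
    function: Optional[str]
    size: Optional[str]
    multivalence: Optional[str]  # free text on the "none to high" scale
    syntax: Optional[str]
    dimensionality: Optional[float]
    articulation: Optional[int]

    def __post_init__(self):
        checks = [
            (self.function, FUNCTIONS),
            (self.size, SIZES),
            (self.syntax, SYNTAX_VALUES),
            (self.dimensionality, DIMENSIONALITIES),
            (self.articulation, ARTICULATIONS),
        ]
        for value, allowed in checks:
            if value is not None and value not in allowed:
                raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
        if self.syntax == "none" and self.dimensionality is not None:
            raise ValueError("dimensionality must be NA when there is no syntax")


# The traffic-sign entry given in the text.
TRAFFIC_SIGN_ENTRY = TaxonomicEntry(
    function="simple informative",
    size="medium",
    multivalence="none",
    syntax="none",
    dimensionality=None,
    articulation=1,
)
```

Encoding the table this way makes the “NA” convention explicit: an entry claiming a dimensionality for a system without syntax is rejected at construction time.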

Two systems described there, heraldry and Japanese kamon, are only briefly summarized, since we treat them in far greater detail in the next section.

3.5 Kamon/Heraldry


Fig. 3.6 Some examples of tamgas from the Arys River Basin, first century BCE–third century CE. Source: Smagulov and Yatsenko (2019), page 164, Figure 1(8). Used with permission of the UNESCO International Institute for Central Asian Studies, Samarkand

3.5 An in Depth Comparison of Two Non-linguistic Symbol Systems: Japanese kamon and European Heraldry

Marks of ownership or manufacture by individuals or clans are one of the common functions of non-linguistic symbol systems. The functions of such markings include identifying property as one’s own, identifying the maker, or identifying individuals in situations where visual identification of the individuals themselves might be difficult, e.g. in battle. Simple marks of this kind include potter’s marks,8 used to mark a piece of pottery as being from a particular manufacturer. In Central Asia, some tamgas (Voyakin, 2019) (see Fig. 3.6) were used as identifiers for clans. Regensburg Cathedral has over 10,000 marks by stonemasons indicating their contribution to the construction of the cathedral over several centuries during the late Middle Ages (Fuchs, 2009). In this section we examine two of the more elaborate systems of identification of this kind, namely Japanese kamon 家紋, literally “family signs”, often also called just 紋 mon, or “crests” (Chamberlain, 1886; Chikano, 1993; Honda, 2004; Takasawa, 2011; Morimoto, 2013), and European heraldry (Burke, 1884; Fox-Davies, 1909; Friar & Ferguson, 1993; Slater, 2002). While heraldry may be more familiar than kamon to many readers, we discuss the latter first since, as we shall see, heraldry has one feature that makes it more complex than kamon, which makes kamon in some sense logically prior to heraldry. The similarity of function between kamon and heraldry has long been recognized. Thus Fox-Davies (1909) notes (page 12):

The family tokens (mon) of the Japanese, however, fulfil very nearly all of the essentials of armory, although considered heraldically they may appear somewhat peculiar to European eyes.

8 https://www.britannica.com/art/potters-mark.


As we shall see, the two systems indeed shared many properties, both functional and structural:

• Both were used to represent either clans (in kamon) or, in British heraldry at least, individuals.
• Both were used in particular in battle to identify armies associated with clans or individuals, and therefore had to be easily identifiable from afar.
• Both frequently made use of stylized depictions of plants and animals, though in kamon plant varieties vastly outnumbered animals, whereas in heraldry the reverse was true.
• Both also made use of many simple geometrical figures.
• In both it was not uncommon for the chosen motif to allude to some property of the name of the individual or clan being represented. In the case of heraldry this involved visual puns, called canting arms (in French armes parlantes).
• In both cases written language could also be incorporated into the design.

The use in battle led to structural constraints on the systems. In the case of kamon, the emblems were typically bicolor. In the case of heraldry, there is the so-called “first rule of heraldry”, or “rule of tincture”, which proscribes placing “metal” on “metal” or “color” on “color”; see below. In modern times, both systems have become associated not only with actual individuals or clans, but also with legal persons, namely corporations. Many of the familiar logos of some older Japanese businesses have their origins in kamon, such as the three diamonds of Mitsubishi: the name “Mitsubishi” literally translates as “three lozenges”, originally representing Trapa japonica, a plant with somewhat lozenge-shaped leaves. There are, however, some important differences between the two systems. We have already noted that in British heraldry in particular, the arms are associated with an individual, not a family, whereas in kamon the emblem may be associated with a clan.
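The rule of tincture lends itself to a simple check. The sketch below is a simplification under our own assumptions: it uses the standard heraldic names for the two metals and five common colors, treats every other tincture (for example furs such as ermine) as exempt, and ignores the deliberate exceptions to the rule that exist in real arms.

```python
METALS = {"or", "argent"}                                # gold, silver
COLORS = {"gules", "azure", "vert", "sable", "purpure"}  # red, blue, green, black, purple


def violates_rule_of_tincture(field: str, charge: str) -> bool:
    """Return True if a metal charge sits on a metal field, or a color
    charge on a color field. Any tincture not listed above (e.g. the fur
    ermine) is treated as exempt in this sketch."""
    field, charge = field.lower(), charge.lower()
    metal_on_metal = field in METALS and charge in METALS
    color_on_color = field in COLORS and charge in COLORS
    return metal_on_metal or color_on_color
```

A gold charge on a blue field (`violates_rule_of_tincture("azure", "or")`) passes, whereas red on blue or gold on silver does not. The practical motivation noted above, visibility from afar, is exactly what the metal/color contrast secures.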
In Japan the use of kamon is largely unregulated, whereas in Europe one generally has to be granted the right to use arms: in Britain, for example, such grants are managed by The College of Arms (www.college-of-arms.gov.uk). As a result kamon are far more widely distributed than heraldic arms, and many ordinary families have their own mon. Walk through any graveyard in Japan and you will see mon on practically every family tomb. Differences between Europe and Japan in the right of women to be armigerous (in heraldic terminology, to “bear arms”) and to transfer their arms to those of their husbands also led to an important syntactic difference between the two systems. To wit: kamon were essentially syntactically simple emblems, whereas arms in European heraldry are potentially syntactically complex, with possibly many levels of embedding (arms within arms), where these levels record marriage, inheritance from both the father’s and the mother’s side, and additional marks of distinction that the arms bearer might have won. In this section we describe these two systems, with a view to understanding this important difference. For European heraldry we focus specifically on British heraldry, but other European systems are similar in their general properties, though there


are differences of detail. The main take-away message will be that symbol systems evolve syntactic complexity when they need to encode more complex information than can easily be conveyed with a single symbol. Crucially, such complexity need have nothing to do with the system representing natural language: it may merely reflect the complexity of the underlying concepts being conveyed.

3.5.1 Kamon

The origin of kamon 家紋 dates to the later part of the Heian Period (CE 794–1185) (Okudaira, 1983). Kamon were originally used as marks on property, indicating family ownership. Later they came to be used on banners, and from after the Kamakura period (CE 1185–1333) onwards they were used consistently in battle, serving much the same function of identifying troops as European heraldry did. Kamon are frequently circular in shape, and typically have a simple motif. By far the most common motifs were plants, typically leaves or flowers of a particular species: Honda (2004) lists 2486 mon related to plants. The next largest sets involve human-made artifacts (902 mon), abstract figures (693 mon), and animals (395 mon). This is in marked contrast to European heraldry, where animal motifs predominate. Not infrequently, mon may also consist of one or more kanji (225 mon in Honda’s listing). Figure 3.7 shows two mon, the first being one of the most famous, the

Fig. 3.7 The crests of: (a) Tokugawa Ieyasu (1543–1616), with three hollyhock leaves (Japanese: 葵 aoi); (b) Ishida Mitsunari (1559–1600), with a text 大一大万大吉 dai-ichi dai-man daikichi, literally “great one, great ten thousand, great fortune”, but with the interpretation “if one (person) does everything for ten thousand, and ten thousand do everything for one, then that will lead to great fortune.” Source: Wikipedia. Left panel: Source: https://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:Mitsubaaoi.svg. Right panel: Source: https://ja.m.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:%E5%A4%A7%E4%B8%80%E5%A4%A7%E4%B8%87%E5%A4%A7%E5%90%89.svg Author: 百楽兎. License: CC BY-SA 3.0.


Fig. 3.8 A sample of mon of various lords of Matsumoto Castle, Nagano Prefecture. Panels: Hotta, kuromochi ni tate mokkō ‘vertical quince in a black rice cake’; Ishikawa, sasa rindō ‘bamboo and gentian’; Mizuno, maru ni tachi omodaka ‘standing Sagittaria in a circle’; Toda, hanare muttsu hoshi ‘six separated stars’.

hollyhock crest of the Tokugawa family, and the second being the kanji-text crest of Ishida Mitsunari, Tokugawa Ieyasu’s ill-fated opponent in the Battle of Sekigahara (October 21, 1600). Figure 3.8 shows an assortment of mon from various lords of Matsumoto Castle, Nagano Prefecture, which include a variety of motifs commonly found in mon. Comparable to the blazon of heraldry (Sect. 3.5.2), kamon have a standardized mode of ‘reading’ the emblem. They are read from the outside in, starting with a description of the surrounding circle, if any, then a description of what is inside that, and so forth. For example, consider Fig. 3.9, which can be read as 丸に井桁に竹に雀 maru-ni i-geta-ni take-ni suzume, ‘a circle surrounding a well frame surrounding bamboo, with a sparrow’. See Chikano (1993), pages 10–14, for further details. As we will also note for heraldry, the stories behind families’ choices of crest vary. Often the choice relates to a historical incident, as in the case of the unusual 武富 Taketomi family crest with a bow and two arrows (Fig. 3.10), which supposedly is due to an award of the emblem to the army of 坂上田村麻呂 Sakanoue no Tamuramaro (758–811), a general of the Heian period, for meritorious military service.9
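The outside-in reading convention can be sketched as a recursive walk over nested layers. In this illustration, which is ours and not the book's, a mon is either a bare motif or a pair (outer element, what it encloses), and successive layers are joined with the particle ni; the element names are those of Fig. 3.9. Actual readings follow further conventions (Chikano, 1993, pages 10–14) that this sketch does not attempt to model.

```python
from typing import Tuple, Union

# A mon is either an innermost motif (a string) or a pair
# (outer element, the rest of the mon inside it).
Mon = Union[str, Tuple[str, "Mon"]]


def read_mon(mon: Mon, particle: str = "-ni", sep: str = " ") -> str:
    """Read a mon from the outside in, linking successive layers with ni."""
    if isinstance(mon, str):
        return mon
    outer, inner = mon
    return outer + particle + sep + read_mon(inner, particle, sep)
```

With the romanized names, `read_mon(("maru", ("i-geta", ("take", "suzume"))))` produces `maru-ni i-geta-ni take-ni suzume`; with `particle="に", sep=""` and the kanji 丸, 井桁, 竹, 雀 it produces 丸に井桁に竹に雀. The innermost step (bamboo with a sparrow) is not strictly an enclosure, which is one way in which this nested-pair model is a simplification.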

9 http://www.bbweb-arena.com/users/takedomi/taketomi_003.htm.


Fig. 3.9 How to read a mon: 丸に井桁に 竹に雀 maru-ni i-geta-ni take-ni suzume ‘a circle surrounding a well frame surrounding bamboo, with a sparrow.’ Source: Morimoto (2006), page 64. Used with permission of the author.

Fig. 3.10 Unusual mon of the 武富 Taketomi family with a bow and two arrows

As with canting arms in heraldry, the selection of the motif is often related to the family name. This is clearer with some motifs than with others. Thus for mon based on the wisteria 藤 fuji, Honda (2004) lists 13 families that used this motif in their mon (page 150), and which have the character 藤 as part of the spelling of their name: 藤原 Fujiwara, 内藤 Naitō, 佐藤 Satō, 武藤 Mutō, 近藤 Kondō, 尾藤 Bitō, 首藤 Shutō, 安藤 Andō, 後藤 Gotō, 伊藤 Itō, 進藤 Shindō, 斎藤 Saitō, 加藤 Katō. With the exception of Fujiwara, all of these names use the Sino-Japanese reading of the character, tō (in some cases in its rendaku form dō), rather than the native pronunciation fuji. See Fig. 3.11 for an example of a wisteria design for the 藤井 Fujii family. Some further examples: 瓜 uri ‘gourd’ is used by the 瓜生 Uryū family (Morimoto, 2013). 麻 asa ‘hemp’ is used in the mon of the 大麻比古神社 Oasa Hiko Shrine. 稲 ine ‘rice plant’ is used for the 稲生 Inao and 稲富 Inatomi families (Honda, 2004). 海老 ebi ‘shrimp’, a somewhat unusual motif, is used in mon for the city of 海老名 Ebina (Takasawa, 2011). Family names with 竹 take ‘bamboo’, such as 竹田 Takeda, are often associated with mon with a motif involving 根笹 nesasa, Pleioblastus chino, a kind of bamboo (Takasawa, 2011). The Kyoto and Tokyo restaurant Kiku no I uses a chrysanthemum (菊 kiku) mon (Fig. 3.12). One difficulty with this issue is that all of the motifs above are also used with names that do not apparently relate to the motif, but of course we tend to notice when

40

3 A Taxonomy of Non-linguistic Symbol Systems

Fig. 3.11 “Falling wisteria” 下り藤 sagari fuji emblem of the 藤井 Fujii family

Fig. 3.12 Emblem of the 菊乃井 Kiku no I restaurant: 井桁に菊 i-geta ni kiku ‘chrysanthemum in well-frame’

something matches our expectations, and ignore it when it does not. In Section 3.7 at the end of this chapter I provide some statistical support for the idea that there is indeed an association between chosen motifs and names. One important difference between heraldry and mon in this regard is that unlike canting arms, there are very few true puns. Of the more than three hundred mon that are listed by Chikano (1993) as being related to the family name, virtually all of the names actually share a kanji with the name of the motif. For example, in the first set of examples above, the character 藤 does mean ‘wisteria’, and the family names all share that character. There are just a handful in Chikano’s lists that seem to be true puns. One example is the 雪輪に麻の葉 yukiwa ni asa no ha ‘hemp leaf in a snow wheel’ used for the 阿佐 Asa family (Chikano, 1993, page 17); see Fig. 3.13. Kamon were typically inherited father to son (Fig. 3.14) but, as in European heraldry, women could also have mon. These were called 女紋 onna mon “women’s crests” (Morimoto, 2006). The typical pattern for onnamon was to inherit motherto-daughter (Fig. 3.15). But there were other possibilities: Sometimes women’s crests could be inherited from other female relatives. Sometimes crests could be individually designed 私紋 watakushi mon or ‘private crests’. Onnamon could also be inherited by a daughter from the main family kamon. A typical situation would be a samurai family in the Edo period whose daughter married into another

3.5 Kamon/Heraldry

41

Fig. 3.13 雪輪に麻の葉 yukiwa ni asa no ha ‘hemp leaf in a snow wheel’ emblem used for the 阿佐 Asa family, apparently a pun

Fig. 3.14 Typical patrilineal transmission of kamon. Source: Morimoto (2006), page 27, used with permission of the author

samurai family and took the family crest as her onnamon as part of her dowry. One situation where this might occur was when a family was defeated by another family, and agreed to provide their family crest via the daughter in marriage to the winning family (Morimoto, 2006, page 91). Often when women’s crests were derived from the main family crest via various transformations that made them seem more feminine: some examples of these can be seen in Fig. 3.16.


Fig. 3.15 Typical matrilineal transmission of onnamon. Source: Morimoto (2006), page 28, used with permission of the author

Fig. 3.16 Some transformations used in deriving onnamon from kamon. From left to right: shrinking; color inversion; reversal. Source: Morimoto (2006), page 59, used with permission of the author

But women’s mon remained in the female line and were never combined with the kamon of the family into which the woman married: she adopted her husband’s name and became part of his family. And herein lies an important difference between kamon and European heraldry, one that led to a difference in the syntactic combinatorial possibilities of the former compared with the latter, to which we now turn.


3.5.2 British Heraldry

A complete achievement of arms has multiple components. From top to bottom:

• The crest, a figure such as an animal at the top of the achievement.
• Under the crest, a helm, or helmet, possibly with a wreath, and mantling.
• Under this, the shield, upon which the actual arms are displayed.
• To the left and right, from the viewer’s point of view, the dexter and sinister supporters, if any, typically animals or human figures.
• Around the shield, possibly a circlet, which will contain a motto reflecting the highest order the individual has achieved.
• At the bottom, the motto and any insignia representing honors accrued to the individual.

See Fig. 3.17 for an example that contains all of these except for the insignia and wreath. In this discussion we will concern ourselves only with the central part of the achievement, the shield. In what follows we touch upon only the more salient aspects of the rules of heraldry: the classic work on this topic, particularly for British heraldry, remains that of Fox-Davies (1909),10 and the interested reader is referred there for considerably more detail.

We will make considerable reference to blazon, the formal language used to describe a coat of arms. Blazon has its own peculiar terminology for the colors and metals (see directly below), the charges (i.e. the figures that appear on the shield), the additional devices used on the shield, and the layout. A correctly blazoned set of arms starts out with a description of the color (or metal) of the field (i.e. background), and proceeds to a description of the charges, their colors (metals), and additional devices. If the arms are quartered (see below), then the blazon starts by noting that, and proceeds to describe the quarters. A simple example of a blazon for an unquartered shield would be:

Sable, a lion passant guardant or.

meaning that the field is black (sable), and superimposed on it is a gold (or) heraldic lion, walking (passant), with its head facing (guardant) the viewer. Blazon looks a lot like an odd mix of French and English, with some strange vocabulary, but should really be thought of as a formal language.

As already mentioned, the “first rule of heraldry”, or “rule of tincture”, proscribes placing metal on metal or color on color. The metals are gold and silver, termed respectively or and argent in English blazon. In practice these are conventionally depicted as yellow and white respectively; the use of yellow and white as colors is rare, though attested. The colors are red (gules), green (vert), blue (azure), black (sable) and, more rarely, purple (purpure), murrey (somewhere between red and purple, termed sanguine in blazon) and orange (tenné). The origin of this rule seems to have been the convenience of identifying a shield from a long distance, for example in a battle situation. If one placed silver on gold, then the details of the arms would be hard to see, whereas red on gold, say, would be easier to make out. In this usage, then, heraldic shields were similar in function to the kamon of the Kamakura period.

Instead of colors or metals, a field may also have a fur, the main two of which are ermine (in its basic form, white covered with black spots) and vair (Fig. 3.18), which resembles lines of repeated obelisks. Since these involve repeated shapes, the furs may also be combined with colors. Charges may be the so-called ordinaries, a closed set of mostly basic geometrical shapes, or other charges, an open-ended set of animals, mythical creatures, humans, plants and other objects. Amongst the animals, by far the most favored symbol was the heraldic lion in various forms: Fox-Davies (1909) devotes a whole chapter to the lion. Animals and other non-ordinary charges were typically stylized and depicted in one of the heraldic colors or metals. Charges depicted in their natural appearance and colors were termed proper in blazon. Specialized terminology is used to describe specific features of charges. For example, a lion may have its tongue extended, in which case the term langued would be used: a lion rampant or langued gules would denote a gold lion in the rampant (forefeet in the air) position, with a red tongue protruding. Some common charges are shown in Fig. 3.19.

10 Available open-source on the Internet Archive, among other places.

Fig. 3.17 The Royal Coat of Arms of the United Kingdom, consisting of the crest (lion with a crown), standing on the helm (also with a crown), a mantling, the shield, two supporters (dexter: lion; sinister: unicorn), a circlet containing the motto Honi soit qui mal y pense (“shame on whoever thinks ill of it”) and the royal motto, Dieu et mon droit (“God and my right”). Source: Wikipedia. https://en.wikipedia.org/wiki/Royal_coat_of_arms_of_the_United_Kingdom#/media/File:Royal_Coat_of_Arms_of_the_United_Kingdom.svg. Author: Not given. Image is in the public domain

Fig. 3.18 The vair fur. Source: Wikipedia. https://en.wikipedia.org/wiki/Vair#/media/File:Arms_of_Beauchamp_(of_Hatch).svg. Author: Sodacan. License: CC BY-SA 4.0

Fig. 3.19 Examples of some common heraldic charges: (a) the most common charge, the heraldic lion, in the form lion passant gardant (Fox-Davies, 1909, Figure 278, page 178); (b) a leopard in the form leopard passant gardant (Fox-Davies, 1909, Figure 326, page 192); (c) a stag in the form stag at gaze (Fox-Davies, 1909, Figure 383, page 208). Source: Fox-Davies (1909), Figure 278, page 178; Figure 326, page 192; Figure 383 page 208. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer

The choice of a particular charge may be made for various reasons. In some cases a charge may reflect something in the history of the individual’s family. Fox-Davies (p. 201) cites the case of the Lane family of King’s Bromley, where a roan horse is used (though in the crest, not on the main shield) to commemorate the actions of an ancestor, Jane Lane, who rode such a horse with Charles II disguised as her manservant after his defeat by Cromwell at the Battle of Worcester (September 3, 1651). In the common case of canting arms, the charge is usually a pun on the family name: for example the Verhammes shield with or, three hams sable (Fox-Davies, 1909, p. 200). But the important point to bear in mind is that, whatever the origin of the choice of the shield design, the final shield is “lexicalized” as being associated with the given individual.

As noted in Sect. 3.5.1, in kamon by far the most common ‘charges’ were plants. In heraldry, in contrast, animals are much more favored. But so far, apart from a broader range of tinctures, and of course a different set of charges and distributions of charges, there is much in common between kamon and heraldry. Where the systems diverge is primarily in the matter of marriage, and how it was represented in the system. In kamon it was not represented at all. In heraldry, by contrast, it was represented by impalement, quartering or escutcheons of pretence, all falling under the general rubric of marshalling of arms. We deal with each of these cases next.

Impalement of two sets of arms refers to the practice of dividing a shield vertically into two equal fields, and placing one set of arms on one half and the other set of arms on the other. This was a conventional way to combine the arms of husband and wife after marriage. More specifically:

The arms of man and wife are now conjoined according to the following rules: If the wife is not an heraldic heiress the two coats are impaled. (Fox-Davies, 1909, page 526)

By convention the husband’s arms occupied the dexter half, that is to say the right side of the shield from the point of view of the bearer of the shield, but the left side from the point of view of the viewer; and the wife’s arms occupied the sinister half. See Fig. 3.20. Note in this and subsequent figures that the wife’s arms are displayed, prior to marshalling, on a lozenge rather than a shield: this was the conventional way to display female arms in heraldry.

Fig. 3.20 An example of impalement. Source: Wikipedia. https://commons.wikimedia.org/wiki/Category:Impaling_in_heraldry#/media/File:Impalement_demo.svg. Author: Balmung0731. License: CC BY-SA 3.0


Fig. 3.21 An example of an escutcheon of pretence. Source: Wikipedia. https://en.wikipedia.org/wiki/File:Escutcheon_of_pretence_demo.svg. Author: Balmung0731. License: GNU Free Documentation License, Version 1.2

But there is another way in which a husband’s and wife’s arms may be combined. Following on from the previous quotation, Fox-Davies says:

If the wife be an heraldic heir or coheir, in lieu of impalement the arms of her family are placed on an inescutcheon superimposed on the centre of her husband’s arms, the inescutcheon being termed an escutcheon of pretence, because jure uxoris she being an heiress of her house, the husband “pretends” to the representation of her family. (Fox-Davies, 1909, page 526)

See Fig. 3.21. In kamon, there is no equivalent to the heraldic tradition of a husband placing his wife’s arms, whether she is an heiress or not, upon his own. In traditional Japanese marriage, only the male line was considered important. Since the late Kamakura period the rights of women to inheritance and property had been eroded, so that in the later feudal period women became purely domestic and female names essentially disappeared from family genealogies (Tonomura, 1990). So fixated on the male line was Japanese society that if a family lacked a son they might resort to adopting a man as their son, often, though not always, the husband of their daughter. This tradition, called 婿養子 mukoyōshi, was practiced in families that had no male heirs but nonetheless wished to maintain their line. Indeed, mukoyōshi has survived into the modern age, and many business families have maintained their name over multiple generations by means of this practice (Oi, 2012). The man in question would give up his own family and be adopted into his new family, setting up a situation in which “he takes the surname and Buddhist sect of his adopter, worships the latter’s ancestors, and prepares to take over the family occupation” (Smith & Beardsley, 2004, page 35); though this is not mentioned specifically, one assumes that he also adopted his new family’s kamon. In such an environment, there was no motivation to incorporate a woman’s mon into that of her husband.

Quartering of arms occurs when a wife who is an heiress, and who has children, dies. Upon her death, the son of the marriage may quarter his mother’s arms with those of his father. To quote Fox-Davies:


Fig. 3.22 An example of quartering, repeated from Fig. 3.5. Source: Wikipedia. License: CC BY-SA 3.0. Same as Fig. 3.5

Providing the wife be an heiress …the son of a marriage after the death of his mother quarters her arms with those of his father, that is, he divides his shield into four quarters, and places the arms of his father in the first and fourth quarters, and the arms of his mother in the second and third. That is the root, basis, and original rule of all the rules of quartering, but it may be here remarked, that no man is entitled to quarter the arms of his mother whilst she is alive, inasmuch as she is alive to represent herself and her family, and her issue cannot assume the representation whilst she is alive. (Fox-Davies, 1909, page 543).

Note that the first quarter is at upper dexter, the second at upper sinister, the third at lower dexter and the fourth at lower sinister. See Fig. 3.22, repeated from Fig. 3.5. There are other ways to combine arms in heraldry. For example, a mark of special dispensation (e.g. from the crown) may be placed upon a shield in a canton, a rectangle in the upper dexter corner of the shield. But with impalement and quartering we already have a powerful syntax that can in principle generate an unbounded number of embedded shields. Let us represent the rule of impalement as in 1 below and quartering as in 2:

1. S → I(H, W)
2. S → Q(H, W)

where S denotes a shield, H and W a husband’s and wife’s arms, I represents the operation of impalement and Q represents the operation of quartering. These rules can then be applied recursively. For example, suppose H1 and W1 marry and that W1 is an heiress. Then their son H3, upon her death, has the right to quarter her arms, yielding Q(H1, W1). Assume a similar situation with H2 and W2, whose son, for the same reason, can quarter his parents’ arms as Q(H2, W2). This son does not have any male offspring, but he does have a daughter W3, who subsequently marries H3. She can then use her father’s quartered arms, and her husband H3 can legitimately impale her arms, so that we now get the impalement in 3, replacing his original arms in 1:

1. H3 → Q(H1, W1)
2. W3 → Q(H2, W2)
3. H3 → I(Q(H1, W1), Q(H2, W2))

This process could in principle be continued if the particular details of marriages, births and deaths work out in the right way. Thus in heraldry, syntax allows one to construct a complex message in a shield, one that in principle allows the “reader” to reconstruct something about the history of the individual who bears the arms. Though extremely complicated examples of quartering may be rare, they do exist; see Fig. 3.23. See Fig. 3.24 for a more typical example of a complex shield.

If the same principles applied in kamon, one could imagine finding derived mon like that depicted in Fig. 3.25. Sanada Nobushige (真田 信繁, 1567–1615) of Numata (modern Gunma Prefecture) married Chikurin-in (竹林院), adopted daughter of Toyotomi Hideyoshi (豊臣 秀吉, 1537–1598). They had two sons, the second of whom, Sanada Morinobu (真田 守信, 1612–1670), survived. If he had kept his family name, and if the European heraldic system had obtained, then after his mother’s death he could in principle have quartered the arms of his father’s clan, the six coins 六文銭 roku mon sen, with the Paulownia crest 五七の桐 go shichi no kiri of the Toyotomi clan. Such syntactic combinations are completely foreign to kamon. The closest thing to such combinatorics that one finds is cases such as those described by Chamberlain (1886), who notes a Hakodate tea firm using the coin mon, where one of the dependents set himself up as a bookseller and derived his business’s mon from the main family’s mon by adding the character for ‘one’ 一 ichi below it. Such cases were, however, not systematic and did not follow any conventions like those of European heraldry.

Fig. 3.23 The arms of George, Marquess of Buckingham, an extreme case of quartering. Source: Wikipedia. https://commons.wikimedia.org/w/index.php?curid=2909155. Author: P. Sonard. Image is in the public domain

Fig. 3.24 A more typical example of a complex syntactic embedding in heraldry: the arms of Thomas Stanley, Earl of Derby (d. 1572) from Fox-Davies (1909), Figure 755, page 543. Blazoned as: Quarterly, 1. quarterly, i. and iv., argent, on a bend azure, three bucks’ heads caboshed or (for Stanley); ii. and iii., or, on a chief indented azure, three bezants (for Lathom); 2 and 3, gules, three legs in armour conjoined at the thigh and flexed at the knee proper, garnished and spurred or (for the Lordship of the Isle of Man); 4. quarterly, i. and iv., gules, two lions passant in pale argent (for Strange); ii. and iii., argent, a fess and a canton gules (for Wydeville). Fox-Davies notes that the lion rampant langued arms on the escutcheon of pretence are not those of Stanley’s wife Anne Hastings, who was not an heiress, and are difficult to account for. Source: Fox-Davies (1909), Figure 755, page 543. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer
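The recursive rules S → I(H, W) and S → Q(H, W) translate naturally into a small tree-shaped data type. The following is a minimal sketch, not a model of actual heraldic practice: the names `Arms`, `Impale`, `Quarter`, `term`, `depth` and `quarters` are hypothetical, a simple coat is represented only by a placeholder label, and the quarter layout follows the Fox-Davies rule quoted above (father in quarters 1 and 4, mother in 2 and 3).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Arms:
    """Simple, unmarshalled arms, identified only by a placeholder label."""
    label: str


@dataclass(frozen=True)
class Impale:
    """I(H, W): husband's arms in the dexter half, wife's in the sinister half."""
    husband: Shield
    wife: Shield


@dataclass(frozen=True)
class Quarter:
    """Q(H, W): father's arms in quarters 1 and 4, mother's in quarters 2 and 3."""
    father: Shield
    mother: Shield


Shield = Union[Arms, Impale, Quarter]


def term(s: Shield) -> str:
    """Render a shield in the I(...)/Q(...) notation used in the text."""
    if isinstance(s, Arms):
        return s.label
    if isinstance(s, Impale):
        return f"I({term(s.husband)}, {term(s.wife)})"
    return f"Q({term(s.father)}, {term(s.mother)})"


def depth(s: Shield) -> int:
    """Embedding depth: 0 for simple arms, plus 1 per marshalling operation."""
    if isinstance(s, Arms):
        return 0
    if isinstance(s, Impale):
        return 1 + max(depth(s.husband), depth(s.wife))
    return 1 + max(depth(s.father), depth(s.mother))


def quarters(q: Quarter) -> dict[int, Shield]:
    """Quarter number -> arms: 1 upper dexter, 2 upper sinister,
    3 lower dexter, 4 lower sinister."""
    return {1: q.father, 2: q.mother, 3: q.mother, 4: q.father}


# The worked example from the text.
H1, W1, H2, W2 = (Arms(x) for x in ("H1", "W1", "H2", "W2"))
h3_own = Quarter(H1, W1)             # H3 -> Q(H1, W1)
w3_own = Quarter(H2, W2)             # W3 -> Q(H2, W2)
h3_marital = Impale(h3_own, w3_own)  # H3 -> I(Q(H1, W1), Q(H2, W2))

print(term(h3_marital))   # I(Q(H1, W1), Q(H2, W2))
print(depth(h3_marital))  # 2
```

Because `Impale` and `Quarter` accept any shield as an argument, terms can nest to arbitrary depth, which is the sense in which the syntax is unbounded. The sketch deliberately does not enforce the real-world conditions (heiress status, the mother’s death) that license each step.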

3.5 Kamon/Heraldry

51

Fig. 3.25 Imaginary mon with quartering combining the kamon of the Sanada and Toyotomi clans

3.5.3 Structural Differences: Summary

While Japanese kamon and European heraldry have a very similar function, differences in cultural history and conventions led to radically different behavior in the two systems. Simply put, while kamon are syntactically simple, heraldry is syntactically complex, owing to the combination of arms in marriage and by the offspring of certain marriages. This in turn highlights two important points about symbols. First, while much of the “mystique” of symbols, as well as much of the field of semiotics, focuses primarily on individual symbols, their meanings and what “meaning” means, another important aspect of many symbol systems is their combinatorial possibilities, a point generally given only lip service in texts on semiotics. Second, complex combinatorics can arise for many reasons.

This latter point is particularly important. As we shall see later on, one of the fundamental misconceptions about symbol systems that allow for complex combinations is that the complexity must arise because the symbols represent natural language. Natural language by its nature allows one to build complex messages out of simple parts. If one finds an ancient and otherwise uninterpretable symbol system that appears to allow for long “messages” and can be shown to have something that looks like syntactic structure, many people’s gut reaction is to assume that it must have represented some language, i.e. that it must be writing. But as the example from heraldry clearly shows, structure can arise for many reasons, and those reasons may be unrelated to language. Of course one could have picked many non-linguistic symbol systems to make the same point: mathematical symbology, music notation, dance notation, and numerous other systems allow for the construction of complex “messages” that obey syntactic combination conventions. But the kamon/heraldry comparison is a nice example of how two systems that are essentially similar in their basic function can diverge owing to simple differences in cultural constraints on their use.

3.6 Survey of a Variety of Nonlinguistic Symbol Systems

In the ensuing discussion, the reader may find it useful to consult Table 3.1, which lays out the values of the different dimensions used in the classification. From time to time I will refer to prior work in Wu et al. (2012) and Sproat (2014) where, as part of a larger project on statistical methods to distinguish written language from non-linguistic symbol systems, we developed electronic corpora of some of the systems described here. Note that the systems are arranged in roughly chronological order.

3.6.1 Vinča Symbols

Function: Religious?
Size: Medium
Multivalence: High?
Syntax: None
Dimensionality: 1
Articulation: 1

The Vinča culture was part of the Old European Neolithic culture, and flourished between the sixth and third millennia BCE. Its symbols were inscribed on pottery. Typically the symbols occurred singly, but about 120 items have “texts” consisting of two or more symbols, which are sometimes, though not often, arranged linearly so that they resemble a typical writing system. The seminal work on the Vinča materials is that of Winn (1973, 1981). Winn classified the symbols into about 200 types, with about 800 tokens of text in total. See Fig. 3.26.

Fig. 3.26 Vinča: Tordos Spindle Whorl #20 (Winn, 1981, page 270). Source: Winn (1981), page 270. I believe this constitutes fair use

3.6 Symbol System Survey


Even though Winn (1981) characterizes the signs as “pre-writing in Southeastern Europe” (echoing ideas of his thesis advisor, Marija Gimbutas), his final conclusion seems to suggest that the signs instead had some sort of religious significance:

In the final analysis, the religious system remains the principle source of motivation for the use of signs. (Winn, 1981, page 255)

That said, the actual meaning of the symbols remains unknown. We mention in passing that more recent work, including that of Haarmann (2008), Haarmann and Marler (2008) and Winn (2008), has attempted to revive the notion that the Vinča symbols were part of a “Danube Script”; the main arguments for this position are due to Haarmann (2008). As I noted in the supplementary materials to Sproat (2014),11 Haarmann’s arguments essentially boil down to two main points:

• A process-of-elimination argument, whereby if the system is not apparently decoration, potter’s marks, religious symbology, etc., then it must be writing. But since we currently have no way of knowing that it is not one of these other categories, the argument seems rather pointless.
• A second argument involves “identifying properties which the sign system of the Danube civilization shares with other ancient writing systems” (Haarmann, 2008, page 13), by which Haarmann apparently means the overt form of the signs. As I point out in Sproat (2014), “purely formal properties of sign systems are notoriously hard to align with function, and are likely to be very misleading.” Furthermore, since the “Danube Script”, if it were a script, would predate the first known true writing in Mesopotamia by several millennia, it is not clear what “formal properties” Haarmann expects to find shared across two such systems separated by thousands of kilometers and thousands of years.

Finally, as I also note in my previous work, very few of the inscriptions (basically just the handful that have apparent linear arrangements of symbols) actually look much like writing. There seems therefore to be no reason to believe that the Vinča symbols were anything other than some form of non-linguistic sign system. The corpus reported in Wu et al. (2012) and Sproat (2014), based on Winn’s data, has 185 symbol types.

3.6.2 Uruk Accounting

Function: Formal
Size: Large
Multivalence: Medium
Syntax: Trivial
Dimensionality: 1
Articulation: 1

11 https://muse.jhu.edu/article/547992/summary.


The accounting systems of Mesopotamia have a special place in the history of humankind since they were the precursors to writing (Damerow et al., 1998; Schmandt-Besserat, 1992; Woods et al., 2010; Englund, 2011). Predating writing by several millennia, there were multiple accounting systems, the earliest being physical tokens (Oppenheim, 1959; Amiet, 1966; Lambert, 1966; Schmandt-Besserat, 1992, 1996). These consisted of “simple” tokens, which evidently represented numerical information, and “complex” tokens, which have been argued to represent particular commodities. Later on, bullae (clay “envelopes”) were used to contain tokens, much as a coin purse is used today to contain coins. Crucially, only simple tokens have been found within envelopes (Englund, 2006). Finally, by the Uruk IV–III period (late fourth millennium into the early third millennium BCE), accounts were being kept with numerical and ideographic symbols impressed on clay tablets (Damerow & Englund, 1987; Damerow et al., 1988; Englund, 2011, 1995). The core of this later system was a set of numerical systems, including two sexagesimal (base 60) systems, two bisexagesimal systems (base 60, but with 120 as the next designated number) and a series of other systems. The intricacies of these systems were first worked out in detail by Friberg (1978–1979). The different systems were used with different types of commodities, in a manner that is somewhat reminiscent of the different number systems used to count different kinds of things in Japanese or Korean; see also Valério and Ferrara (2022).12 One consequence of this arrangement was that numerical signs could be multivalent: for example, the sign N14 (the small black circle seen in various places in Fig. 3.27) represented either 10 or 6 (Damerow & Englund, 1987, page 117), depending upon which system was being used, and could represent different amounts of a commodity depending on which commodity was being measured (Englund, 2011, page 38).
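Multivalence of this kind means that the value of a notation cannot be computed from its signs alone: the system in use must also be known. The following minimal sketch illustrates the point with only two signs, N14 as in the text and N1, assumed here to be the unit sign in both systems. The second value table is a hypothetical stand-in for whichever system gave N14 the value 6; it is not a reconstruction of any actual Uruk system, and the names `SYSTEMS` and `evaluate` are illustrative.

```python
from __future__ import annotations

from collections import Counter

# Illustrative value tables: sign -> value in multiples of the unit sign N1.
# Only N1 and N14 are modeled. The first table reflects N14 = 10; the second
# is a hypothetical stand-in for a system in which N14 = 6.
SYSTEMS = {
    "sexagesimal": {"N1": 1, "N14": 10},
    "six-based (hypothetical)": {"N1": 1, "N14": 6},
}


def evaluate(signs: list[str], system: str) -> int:
    """Sum the values of the impressed signs, read in the given system."""
    table = SYSTEMS[system]
    return sum(table[sign] * count for sign, count in Counter(signs).items())


# The same physical notation: three N14 circles and four N1 impressions.
notation = ["N14"] * 3 + ["N1"] * 4
for name in SYSTEMS:
    print(f"{name}: {evaluate(notation, name)}")
# sexagesimal: 34
# six-based (hypothetical): 22
```

The same impressions thus yield 34 or 22 depending on which system the reader assumes.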
In addition to the numerical system, there was a large set of “ideographic” characters, representing measure expressions and commodities, numbering in the several hundreds (Damerow et al., 1988). Such symbols show up in lexical lists from the Uruk III period (Englund, 1998). Many of the tablets that have come down to us were likely not actual accounts, but rather exercise tablets for student accountants (Fig. 3.27). If that is the case, it means that by the late preliterate period, the institutional training of scribe-accountants was already well established. Among the by-products of this training regimen by the Uruk III period were the aforementioned lexical lists, which were simply lists of items in a particular category: types of grains, plants, animals, cities, and so forth. See Fig. 7.6 in Chap. 7 for a list of types of woods. Given its importance in the development of writing, we will refer to this system in several places in the ensuing discussion.

12 It is also reminiscent of the expression of prices in Britain prior to the decimalization of the currency system in 1971. In the old system, ordinary items were priced in pounds, shillings (1/20 of a pound) and pence (1/12 of a shilling); but ‘prestige’ items such as men’s suits, or solicitors’ fees, were priced in guineas (21 shillings).


Fig. 3.27 The obverse of an exercise tablet for the accounting of grain products, flour and malt, MSVO 4:66, from Englund (2011), Figure 2.5(a), page 42. Per Englund (page 46): “The individual entries of the text consist of notations that represent on the one hand discrete numbers of grain products—if dry products in the bisexagesimal, if liquid products in the sexagesimal system—and on the other hand notations that represent measures of grain equivalent to the amount necessary to produce the individually recorded products.” Source: Englund (2011), Figure 2.5(a), page 42. Author released all his figures for unrestricted use, CC BY-SA.

3.6.3 Kudurrus

Function: Emblematic/Heraldic
Size: Small
Multivalence: High
Syntax: Trivial
Dimensionality: 1
Articulation: 1

Babylonian Kudurrus (Seidl, 1989) were legal documents specifying property rights, carved on stones (Fig. 3.28). The symbols represented deities favored by the property owner. Frequently the stones also included cuneiform Babylonian writing. In the corpus developed from Seidl’s text and reported in Wu et al. (2012) and Sproat (2014), there were 64 symbol types.


Fig. 3.28 Mesopotamian deity symbols: kudurru stone of the late Kassite period (ca. 1530 BCE to ca. 1155 BCE) from the Cabinet des Médailles, Paris. The top third of the stone is inscribed with deity symbols, the lower two thirds with Babylonian text. Source: Wikipedia. Image is in the public domain. https://en.wikipedia.org/wiki/Kudurru#/media/File:Caillou_Michaux_CdM.jpg. Author: Marie-Lan Nguyen and one more author. Image is in the public domain

3.6.4 Central Asian Tamgas

Function: Emblematic?
Size: Medium
Multivalence: High?
Syntax: Trivial?
Dimensionality: NA
Articulation: 1?

Examples of Tamgas were already given in Sect. 3.5 (see Fig. 3.6). Tamgas, covered in a comprehensive bilingual edition edited by Voyakin (2019), were a system (or systems) of marks widely used across Central Asia over a roughly two-thousand-year period ending in the early twentieth century. Per Voyakin (2019), page 19:


Fig. 3.29 Some examples of Tamgas from the Arys River Basin, first century BCE–third century CE, repeated from Fig. 3.6. Source: Smagulov and Yatsenko (2019), page 164, Figure 1(8). Used with permission.

These marks or signs were the emblems of groups of people differing in number and social status (their main function was to distinguish between ‘insiders’ and ‘outsiders’)—e.g. individual families, clans and, probably, sometimes entire tribes.

Signs were typically single, or in any case occurred in very short “texts”, as befitted their use as identifiers (Fig. 3.29). Given their use on a wide variety of media (as petroglyphs, and on pottery, birch bark and metal) by a large number of different peoples over a couple of millennia, it seems doubtful that this was a single system. One can contrast, for example, mostly single signs on pottery from mid-first-millennium BCE Khorezm (present-day Uzbekistan) (Baratov, 2019) (Fig. 3.30) with medieval Kyrgyz signs, often carved on rock faces, from several hundred to over a thousand years later (Tabaldyev, 2019). The latter were used as marks of ownership, and were often associated with Turkic Runic inscriptions; see Fig. 3.31. Even if the functions of these two examples turned out to be largely the same, it is not clear that they were part of the same system.

3.6.5 Pictish Symbols

Function Religious?

Size Medium

Multivalence High?

Syntax None?

Dimensionality NA

Articulation 1?

The Iron Age Picts of Scotland left several hundred standing stones inscribed with symbols, forming “texts” from one to a few symbols in length (Fig. 3.32). The meaning of the symbols is unknown, but most scholars have assumed that this was a non-linguistic symbol system; moreover, the Picts had at least some literacy in a known script, Ogham (Rhys, 1892). Only recently did Lee et al. (2010a) propose that the system was a form of written language; see Sproat (2010a, 2014) for arguments against this conclusion. The corpus used in Wu et al. (2012) and Sproat (2014), derived from a collection of 340 stones at the University of Strathclyde, which in turn was derived from Jackson (1984), Royal Commission on the Ancient and Historical Monuments of Scotland (1994), Jackson (1990), Mack (1997), and Sutherland (1997), contains 104 symbol types.

3 A Taxonomy of Non-linguistic Symbol Systems

Fig. 3.30 Single tamga inscribed on a fifth-century BCE pot from Khumbuztepa. Source: Baratov (2019), Figure 6, page 55. Used with permission of the UNESCO International Institute for Central Asian Studies, Samarkand

3.6.6 European Heraldry

Function Heraldic

Size Large

Multivalence High

Syntax Complex

Dimensionality 2

Articulation 2

Fig. 3.31 Medieval tamgas from Suuk-Debe, present-day Kyrgyzstan. According to Tabaldyev, these are “dated to a broad time span—from the Bronze Age to the medieval period”. Source: Tabaldyev (2019), Figure 4, page 381. Used with permission of the UNESCO International Institute for Central Asian Studies, Samarkand

Fig. 3.32 Pictish symbols: the Aberlemno Serpent stone. Source: Wikipedia. https://en.wikipedia.org/wiki/Pictish_stone#/media/File:Serpent_stone.JPG. Author: Catfish Jim and the soapdish. License: CC BY-SA 3.0

This system is discussed at length in Sect. 3.5, so here we will merely summarize its main features. The symbol set is large, consisting both of basic, largely meaningless “ordinaries” and of charges that may at least evoke meanings by virtue of what they depict. These components are built into more complex symbols that represent the individuals who bear the arms. On top of this, the system has a complex two-dimensional syntax.

3.6.7 Kamon

Function Heraldic

Size Large

Multivalence High

Syntax Usually trivial

Dimensionality 2

Articulation 2

This system is also discussed at length in Sect. 3.5. The symbol set is large, consisting of some common basic motifs such as circles or well-frames, which are similar in function to the “ordinaries” of European Heraldry, as well as more evocative depictions of plants and, to a lesser extent, animals and other features. These components are built into more complex symbols that represent the families that use the crest.

3.6.8 Alchemical Symbols

Function Formal

Size Small

Multivalence High

Syntax Trivial

Dimensionality 1

Articulation 1

Symbols were used in alchemy to represent basic substances, or at least substances that were supposed to be basic. As Holmyard (1957) notes, there was a lot of variation among alchemists in their notation, but some symbols were in common use; examples are given in Fig. 3.33. The symbols were of various origins. Metals were associated with the planets, as can be seen in the symbols for metals in the figure. The downward-pointing triangles for water and earth, versus the upward-pointing triangles for fire and air, indicated the tendency of the former to move downwards and of the latter to move upwards. There is some compositionality in the symbols. For example, the symbol for “sublimate of mercury” consists of the planetary symbol for mercury and a squiggle denoting sublimation; Holmyard (page 154) also notes the combination of the signs for gold and silver to denote electrum, a gold-silver alloy, a combination of symbols that dated to the Alexandrian period. Alchemical symbology thus had a sort of trivial syntax. The symbols had a fairly high degree of multivalence, reflecting the supposed connections between substances and other physical phenomena.

Fig. 3.33 Some alchemical symbols
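To make the notion of “trivial syntax” concrete, here is a minimal Python sketch, assuming that a compound sign is nothing more than the juxtaposition of its parts. The Unicode astronomical signs chosen for gold, silver and mercury, and the `~` standing in for the sublimation squiggle, are illustrative choices rather than a transcription of Fig. 3.33; the helper names `compose` and `element` are hypothetical.

```python
# Astronomical signs reused for the corresponding metals (illustrative choice).
METALS = {
    "gold": "\u2609",     # SUN
    "silver": "\u263D",   # FIRST QUARTER MOON
    "mercury": "\u263F",  # MERCURY
}

# Triangle orientation encodes the element's supposed natural motion.
ELEMENT_MOTION = {"fire": "up", "air": "up", "water": "down", "earth": "down"}
TRIANGLES = {"up": "\u25B3", "down": "\u25BD"}  # white up/down-pointing triangles

# Hypothetical stand-in for the "squiggle" denoting sublimation.
SUBLIMATION_MARK = "~"


def element(name: str) -> str:
    """Return the triangle for an element; only orientation is modeled here."""
    return TRIANGLES[ELEMENT_MOTION[name]]


def compose(*parts: str) -> str:
    """Trivial syntax: a compound sign is just its parts placed side by side."""
    return "".join(parts)


electrum = compose(METALS["gold"], METALS["silver"])  # gold-silver alloy
sublimate_of_mercury = compose(METALS["mercury"], SUBLIMATION_MARK)
```

Here `electrum` comes out as the sun sign followed by the moon sign. Because only orientation is modeled, fire and air (and likewise water and earth) map to the same triangle; the sketch makes no attempt to represent whatever further marks distinguish the members of each pair.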

3.6.9 Symbols of Guild

Function Emblematic

Size Medium

Multivalence None

Syntax None

Dimensionality NA

Articulation 0

Guild symbols are a classic instance of an unarticulated sign system. As with automobile company logos (Depauw, 2009), there is no underlying set of basic symbols out of which guild symbols are constructed: if one wishes to make a symbol for a guild that does not already have one, one does it from scratch. Often the guild symbol relates in an obvious way to the business of the guild, as in a pair of scissors and calipers for a tailor; see Hunt (2012) for examples of guild signs from the German-speaking regions of Central Europe. Or, to take another example, consider the inverted rhomboid resembling a gyro roaster that commonly marks a shawarma restaurant.13 Sometimes the reason for the sign may be more complex. For the barber pole (Fig. 3.34), one interpretation runs as follows. Barbers once also performed surgery, and the pole with its white helix represented the staff that the customer-patient gripped when undergoing phlebotomy (Andrews, 1904), the red representing blood (in some interpretations arterial blood, with the blue representing venous blood).

13 I thank Kyle Gorman for this example.

Fig. 3.34 An example of a symbol of guild: the barber pole. Location: Kanazawa, Ishikawa Prefecture, Japan

3.6.10 House Marks

Function Emblematic

Size Large

Multivalence None

Syntax None

Dimensionality NA

Articulation 0

“House marks” (Fig. 3.35) are marks used to identify the owners of buildings and other property, such as grave sites or livestock. They were widely used in many parts of Europe (Homeyer, 1870; Skånberg, 2003). The motifs are generally simple line figures, conducive to easy marking with a knife or other sharp implement. They frequently resemble runic letters, and may also incorporate written letters into the design; see Fig. 3.35 for some examples. Like the set of car logos, the set of signs is open-ended, though the signs do include some recurring themes. For example, many incorporate cross-like components (Homeyer, 1870, page 144). Skånberg (2003), page 148, footnote 1017, reports that more than half of the marks in his collection are based on crosses, somewhat broadly construed, and several of the examples in Fig. 3.35 incorporate crosses. Still, given the overall linear, grapheme-like shapes of the signs, it is not clear what significance to attribute to this: crosses are, after all, common components of symbols in writing systems. In any case, house marks do not seem easily analyzable as syntactic combinations of basic elements, as is the case in, say, heraldry.

Fig. 3.35 Examples of house marks from Norwich, England (Homeyer, 1870, Table VII). Source: Homeyer (1870), Table VII. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer

3.6.11 Gaunerzinken and Hobo Signs

Function Simple informative

Size Medium

Multivalence ?

Syntax None?

Dimensionality ?

Articulation 1

Gaunerzinken (also Zinken) (Groß, 1906; Streicher, 1928; Praßl, 2017) and hobo signs (Berendsohn, 2020) are marks left by wayfarers. In German-speaking countries, most of the work on these signs has been done in the context of criminology, particularly by the Graz school, starting with the work of Hans Groß. Such signs are not restricted to criminal activity, but they have been used to coordinate criminal acts, such as attacking a building. Other uses are more benign: for example, signs indicating that a place is safe for a vagabond to pass through, or that someone in the village will give handouts. In the United States, such signs have been associated in particular with hobos, migrant workers who engage in piecework. While catalogs of hobo signs can be found (see for example the report by Berendsohn, 2020), there is some debate as to whether hobo signs were actually in real use (Wray & Wray, 2020). The evidence for Gaunerzinken, however, seems clear. Some examples of basic signs and their denotations are given in Fig. 3.36. Generally such signs would be used singly or at most in small sets, communicating simple messages about the location; in this respect they behaved much like traffic signs. More complicated messages, however, could apparently also be constructed. Figure 3.37 gives one such example. According to Groß (1906), pages 328–329:

The bird, drawn with a single stroke, represents a parrot, alluding to the great loquacity of the owner of the mark, who was a famous housebreaker. The second sign is a church, the third a key. Below, we see three round objects on a line; this, according to the calendar of the Styrian peasantry, is the emblem of St. Stephen, i.e., three stones placed on the ground, alluding to the martyrdom of the Saint by stoning. They here indicate the date, viz. St. Stephen’s Day, 26th December. By the side is an infant in swaddling clothes, this indicates the birth of the Saviour, the date being 25th December.

The whole thus means: The owner of the parrot sign intends to break into a church on 26th December. He desires accomplices, and will accordingly be in the neighbourhood of the sign (a lonely chapel in a wood) on 25th December to meet whoever turns up. The police, knowing the importance of the signs, took a copy to the Magistrate, a priest helped to interpret the liturgical emblems, and on Christmas day four dangerous criminals were captured near the chapel in the wood.

Fig. 3.36 Some examples of Gaunerzinken. Captions from left to right and top to bottom: “biting dog”, “police live here”, “people call the police”, “doing good is worth it”, “nothing here”, “you can get something for working”, “you can stay overnight”, “there is money here”, “one can quietly become pushy”, “there is food here”, “it is worth it to act sick”, “get away quickly”. Source: Wikipedia. https://de.wikipedia.org/wiki/Zinken_(Geheimzeichen)#/media/Datei:Gaunerzinken_3a.png. Author: Manfred Brückels. License: CC BY-SA 3.0

Fig. 3.37 An example of a complex message involving Gaunerzinken. Source: Groß (1906), Figure 18, page 328. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer

Given that the police deciphered the message and made arrests, the interpretation must have been approximately correct. The message is clearly a complex one: its meaning is in some way composed from the meanings of several symbols. On the other hand, it is hard to say whether it really has a syntax. Would the message have meant something different if the symbols had been put in a different order? All one can be sure of is that someone whose moniker was a parrot wished to plan some activity involving breaking into a church. The references to St. Stephen’s Day (December 26) and Christmas Day (December 25), made by means of somewhat conventional symbols, suggest events taking place on those days, and it would presumably make sense to someone familiar with the kind of crime to be committed that the first step was to assemble a crew and the second to carry out the crime. Since the assembly must take place before the crime, it made sense to assume that the assembly was set for Christmas Day and the crime for St. Stephen’s Day. Presumably any ordering of the symbols would have yielded the same interpretation.
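The order-independence argument can be checked mechanically. The Python sketch below assigns each sign in Groß’s example a role (the glosses are paraphrases, and `interpret` is a hypothetical helper), resolves the two dates by the assembly-before-crime inference, and confirms that all 120 orderings of the five signs yield the same reading. It models only this one message, not Gaunerzinken in general.

```python
from itertools import permutations

# Hypothetical glosses for the signs in Groß's example (Fig. 3.37).
# Dates are (month, day) pairs; only their calendar order matters here.
GLOSSES = {
    "parrot": ("agent", "the housebreaker who signs with a parrot"),
    "church": ("target", "a church"),
    "key": ("act", "break in"),
    "three stones": ("date", (12, 26)),     # St. Stephen's Day
    "swaddled infant": ("date", (12, 25)),  # Christmas Day
}


def interpret(signs):
    """Read a collection of signs without using their order.

    Each sign contributes a role; the two dates are assigned by the
    inference in the text: the assembly must precede the crime.
    """
    roles, dates = {}, []
    for sign in signs:
        role, value = GLOSSES[sign]
        if role == "date":
            dates.append(value)
        else:
            roles[role] = value
    assemble_on, strike_on = sorted(dates)
    return {
        "agent": roles["agent"],
        "plan": f"{roles['act']}: {roles['target']}",
        "assemble_on": assemble_on,
        "strike_on": strike_on,
    }


message = ["parrot", "church", "key", "three stones", "swaddled infant"]
readings = {tuple(sorted(interpret(p).items())) for p in permutations(message)}
print(len(readings))  # 1: every ordering yields the same reading
```

The design choice that makes order irrelevant is that `interpret` never looks at positions: roles come from the signs themselves, and the dates are ordered by the calendar rather than by where they appear in the message.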

3.6.12 Khipu: The Accounting System

Function Formal

Size Small

Multivalence None?

Syntax Trivial

Dimensionality 1

Articulation 1

In discussing khipu, the notational system of the Inkas of northwestern South America (Acosta, 1608; Urton, 1998, 2001; Brokaw, 2005; Artzi, 2010; Boone & Urton, 2011; Hyland, 2014; Hyland et al., 2014; Fernández, 2015; Tun, 2015; Urton, 2017; Hyland, 2021), one needs to state at the outset that the encoding of information using knots in strings is nothing more than a medium for representing information, and says nothing about the kind of information being represented. This should perhaps already be obvious, but there is much confusion on this point when people speak of khipu as if it were a single system. Khipu should not be equated with, say, Western accounting systems; it should rather be equated with making marks on paper. What those marks signify depends on which of the many symbol systems at one’s disposal one is using. Similarly, the knots of khipu may well have allowed, and probably did allow, their users to represent a number of different symbol systems. In the present discussion we will mostly focus on just one of these, which also happens to be the only one that is at least somewhat well understood: the use of knots to represent numbers and some other concepts for the purposes of accounting. At the end of this section we will briefly discuss another kind of information that, it has been argued, khipu could encode.

The basic structure of a khipu consisted of a main top cord and a series of pendant cords hanging from it. The pendant cords could themselves branch into further pendant cords, and the top cord could also have loop pendants, from which depended pendant cords. The strings were typically wool from camelid herd animals such as llamas or alpacas, or plant fibers. In addition to the knots, which, in the accounting system at least, encoded numerical information (Fig. 3.39), ply (“S” or leftward twist versus “Z” or rightward twist) could also encode information (Hyland, 2014; Hyland et al., 2014). Furthermore, position was significant: just as with Hindu-Arabic numerals, powers of ten were represented by assigning a place on the string to 10,000s, 1,000s, 100s, 10s and 1s, working in that order from the top cord down the pendant. Consider khipu UR6 (Fig. 3.38) (Urton, 2001, 2017), which Urton (2017), chapter 4, analyzes as a calendar representing two years, since there seem to be two sections of cords, the first containing 362 cords and the second 368, for a total of 730 (= 365 × 2). The totals of the knot values are 2,059 for Year 1 and 3,042 for Year 2. Fine, but what do the numbers denote? Urton argues (page 76) that the probable meaning in this case is the number of laborers assigned to official projects over this two-year period. It was known from the colonial period that taxation in the form of work by subjects on state projects was one of the items that officials kept track of, and that 3,000 “units” was a reasonable annual number. Speculative, perhaps, but Urton’s interpretation could be argued to be as plausible as any.

Fig. 3.38 Khipu UR6, corresponding to Urton (2017), Figure 4.3, page 69. Source: Image provided by Gary Urton, and used with permission
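The place-value scheme described above can be sketched in Python as follows. This is a simplification under stated assumptions: each position on a pendant holds a cluster of 0 to 9 knots, an empty position stands for zero, and the distinction between knot types (figure 8 knots for units, single knots for higher powers, per the caption of Fig. 3.39) is ignored. The function names `encode` and `decode` are hypothetical.

```python
from __future__ import annotations

# Place values from the top cord down a pendant, as described in the text.
PLACES = (10_000, 1_000, 100, 10, 1)


def encode(n: int) -> list[int]:
    """Knot-cluster sizes per position (top to bottom); 0 means an empty position."""
    if not 0 <= n < 100_000:
        raise ValueError("this sketch models only five positions")
    return [(n // p) % 10 for p in PLACES]


def decode(clusters: list[int]) -> int:
    """Weight each cluster size by its place value and sum."""
    if len(clusters) != len(PLACES) or not all(0 <= c <= 9 for c in clusters):
        raise ValueError("expected five clusters of 0-9 knots each")
    return sum(c * p for c, p in zip(clusters, PLACES))


# The UR6 yearly totals quoted above, shown as if each sat on a single cord.
print(encode(2059))  # [0, 2, 0, 5, 9]
print(encode(3042))  # [0, 3, 0, 4, 2]
assert decode(encode(3042)) == 3042
assert 362 + 368 == 730 == 365 * 2  # cord counts of UR6's two sections
```

UR6’s yearly totals are sums over many cords, so rendering them as single-cord values is purely illustrative; the point is only that reading a cord amounts to weighting each cluster by its position.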

Fig. 3.39 Basic khipu knots, from Urton (2017), Figure 2.3, Page 47. “Figure 8” knots were used for signing units, other single knots positive powers of base 10. The unit knots can be considered iconic, insofar as they represented the count directly by the number of individual knots within the figure 8. Source: Urton (2017), Figure 2.3, page 47. Used with permission of the author

The basic information encoded in accounting khipus was thus numerical, but this does not mean that khipus could only encode numbers, since sequences of numbers can, in addition to their transparent numerical interpretation, also encode other information. Think of postcodes, which (in the United States or Japan, for example) are purely numerical in their written form, but in their interpretation correspond to a particular place. Thus Urton shows that a set of khipus from Puruchuco (to the east of Lima) all share a sequence of 12 introductory cords, with unit (figure 8) knots (Fig. 3.39) on the first, seventh and ninth cords. He suggests that this arrangement was a way of indicating the place Puruchuco “in semasiographic (not phonographic) signing” (Urton, 2017, page 92). Khipus could thus at least represent numbers, which in context could stand for various kinds of commodities, including days of labor; the numerical system could probably also be used, in a postcode-like fashion, to represent places. The numerical system is at least partly iconic (see the caption for Fig. 3.39), and in that usage singly articulated; in the use of the numerical system to represent places, on the other hand, one could argue for double articulation, since the actual numerical value of the knots is incidental. The set of symbols is in any case small, and the syntax is arguably one-dimensional.

It is thus well established that khipus could encode numerical information about amounts of commodities, as well as some additional information, such as locations. But a long-standing question is whether they could also function as true writing, albeit in a very different medium from any other known writing system. Early Spanish witnesses claimed that the Incas used khipus to encode “historical narratives, biographies, and epistles” (Hyland, 2017, page 1), and so-called “narrative khipus”
seem to exist: or, to put it another way, there are khipus that do not seem to be accounting documents, and that may therefore be narrative khipus. Recently, Hyland (2017, 2021) has been working with two khipus from the village of Collata, where local tradition has it that these khipus were “narrative epistles about warfare” (Hyland, 2017, page 1). The structure of the khipus is typical in that they consist of a top cord and a series of dependent cords, but their use of different colored cords, different materials (animal hair such as llama, vicuña, etc.), and ply is richer than in accounting khipus. Assuming that each combination of material, color and ply encodes a distinct symbol type, Hyland arrives at an inventory of about 95 symbol types, consistent with a phonetic writing system such as a small syllabary.14 Tradition divided Inka society into moieties, or lineages, of which two, the Alluka and the Yakapar, are relevant for the village in question. Hyland proposes that the final cords of khipu A, consisting of the sequence dark-brown/wanaku/S-ply, white+dark-brown/llama+wanaku/Z-ply and blue/llama/S-ply, encoded the syllables a-llu-ka. Khipu B ends in a sequence that, assuming the mappings given previously, could be decoded as a-ka-?, where the unknown cord is golden-brown/vicuña/S-ply. Hyland observes that the Quechua word for this golden-brown color is paru, and so one can decode this, possibly, as a-ka-paru, which is close to Yakapar. Thus each of the two khipus would be associated with one of the two moieties, not dissimilar to the pattern observed for accounting khipus. In previous work, Hyland et al. (2014) had argued that ply was used to encode moiety on another artifact, a khipu board from the Peruvian village of Mangas, but the proposal for the Collata khipus goes much further, since it hypothesizes actual phonological encodings.
The suggestion is interesting, but of course this is only a beginning, since the proposed decoding accounts for only three of the 288 pendant cords of khipu A and three of the 199 pendant cords of khipu B. It must also be pointed out that, as Hyland admits, these khipus are post-contact: it is indeed possible that they encode language, but also that their makers got the idea of writing from the Spanish. So it is unclear how, or even whether, they relate to the supposed pre-contact narrative khipus. If Hyland turns out to be right, and if it can furthermore be shown that pre-contact khipus also allowed for similar encoding of the Quechua language, and possibly others, then we would be left with no choice but to conclude that the Inkas did, after all, have writing. But then we would also have to say that when we talk of khipu, we are talking of at least two systems of marks, one clearly non-linguistic and the other linguistic. As noted previously, the tying of knots on strings, and more generally the use of features of strings, is just a medium, and that medium could in principle have been used to mark all kinds of information.
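Hyland’s proposal amounts to a lookup table from cord features to syllables. The following Python sketch encodes only the three mappings summarized above (the `Cord` type and the `read` and `symbol_types` helpers are hypothetical), reproduces the a-llu-ka and a-ka-? readings, counts distinct color/material/ply combinations as symbol types, and computes how small a share of the pendant cords the proposal currently covers. It assumes that the first two cords of khipu B share the feature combinations of khipu A’s a and ka cords, as the decoding a-ka-? implies.

```python
from __future__ import annotations

from typing import NamedTuple


class Cord(NamedTuple):
    color: str
    material: str
    ply: str  # "S" or "Z"


# Hyland's proposed syllabic values for three cord types, as summarized above.
PROPOSED = {
    Cord("dark-brown", "wanaku", "S"): "a",
    Cord("white+dark-brown", "llama+wanaku", "Z"): "llu",
    Cord("blue", "llama", "S"): "ka",
}


def read(cords: list[Cord]) -> str:
    """Transliterate cords, marking unmapped cord types with '?'."""
    return "-".join(PROPOSED.get(c, "?") for c in cords)


def symbol_types(cords: list[Cord]) -> int:
    """Each distinct color/material/ply combination counts as one type."""
    return len(set(cords))


khipu_a_end = [Cord("dark-brown", "wanaku", "S"),
               Cord("white+dark-brown", "llama+wanaku", "Z"),
               Cord("blue", "llama", "S")]
khipu_b_end = [Cord("dark-brown", "wanaku", "S"),
               Cord("blue", "llama", "S"),
               Cord("golden-brown", "vicuña", "S")]

print(read(khipu_a_end))  # a-llu-ka
print(read(khipu_b_end))  # a-ka-?
# Share of all pendant cords that the proposal accounts for so far.
print(f"{3 / 288:.1%} of khipu A, {3 / 199:.1%} of khipu B")  # 1.0% ..., 1.5% ...
```

The coverage figures make the point in the text concrete: a decipherment that explains roughly one or two percent of the cords is, at best, a promising start.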

14 Hyland states that this is a “quantity within the range of logosyllabic writing system” (page 1), but this is clearly a misstatement, since typical logosyllabic writing systems, such as Chinese, have hundreds or thousands of symbols.

That said, although narrative khipus may yet turn out to be real writing, there should be no prior expectation that, as a comparatively advanced civilization, the Inka must have had writing. Yet such expectations have been expressed:

The view that khipus were simple mnemonic tools challenges theories that argue that complex civilizations, such as the Inka Empire, require “writing”—a means to preserve information in a durable form according to communal conventions, understandable to individuals familiar with the norms. (Hyland, 2017, page 1)

This strikes me as misplaced, though it is a common misconception.15 Writing is a product of civilization, but not a necessary one. To see this, one need only consider that cities that must have required some amount of bureaucracy already existed in Mesopotamia before writing was invented: writing was not the instigator but the product of civilization. But perhaps there is still a point to Hyland’s expectation: perhaps a sufficiently complex civilization must at some point develop writing. Thus Childe (1936), pages 203–204, argued that as the amount of wealth associated with temples increased in Ancient Sumer, so did the need for ever more accurate methods of recording information, leading eventually to writing; and indeed Childe was certainly of the view that any civilization of sufficient complexity must have writing. But these are, after all, empirical questions. How complex must a civilization be for writing to be absolutely required? What kinds of information must a civilization at that level encode, and how much of that information actually requires a linguistic graphical symbol system? And below whatever that level is, would other, less general graphical symbol systems not have sufficed? Urton seems to make a strong case that the purely non-linguistic accounting khipus served the Inkas well and certainly covered many of their needs. Was there a gap that only true writing could fill? Without a good answer to these sorts of questions, I do not see how one can evaluate the claim of necessity. As Salomon (2001), page 1, cogently observes:

The fact that some huge states got along without writing should invite searching questions about whether grammatological and anthropological understandings of writing are really up to the task of explaining relations among language, inscription, social practice, and sociopolitical integration.

3.6.13 Totem Poles

Function Heraldic/Narrative

Size Medium

Multivalence High

Syntax Trivial

Dimensionality 1.5

Articulation 1

15 Exactly this issue has been raised by many to counter our argument in Farmer et al. (2004) that the symbols found in cryptically short texts from the Indus Valley Civilization were not a true writing system. But already in that paper (e.g. page 47) we list a number of well-known cases of civilizations that have flourished without true writing. As a large non-literate civilization, the Indus Valley would not have been unusual.

Fig. 3.40 Seattle Totem Pole (Tlingit, originally from Alaska)

Totem poles (Malin, 1986) were produced during the nineteenth and twentieth centuries by a number of Native American cultures of the Pacific Northwest (Fig. 3.40). The symbols in the texts are anthropomorphic and zoomorphic, and were carved on tree trunks, often cedar, though other trees were used. Depending on the culture, the poles could record events, thus counting as narrative, or could serve as house frontal poles or mortuary poles, and thus function in a more heraldic fashion. Styles varied by tribe: Haida poles, for example, are distinctive in being carved in bas relief (Malin, 1986). The corpora in Wu et al. (2012) and Sproat (2014), derived from data in British Columbia Provincial Museum (1931), Garfield (1940), Barbeau (1950), Gunn (1965), Gunn (1966), Gunn (1967), Drew (1969), City of Duncan (1990), Stewart (1990), Stewart (1993), and Feldman (2003), contain 477 symbol types. The differing functions of the poles among different cultures provide a good example of how the function of what is ostensibly the same symbol system can vary across cultures. This is an important point. Writing varies across cultures too: the Roman alphabet is used differently in the different languages whose writing systems employ it. So it may seem natural to assume that if a symbol system varies in its use across cultures, that is evidence that it probably encodes language, i.e. that it is writing. For example, Rao et al. (2009b) used this reasoning to argue that the variation in the Indus Valley symbol system supports the view that it is writing. But as the evidence from totem poles shows, a symbol system can vary in use across cultures without being tied in any way to language.

Fig. 3.41 The first page of NZA002 in the Library of Congress collection

3.6.14 Naxi Pictography

Function Narrative

Size Large

Multivalence None?

Syntax Complex

Dimensionality 1.5

Articulation 2

The Naxi, a Tibeto-Burman-speaking minority group in China’s Yunnan province, have a pictographic symbol system (Fig. 3.41) that is used in the recounting of myths. While the system is often called writing, it is not clearly a real writing system: the symbols do not in general represent words or morphemes of the Naxi language, though many of them do have canonical translations into Naxi words or phrases. A complication is that the Naxi also have a true syllabic writing system, and elements from that writing system are often used to add phonetic cues in pictographic texts. The Library of Congress has a collection of 3,342 manuscripts.16 An additional collection can be found at the Harvard-Yenching Library.17 The Naxi system is on the border between non-linguistic and linguistic systems: while it is not a full writing system and has no mechanisms for representing all of the Naxi language, it is similar in many ways to semasiographic systems such as Blissymbolics (Bliss, 1965) (see Sect. 4.2.4), which do allow users to compose a limited, though still large, number of messages. A compendium of about 3,000 Naxi symbol types and their translations was compiled by Li (2001).

3.6.15 Pennsylvania Barn Stars

Function Decorative

Size Small

Multivalence None

Syntax None

Dimensionality NA

Articulation 0

16 Some of these are viewable online at http://rs6.loc.gov/intldl/naxihtml/naxihome.html.

17 http://hcl.harvard.edu/collections/digital_collections/naxi_manuscripts.cfm.

Fig. 3.42 Pennsylvania German barn stars. Top: Hoffman Barn, with twelve barn stars (1853). Bottom: some sample barn star designs. Source (top panel): Wikipedia. https://en.wikipedia.org/wiki/Hex_sign#/media/File:END_ELEVATION_WITH_HEX_SIGNS_-_H._and_S._Hoffman_Barn_(1853),_Pottstown,_Montgomery_County,_PA_HABS_PA,46-POTTS.V,2A-2.tif. Image is public domain

Barn stars, also called “hex signs”, were a traditional decorative art among Pennsylvania German (“Dutch”) communities. They are found throughout southeastern Pennsylvania, particularly Berks County, as well as in parts of Ohio and elsewhere. As the name implies, they were mostly used to decorate barns, but one can also find them on porches and other structures. Traditional barn stars typically take the form of stars, rosettes, wheels-of-fortune, and swastikas; the typical fare labeled as “hex signs” in Pennsylvania tourist shops is for the most part not traditional barn star symbols (Fig. 3.42).

While the name “hex signs” suggests a connection to witches and magic, the consensus among the few scholars who have studied the system (Mahr, 1945; Graves, 1984; Yoder & Graves, 2000) is that the symbols, while descending from signs that at one time may have had meaning, are, in their use on barns, largely meaningless decoration. One piece of evidence for this view is that when several stars occur on a surface, the “texts” are nearly always symmetric. The corpora in Wu et al. (2012) and Sproat (2014) were derived from the slide collection of W. Farrell, who photographed barn stars in Berks County in the 1940s. The slides are now housed in the archives of the Berks County Historical Society in Reading, PA. There are 32 symbol types in the corpus. As a largely meaningless set of signs, barn stars constitute a reductio of the notion of a zero-articulation system. Like automobile logos, they have no more basic elements than the symbols themselves; but unlike automobile logos, which can at least be said to denote the companies they are associated with, barn stars denote nothing. So one may legitimately ask why one would want to include them at all in a treatise on graphical marks that have conventional meanings. We included them in Wu et al. (2012) and Sproat (2014) because there we were interested in whether purely distributional statistical methods could determine whether a symbol system is a linguistic writing system or not, and we wanted to include a wide range of symbols that could occur in more or less linear “texts” and thus might “look like” writing to some extent. This is still a valid consideration for the present enterprise. Furthermore, as I noted in the supplementary materials to Sproat (2014), there is a substantial amount of confusion in the literature about what one means by the term “non-linguistic symbol system”. Thus Vidale (2007), in his critique of Farmer et al. (2004), cites the case of the obviously decorative pottery designs of Shahr-e Sukhteh and elsewhere:

Together with coupling and opposition of selected symbols, systematic, large-scale redundancy (constant repetition of the same designs or symbols) is a distinctive feature shared by the more evolved and formally elaborated non-linguistic symbolic systems considered (highly repetitive patterns on the pottery of Shahr-i Sokhtai, ‘endless’ repetition of icons such as scorpions, men-scorpions, temple facades, water-like patterns and interwoven snakes at Jiroft, and redundant specular doubling of most major symbols in the Dilmunite seals). While positional regularities might be detected in part of the Jiroft figuration, redundancy in all these systems dismiss one of the basic assumption of Farmer and others, who take the rarity of repeating signs as a proof of the non-linguistic character of the Indus script. (Page 344)

Barn stars, being decorative, surely fall into the pattern of cases that Vidale has in mind; as noted previously, the “texts” are nearly always symmetric, and it is furthermore not uncommon to have a “text” consisting of the same barn star repeated multiple times. But most of the non-linguistic symbol systems we have surveyed in this section do not fall into this category, in that they both carry meaning and, where there is a syntax involved, may show quite complex structure. Vidale’s objection clearly confuses decorations with non-linguistic symbol systems in general.
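The two formal properties at issue here, symmetry and repetition within a “text”, are easy to state precisely. The following Python sketch uses invented symbol-type labels, and its repetition measure is a simple illustrative one; it is not a claim about the statistics actually used in Wu et al. (2012) or Sproat (2014).

```python
from __future__ import annotations


def is_symmetric(text: list[str]) -> bool:
    """A 'text' is symmetric if it reads the same from either end."""
    return text == text[::-1]


def repetition_rate(text: list[str]) -> float:
    """Share of tokens that repeat a type already present in the same text."""
    if not text:
        return 0.0
    return (len(text) - len(set(text))) / len(text)


# Hypothetical barn-star "texts" (labels invented for illustration).
mirrored = ["rosette", "star8", "rosette"]
row_of_same = ["star6"] * 4
print(is_symmetric(mirrored), round(repetition_rate(mirrored), 2))  # True 0.33
print(is_symmetric(row_of_same), repetition_rate(row_of_same))      # True 0.75
```

Any symmetric text of two or more symbols necessarily repeats a type, since its first and last symbols coincide. A decorative system with mirror-symmetric layouts will therefore show exactly the redundancy Vidale describes, which is why redundancy by itself says little about non-linguistic systems that carry meaning.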


3 A Taxonomy of Non-linguistic Symbol Systems

Fig. 3.43 Buffalo robe winter counts. Source: Wikipedia, https://en.wikipedia.org/wiki/Winter_count#/media/File:A_copy_of_the_winter_count_kept_by_Yanktonai_Sioux_Lone_Dog.png. Author: Lone Dog. Image is in the public domain

3.6.16 Dakota Winter Counts

Function Narrative

Size Medium

Multivalence High

Syntax Trivial

Dimensionality 1

Articulation 0

Mallery (1883) compiles the “winter counts” (Lakota waniyetu wowapi) for the Dakota covering about 100 years (and thus 100 winter symbols). In this system, a year is represented by a symbol that depicts a particularly memorable event for that year. For example, a symbol for the year 1786–1787 represents a chief who wore an “iron” shield over his head. Since one invents a new symbol for each year, the system is unarticulated. A “text” of such counts, as in Fig. 3.43, represents a history, one symbol per year. But the syntax is trivial in that it merely records the sequence of years one after another. As part of the work for Wu et al. (2012), we compiled a coding of the Dakota corpus, consisting of 121 symbol types.

3.6 Symbol System Survey


3.6.17 Australian Message Sticks

Function Performative

Size Medium

Multivalence ?

Syntax None?

Dimensionality NA

Articulation 1

“Message sticks” were used as a means of graphic communication by indigenous Australians. Our discussion is based on a recent comprehensive review by Kelly (2020). Sticks were generally carved with symbols and designs by the person who wished to send a message; the message was explained to a messenger, who would then carry the stick to its intended recipient, who could be quite distant. The messenger would then explain the message to the recipient. While much of the evidence suggests that these were largely mnemonic devices, there is some indication that at least some of the symbols may have had a fixed conventional meaning: in one reported case, where the messenger had died en route, the recipient nonetheless correctly interpreted the stick (as a request to deliver boomerangs and headbands). The size of the set of symbols is uncertain, and seems to have been somewhat open-ended. While many sticks in Kelly (2020) involve simple linear symbols such as those in Fig. 3.44, some involve more complex designs. Thus Kelly describes one stick where “two vertical bands enclosing zig-zags” represent emus, and “four central rectangles with cross-hatching” represent wallabies (his Figure 1b). Also unclear is the syntax, that is, whether there was any, or whether the symbols could be arranged in any way the creator chose. While there is some uncertainty about the antiquity of the system, given the use of message sticks by at least twenty different language groups over a wide geographic area, it seems likely that they predated British colonization of the continent.

Fig. 3.44 Australian message stick from Roth (1897), page 234, via Kelly (2020), page 136, Figure 4. Per Kelly’s description this is a “message stick from the Boulia district of Queensland . . . . The message is a request for a meeting and the motifs refer to geographic features. ‘H’ is the proposed meeting place.” The stick is divided into eight sections labeled with letters from a to h from bottom to top on the lefthand side. Source: Roth (1897), page 234. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer

Fig. 3.45 Part of a “prayer for life”, based on Basso and Anderson (1973), Figure 1, page 1014

3.6.18 Silas John’s System

Function Performative

Size Medium

Multivalence High

Syntax Trivial?

Dimensionality 1

Articulation 1

In 1904, Silas John Edwards, a Western Apache shaman, invented a notation system for 62 prayers that he claimed were revealed to him in a vision. While the system has been described by Basso and Anderson (1973) as a writing system, and certainly does encode some linguistic features (perhaps most notably the use of a symbol that looks like a script form of the English word she, used to represent the Apache first person singular pronoun shíí; see the second and third symbols in Fig. 3.45), there are a number of ways in which it differs from a true writing system. First, the system is limited to writing exactly the set of 62 prayers and was never intended as a general way of writing Apache. Second, the system includes a lot of information on non-linguistic behavior, such as particular actions that are to be performed during the incantation of the prayer. The system is thus more of a performative symbol system, rather like Labanotation (Sect. 3.6.21; Farnell, 1996), than a true writing system. A further indication that the system was not a true writing system is that one apparently had to be instructed in how to interpret each text. Unfortunately it appears that the six texts, comprising 165 symbols in total, that were published in Basso and Anderson (1973) are the only ones that have survived. These were the only texts that the authors were permitted to see and in whose interpretation they received instruction. The system apparently consisted of a few tens of distinct symbols.


3.6.19 Lukasa Memory Boards

Function Narrative

Size Small

Multivalence High

Syntax ?

Dimensionality 2?

Articulation 1

The Luba, a Bantu tribe of the southern region of what is now the Democratic Republic of Congo, had a secret society called Bambudye, which recorded information about myths and the society’s organization on specially designed memory boards, called Lukasa. According to Reefe (1977), there were three kinds of boards; he was able to obtain specimens of two of them and to learn at least something about their structure. Events and individuals were recorded on the boards by means of beads and shells, arranged in various ways. Reefe describes how the board depicted in Fig. 3.46 (= Reefe’s Figure 1) helps a narrator recount the Luba genesis myth, which involves a hero from the east named Mbidi Kiluwe, who travels to the land of the Luba, crossing the Zaire river. Among the Luba he meets the “red-skinned” ruler Kongolo, whom he teaches the skills of a divine leader. Mbidi Kiluwe also impregnates Kongolo’s sister, who gives birth to Kalala Ilunga. Kalala Ilunga later grows up to challenge Kongolo and kill him in battle. Aspects of this story are represented by various beads on the board:

• Mbidi Kiluwe, by a large blue bead a bit to the left of the top middle cowrie.
• The Zaire river, by a long incision on the upper left.
• Birds and reeds along the river, by a curved line of beads, with two white beads for the birds.
• Kongolo and his followers, by a circle of yellow beads surrounding a red bead.
• The final war, by two lines of beads in the top center.

This last representation was inspired by the story that Kalala Ilunga was motivated to attack Kongolo after observing columns of ants and termites fighting each other. In addition to the myth, the board also records features of Bambudye ceremonies. As Reefe (page 50) notes:

All memory boards served as an index for Bambudye ceremonies. Members went through a complex set of initiations as they advanced in the society, which was divided into village chapters. Bambudye chapters met in a secret lodge in the middle of which was a raised earthen bench or a series of earth mounds; these are symbolized by the mounds (lukala) running across the middle of lukasa.

Thus the board served a dual function: to represent a traditional story, and to serve as a record of the ceremony in which the board itself is used. The same devices, beads and shells, could be used to represent different entities and events depending on their arrangement (and thus in principle have high multivalence), but some entities, such as the earth mounds, have designated representations. Whether there is a true syntax is unclear, but the boards do apparently make crucial use of two dimensions in the arrangement of the symbols.

Fig. 3.46 A Lukasa memory board, from Reefe (1977), Figure 1. Used with the permission of the publisher, MIT Press

3.6.20 Tupicochan Staff Code

Function Simple informative

Size Small

Multivalence None

Syntax Trivial

Dimensionality 1

Articulation 1

Salomon (2001) describes a system for marking staffs of office in the Central Peruvian village of Tupicocha. The staffs are made of wood and carved with symbols; new staffs are prepared prior to the annual handing over of the offices (of which there are ten) to the new office holders on New Year’s Day. The symbol system in use is by far the smallest system we consider, consisting of only three overt symbols: see Fig. 3.47. The syntax of the system is simple, with the peaña, if it occurs, occurring in the first position, followed by either a sequence of rayas followed by aspas, or a sequence of aspas followed by rayas. The use or non-use of the peaña, and the order and number of rayas and aspas, are significant, and mark the office for which the staff is intended. In particular, the peaña is associated with rural offices, its lack with town offices. Note that the lack of a peaña can take two forms, either simple absence in the sequence, or an overt empty space where the peaña should be, which could be taken for a fourth, non-overt symbol. See the righthand staff in Fig. 3.47. Larger numbers of rayas and aspas correspond, in general, to higher offices, though Salomon notes a difference (page 9), namely that the aspa count correlates with the official government hierarchy of the offices, whereas the number of rayas “comes much closer to representing public sentiment about the importance of each office.”

Fig. 3.47 Tupicochan staffs of office for 1995, based on Salomon (2001), Figure 7. The two staffs represent the offices of “first rural constable” (primer alcalde de campo) and the “first rural constable’s deputy” (alguacil del primer alcalde de campo). These two staffs exemplify all three of the signs used in the system, the raya ‘stripe’ (horizontal bar in this view), the aspa ‘X’ and the peaña, the two-step pyramid with a cross. Note that the latter, in its physical form, is used to mark boundaries; in this case it apparently symbolizes the boundary between the town and the country

As Salomon argues, the system is fluid, having changed in details of syntax over time. But his main point is to argue that the system, as a “semasiographic” symbol system, differs from common conceptions of “semasiographic writing” (see in particular Sect. 4.2.4). In most conceptions of the notion, a semasiographic system is intended as a way to communicate the same ideas as one could communicate in speech, but via a non-language-based channel; in the Tupicochan system, rather, the purpose is ostensibly to “detach some areas of practice from the realm of discourse altogether” (page 1). Indeed, the whole ceremony of the transfer of power apparently takes place without any verbal discussion of the process itself. The staffs that are created for the next round of office holders are examined and corrected as need be, but this is done without any verbal discourse. As a consequence of the non-verbality of the system, Salomon reports having extreme difficulty getting his Tupicochan consultants to put into words anything concerning the staffs. The implication is that they think about the staffs only in a completely non-verbal way. Salomon argues that the meaning of the system is hard to pin down, in part, it seems, because it is hard to put into words, and in part because the details change over time; but it is clear enough in any case that the symbol system does mean something and that there is a consensus on correct and incorrect usage at any given point. The system is emphatically non-verbal, but then, as was pointed out, so are the entire proceedings surrounding the transfer of office (page 9):

In observing the New Year’s Day political cycle, it was noted that there is next to no scope for verbal discussion of the proceedings. What talk does take place is sociable chitchat, ostentatiously off the point.

Salomon’s argument seems to be that the staff symbol system provides a nonlinguistic avenue that supports the conventional laconicity of the proceedings; but it could equally well be that the Tupicochans’ inability to think of the symbols in a verbal way is merely a consequence of that laconicity.
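The ordering constraints on the staffs described above can be stated as a regular expression. The sketch below uses a hypothetical one-letter encoding (not Salomon’s) and assumes that either run of rayas or aspas may be empty, but not both; since Salomon notes that the syntax has changed over time, it models only the description given here.

```python
import re

# Hypothetical one-letter encoding (not Salomon's): P = peaña, R = raya,
# X = aspa, _ = the overt empty slot where a peaña could have stood.
# Assumption: either run of rayas or aspas may be empty, but not both.
STAFF = re.compile(r"([P_]?)(R*X*|X*R*)")


def parse_staff(signs: str):
    """Return (rural, n_rayas, n_aspas) for a well-formed staff, else None."""
    m = STAFF.fullmatch(signs)
    if m is None or not m.group(2):
        return None
    rural = m.group(1) == "P"  # peaña marks rural offices; its lack, town offices
    body = m.group(2)
    return rural, body.count("R"), body.count("X")


print(parse_staff("PRRX"))  # (True, 2, 1): peaña, rayas, then aspas
print(parse_staff("_XXR"))  # (False, 1, 2): empty slot, aspas, then rayas
print(parse_staff("RXR"))   # None: rayas and aspas interleaved
```

The parse recovers only the features the text identifies as significant (presence of the peaña, and the order and counts of rayas and aspas); the mapping from these to particular offices is not given here and is not modeled.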

3.6.21 Dance Notation

Function Performative

Size Medium

Multivalence None

Syntax Trivial?

Dimensionality 2

Articulation 1

Dance notation (Harris, 1995; Farnell, 1996) is a canonical example of a performative system. The basic symbols include representations of both feet, as well as arrows indicating the direction of movement. Orientation of the feet is indicated iconically. Figure 3.48 gives an example of one system of dance notation showing steps from the reverse turn of a waltz, augmented with written instructions. As Harris notes (page 149) about a later version of the same instructions, a number of assumptions are taken for granted, such as that “the couples on a dance floor are proceeding in an anti-clockwise direction”; also lacking any representation in the system is the relation of the dance steps to the music. More intricate systems exist than the one described by Harris: Labanotation (Farnell, 1996) includes encodings not just for the feet, but also for movements of other parts of the body, spatial orientation and spatial relations between the dancers (e.g. touching, near to, grasping, passing but not touching, and so forth).

[Figure: “The Reverse Turn”, Gentleman; footprint diagram with steps numbered 1 to 6, starting at “Begin here”]

GENTLEMAN 1. Step forward with the L.F. turning on it to the L. 2. Step to the side with the R.F. still turning. 3. Close the L.F. up to the R.F. 4. Step back with the R.F. turning on it to the L. 5. Step to the side with the L.F. still turning. 6. Close the R.F. up to the L.F.

Fig. 3.48 Dance notation for a waltz, redrawn from Silvester (1935), pages 68–69

3.6.22 Weather Icons

Function Simple informative

Size Small

Multivalence None

Syntax Trivial

Dimensionality 1

Articulation 1

Weather forecasting services use graphical icons to represent the predominant weather expected for a given day. For example, about a decade ago, the Weather Underground used about 40 distinct icons, 20 for daytime and 20 for nighttime forecasts. These icons, taken in series, form a “text” that corresponds to the weather predictions for a five-day period, one icon per day. In this case, while we are dealing with a human-designed symbol system, it is one where the distribution of the symbols is determined by natural phenomena (or, more properly, by a computational model thereof). The corpus reported in Wu et al. (2012) and Sproat (2014) was based on data from an early version of the Weather Underground (www.underground.com) site, and used only the daytime symbols, and thus had 20 distinct icons (Fig. 3.49).


Fig. 3.49 A sample weather icon sequence: forecast for Portland, Oregon, April 29, 2011

3.6.23 Scouting Merit Badges

Function Emblematic

Size Medium

Multivalence None

Syntax None

Dimensionality NA

Articulation 0

Scouting merit badges (Fig. 3.50) are awarded for demonstrating knowledge or skills in one of a set of defined areas. According to the Boy Scouts of America, there are more than 135 badges that can be awarded.18 The set of badges is not static, but has changed substantially over the last century since the scouting organizations were founded (White, 2018). Like car logos, scouting merit badges are unarticulated, and if a new badge is needed, the design of the new badge is done from scratch, rather than combining elements from a set of basic symbols. But unlike car logos, which generally occur alone, scouting badges are typically linearly arranged on a sash and so can form a sort of “text”. There is no syntax—the badges would often simply be placed in the order in which they were awarded. As I noted in Sproat (2014), a corpus of merit badge “texts” (as on the sash in Fig. 3.50) would have some language-like properties. In particular, since some merit badges are awarded far more than others, the distribution of the symbols would follow a roughly Zipfian power law distribution (Zipf, 1949). But a salient nonlanguage-like feature of the system is that a merit badge cannot be earned twice, and therefore there would be no repetition of symbols in the text.
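The two predictions about a merit-badge corpus, a roughly Zipfian distribution over the corpus as a whole but no repetition within any single sash, can be illustrated with a small simulation. Everything here is an assumption: 135 badges (the BSA figure above, used as a round stand-in), popularity proportional to 1/rank, and sash sizes between 3 and 20; no actual award data are used.

```python
import random
from collections import Counter


def zipf_weights(n_badges: int) -> list[float]:
    """Popularity of the badge at rank r (1-based) is proportional to 1/r."""
    return [1.0 / r for r in range(1, n_badges + 1)]


def sample_sash(rng: random.Random, weights: list[float], k: int) -> list[int]:
    """Draw k distinct badge ids, each draw weighted by popularity.

    Sampling without replacement models the rule that a badge cannot be
    earned twice; ids are appended in award order.
    """
    remaining = list(range(len(weights)))
    sash = []
    for _ in range(k):
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining])[0]
        remaining.remove(pick)
        sash.append(pick)
    return sash


rng = random.Random(0)
weights = zipf_weights(135)
sashes = [sample_sash(rng, weights, rng.randint(3, 20)) for _ in range(1000)]

corpus_counts = Counter(b for sash in sashes for b in sash)
print("most frequent badges (id, count):", corpus_counts.most_common(5))
print("any repeats within a sash?", any(len(s) != len(set(s)) for s in sashes))
```

Drawing without replacement is what removes within-text repetition; in a corpus of natural-language texts, by contrast, frequent symbols would recur inside individual texts.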

3.6.24 Traffic Signs

Function Simple informative

Size Medium

Multivalence None

Syntax None

Dimensionality NA

Articulation 1

18 https://www.scouting.org/programs/scouts-bsa/advancement-and-awards/merit-badges/.


Fig. 3.50 An example of Boy Scout merit badges on a sash. Sash worn by former US President Gerald Ford, from the Gerald R. Ford Presidential Library, via Wikipedia. Source: https://commons.wikimedia.org/wiki/File:Boy_Scout_sash.jpg. Courtesy of the Gerald R. Ford Presidential Museum. Image is in the public domain

Traffic signs are a canonical instance of a simple informative symbol system (Fig. 3.51). Excluding signs that involve directions to places, which cannot avoid written language, traffic signs are for the most part intended to be understandable without reference to a particular language. True, one may find linguistic elements on such signs (“gas”), in some cases conventionally so (“STOP”), but the sign invariably has a particular shape, color or contains an icon that makes its interpretation possible without linguistic information. Thus a picture of a fuel pump indicates that fuel may be obtained at a particular exit, a pair of parallel squiggly lines indicates curves ahead, and a red octagon means stop. Colors are often informative: in the United States, blue signs provide useful information about services, yellow signs indicate warnings and brown signs are often associated with parks or cultural sites. A subset of traffic signs that need to convey information about speeds or distances make use of another conventional non-linguistic symbol system: numerals. Traffic signs are of course not completely universal and there are variations in conventions across different parts of the world, as well as adaptations to local circumstances: in many parts of the United States one will see yellow warning signs with a jumping deer silhouette indicating the danger of that animal leaping into traffic; in Australia the deer is replaced with a kangaroo. We already noted in Sect. 1.2 that one can have “texts” consisting of sequences of several traffic signs, and that in many cases there are conventions about how the signs are ordered, but that there is no syntax as such.

Fig. 3.51 A typical arrangement of informative signs for fuel, food and lodging, repeated from Fig. 1.4

3.6.25 Car Logos or “Hood Ornaments”

Function Emblematic

Size Medium

Multivalence None

Syntax None

Dimensionality NA

Articulation 0

A canonical case of a zero-articulated system is car logos, which also typically manifest themselves as hood ornaments on the cars produced by the company in question. As Depauw (2009), page 208, notes:

Car logos … are such an unarticulated code, in which each unit/sign corresponds to the extra-semiotic reality of a brand of cars. When a new brand of cars comes on the market, a new logo is made, not by combining recurrent compositional elements, but by creating something completely new.

Indeed, the components of a car logo are apparently completely open-form. In the case of the familiar Volkswagen logo, the components are stylized alphabetic symbols from the company’s name, and for the Hupmobile (Fig. 3.52, from the last model year, 1941), the logo simply consisted of the founder’s name; but in general they can be anything the designer of the logo wants to use. This makes them much more in line with Dakota Winter Counts (Sect. 3.6.16) than with other emblematic systems like European heraldry, or kamon.


Fig. 3.52 The Hupmobile logo from the 1941 Hupp Skylark. Source: Wikipedia. https://en.wikipedia.org/wiki/Hupmobile#/media/File:%2741_Hupp_badge.JPG. Author: trekphiler. License: CC BY-SA 3.0

3.6.26 “Asian” Emoticons

Function Decorative

Size Medium

Multivalence None

Syntax Trivial

Dimensionality 1

Articulation 2?

Bedrick et al. (2012) developed an analysis system to detect and parse “Asian emoticons”, typically called kaomoji (Japanese 顔文字). This project resulted in a database with several thousand examples. Unlike the familiar ASCII “smileys” rotated by 90° (:-), ;-), :-(, 8-), and so forth), kaomoji are oriented horizontally, and make use of a much wider range of characters. Some examples can be seen in Fig. 3.53. Traditional ASCII smileys are relatively limited, comprising perhaps a few tens of examples. Asian smileys, in contrast, are productive and open-ended. Though kaomoji use hundreds of distinct elements from writing systems, as well as other symbols from the Unicode inventory, the use of these symbols is non-linguistic. A full kaomoji consists of a short ‘text’ composed of these symbols. Hence the whole kaomoji symbol, with whatever meaning it has, is composed of meaningless written symbols from various scripts, which is why we designate this system as doubly articulated. The texts tend to be somewhat (though often not perfectly) symmetric. However, unlike in the symmetric “texts” found with Pennsylvania barn stars, the mate characters found in Asian emoticons are different symbols, chosen because they are visually close mirror images. A simple statistical analysis of the symbol distributions, considering only the symbols and not their form, would easily miss the fact that the texts are symmetric.


Fig. 3.53 Some representative kaomoji emoticons

|^_^| [o_-] \(^v^)/
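The distinction between identical mate characters (as with barn stars) and visually mirrored ones (as with kaomoji) can be made concrete with the examples in Fig. 3.53. The mirror table below is a small hand-picked assumption, not the inventory used by Bedrick et al. (2012).

```python
from collections import Counter

# A small, hand-picked mirror table (an assumption, not Bedrick et al.'s inventory).
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">", "/": "\\", "d": "b", "p": "q"}
SELF_MIRROR = set("^_-ovwxO0|.*=:'\"")
MIRROR = {**{c: c for c in SELF_MIRROR}, **PAIRS, **{v: k for k, v in PAIRS.items()}}


def is_palindrome(s: str) -> bool:
    """Identity symmetry: the same character at mirrored positions."""
    return s == s[::-1]


def is_mirror_symmetric(s: str) -> bool:
    """Visual symmetry: each character's mirror image sits at the mirrored position."""
    return all(MIRROR.get(a) == b for a, b in zip(s, reversed(s)))


for k in ["|^_^|", "[o_-]", r"\(^v^)/"]:
    print(k, is_palindrome(k), is_mirror_symmetric(k))

# A unigram count cannot tell a symmetric kaomoji from a scrambled one.
print(Counter(r"\(^v^)/") == Counter(r")^/v(\^"))  # True
```

Here |^_^| is symmetric under both tests, \(^v^)/ only under the mirror test, and [o_-] under neither, in line with the observation that the symmetry is often imperfect. The last line shows why a count of symbols alone cannot see any of this: the kaomoji and a scrambled version of it have identical symbol distributions.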

3.6.27 Other Non-linguistic Symbol Systems

Below we list other symbol systems with their taxonomic features but no further analysis.

System                              Function               Size     Multivalence  Syntax           Dimensionality  Articulation
National flags                      Emblematic             Medium   None          None             NA              0
Military rank symbols               Emblematic             Medium   None          None             NA              0?
Fraternity symbols                  Emblematic             Medium   None          None             NA              0
Religious symbols                   Religious iconography  Medium   High          Usually trivial  NA              0
Ikea assembly instructions          Performative           Medium   None          Trivial          2               1
Musical notation                    Performative           Medium   None          Complex          1.5             1
Chess notation                      Performative           Medium   None          Trivial          1               1
Knitting patterns                   Performative           Medium   None          Complex          2?              1
Change ringing notation             Performative           Small    None          Trivial          1               1
Programming languages               Performative           Large    None          Complex          1               1
Zodiac symbols                      Formal                 Small    High          None             NA              0
Mathematical notation               Formal                 Large?   None          Complex          2               1
Chemical notation                   Formal                 Medium   None          Complex          1.5             1
Feynman diagrams                    Formal                 Medium   None          Simple           2               1
Programming flowcharts              Formal                 Medium   None          Complex          2               1
Systems Biology Graphical Notation  Formal                 Medium   None          Complex          2               1

3.7 Statistics of kamon


3.7 Detailed Statistical Analysis of kamon

We noted previously that there is a connection between kamon and the names of the families that use them, somewhat reminiscent of canting arms in European heraldry. Indeed this connection is widely accepted: as kamon expert Hitoshi Takasawa noted,19 for example, names like 桜井 Sakurai and 桜田 Sakurada containing the character 桜 are often associated with a cherry motif. The one problem with these observations, though, is that they are prey to confirmation bias. If I see that the 藤井 Fujii family happens to use a crest that contains a wisteria (藤 fuji) motif (see Fig. 3.11), I may tend to notice that case, whereas if a family named 松本 Matsumoto uses the same crest, I may be less likely to notice. Clearly it would be good to verify the connection statistically, at least for a few of the more common motifs. In order to do this, I conducted a small analysis using the data from the collection of family names and associated emblems in Takasawa (2011). I first randomly selected twenty names from those covered in the book. The only restriction on the selected names was that they not contain the characters 桜 sakura ‘cherry’, 藤 fuji ‘wisteria’, 松 matsu ‘pine’, or 梅 ume ‘plum’. The resulting twenty family names had a total of 368 kamon associated with them. I then chose family names that do have the above characters from the 2,000 most frequent names listed on the website https://myoji-yurai.net/prefectureRanking.htm, which has data on names and their frequencies in Japan. Specifically, I chose up to five of the top frequency names containing each character, for a total of nineteen names (the 2,000 most frequent names in Japan include only four with the character 桜 sakura). Unfortunately Takasawa (2011) lists only two names with 桜, so the remaining two names had to be discarded. The names and the associated counts for kamon listed in Takasawa (2011) are given in Table 3.2.
I then counted how many families in each of the five categories (桜 sakura, 藤 fuji, 松 matsu, 梅 ume, or other) had each of the four motifs of cherry, wisteria, pine or plum, or associated designs.20 This allowed me to compute the conditional probability of finding a kamon with a given motif, given the family name. Results are shown in Table 3.3 for two different ways of computing the estimated conditional probability. As can be seen, while at least wisteria and plum are not uncommon among kamon in general, each of the motifs is far more likely to occur given the matching family name character than are any of the other motifs. There are obvious caveats here: the sample is small, and Takasawa’s listing for each family is by no means complete compared, for example, to the compendious collection in Chikano (1993).21 Nonetheless, broadly speaking the above analysis supports the notion that families on occasion chose their kamon to reflect their names in a way that is reminiscent of canting arms in heraldry.

19 Personal communication in email.
20 For example, for pine an associated design is 松皮菱 matsukawa bishi ‘pine bark lozenge’.
21 Unfortunately Chikano (1993) arranges the data by crest rather than family name, making it difficult to use for the present purposes.

Table 3.2 Names with characters of interest from Takasawa (2011), and the number of kamon listed for each. The first group consists of names including the kanji 桜 sakura ‘cherry’; the second, 松 matsu ‘pine’; the third, 梅 ume ‘plum’; and the fourth, 藤 fuji/tō ‘wisteria’

Name            Kamon listed
桜井 Sakurai      43
桜田 Sakurada     21

松本 Matsumoto    60
松田 Matsuda      10
松井 Matsui       11
松尾 Matsuo       29
小松 Komatsu      33

梅田 Umeda        25
梅原 Umehara      17
梅本 Umemoto      20
梅津 Umezu        21
梅村 Umemura      3

佐藤 Sato         42
伊藤 Ito          13
加藤 Kato         8
斎藤 Saito        42
藤田 Fujita       27
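The two estimators reported in Table 3.3 can differ noticeably when families have very different numbers of listed kamon. A minimal sketch with invented counts (not taken from Takasawa (2011)): each hypothetical name maps to the number of its kamon showing the motif and the total number of kamon listed for it.

```python
def pooled_estimate(counts: dict[str, tuple[int, int]]) -> float:
    """p(m|c): motif hits summed over all names in c, divided by total kamon."""
    hits = sum(k for k, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return hits / total


def per_name_average(counts: dict[str, tuple[int, int]]) -> float:
    """(1/|n_c|) * sum over names n in c of p(m|n), with p(m|n) = k_n / N_n."""
    return sum(k / n for k, n in counts.values()) / len(counts)


# Hypothetical category: name -> (kamon with the motif, kamon listed).
category = {"A": (5, 10), "B": (1, 20), "C": (3, 3)}

print(round(pooled_estimate(category), 3))   # 0.273, i.e. 9/33
print(round(per_name_average(category), 3))  # 0.517, i.e. (0.5 + 0.05 + 1.0)/3
```

The pooled estimate weights every kamon equally, so names with many listed crests (such as 松本 Matsumoto, with 60 in Table 3.2) dominate it; the per-name average instead weights every family name equally.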


Table 3.3 Estimated conditional probabilities of finding one of the four motifs m given the family name. The third column gives the estimates from grouping the counts from all names in a given kanji category c. The fourth column gives the estimates from computing the conditional probability of the motif m for each name n in n_c (the set of names with kanji c), summing these and then normalizing by the size of the set n_c. The final column gives the value of t for a t-test comparing the conditional probabilities for a matching motif (e.g. 松 matching the pine motif) versus the same motif for names in other categories (corresponding to the conditional probabilities in column 3), along with the associated p value. Unfortunately only for pine and plum is the p value significant at the p < 0.05 level (marked *), though with more data the other categories would probably show significance

Character in name  Motif in crest  p(m|c)  Σ_{n∈n_c} p(m|n) / |n_c|   t (p)
Other              Cherry          0.00    0.00
                   Wisteria        0.04    0.07
                   Pine            0.00    0.00
                   Plum            0.04    0.04
桜 sakura          Cherry          0.27    0.16                       4.42 (0.14)
                   Wisteria        0.02    0.01
                   Pine            0.00    0.00
                   Plum            0.02    0.01
藤 fuji            Cherry          0.02    0.01
                   Wisteria        0.18    0.21                       2.10 (0.10)
                   Pine            0.00    0.00
                   Plum            0.05    0.02
松 matsu           Cherry          0.00    0.00
                   Wisteria        0.02    0.01
                   Pine            0.15    0.13                       6.00 (*0.0038)
                   Plum            0.03    0.03
梅 ume             Cherry          0.00    0.00
                   Wisteria        0.05    0.03
                   Pine            0.00    0.00
                   Plum            0.42    0.44                       3.89 (*0.017)
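The caption of Table 3.3 does not state which form of t-test was used. One reading that is consistent with the reported p values, and which is our inference rather than anything stated in the source, is a one-sample test of the per-name values p(m|n) in a category against a comparison value, with |n_c| − 1 degrees of freedom. A standard-library sketch of that test follows; the p value is obtained by integrating the Student t density with Simpson’s rule, where in practice one would more likely call scipy.stats.t.sf. The input values are hypothetical.

```python
import math
import statistics


def one_sample_t(values: list[float], mu: float) -> tuple[float, int]:
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / se, n - 1


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def p_two_tailed(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 1 - 2 * (s * h / 3)


# Hypothetical per-name values p(m|n) for a five-name category,
# tested against a hypothetical comparison value mu.
values = [0.30, 0.12, 0.20, 0.45, 0.25]
t, df = one_sample_t(values, mu=0.02)
print(round(t, 2), df, round(p_two_tailed(t, df), 4))

# Check against Table 3.3, assuming df = |n_c| - 1 = 1 for the two 桜 names.
print(round(p_two_tailed(4.42, 1), 2))  # 0.14
```

Under this reading, df = 1 turns t = 4.42 into a p of about 0.14, and with df = 4 (five names) t = 6.00 falls between 0.002 and 0.005, consistent with the reported 0.0038.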

Chapter 4

Writing Systems

4.1 Introduction

Of all the graphical symbol systems, writing stands out as special. It has been called “the technology of civilization” (Powell, 2009), and with good reason. While not all civilizations in history had writing, and while in those that did, throughout most of history most people were not literate, the power of writing to record events, administer the state and more generally to enhance communication beyond what can be achieved by any of the non-linguistic systems we have considered, has been transformative. Not surprisingly, then, writing has received a lot of study. The following are just a few of the book-length treatments of the topic since the mid-twentieth century: (Driver, 1948; Gelb, 1952; Moorehouse, 1953; Istrin, 1965a; Gelb, 1963; Istrin, 1965b; Diringer, 1958; Pope, 1975, 1999; Sampson, 1985; Coulmas, 1989; DeFrancis, 1989; Drucker, 1995; Harris, 1995; Daniels and Bright, 1996; Glassner, 2000; Sproat, 2000; Coulmas, 2003; Rogers, 2005; Robinson, 2007; Gnanadesikan, 2009; Powell, 2009; Woods et al., 2010; Sampson, 2012; Daniels, 2018; Handel, 2019; Ferrara, 2022).1 While there is substantial disagreement among the various scholars on what precisely defines writing and sets it apart from other graphical symbol systems, writers in general agree that the key insight that makes writing systems work is the phonographic principle, the idea that one can write a word with symbols that represent how it sounds rather than what it means. As we will see, writing systems differ substantially in the degree to which they represent sound and the degree to which they represent meaning; but all writing systems must include a way to represent sound, if they are to allow one to write whatever one can say. In this chapter we will describe how writing systems represent meaning and sound and how they differ in this regard. By saying that writing allows one to “record language” or “record speech”, we are actually treading a minefield. One of the main bones of contention, and one that has caused a substantial amount of disagreement in the field, is between scholars who are “speech-centric”, viewing writing largely as a way to record what one might have said in speech, and those who view written language more as a species of graphical symbolic representation, which thus has properties in common with other graphical symbol systems, properties that it does not share with speech. In this chapter I shall try to persuade the reader that this is really an argument about nothing, and that writing has to be understood both in terms of its ability to record spoken language and as a graphical symbol system, with all the implications that has.

1 For a bibliography of relevant work prior to 2018, see Gnanadesikan and Sproat (2018).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_4

4.2 Writing 4.2.1 Preliminaries The English word write, Old English wr¯ıtan, comes from a Germanic, and ultimately Indo-European root meaning “carve” or “dig”; it is related to the German word reissen, “tear, rip”. In a similar fashion, the Latin word scrib¯ o “write”, from which words like scribe, script, scrivener, inscribe, describe as well as the normal words for write in Romance languages and most other Germanic languages come, descends from an Indo-European root meaning scratch. The Chinese character 書 sh¯u2 “book, writing” includes 聿, a writing brush, as its semantic component. And the Sumerian word for ‘scribe’ dub-sar , from dub ‘tablet’ and sar ‘to write’, and thus a scribe is one who makes marks on (clay) tablets. Terms for writing thus, not surprisingly, derive from terms that relate somehow to the process of making “visible marks” on a surface, either the tools used to make those marks, or the manner in which the tools are used. But this does not distinguish writing from making other marks, such as when one draws or paints. Nor does it distinguish between what we normally think of as writing and other notational systems, such as music or mathematical notation. Clearly writing is more than just making marks on paper or some other surface. And it is more than just communicating information using marks. Writing is a special kind of visual communication, one that is intimately tied to natural language in a way that no other graphical communication system is. More specifically, writing is a conventional symbol system used to represent linguistic messages. This much is in agreement with definitions of writing given by the linguists Ferdinand de Saussure (1916) and Leonard Bloomfield (1933). Bloomfield, in particular, famously stated “Writing is not language, but merely a way of recording language by visible marks.” To put it technically, writing is glottographic. 
2 In Modern Chinese its basic meaning is “book”, but in compounds such as 書法 shūfǎ “calligraphy”, as well as in older Chinese and in Japanese, the meaning “writing” is retained.

And more specifically still, a fully developed writing system will allow one to notate, to a close approximation, anything that can be communicated in speech, though as we will see, there are some important caveats surrounding this idea. Now to be sure, some people have defined the notion of writing more loosely. Thus Powell (2009) defines writing as “a system of markings with a conventional reference that communicates information.” Harris (1995) likewise includes under this rubric symbol systems that clearly do not represent language in any way. The problem with such definitions is that they fly in the face of the common understanding of what the term “writing” refers to. Is the following writing?3

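[Image: an excerpt from the score of Bach’s Toccata and Fugue in D minor, BWV 565]

Or this?

$$\frac{\partial^2 u}{\partial t^2} = c^2 \left( \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} + \cdots + \frac{\partial^2 u}{\partial x_n^2} \right)$$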

Let us go down Powell’s list of criteria. First of all, both of these examples obviously involve a “system of markings”. Second, both involve markings that have a “conventional reference”: for example “♪” represents an eighth note, and “∂” conventionally denotes a partial derivative. Finally, both clearly communicate information: in the first case, the score can communicate Bach’s original creation to musicians two and a half centuries later; in the second, the formula expresses the wave equation. So both of these would seem to fit the criteria of writing, and indeed one can speak of “writing music” or “writing an equation.” But when one normally thinks of writing, one has something rather more specific in mind. If I say, “you really ought to write to your father”, I do not generally mean that one should send one’s father a musical score or a set of mathematical equations. Rather, I mean that one ought to communicate some linguistic message using a conventional set of graphical symbols.4 If I say that someone cannot read or write, I do not generally mean that they are unfamiliar with mathematical notation, or that they cannot read (or write) music: I mean that they are unable to understand or communicate in written language. By writing, then, we usually mean not the musical or mathematical examples that we just discussed, but rather the kind of symbols that we are using to communicate the contents of this book, and that you are currently reading. Writing, then, is a linguistic symbol system; put another way, writing is a set of conventional symbols used to represent linguistic information, which can be strung together to convey a linguistic message in graphical form.

How does writing do this? Let us turn our attention back to mathematical symbology for a moment, and take a much simpler expression than the one cited previously, namely a simple sum:

1 + 2 = 3

This expression is composed of five symbols: three of them represent numerical values, one represents the notion of addition, and the remaining one represents the notion of equality. These symbols can be combined into a more complex expression, wherein we understand that the first piece, 1 + 2, denotes a summation of two numerical terms, and that expressions of the form A = B denote equality of the expressions on the left and right. To express a mathematical notion, we typically build up a complex expression out of pieces, each of which represents a simple mathematical concept. This is simply the concept of syntax that we have already met (see the discussion in Sect. 3.3). Analogously, writing allows us to compose complex written expressions by putting together symbols, where each symbol or some combination of symbols represents a linguistic entity. But what linguistic entity should one pick? We often think of sentences as being composed out of words.

3 From Wikipedia: https://en.wikipedia.org/wiki/Toccata_and_Fugue_in_D_minor,_BWV_565. Image is in the public domain.

4 As a reader of an earlier version of this book pointed out, it would also be infelicitous to send him, say, the prologue of Moby Dick. Of course that is true, meaning that it is not a sufficient condition that the message be a linguistic message. But it is surely a necessary condition.
A slightly more sophisticated view would say that sentences are composed out of morphemes, where a “morpheme” is often defined as the “basic unit of meaning” in a language: the word words is one word but two morphemes, in this case the noun word and the plural marker -s. So perhaps the basic symbols of the writing system should correspond to words or morphemes? Indeed, the very earliest writing systems did evolve from systems where scribes picked symbols to represent words or morphemes, often symbols that were more or less evocative of the meaning of those words or morphemes. In the easiest case this would just be a picture of what the word represents, assuming that the word’s denotation is easily depicted. Thus the earliest Chinese representation of the word for “horse” (Modern Chinese 馬, Mandarin mǎ) was in fact a picture of a horse:5

[Image: oracle-bone form of the character 馬 “horse”]

5 From https://commons.wikimedia.org/wiki/File:%E9%A6%AC-oracle.svg, via the Ancient Chinese Characters Project. Image is in the public domain.

In similar fashion, the earliest pre-cuneiform sign for head (Sumerian sag) was a picture of a head:6

[Image: early linear sign for sag “head”]

A moment’s reflection should convince the reader that this will only get one so far: while it is easy to draw pictures of horses and heads, it is less easy to come up with drawings that represent, e.g., hiatus or consternation. Even in many cases where a drawing is possible in principle, coming up with a system that is both sufficiently distinctive and practical to use would be difficult: how would you conveniently depict the difference between whisky and rum, or between water and vinegar? Presumably because of these sorts of considerations, scribes in all ancient literate societies quickly discovered that in order to make writing more flexible, they would have to develop the ability to write words not by representing their meaning but by representing their sound. Much has been made of the rebus principle, by which a word that is hard to draw can be written by borrowing a symbol used to write a word that sounds similar to the target word. Thus the cuneiform sign read ti in Sumerian had as its original meaning “arrow”; it was most likely originally a depiction of a bow and arrow.7 But the Sumerian word for “arrow” was homophonous with, or at least sounded similar to, the word for “life”, and thus this sign came to be used to represent “life”. By exploiting homophony or near-homophony, all ancient writing systems, though they represented many words with symbols that pertained to their meaning, also developed ways to represent the pronunciations of words. In the case of Chinese, this eventually became formalized into a scheme whereby most characters (about 95% of those ever invented) comprise a piece that represents something about the meaning, and another piece that gives a clue to the pronunciation; see our example in Fig. 4.1. But all ancient writing systems had a similar structure.

Thus, for example, in Egyptian, words were often written with a combination of symbols that related to the meaning and symbols that indicated the pronunciation, or more specifically the consonants in the word. The verb “leave” pr, for example, could be written with a group of three signs:

[Image: hieroglyphic spelling of pr “leave”: house, mouth and walking legs]

• pr: a picture of a house, also pronounced pr;
• a mouth, denoting the sound r. This functions as a so-called phonetic complement, insofar as it represents the final r of pr and serves to further specify, or “complement”, the intended reading;
• a pair of legs, denoting motion and functioning as a semantic determinative, i.e. a sign that indicates something about the meaning of the word.

6 From https://en.wiktionary.org/wiki/%F0%92%8A%95#/media/File:Sa%C4%9D_(linear_script,_head).jpg. Image is in the public domain.

7 Jacob Dahl, personal communication.

Fig. 4.1 How various writing systems might encode an imaginary word kan for a kind of fish. See the text for a detailed explanation

4.2.2 Types of Writing Systems

Chinese writing, including Chinese characters as used for other languages (most notably Japanese), is the only writing system in contemporary use that retains the ancient practice of having both systematic semasiographic elements and phonographic elements in the writing system; cf. Drucker (1995), page 11. In all other extant systems, the symbols evolved to represent sounds exclusively. What is represented varies greatly. In some systems, such as Japanese hiragana and katakana (both originally developed from simplified forms of Chinese characters), the units represented are (roughly) simple syllables. In Semitic scripts, such as those used for Hebrew or Arabic, the basic symbols represent only consonants, though auxiliary systems have been developed to represent vowels. The alphasyllabic (or abugida) scripts of India, derived from the ancient Brahmi script of King Ashoka, represent all consonants and most vowels, though the two categories differ in how they are represented. Consonants are the main symbols and are often written inline, though in Kannada and Telugu the non-initial consonants of a cluster are written as subscripts. Vowels, on the other hand, are written as diacritics. Thus in Devanagari, the word hindī is written हिन्दी. It contains three consonants, h ह and the cluster nd न्द, and two vowels, i ◌ि and ī ◌ी. The vowels are written as diacritics, respectively before and after the consonants; in particular, i is written before the consonant that it logically follows.
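The mismatch between visual and logical order described above also shows up when Devanagari is stored as digital text. The following is a minimal sketch that assumes Unicode encoding, which the discussion above does not itself address. It lists the code points of हिन्दी in storage order. The vowel sign i comes after ह, the consonant it follows in pronunciation, even though a renderer draws it to the left of ह. The helper name `code_point_names` is hypothetical.

```python
import unicodedata


def code_point_names(text: str) -> list[str]:
    """Return the Unicode name of each code point in `text`, in storage order."""
    return [unicodedata.name(ch) for ch in text]


# हिन्दी "hindī", written with escapes so that the storage order is explicit.
HINDI = "\u0939\u093F\u0928\u094D\u0926\u0940"

for name in code_point_names(HINDI):
    print(name)
# Storage order is HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II:
# the vowel sign i is stored after HA, although it is displayed to its left.
```

The virama (U+094D) between न and द suppresses the inherent vowel of न. A renderer can then join the two consonants into the cluster न्द.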


The Semitic consonantal scripts (technically termed abjads) are important in the history of writing, since they were the precursors of many other scripts. These include the Greek, Latin, Cyrillic and other alphabetic scripts, the Ge’ez script of Ethiopia and Eritrea, and, probably, the alphasyllabic scripts of India and Southeast Asia. The Semitic scripts in use today can all be traced to early scripts developed in the Sinai in the mid-second millennium BCE. These were almost certainly influenced by Egyptian writing (Rougé, 1859; Driver, 1948; Drucker, 1995; Goldwasser, 2010), and can be viewed as a simplification of the Egyptian system. As we saw previously, in Egyptian, symbols could represent both logographic information related to the meaning and phonological information about the consonants in the word. Phonographic symbols in Egyptian could represent one, two or three consonants. In the previous example, the house symbol was biconsonantal (pr), and the mouth symbol uniconsonantal (r). The Sinaitic script simplified all that, removing the logographic symbols entirely and representing only single consonants. The result was an extremely compact set of symbols, among the smallest of any script. Several hundred years later, another fundamental change occurred when the Greeks adapted a version of the Canaanite or Phoenician consonantal alphabet into an alphabet where both consonant and vowel sounds could be written; cf. Swiggers (1996), and see Gnanadesikan (2009) for a particularly clear exposition of how this borrowing probably happened. The end result was that, unlike in previous scripts, not only could one represent all the sounds of the language, but the symbols used were all of equal status.
Contrast this with the direction taken in the Brahmic scripts of India, probably developed on the basis of the Aramaic consonantal script, where vowels and consonants are distinct symbol sets, the former being diacritics on the latter, as described previously.

Figure 4.1 gives some examples of how different types of scripts encode phonological and other information for an imaginary word kan for a kind of fish. The scripts are the morphosyllabic Chinese system, an abjad (Hebrew), two alphabets (Latin and hangul), an alphasyllabary (Devanagari) and a syllabary (hiragana):

• In the first case, Chinese, we are engaging in a bit of fiction: the depicted character does not exist, but it is a possible character involving the fish radical 魚 and the character 甚 (Mandarin shèn ‘serious’), which, used as a phonetic component, can have kan as one of its pronunciations. Invention of new characters like this is fairly rare in Modern Chinese, but throughout the history of Chinese writing, tens of thousands of such semantic-phonetic characters have been invented. Chinese, including the Chinese-character portion of the Japanese writing system, is the only modern writing system that retains the ancient practice of at least somewhat systematically marking semantic as well as phonological information in the written form.
• In the Hebrew abjad (written from right to left, as indicated by the arrow), only the consonants /k/ and /n/ would be written.
• In an alphabet such as the Latin script, all segments are represented.
• So too in an alphasyllabary are (nearly) all segments represented. But in the example from Devanagari, the script used to write Hindi among other languages, the vowels typically show up as diacritics written above, below, or even before the consonant.8 In this case the vowel is an /æ/ as in the English word ash, rather than /a/.
• Korean hangul is also classed as an alphabet, but here the letters are not all written on the same line: the /k/ and /a/ are written left to right, but both of them are above the /n/.
• Finally, in a syllabary, Japanese hiragana, a whole syllable /ka/ is represented as a single symbol. However, as with most syllabaries, hiragana does not actually have a separate symbol for every syllable. In the present case there is a separate symbol for the final /n/, which is not really an /n/ but a nasalization of the previous vowel.

While there are many variations on the details when one considers all of the world’s writing systems, this small sample gives a good sense of some of the different ways in which writing encodes linguistic information.

Some authors have taken the invention of the alphabet as the pinnacle of development of writing systems, particularly Gelb (1952), Moorehouse (1953), and Diringer (1958).9 Some authors have also linked the alphabet specifically to, among other things, segmental theories of phonology (Faber, 1992), and even technological creativity (Hannas, 1997, 2003). Many other writers have taken a more nuanced view, rejecting the idea that the alphabet is somehow privileged: DeFrancis (1984, 1989); Sampson (1985, 2012); Coulmas (1989); Sproat (2000, 2010a); Rogers (2005); Gnanadesikan (2009); Powell (2009); Daniels (2018); Sproat (2021).10

8 Alphasyllabaries also have an inherent vowel, usually some /a/-like or reduced vowel, which is not represented in writing. A consonant that has no other mark is generally presumed to have that vowel following it.

9 Cf. Diringer (1958), page 37: “The alphabet is the last, the most highly developed, the most convenient and the most easily adaptable system of writing. Alphabetic writing is now universally employed by civilized peoples.”

10 A somewhat tangential but nonetheless important issue is that the way the brain processes writing seems to be largely uniform across different kinds of writing systems. At least from a neurological point of view, then, there do not appear to be major differences between kinds of writing system, alphabetic or otherwise. Thus, as Dehaene (2009) has argued, literate brains co-opt portions of the brain that originally evolved for other purposes. This is obviously so, since humans did not evolve to read and write. For example, the low-level processing of scripts seems to reside in a portion of the occipitotemporal cortex that Dehaene terms the “letter box”, which originally evolved as part of the brain’s visual object identification system. See Chap. 5 for further details.

4.2.3 A Side Note on Alphabets and Typewriters

That said, there is one sense in which alphabetic systems do have an advantage, namely in typing technology, since the number of symbols in the script is generally within the bounds of what can fit on a reasonably sized keyboard. With traditional typewriters, an additional issue was that, to be maximally simple, the script should also be linear, like the Latin script as used in English, where each letter is arranged one after another on a line. Once one starts adding accents, as needed for many European languages, or needs to cover a system like Korean hangul, traditional typing technology became more complicated. In hangul, the letters are arranged two-dimensionally within syllable blocks. While traditional typewriters existed for Korean, they required some ingenuity over and above what one found in English-language typewriters in order to make sure that the individual letters ended up appropriately placed. Chinese typewriters also existed. Popular imagination pictured them as huge versions of conventional typewriters, with thousands of keys, but they were never that (Mullaney, 2017). They were nonetheless complex and ingenious devices that required great skill to operate. Computer-based input systems can of course be much more flexible, so that it is now as easy to type in Korean as it is in English. Inputting text in Chinese (or Japanese) is still more challenging than typing English. The most popular systems involve phonetic input, such as pinyin romanization for Chinese or the hiragana syllabary for Japanese, both of which can fit on a standard keyboard. The system must then convert all of this input (Chinese) or a portion of it (Japanese) into the normal written form involving Chinese characters. This conversion cannot be 100% automatic, and invariably some attention is required on the part of the typist to select the appropriate version.
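The following is a minimal sketch of how computer input avoids the placement problem that made Korean typewriters so intricate. It assumes the arithmetic the Unicode Standard defines for precomposed Hangul syllables, an encoding detail that the discussion above does not mention. It composes a single syllable block from its letters. The function name `compose_syllable` is hypothetical, and the letters are given as Unicode compatibility jamo, listed in the order Unicode uses.

```python
import unicodedata

# Letter inventories in the order used by the Unicode Hangul syllable block.
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")          # 19 initial consonants
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")     # 21 vowels
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]       # "no final" + 27 finals

SYLLABLE_BASE = 0xAC00  # code point of 가, the first precomposed syllable


def compose_syllable(lead: str, vowel: str, tail: str = "") -> str:
    """Return the one precomposed syllable block for lead + vowel (+ optional tail).

    Raises ValueError if a letter is not in the corresponding inventory.
    """
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail)
    return chr(SYLLABLE_BASE + (l * len(VOWELS) + v) * len(TAILS) + t)


block = compose_syllable("ㄱ", "ㅏ", "ㄴ")  # the letters of kan from Fig. 4.1
print(block, unicodedata.name(block))     # prints: 간 HANGUL SYLLABLE GAN
```

Unlike a typewriter, the software never positions the three letters physically. It stores a single code point for the whole block, and the font takes care of the two-dimensional arrangement.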

4.2.4 Blissymbolics: An Attempt at a “Semasiographic” Writing System

Returning to the main theme of this section, what is clear is that the ability to represent phonology is central to all writing systems. While there has been some disagreement about the degree to which writing systems in general encode phonology, no authors have argued that any extant full writing system is purely logographic, with no encoding of phonology whatsoever. Even Sampson, the author who has argued most strongly for the possibility of purely semasiographic writing systems, has not provided examples of naturally developed writing systems that eschew representations of phonology. The one example of a semasiographic system that Sampson (2012) discusses is Blissymbolics, developed starting in the early 1940s by Charles Bliss (Karl Kasiel Blitz). Bliss’s system was by no means the first such system to be proposed (see Sect. 9.1 for further discussion). It was, however, by far the bravest attempt ever to develop a purely semasiographic writing system; see also Rogers (2005); Sproat (2010a).

Bliss was an Austrian Jew who escaped the Nazis and ended up in Shanghai. He felt that one of the reasons for the rise of Fascism was ignorance, and that this in turn was related to problems of literacy and to a lack of communication across cultures. He believed that traditional writing systems were hard to learn. Furthermore, because they were related at least in some way to pronunciation, they were necessarily linked to particular languages. If one could develop a system that appealed directly to meaning, one could make learning to read easier and simultaneously enable cross-cultural communication. On moving to Shanghai, Bliss encountered Chinese writing and fell for the myth that Chinese writing directly recorded semantics, and that it was not tied to a particular language (since Chinese characters were, after all, also used by Japanese and Koreans). But he recognized that learning to read and write Chinese was hard work, and set about developing a simpler system.

Blissymbols are simple “stick figure” depictions of objects, so that, for example, one can represent a chest of drawers with the following symbol:

[Image: Blissymbol for “chest of drawers”]

Abstract concepts could also be represented, and Bliss was not above using symbols that had already acquired wide usage, such as the symbol for “love”. Obviously, though, this use of a stylized heart is culture-specific:

[Image: Blissymbol for “love”]

Some concepts were represented using compound symbols. “Tax”, for example, was represented as a compound of three symbols:
• “percentage”, using the universal symbol %;
• “for”, represented by a double wedge (>>);
• “state”, represented by a flag planted in the ground.

[Image: Blissymbol for “tax”]

So far, so good, but representation using stick figures is somewhat limiting. This raises a problem if one wants to represent the difference between objects that are visually rather similar, such as horses, mules and donkeys. For this, Bliss proposed using subscripts, so that a horse would be represented as something along the lines of “horse-like animal 1”:

[Image: Blissymbol for “horse”]

Bliss proposed a similar scheme for colors, where a basic symbol for color would be subscripted. The following is “red”:

[Image: Blissymbol for “red”]

More unusual colors like “Prussian blue” would have indices that, in one scheme, would be the position of that color in a color chart. The system thus starts to get arbitrary and unwieldy, since one is forced to memorize what these otherwise non-mnemonic indices denote in each case.
Once Bliss got past the easy cases, the system accreted more and more devices to distinguish the meanings of words, and it became more and more arbitrary. Obviously, with a large enough set of symbols and combinations of these, along with devices like indices, one can represent an arbitrarily large vocabulary. But the system would present a major problem for learners. It would therefore not be practical for representing vocabularies that are typical of adult speakers, which is to say vocabularies in the tens of thousands of words. Furthermore, Bliss could not entirely get away from phonological writing. Personal names were one problem, and the simplest solution was to adopt the normal orthographic form of the name, or at least some of its letters.


So while Bliss’s proposal was a brave attempt, it was hardly successful at achieving its goal. The main use of Blissymbols today is by people with cognitive disabilities, who are unable to communicate in ordinary speech or sign language, but who seem to be able to handle at least limited communication using Blissymbols.11 As we have already seen, non-linguistic systems abound, and from that point of view what Bliss proposed was not that novel. What was different was that Bliss intended that the system could be as fully expressive as standard written language. Unlike Bliss’s experiment, most other non-linguistic systems were not a conscious attempt to develop a full system of written communication that did not depend on spoken language. In fact, neither was Bliss the first to dream of a universal written language that was tied directly to meaning: we will return to this theme in Sect. 9.1. But Bliss’s attempt was by far the most fully developed. Yet what Bliss’s work showed was that even with dedicated effort, one is bound to end up with a system that is more limited than what can be expressed in speech, unless one is willing to sacrifice language-independence, and develop a system that at least in part makes reference to phonological information. Phonology is key to writing.

4.3 Limitations of Writing

To summarize: writing is a graphical symbol system that represents linguistic information and can be used to record in graphical form what can be said in speech. While in theory writing could work by encoding any kind of linguistic information, in practice all fully functional writing systems work by encoding phonological information. It was the discovery that one could represent the sounds of a language in written form that was key to the success of the “technology of civilization”. Without it, writing could never have developed to the point where it would be possible to construct running narratives or epic poems in written form. Attempts to build writing systems based entirely on semantics have been notably more restricted in what they can cover; we return to this topic in the final chapter.

On the face of it, the above conclusion would seem both obvious, given the facts, and unobjectionable. Yet it turns out to be by no means uncontentious, for there have been scholars who have opposed this “glottocentrist” view. There have been a number of reasons for this opposition. The main ones fall into two basic categories, which one may call “inclusiveness” and “graphocentrism”.

11 See the materials of Blissymbolics Communications International, https://www.blissymbolics.org/ for some examples of the modern incarnations of the system.


4.3.1 Inclusiveness

DeFrancis (1989) notes that there are two main schools of thought about what constitutes writing (page 4, italics his):

– Inclusivists: Writing includes any system of graphic symbols that is used to convey some amount of thought.
– Exclusivists: Writing includes only those systems of graphic symbols that can be used to convey any and all thought.

Translated into terms perhaps more familiar in the present context, inclusivists consider any conventionalized graphical communication system that conveys meaning of any sort to be writing. Powell (2009), referenced earlier in the chapter, would count as an inclusivist, and more generally almost all of the symbol systems we have considered in this book would count as writing from an inclusivist’s point of view. Indeed, some authors would cast the net even more widely, to include not only conventional symbol systems but graphical representations more generally. Thus Garcés (2017), in his Chapter 2, gives examples of escrituras logográficas (“logographic writing systems”). These include not only Chinese and Aztec writing, but also Dakota Winter Counts (Sect. 3.6.16), as well as the Cro-Magnon cave paintings of Lascaux. Exclusivists, on the other hand, consider as writing only those systems that encode the only communication system capable of transmitting all thought of which we are aware: natural language. DeFrancis himself would count as an exclusivist as, apparently, would I. By inclusiveness, then, I mean the notion that the definition of writing that we have been pursuing here is too rigid and not sufficiently inclusive, dividing as it does the world of conventional graphical symbol systems into two categories, writing versus non-writing.
The inclusivist would counter that some cultures developed recording devices that do not encode language, or at least are not known to encode language. Yet these devices had much the same function in those cultures as writing has in civilizations that developed that mode of graphical communication. We have already seen examples of such systems: the accounting khipu of the Inca was one, and it clearly served some of the functions that writing did in other parts of the world. Is khipu not “their way of writing” (Boone & Urton, 2011)? Indeed, many people who study the pre-contact civilizations of the New World, many of which lacked writing in our exclusivist sense, argue for inclusive views; see Boone and Mignolo (1994), Fernández (2015), and Garcés (2017) for additional examples. I have to admit that I find such arguments puzzling. Echoing similar points made by DeFrancis, one could note, for example, that boats served much the same function in Venetian culture as horses and carts did in other European cities. Yet it would be odd to argue that a boat is really just a horse and cart, broadly construed. Certainly both are modes of transportation, just as writing and khipu are both conventionalized graphical symbol systems for conveying information. Khipu, as far as has been demonstrated, did not allow one to record running narratives. As we noted previously, Hyland (2017) has argued that two khipus may record a linguistic text using a complex system based on colors and wool types. The results seem speculative so far, however. In any case, the artifacts she is studying are post-contact, meaning that their creators presumably could have known about the existence of writing in Spanish. Urton (1998; 2017) has argued that the khipu used a 7-bit code allowing it to encode place names, and has seemingly identified place names in some of the extant khipu. But this does not constitute full writing. Khipu was an efficient record-keeping system that allowed the Incas to administer their empire, but it apparently lacked the crucial property that writing has. Once a fully functional writing system is developed, even if its original purpose was, say, accounting, nothing in principle stops its users from recording anything they might say. That could be shopping lists, letters to acquaintances, poetry or narrative prose. There really seems to be no justification for extending the definition of the term “writing” to cover cases that do not allow for all these things.

4.3.2 Graphocentrism

The other major style of objection involves the observation that writing is, of course, a graphical symbol system. As such, it has properties that other graphical symbol systems have, including two-dimensionality. Speech, on the other hand, is one-dimensional, unfolding in time. By characterizing writing as a way of recording speech, one effectively ignores all of the ways in which writing can express more than speech can, as well as all of the ways in which speech can express more than writing can. The kind of information encoded by speech and the kind encoded by writing are intersecting, yet distinct, sets.

The above is of course all true. But the argument that, because it is true, writing should not be viewed as a means for recording speech strikes me as fallacious. To start with, saying that writing is based largely on phonology (and in any case on language) does not mean that it is the same as phonetic transcription. Phonetic transcription systems such as the International Phonetic Alphabet (International Phonetic Association, 1999) are designed with the specific scientific purpose of recording fine-grained phonetic nuances of speech. These include pronunciation variations across dialects and even, if desired, variation within a single talker’s pronunciation. Writing systems are not designed for that purpose. Nor does writing’s being tied to speech mean that it is limited to the single temporal dimension of speech. In the next two subsections we will discuss how writing is, on the one hand, more limited than speech in what it can express. On the other hand, it is able to take advantage of a second (spatial) dimension to express things that cannot easily be expressed in speech.


Limitations of Writing in Representing Speech

Mark Aronoff, in a paper entitled “Orthography and Linguistic Theory” (Aronoff, 1985), discusses the case of punctuation in Masoretic Hebrew. As we have seen, Hebrew is written with a typical Semitic writing system. It is an abjad, where the letters basically represent consonants, except for a handful that do double duty as indicators of some vowels. Even the consonants are underspecified, so that depending on context, the letter פ (peh) can represent either /p/ or /f/, and ש (shin) does not distinguish between /s/ and /ʃ/. And so forth. In religious texts, however, the consonants are always pointed (nikud), so that these ambiguities are eliminated: שׁ /ʃ/ is now distinguished from שׂ /s/ by a dot. In addition, vowels are indicated. The system then becomes essentially alphabetic, albeit with points above, below or within the consonants to represent the vowels or details of the consonant articulation.

Masoretic Hebrew punctuation goes one stage further than this: the text is also quite consistently marked with several levels of punctuation. Aronoff argues that the purpose of the system was to provide “a complete unlabeled binary phrase-structure analysis of every verse” (page 28). While Aronoff bills this system as “orthography”, it is important to understand that it goes well beyond what orthographic systems typically encode. Hardly any writing system provides a systematic way to mark higher-level prosodic information such as phrasing or intonation. Of course, most writing systems, or at least most modern writing systems, have a system of punctuation, and it may seem that such punctuation has a function similar to that of the Masoretic system. After all, periods are typically used to mark the ends of sentences, and commas to mark phrases within sentences. But in general punctuation is far from systematic in what it marks.
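The following is a minimal sketch of the contrast between pointed and unpointed Hebrew, assuming Unicode text, which the discussion above does not rely on. In Unicode, the shin and sin dots, the dagesh and the vowel points are combining marks stored after their base letter. Removing every character with a nonzero canonical combining class therefore reproduces the ambiguity of ordinary unpointed spelling. The helper `strip_points` and the constant names are hypothetical.

```python
import unicodedata


def strip_points(text: str) -> str:
    """Drop every combining mark (nikud, shin/sin dots, dagesh), keeping base letters."""
    decomposed = unicodedata.normalize("NFD", text)  # split any precomposed forms
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


SHIN = "\u05E9\u05C1"        # shin + shin dot: read /ʃ/
SIN = "\u05E9\u05C2"         # shin + sin dot: read /s/
PEH_DAGESH = "\u05E4\u05BC"  # peh + dagesh: read /p/
PEH = "\u05E4"               # peh without dagesh: read /f/ in pointed text

for pointed in (SHIN, SIN, PEH_DAGESH, PEH):
    print(repr(pointed), "->", repr(strip_points(pointed)))
```

Once the points are gone, the two readings of each letter collapse into one string. Pointing was designed to remove exactly that underspecification.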
Thus, Nunberg (1995) has shown that while there is some correlation between punctuation usage and linguistic units such as phrases, there is by no means a simple relationship between the two. To give just one example, one frequently uses commas with a series of adjectives, as in large, green ball, yet whatever the comma marks in such a case is clearly different from what it marks when it delimits much larger units, as in the sentence you are currently reading. If nothing else, one would often read a sequence like large, green ball as a single phrase, without any noticeable break between large and green. Masoretic Hebrew orthography includes such information because it has a special function that goes beyond the normal function of an orthography used in day-to-day written communication. The phrasing and intonation of the text were an important part of its performance in its liturgical context: when the text was read aloud, the additional notation indicated how it should be read. As Aronoff notes (page 67), while there were probably several reasons for the system’s development, it was clear that the accents were used as a guide to the recitation of the text.12

It is quite clear that they had phonological implications which would have been reasonable only if they were treated as a guide to recitation. Furthermore, we know that a fairly rigorous tradition of recitation existed long before the written accent system came into being, and that the accents supplanted and augmented this tradition.

12 Often referred to as cantillation.

Add to this the fact that when the system was developed, Hebrew was nobody’s native language, having died out as a spoken language sometime during the Roman period, around 200 CE, some 800 years before the Masoretic texts.13

Writing systems rarely have systematic ways of representing the large class of linguistic phenomena that fall under the general rubric of prosody. To be sure, there are more or less conventional ways to represent some prosodic phenomena. For example, in languages whose alphabetic writing systems have case distinctions, capitalization can be used to EMPHASIZE text; font changes (boldface, italics) can serve the same function. Japanese often uses the katakana syllabary, normally reserved for foreign words, brand names and the names of animals and plants, for this purpose. Thus, while だめ dame in hiragana is the normal way to spell the word meaning ‘don’t’ or ‘not allowed’, one can also emphasize a prohibition by writing it in katakana: ダメ.

Besides prosody, another limitation of written language, which distinguishes it from phonetic transcription, is its handling of non-standard pronunciations, such as those of particular dialects. Standard spellings for a language typically reflect the pronunciation of a standard dialect, even if only imperfectly. Thus English spells many words with an initial ⟨th⟩ representing the voiced interdental fricative /ð/, but there are dialects, such as some African American dialects, that lack this sound. If one wants to convey the pronunciation of a non-standard dialect, the best one can do is to provide a quasi-phonetic transcription based loosely on the standard spelling system of the language. This is often called “eye dialect”, and a classic example is Mark Twain’s transcription of Jim’s “Missouri Negro Dialect” in Huckleberry Finn, as in the following passage from Chapter 8:

Doan’ hurt me—don’t! I hain’t ever done no harm to a ghos’. I alwuz liked dead people, en done all I could for ’em. You go en git in de river agin, whah you b’longs, en doan’ do nuffn to Ole Jim, ’at ’uz awluz yo’ fren’.

Obviously such a system can give only a broad approximation of what Jim’s speech would have sounded like, and unless one were familiar with the dialect (more or less impossible some 180 years later), it is doubtful that one could faithfully reconstruct the dialect from Twain’s spellings. A faithful transcription would in principle be possible in a system like the International Phonetic Alphabet,14 but such a system would in any event have been useless for Twain’s purposes, and so he was stuck with an inadequate system based loosely on standard English spelling.

13 Modern Hebrew is of course the native language of about 9 million people and is by far the most successful instance of language revival.

14 The International Phonetic Association was founded in Paris in 1886, two years after Huckleberry Finn was published, but the system we now know as the IPA was many decades in development.

In a contemporary context, one often sees eye dialect in social media texts, with spellings such as ⟨dat⟩ for ⟨that⟩, ⟨da⟩ for ⟨the⟩, ⟨din⟩ for ⟨didn’t⟩, ⟨fo sho⟩ for ⟨for sure⟩, and ⟨nuffin⟩ for ⟨nothing⟩. We note in passing that scripts differ in how easily they can represent non-standard pronunciations, and that phonetic scripts, in particular alphabetic ones, certainly have an advantage in this regard. In standard Chinese orthography, for example, it is very difficult to represent odd pronunciations of words. One of the properties of the Chinese writing system is that, precisely because its representation of pronunciation is imprecise, it is about equally good at representing pronunciation in, say, Mandarin as in, say, Cantonese. Thus in the character 橡 “oak”, the phonetic element 象 (“elephant”) gives as good a clue to the pronunciation of the whole character in Mandarin as it does in Cantonese: in Mandarin 象 is xiàng, and indeed 橡 is also xiàng, while in Cantonese both are zoeng6.15 Indeed this is often touted as one of the great advantages of Chinese writing, namely that it is dialect independent (though see DeFrancis (1984, 1989) for critical discussion of this idea). But this dialect independence is also a handicap if one wants to indicate a particular pronunciation in writing: there is no way in traditional Chinese writing to indicate that in a particular case, the Cantonese pronunciation is the intended one. Syllabaries fare somewhat better: in Japanese katakana, for instance, it is at least possible to indicate an odd pronunciation, so long as that pronunciation can be rendered using the syllabic symbols available in the script. But one is freest with alphabets, since they let one specify sounds at a finer-grained level than is readily available in other systems.
If one wants to represent an Elmer Fudd-like pronunciation of, say, brick as ⟨bwick⟩ (indicating that the ⟨r⟩ is pronounced as /w/), one can easily do this, even though standard English does not have syllables beginning with /bwi/. Thus, while writing does allow one to encode speech, this is not the same as saying that writing systems allow one to encode all aspects of speech. Writing is clearly inadequate for conveying many of the nuances that one can convey in speech. One reason it is easy to offend people unintentionally in email is that the finer nuances that speech conveys are simply missing from standard written language.
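The point about granularity can be illustrated with a minimal Python sketch. It assumes a single hypothetical eye-dialect rule, writing every ⟨r⟩ as ⟨w⟩, as a stand-in for the Elmer Fudd spelling; the function names are illustrative only and are not drawn from any library. Because an alphabet encodes individual segments, the non-standard pronunciation is expressed by changing a single letter of the standard spelling, whereas in a syllabary the smallest unit one could swap would be a whole syllable sign.

```python
import re


def fudd_spelling(word: str) -> str:
    """Apply a hypothetical eye-dialect rule: write every <r> as <w>, keeping case."""
    return re.sub(r"[rR]", lambda m: "W" if m.group() == "R" else "w", word)


def changed_positions(standard: str, variant: str) -> list[int]:
    """Return the letter positions at which two equal-length spellings differ."""
    # The rule above substitutes one letter for one letter, so lengths match.
    assert len(standard) == len(variant)
    return [i for i, (a, b) in enumerate(zip(standard, variant)) if a != b]


if __name__ == "__main__":
    for word in ["brick", "rabbit", "very"]:
        variant = fudd_spelling(word)
        print(word, "->", variant, changed_positions(word, variant))
```

Running it prints brick -> bwick [1], rabbit -> wabbit [0] and very -> vewy [2]: in each case exactly one letter position changes, which is the sub-syllabic precision the text attributes to alphabets.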

The Two-Dimensional Aspect of Writing

Readers of Alice in Wonderland will be familiar with the “Mouse’s Tale”, a visual poem set in the form of a mouse’s tail (Fig. 4.2). Visual poetry has a long history, dating back at least a couple of thousand years (Bohn, 1993), but became particularly popular as part of avant-garde art during the early part of the twentieth century. An example from Guillaume Apollinaire (1880–1918), cited by Bohn (page 57), is shown in Fig. 4.3. The text of the poem forms the frame of the mirror, and the author is “reflected” in the mirror by his name, placed in the middle of the frame.

15 The ‘6’ in zoeng6 is a tone indicator.

Fig. 4.2 The Mouse’s Tail/Tale from Alice in Wonderland. Source: https://upload.wikimedia.org/wikipedia/commons/3/32/Alice_in_Wonderland_Ch.3.jpg. License: CC BY-SA 4.0

The plain French text reads, clockwise starting at the top: Dans ce miroir je suis enclos vivant et vrai comme on imagine les anges et non comme sont les reflets (In this mirror I am enclosed, alive and true, as one imagines angels, and not as reflections are).

In both Carroll’s and Apollinaire’s examples, the visual arrangement of the words is significant, and corresponds to nothing one finds in speech. Sometimes even a simple linear arrangement of written symbols can convey nuances not found in speech. As an example, one can take Fenollosa’s (1920) short (Classical) Chinese phrase 日昇東 (rì shēng dōng) ‘the sun rises in the east’. Fenollosa points out that each of the characters in this phrase contains the component 日 ‘sun’: besides forming the first character on its own, it appears as the top component of the character 昇 ‘rise’ and in the middle of the character 東 “east”,16 so that one sees what is in effect a little “animation” of the sun rising as one moves through the sentence.17 Whether intentionally or accidentally, the written medium can convey information not found in speech.

Fig. 4.3 Visual poem in the form of a mirror, Guillaume Apollinaire. Source: a visual poem by Guillaume Apollinaire. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer

16 東 is often presented as being composed of 日 “sun” and 木 “tree”, with the implication that the east is where the sun appears behind the trees. This is, however, a folk etymology, since the original form of the character was a phonetic borrowing from a different character.

17 Unfortunately Fenollosa read too much into cases like this, believing that Chinese writing presented ideas directly and thus that all Chinese writing, and more particularly poetry, was a direct and natural appeal to meaning in the mind. Ezra Pound developed this idea further in what he termed the “ideogrammic method” (see, e.g., Tytell 1988). This is, of course, nonsense: Chinese writing, as we have already discussed, no more represents “ideas” than do spelled words in English. See DeFrancis (1984, 1989).

Another aspect of writing is its two-dimensionality. True, in most ordinary text the two-dimensional aspect is not used in any interesting way. The page you are reading is two-dimensional, but only because the page has a limited physical size and the text therefore has to wrap. Ordinary prose is not really two-dimensional. But two-dimensionality is used to good effect in some kinds of text, for example tables, where information is arranged in rows and columns. Tabular information is hard to convey in speech. This is a problem that organizations that provide audio books for the blind, such as Recording for the Blind in Princeton, New Jersey, have had to address. For tables, guidelines have been developed that tell readers how to present them: how often to repeat column headers, how to indicate which column a given piece of information is from, and so forth. Since it is not practical to have human readers produce audio versions of all possible texts, there has been a great deal of interest in automated systems that produce speech output using text-to-speech (TTS) technology. Such automated reading machines were pioneered by Ray Kurzweil in 1975, when he hooked up a TTS system to an optical character recognition system.18 Modern systems take digitized text as input and use much higher-quality TTS than was available in the 1970s, but they still face various technological problems, especially with material that is rich in tables or mathematical expressions. Various approaches have been proposed, probably the most famous being T. V. Raman’s AsTeR system (Raman, 1994), which was developed particularly for the rendering of mathematical expressions, including complex equations with many subscripts and superscripts, as well as matrices.
Raman’s system makes use of various clever devices, including changes in pitch and volume, to produce simulations in auditory space of the layout of the expression on the page. Thus superscripts (as in exponents such as eˣ) would be rendered at a higher pitch than the inline material. The resulting speech, of course, sounds highly unnatural on first exposure, and this is because it is unnatural: the system is being asked to use speech in ways in which it is not normally used, in order to convey information that is not normally conveyed in speech. And this in turn is a direct consequence of the fact that symbols, whether from a writing system or some other symbol system, are rendered on media where one is able to make effective use of a second dimension not available in spoken language.

Indeed, such technology turns the relationship between writing and speech on its head. It is no longer a matter of writing serving as a way of recording spoken language: instead, spoken language is being extended in its capabilities as a technology for rendering written information.

18 http://www.kurzweiltech.com/kcp.html.

We noted earlier that one sees instances of eye dialect in social media such as chatrooms and Twitter: attempts to imitate dialectal or merely unusual pronunciations of words with odd spellings. One also finds purely visual devices in social media, such as emoticons, including elaborate Japanese-style kaomoji (strings like o(≧▽≦)o; see Sect. 3.6.26), as well as emoji. In most cases these are not intended to be read as words, but merely serve to convey a general mood or emotion. Again, these devices have no direct counterpart in speech, and rendering messages that contain them as speech requires somewhat arbitrary technological decisions about how they should be voiced. Finally, as one further example of a capability of writing that has no direct counterpart in speech, consider stylizations of written forms that can be used as a sort of visual “pun”. An example is given in Fig. 4.4, where the Japanese word 禁止 kinshi “forbidden” is cleverly stylized so that the bottom two strokes of the kanji 禁 are replaced by two dots, making the whole character look a little like someone standing on a skateboard. This is an instance of what Harris terms graphic syncretism, where, according to him (Harris, 1995, page 48):

‘recognizing’ the pictorial element involves quite different visual and mental processing from recognizing the scriptorial element (the letter form). Nor does recognition of the one automatically entail recognition of the other.

The latter point is particularly clear in this example, since even someone who cannot read kanji may recognize the attempt to represent a skateboard, whereas a native reader of kanji could, one supposes, miss the cute extension of the symbol to represent the picture. The example in Fig. 4.4 is of course similar to the examples discussed in Sect. 2.3, where Egyptian hieroglyphs are integrated into larger visual designs.

4.3.3 Summary

The term “writing” is contentious. Does “writing” denote any graphical symbol system, or just those that represent language? And in representing language, is writing specifically an attempt to represent the sounds of speech, or is written language to be viewed as distinct from spoken language? Scholars of writing systems have come down on both sides of these issues. We have tried to argue here for the narrow, exclusivist view of writing, one in which writing denotes those symbol systems that specifically represent language, and in which the design of writing systems suggests an intimate connection to speech. In this I follow such scholars as DeFrancis and Rogers. The first point, whether writing is solely “glottographic”, is to a large extent a matter of how one chooses to define the term. However, we have suggested that, at least in common parlance, the default interpretation of the term is strongly associated with writing linguistic messages. The second point, it seems to me, is less debatable: the evolution of writing systems in all cases involved the development of signs that encoded sound, augmenting, and in many cases ultimately eliminating, symbols that directly represented meaning. The rich communicative possibilities of writing would not exist had this not been the case. Writing gets its richness by piggybacking on language and speech. We thus defend a view that is strongly “glottocentric”.

Fig. 4.4 禁止 kinshi “forbidden”, with the first kanji stylized to look like a skateboard. The sign reads: “Since they can be dangerous to others, skates, including skateboards and rollerblades are forbidden.” (Location: Yoyogi Park, Tokyo)

However, we have also noted areas where writing is inadequate to represent what can be done in speech, as well as areas where writing is more expressive than speech. Some authors, notably Harris (1995), have made much theoretical hay from such observations, using them to argue against glottocentrism and, in Harris’ case, for what he terms an “integrationist” approach. But such arguments strike me as misguided: the fact that writing can express things not readily expressible in speech, or that one can do things in speech that are hard to express in writing, is a simple consequence of the different media used for the two, and does not detract from the point that writing evolved originally as a way to transfer linguistic communication to a new medium. Once that transfer had been made, the users of the new technology discovered that there are things one can do with writing that would be hard to do with speech, and presumably that there were things in speech that could not easily be transferred to writing. The same dissociation can be found with other notational systems. For example, it seems hard to argue with the proposition that Western musical notation evolved as a means to notate the durations and other properties of tones on a dodecaphonic scale. Such notation includes symbols for representing rests, i.e. silences. The fact that one can then write compositions consisting entirely of silence, as with John Cage’s “Four Minutes and Thirty Three Seconds,”19 where it is at least debatable whether the performance itself is really music, hardly invalidates the proposition that musical notation’s main function is to represent a sequence of actions in the performance of a piece. Nor do jocular non-musical uses of musical notation, as in Fig. 4.5. We thus see no reason to reject the basic thesis that writing is, at its core, a way of representing speech and language in graphical form.

19 Cage probably just used an empty staff rather than a repetition of a rest symbol.

Fig. 4.5 Non-musical use of musical notation

4.4 Writing: A Summary

We have discussed several aspects of writing in this chapter. First, we talked about what defines writing, as opposed to other kinds of graphical symbol systems that also communicate information. As we argued, writing is writing precisely because it encodes linguistic information and, more particularly, in order to be fully functional, must be at least in large measure phonographic. This effectively defines writing as a notational device that encodes language, and this point, while seemingly innocuous enough, has not been entirely uncontentious. Some scholars have argued that the term writing should be construed more broadly, and others have emphasized that, because writing involves a different medium than speech, it should be considered on its own terms rather than as basically a way to record speech. We have tried to argue against these latter views by pointing out that, while it is certainly true that writing differs from spoken language in what it encodes, this has no bearing on the basic purpose of writing. In Chap. 6 we will turn to another topic, the pristine evolution of writing from prior non-linguistic symbol systems, and in Chap. 7 we will propose a computational model that simulates this evolution. Before we turn to these topics, however, we first need to discuss what is known about the neural processing of writing, as opposed to that of other graphical symbol systems. We turn to that in Chap. 5.

One point that we have not discussed is what may be termed the prestige of writing. In many, though not all, cultures, the written word has an elevated cultural status, so much so that there have been many instances of pseudoscripts being developed. Thus on Greek vases of the classical period one often found writing, for example the written names of figures depicted on the vase.
But one also often found nonsense writing (Mayor et al., 2014; Houston, 2018), which, at least in some cases, may have been added in order to give the vessel more value. In a similar fashion, Middle Bronze Age scarabs from Canaan were often inscribed with pseudo-hieroglyphs (Ben-Tor, 2009), presumably to make them look more like genuine Egyptian items. We will return briefly to this issue in the final chapter.

Chapter 5

Symbols in the Brain

Semiotics, as we saw in Chap. 2, deals with the broad topic of signs and how they communicate meaning. Oddly, though, semioticians have not traditionally concerned themselves much with brains. I say “oddly”, since it is in the brain that signs, including symbols and icons, are processed, and it is in the brain that meaning resides. For neurologists, psychologists and neuropsychologists, on the other hand, the general question of how brains process information has been primary. For hundreds of years, the only tool available for investigating these questions was accidents: a patient would suffer some kind of trauma affecting some areas of the brain and would thereafter display changed behavior. If the damage occurred to one of the areas involved in processing written or spoken language, the patient might lose the ability to speak entirely, or might produce fully fluent but nonsensical utterances. In a few “lucky” cases, the patient’s impairment might be far more selective: for example, they might retain the ability to speak and understand language but lose the ability to read, while still being able to process other graphical symbols, such as digits. After the patient’s death, a researcher could find out which parts of the brain were damaged. With enough such cases, a picture began to emerge of which portions of the brain were responsible, broadly, for which kinds of information processing. Needless to say, such approaches are crude: they rely on accident and, more to the point, involve what are, from the standpoint of neural wiring, very large areas of the brain. The situation was roughly akin to trying to understand the workings of a machine by taking random swings at it with a sledgehammer and seeing what functionality it loses. The use of “animal models”, e.g.
with cats or monkeys, which were not (at least until recently) subject to the same ethical constraints as are applied to humans, did allow for more fine-grained research. In those cases, implanting electrodes into specific neurons in the brain allowed researchers to determine with fairly fine granularity where certain kinds of information were processed. Most relevant to the question of how humans process graphical symbols is research on how monkeys, whose brains are very similar to ours, process visual stimuli. But of course there are severe limitations on how far one can go with “animal models” in investigating human information processing, especially when it comes to language: animals cannot speak.1

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_5

Within the last half century or so, many more fine-grained tools have become available, including electroencephalography (EEG), which allows one to map the time course of neural activity in broad areas of the brain, and more recently imaging techniques such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI). Both of these imaging techniques measure cerebral blood flow, the idea being that the more work a part of the cerebral cortex is doing, the more oxygen, and hence the more blood flow, it needs. fMRI has been particularly important in studies of neural information processing since, while its temporal resolution is very poor, its spatial resolution is very good, meaning that increased activity in a particular area of the brain performing some task can be readily detected. In this chapter we will review some of the recent work on the processing of symbols in the brain, with a particular view to how written language is processed and how non-linguistic symbols are processed. The ultimate goal is to ask, and to provide some possible answers to, the following question:

What happens in the brain when a previously non-linguistic symbol system evolves into writing?

To get there, though, we first need to understand the following:

1. Where are language and meaning represented in the brain?
2. How does the low-level visual processing of written language proceed?
3. How does the output of the low-level visual processing of graphical symbols connect to the parts of the brain that process meaning?
4. How does the processing of non-linguistic graphical symbols differ from the processing of written language?

As we will see, some of these questions can be answered fairly well with available data, while others are at this point more in the realm of speculation. The ultimate question posed above, in particular, falls into the latter category, and will be the topic of the final section of this chapter, as well as of the following two chapters.

5.1 Relevant Areas of the Brain

Figure 5.1 shows some areas of the brain that will come up in the discussion in this chapter. The figure shows only the left hemisphere, which in most people is the hemisphere associated with speech and language.

1 In special circumstances electrode implantation has been possible in humans, in particular in cases where people are undergoing brain surgery.

Fig. 5.1 The left hemisphere of the brain, adapted from Blausen.com staff (2014). Source: WikiJournal of Medicine. https://commons.wikimedia.org/wiki/File:Blausen_0111_BrainLobes.png. Author: BruceBlaus. License: CC BY-SA 3.0

Important areas include:

– The motor speech area, or Broca’s area, in the inferior frontal gyrus. Damage to this area is associated with agrammatic aphasia, also called Broca’s aphasia, which is characterized by disfluent and grammatically impaired language.
– The sensory speech area, or Wernicke’s area, in the superior temporal gyrus of the temporal lobe. Damage to this area is associated with fluent aphasia, which is characterized by fluent but often nonsensical speech production.
– The occipital lobe, the seat of visual processing. Important for the ensuing discussion is the ventral occipito-temporal region at the bottom of the occipital lobe, which seems to be associated with reading written language.
– The primary motor cortex in the frontal lobe and the primary somato-sensory cortex in the parietal lobe. These areas relate, respectively, to skilled motion of and sensation in different parts of the body. Parts of the body are mapped to adjacent portions of these two regions, so that the head, for example, maps to the bottom of the primary motor cortex/primary somato-sensory cortex nearest the temporal lobe, the upper limbs to the area above that, the trunk above that, and, wrapping around to the inside of the longitudinal fissure that separates the two hemispheres, the lower limbs and the genitals.
– The middle temporal gyrus, which also seems to be associated with knowledge about actions, in particular the use of tools; as we will see, it is also implicated in agraphia, the inability to write, in particular of kanji.

The primary motor cortex and primary somato-sensory cortex are also associated with those aspects of meaning that relate to motion or sensation. But as we will see, meaning is more widely distributed in the brain, with the different dimensions of meaning in general being associated with activation in those parts of the brain that relate to each particular dimension. Since meaning is ultimately central to any understanding of symbols, we turn in the next section to a brief survey of what is known about how the brain represents meaning.

5.2 Meaning in the Brain

A large amount of research over the past few decades has made it clear that “meaning” does not reside in one area of the brain, but is rather distributed across multiple areas, with different areas being functionally related to various dimensions of meaning (Damasio et al., 1996, 2004; Thompson-Schill, 2003; Martin, 2007; Mitchell et al., 2008; Pulvermüller, 2013a,b). The classic study by Damasio et al. (1996) correlates a large amount of evidence from both lesion studies and PET imaging for the localization of various classes of semantic knowledge, including categories such as fruits and vegetables, tools, musical instruments, and animals, in different parts of the cerebral cortex. Thompson-Schill (2003) reviews evidence (page 82) that in studies that probe semantic knowledge about color, size, motion and form, the areas of the brain that are activated correspond to areas that are also active when perceiving those attributes. Martin (2007) discusses evidence (e.g. pages 36–37) that knowledge about tools involves portions of the brain, including the ventral premotor cortex, that are active when people use those tools, as well as other areas such as the posterior middle temporal gyrus. Note that this latter area has been implicated in alexia (loss of the ability to read) and agraphia (loss of the ability to write) of kanji (Sakurai et al., 2008). Because of the complexity of kanji, a large portion of the memory for reading and writing them involves memory for the motions involved in writing the component strokes of the characters. Indeed, there is evidence that word meanings can be used to predict neural activity.
Firth (1957)’s famous maxim that “you shall know a word by the company it keeps” (page 11), has a modern reflex in vector-space representations of word meaning (Mikolov et al., 2013), which involve constructing vectors of words associated with a given target word with the result that words that fall into similar semantic categories will have nearby vectors, and those that are semantically quite different will have distant vectors.2 Using such measures dog will be much more similar to cat than it is to blancmange. Such representations were used by Mitchell et al. (2008) to train a machine-learning model to predict areas of neural activity based on the vector representation of a word. This was done by pairing vector-space semantic representations for words as inputs against patterns of fMRI activity for a

2 See Brunila and LaViolette (2022) for an in-depth analysis of the relationship between Firth’s original ideas and modern computational models. As Brunila and LaViolette point out, Firth had a much broader notion of ‘context’ than simply neighboring words in a sentence including, for example, the social context in which an utterance is used.

5.2 Meaning

117

Fig. 5.2 Summary of areas of the brain active for different semantic categories. From Pülvermuller (2013a), Figure 1, page 460. Source: Pülvermuller (2013a), Figure 1, page 460. License: CC BYNC-ND 4.0

given set of nouns, and then considering the neural activity predicted by the model for a set of such data not seen in training. The model was able to match neural activity for unseen words given their vector-space representation at significantly above chance levels. A nice summary of the different areas of the brain associated with different aspects of meaning is that of Pülvermuller (2013a). See Fig. 5.2. So, meaning is distributed among various parts of the brain: Generalizing somewhat, the meaning of apple is not located in just one place, but in multiple places that relate to shape, color, taste, smell, texture, and so forth, in addition to associations to the linguistic properties of the word apple itself—its sound and, in literate speakers, its spelling. But is that all? Patterson et al. (2007), in a review of the phenomenon of semantic dementia, argue for a more integrated view, in which in addition to the distributed representation of meaning, there is a hub, that coordinates inputs and outputs of different modalities, channeling these to and from the various components of meaning. Semantic dementia is a cognitive deficit where the patient gradually loses the ability to recognize common objects. It is not a loss of memory: patients are often able to remember quite complex patterns, such as the particular route to a destination. Nor is it specific to a particular modality. A patient with semantic dementia may lose the ability to identify a sheep visually, but they will also be unable to understand what the word sheep, either spoken or written denotes, will be unable to name sheep when shown a picture, be unable to identify something amiss in a picture of a sheep with, say, unusally large ears, and so forth. In delayed copy tasks, where the patient


5 Symbols in the Brain

is asked to draw from memory a picture that they saw a few seconds previously, their drawings will frequently omit crucial aspects that make a particular object distinctive: For example, when asked to redraw a dromedary camel, they will leave out the hump. As with more familiar dementias, such as Alzheimer’s disease, the patient will likely retain very general categories, so that they can readily identify a sheep as an animal. But they seem to have lost awareness of the particular collection of distinctive properties that make a sheep a sheep, and no amount of probing in any modality seems able to retrieve it. As Patterson et al. (2007) argue, this is hard to explain if meaning is merely distributed widely in the brain. Since semantic dementia is not a generalized loss of memory, it cannot simply be due to widespread damage to cortical functions. If it were due to localized damage to one of the areas where meaning is represented, then one would expect the damage to knock out some aspects of meaning while leaving others intact: a patient might lose the ability to recognize a sheep from visual cues alone, but might still recognize the familiar cry of a sheep, or understand what the word sheep denotes. The phenomenon becomes easier to explain, however, if besides the various distributed components of meaning there is in addition a central control through which signals from different modalities (visual, auditory, speech, writing, …) pass, and from which they are distributed to the various components associated with meaning; and if, in the reverse direction, when someone is asked to draw, name or write the word for a concept, the information again passes through this central control. Patterson et al. (2007) call this hypothesized area a hub. See Fig. 5.3 for an illustration of their model.
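The logic of this argument can be illustrated with a deliberately crude toy, which is our own illustration and not a model taken from Patterson et al. (2007). We assume, hypothetically, five modality labels and treat every cross-modal task (say, naming a picture, or drawing from a spoken word) as a route from an input modality to an output modality. In the "hub" version every route passes through a single hub node; in the "distributed only" version modalities connect directly. Lesioning a single node then shows which tasks survive.

```python
from itertools import permutations

# Hypothetical modality labels for this toy; "hub" is a separate node.
MODALITIES = ["vision", "sound", "spoken_word", "written_word", "action"]


def surviving_tasks(lesioned, via_hub):
    """Return the (input, output) modality pairs that still work after a lesion.

    With via_hub=True every cross-modal mapping is routed input -> hub -> output;
    with via_hub=False input and output modalities are connected directly.
    A mapping fails if any node on its route is lesioned.
    """
    working = []
    for source, target in permutations(MODALITIES, 2):
        route = (source, "hub", target) if via_hub else (source, target)
        if not any(node in lesioned for node in route):
            working.append((source, target))
    return working


if __name__ == "__main__":
    total = len(MODALITIES) * (len(MODALITIES) - 1)  # 20 directed tasks
    for model, via_hub in (("distributed-plus-hub", True), ("distributed only", False)):
        for lesion in ({"vision"}, {"hub"}):
            n = len(surviving_tasks(lesion, via_hub))
            print(f"{model:22s} lesion={sorted(lesion)}: {n}/{total} tasks survive")
```

In this toy, damage to one spoke (here vision) leaves the mappings among the remaining modalities, such as sound to spoken word, intact in both versions, whereas a damaged hub removes every cross-modal mapping at once. That all-modality loss is the pattern the hub hypothesis is meant to explain; no single lesion in the purely distributed version produces it.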
Various neuropsychological and functional evidence points to the anterior temporal lobe as the location of the hypothesized hub (Rosen et al., 2002; Patterson et al., 2007; Anzellotti, 2017), an area that seems to be involved when people are called upon to make very specific judgments about a stimulus. Patterson et al. (2007) cite neurological

[Fig. 5.3 panel labels: Name, Action, Colours, Motion, Shape; task-dependent and task-independent representations; Task]

Fig. 5.3 The “distributed-plus-hub” view of meaning, from Patterson et al. (2007), Figure 1b, page 977. In the particular instance shown, input from a particular modality, such as an object’s shape, gets passed into a hub, which then distributes the input to the various portions of the brain that represent the object: its name, the kinds of actions it is associated with, its colors, its type of motion, and so forth. Used with permission. Source: Patterson et al. (2007), Figure 1b, page 977. Springer Nature publication

5.3 Reading


evidence (Damasio et al., 1996) from patients with anomia (the inability to name objects) that locates fine-grained identification of objects (e.g., a particular famous person) and basic identification of categories (e.g., an elephant) in the anterior temporal lobe. Increased blood flow in the region was also observed in healthy subjects asked to perform the same tasks. Given that meaning resides at least in those parts of the brain that relate to the perception or manipulation of the corresponding real-world objects, it is perhaps no surprise that early humans believed that meaning-bearing symbols exerted magical power over objects (see Sect. 3.2). Clearly, though, modern humans at least can usually distinguish stimulation of the cortex evoked by observing a real object from stimulation of the same parts of the brain evoked by a word or symbol. As Magritte’s famous painting of a tobacco pipe entitled Ceci n’est pas une pipe (‘this is not a pipe’) reminds us, we mostly understand that a depiction of or a symbol for something is simply an image, not the object itself. One part of the brain likely involved in this ability is the insula, a part of the cortex deep in the lateral sulcus, the horizontal fissure separating the temporal lobe from the parietal and frontal lobes. This region seems to be associated with interoception, the perception of bodily sensations, as well as with self-monitoring more generally. The insula is implicated in hallucinations in schizophrenia, with multiple researchers reporting abnormal insular anatomy in patients with that illness (Wylie & Tregellas, 2010; Namkung et al., 2017; Yang et al., 2020; Barber et al., 2021), as well as in addiction, owing to its role in interoception and in mediating between bodily states and conscious reactions to those states (Naqvi & Bechara, 2009).
Not surprisingly, the region is also implicated in trance in shamanic practice (Hove et al., 2015), which ties back again to some of the earliest uses of depictions and symbols discussed earlier. Having addressed some of the issues related to Question 1 posed in the introduction to this chapter, we now turn to issues that relate to the next two questions.

5.3 Reading Written Language

5.3.1 The Letterbox

Changizi and Shimojo (2005) and Changizi et al. (2006) report statistical studies arguing that, if one breaks down the characters of a wide variety of writing systems into their components, certain basic shapes occur more often than others. Consider characters that consist of just two strokes. Setting aside orientation, symmetry and whether the individual strokes are perfectly straight, there are really just three ways in which two strokes can be combined, namely as a T, an X, or an L. Three strokes allow various other possibilities, such as Y, F, K and ∆. On the basis of these visual properties alone, one can break down complex glyphs, such as those that occur in Chinese or Japanese writing, into basic components. To take Changizi et al. (2006)’s example (page 14), the symbol ⇔ contains: two Ls (the left and right arrow tips); four Ts



Fig. 5.4 Some “letters” that occur in a natural scene, after Dehaene (2009), page 137

(half of each arrow tip, combined with one of the two horizontal lines); four Fs (each of the two arrow tips, combined with one of the two horizontal lines); and two Hs (each of the two horizontal lines combined with the top or bottom halves of the two arrow tips). If one then counts how many of each of the basic configurations occur on a page of text, one finds a distribution that is roughly similar across a wide range of writing systems, from alphabetic scripts to logographic systems like Chinese: T and L are more common than X, which is in turn more common than ∆. As Dehaene (2009) notes (page 178), this distribution cannot be explained by chance: if one throws two sticks on the ground and they happen to meet, they are much more likely to form an X than anything else. Rather, as Changizi and colleagues argue, the distribution seems to reflect the kinds of edge configurations that one finds in natural scenes. Consider Fig. 5.4. A cube-like figure is in the foreground, with part of it occluding the horizon. The places where the cube crosses the horizon form T-junctions. The bottom of one of the faces of the cube, with the bottom of the cube and the lefthand side hidden, forms an L. The forms F and Y represent, respectively, a corner of the cube facing away from the viewer and a corner facing towards the viewer.3 Being able to detect such edges rapidly and integrate them into larger scenes is clearly an adaptive advantage for an animal that needs to move around in its environment and avoid obstacles, such as a large rock in its path. Changizi and colleagues’ hypothesis, then, is that the visual shapes frequently represented in written language reflect basic edge combinations that are common in natural scenes and that our visual system is particularly well adapted to recognizing. While Changizi and colleagues’ work is widely cited, it is certainly not universally accepted.
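Dehaene’s chance argument can be checked with a small Monte Carlo sketch. Pairs of sticks are dropped at random positions and orientations; whenever they meet, the junction is classified as an L (the meeting point is near an end of both sticks), a T (near an end of exactly one) or an X (near the end of neither). The stick length, the uniform placement in a unit square, and the tolerance `eps` that counts as “near an end” are our own assumptions; without some tolerance, exact T and L contacts would have probability zero.

```python
import math
import random
from collections import Counter


def random_stick(rng, length=0.5):
    """A stick with a uniformly random centre in the unit square and random orientation."""
    cx, cy = rng.random(), rng.random()
    theta = rng.uniform(0.0, math.pi)
    dx = 0.5 * length * math.cos(theta)
    dy = 0.5 * length * math.sin(theta)
    return (cx - dx, cy - dy), (cx + dx, cy + dy)


def junction(s1, s2, eps=0.05):
    """Classify how two sticks meet: 'X', 'T', 'L', or None if they do not meet.

    The sticks are p1 + t*(p2 - p1) and p3 + u*(p4 - p3); t and u give the
    position of the meeting point along each stick (0 and 1 are its ends).
    A meeting point within eps of an end counts as that stick ending there.
    """
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if abs(denom) < 1e-12:  # (nearly) parallel sticks: treated as not meeting
        return None
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if not (-eps <= t <= 1 + eps and -eps <= u <= 1 + eps):
        return None
    ends = sum(1 for s in (t, u) if s < eps or s > 1 - eps)
    return ("X", "T", "L")[ends]


def throw_sticks(trials=20000, eps=0.05, seed=0):
    """Drop pairs of random sticks and count the junction types that result."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        kind = junction(random_stick(rng), random_stick(rng), eps)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    counts = throw_sticks()
    total = sum(counts.values())
    for kind in "XTL":
        print(kind, round(counts[kind] / total, 3))
```

Under these assumptions most meetings come out as X, and the X share grows as `eps` shrinks, so random placement alone does not favour the T and L configurations that dominate in scripts.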
Daniels (2018), for example, takes them to task, arguing that “from a graphonomic point of view, [their article] is severely flawed” (page 152). To some extent, though, Daniels’ objection is beside the point. He notes, for example, that “[c]haracters are decomposed into ‘strokes’ not according to the practice of scribes, but by visual inspection of typographic forms”. While one might legitimately

3 The reader may object that actual cubes do not occur frequently in nature. However, block-like rocks certainly do, so the scene in Fig. 5.4 is not so unnatural.



criticize the focus on printed forms, which can often be quite different from handwritten versions, the writing practices of scribes are to a large extent orthogonal to the issue, since the hypothesis that Changizi and colleagues are pursuing concerns how the visual system processes inputs, not how scribes produce them. The point is elementary: there are three strokes in the Chinese character 口 kǒu ‘mouth’, with the lefthand vertical stroke written first, the top and right constituting a single stroke, and the bottom horizontal stroke written last. But as far as the visual system is concerned, there are four L corners, and it is the visual system that is at issue here. Even if the details of Changizi and colleagues’ hypothesis turn out to be wrong, one suspects that something like it must nonetheless be right, for a simple reason: while humans probably evolved to speak, they did not evolve to read. Yet neurological studies, as detailed by Dehaene (2009), have singled out an area in the ventral occipito-temporal region of the occipital lobe as critical in the early visual processing of writing. This region, often called the Visual Word Form Area (VWFA), and jocularly referred to by Dehaene (2009) as the brain’s “letterbox” (see Fig. 5.1), seems to be involved in reading no matter what the script. Dehaene cites various studies that all indicate that this area is critical, but the most evocative, which he cites at the outset of his discussion, is also the earliest evidence for the importance of this region in reading. Déjerine (1892) reports on a patient, Mr. “C”, who lost the ability to read as a result of a stroke. His speech production and understanding were unimpaired, as was his vision, except that he had lost color vision in his righthand visual field.
More surprisingly, his ability to read digits was unimpaired: he was able to read long strings of digits and could perform arithmetic calculations on paper. Equally surprisingly, his ability to write was unimpaired, except that he was unable to read what he had just written. After Mr. C’s death, an autopsy revealed a lesion in the ventral occipito-temporal region. See Fig. 5.5. In fact, Déjerine suggested (see Dehaene, 2009, page 62) that what Dehaene terms the letterbox was probably not actually destroyed in Mr. C’s case, just disconnected by a lesion in the vicinity of that area. The visual processing of letters may have gone on as before, but the results of the processing never made it out of the letterbox. Since Mr. C’s ability to process digits was unimpaired, digits are apparently processed somewhere else; Dehaene (2009) suggests that they are processed “probably in a more posterior area of the left and right hemispheres” (page 95). Yet digits share visual properties with letters: 7 is, after all, almost an inverted L, and 1 and I are almost identical. In general there is no principled reason why the shapes that we know as digits could not have been used as letters. Indeed, one writing system does use symbols that were originally digits: the Thaana script for Dhivehi, the language of the Maldives, derives many of its letters from Arabic numerals (Gair & Caine, 1996). So why is the processing of written symbols specialized to an area of the left hemisphere that could not possibly have evolved specifically to process writing? And why, as Dehaene reports (page 169), did a young patient whose epilepsy required as treatment the



Fig. 5.5 Damage to Mr. C’s left hemisphere. Note the shaded area of the lesion on the lefthand side, in the occipital lobe. From Déjerine (1892), Figure 3, page 79, shown from the side of the longitudinal fissure but flipped horizontally to match the orientation of Fig. 5.1. Source: Déjerine (1892), Figure 3, page 79. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer

“removal of the entire left visual brain” end up being able to read using (as shown by a brain scan study) the ventral occipito-temporal region of the right hemisphere? Clearly that region of the brain is particularly efficient at a task that is critical to the low-level processing of written language. Dehaene cites studies of the comparable areas of the monkey brain suggesting that this area evolved for efficient low-level edge detection of the kind needed for the rapid scene analysis discussed previously. Of course, reading involves far more than edge detection. The edges have to be combined into complex glyphs, the glyphs into morphemes and words, and the words ultimately into sentences, paragraphs and beyond. All of this requires many layers of neural processing, eventually including the various regions of the brain that deal with phonology and meaning. Yet readers do this seemingly effortlessly; for the system to work, the low-level edge detection must be very efficient. The “letterbox” region seems to have evolved for precisely this kind of efficient processing, and thus is the preferred region to coopt for the purposes of reading. Digits and other non-linguistic symbols need processing similar to that of written symbols insofar as they too are composed, at a low level, of intersecting edges, but since they apparently do not need to be processed as quickly as written symbols, they do not require the highest-efficiency neural hardware. The fact that unimpaired readers use the left ventral occipito-temporal region makes sense, since most language-related processing takes place in the left hemisphere, which puts the letterbox closer to the parts of the brain that require the results of its processing.
The young patient mentioned previously, who underwent a lobectomy to treat her epilepsy, could read, but her reading was noticeably slower than normal, reflecting the extra time required to



pass the output of the right-hemisphere “letterbox” over to the patient’s (intact) left-hemisphere language processing.4 But what happens when the letterbox passes the results of its processing on to the next region of the brain? As we have noted, it was probably somewhere in that pathway that Mr. C’s deficit lay, rather than in the letterbox itself. What happens is currently a matter of speculation, but clearly strokes and corners get assembled into letters, and letters then get grouped into morphemes. Dehaene suggests (pages 153–158) that a critical component is what he calls bigram neurons, which are sensitive to pairs of not-necessarily-consecutive graphemes that occur in words. Thus the word word contains the following bigrams: wo, wr, wd, or, od, rd. So long as most of a word’s bigrams are not too disrupted, either by omission or by s p a c i n g t h e m t o o f a r a p a r t, word recognition is not seriously impaired. That helps explain why, so long as critical anchors such as the first and last letters are not moved, it is not dififuclt to raed txet werhe the lteetrs of the wrods have been itnenrally roeredred: the misspelling wrods lacks only one of the bigrams (or) that words has;5 see Grainger and Whitney (2004). The idea that Mr. C’s deficit resided not in the letterbox itself, but in its connections to downstream components, and thus somewhere along the pathway that assembles those components into larger units, receives support from two other interesting facts about Mr. C’s post-trauma condition. Dehaene is correct in stating that Mr. C was still able to read digits and could perform arithmetic on digit sequences. But Déjerine also observed the following (page 66, translation mine): on seeing the number 112, he states, “it is a 1, a 1 and a 2”, but only upon writing the number is he able to say “one hundred twelve”.

Why should this be? If he could recognize digits, and if his numeracy was preserved, as it seems to have been, what was broken that prevented him from reading cent douze ‘one hundred twelve’ upon seeing the digit sequence 112? One reasonable hypothesis is that what was broken was, again, the higher-level assembly of smaller pieces into linguistic units. While Hindu-Arabic numerals and the decimal place-value notation invented in (what is now) India clearly constitute a non-linguistic symbol system, most writing systems in the world incorporate this system as a way of representing number names. Thus 112 represents one hundred twelve in English, or cent douze in French, and fluent (and numerate) readers learn to map between the non-linguistic representation and the corresponding linguistic expression. This is where language enters the picture, and evidently Mr. C’s deficit broke that connection.
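The two ways of “reading” 112 that Déjerine describes can be contrasted in a few lines of code. The first function names each digit in isolation, much as Mr. C did; the second assembles the digits by place value into an English number name. This is only an analogy for the missing assembly step, not a model of the underlying neural process, and it assumes American usage (no “and”) and numbers up to 999.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def read_digits(digits: str) -> str:
    """Name each digit on its own, with no place-value assembly ('a 1, a 1 and a 2')."""
    return ", ".join(ONES[int(d)] for d in digits)


def number_name(n: int) -> str:
    """Assemble an English number name for 0 <= n <= 999 using place value."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")
    hundreds, rest = divmod(n, 100)
    name = f"{ONES[hundreds]} hundred"
    return f"{name} {number_name(rest)}" if rest else name


if __name__ == "__main__":
    print(read_digits("112"))        # one, one, two
    print(number_name(int("112")))   # one hundred twelve
```

Both functions recognize the same digits; only the second performs the linguistic assembly that, on the hypothesis above, Mr. C had lost.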

4 At this point, the reader may be wondering what happens in the case of blind readers reading Braille. Surprisingly, perhaps, recent work shows that blind readers make use of much the same parts of the occipital cortex as sighted readers use to read visual graphemes, though perhaps a wider area (Reich et al., 2011; Dzięgiel-Fivet et al., 2021).
5 In noctrats erdangi xtte elki tish is chmu rahred. Thanks to a reviewer for suggesting this contrastive example.
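Dehaene’s bigram neurons can be mimicked by listing a word’s open bigrams, that is, its ordered pairs of letters. The sketch below reproduces the word example and the words/wrods comparison from the text. The optional `max_gap` window, the treatment of spaces as empty positions, and the `overlap` score are our own simplifying assumptions (some versions of the open-bigram model restrict pairs to letters separated by at most two others); they serve only to show why spacing letters far apart disrupts the code while swapping two inner letters barely does.

```python
from itertools import combinations


def open_bigrams(text, max_gap=None):
    """Ordered letter pairs ('open bigrams') of `text`.

    Spaces are not letters but still occupy positions, so spacing a word out
    pushes its letters further apart. `max_gap` limits how many positions may
    separate the two letters; None (the default) keeps every ordered pair,
    as in the book's example for 'word'.
    """
    letters = [(i, ch) for i, ch in enumerate(text.lower()) if ch != " "]
    return [a + b
            for (i, a), (j, b) in combinations(letters, 2)
            if max_gap is None or j - i - 1 <= max_gap]


def overlap(stimulus, target, max_gap=None):
    """Share of the target word's open bigrams that the stimulus also contains."""
    wanted = set(open_bigrams(target, max_gap))
    present = set(open_bigrams(stimulus, max_gap))
    return len(wanted & present) / len(wanted)


if __name__ == "__main__":
    print(open_bigrams("word"))                   # ['wo', 'wr', 'wd', 'or', 'od', 'rd']
    print(overlap("wrods", "words"))              # 0.9: only 'or' is missing
    print(overlap("w o r d", "word", max_gap=2))  # 0.5: spacing loses wr, wd, od
```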



But this is not all, because Déjerine observed another loss in Mr. C’s capabilities. Both Mr. C and his wife were fine amateur musicians, with Mr. C in particular being a good singer. He was also a good reader of music. That ability was also lost (page 74, translation mine):

This verbal blindness, so clear for letters, was accompanied by a completely analogous musical blindness. I have already stated above that the patient was a skilled reader of music. Today, it is impossible for him to read anything, not being able to read notes any more than letters; he could however at my request write a G note or an F, various notes, and so forth. He preserved intact the ability to sing; according to his wife he sang as well as prior to his acquisition of this musical blindness. Indeed, at my request Mr. C sang three opera pieces very correctly.

Déjerine also observed that Mr. C was able to learn new songs after his trauma, but only aurally, by hearing them, not by reading the score. So we have a deficit in music entirely parallel to the one observed in reading. Does it therefore make sense to talk of a letterbox? I believe it does, but with a caveat, namely that what is at issue here is not the ultimate function of the symbols of a symbol system, but rather the fluency with which one is required to read the symbols. Clearly reading music requires sorts of line and edge detection similar to those required for letters: if nothing else, the way you tell a G from an F in the treble clef is by observing whether the note is on, and thus intersected by, the second line from the bottom of the staff, or just below it. But reading music well requires one to do this processing very quickly, and it makes sense that the brain region that serves the reading of written language well would also serve the reading of music well. The fact that musical notes are assembled into longer “messages” in accordance with the conventions of a non-linguistic rather than a linguistic symbol system is immaterial. Indeed, as Vogel et al. (2014) argue, this area of the brain is evidently not specialized for reading, nor is reading the only processing accomplished there: rather, it is “a general use region that has processing properties making it particularly useful for reading” (page 1), as we suggested previously. Mr. C’s music-reading deficit, paralleling his reading deficit for language, suggests that the fluent processing of symbols probably follows similar lines no matter what kind of system one is dealing with, even though it ultimately engages very different parts of the brain. Again, as we have stressed throughout this book, complex structure and the consequences of structure are characteristic of writing and language, but they are not only characteristic of writing and language.
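The G-versus-F judgment described above amounts to reading a note’s vertical position on the staff. A minimal sketch, assuming the treble clef and counting staff positions in steps from the bottom line (step 0 is the bottom line, step 1 the space above it, and so on): even steps sit on a line, odd steps in a space. The function names are ours.

```python
LETTERS = "CDEFGAB"
E4_INDEX = LETTERS.index("E") + 7 * 4  # bottom line of the treble staff, as a diatonic index


def treble_note(step: int) -> str:
    """Note name at `step` staff positions above the bottom line of a treble staff.

    Step 0 is the bottom line (E4), step 1 the space above it, step 2 the
    second line, and so on; negative steps go below the staff.
    """
    index = E4_INDEX + step
    return f"{LETTERS[index % 7]}{index // 7}"


def on_line(step: int) -> bool:
    """True if a staff (or ledger) line runs through the note head."""
    return step % 2 == 0


if __name__ == "__main__":
    for step in (1, 2):
        where = "on a line" if on_line(step) else "in a space"
        print(step, treble_note(step), where)  # 1 F4 in a space, then 2 G4 on a line
```

The only visual cue separating G4 from F4 here is whether a line crosses the note head, which is exactly the kind of line-and-edge detail that the letterbox is hypothesized to handle quickly.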

5.3.2 Summary: The Evolution of the Letterbox

Dehaene (2009) suggests (pages 183–184) that the localization of the low-level processing of reading in the “letterbox” area probably evolved over a long period. In particular, many early scripts involved glyphs that were far more pictographic, and thus more like icons, than the much more linear components of modern writing systems. Take Egyptian hieroglyphs, for example. If one could resurrect an Ancient

5.4 Non-linguistic Symbols


Egyptian scribe and place them in an MRI machine, would the same localization of low-level reading functionality be observed? Of course, Egyptian scribes, presumably with a view to writing efficiency, soon developed more abstract and linear forms of writing: hieratic and, eventually, demotic. At that point, one would expect their brain functions to mirror ours. Dehaene also points out that certain categories of shape tend to be avoided in writing systems: faces, which the brain tends to process in the right hemisphere, are poorly suited to functioning as written symbols. Dehaene admits the exception of Mayan (page 184), which does have head glyphs, but this is really the exception that proves the rule: the head glyphs seem to have been largely decorative, and it is not the head or face itself that is critical, but some much more basic symbol (often a numeral, in the case of the numeral head glyphs) that is incorporated into the head. Writing systems thus seem to have evolved to take advantage of a neural architecture that was, necessarily, originally evolved for other purposes. As we saw in the case of Mr. C, other symbol systems may apparently also coopt this space if the reader is fluent in those systems. But are there, then, ultimately no differences between the processing of linguistic and non-linguistic systems? We turn to that question in the next section.

5.4 Reading Non-linguistic Symbols

Compared to written language, which has been investigated extensively in the neuropsychological literature, the processing of non-linguistic symbols has received much less attention, pace Déjerine’s observations about Mr. C’s music-reading deficit described previously. Given the rise of social media over the past couple of decades, much of what work there is has focused on the processing of non-textual elements, in particular emoji and emoticons, in social media texts (Kim et al., 2015; Barach et al., 2021). One piece of research that looks more broadly at the processing of non-linguistic symbols is that of Huang et al. (2015), who compare brain activation in the processing of written words in English and Chinese, pictures, and icons. The authors approach the problem from a human-computer interaction (HCI) background and note that modern digital icons serve many of the same functions that logographic symbols seem to have served in ancient writing systems:

Icons are often designed with the purpose of associating a symbol with a certain meaning, such as ancient iconography conveying semantics of objects and concepts in the formal development of logographic language. (page 702)

They therefore hypothesize that icons should show the same kinds of neural activation as modern logographic written symbols, such as Chinese characters. The evidence they present disconfirms this hypothesis: icons are not processed like logographic symbols. On the contrary, while Chinese characters and English written words are broadly similar in how the brain handles them, icons are processed like



pictures. The specific differences in which areas of the brain are active in each case are interesting, and point to some fundamental contrasts between how the brain handles symbols that are tied to language and how it handles symbols that are not. Huang et al. (2015)’s experiment used 200 stimuli: 50 icons, 50 pictures, 50 written English nouns and 50 Chinese nouns written with single Chinese characters. Each category was divided into 25 items with concrete meanings and 25 with abstract meanings. A concrete icon would be, say, an icon of a horse; an abstract icon, a prohibition sign superimposed on a silhouette of a group of people, indicating “no mass gatherings”. A concrete picture might be a photograph of an apple; an abstract picture, a photograph of abstract art. Ten Chinese speakers and ten English speakers participated. Subjects were placed in an MRI scanner, shown the stimuli, and asked to decide for each stimulus whether it represented a concrete object or an abstract concept. All subjects were shown all stimuli. Whereas the Chinese subjects were bilingual in Chinese and English and could judge all stimuli, the English-speaking subjects were not assumed to know Chinese, and were therefore asked to judge all Chinese characters as “abstract”. Reaction times showed differences in the processing of icons versus pictures and Chinese characters: icons were classified significantly faster than pictures, but significantly more slowly than Chinese characters.
However, between-condition contrasts for fMRI showed no significant difference between the areas of the brain activated by pictures and those activated by icons, their results suggesting that “interpreting icons was not really different from interpreting pictures and that icons were processed within the same but a smaller network of brain areas as used to process pictures” (page 716), which, as they note, “is in direct opposition to [their] original hypothesis.” On the other hand, there were significant differences in activation between Chinese characters and icons, with Chinese characters requiring more resources of the right ventromedial prefrontal cortex, the left posterior cingulate gyrus (in Chinese participants; bilateral in English-speaking participants), the dorsomedial prefrontal cortex, and the angular gyrus and adjacent supramarginal gyrus. For English nouns as opposed to icons, more resources were required bilaterally for the middle temporal gyrus, the angular gyrus and the supramarginal gyrus. The left dorsomedial prefrontal cortex also required more resources, as did the right posterior cingulate gyrus (for Chinese participants; bilateral for English-speaking participants). Thus, while Huang et al. (2015) found no differences in brain activity for icons versus pictures, there were significant differences between interpreting icons and interpreting either English nouns or Chinese characters. Furthermore, the two linguistic sets were broadly similar to each other, with a few differences, in their contrast with the activations for icons. It seems reasonable to assume that the differences between icons on the one hand, and English or Chinese written symbols on the other, relate to the fact that the latter are tied in with the speech and language systems of the brain, whereas the former are not. Indeed, at least one of the areas listed among those active for Chinese characters and English nouns is the angular gyrus, which in the left hemisphere is involved in



the transfer of visual information to Wernicke’s Area, the sensory speech area (see Fig. 5.1). Presumably, then, icons are treated more like pictures because, as far as a brain trained to read from an early age is concerned, they essentially are pictures: unlike written symbols, they have no trained associations with linguistic entities. Thus an icon for a horse is just a picture, but in the mind of a literate Chinese speaker the character 馬 is associated with the language-processing areas of the brain. One may think of the word horse (or mǎ) when seeing an icon or a picture of a horse, but the rapid association between the image and the word that one finds in the case of written language is simply not there for the icon or picture. As Huang et al. (2015) note (page 714), “such modulated activations [in the Inferior Frontal Gyrus and frontal pole of both hemispheres with additional activations in the left Dorsomedial Prefrontal Cortex and Ventromedial Prefrontal Cortex of Chinese speakers] might imply that the phonological processing was an essential mechanism to compare icons and Chinese characters in the experimental task.”6 So much for modern icons: what would one expect in the case of ancient preliterate non-linguistic symbols, such as those used by Mesopotamian accountants of the Uruk IV period? Since one cannot perform fMRI on a Mesopotamian accountant from before the invention of writing, one can only speculate. However, it seems reasonable to hypothesize that the preliterate symbology that eventually led to writing was processed in the brains of ancient pre-scribes much as icons are processed in the brains of twenty-first-century humans. If so, then when non-linguistic symbology evolved into writing, a concomitant set of changes in the neural processing of the symbols must have occurred. The brains themselves did not change their anatomy, but various parts of the brain came to be harnessed in new ways. How did this happen?
In the next section we present a hypothesis.

6 Why, then, did English-speaking participants still show a difference in brain activation between icons and Chinese characters, which they could not read and all of which they were instructed to treat as abstract? Huang et al. (2015) do not present a clear reason for this, other than to highlight that there were still some differences between Chinese and English speakers for this task, and to suggest that the Chinese speakers may have been more “motivated”, given their increased activity in the dorsomedial and ventromedial prefrontal cortex. However, at least two points about the task might be relevant. First, for English speakers the task for Chinese characters differed from the task for every other stimulus type, in that they were asked always to classify these as abstract. This required them at least to identify a stimulus as a Chinese character, as opposed to an icon, and thus as writing as opposed to an iconic symbol. This brings us to the second point, namely that even though the subjects could not read the Chinese characters, they must have been aware that they represented written language, since doing the task correctly depended on this knowledge. Conceivably, then, this may have been sufficient to activate at least partly some of the language-related areas of the brain.



5.5 A Hypothesis

As we have seen, what we call “meaning” is widely distributed in the brain, with different aspects of meaning associated with different functional areas. However, it seems likely that there is a hub, probably in the left anterior temporal lobe, which serves as a central control, passing information back and forth between different modalities and the different regions of the brain that represent parts of the meaning. The low-level processing of graphical symbols, on the other hand, takes place in much more localized areas in the occipital lobe, with writing in particular being further localized in what Dehaene (2009) terms the “letterbox”: the left ventral occipito-temporal region in unimpaired readers. As discussed previously, this region corresponds to the portion of the monkey brain that processes simple patterns of edges, shapes like T, Y and F that are basic building blocks of visual scenes and are also recurring patterns in scripts of all kinds (Changizi & Shimojo, 2005). The left ventral occipito-temporal region is very efficient at this, which is probably why it is coopted for this purpose in reading: to be a fast reader, one’s low-level visual processing of letters has to be quick, since there are many levels between that low-level processing and the assembly of written morphemes, words and sentences and the retrieval of the meaning of the message. This, then, is the ultimate specialization in reading, and it was something that had to develop as writing evolved. But before that could happen, an association had to be built in the brain between graphical symbols and specifically linguistic aspects of meaning. We suggested in Sect. 5.4 that accounting symbols would plausibly have been associated, in preliterate accountants, with parts of the brain that relate to meaning, but not with those that relate more generally to language, and more specifically to phonology, mirroring what Huang et al. (2015) report for the neural processing of modern icons. In order for the graphical symbols to become true writing, these latter connections had to be built, and it seems they could only have been built in a context where constant repetition of a given symbol with a given pronunciation reinforced, and thus trained, those connections. As we saw in the previous chapter, it has long been understood that the key insight in the development of writing was the realization that one could represent words in graphical form not by what they mean, but by how they sound. We now need to ask what that means in neurological terms. With that in mind, I offer the following hypothesis:

Writing evolved in an institutional context in which symbols were effectively dictated, so that the user of the symbol system gradually came to associate the symbols with sounds.

Anticipating our discussion in the next chapter, imagine a pre-scribal school in which record-keeping officials work. An overseer is “dictating” accounts: “100 sheep, 200 goats, 25 baskets of grain, …”. Each of these numbers and concepts (sheep, goats, etc.) has an accounting symbol associated with it, so the tie between symbol and meaning is already there, presumably via the left anterior temporal lobe hub already
discussed. By constant repetition and the subsequent transfer of the “messages” to sequences of graphical symbols, an additional association with phonology is built, and thus with language more generally, insofar as the symbol has now become associated with both sound and meaning. Once that association is built, and fluent reading and writing of the symbols evolves into true writing, the connection to the brain’s “letterbox” would have evolved as a natural consequence of the need for a portion of the brain that is extremely efficient at low-level edge detection.

It is worth underscoring that specifically neurological accounts of the evolution of writing are scant in the literature. One exception is the work of Overmann (2016), who reviews some of the neurological literature and draws on it to propose neural mechanisms for what is known about the evolution of writing. Her specific proposal, which is cast in terms of Material Engagement Theory (Malafouris, 2013), is that among Mesopotamian scribe-accountants the repeated motor movements of the hand in writing symbols, along with the repeated reading of (pre-linguistic) texts in an administrative context (see Chap. 6), reorganized the brains of the scribes, thus strengthening the connection between symbols and language. In particular, this connection could be reinforced because “[h]andwriting affords the repetition of characters at a volume that allows the effects to occur” (page 7), and because

[W]riting is repeatedly moving the hand to produce marks and visually judging them for legibility in a material that influences how movements are made and characters formed. Over time, this interaction improves hand–eye coordination, trains the fusiform gyrus to recognize written objects by their features, and increases coordination between the fusiform gyrus and the brain regions that comprehend and produce language and control handwriting movements. (page 11)

No doubt this was all important, but it seems to me that this puts the cart before the horse. For one thing, Overmann’s proposal does not obviously distinguish between writing words with symbols that represent only their meaning and writing words with symbols that also, or even largely, represent sound. Perhaps the latter association would develop organically as a consequence of constant writing, but it is by no means obvious why this would happen. On the other hand, the verbal repetition of words in association with their corresponding symbols would, it seems, train the sound-symbol connection, and it is this sound-symbol connection that seems key. In the next two chapters we develop this idea further, and show by means of computational simulation how writing might have evolved in an administrative context as hypothesized above.

Chapter 6

The Evolution of Writing

This is the first of two chapters in which we consider the evolution of writing. We start off in this chapter with a discussion of what is known about how writing evolved. Then in Sect. 6.2 we reiterate a hypothesis already introduced in the previous chapter concerning the institutional context in which writing evolved. Finally, in Sect. 6.3 we discuss the institutions that were likely at play in one place where we know that writing evolved ex nihilo: Mesopotamia.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_6

6.1 What Is Known About the Evolution of Writing?

There is very little hard evidence for how writing evolved. There have been theories of what led to the initial critical discovery that one could record not only ideas with graphical symbols, but also specific linguistic elements: words, morphemes and ultimately sounds. And there have been theories about which particular precursors to writing occurred in various preliterate but soon-to-be literate cultures. But nothing in the way of incontrovertible evidence has yet surfaced from the archaeological record. To be sure, once early Bronze Age cultures did learn the art of writing, they themselves had much to say about its origin, but unfortunately what they had to say is often unhelpful from a scientific point of view, since they invariably ascribed the invention to one or another mythical being or deity. The Sumerians ascribed writing to the goddess Nisaba, originally a goddess of grain, which could be taken as a hint that the source of writing in Sumer is to be sought in accounting. The Chinese ascribed the invention to Cang Jie, the mythical four-eyed scribe of the equally mythical Yellow Emperor. Part of the problem is that with only a handful of cases of the pristine development of writing (i.e. the development of writing by a culture that had, as far as we know, no contact with any already literate culture), there is not a lot of evidence to go on, and in particular it is hard to say what kinds of political, economic and linguistic factors may have needed to be in play for writing to develop. How many times
was writing invented independently? The answer seems to be at least twice and possibly as many as five times. The minimal count of two would hold under the theory that all Old World writing systems ultimately had their origin in the invention of writing in Mesopotamia. Under that theory, Egyptian was directly influenced by its neighbor to the east, and writing in China, which developed much later, might have diffused from the Middle East to East Asia over the course of about a millennium and a half, certainly enough time for that to have been possible. As for the New World, pace the adventures of Thor Heyerdahl immortalized in the 1972 documentary Ra (Heyerdahl, 1971), there is little reason to believe in pre-medieval contact between the Old and New World, and thus no way for the idea of writing to have diffused from Mesopotamia, or wherever, to Meso-America. One thus must take Mayan, or its Olmec or Zapotec precursors, as independent inventions by what was technologically a stone age culture. While diffusion within the Old World is certainly a possible explanation (see e.g. Daniels (2018) for a forceful argument that the Egyptians did indeed borrow the idea of writing from Mesopotamia), it is not an idea that comes with much in the way of documentary evidence. If this did not happen, then the Egyptians and the Chinese must have invented writing independently. So that gives us four independent inventions: Mesopotamia, Egypt, China and Meso-America. To get five one would have to add the Bronze Age Indus Valley Civilization, contemporaneous with and much larger in geographic area than Mesopotamia. But the surviving “texts” from the Indus, while numbering in the thousands, are all extremely short, and the status of the symbols as a true writing system is controversial. See also Chap. 8.
But four, or five, makes little difference in any case: the number of cases of independent invention is countable on the fingers of one hand and is therefore not enough to really allow us to know what factors were crucial in the development of writing. As I suggested in Sproat (2017), and as I shall argue further in Chap. 7, this is an area where computational modeling can perhaps help address some of the deficiencies in the material evidence. But for now, I wish to survey what little is known about the precursors to writing in the various cultures where it is clear that full writing systems developed. Perhaps it is easiest to start with the two systems about whose origin the least is known, Meso-American and Chinese. Mayan writing is by far the best understood Meso-American system: it is the best attested, and it is furthest along the path to decipherment. But it is not clear that it was the first. Though the dates of Mayan epigraphic evidence have been pushed earlier and earlier in the last couple of decades (Saturno et al., 2006), it is not certain that the Mayans invented writing; they may have adapted it from earlier writing systems of the Olmecs and Zapotecs, who had written language during the late first millennium BCE (Palka, 2010). But these earlier systems have not been deciphered, there is considerable uncertainty about what kind of information they encoded, and their origin in any case is unclear. The situation with the earliest Chinese writing is not too different. The earliest clear evidence for writing that we have is the Shang dynasty (mid second millennium BCE) Oracle Bone Script, but this is already a full-fledged writing system. Supposed precursors include some signs dating from several thousand years earlier from the Neolithic period, as well as later examples inscribed on pottery, such as an example that has been linked to later written forms for ‘sun’ and ‘mountain’. But as Shaughnessy (2010) points out, the connection between these earlier signs and the later script is dubious, the interpretation is speculative, and the forms do not occur in anything resembling a text that would allow one to guess that they might have been intended to represent language.

With Egyptian we are on somewhat more solid ground. The early (late fourth millennium BCE) Narmer Palette (Fig. 6.1) has long been recognized as a precursor to writing, in that the name of the king Narmer is evidently indicated phonetically using a pair of glyphs. Other early hieroglyphs are also found on the palette. More recently, a collection of small rectangular tokens, evidently tags on goods (probably cloth), came to light from Tomb U-j (again, late fourth millennium BCE), with inscriptions that seem to involve a proto-hieroglyphic script. As Stauder (2010) argues, the identification of the meanings of these inscriptions is speculative, and in later work (Stauder, 2015) he suggests that some of the interpretations are unlikely, in particular the equation of the inscription in Fig. 6.2 with the place name ‘Elephantine’, 320 km to the south of Tomb U-j’s location in Abydos. So while it may be the case that Egyptian writing started as a way to mark goods with short inscriptions on tags, the connection between these early examples and later hieroglyphic writing is unclear.

Fig. 6.1 The verso of the Narmer Palette, with the name ‘Narmer’, written as a catfish n’r and a chisel mr, next to a depiction of King Narmer, and highlighted. Source: Wikipedia, https://en.wikipedia.org/wiki/Narmer_Palette#/media/File:Narmer_Palette_serpopard_side.jpg. Author: Unknown. Image is in the public domain.

This leaves only Mesopotamia, where it seems we have the clearest evidence of how writing may have arisen, and one very popular and compelling story that
purports to fill in the details.

Fig. 6.2 A tag from Tomb U-j, supposedly indicating the place name ‘Elephantine’. Source: Stauder (2010), page 140, Figure 6.5. Used with permission of the Oriental Institute, University of Chicago

The story, due to archaeologist Denise Schmandt-Besserat (1992; 1996), but originally derived from ideas of Oppenheim (1959), Amiet (1966), and Lambert (1966), involves clay (or sometimes stone) tokens such as those depicted in Fig. 6.3. The tokens fall into two kinds: “simple” tokens, which were basically counters representing numerical amounts; and “complex” tokens, which supposedly represented particular commodities. The “sheep” token in Fig. 6.3 is a “complex” token, whereas the tokens shown in Fig. 6.4 are instances of “simple” tokens. Mesopotamian token systems appear to date to the Neolithic period, some 10,000 years ago. As with modern coins, it is convenient, and more secure, to keep one’s tokens together, and one device that was invented to do that was a clay “envelope”, such as the one depicted in Fig. 6.4. The problem, of course, is that once one had sealed the envelope it was like an old-fashioned piggy bank: one had to break it to retrieve the tokens in order to verify what one had. So at some point it was realized that one could mark on the outside of the envelope what tokens were inside, so that it was not necessary to break the envelope in order to know what one had. The envelope in Fig. 6.4 shows markings of this kind. But if you can mark a hollow envelope, you can also mark a flat solid tablet, and that in turn means that one does not really need the tokens: one can simply depend on the marks. What had originally been a system that depended upon three-dimensional tangible objects to record possessions turned into an accounting system based on two-dimensional marks. And since these marks had conventional referents, it was a natural step from there to a system where an even wider variety of items could be represented by marks. This of course is still not writing in the strict sense. But if one then needs to start
indicating, for example, not only the goods but also the buyers or sellers of the goods, then one needs to represent names, and this would in turn impose pressure to come up with ways to write those names, thus favoring the discovery of the phonographic principle. This, in a nutshell, is Schmandt-Besserat’s hypothesis.

Fig. 6.3 Mesopotamian tokens, with possible denotations according to Schmandt-Besserat’s interpretation. From left to right, first row: type of garment, unit of metal, unit of oil, sheep. Second row: unit of honey, unknown, type of garment. Drawn by Lisa L. Sproat. Source: Sproat (2010a), page 5, Figure 1.1. Author owns the copyright

Fig. 6.4 A “clay envelope” (bulla) from Susa (3500–2900 BCE). Note the indentations on the outside of the bulla corresponding to the simple tokens below. Source: Louvre (number Sb 01927) via the Cuneiform Digital Library Initiative, https://cdli.mpiwg-berlin.mpg.de/artifacts/274841. Used with permission of the Louvre, Paris

Two things need to be said at this point. First, while we noted that token usage in Mesopotamia predated writing by several millennia, the actual volume of data for that vast span of archaeological time is quite sparse. Second, Schmandt-Besserat makes a critical (and, some would say, questionable) assumption, namely that tokens evolved into impressions on tablets, and ultimately into proto-Cuneiform signs, and thus that a given token for a commodity would end up as a sign that resembled the original token. Thus a proto-Cuneiform sign that we know meant ‘sheep’ or ‘goat’, ⊞, supposedly evolved from a token that looked like ⊕. The problem is that we do not actually have any direct evidence for what the earlier tokens meant. But an even bigger problem is that the only tokens that have been found in envelopes, or impressed on their outsides, are simple tokens representing amounts (Englund, 2006). Complex tokens, which supposedly represented commodities, have not been found inside envelopes, nor have their impressions been found on the outside of envelopes. In other words, while there seems to be a traceable connection between the simple tokens and the later tablet-based accounting system, there is no direct evidence for a connection between complex tokens such as ⊕ and later proto-Cuneiform signs such as ⊞. Tokens contained within envelopes have also been found from the period after writing was invented, and artifacts from that period have survived that contain written descriptions of the contents of the envelope; see, e.g., Oppenheim (1959).
Still, it is not clear whether these later artifacts are a continuation of the earlier preliterate token-and-envelope system from thousands of years before, so any similarities between the later tokens and those of the earlier system may be misleading.1 Schmandt-Besserat’s theory is certainly appealing: it is easy to understand, and it has made for good press because it seems a priori to be a simple and elegant account. In previous work (Sproat, 2010b), I discussed it favorably. But the problem is that the theory does not hold up to scrutiny such as that given it by Zimansky (1993) (see also Englund (1998), pages 53–55), which suggests in turn that many, myself included, did not do their homework. Zimansky presents several counter-arguments, including the obvious point that merely picking out apparent similarities between the physical objects that Schmandt-Besserat takes to be tokens and various later written signs is a risky enterprise. In many ways this echoes the point that we have already discussed about iconicity, namely that, as Eco (1976) noted, iconicity is highly culture-specific: what looks similar to a twentieth-century American writer likely looked very different to fourth-millennium BCE Mesopotamian farmers or tradesmen.

1 But see MacGinnis et al. (2012) for further discussion of the post-literate role of tokens. MacGinnis et al. also note that the envelope and its contents that Oppenheim described in 1959, from Nuzi, near modern Kirkuk, have been lost, and only a photograph survives. It is now impossible to confirm some of the details of Oppenheim’s account.

Also, as we noted previously, “complex” tokens (purportedly representing commodities such as sheep) have not been found in envelopes (Zimansky, 1993, page 515) or other administrative contexts. This contrasts with the “simple” tokens, which represented numerical amounts, and which have been found in abundance in envelopes. Again, while there is a strong connection between the simple tokens and the later tablet-based accounting system, the connection between the complex tokens and later symbols representing commodities is tenuous. But perhaps Zimansky’s most persuasive counterargument is statistical: nowhere did Schmandt-Besserat give anything like an explicit account of the relative frequencies of signs, or try to correlate them with what one finds on the later known accounting tablets. Of course one would hardly expect an exact match. But at least to the extent that the economies of the preliterate and postliterate cultures should not have been vastly different, one would expect that commodities indicated frequently on the tablets would also have shown up frequently among the tokens. But this seems not to be the case. Given the importance of sheep and goats in Mesopotamia, it is not surprising that on the tablets one finds frequent accounting involving these animals. On the other hand nails, to take a specific example, are not so common. Yet among the tokens, assuming Schmandt-Besserat’s equation of particular tokens with later graphical symbols, the reverse is true. In Schmandt-Besserat’s catalog (Schmandt-Besserat, 1992), Zimansky counted just 15 sheep but 60 nails, which turn out to be the most frequent form among the tokens. Why this overabundance of nails and dearth of sheep in a pastoral culture that we know later recorded sheep with some regularity? So what are we left with?
Very likely there is some truth to Schmandt-Besserat’s basic idea. The earliest use of proto-writing and even writing proper in Mesopotamia was in accounting, so it is likely that the idea of making visible marks on a surface did arise in the context of an accounting system, plausibly the token-based one that had already existed for thousands of years. But it must be borne in mind that accounting is merely one possible environment in which writing could evolve, not the only one. Certainly in China the earliest clear use of a writing system was not in accounting at all, but in recording the results of divination, though this could be viewed as another form of administration. We suggest that what was probably crucial for the development of writing, and in particular for the development of the phonographic principle, was an environment in which written symbols were communicated verbally, for example in a department of administrators where one of the administrators would “read” a “text” and others would need to write it down. While the system may not have been used for accounting sensu stricto, it would probably have had to be at least a regularly used record-keeping device, and thus effectively administrative in nature.

Before we close this discussion of the pristine evolution of writing, there is one further issue we need to address. The discussion above has all centered on the idea that writing evolved in stages and, at least initially, unconsciously from one or another previous non-linguistic system. This would seem to be uncontroversial, and
one could perhaps take it for granted were it not for the fact that it has been challenged by a major scholar of one of the earliest writing systems, Sumerian. Glassner (2000) (see Glassner (2003) for an English translation) has argued instead that writing was consciously invented by Sumerian scribes and thus that “il ne peut y avoir, par définition, ni pré- ni proto-écriture, ni écriture en gestation”2 (page 279). Glassner’s argument is based on perceived defects in what he presumes to be the two main theories of writing. The first, and later, of these is the accounting theory of Oppenheim and Schmandt-Besserat (Oppenheim, 1959; Schmandt-Besserat, 1996), which we have already discussed and whose problems we have already noted. The earlier is what Glassner terms the pictographic theory, by which he means in particular the kinds of narrative pictography common among pre-contact Native North Americans, such as those presented by Mallery (1883), which we have already discussed in Chap. 3. Glassner notes (e.g. page 122) that there is a total lack of evidence for any such pictographic tradition in Mesopotamia, and he contrasts this with his claim that in its earliest phases, Sumerian writing shows its “phonetic character”. Furthermore, Glassner notes that writing itself was apparently an object of study from early on in Sumer, by which he apparently means, inter alia, the compilation of lexical lists in order to aid students in learning the written language. Several other scholars of the Ancient Near East (Dalley, 2005; Robson, 2005; Englund, 2005) have strongly criticized Glassner’s theory, largely on Sumerological grounds, but there are in any case several reasons to be skeptical of his conclusions, as I also argued in Sproat (2017). First of all, Glassner’s “pictographic theory” was never really a serious theory of the origin of writing.
True, many writers on writing systems discuss narrative pictographic systems as precursors of writing; Ignace Gelb (1952; 1963) did so, for example. But nobody has ever presented any evidence that such systems ever evolved into writing, and the most that can be said about them is that, like so many other graphical symbol systems not tied to language, they served some of the functions that writing serves in literate cultures. Indeed, I argue in Sect. 7.3 that such narrative pictographic systems are the least likely to develop into true writing. Of course there is a sense in which pictography is relevant, since many early written symbols were pictographic in origin: nobody disputes the fact that the Chinese character for ‘horse’ 馬 was originally a picture of a horse. But that observation does not commit one to the view that the kinds of narrative pictographic systems that Glassner has in mind were in any way relevant to the evolution of writing. Indeed, had they been, it would be surprising that, as Woods et al. (2010) observe (page 44), it took hundreds of years before Sumerian writing had evolved enough to write running prose, something surely not expected if writing had evolved from an earlier pictographic system of that kind. Glassner would therefore appear to have erected a straw man.

2 By definition, one can have neither pre- nor proto-writing, nor writing in the process of development.

Glassner’s main point seems, in fact, to revolve around the presumption that Sumerian writing evolved very quickly (though see again the point raised by Woods and colleagues about the length of time needed for the system to become truly fully functional), and that in particular the phonetic principle was discovered very early on. This is surely also a bit of a red herring: as we have argued, the phonographic principle effectively defines writing. Indeed, it is amusing to observe that while some scholars have argued that the notion of writing should be construed more broadly, so as to include a broader range of notational systems for cultures that lack writing in our more restrictive sense, whenever people try to argue for including a new system under the rubric of writing, they invariably present evidence that this system had methods for encoding phonology. For example, Gordon Whittaker, in his recent work on Nahuatl writing (Whittaker, 2009, 2021), spends a good deal of space arguing that certain signs encode sound in a systematic way. So it seems somewhat disingenuous to herald the apparently quick discovery of the phonographic principle as evidence that the Sumerians “invented” writing consciously, since prior to that discovery we would not normally consider the system to be writing. So all Glassner’s observation means is that at some point the Sumerians crossed a crucial threshold in the development of a graphical symbol system; but while this discovery was indeed monumental, there is no reason to believe that it was an act of conscious invention. So, in summary, there is every reason to believe that writing evolved, and was not invented, at least not in the sense of someone one day deciding that it would be nice if one could encode on clay tablets any speech that someone could utter. The realization that this was indeed possible took much longer.

6.2 A Hypothesis

Very little is concretely known about the evolution of writing. The best guess we have is that the first writing evolved from an accounting system, in which the symbol inventory was gradually increased and, critically, the discovery was made that one could write words not on the basis of what they denoted, but rather on the basis of how they sounded.3 It is that connection, symbol to sound, that we argued was crucial to the development of writing. How did it happen? More particularly, what had to happen in the minds of the ancient proto-scribes in order to allow that connection to take root and grow? In the previous chapter we looked at some of the neurological evidence for how non-linguistic symbol systems seem to be processed in the brain, and compared that to what is known about how written language is processed in the brain. The latter, apparently, makes use of far more areas, so that when one reads not only is the visual cortex obviously active, along with the areas related to meaning, but the areas that connect to the phonology of language are also activated. The brain connections between the areas that recognize written symbols and the areas that process language need to be trained: they do not come for free, and they do not seem to be activated in cases where one is dealing with non-linguistic symbols. But what could train these connections in the minds of people who, heretofore, had been essentially illiterate, or at least not literate in a symbol system that we would recognize as writing? I suggest the following hypothesis, stated in the previous chapter and repeated here: Writing evolved in an institutional context in which symbols were effectively dictated, so that the user of the symbol system gradually came to associate the symbols with sounds.

3 Later on, after the system was formalized, the symbol inventory decreased (Jacob Dahl, personal communication), presumably because some symbols were deemed redundant.

Note the emphasis here on institution, rather than administration. The latter term often seems to be construed narrowly to denote the kind of administrative accounts to which the earliest writing in Mesopotamia was applied and to which some later systems, such as Linear B, were also applied. Chrisomalis (2009) seems to have this narrow notion in mind when he asserts that “the comparative evidence does not support the hypothesis of an administrative origin of writing” (page 70). But this is a red herring. The issue here is not the specific original application domain of the signs from which writing evolved, but rather the use of those signs in an institutional context in which the proto-scribe learned to associate symbol with sound. In China, for example, the earliest known application of writing was to recording the results of divination. While there is no direct evidence that Chinese writing actually evolved in that institutional context, it must be stressed that such a context would have been just as good an institution for it to develop in as the Mesopotamian accounting houses were. In the Shang Dynasty divination procedure (Keightley, 1978), the diviner would interpret the cracks formed on tortoise shells or cattle scapulae by inserting hot thorns into holes drilled into the surface of the shell or bone. The results of the divination would be announced and the scribe (史 shǐ) recorded the resulting interpretation. All in all, this was a perfect setting for sounds to become associated with symbols. In other words, it was not what the signs were used to represent, but rather how and where they were used that mattered. Divination would have been a good context. So would agricultural accounting. By the late pre-literate period in Mesopotamia, accounting had evolved into an institution, which presumably involved the training of students in a formal setting.
As Englund (1998) discusses (his Section 5.5), there is clear evidence that student accountants from as early as the Uruk IV period (3350–3200 BCE) wrote practice tablets. Still, nothing is known about the social interactions between accountants in those institutions. The closest we can come are the eduba, Sumerian edub-a ‘house of tablet(s)’, the schools that trained scribes during the Old Babylonian Period (early second millennium BCE), a large part of whose curriculum involved the copying and memorization of lexical lists (Veldhuis, 2006). In the next section we review what is known about the eduba.

6.3 Scribal Schools

Robson (2001) describes an excavation of House F at Nippur (modern-day Nuffar, Al-Qādisiyyah, Iraq), dating from the early second millennium BCE (Fig. 6.5). The house was about 45 square meters in size, small by Old Babylonian standards (Baker, 2014), and in addition to a bread oven, domestic pottery and even a fragment of a gaming board, it contained over 1400 fragments of cuneiform tablets. There were also baked-brick boxes that were evidently used to recycle clay, soaking it in water so that it could be formed again into tablets. The tablets themselves were of two basic kinds: Sumerian literary texts, and sign lists. They showed evidence of copying, by hands of varying degrees of skill. This was evidently a school, an eduba, that housed pupils being trained to be scribes. Scribes learned not only writing but also arithmetic and the other skills they would need as state administrators. Note also that “learning writing” included learning the Sumerian language. The pupils would have been speakers of Akkadian, and while a few Akkadian and Old Babylonian language tablets were recovered from House F, the vast majority were in Sumerian, which was still used as a language of administration, though apparently a lot of administration by that time was also done in Akkadian (Robson, 2001, page 60). Note that Akkadian (a Semitic language) and Sumerian (a language isolate) were completely unrelated languages.

[Fig. 6.5 plan: rooms of House F labeled with tablet counts, benches, recycling bins, the tanour (bread oven), and a 10 m scale bar]

Fig. 6.5 Plan of House F, based on Robson (2001), Figure 3, page 41 and Stone (1987), Plate 19, with approximate scale based on Stone (1987)’s original. The house included benches, and “recycling bins” where used clay tablets could be recycled and made into new tablets. Also shown are the numbers of tablets recovered from the rooms.

What was the actual program of study like? As Lucas (1979) describes it (page 315), pupils in the eduba started young, with the first exercises being the memorization and recording of “exercises in a vowel sequence ‘u-a-i,’ such as tu-ta-ti, nu-na-ni, bu-ba-bi, zu-za-zi, and so on.” The students then studied sign lists consisting of signs for entities and their pronunciations. These lists comprised about 900 entries. Study continued with the lexical lists referred to previously, comprising “long lists of animals, plants, birds, fishes, insects, stones and minerals, geographical and place-names.” Instruction proceeded to sentence copying, formulaic expressions, titles and so forth. As Lucas notes, “one student records, ‘I have written (a tablet) from the different names of Inanna up to (the names of) the animals living in the steppe (and the names of) the different artisans.’” Thus the curriculum involved a great deal of repetitive memorization and rote learning, not so different after all from the way schoolboys learned Latin in a traditional European school a hundred years ago. Lucas goes on to note (page 316) that scribal students were required to learn a wide range of subjects, from letter writing, keeping accounts and inscribing steles to such subjects as arithmetic and surveying. What is interesting about these areas of study is that some of them involved activities that must have predated writing.
We know that keeping accounts fell into that category, and presumably arithmetic too, since it is hard to be an accountant if one cannot at least add and subtract. While the eduba came to the fore in the literate period, with a large range of studies conducted over many years, one assumes there must have been pre-literate institutions that trained accountants. The students in those institutions surely received exercises involving a great deal of rote copying, but presumably also verbal instructions: ‘write down 35 goats’; ‘what is 25 units of grain plus 35 units of grain?’ These verbal instructions would have helped associate the graphical symbols with sounds, thus starting to build the necessary linkage between the visual, semantic and phonological processing areas of the brain that eventually led to writing. Once this linkage was established, the phonetic inventory of symbols grew quickly as more and more symbols acquired phonetic values. Indeed, most of the earliest pre-cuneiform syllabic signs that can be clearly identified as encoding Sumerian, as opposed to some other language, involve symbols that do not seem to be related to accounting: “arm”, “place” and “horn” are some of these early syllabic symbols. While the idea of connecting symbols with sounds started in the accounting house, the set of names of commodities would not be expected to cover the syllable inventory needed to represent Sumerian phonology. But once the principle that one could use a symbol for its sound was established, there was no reason why the new scribes would not quickly learn to extend the symbol set: if the words for “arm” or “head” were useful as syllables, why not draw an arm or a head? In any case, many Mesopotamian symbols had several phonetic values, which usually (but, it should be stressed, not always⁴) could be explained by the Sumerian (and later Akkadian) words for the objects depicted by the symbols.

Recall from Sect. 2.5 that language is doubly articulated, with a meaningful first level consisting of words or morphemes and a largely meaningless second level consisting of phonological units such as phonemes or syllables. What the first scribes discovered was that one could have a graphical system that was similarly doubly articulated, with some elements representing meaning directly, but with others representing largely meaningless syllables, which could be combined into meaningful first-level constructs. In the next chapter we turn to a computational simulation that illustrates some aspects of how this process might have proceeded.

4 Jacob Dahl, personal communication.

Chapter 7

Simulating the Evolution of Writing

This chapter is the second of two where the topic is the evolution of writing. Our purpose in this chapter is to provide evidence for the hypothesis discussed in Sect. 6.2, in the form of a computational simulation. To that end I review, in Sect. 7.1, my previous work on computational simulations of the evolution of writing. Then in Sect. 7.2 I present a novel simulation that supports the hypothesis by showing that a neural network can learn to associate symbols with sounds, and extend those associations to the writing of novel words or morphemes. The discussion in this chapter is necessarily technical. The most technical aspects, including the details of the model described in Sect. 7.2, are relegated to Sects. 7.5 and 7.6. Readers less interested in the technical details can skip those sections and focus just on the material described in the main chapter. One interesting question that arises from these sorts of considerations—one that to my knowledge has never been systematically asked or answered before—is what kinds of symbol systems could have evolved into writing. We discuss this in Sect. 7.3. We end the chapter (Sect. 7.4) with a brief summary of our findings.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_7

7.1 Previous Work on Computational Modeling of the Evolution of Writing

Clear evidence on the pristine development of writing is at best sparse, putting this problem on a par with another vexed question, namely how human language evolved tens or hundreds of thousands of years previously, or indeed how languages changed over time, since on the latter point we have clear documentary evidence only for the past few thousand years, after the invention and widespread use of simple phonographic writing systems such as the alphabet. What one cannot find clear evidence for, one may still be able to understand by means of another approach, namely computational simulation. In the study of the evolution of language and of language change, there has already been extensive work on computational modeling, both of the evolution of language ex nihilo and of language change, for example in social networks (Kirby, 1999; Niyogi, 2006; Steels, 2012; Kirby et al., 2014; Reali et al., 2014; O’Grady & Smith, 2018; Lazaridou et al., 2018; Bel-Enguix, 2019; Lipowska & Lipowski, 2022). One can therefore contemplate trying to understand some of the basic questions about the evolution of writing by similar means.

In Sproat (2017) I introduced a model of the discovery of writing that attempted to simulate the discovery of the rebus principle, and the spread of the use of symbols and combinations of symbols from an initial purely ideographic use, representing a few key concepts, to a broader set of morphemes. Depending upon the initial conditions of the model, the system would develop, over the course of a number of iterations, written forms for a larger or smaller set of the morphemes that initially had no written form. The model started with 100 basic concepts, with a symbol associated with each. Each of these concepts was associated with one or more morphemes, each with its own pronunciation. For example, the SHEEP concept might be associated with morphemes meaning sheep or ewe. A further set of morphemes (900 in the simulations I presented) consisted of both a pronunciation and a combination of one or more concepts from the basic concept list. Thus there might be a morpheme meaning ram consisting of a mix of the two basic concepts SHEEP and MALE. On each iteration, the model would try to extend what could be written by extending a symbol either to something with a shared meaning, or to something with a similar sound, where similarity in sound was measured by a simple edit distance. Symbols could also be combined so that one part would constitute the meaning and the other the sound.
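The “simple edit distance measure” of Sproat (2017) is not spelled out in this chapter. A minimal sketch, assuming plain Levenshtein distance over pronunciation strings with unit costs for insertion, deletion and substitution, shows the idea; the helper names `edit_distance` and `closest` are illustrative, and the candidate list is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-segment
    insertions, deletions and substitutions that turn a into b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def closest(target, written):
    """Hypothetical helper: the already-written pronunciation nearest to target."""
    return min(written, key=lambda p: edit_distance(target, p))


print(edit_distance("yurk", "urk"))            # 1 (delete /y/)
print(edit_distance("yuk", "yurk"))            # 1 (insert /r/)
print(closest("yurk", ["tem", "urk", "zik"]))  # urk
```

Unit costs treat every segment mismatch alike, so, for instance, /t/ versus /d/ counts as much as /t/ versus /z/; this is one sense in which such a measure is crude.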
For example, if one had a morpheme with the sound /yurk/ and the meaning components STAR, WOMB, the system might derive a written form ☆☣, with ☆ representing the semantic component STAR, and ☣ (meaning ‘meat’) picked for its pronunciation /urk/ as the phonetic component. Complex spellings like ☆☣ could be reused: a morpheme with the meaning components WINTER, KING and pronunciation /yuk/ could end up with ☆☣ as the phonetic component and ☃ as the semantic component meaning winter, to yield ☃☆☣ as the full spelling of this morpheme. The system thus developed written forms that are highly reminiscent of Chinese semantic-phonetic characters. Note that in the ensuing discussion of computational modeling we use the term language to refer to a synthetic language with particular morphological and phonological properties, as discussed in what follows. Parameters of the system included:

1. The phonological complexity, in terms of syllable structure and syllable count, of the basic morphemes of the languages. Conditions were: monosyllabic; disyllabic; or sesquisyllabic, meaning that morphemes could be either monosyllabic or ‘one and a half syllables’, where the initial syllable was of the form CV, with a single consonant and vowel.

2. Whether or not a symbol associated initially with a concept should be used for all morphemes associated with that concept, or just for one that is designated as primary: in our sheep example, would the symbol for SHEEP be associated with both sheep and ewe, or just sheep?

3. A parameter that controlled the speed with which new morphemes would acquire spellings, crudely reflecting the socio-economic pressure to find ways to spell new words.

In a series of experiments we were able to provide support for the long-standing conjecture that, on balance, it is easier for a language to develop a writing system ex nihilo if that language’s morphemes have a simpler phonological structure, with a language with largely monosyllabic morphemes having a strong advantage (Steinthal, 1852; Daniels, 1992; Boltz, 2000; Buckley, 2008). See Fig. 7.1. We also showed that the second parameter listed above was able to simulate an apparent difference between Sumerian writing and Early Chinese writing. The Sumerian system evidently evolved from a pre-linguistic system in which symbols were associated with ideas that were themselves associated with multiple possible linguistic units, whereas the Chinese system, in its earliest known phase, seems to have associated symbols not just with concepts, but with particular morphemes associated with those concepts. For whatever reason, it seemed that Chinese writing was more ‘advanced’ than Sumerian in that the system, early on, associated symbols with particular linguistic units rather than with concepts. Finally, the third parameter, which controlled the rate of spread of the system to new morphemes, was argued to be relevant to Glassner’s (2000; 2003) claim that writing did not evolve but was “invented”, which we already discussed in Sect. 6.1. We pointed out that if the pressure to spread the system to new forms was high, it might appear, at least as far as one could tell from the archaeological record, as if the system appeared more or less instantaneously, whereas in fact it evolved naturally, merely at a faster rate.

[Fig. 7.1 plot: two panels, “monosyllable” and “disyllable”; horizontal axis: Generation (2 to 10); vertical axis: N; series: # spellings, # sem-phon, # phon, # sem, all phon]

Fig. 7.1 Results showing the evolution for one of the simulations for the monosyllable and disyllable conditions, based on Sproat (2017), Figure 4, page 207. Shown are the growths in the proportion (N) of spellings (relative to the total number of morphemes), the total proportion of semantic spellings, the proportion of semantic-phonetic and pure phonetic spellings, and the total proportion of phonetic spellings. In general, the development of written forms is strongly enhanced when the language has mostly monosyllabic morphemes. “All phon” means the cumulative number of phonetic spellings, whether purely phonetic or semantic-phonetic. Note the differences in scale on the vertical axis, with a higher proportion of overall spellings for the monosyllabic case, and by the fourth generation, more phonetic than semantic spellings.

While this previous work was a start, it had a few weaknesses. First of all, the modeling of concepts was rather crude, depending as it did on the assumption that basic concepts were atomic units, and that morphemes derived their semantics from arbitrary mixes of these concepts. A more realistic model of semantics would be a Firthian model based on Firth’s original maxim “you shall know a word by the company it keeps” (Firth, 1957, page 11), which we already discussed in Sect. 5.2. In machine learning, most current systems represent meaning by means of so-called word embeddings (Mikolov et al., 2013), which are vector representations of words derived from the environments in which the words are found, and are thus a computational implementation of Firth’s idea. Second, the measure of phonetic similarity was also rather crude. Finally, too many assumptions were hard-wired into the system. For example, the system was explicitly designed to allow it to pick more than one symbol to represent a new word, as we described previously. Ideally, such creativity should evolve spontaneously, since in the early development of writing, scribes were not presented with an algorithm that pointed out to them the possibility of representing a word or morpheme with more than one symbol: this is something they discovered by themselves.
We turn in the next section to a simulation that is arguably more satisfactory in at least some ways than the system reported in Sproat (2017).

7.2 A New Computational Simulation

In this section I present a new simulation that addresses some of the limitations of my previous work in Sproat (2017). In particular, I develop a more principled representation of semantic and phonetic closeness, and train a neural sequence-to-sequence model to learn the relation between semantic information (or semantic information combined with phonetic information) as input, and glyph sequences as output. I do this in the context of a simple simulation of an accounting-like task, where we attempt to model an early accountant-scribe learning to keep accounts for a large set of commodities. In the next section (Sect. 7.2.1) I describe the basic properties of the model, and in Sect. 7.2.2 I consider how the system evolves over time.
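As a concrete preview of the accounting-like task, here is a minimal sketch of how source/target training pairs of the kind described in Sect. 7.2.1 might be generated: each source row pairs a number from 1 to 10 with an @CONCEPT, and each target row pairs a Roman numeral with a glyph. The four-entry seed lexicon, the name `make_account` and the row counts are hypothetical; the chapter’s own generation procedure is given in Sect. 7.5.1 and may differ.

```python
import random

# Hypothetical seed lexicon: concept -> glyph (emoji stand in for pictograms).
SEED_GLYPHS = {"@HORSE": "🐎", "@SHEEP": "🐑", "@FISH": "🐟", "@ONION": "🧅"}

# Roman numerals for 1..10 (index 0 unused), since numbers are limited to 1..10.
ROMAN = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]


def make_account(n_rows, rng):
    """Return one synthetic account as a (concept sequence, glyph sequence) pair."""
    source, target = [], []
    for _ in range(n_rows):
        count = rng.randint(1, 10)                 # inclusive on both ends
        concept = rng.choice(sorted(SEED_GLYPHS))  # sorted for reproducibility
        source += [str(count), concept]
        target += [ROMAN[count], SEED_GLYPHS[concept]]
    return " ".join(source), " ".join(target)


rng = random.Random(42)
for _ in range(3):
    print(make_account(2, rng))
# Each pair has the shape ('<n> @CONCEPT <n> @CONCEPT', '<roman> <glyph> <roman> <glyph>').
```

A sequence-to-sequence model trained on many such pairs only has to learn the number-to-numeral and concept-to-glyph correspondences; the interesting question, taken up below, is what it does with concepts that have no glyph yet.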


7.2.1 Description of the Model

In this simulation I used a sequence-to-sequence model, specifically an attention-based (Bahdanau et al., 2015) recurrent neural network (RNN), to learn to map between a sequence of concepts and symbols for those concepts. As described in detail in Sect. 7.5.1 at the end of this chapter, the task models a situation in which a scribe/accountant has to learn to write, using a sequence of glyphs, accounts consisting of rows of entries, each with a number followed by a concept corresponding to a commodity. For example, one entry might involve four horses, which would be conceptually represented as 4 @HORSE, where we use normal Hindu-Arabic numerals to represent the number, and a capitalized English word preceded by an ‘@’ sign to represent the concept. The scribe would need to learn to write this as, say, “IV 🐎”, where we use Roman numerals to represent numerical information and appropriate emoji to represent concepts. The system is trained on “texts” containing concept-symbol mappings taken from a seed set of 100 commodities. If presented with enough such pairs of concept sequences mapped to glyph sequences, the model can learn to map any number concept it has seen to the appropriate Roman numeral and, more to the point, the commodity concept to the appropriate glyph. Thus, to take our previous example, the system would learn to map a sequence of concepts such as 4 @HORSE to a sequence of glyphs “IV 🐎”. What is of particular interest to us, however, is not so much the learning of a closed set of commodity-glyph mappings, but rather the extension of the glyphs to write new concepts that heretofore did not have written forms. To that end we look at the system’s behavior when it is applied to the task of finding glyphs to write texts that include concepts taken from a set of 900 new concepts not found in the original training data. Suppose there was no glyph associated with the concept @DONKEY.
Still, @DONKEY is similar in meaning to @HORSE, so in the simplest case the system, given an input such as 3 @DONKEY, might produce “III 🐎”, thus extending the use of the glyph for @HORSE to represent @DONKEY. More interesting, though, as we shall see, is the case where the system can extend the use of glyphs not just on the basis of related meanings, but also on the basis of related sounds. As detailed in Sect. 7.5.2, the input concepts are encoded not as strings, but as word embeddings. This means that in principle the system could indeed generalize the use of symbols to other related concepts. Again, the glyph for @HORSE could be extended to write @DONKEY, and such semantic extensions are known to have occurred in ancient writing systems. In addition to semantics, concepts are associated with one or more phonetic forms. These phonetic forms are randomly generated as detailed in Sect. 7.5.1, and are constrained to follow one of three conditions, following my previous work in Sproat (2017): purely monosyllabic; monosyllabic or disyllabic; and monosyllabic or sesquisyllabic. Analogously to word embeddings, we used phonetic embeddings, whereby two phonetic forms that sound similar (say, tem and dem) are close in embedding space, whereas two forms that sound quite different (tem and zik) are far apart.

In addition to training on sequences of concepts mapped to glyph sequences, the model can also be trained on sequences of concepts paired with their phonetic forms, mapped to glyph sequences. This allows us to simulate the situation in which the scribe is not merely learning to map between concepts for commodities and their written form, but also gets aural input. To take our previous example, rather than just mapping from 4 @HORSE to “IV 🐎”, the scribe gets 4 @HORSE along with a phonetic input, e.g. sem mak, where sem is the word for ‘four’ and mak the word for ‘horse’. Note that both the semantics and phonetics are provided as input in this case, rather than just the phonetics: it is assumed the scribe knows the language and therefore on hearing sem mak would know that it represents 4 @HORSE. See Sect. 7.5.2 for technical details of how these different configurations are implemented. If phonetic input is given, the phonetic embeddings imply that in addition to generalizing on the basis of semantics, the system can also generalize on the basis of phonetics. This relates directly to our hypothesis on page 140, which posits that the key to the development of writing was the reinforcement of associations between sounds and symbols. Indeed, as we shall see, extending on the basis of phonetics is far more effective, measured in terms of the increase in novel concepts that one can write, than extension on the basis of semantics alone. In the simulations reported here, numbers are limited to values between 1 and 10, and the phonetic forms of numbers are constrained to be distinct from each other and from all of the concepts.¹ Let us consider first what happens when the model is tested on held-out data containing only concepts seen in the training. The semantic-only model learns the mappings with perfect performance. Adding in phonetic information causes lower performance.
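The phonetic embeddings themselves are described in Sect. 7.5.2. As a rough stand-in, and explicitly not the chapter’s method, the sketch below represents a CVC syllable by concatenating hand-assigned articulatory feature vectors for its three segments, and measures closeness by Euclidean distance. The feature inventory is hypothetical and covers only the segments needed for the examples. It reproduces the qualitative point that tem and dem are close while tem and zik are far apart, which is also why phonetic input can make phonologically close concepts easier to confuse.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical consonant features:
# (voiced, labial, coronal, dorsal, stop, fricative, nasal)
CONSONANTS = {
    "t": (0, 0, 1, 0, 1, 0, 0),
    "d": (1, 0, 1, 0, 1, 0, 0),
    "z": (1, 0, 1, 0, 0, 1, 0),
    "k": (0, 0, 0, 1, 1, 0, 0),
    "m": (1, 1, 0, 0, 0, 0, 1),
}
# Hypothetical vowel features: (high, low, back)
VOWELS = {"i": (1, 0, 0), "e": (0, 0, 0), "a": (0, 1, 1), "u": (1, 0, 1)}


def embed(syllable):
    """Embed a CVC syllable (three one-letter segments) as one feature vector."""
    onset, nucleus, coda = syllable
    return CONSONANTS[onset] + VOWELS[nucleus] + CONSONANTS[coda]


print(dist(embed("tem"), embed("dem")))  # 1.0: only the voicing of the onset differs
print(dist(embed("tem"), embed("zik")))  # 3.0: onset, vowel and coda all differ
```

Unlike the unit-cost edit distance of the earlier model, a feature-based distance counts /t/ versus /d/ as a smaller difference than /t/ versus /z/.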
Though the model still sees the semantic information when phonetic input is added, giving it access to the phonetic information means that it will also learn to build associations between that information and the glyphs. This means that some phonetic confusions may be introduced between otherwise semantically distinct but phonologically close concepts. Consistent with that expectation, performance on held-out test data is lower for the monosyllabic and sesquisyllabic cases (89% accuracy in each case) than for the disyllabic cases (96% accuracy): apparently there is less chance for confusion with disyllabic morphemes, but, as we will see, this also means that languages with more disyllabic morphemes are less likely to allow for generalization in the evolving writing system. One other important technical detail in the model needs to be introduced, namely the model’s confidence in its output. In predicting an output glyph corresponding to a particular input, the model actually assigns a score to every possible output glyph: the predicted output is simply the one with the best score. One can then ask what the difference is between the best score and the score of the second-best candidate. This difference is commonly used in machine learning models as a proxy for the model’s confidence in its prediction: if the difference is large, the model is very ‘sure’ of the highest-scoring output; if the difference is small, this reflects ‘uncertainty’ on the part of the model.

1 In natural language, numbers are not a priori distinct in phonetic form from other words, as English one (cf. won), two (cf. too) and four (cf. for) show. We make this simplifying assumption in the current simulations in order to avoid confusing the model and to get it to focus on the target of real interest, namely the commodity and its associated glyph(s).

One can then use a confidence threshold (henceforth just threshold) to filter the output. If one is considering the extension of the model to novel cases as described previously, imposing a higher threshold will allow fewer extensions than imposing a lower threshold, since there are generally fewer cases about which the model is highly confident and, contrariwise, more about which it is less confident. To see how this works, consider the case of the model trained without phonetic information, being asked to decode a set of 500 novel cases (Sect. 7.5.1) involving concepts unseen in the training. Necessarily, since the model has not seen any phonetic information about the input, it can only extend to novel forms on the basis of semantics. Figure 7.2 plots the raw number of predicted extensions against the minimum confidence threshold, for minimum confidence thresholds ranging from 1.5 to 6, averaged over models trained on 10 languages (i.e. 10 distinct languages generated from the same morphophonological configuration as described previously) for each of the three phonetic conditions, monosyllabic, disyllabic and sesquisyllabic (thus 30 languages in total).² As can be seen, the number of extensions decreases as the threshold increases. And since the model has not actually seen the phonetic forms corresponding to the concepts, there is of course no difference between the three phonetic conditions.

[Fig. 7.2 plot: “Semantic generalization vs. threshold”; series Mono-sem, Sesqui-sem, Di-sem; horizontal axis: Thresh (2 to 6); vertical axis: number of extensions (0 to 60)]

Fig. 7.2 Number of semantic extensions (vertical axis) plotted against confidence. There are no significant differences between conditions, as would be expected since the model in this case does not look at phonology at all. Linguistic configurations shown in the plot are monosyllabic (“mono-sem”), disyllabic (“di-sem”) and sesquisyllabic (“sesqui-sem”). Here, and in subsequent plots, the horizontal axis, “thresh”, is the confidence threshold described in the text.

2 The reason for averaging is that there is a fair amount of variation in the behavior of the system across runs, so in order to get a sense of the trend for a particular configuration, it helps to provide trajectories for the system averaged across multiple runs. Significance reported between different configurations, on the other hand, considers the set of all runs for each configuration.

Let us now consider the case where the model has seen both the semantic and the phonetic input and is therefore making a prediction on the basis of both. Since such a model could in principle extend on the basis of either phonetics or semantics, we need a way to decide, at a given threshold, whether to attribute an extension to semantics or to phonetics. We do this by comparing the output for a given novel example in the semantics-only run with the output in the comparable semantics-plus-phonetics run. If the semantics-only run predicts an output with confidence at or above the given threshold but the semantics-plus-phonetics run does not, the extension is attributed to semantics. If the reverse situation holds, it is attributed to phonetics. If both runs are above threshold, then we consider the system to have extended on the basis of both semantics and phonetics, yielding a semantic-phonetic compound glyph. More on that in what follows.

Figure 7.3 plots the relation between confidence and phonetic extension. In this case there is a significant distinction between conditions, in that the sesquisyllabic condition is distinct from both the monosyllabic and disyllabic conditions.

[Fig. 7.3 plot: “Phonetic generalization vs. threshold”; series Mono-phon, Sesqui-phon, Di-phon; horizontal axis: Thresh (2 to 6); vertical axis: number of extensions (0 to 200)]

Fig. 7.3 Number of phonetic extensions (vertical axis) plotted against confidence. Differences between the monosyllabic condition and disyllabic condition are not significant, whereas the sesquisyllabic condition is significantly different from the monosyllabic condition (t = 3.64, p = 0.0004) and the disyllabic condition (t = 2.66, p = 0.0092).

If one considers phonetic extensions from the point of view of their phonetic closeness, as measured by the vector distance described in Sect. 7.5.2, on the other hand, one sees a definite difference between the disyllabic condition and the monosyllabic and sesquisyllabic conditions, the latter two not being distinct from each other. See Fig. 7.4.

[Fig. 7.4 plot: “Phonetic Similarity”; series Monosyllabic, Sesquisyllabic, Disyllabic; horizontal axis: threshold (1 to 6); vertical axis: phonetic distance (0.0 to 2.0)]

Fig. 7.4 Phonetic distance (vertical axis) plotted against confidence. In general, the higher the confidence (i.e. the higher the threshold), the lower the phonetic distance (i.e. the higher the phonetic closeness) of the extensions. The monosyllabic condition and sesquisyllabic condition are not significantly different, whereas the disyllabic condition is significantly different from both the monosyllabic condition (t = −3.58, p = 0.0004) and the sesquisyllabic condition (t = −3.83, p = 0.0002). Disyllabic cases always involve a higher phonetic distance.

To a large extent this can be explained by a few factors. First, in the sesquisyllabic case, there are actually fairly few sesquisyllables, so the condition is almost the same as the monosyllabic case. Second, in the case of disyllables, the model may be confident about a choice, but if it picks a disyllable to represent another disyllabic morpheme, there is a much smaller chance that the two will be really close, simply because the space of possible disyllabic sequences is much larger. In general, the larger the set of possible phonetic forms, the smaller the chance of finding a match that is really close. As a practical matter, this would mean that if one is trying to match a whole disyllabic morpheme on the basis of phonetic similarity to another disyllabic morpheme for which a symbol already exists, the phonetic distance may be too great for extension to be really practical. While favoring phonetic closeness is not itself a feature of the neural model, it is a reasonable goal if one wants to elicit a morpheme on the basis of its pronunciation. As noted previously, if both the phonetic and semantic models have strong confidence for a particular concept, then we posit a semantic-phonetic compound consisting of one glyph corresponding to the semantic extension and one for the phonetic extension. Some examples of such cases can be seen in Table 7.1.

Table 7.1 Some semantic-phonetic compounds “discovered” by the model for the first monosyllable condition in Sect. 7.6. The first column gives the semantic value of the novel concept/morpheme, the second its pronunciation, the third the proposed glyph rendition, and the fourth the relevant semantic value of the first glyph and the phonetic value of the second. See Sect. 7.6 for further examples

Sem.          Pron.   Proposed form   Glyph sem./glyph phon.
@GATEHOUSE    gan     🏰🐈            @CASTLE/gan
@CARP         kuy     🐟🦄            @FISH/xuy
@INCH         fet     🦶🐇            @FOOT/pet
@PONY         far     🐎🦊            @HORSE/far
@MOTEL        yiw     🏨🌰            @HOTEL/tiw
@THIGH        op      🦵🦔            @LEG/op
@PARSLEY      kol     🧅🐍            @ONION/xol

These have very much the feel of Chinese compound semantic-phonetic characters, which evolved quite early in the development of the writing system, or of Egyptian cases that involve a semantic classifier combined with glyphs representing the pronunciation. The relation between semantic-phonetic glyphs and confidence can be seen in Fig. 7.5. Note that here, too, there is no significant difference between conditions. This is expected, since both the semantic and phonetic models must reach a given confidence threshold, and the number of semantic extensions at a given confidence is in general lower than the number of phonetic extensions at the same confidence.

[Fig. 7.5 plot: “Semantic-phonetic generalization versus threshold”; series Mono-semphon, Sesqui-semphon, Di-semphon; horizontal axis: Thresh (2 to 6); vertical axis: number of extensions (0 to 100)]

Fig. 7.5 Number of semantic/phonetic extensions (vertical axis) plotted against confidence. There are no significant differences between conditions


Thus the limiting factor is the number of semantic extensions, which, as we saw previously, are not significantly different across conditions. The model does not make the leap to semantic-phonetic compounds by itself: rather, they are a construct of the post-hoc analysis described previously, which considers the confidence of both the semantics-only and the semantics-plus-phonetics models. What would have led early scribes to develop such compound written forms? What mechanism would have led them to add phonetic complements to further specify the denotation of an otherwise potentially vague or misleading semantic extension of a symbol? Or, viewed alternatively, what would have led them to further semantically specify an otherwise ambiguous phonetic extension? To take an example from Table 7.1, what might lead to a compound such as 🐎🦊 @HORSE/far for far ‘pony’? At least in the Mesopotamian context, the use of compound symbols in early accounting to further specify categories of commodities might have provided the incentive for this extension to an evolving writing system. Englund (1995) describes the use of compound symbols to designate particular kinds of cattle in Uruk III accounting tablets, combining symbols denoting general categories like ‘bull’ or ‘cow’ with more specific information such as ‘calf’; thus SAL+AMAR ‘heifer calf’ and KURₐ+AMAR ‘bull calf’. A more elaborate example, a list of various types of wood sharing the common semantic element representing a plank of wood (Englund, 1998), is given in Fig. 7.6. Such expressions as SAL+AMAR might of course have been directly inspired by comparable expressions in the spoken language, but in any case the use of compound symbols to denote individual objects could have served as the incentive to adapt the technique more generally to indicate words with a combination of semantic and phonetic symbols.
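The confidence measure and the post-hoc attribution rule described in this section can be summarized in a minimal sketch. It assumes that each model returns a score for every candidate glyph (represented here as a dictionary), takes confidence to be the margin between the best and second-best scores, and composes a semantic-phonetic compound as the semantic glyph followed by the phonetic glyph, as in Table 7.1. The function names and the example scores are hypothetical, and the phonetic-closeness limits used in the full simulation are omitted.

```python
def margin(scores):
    """Confidence proxy: best score minus second-best score."""
    best, second = sorted(scores.values(), reverse=True)[:2]
    return best - second


def top(scores):
    """The highest-scoring candidate glyph."""
    return max(scores, key=scores.get)


def classify_extension(sem_scores, semphon_scores, threshold):
    """Attribute the extension for one novel concept.

    sem_scores: glyph -> score from the semantics-only model.
    semphon_scores: glyph -> score from the semantics-plus-phonetics model.
    Returns (kind, written form), or (None, None) if neither model is confident.
    """
    sem_ok = margin(sem_scores) >= threshold
    phon_ok = margin(semphon_scores) >= threshold
    if sem_ok and phon_ok:
        # Semantic glyph first, phonetic glyph second.
        return "semantic-phonetic", top(sem_scores) + top(semphon_scores)
    if sem_ok:
        return "semantic", top(sem_scores)
    if phon_ok:
        return "phonetic", top(semphon_scores)
    return None, None


# Invented scores for the novel concept @PONY (pronounced far).
sem = {"🐎": 9.0, "🦊": 3.5, "🐟": 1.0}      # margin 5.5
semphon = {"🦊": 8.0, "🐎": 3.0, "🐟": 0.5}  # margin 5.0
print(classify_extension(sem, semphon, 4.0))  # ('semantic-phonetic', '🐎🦊')
print(classify_extension(sem, semphon, 5.2))  # ('semantic', '🐎')
print(classify_extension(sem, semphon, 6.0))  # (None, None)
```

Raising the threshold from 4.0 to 5.2 to 6.0 turns the same pair of predictions from a compound into a purely semantic extension and then into no extension at all, which is the trade-off plotted in Figs. 7.2 to 7.5.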

7.2.2 Simulation of Evolution

In the previous section we showed that the model is able to extend the use of glyphs on the basis of semantic or phonological similarity, but what is perhaps more interesting is to see how the system evolves over time. How does the proportion of writable concepts increase, and how do the different phonological conditions affect the rate of growth? Given a threshold of, say, 4.0, and a limit of 1.0 on phonetic and semantic similarity, one can use the extensions from the initial models trained in the previous section to update the lexicon with additional concepts and their associated semantically, phonetically, or semantic-phonetically extended glyphs. One can then use that new lexicon to generate new accounting texts and retrain the model. The output of this second phase is then used to update the lexicon, and so forth. We can afford to be a bit more lax in the phonetic closeness allowed for semantic-phonetic compounds, since the combination of a semantic and a phonetic element makes the result far less ambiguous: in the simulations the phonetic closeness for semantic-phonetic compounds is set to 2.0. See Sect. 7.6 for a set of example semantic-phonetic compounds found in the 10 simulation runs for each of the monosyllabic, sesquisyllabic and disyllabic conditions. Figure 7.7 shows the evolution over 5 iterations for the method just described, averaged over ten simulations from different initial languages (thus 30 simulations in total, 10 for each of the monosyllabic, sesquisyllabic and disyllabic conditions). The top left panel shows the proportion of spellable concepts: note that this proportion is computed not with respect to the whole set of 1000 concepts but only with respect to those found in the generated texts (both training and evaluation). As noted in Sect. 7.5.1, the number of distinct concepts in any set of generated texts will be less than the total of 1000 concepts in the simulation, so it is reasonable to consider the proportion of concepts that can be spelled relative to the number of concepts seen in the texts. The model is more successful at covering the set of seen concepts in the monosyllabic and sesquisyllabic cases than in the disyllabic case. Note that in the disyllabic system roughly half of the morphemes generated are disyllabic, whereas in the sesquisyllabic system the proportion of sesquisyllables is small, roughly 2.5% (see Sect. 7.5.1). The disyllabic system in particular is clearly affected by the number of longer phonological forms for morphemes, which lowers the chance of phonetic closeness, though as one sees in Sect. 7.6, one does occasionally find semantic-phonetic innovations based on disyllables.

The remaining panels break down the extensions by raw numbers of semantic, phonetic and semantic-phonetic extensions. Particularly in the case of the phonetic extensions, the monosyllabic and sesquisyllabic conditions show a higher degree of generalization.

The observant reader will note in Sect. 7.6 that some of the semantic components look wrong as semantic components for the given meaning. For example, in Sect. 7.6.1, in the first table, second round, we find 🏨 for @BRIDESMAID. One can trace the evolution of this symbol from its original meaning @HOTEL, with pronunciation kal, through a phonetic extension to @BRIDESMAID, with pronunciation xal. This new meaning is then extended in the second round to @WEDDING. This kind of extension mirrors what happens in the evolution of real writing systems: Handel (2019) terms such cases phonetically adapted logograms (page 19). Well-known examples in Chinese include 來 lái ‘come’, originally a pictogram of wheat; 東 dōng ‘east’, originally a pictogram of a bag tied at both ends; and 少 shǎo ‘few’, originally a pictogram of sand. All of these were originally borrowed for their sound but became logograms for the respective meanings. Unlike in the simulations, there do not seem to be many cases of such phonetically adapted logograms being used as semantic components in combination with phonetic components within the same language, though this did occur when the script was borrowed to write another language: thus Vietnamese Chữ Nôm ít ‘few’ has 少 as a semantic component (Handel, 2019, page 146). These results mirror some of the results from my earlier work in Sproat (2017), using a model that is arguably more principled in how it learns and in how phonological and semantic closeness are represented.

7 Simulating the Evolution of Writing

Fig. 7.6 A “wood list”, Uruk III tablet W 20327,2, consisting of various compound symbols representing different kinds of wood, from Englund (1998), Figure 28, page 96. Common to most entries is the rectangular symbol GIŠ, which was probably a pictogram for a plank of wood (Englund, 1998, page 95). Source: Englund (1998), Figure 28, page 96. Author released all his figures for unrestricted use, License: CC BY-SA

[Figure 7.7: four panels, each plotted against Iteration (1 to 5) for the mono, di and sesqui conditions. Top left: Writable Concepts, (spellable conc.)/(tot. seen conc.). Top right: Cumulative # sem. innovations (#S). Bottom left: Cumulative # phon. innovations (#P). Bottom right: Cumulative # sem/phon innovations (#SP).]

Fig. 7.7 Evolution of representable vocabulary for monosyllabic, (maximally) disyllabic and (maximally) sesquisyllabic systems, averaged over 10 simulations. See the text for a detailed description of the parameters of the system. The top left plot shows the growth of the lexicon over 5 iterations, giving the proportion of spellable concepts relative to the total concepts seen in the data: note that while there are 1000 concepts generated, the generated texts have a smaller number of distinct concepts, usually around 400. The top right plot shows the cumulative number of semantically derived innovative spellings, the bottom left the phonetically derived spellings, and the bottom right the semantic-phonetic innovations. The number of the latter is small and does not increase much after the first iteration. Semantically driven innovations are more common in the disyllabic condition, but conversely there is a large difference between the disyllabic and the other conditions for phonetically derived innovations, with the monosyllabic and sesquisyllabic conditions strongly favored over the disyllabic condition. The overall proportion of spellable concepts is also higher for the monosyllabic (0.62) and sesquisyllabic (0.61) conditions than for the disyllabic (0.51) condition. All differences between the monosyllabic and disyllabic conditions are highly significant per Welch’s t-test (p ≪ 0.001), as are differences between the sesquisyllabic and disyllabic conditions. For monosyllabic versus sesquisyllabic, the proportions of writable concepts are not significantly different, while the cumulative numbers of semantic innovations differ significantly only at p = 0.028, phonetic innovations at p = 0.0073, and semantic-phonetic innovations at p = 0.0021.
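The significance levels in the Fig. 7.7 caption come from Welch’s t-test, which compares two means without assuming equal variances. The sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom with Python’s standard library. The function name `welch_t` and the toy samples are illustrative assumptions, not the code or data behind the figure.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    # Squared standard error of each mean, using the unbiased sample variance.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Invented toy samples, not values from the simulations.
t, df = welch_t([3, 4, 5, 6, 7], [1, 2, 3, 4, 5])
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 2.00, df = 8.0
```

Turning t and df into a p-value requires the Student t distribution. With SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test directly.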
The results also confirm again the theory of Steinthal (1852), Daniels (1992), Boltz (2000), and Buckley (2008) that languages with shorter morphemes, especially languages with largely monosyllabic morphemes, have an advantage when it comes to phonetic extensions and thus ultimately to the development of writing.
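The round-by-round procedure of Sect. 7.2.2 (extend the lexicon, regenerate texts, retrain, repeat) can be illustrated with a deliberately tiny, non-neural stand-in. This sketch is an assumption and not the author’s model: a nearest-neighbour rule with a phonetic distance limit replaces the trained model and its confidence threshold, and only phonetic (rebus-style) extensions are modelled. The spellable proportion is computed relative to the concepts seen in the texts, as in the top left panel of Fig. 7.7. All names and forms are hypothetical.

```python
from typing import Dict, List


def phon_distance(a: str, b: str) -> int:
    """Toy distance: number of differing segments; forms of different length never match."""
    if len(a) != len(b):
        return 99
    return sum(x != y for x, y in zip(a, b))


def extend_once(lexicon: Dict[str, str], phon: Dict[str, str],
                seen: List[str], limit: int) -> Dict[str, str]:
    """One round: each seen, unwritten concept borrows the glyph of a phonetically close written one."""
    updated = dict(lexicon)
    for concept in seen:
        if concept in lexicon:
            continue
        # Donors come only from the lexicon as it stood at the start of the round,
        # so chains of borrowing build up across rounds.
        for donor, glyph in lexicon.items():
            if phon_distance(phon[concept], phon[donor]) <= limit:
                updated[concept] = glyph
                break
    return updated


def spellable_ratio(lexicon: Dict[str, str], seen: List[str]) -> float:
    """Proportion of the concepts seen in the texts that can be written."""
    seen_set = set(seen)
    return len(seen_set & set(lexicon)) / len(seen_set)


# Hypothetical concepts A to D with invented monosyllabic forms.
phon = {"A": "kal", "B": "xal", "C": "xan", "D": "dim"}
seen = ["A", "B", "C", "D"]
lexicon = {"A": "G1"}  # only A starts out with a glyph
history = []
for _ in range(3):
    lexicon = extend_once(lexicon, phon, seen, limit=1)
    history.append(spellable_ratio(lexicon, seen))
print(history)  # [0.5, 0.75, 0.75]
```

Because C is too far from A but close to B, it becomes writable only in the second round, once B has acquired a glyph. In miniature, this shows how extensions made in one round can seed further extensions in the next.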

7.2.3 Summary and Discussion

One implicit result of our simulation is that, given a widely used conventionalized symbol system, the discovery of writing was virtually inevitable. Given the very few known cases of the pristine development of writing, this conclusion may seem surprising, but I believe it is actually correct. The point is, however, that for writing to develop, a number of conditions had to be met:

• The symbols must be used regularly, more or less on a daily basis, in an institutionalized activity.
• The symbol set must be of a reasonable minimal size: a system with three symbols, for example, is not likely to gain traction as a proto-writing system, since there are not enough symbols to make a robust association between symbol and sound.
• The phonological properties of the language must be right: the larger the phonological unit corresponding to the symbols, the less likely the system is to develop robust sound-symbol associations. See again Steinthal (1852), Daniels (1992), Boltz (2000), and Buckley (2008) on the importance of having lots of short, mostly monosyllabic morphemes.
• Concomitantly, the symbols must indeed be seen to correspond to these short monosyllabic morphemes: symbols that, if described orally, correspond to whole phrases are unlikely to develop into writing.3

These conditions naturally lead to the question of which of the many non-linguistic symbol systems could, under the right conditions, have developed into writing. We turn to this question in the next section.

7.3 What Types of Symbol Systems Could Have Evolved into Writing?

At this point we have examined a large number of different kinds of non-linguistic symbol systems, and we have proposed a theory of how such systems might have evolved into written language. The reader may thus be wondering: of the symbol systems we have examined, which ones are more or less likely to have evolved into written language? Obviously any such ruminations must remain in the realm of speculation: as we have pointed out, with so few known examples of the pristine development of writing, and even fewer cases where we have any evidence at all about the likely non-linguistic precursors, we simply do not have enough evidence to know which pre-linguistic symbol system types actually favored grammatogenesis. Still, it is a reasonable question to ask, and we can at least say a few things. For the phonetic generalization necessary for true writing to evolve, the phonological units that became associated with the graphical symbols had to be short: one or two syllables at the most. Any more than that, and the chances of finding homophones that the symbol could be transferred to would be greatly diminished. For this reason, narrative symbol systems such as the Dakota winter counts (Sect. 3.6.16) are among the least likely to have evolved into writing, despite the fact that such systems are often listed as “precursors” to writing (cf. Gelb, 1952). The reason is simple: in Dakota winter count texts, each symbol represented a whole year’s narrative, in which the symbol in question was chosen to represent a particularly salient event in that year. Supposing such texts were “read”, and assuming each symbol were always read in the same way, the speech associated with each symbol would nonetheless have been far too long and elaborate to serve as a useful basis for phonetic generalization. Symbols that are associated with single syllables generalize well as ways to write other words that contain that syllable. But if a symbol corresponds to a whole sentence, or even more material, there is no possibility of it generalizing. In a similar vein, performative systems would seem a priori unlikely to have evolved into true writing, unless the actions to be performed were describable by sufficiently short utterances or, if the system was primarily a linguistic performative system, the utterances to be performed were sufficiently short. Silas John’s system (Sect. 3.6.18) was rather unlikely to have evolved into a general way of representing Apache speech, despite being branded a “Western Apache writing system” by Basso and Anderson (1973), and despite containing clearly linguistic elements (cf., again, the use of a cursive rendition of the English word she to represent the Apache word shíí ‘I’). The fact that the system was apparently designed only to represent the set of 62 prayers that Silas John had created would seem to back up that conclusion. It was never intended as a general writing system, and probably never could have become one. A symbol system also needs to have enough symbols.

3 In addition to these points, one must of course presume that in the culture in question there was no reason why written language should be proscribed. The idea that writing should somehow be disfavored may seem odd in the modern world, with its large number of writing systems and wide literacy, but in ancient times there were cultures where writing was not favored at all, even when neighboring cultures already had literacy. A good example is South Asia before the development of the Kharosthi and Brahmi scripts (third century BCE). The Vedas were famously transmitted orally over many centuries using a variety of sophisticated memorization techniques, and there was a great deal of resistance among Vedic priests to written forms of the texts. Presumably this was at least in part because of the massive amount of training required to memorize such a large canon of work, and the effective devaluation of that skill that writing would imply. But whatever their source, such proscriptions would clearly militate against the culture developing writing.
No examples of the pristine development of segmental writing systems are known: in all pristine systems, the minimal units were at the very least simple syllables, C(onsonant)V(owel) or CVC. A language usually has many of these, numbering in the hundreds or thousands. So a symbol system such as the Tupicochan Staff Code (Sect. 3.6.20), with only three or four symbols, could hardly suffice. Similarly, simple informative systems like weather icons, with relatively small sets of symbols, would not be good candidates. Overmann (2016, page 13) suggests that numeral systems would also have been a poor basis for writing, for similar reasons. What about khipu? Could a system based on colored knotted cords have developed into true writing? In Sect. 3.6.12 I discussed claims by Hyland (2017; 2021) that there were indeed khipu that encoded linguistic information. The evidence, thus far, strikes me as unconvincing. Nevertheless, khipu certainly had the right properties. There were probably enough symbols, once you count the various numerical symbols as well as the place-designating symbols described by Urton (2017). Add to that the possible use of color to encode information: for Hyland’s analysis cord color is critical, and in any case the use of color to encode information has been known at least since the early description of Acosta (1608).4 And khipu were known to have been used in accounting, which was also almost certainly the context in which writing evolved in Mesopotamia. Assuming that the khipu administrators (the khipukamayuqs) dictated the contents of the records, the necessary sound-symbol associations could have been formed, and the system could have evolved into written language just as the Mesopotamian accounting system did. Of course, as Urton (2010) stresses, we do not know how the khipukamayuqs talked about the contents of the khipus, or indeed whether they did at all, so there is no way of knowing whether the right conditions in fact obtained. Over and above that, there is the question of how practical the medium would have been for extensive written records. But in any case, from a purely symbological point of view the khipu accounting system was a perfectly plausible precursor to true writing. The open question is just whether that transition was in fact made.

7.4 Summary

The development of writing ex nihilo was a rare event in human history, having occurred no more than four or five times, in different times and places. Yet given the right conditions, its development was probably inevitable. As argued in this and the previous chapter, those conditions included favorable properties of the language spoken by the early scribes, and the existence of a conventionalized symbol system of sufficient size that was used on a daily basis, and where the oral repetition of messages to be encoded in this system would have led to the association of phonological forms and glyphs. In terms of the neurological issues discussed in Chap. 5, what needed to be built was a robust association between symbols processed in the occipital lobe (which we can assume were already associated with the distributed representation of meaning in the brain) and areas such as the superior temporal gyrus that are associated with phonological processing. With the right conditions and sufficient (but unintentional!) training, this almost had to happen. But the key point here was the prior existence of a frequently used conventionalized (non-linguistic) symbol system, something that we can assume required a minimal level of social development, and that was most likely associated with early city states building an ever more complex bureaucracy. The system need not have been accounting, strictly speaking, but it certainly needed to be something that served as a way to keep records and that required a trained professional class, and so was for all intents and purposes equivalent in function to accounting. Many of us think of accounting only in terms of someone we hire for the unpleasant task of preparing our annual tax returns. But the development of writing shows that accountants occupied a rather more important place in history than that.

4 “Son Quipos unos memoriales, o regístros hechos de ramales, en que diversos ñudos, y diversas colores significan diversas cosas.” (Khipu are memoranda or registers made of cords, in which various knots, and various colors, signify various things) (Acosta, 1608, page 140).

7.5 Details of the Model

7.5.1 Data Generation

The data used in the model are synthetic.5 I selected 1000 English nouns as “concepts” from the British National Corpus embeddings (see below), and generated a lexicon associated with each of these concepts. Thus one concept might be @HORSE, and a grammatical model generates a synthetic word as the phonetic form of that concept. Following previous work reported in Sproat (2017), these phonetic forms are generated by a phonological grammar, and in the experiments reported here they are of three types:

• monosyllabic, meaning that all morphemes are maximally CVC syllables: e.g. /kip/, /pa/, /ek/.
• disyllabic, meaning that all morphemes consist maximally of two CVC syllables: thus /kip.pa/ could be a morpheme, or /pa.ek/, in addition to the morphemes possibly being monosyllabic. About 49% of the morphemes generated are disyllabic.
• sesquisyllabic, or one and a half syllables, meaning that the morphemes are maximally of the form CV.CVC, where in addition the vowel of the “half syllable” CV is always /a/. Thus: /ka.pak/ or /pa.ki/, in addition to the morphemes possibly being monosyllabic.

Following my previous work, these types are included to test the theory that the phonological principle underlying writing systems is easier to discover in languages that have short morphemes than in languages with longer morphemes. Sesquisyllables are included because of theories that Ancient Chinese may have had a sesquisyllabic morpheme structure (Baxter, 1992; Baxter & Sagart, 2014). Note that the proportion of sesquisyllables among the lexical items is rather small, about 2.5%, but this is actually not very different from the proportion in Baxter and Sagart’s (2014) reconstruction of Ancient Chinese. Of the 4968 reconstructed forms found in their wordlist,6 140 (2.8%) match the pattern /ə./ (mostly /Cə./), which is how sesquisyllables are indicated in their list.

5 All data and scripts for generating data, as well as the TensorFlow code for the model and the scripts used to run the model, will be distributed on GitHub at https://www.github.com/rwsproat/symbols.

6 See http://ocbaxtersagart.lsait.lsa.umich.edu. But note that their reconstruction is for a stage of Chinese that is still roughly 1000 years later than the earliest Shang Dynasty written records. It is therefore possible that, if Shang Chinese had sesquisyllables, there were more of them than in its later descendant.
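Here is a minimal sketch of a generator for the three morpheme types. It assumes a small, hypothetical consonant and vowel inventory and hypothetical probabilities for optional onsets and codas; the actual grammar is in the distributed code and is not reproduced here. The parameter `p_long` (also an assumption) controls how often a second syllable or a half syllable is added. A value of 0.5 roughly echoes the reported 49% of disyllables, and a small value such as 0.025 echoes the roughly 2.5% of sesquisyllables.

```python
import random

# Hypothetical segment inventories; the real grammar is in the distributed code.
CONSONANTS = "pbtdkgmnsfxrl"
VOWELS = "aeiou"


def syllable(rng: random.Random) -> str:
    """A syllable of shape (C)V(C); onset and coda are each optional."""
    onset = rng.choice(CONSONANTS) if rng.random() < 0.8 else ""
    coda = rng.choice(CONSONANTS) if rng.random() < 0.5 else ""
    return onset + rng.choice(VOWELS) + coda


def morpheme(kind: str, rng: random.Random, p_long: float = 0.5) -> str:
    """Generate one morpheme; '.' marks a syllable boundary."""
    if kind == "mono":  # maximally CVC
        return syllable(rng)
    if kind == "di":  # maximally CVC.CVC, sometimes only one syllable
        first = syllable(rng)
        return first + "." + syllable(rng) if rng.random() < p_long else first
    if kind == "sesqui":  # maximally CV.CVC, with the half-syllable vowel fixed to /a/
        if rng.random() < p_long:
            return rng.choice(CONSONANTS) + "a." + syllable(rng)
        return syllable(rng)
    raise ValueError(f"unknown morpheme type: {kind}")


rng = random.Random(0)
print([morpheme("sesqui", rng, p_long=0.025) for _ in range(8)])
```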

Since the 1000 lexical entries are intended to be concepts rather than actual morphemes, we allow for more than one phonetic form, in our simulations up to two. Thus in a monosyllabic “language”, @HORSE might be associated with two phonetic forms, say /mak/ and /so/. Of the 1000 concepts, a fixed set of 100 is associated with graphical symbols. (In the distributed code, this set is defined in concepts.py.) The task of the model is then to extend these 100 symbols to represent as many of the other 900 concepts as possible. A set of 100 basic signs is certainly well on the low side of the number of signs that were in use in preliterate Mesopotamian accounting: probably on the order of several hundred (Damerow et al., 1988), or somewhat under 900 by the later Uruk III period (Englund, 1998, page 68). For graphical symbols we used concept-appropriate emoji. In addition to the basic concepts, the language also has a decimal number system, generated according to the same constraints as the language as a whole: thus a monosyllabic language would have monosyllabic number words, which would be combined compositionally into number names (e.g. 24 would be the equivalent of “two ten four”). Graphically, numbers are represented using Roman numerals. Finally, a set of “accounting texts” is generated of the form NUMBER COMMODITY. Thus a legal text might be

4 @HORSE

with concomitant graphical symbols IV 🐎 and phonological forms such as sem mak, depending on the type of language generated. The basic task for the model was first to learn to generate the right symbol sequence given either just the number-plus-concept combination, or this plus the phonological forms; see the next section for further details. In the simulations, 5000 texts were generated as training data, with 500 for testing using the original concepts and symbols, and then 500 “novel” cases consisting of new concepts from the set of 900 not originally associated with graphical symbols. In the texts for the novel cases, the position of the graphical symbol was occupied by a “mask” emoji (😷), and the task for the model was to predict a symbol for that concept. Since the texts are generated randomly, the number of distinct concepts represented in the texts will generally be much less than the original 1000; the number is usually around 400 in our simulations.
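The number names, Roman numerals and NUMBER COMMODITY texts described above can be sketched as follows. English glosses stand in for the synthetic number words so that 24 comes out as “two ten four”; in a simulated language these would be generated forms like sem. The helper names, and the choice to treat each number word as a separate input token, are assumptions for illustration.

```python
from typing import Dict, List, Tuple

ROMAN = [(100, "C"), (90, "XC"), (50, "L"), (40, "XL"), (10, "X"),
         (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
MASK = "😷"  # placeholder glyph for concepts with no symbol yet


def to_roman(n: int) -> str:
    """Standard subtractive Roman numerals for 1 <= n <= 399."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def number_name(n: int, words: Dict[int, str]) -> List[str]:
    """Compositional decimal name for 1 <= n <= 99, e.g. 24 -> two ten four."""
    tens, units = divmod(n, 10)
    parts = []
    if tens > 1:
        parts.append(words[tens])
    if tens:
        parts.append(words[10])
    if units:
        parts.append(words[units])
    return parts


def make_text(n: int, concept: str, glyphs: Dict[str, str], phon: Dict[str, str],
              words: Dict[int, str]) -> Tuple[List[str], List[str]]:
    """One NUMBER COMMODITY text as (input tokens, output tokens)."""
    source = list(str(n)) + [concept] + number_name(n, words) + [phon[concept]]
    target = list(to_roman(n)) + [glyphs.get(concept, MASK)]
    return source, target


WORDS = dict(zip(range(1, 11), "one two three four five six seven eight nine ten".split()))
print(make_text(6, "@PIG", {"@PIG": "🐖"}, {"@PIG": "dim"}, WORDS))
# (['6', '@PIG', 'six', 'dim'], ['V', 'I', '🐖'])
```

A concept missing from `glyphs` gets the mask emoji in its output slot, which corresponds to the “novel” test texts.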


7.5.2 Model

I used a standard sequence-to-sequence encoder-decoder model with a Bahdanau attention mechanism (Mnih et al., 2014; Bahdanau et al., 2015). In such a model, an input sequence, such as the words in a text, is read into the encoder and embedded as a sequence of continuous vectors $\mathbf{x} = (x_1, x_2, \ldots, x_T)$. The decoder is tasked with predicting the next symbol of the output, given the inputs and the previous outputs. This amounts to finding the probability $p(y_i \mid y_1, \ldots, y_{i-1}, \mathbf{x})$ of predicting output $y_i$ given the history $y_1, \ldots, y_{i-1}$ and the input vectors $\mathbf{x}$. In the model of Bahdanau et al. (2015), this is implemented as a nonlinear function $g$, which takes the previous output along with a context vector $c_i$ and a hidden state $s_i$:

$$p(y_i \mid y_1, \ldots, y_{i-1}, \mathbf{x}) = g(y_{i-1}, s_i, c_i) \tag{7.1}$$

where the hidden state $s_i = f(s_{i-1}, y_{i-1}, c_i)$ depends on a nonlinear function $f$, and $c_i$ is in turn defined in terms of a sequence of annotations $H = (h_1, h_2, \ldots, h_T)$:

$$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j \tag{7.2}$$

with each annotation weighted with a probability $\alpha_{ij}$ defined as

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})} \tag{7.3}$$

with energy $e_{ij} = a(s_{i-1}, h_j)$ given by an alignment model $a$. The alignment model scores the match between the inputs around position $j$ and the output at position $i$. Finally, each annotation is derived from the concatenation of the forward and backward hidden states, $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$. This allows the system to encode information from the preceding and following inputs at each position. A schematic of the model is shown in Fig. 7.8, reproduced from Bahdanau et al. (2015), Figure 1, page 3. Per Bahdanau et al. (2015) (page 4):

the probability $\alpha_{ij}$, or its associated energy $e_{ij}$, reflects the importance of the annotation $h_j$ with respect to the previous hidden state $s_{i-1}$ in deciding the next state $s_i$ and generating $y_i$. Intuitively, this implements a mechanism of attention in the decoder.
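To make Eqs. (7.2) and (7.3) concrete, here is a pure-Python sketch of a single attention step. It uses the additive alignment function of Bahdanau et al. (2015), $a(s_{i-1}, h_j) = v^\top \tanh(W s_{i-1} + U h_j)$, with small hand-picked toy parameters. It omits the functions $f$ and $g$ and all training, so it illustrates only the weighting mechanism, not the TensorFlow model used in the experiments.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(e: Vector) -> Vector:
    """Eq. (7.3); subtracting the max avoids overflow without changing the result."""
    top = max(e)
    exps = [math.exp(val - top) for val in e]
    z = sum(exps)
    return [val / z for val in exps]


def attend(s_prev: Vector, H: Matrix, W: Matrix, U: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """One decoder step: energies e_ij, weights alpha_ij, and context vector c_i (Eq. 7.2)."""
    ws = matvec(W, s_prev)
    energies = []
    for h in H:
        uh = matvec(U, h)
        # a(s_{i-1}, h_j) = v^T tanh(W s_{i-1} + U h_j)
        energies.append(sum(vk * math.tanh(p + q) for vk, p, q in zip(v, ws, uh)))
    alpha = softmax(energies)
    c = [sum(alpha[j] * H[j][k] for j in range(len(H))) for k in range(len(H[0]))]
    return alpha, c


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy annotations h_j
W = [[0.5, 0.0], [0.0, 0.5]]
U = [[1.0, 0.0], [0.0, 1.0]]
alpha, c = attend([0.1, 0.2], H, W, U, v=[1.0, 1.0])
print([round(a, 3) for a in alpha], [round(x, 3) for x in c])
```

The weights always sum to one, so $c_i$ is a weighted average of the annotations. If the alignment scores are identical for every position, attention is spread uniformly.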

In effect, attention reflects the importance each portion of the input has for predicting each output symbol. The model used here is essentially the same as the implementation used in Sproat and Gutkin’s (2021) computational study of logography in writing systems, and the reader is referred to Section 4 of that paper for further technical details of the model. In our case the input sequences consist of the numbers, the commodities, and the words (phonological forms) corresponding to each. The input tokens for numbers are the individual digit characters, whereas for commodities and phonological forms, the full commodity name and the phonological form are each individual tokens. The output sequences are the glyphs associated with the numbers and commodities. For example, one input/output pair might be as follows, where sat is the word for “6” and dim the word for “pig”:

Input:   6    @PIG   sat   dim
Output:  VI   🐖

Fig. 7.8 The Bahdanau attention model from Bahdanau et al. (2015), Figure 1, page 3: $x_1, x_2, \ldots, x_T$ are the inputs, the $h$ are annotations, and $s$ the hidden states. Output $y_t$ is predicted from output $y_{t-1}$, the previous state $s_{t-1}$ and the sum over the weighted inputs from the annotations. Source: Bahdanau et al. (2015), Figure 1, page 3. Paper is published on arXiv (https://arxiv.org/pdf/1409.0473.pdf) with no copyright
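The token scheme in the example above can be sketched as follows, together with the output vocabulary of integers 1 to V and the 300 × k input described in the next paragraph. The lookup function is a placeholder (an assumption) standing in for the trainable digit embeddings and the fixed BNC and phonetic embeddings. The 300 × k input is represented here as k rows of 300 values, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List

DIM = 300  # embedding size used for every input token in the text


def input_tokens(number: int, concept: str, spoken: List[str]) -> List[str]:
    """Digits become separate tokens; the concept and each spoken form stay whole."""
    return list(str(number)) + [concept] + spoken


def build_vocab(outputs: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct output token an integer from 1 to V, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in outputs:
        for tok in seq:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def stack_embeddings(tokens: List[str],
                     lookup: Callable[[str], List[float]]) -> List[List[float]]:
    """k rows of DIM values: the 300 x k input for a k-token sequence."""
    rows = [lookup(tok) for tok in tokens]
    # Stacking or concatenating only works if every piece has the same size.
    if any(len(r) != DIM for r in rows):
        raise ValueError("every embedding must have length DIM")
    return rows


src = input_tokens(6, "@PIG", ["sat", "dim"])
vocab = build_vocab([["V", "I", "🐖"], ["X", "X", "I", "V", "🐎"]])
encoded = [vocab[t] for t in ["V", "I", "🐖"]]
x = stack_embeddings(src, lambda tok: [0.0] * DIM)  # placeholder lookup
print(src, encoded, len(x), len(x[0]))  # ['6', '@PIG', 'sat', 'dim'] [1, 2, 3] 4 300
```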

Each of the inputs is embedded in a 300-long vector, and these are then concatenated to form the input, a 300 × k dimensional tensor, where k is the length of the input from the above table. So in the above example “6” is embedded as a 300-long vector, as are each of @PIG, sat and dim. We represent the output tokens, including the individual components of the Roman numeral such as “V” or “I”, and the commodity symbols like 🐖, as unique integers from 1 to V, where V is the size of the output vocabulary. An output sequence is then just a sequence of vectors of length V, and the task of the model is to learn to predict the appropriate vocabulary integer for each position in the output. Returning to the inputs, the 300-long embedding for each digit in the numerical representation is trainable. On the other hand, for concepts like @PIG, we use the embeddings of the equivalent English words from the British National Corpus (BNC) embeddings7 (Fares et al., 2017), and these are not trainable in the model. The intuition is that these embeddings are proxies for semantic similarity between concepts, so that in the embedding space, @PIG will be more similar to @BOAR than it is to @QUEEN. Note that the BNC embeddings are 300-long vectors, which is why I chose this vector size for all the embeddings (concatenation requires identical dimensionality of the concatenated elements).

The BNC embeddings provide a way for the model to exploit semantic similarity between concepts. For the phonological input, we need an equivalent way of representing phonetic similarity: a similar embedding representation in which dim is deemed similar to tim or din, but less similar to sat. Word embeddings work because words that have similar meanings often tend to occur in similar contexts (again, note Firth’s (1957) maxim that “you shall know a word by the company it keeps”). But a similar approach is not going to work for phonetic similarity, at least outside the context of poetry: phonetically similar syllables are not particularly prone to occur in similar environments. We therefore need a different approach. Phonetic similarity between any two strings can be defined in terms of articulatory and acoustic phonetic features. For example, if one is looking at (maximally) CVC syllables and comparing the syllable dim with other syllables, syllables that begin with a consonant in the same place of articulation, like /t/, should be more similar than those that begin with a consonant with a different place of articulation (e.g. /k/) or syllables that have no onset (i.e. initial) consonant. Similarly, one can compare the vowels in the two syllables (matching vowels should count as more similar than non-matching vowels) and the final consonants. One can then define a similarity-based weighting scheme so that a pair of more similar syllables will have a lower cost than a pair of less similar syllables.

7 http://vectors.nlpl.eu/explore/embeddings/en/models/.
The exact weighting scheme used is defined in the distributed code in phon_dist.py, but the salient details are as follows. First, a distinction is made between an exact rhyme (i.e., the vowel and the following final consonant, if any; e.g., syllables ending in /et/ rhyme exactly) and a “close rhyme”, where sharing the place and manner of articulation of the final consonant suffices: thus /et/ and /ed/ would be close rhymes, but /et/ and /en/ would not. In each case, the vowel must match. Similarly, for the beginnings of syllables, a distinction is made between exact alliteration (syllables beginning with /t/ alliterate) and close alliteration, where /d/ would be a close match to /t/, but /n/ would not. All exact matches are also close matches. When comparing two maximally CVC syllables, empty onsets and codas are first filled with an “empty” consonant; then the exact alliterations, close alliterations, exact rhymes and close rhymes are summed, with a higher weight given to matches (and thus mismatches) in rhymes than in onsets. This weighting scheme is an attempt to approximate the notions of phonetic similarity that seemed to guide ancient scribes’ use of symbols for their phonetic values. For example, for Chinese, Baxter (1992, page 348) notes that:

In order to be written with the same phonetic element, words must normally have identical main vowels and codas, and their initial consonants must have the same position of articulation.
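Here is a minimal sketch of a cost in the spirit of the scheme just described. The (place, manner) feature table and the weights (1 per onset mismatch type, 2 per rhyme mismatch type) are assumptions, chosen only to respect the stated constraints: /t/ and /d/ are close, /t/ and /n/ are not, and rhymes weigh more than onsets. The actual values are defined in phon_dist.py and are not reproduced here.

```python
from typing import Tuple

VOWELS = set("aeiou")
# Hypothetical (place, manner) features; "_" is the empty consonant.
FEATURES = {
    "p": ("labial", "stop"), "b": ("labial", "stop"),
    "f": ("labial", "fricative"), "m": ("labial", "nasal"),
    "w": ("labial", "glide"),
    "t": ("alveolar", "stop"), "d": ("alveolar", "stop"),
    "s": ("alveolar", "fricative"), "n": ("alveolar", "nasal"),
    "l": ("alveolar", "lateral"), "r": ("alveolar", "rhotic"),
    "y": ("palatal", "glide"),
    "k": ("velar", "stop"), "g": ("velar", "stop"),
    "x": ("velar", "fricative"),
    "_": ("none", "none"),
}
# Hypothetical weights: rhyme mismatches cost more than onset mismatches.
ONSET_EXACT, ONSET_CLOSE = 1.0, 1.0
RHYME_EXACT, RHYME_CLOSE = 2.0, 2.0


def split(syl: str) -> Tuple[str, str, str]:
    """Split a (C)V(C) syllable into onset, vowel and coda, filling empty slots with '_'."""
    i = next(k for k, ch in enumerate(syl) if ch in VOWELS)
    return syl[:i] or "_", syl[i], syl[i + 1:] or "_"


def close(c1: str, c2: str) -> bool:
    """Same place and manner, so /t/~/d/ but not /t/~/n/; identical consonants are close too."""
    return FEATURES[c1] == FEATURES[c2]


def cost(s1: str, s2: str) -> float:
    """Lower cost means more similar; each failed match type adds its weight."""
    o1, v1, k1 = split(s1)
    o2, v2, k2 = split(s2)
    total = 0.0
    if o1 != o2:
        total += ONSET_EXACT
    if not close(o1, o2):
        total += ONSET_CLOSE
    if (v1, k1) != (v2, k2):
        total += RHYME_EXACT
    if not (v1 == v2 and close(k1, k2)):
        total += RHYME_CLOSE
    return total


for pair in [("dim", "tim"), ("dim", "din"), ("dim", "sat"), ("et", "ed"), ("et", "en")]:
    print(pair, cost(*pair))  # costs in order: 1.0, 4.0, 6.0, 2.0, 4.0
```

Under these assumed weights, dim comes out closest to tim, next to din, and farthest from sat. The close rhyme /et/ and /ed/ also scores below the non-close /et/ and /en/.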

Given this weighting or cost, we can then define a distance d between any two syllables σ1 and σ2 as simply d(σ1 , σ2 ) ≡ cost(σ1 , σ2 ). To turn these into an embedding-like vector space, we first compute the frequencies of each syllable in the population of randomly generated syllables, and choose the set K of 300 most frequent syllables. Then for each σ ′ in the entire set of syllables

7.5 Details of the Model Table 7.2 The ten most similar and ten least similar syllable pairs, for one population of syllables, according to the “embedding” measure described in the text

167 Most similar bal ↔ fal bar ↔ far bay ↔ fay bel ↔ fel ber ↔ fer bey ↔ fey bop ↔ fop buy ↔ fuy fet ↔ bet fok ↔ bok

Least similar ol ↔ sur ur ↔ fol dur ↔ kol ur ↔ pol ol ↔ dur ur ↔ xol ur ↔ dol ur ↔ sol ur ↔ kol ur ↔ tol

S, we construct a 300-long vector E such that Ek = d(σ ′ , σk ), where σk is the kth syllable in K. The resulting vectors are then normalized so that the values fall in the range [−1, 1]. This results in a set of |S| vectors, where |S| is the size of the syllable set, and where the distance between any two vectors is reflective of the similarity between the corresponding syllables. Table 7.2, shows an example of the ten most similar syllable pairs and the ten least syllable pairs for one of the syllable populations used in the experiments. These results seem plausible. This approach can be extended to disyllabic cases by dividing the 300-long vector E into two parts, allocating the first 150 positions to d(σ ′1 , σk ), where σ ′1 is the first syllable in the morpheme, and the remaining 150 positions d(σ ′2 , σk ) where σ ′2 is the second syllable in the morpheme. If there is no second syllable, the first syllable is copied to the second syllable position. This allows a single syllable to be closer to a disyllable or sesquisyllable if it shares an onset with the first syllable of the disyllabic form and a coda with the second syllable of the disyllabic form. See Table 7.3, which shows that this also yields plausible-looking similarities. The model can be trained in various conditions, by turning off one or another of the inputs. The ones we discuss in the text are two: with only semantic input—6 and @PIG in the previous example, and with both semantic input and phonological input. Table 7.3 The ten most similar and ten least similar disyllable pairs, for one population of maximally disyllabic morphemes, according to the “embedding” measure described in the text. Syllable boundaries are denoted by “.”. Note that the first pair mun.mun ↔ mun is a direct result of the copying operation described in the text

Most similar: mun.mun ↔ mun, fek.tor ↔ fek.dor, suk.du ↔ suk.tu, fey.kuy ↔ fey.xuy, gum.ko ↔ gum.xo, kom.xin ↔ xom.xin, kor.kok ↔ xor.kok, rup.xek ↔ rup.kek, rur.xiw ↔ rur.kiw, tal.kum ↔ tal.xum

Least similar: ok.bem ↔ yak.ok, iy.gem ↔ gun.fiy, iy.gem ↔ yok.un, ok.sok ↔ fak.iy, iy.gem ↔ xem.dak, sak.iy ↔ iy.xok, iy.gem ↔ gem.yum, iy.gem ↔ yak.ok, iy.gem ↔ fak.iy, gok.ak ↔ iy.xok


The former case simulates the situation where the scribe merely has to learn that the 🐖 glyph represents the concept @PIG; the latter simulates the situation where the scribe is also presumed to hear the spoken input sat dim, the task being to write down “VI🐖”. In training, in the second scenario all embeddings are presented as input to the model, whereas in the first scenario the semantic embeddings are presented and the phonetic embeddings are replaced with zeroed tensors of the same dimensionality as the original phonetic embeddings: this is equivalent to presenting the model with no phonetic information. In the experiments reported in the main chapter, the models were trained for 400 epochs, where in each epoch the model sees the entire input dataset.
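The embedding construction and the two training conditions described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The passage does not define the syllable distance d, so a plain Levenshtein distance on spellings stands in for it. The normalization to [−1, 1] is assumed to be per-vector min-max scaling. The 150 positions in each half are assumed to correspond to 150 reference syllables. Semantic and phonetic embeddings are assumed to be concatenated into a single input vector. The function names (`syllable_distance`, `rescale`, `embed`, `model_input`) are hypothetical.

```python
import math


def syllable_distance(a: str, b: str) -> int:
    """Hypothetical stand-in for d(., .): Levenshtein distance on spellings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def rescale(values):
    """Min-max rescale into [-1, 1] (the exact normalization is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def embed(morpheme, reference):
    """Two-part embedding for morphemes of at most two '.'-separated syllables.

    First half: d(first syllable, s) for every reference syllable s.
    Second half: d(second syllable, s). A monosyllable is copied into the
    second slot. With 150 reference syllables this yields a 300-long vector."""
    syllables = morpheme.split(".")
    first = syllables[0]
    second = syllables[1] if len(syllables) > 1 else first
    raw = [syllable_distance(first, s) for s in reference]
    raw += [syllable_distance(second, s) for s in reference]
    return rescale(raw)


def model_input(semantic, phonetic, with_phonology):
    """Concatenate semantic and phonetic embeddings (concatenation is assumed).

    In the semantic-only condition the phonetic part is replaced by zeros of
    the same length, so the input has the same size in both conditions."""
    phon = list(phonetic) if with_phonology else [0.0] * len(phonetic)
    return list(semantic) + phon


# Toy reference set standing in for the 150 reference syllables per half.
reference = ["bal", "fal", "ur", "ol", "mun", "kol", "tor", "dor"]
print(math.dist(embed("mun.mun", reference), embed("mun", reference)))  # 0.0
print(len(model_input([0.5, -0.5], embed("fek.tor", reference), False)))  # 18
```

The first print shows why mun.mun ↔ mun tops Table 7.3: after copying, the two morphemes receive identical vectors. Zero-filling, rather than dropping the phonetic half, keeps the network's input shape fixed across the two training conditions.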

7.6 Semantic-Phonetic Compounds from Experiments

The following are semantic-phonetic compounds ‘discovered’ by the model in the simulations described in Sect. 7.2.2, for each of the sets of ten monosyllabic, sesquisyllabic, and disyllabic languages. Within each language, the semantic-phonetic compounds are listed by the round of the evolution simulation in which they arose. Thus monosyllable data 00 round01 lists the semantic-phonetic compounds added to the written lexicon for monosyllabic language 00 in round 01, that is, the first round after training on the original texts associated with the language. In each case, as in Table 7.1, the first column is the semantic value of the target word, the second its pronunciation, the third the semantic-phonetic glyph pair, and the final column the semantic and phonetic information for the semantic and phonetic glyphs, respectively.
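To make the four-column layout concrete, the following Python sketch parses a single row into its parts. It assumes the columns are whitespace-separated and that each glyph is a single Unicode code point. Both hold for the rows used here but would not hold for every emoji. `Compound`, `parse_entry` and `edit_distance` are hypothetical names. Levenshtein distance serves only as a rough proxy for how close the phonetic glyph's reading is to the target pronunciation; it is not a measure used in the text.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    target: str           # meaning of the new word, e.g. "@CARP"
    pronunciation: str    # spoken form of the new word, e.g. "kuy"
    semantic_glyph: str   # glyph contributing the meaning
    phonetic_glyph: str   # glyph contributing the sound
    semantic_source: str  # meaning of the semantic glyph, e.g. "@FISH"
    phonetic_source: str  # pronunciation of the phonetic glyph, e.g. "xuy"


def parse_entry(row: str) -> Compound:
    """Parse a row such as '@CARP kuy 🐟🦄 @FISH/xuy' (four columns)."""
    target, pronunciation, glyphs, sources = row.split()
    if len(glyphs) != 2:  # assumes one code point per glyph
        raise ValueError(f"expected exactly two glyphs, got {glyphs!r}")
    semantic_source, phonetic_source = sources.split("/")
    return Compound(target, pronunciation, glyphs[0], glyphs[1],
                    semantic_source, phonetic_source)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here only as a rough sound-similarity proxy."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


for row in ["@CARP kuy 🐟🦄 @FISH/xuy", "@PONY far 🐎🦊 @HORSE/far"]:
    c = parse_entry(row)
    print(c.target, c.semantic_source, c.phonetic_source,
          edit_distance(c.pronunciation, c.phonetic_source))
# @CARP @FISH xuy 1
# @PONY @HORSE far 0
```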

7.6.1 Monosyllabic Cases

monosyllable data 00 round01
@GATEHOUSE gan 🏰🐈 @CASTLE/gan
@CARP kuy 🐟🦄 @FISH/xuy
@INCH fet 🦶🐇 @FOOT/pet
@PONY far 🐎🦊 @HORSE/far
@MOTEL yiw 🏨🌰 @HOTEL/tiw
@THIGH op 🦵🦔 @LEG/op
@PARSLEY kol 🧅🐍 @ONION/xol

monosyllable data 00 round02
@WEDDING kuk 🏨🐎 @BRIDESMAID/xuk
@QUEEN te 🤴🧈 @PRINCE/re
@WOOD an 🦑👶 @SPRUCE/wan

monosyllable data 01 round01
@FLOORING xol 🧱🦞 @BRICK/kol
@ROOF fik 🧱🐇 @BRICK/wik
@FORTRESS i 🏰🌰 @CASTLE/i
@PUPPY an 🐕🐈 @DOG/an
@SNOW dam 🧊🦶 @ICE/sam

monosyllable data 02 round01
@GULL at 🐦🧈 @BIRD/at
@NEST up 🐦🦞 @BIRD/up
@WALL bir 🧱🦵 @BRICK/bir
@CHIMNEY len 🧱👻 @BRICK/len
@CHEESE fet 🧈🐈 @BUTTER/pet
@PUPPY up 🐕🦞 @DOG/up
@ANKLE not 🦵🦶 @LEG/not
@THIGH lim 🦵🐗 @LEG/rim
@THRONE ot 🤴🐪 @PRINCE/ot
@KING rar 🤴🥕 @PRINCE/rar
@FERRET poy 🐇🦀 @RABBIT/koy
@IVY im 🌷🥓 @TULIP/im

monosyllable data 02 round02
@COURT rat 🧝🧈 @JUDGE/at
@CEILING ge 🧱🐕 @ROOF/xe

monosyllable data 03 round01
@STARLING xoy 🐦🍑 @BIRD/xoy
@LOAF uy 🍞🦊 @BREAD/uy
@PORRIDGE ger 🍞🌻 @BREAD/yer
@KILN fiy 🧱🐏 @BRICK/biy
@ROOF en 🧱🧊 @BRICK/en
@CARP uw 🐟🏠 @FISH/uw
@COTTAGE uy 🏠🦊 @HOUSE/uy
@LEEK foy 🧅🦅 @ONION/boy
@QUEEN lek 🤴🧑 @PRINCE/wek
@RASPBERRY ir 🍓🧑 @STRAWBERRY/ir
@SUNSET ip 🌅🦶 @SUNRISE/ip
@PETAL ar 🌷🐊 @TULIP/ar

monosyllable data 03 round05
@LETTUCE for 🥒🧅 @CUCUMBER/wor
@WAIST maw 🦊🤴 @DRESS/waw


monosyllable data 04 round01
@ROOF ot 🧱🦴 @BRICK/ot
@CHIMNEY pep 🧱🧊 @BRICK/pep
@GATEHOUSE nuk 🏰🦔 @CASTLE/buk
@FORTRESS sum 🏰🦪 @CASTLE/dum
@ANKLE rok 🦵🧒 @LEG/lok
@RASPBERRY bum 🍓🐟 @STRAWBERRY/fum
@IVY nar 🌷🐢 @TULIP/lar

monosyllable data 04 round02
@SOUTH yep 🥚🥚 @NORTH/kep
@BLIZZARD xul 🧊🦷 @SNOW/xur

monosyllable data 05 round01
@VERTEBRA nul 🦴🦞 @BONE/nul
@BROTHER al 👦🧈 @BOY/al
@CATFISH xon 🐟🍌 @FISH/kon
@CARP lam 🐟🦔 @FISH/lam
@HOOF nut 🐎🦭 @HORSE/fut
@HIP gut 🦵🦭 @LEG/fut
@FLOWER dar 🌷🦞 @TULIP/tar

monosyllable data 05 round02
@FOOTHILL sew 🐏🐕 @HILL/dew

monosyllable data 05 round05
@FOETUS yek 🌻🐎 @WOMB/ek

monosyllable data 06 round01
@STARLING goy 🐦🌷 @BIRD/goy
@CURD lek 🧈🦭 @BUTTER/rek
@CHEESE yew 🧈🐂 @BUTTER/yew
@DAIRY aw 🐄👩 @COW/aw
@DRAKE saw 🦆🦂 @DUCK/daw
@INCH lek 🦶🦭 @FOOT/rek
@CART aw 🐎👩 @HORSE/aw
@GUEST ir 🏨🍄 @HOTEL/ir
@ARM et 🦵🐪 @LEG/et
@QUEEN xu 🤴🧂 @PRINCE/gu
@PUPIL lek 🏫🦭 @SCHOOL/rek
@GERANIUM rol 🌷🦄 @TULIP/rol

monosyllable data 06 round02
@CHIMNEY der 🧱🐦 @BRICK/mer
@COUSIN row 🦀🐇 @SISTER/mow


monosyllable data 07 round01
@ROOF rip 🧱👩 @BRICK/rip
@FARMER xat 🐑🐍 @EWE/gat
@PASTURE up 🐑👄 @EWE/up
@KNEE il 🦵🦞 @LEG/il
@ANKLE ok 🦵🦆 @LEG/ok
@LIP ek 👄👻 @MOUTH/ek
@KING kaw 🤴👦 @PRINCE/maw
@PUPIL mut 🏫🦌 @SCHOOL/dut

monosyllable data 07 round03
@EAST bor 🐎🦎 @SOUTH/sor

monosyllable data 07 round04
@VASE gay 🌷🦶 @FLOWER/ray

monosyllable data 08 round01
@LOAF pon 🍞🐖 @BREAD/pon
@STONE al 🧱🧂 @BRICK/al
@ROOF dam 🧱🦑 @BRICK/yam
@FORT ser 🏰🐗 @CASTLE/der
@CARP iw 🐟🐊 @FISH/iw
@SEAFRONT ot 🏨🌋 @HOTEL/ot
@THIGH am 🦵🍞 @LEG/am
@HIP gap 🦵👨 @LEG/dap
@QUEEN to 🤴🦭 @PRINCE/xo
@TEACHER om 🏫🏫 @SCHOOL/om
@IVY fam 🌷🐀 @TULIP/bam
@GERANIUM dom 🌷🧠 @TULIP/lom
@DAFFODIL nu 🌷🧒 @TULIP/mu

monosyllable data 08 round03
@THROAT pum 👄🦅 @MOUTH/tum


monosyllable data 09 round01
@SKELETON fiy 🦴👃 @BONE/fiy
@DAD no 👦🤴 @BOY/no
@PORRIDGE mo 🍞🥒 @BREAD/mo
@ROOF am 🧱🐎 @BRICK/am
@PLASTER det 🧱🦉 @BRICK/det
@PUPPY sik 🐕🦫 @DOG/dik
@PASTURE em 🐑🧈 @EWE/mem
@LIVESTOCK tep 🐑🌋 @EWE/tep
@WATERFRONT ow 🏨🦞 @HOTEL/ow
@GUEST dil 🏨🦂 @HOTEL/til
@FERRET ap 🐇🐦 @RABBIT/ap

monosyllable data 09 round02
@SEAFOOD wey 🥒🍌 @CUCUMBER/xey
@MAGPIE yak 🐦🌻 @STARLING/fak
@STORY mak 👂🌻 @TALE/fak

7.6.2 Sesquisyllabic Cases

sesquisyllable data 00 round01
@FLOORING xok 🧱🦌 @BRICK/kok
@FORTRESS on 🏰👶 @CASTLE/on
@MARE guk 🐎🦉 @HORSE/guk
@MINK um 🦦👪 @OTTER/um
@KING kin 🤴🦆 @PRINCE/gin
@PUPIL biy 🏫🐊 @SCHOOL/wiy
@FLOWER ro 🌷🦈 @TULIP/lo

sesquisyllable data 01 round01
@CHIMNEY fal 🧱🏫 @BRICK/bal
@GATEHOUSE iy 🏰🦄 @CASTLE/iy
@FORT le 🏰👨 @CASTLE/re
@LETTUCE op 🥒🦁 @CUCUMBER/op
@YOLK er 🥚🥕 @EGG/er
@FARMER del 🐑🐊 @EWE/tel
@GUEST tey 🏨🐹 @HOTEL/dey
@COTTAGE dip 🏠👧 @HOUSE/tip
@KNEE ot 🦵🦞 @LEG/ot
@KESTREL bur 🦉🦉 @OWL/yur
@APPLE ew 🍐🍌 @PEAR/ew

sesquisyllable data 01 round02
@DEATH oy 🧠🦆 @MURDER/oy

sesquisyllable data 01 round04
@CALF nen 🐑🥄 @EWE/men
@SON gew 👨🍐 @WIFE/lew

sesquisyllable data 02 round01
@PORRIDGE xiw 🍞🍓 @BREAD/kiw
@LOAF ot 🍞🏰 @BREAD/ot
@MARE ay 🐎🧑 @HORSE/ay
@CART e 🐎🏫 @HORSE/e
@GELDING bey 🐎🐖 @HORSE/fey
@HOOF in 🐎🐍 @HORSE/in
@THIGH iw 🦵🐢 @LEG/iw
@KNEE fat 🦵🐍 @LEG/pat
@KING rik 🤴🐜 @PRINCE/lik
@RASPBERRY rer 🍓🦵 @STRAWBERRY/ler

sesquisyllable data 02 round02
@PIGEON kop 🐦🥚 @BIRD/gop
@DONKEY ben 🐎🍋 @PONY/sen

sesquisyllable data 02 round04
@MULE gin 🐎🍐 @HORSE/win


sesquisyllable data 03 round01
@TERMITE dow 🐜🦅 @ANT/dow
@ANKLE bap 🦵🐟 @LEG/fap
@KING op 🤴🧅 @PRINCE/op
@LIQUID fot 🧂🏠 @SALT/mot
@LILY not 🌷🍐 @TULIP/not
@VIOLET pol 🌷🏫 @TULIP/por

sesquisyllable data 03 round04
@CEILING gim 🧱🧱 @ROOF/gim

sesquisyllable data 04 round01
@PORRIDGE sol 🍞🧑 @BREAD/sol
@RUBBLE ron 🧱🐇 @BRICK/ron
@DAIRY a 🐄🦑 @COW/a
@YOLK lop 🥚🐦 @EGG/rop
@TOAD tam 🐸🐕 @FROG/tan
@MOTEL bin 🏨👨 @HOTEL/bin
@PUPIL ek 🏫🦫 @SCHOOL/ek
@SUNSET wuw 🌅👃 @SUNRISE/nuw
@DAFFODIL mel 🌷🌰 @TULIP/mel

sesquisyllable data 05 round01
@FARMER pu 🐑🐏 @EWE/mu
@COLT ye 🐎🐔 @HORSE/we
@GUEST bak 🏨🦢 @HOTEL/bak
@KING di 🤴🦦 @PRINCE/di
@CROP ew 🌱🏫 @SEEDLING/ew
@FLOWER suk 🌷🐟 @TULIP/suk
@IVY ur 🌷🐢 @TULIP/ur

sesquisyllable data 05 round02
@LEAF maw 🌷🐹 @TULIP/raw

sesquisyllable data 05 round03
@ALE mun 🥭🐄 @BREWERY/gun

sesquisyllable data 06 round01
@TERMITE ol 🐜🦋 @ANT/ol
@GULL gep 🐦🐍 @BIRD/gep
@STONE en 🧱🦔 @BRICK/en
@KILN bu 🧱🍈 @BRICK/yu
@GATEHOUSE uk 🏰🥄 @CASTLE/uk
@LETTUCE nur 🥒👶 @CUCUMBER/nur
@LIVESTOCK aw 🐑🐹 @EWE/aw
@PASTURE lut 🐑🐪 @EWE/rut
@CARP pan 🐟🐎 @FISH/fan
@CART mer 🐎⛲ @HORSE/mer
@PRIMROSE um 🌷🦂 @TULIP/um

sesquisyllable data 06 round02
@HIP an 🦵🦵 @THIGH/an

sesquisyllable data 06 round04
@SON we 🌅🍓 @FATHER/ne

sesquisyllable data 07 round02
@LETTUCE sut 🥒🏰 @CUCUMBER/yut
@FURNITURE pol 🧱👻 @FLOORING/pol
@FOLIAGE ne 🌷🏠 @PETAL/ne
@NECK pet 🦵🦵 @THIGH/get
@TOE om 🦵🐦 @THIGH/om

sesquisyllable data 07 round03
@SHRUB gut 🌷🦉 @IVY/put

sesquisyllable data 08 round01
@FORTRESS yur 🏰🦂 @CASTLE/fur
@MOAT ot 🏰👄 @CASTLE/ot
@LETTUCE may 🥒🐇 @CUCUMBER/may
@PONY ap 🐎🦶 @HORSE/ap
@MARE ruy 🐎🍋 @HORSE/puy
@MOTEL kek 🏨🧱 @HOTEL/kek
@SNOW uy 🧊🐜 @ICE/uy
@THIGH tip 🦵🐑 @LEG/sip
@LIQUID im 🧂🍑 @SALT/im
@IVY tol 🌷🦪 @TULIP/sol

sesquisyllable data 08 round02
@PRIMROSE sok 🌷🐈 @VIOLET/dok


sesquisyllable data 09 round02
@FLOORING kew 🧱🦋 @BRICK/kew
@DANDELION mum 🌷🦉 @DAFFODIL/mum
@RHODODENDRON yan 🌷🐐 @DAFFODIL/yan

sesquisyllable data 09 round03
@BEAK rek ⛲🦦 @GULL/xek

sesquisyllable data 09 round04
@PEASANT lit 🐑🐟 @FARMER/yit

7.6.3 Disyllabic Cases

disyllable data 00 round01
@KILN low 🧱👶 @BRICK/low
@GUEST tor 🏨🐌 @HOTEL/xor
@KNEE wak 🦵👧 @LEG/pak
@THIGH yoy 🦵🦄 @LEG/poy

disyllable data 00 round03
@MUTTON ya 🍞🍌 @BREAD/wa
@LAKE kot ⛲🐔 @WATERFALL/lot

disyllable data 01 round01
@BROTHER buw 👦🦵 @BOY/puw
@SEAFRONT yan 🏨👧 @HOTEL/nan

disyllable data 01 round02
@WAIST yon.lep 🦵🏨 @THIGH/bon.wep

disyllable data 02 round01
@GATEHOUSE fal 🏰👃 @CASTLE/bal
@LIP kon 👄🦎 @MOUTH/kon
@THRONE um 🤴🐎 @PRINCE/um
@FLOWER mul 🌷🦆 @TULIP/dul

disyllable data 02 round02
@ANTELOPE mot.der 🦌🌷 @DEER/bot.ner

disyllable data 03 round01
@FLOORING fuk 🧱🐄 @BRICK/puk
@PONY um 🐎🦟 @HORSE/um
@CART maw 🐎🏨 @HORSE/waw
@ARM bok 🦵🦪 @LEG/bok
@LIP ki 👄🐻 @MOUTH/ki
@SUNSET get 🌅👶 @SUNRISE/xet

disyllable data 03 round03
@DRESS wet 🏰🍞 @SCARF/bet

disyllable data 04 round01
@PUPPY bet 🐕🦶 @DOG/met
@TOAD ko 🐸🐺 @FROG/go

disyllable data 04 round02
@SUNSET xok 🌅🍞 @SUNRISE/kok

disyllable data 04 round04
@ALCOHOL kek 🧂🥓 @DRINK/lek

disyllable data 05 round01
@WALL fel 🧱👦 @BRICK/kel
@GUEST pup 🏨🤴 @HOTEL/mup

disyllable data 05 round02
@LIVESTOCK set 🐑🧠 @EWE/wet

disyllable data 06 round01
@PORRIDGE bir 🍞🤴 @BREAD/gir
@DAIRY puw 🐄🐜 @COW/fuw
@FERRET dat 🐇🐌 @RABBIT/wat

disyllable data 06 round02
@PORK nom 🥓🧱 @BACON/pom

disyllable data 06 round03
@FEMALE nup 🧒🌷 @MALE/mup
@DAUGHTER pet 👧🏰 @SISTER/pet

disyllable data 06 round04
@RUG raw 🧱👦 @CARPET/naw


disyllable data 07 round01
@TERMITE ban.pan 🐜👨 @ANT/fan
@CART xow 🐎🧠 @HORSE/gow
@SNOW ar 🧊🐕 @ICE/ar

disyllable data 08 round01
@ANKLE gin 🦵🐢 @LEG/kin
@GERANIUM iy 🌷🍄 @TULIP/iy
@FLOWER suk 🌷🐏 @TULIP/suk

disyllable data 08 round02
@FOAL fo 🐎🏰 @MARE/po
@KING let 🤴🐂 @THRONE/yet

disyllable data 08 round03
@SHOULDER gey 🦵👻 @LEG/yey

disyllable data 09 round01
@STARLING or 🐦👧 @BIRD/or
@RUBBLE il 🧱🐢 @BRICK/il
@CARP sal 🐟🤴 @FISH/dal
@INCH wok 🦶👂 @FOOT/lok
@SNOW nat 🧊🐇 @ICE/nat
@ELBOW so 🦵🐻 @LEG/do
@WRIST xap 🦵🍐 @LEG/xap
@KESTREL noy.kap 🦉🐎 @OWL/xok.lap

disyllable data 09 round03
@BULL rep 🐄🌷 @COW/rew

Chapter 8

Confusions and Misrepresentations

8.1 Introduction

Writing, associated originally with a few ancient civilizations, has come to be, in the minds of many people, associated with the very notion of civilization. The notion of a non-literate civilization seems almost an oxymoron: how could they have managed a complex society without the ability to write language? Since many civilizations throughout history apparently did not have writing (if nothing else, there was already a civilization in place in Mesopotamia before the first true writing was developed), this is a somewhat misplaced concern. Nonetheless, when a mysterious new symbol system is discovered associated with the remnants of an ancient civilization, often the first assumption is that this must have been writing (Kammerzell, 2009), since after all a civilization must have had writing, right? In some cases this is driven by some combination of professional pride and nationalism on the part of archaeologists. Discovering an ancient civilization is a great achievement, but discovering a literate ancient civilization, especially one with a previously unknown writing system, is an even greater achievement. And if the ancient civilization in question happens to occupy the same piece of land as you do, then a common result is that archaeologists, and of course the popular press, project their own national identity back millennia into the past and declare that early X (where X might be India, or Iran, or wherever) had a great civilization that predated others. The problem is that the samples of the ancient symbols are usually, at least initially, fairly small in number, and the “texts” are typically short. There is therefore usually not enough data to come up with a convincing decipherment, or at least one convincing to anyone other than the person proposing it: see, for example, my discussion of attempts to decipher the Phaistos Disk in Sproat (2010b).
And this is presuming one even has an idea of what language the people spoke, and how it was related (or not) to later languages that we already know something about. So one is often left with the supposition that a system was writing, but no way to demonstrate that it was.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_8


But maybe things are not so bad: perhaps one can tell, by looking at the texts or at the properties of the characters or their combinations, that something must have been writing, or at least that the null hypothesis is that it was writing. Indeed such claims have been made, and we will examine these issues in some detail in this chapter. To anticipate our conclusion, none of these proposals really passes muster. Without knowing what the symbols denoted, which presumes a decipherment, either as a written language or as something else, one cannot say much about the system. Part of the problem, as we will also see, is that much of the discussion on this topic has depended upon confusion about what “writing” means. As we noted in Chaps. 1 and 4, there are two main schools of thought on this issue, with “exclusivists” like myself reserving the term just for linguistic symbol systems and “inclusivists” using it for any conventional graphical symbol system. DeFrancis (1989) suggests (page 5) that in principle either position is defensible, since it seems to come down to a choice of terminology, but that it is crucial that one at least be consistent: In a way it doesn’t much matter which definition we adopt, so long as we do not confuse the two. This means in particular that we should not unthinkingly assume that there is a relationship between partial writing [= non-linguistic systems] and full writing [= true writing], or that the former represents an early stage of the latter.

Actually, though, the problem is really deeper than a purely terminological one. As we noted in the introduction to Chap. 4, the common-language notion of the term write conjures up for most people linguistic symbol systems. So if I claim that such and such an ancient symbol system was writing, then I can be reasonably sure that for the average person, this will conjure up an image of a system that is like the English writing system that you are now reading. So using this system, one assumes the ancient people in question could have written down, for example, a conversation that they just had, just as you or I could do in English. That is surely the image about “writing” that most readily comes to mind. But often it turns out that what is being claimed is more akin to the “inclusivist” notion of writing. Thus, anticipating somewhat, Lee et al. (2010a) published a claim that Pictish symbols (Sect. 3.6.5) were “written language”, and purported to demonstrate this with a statistical methodology. In their paper they were quite specific about what they meant by writing, suggesting (page 2554) that the classifier they developed “suggests that the Pictish symbols are lexigraphic in nature,” where by “lexigraphic” they mean “glottographic”—i.e. tied specifically to language. When I demonstrated (Sproat, 2010a) that using their method one could also ‘discover’ that a known non-linguistic system, namely Mesopotamian kudurrus (Sect. 3.6.3), was in fact writing, Lee et al. (2010b) accepted my conclusion and noted that by “writing” they had in mind a more general—that is “inclusivist” notion—such as that of Powell (2009), discussed in the introduction to Chap. 4, who considers anything to be writing that is “a system of markings with a conventional reference that communicates information.” Since Powell’s definition says nothing about the “system of markings” being tied to language, this represents a clear shift of position on what Lee and colleagues meant by the term “writing”. 
But if their point all along was to show that Pictish symbols were “writing” in this broader inclusivist sense, then their statistical methodology was surely overkill: it has long been known that the Pictish symbols were conventional and that they must, for the Picts, have conveyed some sort of information. By pursuing this form of bait-and-switch, however, Lee and colleagues could appear in their original paper to be making a bold, testable claim, backed up by a seemingly rigorous statistical analysis, yet back off to a weaker position later when challenged. Let us, however, start with a simple question: can one look at an artifact covered with symbols and conclude anything about whether it was written language or not? In other words: what does writing “look like”?

8.2 What Does Writing “Look Like”?

Can one tell just by looking at a text in an unknown symbol system whether it was writing or not? What does writing look like? In 2020 there was much fanfare in the popular science press about the supposed decipherment of Linear Elamite, an early writing system of Mesopotamia. See Cimarosti (2020) for an example. The decipherment was claimed by François Desset, a French archaeologist working in Iran, and included some examples, claimed to be from Jiroft, of what he called the “geometric script”, samples of which first appeared in the early 2000s and were described in Basello (2006) and Desset (2014). See Fig. 8.1 for an example. Many scholars of Elamite and Linear Elamite are skeptical both of Desset’s claimed decipherment and of the authenticity of the “geometric script” samples; on the latter see Lawler (2007) and Muscarella (2008), and see Muscarella (2005) for more general issues concerning the authenticity of artifacts in the “Jiroft corpus”.1

Fig. 8.1 A sample of the “Jiroft” “geometric script” according to Desset (2014), corresponding to his Plate 1, page 96. Source: Wikipedia. https://commons.wikimedia.org/wiki/File:Jiroft_culture_inscriptions.jpg. Author: Uuyyyy. License: CC BY-SA 3.0

Putting those issues aside, if one looks at an example like that in Fig. 8.1, can one tell, without knowing anything about the meanings of the symbols, that it is writing? The Jiroft materials were discussed on the now defunct Ancient Near East (ANE) discussion forum (formerly hosted by Yahoo!). In a January 2012 post, one scholar of writing systems claimed that the text “looks like writing”, and went on to note that “four small samples of this script are now known.” Of course, pace Desset’s claims about a decipherment, four small samples are not much to go on, and it is unlikely that one can come up with a convincing decipherment on the basis of such samples. Sample size aside, though, it would not even make sense to try to decipher a symbol system as writing if one had reason to suspect that it was some sort of non-linguistic symbol system. If one tried, for example, to decipher the kudurru symbols summarized in Sect. 3.6.3 as writing, one would be bound to fail. The question is whether something “looking like” writing should suffice to make one want to try to decipher it. But what does it mean to say that something “looks like” writing anyway? What particular features should a symbol system have to “look like” writing? To shed light on these questions, soon after the discussion on the ANE forum appeared, I conducted a small informal survey using Survey Monkey.2 In the survey, participants were asked to rate, on a scale of 1 to 4 (very unimportant to very important), the following factors in making something “look like” a writing system:

1. How important is the linear arrangement of symbols in assessing whether a symbol system is a writing system?
2. If the symbols are pictographic, how important is it that they be abstract rather than realistic?
3. How important is the length of the “texts” to a judgment of whether a symbol system is writing?
4. In order to judge something to “look like” writing, how important is it that symbols repeat?
5. How important is the number of distinct texts to a judgment that a symbol system is writing?

Respondents were also asked to give a bit of information about themselves, and in particular which writing systems they were familiar with and which ones they considered themselves an expert on. Thirty-eight people responded. Based on their self-reported levels of expertise, these were classified into three bins as follows:

1 Muscarella (2008) notes that “[t]he Internet mentions two inscriptions discovered at Konar Sandal in 2005, but no contexts are mentioned. In the same source, [Iranian archaeologist Yousef] Madjidzadeh is quoted as stating that they should be labeled Proto-Iranian, not Proto-Elamite”, tying in with the issue of nationalism raised in the introduction to this chapter.
2 http://www.surveymonkey.com.

Table 8.1 Results of the informal survey: modes, number of respondents (out of 38) choosing the mode, and means for each of the survey questions

Factor | Mode | Respondents choosing the mode | Mean
Repetition rate | Very important | 18 | 3.29
Linearity | Very important | 16 | 3.24
Number of distinct texts | Somewhat important | 18 | 3.00
Text length | Somewhat important | 16 | 2.92
Abstractness | Somewhat unimportant | 17 | 2.21

• Group 0: largely unfamiliar with writing systems (5 respondents)
• Group 1: familiar with and/or expert on several writing systems (22 respondents)
• Group 2: familiar with and/or expert on several writing systems from a variety of writing system types (11 respondents)

The results for the main questions were for the most part unsurprising, insofar as they accorded with my own judgments as to which features are most likely to be important. The modes, the number of respondents (out of 38) choosing the mode, and the means for each of these questions are given in Table 8.1. Thus it was considered very important that symbols be arranged largely linearly and that symbols repeat in texts. The length and number of texts were considered somewhat important factors. And abstractness of symbols in pictographic systems was considered somewhat unimportant. On the latter point, presumably enough people are familiar with Egyptian and other highly pictographic scripts to know that a system can look like pretty pictures yet still be writing. There was no significant interaction between expertise level and the five main questions in the survey. The largest difference between group 2 (expert) and groups 1 and 0 (minimal expertise) was on question 3, where members of group 2, perhaps counterintuitively, considered length of texts a somewhat less important factor than members of the other groups. In addition, I asked respondents to suggest other features that might be relevant to deciding whether a symbol system looks like a writing system. I quote some of the suggestions directly below. First, there were a few suggestions that relate roughly to the statistical distribution of symbols:

• How important is it that symbols are made on an object such that they could not have been made randomly (i.e., as an expression of a thought rather than meaningless scribble)?
• Statistical distribution of signs should accord with statistical distribution of graphemes, syllables, or words in attested languages.
• The existence of random combinations of the symbols. If we are lucky to have long enough texts, or many texts, the existence of random combinations that repeat themselves.
• The number of variations of symbols. How many are there? Does it correspond at all to the number of sounds in a given language?

Then there were some suggestions of the importance of provenance:

• Provenance, resemblance to known systems.
• Archaeological setting. No provenance will tend to make me highly suspicious.


Finally, one suggestion had to do with the degree to which cursive styles have developed, suggesting a long tradition of use:

• Degree of cursivity; a fully evolved system should show signs of use, and therefore influence of the writing instruments, medium, and human dexterity.

The latter suggestion is of particular interest given that one of the arguments put forward by Farmer et al. (2004) for the non-linguistic status of inscriptions from the Indus Valley Civilization (Marshall, 1931; Mahadevan, 1977; Parpola, 1994; Possehl, 1996) was that over 700 years there was no evidence of the development of a cursive style in that system. The survey and discussion presented here are admittedly not very scientific, nor were they intended to be. In fact, no serious scholar believes that one can tell just by looking at a system whether it is writing or not, and misjudgments can go either way: Ignace Gelb, the “father” of the study of writing systems, famously misclassified Mayan writing as a “limited” preliterate system (Gelb, 1952). But it is nonetheless interesting to see what people believe are necessary characteristics of a symbol system in order for it to be considered writing. In any case, virtually all of the characteristics that survey respondents considered at least somewhat important for writing systems to exhibit also show up in non-linguistic systems. Thus, as is clear from Chap. 3 and the accompanying section, non-linguistic systems can have a syntax, have fairly long “texts”, have many distinct texts, have abstract pictographs, and have repeating symbols. One even sees cursive versions of non-linguistic symbols that are in frequent use: consider the variation in handwritten digits, which traditionally made automatic recognition of written digits a challenge, spawning research datasets such as MNIST (LeCun et al., n.d.).
After all, cursivity has nothing in principle to do with written language per se, but rather relates to the frequency with which the system is used in daily life. Less obvious from the earlier discussion, however, will have been the questions that relate to statistical properties of the symbols, such as whether there are “random” combinations and whether the distribution of symbols and their combinations matches what one would expect for natural language. This is not something that will be apparent from a cursory examination of a symbol system. On the other hand one might expect that computational studies of the distributions of symbols and their combinations and the statistical comparison with samples of known written language could be useful. Unfortunately, while there have been many attempts to show that statistical methods can be informative about what a symbol system’s function was, so far at least, nobody has found a method that is really decisive on this point. We turn to this issue in the next section.


8.3 The Statistical Analysis of Symbol Distributions?

8.3.1 Statistical Analysis of the Indus Valley Inscriptions

Some years ago I was involved in a controversy surrounding the thousands of very short inscriptions on seals, pottery, and other surfaces that have been excavated from various sites related to the Indus Valley Civilization (3rd millennium BCE) in what is now northwestern India and Pakistan. Ever since the first excavations at Harappa uncovered the first instance of a stamp seal with a short text (Cunningham, 1875), the assumption has been that these inscriptions, which seem to have been produced over a period of some 700 years, were the remnants of a literature in a mature writing system (Hunter, 1929; Mahadevan, 1977; Parpola, 1994; Possehl, 1996), much of which was assumed to have been written, in much longer texts than those in the extant corpus, on perishable material and thus lost. In Farmer et al. (2004) we challenged that assumption. We pointed out that it is unparalleled for a literate civilization to have long texts on perishable material without also having longer inscriptions on non-perishable material such as clay or stone; that the system seems to have undergone very little change over the course of 700 years of use, and shows none of the moves towards more cursive writing styles that are characteristic of a system in daily use as a writing system (Kelly et al., 2021); and that the texts have unusual repetition patterns, with the longest inscription on a single surface consisting of a “text” of 17 distinct, mostly high-frequency glyphs. See Fig. 8.2 for some examples of texts that do have repetitions, which, as we pointed out, often seem to have odd symmetric patterns.

Fig. 8.2 Some examples of repetitions in the Indus Valley symbol system, after Farmer et al. (2004), Figure 6. Used with permission. Source: Farmer et al. (2004), Figure 6. Electronic Journal of Vedic Studies (https://hasp.ub.uni-heidelberg.de/journals/ejvs/about) is open access, and permission to reproduce is granted by the Editor-in-Chief

Add to this the lack of evidence for other common markers of literate civilizations: ink pots, styluses, and other writing implements, or even artistic depictions of scribes at work such as are found in civilizations as diverse as Ancient Egypt and the Maya, much less archaeological evidence for anything akin to the Mesopotamian eduba (Sect. 6.3). One or two oddities of this kind could be accepted, but all of them occurring together would be highly unusual, indeed unparalleled, if the Indus Valley was a civilization that possessed a fully developed writing system. The idea that the symbols might instead have been some form of proto-writing becomes suspect when one considers that, if they were proto-writing, they remained stagnant in that state for 700 years: again, an unparalleled situation, as far as we know. For these and other reasons, we proposed that this was not a mature writing system, and was likely some sort of non-linguistic system. That, in any event, was the argument. Needless to say, this conclusion did not sit well with researchers who had spent much of their careers trying to decipher the inscriptions; see Vidale (2007) and Parpola (2008) for some responses. But the most interesting response came in the form of a series of arguments to the effect that various statistical measures of the distribution of symbols were more consistent with the Indus texts being written language than a non-linguistic system. The arguments, presented in Rao et al. (2009a), Rao et al. (2009b) and Rao (2010), derived ultimately from Shannon’s (1948) notion of information entropy. This notion, mentioned already in Sect. 2.2, relates to the predictability of a symbol, or set of symbols, given the context of the preceding symbols. Lower entropy means that the system is very predictable; higher entropy, that it is very unpredictable. In natural language, given sufficient prior context, the entropy tends to be low, since some words (morphemes, sounds, characters…) are much more likely in that context than others. On the other hand, a system in which every symbol is equally probable and completely independent of the prior context (sequential tosses of a fair die are a good example) has the highest entropy. Rao and colleagues’ arguments purported to show that if one compares the Indus inscriptions with a set of cases known to be written language and a set known to be non-linguistic, the Indus inscriptions look more like language than not. This work inspired other work, in particular that of Lee et al. (2010a), who attempted to use entropic measures to demonstrate that Pictish symbols (see Sect. 3.6.5) were also “revealed” to be “written language”. This was a surprising conclusion, since nobody had heretofore assumed that the enigmatic Pictish symbols were anything other than some sort of non-linguistic system.
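The contrast between predictable and unpredictable sequences can be made concrete with a plug-in estimate of bigram conditional entropy, H(next | previous) = −Σ p(prev, next) log₂ p(next | prev). The Python sketch below is a simplified illustration. It does not reproduce the procedure Rao and colleagues used, and the sequence length, random seed and loaded-die weights are arbitrary choices. A rigidly ordered sequence scores exactly 0 bits. A fair die comes out close to the maximum of log₂ 6 ≈ 2.585 bits. A loaded die that is still context-independent lands in between.

```python
import math
import random
from collections import Counter


def conditional_entropy(seq):
    """Plug-in estimate of H(next | previous), in bits, from bigram counts."""
    pairs = Counter(zip(seq, seq[1:]))
    contexts = Counter(seq[:-1])
    total = len(seq) - 1
    h = 0.0
    for (prev, _nxt), n in pairs.items():
        # p(prev, next) * -log2 p(next | prev)
        h -= (n / total) * math.log2(n / contexts[prev])
    return h


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 60_000

# Rigid order: each symbol is always followed by the same next symbol.
rigid = [i % 6 for i in range(n)]
# Fair die: six equiprobable symbols, independent of context.
die = [rng.randint(1, 6) for _ in range(n)]
# Loaded die: Zipf-like weights 1/k, still independent of context.
loaded = rng.choices(range(1, 7), weights=[1 / k for k in range(1, 7)], k=n)

for name, seq in [("rigid", rigid), ("fair die", die), ("loaded die", loaded)]:
    print(f"{name:10s} {conditional_entropy(seq):.3f} bits")
print(f"log2(6) = {math.log2(6):.3f}")
```

The loaded-die case deserves attention. Its symbols are chosen with no regard to context at all, yet it scores between the two extremes, purely because its symbol frequencies are unequal.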

8.3 Statistics


In a series of papers (Sproat, 2009, 2010a, 2014), I argued that by using measures such as those Rao and colleagues and Lee and colleagues used, one could demonstrate that all sorts of systems, including artificial systems with symbols generated from a nonuniform distribution, look like language. And in Sproat (2014) I used a set of corpora of known non-linguistic systems developed by Wu et al. (2012) to show that under such measures, many of these systems also looked like language. This resulted in a set of responses from Rao and Lee and colleagues (Lee et al., 2010b; Rao et al., 2010, 2015) wherein, among other things, the authors claimed I had misrepresented their work, a claim I refuted in my replies to their responses (Sproat, 2010c, 2015). I do not propose an extended in-depth discussion of this issue here. A lot of digital ink has been spilled already, both in the academic literature and in the popular science press. But given the topic of this book, it is necessary to tie that discussion in where it overlaps with the major themes I have presented. Also, since Rao and colleagues’ work repeatedly gets mentioned as a supposedly definitive demonstration by writers unfamiliar with the statistical methodology they used, it is important to try to set the record straight.

First, one of the reasons why Rao and colleagues’ original paper in Science (Rao et al., 2009a) may have seemed plausible to many readers is a generally limited understanding of the range of non-linguistic systems that exist and have existed in the world, and of the complexity of some of them. Such ignorance would make it easy to accept the claim of these authors in their Science paper that two good models of non-linguistic symbol systems were, firstly, one where the ordering of symbols was absolutely rigid, so that after a given symbol x there must always come symbol y; and secondly one where the ordering was completely random and equiprobable. See Fig. 8.3.³ For Rao and colleagues, these set the limits of what was possible, so that one might expect to find non-linguistic systems that range within these two extremes. Linguistic systems, on the other hand, are firmly in the center of the range. So if one finds an unknown system that seems to fall in the middle, that at least is consistent with that system being linguistic; and if there are other reasons to believe it may have been true writing, the entropic measure would seem to lend further credence to that conclusion. That, at least, seems to have been the argument. The problem is that Rao and colleagues’ artificial examples are essentially never found in real non-linguistic graphical symbol systems, so that the actual range of what is possible is significantly narrower than what they claim. As I argued in Sproat (2014), one never finds completely rigidly ordered systems; nor does one find systems where symbols are completely randomly and equiprobably ordered in texts. Indeed it is hard to see what purpose such symbol systems would serve: what would be the point of having symbols that could only occur in a given order and where once one writes an x, one must write a y? The system would obviously convey no information whatever. Perhaps the closest to such a rigid system among the set we have considered in this book is the Tupicochan staff code (Sect. 3.6.20), but even this is not completely rigid, as we saw: different orderings of some of the symbols are possible, and it is not the case that a given symbol must be followed by one and only one other symbol. Similarly there would be little point in a system where every symbol could occur anywhere, and where all symbols are equiprobable. Imagine for a moment that traffic signs (Sect. 3.6.24) constituted such a system. That would mean that the symbols could occur in any order and would all be equally likely to occur in a “text”. But since these symbols denote things in the real world, this would have to mean that the things denoted were equally likely to occur, so that one was just as likely to find a restaurant, a gas station, a hotel, a hospital or, say, a golf center at any freeway offramp that offered services. Obviously this is not the case, and some of the things denoted are much more common in the real world than others. Furthermore, while these signs could theoretically convey the same meaning no matter how they were ordered, we do in fact find conventions, as noted in Sect. 3.6.24, so that some signs tend to be listed earlier in the “message”. When one considers symbol systems not merely as formal objects, but as actually denoting something, Rao and colleagues’ ‘limit’ cases become absurd as examples of plausible symbol systems. In any event, the non-linguistic symbol systems that I discussed in Sproat (2014) mostly fell in the middle of the range. But if a reasonable sampling of non-linguistic systems yields entropic values that are close to those found with written language, then that suggests that these sorts of entropic measures are not likely to be terribly informative. Rao and colleagues were not wrong in asserting that an entropic measure in this middle range is consistent with the system being linguistic. The problem is that it would be equally consistent with the system being non-linguistic.

A second point relates to structure. On the basis of a statistical language model constructed from the Indus texts, Rao et al. (2009b) argue that the Indus inscriptions display “rich syntactic structure”, something that they correctly point out would be consistent with the symbols encoding a natural language, or languages. Now the claim that the structure is “rich” is clearly hyperbole: the longest Indus inscription on a single surface is 17 symbols long, and the mean length of a text in the entire known corpus is somewhat less than 4.5 symbols. Just how “rich” can the syntax have been? But granting that there is at least some structure displayed in the inscriptions, this raises two questions: How novel is this observation? And what does it tell us about what the Indus symbols represented?

[Plot: conditional entropy (vertical axis) against the k most frequent tokens sampled (horizontal axis; e.g. k = 20, k = 30, …). Legend: “non-ling” Type 1; “non-ling” Type 2; Indus; range of linguistic systems.]

Fig. 8.3 Entropic measures of various linguistic and two artificial non-linguistic systems, based on Rao et al. (2009a), Figure 1A, page 1165. Shown here are Rao et al.’s “Type 1” and “Type 2” artificial non-linguistic systems, their Indus data, and the range of values from their original plot for their linguistic examples, which included English, Sumerian and Old Tamil. Their original (low-resolution) figure can be found in many places on the Web, e.g. http://languagelog.ldc.upenn.edu/myl/RaoFig1.png. Artificial non-linguistic systems of Type 1 are random and equiprobable, whereas those of Type 2 are completely rigid: after a given symbol x must come symbol y. The curves represent the bigram conditional entropy, i.e. the Shannon entropy computed for symbols $y_i$ given a previous symbol $x$, computed over increasing samples of the corpus, starting with the most frequent k symbols, then the most frequent 2k symbols, and so forth. In all systems, as the number of symbols sampled increases, so does conditional entropy. This is expected: as the sample size increases, so does the number of possible following symbols, and with more choices the predictability of the following symbol decreases, and entropy thus increases. (The only system for which this should not be true is their ‘Type 2’ completely rigid system: the fact that its entropy nonetheless increases suggests a sampling problem in their method.)

³ Of course another reason that Rao and colleagues’ work may have seemed convincing is that, to someone not familiar with standard techniques in computational linguistics, the methodology and the resulting plots must have looked impressive.
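To see why an intermediate entropy value says little, one can generate the two artificial extremes described above alongside a third artificial system that has no structure at all but draws its symbols from a skewed distribution. The sketch below is a simplification that assumes a 20-symbol inventory, a 50,000-symbol corpus and a Zipf-like 1/rank weighting (all arbitrary choices, and all names hypothetical). It computes plain bigram conditional entropy over the whole sequence; it does not reproduce Rao and colleagues’ procedure of sampling the k, 2k, … most frequent symbols.

```python
import math
import random
from collections import Counter

def conditional_entropy(seq):
    """Bigram conditional entropy H(Y | X) in bits, estimated from counts:
    H(Y | X) = -sum over pairs (x, y) of P(x, y) * log2 P(y | x)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    left_counts = Counter(seq[:-1])
    n_pairs = len(seq) - 1
    h = 0.0
    for (x, y), c in pair_counts.items():
        h -= (c / n_pairs) * math.log2(c / left_counts[x])
    return h

V, N = 20, 50_000        # assumed vocabulary size and corpus length
rng = random.Random(0)   # fixed seed so the sketch is reproducible

# "Type 2"-style rigid system: symbol x is always followed by x + 1 (mod V).
rigid = [i % V for i in range(N)]
# "Type 1"-style system: symbols drawn uniformly and independently.
uniform = [rng.randrange(V) for _ in range(N)]
# No structure at all, just a skewed (Zipf-like, weight 1/rank) symbol distribution.
zipf = rng.choices(range(V), weights=[1 / r for r in range(1, V + 1)], k=N)

for name, seq in [("rigid", rigid), ("zipf-like random", zipf), ("uniform random", uniform)]:
    print(f"{name:17s} {conditional_entropy(seq):.2f} bits")
print(f"{'max possible':17s} {math.log2(V):.2f} bits")
```

The rigid system comes out at exactly 0 and the uniform one close to log2 20 ≈ 4.32 bits. The skewed but entirely structureless system lands in between, in the region Rao and colleagues associated with language, which is the point made in the text about systems generated from a nonuniform distribution.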

8.3.2 More on Structure in the Indus Inscriptions

Prior Work

Almost since the first inscribed objects were excavated at Harappa and Mohenjo-Daro, it has been known that there were some structural regularities in what came to be known as the “Harappan Script”. Thus Hunter (1929) argued for positional regularities of certain symbols. Starting in the 1960s the research project headed by Asko Parpola at Helsinki University established, using computational models, that there was some form of syntax to the inscriptions (Parpola et al., 1969; Koskenniemi et al., 1970; Koskenniemi, 1981; Parpola, 1994). Rao et al. (2009b)’s work thus merely provides further evidence using more modern techniques.

But how does one determine structure anyway? For some of the syntactically rich systems we discussed in this book we could be reasonably sure of what the rules were that determined whether a message in the symbol system was syntactically well-formed or not. Thus in heraldry we know the rules of combination, and in mathematics there are well-understood conventions determining how symbols may be combined. But with an unknown system such as the Indus Valley symbols, we are instead in a situation more akin to our hypothetical example in Fig. 1.3 in Chap. 1. Since all we have is a surviving set of texts, our only approach to determining structure is to look at the distribution of symbols and try to derive an underlying model that can explain that distribution. Some aspects of the distribution can be observed relatively easily without computational methods. Thus, for example, the most famous generalization, namely that the so-called “jar” symbol invariably occurs at the ends of texts, was deduced early on (Hunter, 1929). But more sophisticated analyses benefit greatly from the help of computational models. Thus Koskenniemi (1981) discusses methods developed by the Helsinki team. One, which he attributes to Koskenniemi et al. (1970), compares the counts of pairs of symbols with their expected counts under the assumption of independence. If the probability of observing symbol x is $P_x$ and the probability of observing symbol y is $P_y$, then one would expect them to occur together just by chance with probability $P_x P_y$. Given a corpus with $N$ total symbols, a common estimate of the probability of the occurrence of a symbol x is simply $c_x/N$, where $c_x$ is the number of times the symbol occurs, i.e. its count. So what the method of Koskenniemi et al. (1970) and Koskenniemi (1981) does is see whether the actual count of xy is significantly greater than what one would expect by chance; in estimated probabilistic terms, whether $\frac{c_{xy}}{N} \gg \frac{c_x c_y}{N^2}$. This measure is actually very close to the notion of pointwise mutual information, introduced by Shannon (1948), which has also been used with moderate success in determining structural groupings of characters or words in text (Magerman & Marcus, 1990; Sproat & Shih, 1990). The problem with such measures, though, is that they really do not determine syntactic grouping so much as mere association, so that terms that commonly co-occur in text for purely semantic reasons having nothing to do with structure will also tend to show high association under such measures.
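The comparison of observed and expected pair frequencies described above can be sketched in a few lines of Python. The toy “texts” are made-up strings of arbitrary letters standing in for signs (not Indus data), and the function name is hypothetical. The score reported is pointwise mutual information, $\log_2 \frac{c_{xy}/N}{c_x c_y / N^2}$; positive values mean a pair co-occurs more often than independence predicts. Following the estimate in the text, the same $N$ (the total symbol count) is used in numerator and denominator, which is an approximation.

```python
import math
from collections import Counter

def pair_association(texts):
    """Pointwise mutual information for each adjacent pair xy:
    log2((c_xy / N) / (c_x * c_y / N**2)), with N the total number of symbols."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        unigrams.update(text)
        bigrams.update(zip(text, text[1:]))  # pairs never cross text boundaries
    n = sum(unigrams.values())
    return {
        (x, y): math.log2((c_xy / n) / (unigrams[x] * unigrams[y] / n ** 2))
        for (x, y), c_xy in bigrams.items()
    }

# Hypothetical corpus: each string is one short "inscription" of sign labels.
texts = ["AB", "AB", "CAB", "CD", "DC", "AC"]
for pair, score in sorted(pair_association(texts).items(), key=lambda kv: -kv[1]):
    print("".join(pair), round(score, 2))
```

Here AB, which always appears as a unit, gets the highest score, while AC, seen once, scores below zero. As the text cautions, a high score shows association, not syntactic grouping: the sketch cannot tell whether AB is a structural unit or merely two signs that tend to be about related things.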
Koskenniemi (1981) also discusses another kind of measure, which ultimately goes back to Harris (1951), where one considers substrings of increasing length from the beginning (or end) of the text, and looks for places where, in texts that share that initial (or final) substring, there is suddenly a big rise in the number of possible symbols that could follow. To see how this works, suppose one looks at a large body of English texts and, starting from the beginning of the sentence (and ignoring capitalization for the purposes of this discussion), observes sequences starting with t, th, the, the_, and so on. The initial t probably does not give too many clues as to what will follow (though the next letter is unlikely to be, say, k or p), but th strongly biases the guess for the next symbol towards a vowel. The next sequence, the, also has a strong bias, but once we add the space (here indicated as _), a large number of possibilities open up. Another way we could put it is to talk about the surprisal of the next symbol: one would be very surprised to see k following th in English, but not very surprised to see it following a space. Again, this ultimately relates back to Shannon (1948) and his notion of entropy. In Harris (1951)’s work the approach was used to find morpheme boundaries within morphologically complex words (e.g. dividing unhappy into un and happy), and similar measures were also used in early work by Olivier (1968) to find word boundaries in English text from which spaces and capitalization had been removed. So the methods Koskenniemi applied to the Indus corpus had a long tradition of use in detecting structure in cases where we know there is structure. See Sproat and Farmer (2005) for further discussion of these points.

What does Structure Tell Us?

With respect to the second question, of what the presence of structure tells us about what the Indus symbols represented, as we have stressed throughout this book, evidence of structure is not evidence of linguistic structure. Thus, while evidence of structure is certainly a prerequisite if one wants to claim that a symbol system was writing, the presence of structure is equally consistent with the system being some sort of non-linguistic system. But maybe we need to be a bit more careful here. The observant reader will have noticed that a good many of the non-linguistic systems I discussed in detail in Sect. 3.6 were simple systems that did not seem to have much structure. These presumably would never be mistaken for linguistic systems. Non-linguistic systems with rich structure do exist, but they are not among the majority of the systems surveyed. So maybe Rao and colleagues had a point after all: if one finds structure in a system, that is consistent with 100% of writing systems but clearly a much smaller percentage of non-linguistic systems. But as we argued in the comparison between European heraldry and kamon in Sect. 3.5, structure is found in a system for a reason, namely that the underlying information being encoded itself has structure. The most that one really can say, then, is that by uncovering structure in a symbol system, one has determined that the system encoded some sort of information that itself had a structure. That, unfortunately, does not tell us much. The opposition of interest is not writing systems versus any non-linguistic system imaginable, since many of the latter would never be confused with the former; the issue is rather between writing and non-linguistic systems that encode complex information.
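The Harris-style successor count described under Prior Work can be sketched as follows. The corpus is a small invented list of lowercased English sentence openings, chosen only to mirror the t, th, the, the_ example; the function names are hypothetical. For each initial substring the sketch counts how many distinct symbols follow it, then reports where that count jumps most. It only looks forward from the start of each text; counting predecessors from the end, as the passage also mentions, is omitted.

```python
from collections import defaultdict

def successor_variety(corpus, text):
    """For each proper prefix of `text`, count the distinct symbols that follow
    that same prefix at the start of texts in `corpus`."""
    followers = defaultdict(set)
    for t in corpus:
        for i in range(1, len(t)):
            followers[t[:i]].add(t[i])
    return [(text[:i], len(followers[text[:i]])) for i in range(1, len(text))]

def biggest_jump(varieties):
    """Return the prefix after which the successor count rises most sharply."""
    best_prefix, best_rise = None, float("-inf")
    for (_, prev), (prefix, count) in zip(varieties, varieties[1:]):
        if count - prev > best_rise:
            best_prefix, best_rise = prefix, count - prev
    return best_prefix

# Hypothetical corpus of sentence openings (lowercased, spaces kept).
corpus = ["the cat sat", "the dog ran", "the man ate", "the sun set",
          "the end came", "the old tree", "the big ship", "the red door",
          "then we left", "there it was", "they came", "this is it",
          "that was all", "to be sure"]

varieties = successor_variety(corpus, "the cat")
print(varieties)
print(repr(biggest_jump(varieties)))
```

In this toy corpus the counts for t, th, the and "the " are 2, 3, 4 and 8, so the sharpest rise comes right after the space, the word boundary. With real data the counts are noisier, and choosing a threshold for a “big rise” is a design decision the sketch sidesteps by simply taking the maximum.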

8.3.3 Variations of Distributions of Symbols

Finally, a third point from Rao and colleagues’ work relates to their claims that the Indus texts show regional variations consistent with the symbols being used to encode natural language. Consider two languages that use the Latin alphabet in their writing systems, say English and Dutch. If one examines the distribution of the individual letters, one will find that English and Dutch differ, because the underlying languages, and the ways the script is used to encode linguistic information in the two writing systems, differ. In a similar vein, if the Indus symbols were writing, and if they were used to represent more than one language, one would expect to find variation in the statistical properties of the texts. Statistical language models can be used to score texts according to how much they look like what one would expect for the language on which the model was trained. Suppose I train a language model on English letter sequences, and apply it to two samples of text that the model has never seen before: (a) more English text, and (b) Dutch text. The model will rank the latter as significantly less likely than the former, since it will involve sequences of letters that are unusual from the point of view of the text on which the model was trained. For example, the sequence ij is unusual in English, but common in Dutch. Indeed, this is often how automated language identification systems work (Jauhiainen et al., 2019). Rao et al. (2009b) discuss a set of 8 apparent Indus inscriptions excavated from ‘West Asian’ sites (page 13689) and observe that the likelihood assigned by the model to these texts is significantly less than the median likelihood assigned to a held-out set of 100 texts excavated from the Indus Valley.⁴ According to Rao and colleagues,

[t]hese findings suggest the intriguing possibility that the Indus script may have been used to represent a different language or subject matter by Indus traders conducting business in West Asia or West Asian traders sending goods back to the Indus valley.
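The English-versus-Dutch thought experiment above can be sketched with a character bigram model. This is a toy, assuming a tiny made-up English training sample, an alphabet of lowercase Latin letters plus space, and add-one smoothing; the helper names are hypothetical, and it is not the sign-level model Rao and colleagues used. The Dutch test sentence is an ordinary one supplied for illustration (“they stay by the ice in the garden”).

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumption: lowercase Latin letters plus space

def normalize(text):
    """Lowercase and drop everything outside ALPHABET (punctuation, digits, accents)."""
    return "".join(ch for ch in text.lower() if ch in ALPHABET)

def train(text):
    """Character bigram counts plus counts of each left-hand character."""
    text = normalize(text)
    return Counter(zip(text, text[1:])), Counter(text[:-1])

def log_prob(model, x, y):
    """Add-one smoothed natural-log P(y | x)."""
    bigrams, contexts = model
    return math.log((bigrams[(x, y)] + 1) / (contexts[x] + len(ALPHABET)))

def avg_log_prob(model, text):
    """Mean per-bigram log probability; higher means 'more like the training text'."""
    text = normalize(text)
    pairs = list(zip(text, text[1:]))
    return sum(log_prob(model, x, y) for x, y in pairs) / len(pairs)

# Tiny, made-up training sample; real systems train on far more text.
english = train("the quick brown fox jumps over the lazy dog and then the other "
                "dogs ran through the thick green grass while the birds sang in the trees")

print(avg_log_prob(english, "the dog ran through the green grass"))  # unseen English
print(avg_log_prob(english, "zij blijven bij het ijs in de tuin"))   # Dutch
print(log_prob(english, "i", "j"), log_prob(english, "t", "h"))
```

The English sentence receives a higher average score than the Dutch one, and ij is far less probable than th under the English model. Note that the same machinery would score any two symbol sequences that differ in their bigram statistics, linguistic or not, which is the issue taken up next.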

In fact, the suggestion that the West Asian inscriptions might represent a different language from whatever language (or languages) was spoken in the Indus Valley predates Rao and colleagues’ work. But their results might be seen as offering statistical support for that earlier suggestion, which in turn would bolster the main contention, namely that the Indus inscriptions actually encode language to begin with. The problem is that there are several possible explanations for the observation that do not depend on any assumptions about what the signs encoded. For one thing, the set of West Asian texts is small, so it is hard to know whether the results are statistically significant. It is also unclear how the difference between the West Asian and Indus Valley texts compares to the range of variation found within the Indus Valley region, itself a very large area. There is also a time dimension: the system was in use for about 700 years, over which period there was certainly variation in the uses of signs. Simply computing the median likelihood over a set of Indus Valley texts and pretending that these represent a single distribution is quite misleading. In any case, variation in the use of a symbol system is expected not only for linguistic scripts, but in general for any meaningful symbol system. As noted in Sect. 3.6.13, Totem Poles provide a good example of a non-linguistic system that was used in a wide variety of cultures for different purposes and, concomitantly, with different distributions of symbols. Similarly, one finds variation in informational road signs. Signs for golf courses are more frequent around Palm Springs than even in nearby areas.⁵ Or, to take a more extreme example, as noted in Sect. 3.6.24, a common warning sign in the United States depicts a silhouette of a jumping deer on a yellow background; in Australia the deer is replaced with a kangaroo.
Even supposing that the Indus signs did not represent linguistic information, one would not be surprised to find variation in the use of the signs over space and time. If we do not know what the symbols denoted, there is not a lot that statistical tests of these sorts can do to enlighten us on that front. The most we can say is that the Indus ‘texts’ apparently had some sort of structure, but as we have pointed out again and again, that fact alone is not very informative.

⁴ See Parpola (1994), pages 9–12, for discussion of the West Asian finds, which are from sites mostly in modern-day Iran, many of which are associated with the Elamite civilization, though there have also been finds further west in Sumer, as well as some from islands in the Arabian Sea.

⁵ Thanks to a reviewer for this suggestion.
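The small-sample worry raised above can be illustrated with a permutation test, a generic way to ask whether a gap between the median score of a small group and that of a larger group could plausibly arise by chance. The text does not say how, or whether, Rao and colleagues tested significance; this is simply a standard technique, and the function name and parameters are hypothetical. The scores are randomly generated placeholders, not likelihoods from any Indus study.

```python
import random
import statistics

def median_gap_pvalue(small, large, n_perm=2000, seed=0):
    """One-sided permutation test: how often does a random split of the pooled scores
    give median(large) - median(small) at least as big as the observed gap?"""
    rng = random.Random(seed)
    observed = statistics.median(large) - statistics.median(small)
    pooled = list(small) + list(large)
    k = len(small)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        gap = statistics.median(pooled[k:]) - statistics.median(pooled[:k])
        if gap >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive

# Placeholder scores: 8 "outlier" texts versus 100 "reference" texts.
rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(100)]
clearly_lower = [rng.gauss(-5.0, 0.5) for _ in range(8)]
same_source = [rng.gauss(0.0, 1.0) for _ in range(8)]

print(median_gap_pvalue(clearly_lower, reference))  # small p: gap unlikely by chance
print(median_gap_pvalue(same_source, reference))    # same distribution: p not expected to be small
```

Even a significant result here would only say that the two groups differ. As the text argues, it would not say whether the difference reflects language, subject matter, region or date.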

8.4 Summary

In this chapter we have examined some recurring confusions about symbol systems in general, and about the relation between non-linguistic systems and writing proper in particular. These confusions boil down basically to two:

– Inconsistent uses of the term writing. In many cases authors actually intend the term “writing” in its most inclusivist sense, denoting any conventional information-bearing graphical symbol system, but simultaneously play upon the stricter exclusivist notion favored by most grammatologists and, incidentally, the one that most closely resembles the common-language use of the term. By playing on both of these notions, one can appear to be making a bold and falsifiable claim (“ancient symbol system X was written language”), but can retreat in safety to a broader notion when challenged. As we have noted, if one had started with the inclusivist sense to begin with, the claim that a given, apparently conventional, symbol system was writing would reduce to a tautology. To make a bold claim, then, one needs to start with the narrower definition.
– The misconception that structure implies linguistic structure. This is by far the most prevalent of the confusions and is exemplified by Vidale’s 2007 critique of Farmer et al. (2009), which we discussed in Sect. 3.6.15, wherein he considers as typical of non-linguistic systems repetitive and evidently structureless sequences of icons on Shahr-i Sokhta pottery or the “‘endless’ repetition of icons such as scorpions, men-scorpions, temple facades, water-like patterns and interwoven snakes at Jiroft” (Vidale, 2007, page 344). It is also exemplified by Rao et al. (2009a)’s synthetic “rigid” and “random” “non-linguistic systems”, which they took to be fair and representative baselines.

These two points of confusion, taken together, feed into a narrative of which there have been a number of instances in the popular science press in recent years, particularly surrounding the Indus inscriptions.
The narrative usually starts with a bit of historical background on the Indus civilization, the discovery of the first seals at Mohenjo Daro and Harappa, and the various attempts at decipherment. The narrative may mention in passing our work suggesting that the Indus inscriptions were not a writing system in the narrow sense (Farmer et al., 2004), but then moves on to Rao et al. (2009a), which is taken to be definitive proof that our iconoclasm was misplaced. The piece will then end with the exciting possibility that “AI” may finally help us solve this mystery by providing the key to an actual decipherment. Needless to say, an argument to the effect that methods such as Rao et al. (2009a)’s are actually quite uninformative does not fit that narrative. In the (in my view unlikely) event that someone eventually does provide a linguistic decipherment of the Indus inscriptions that is accepted by a wide community of scholars, Rao et al. (2009a)’s paper will be seen as an important milestone on that path, insofar as it will be regarded as having provided clear evidence that the Indus inscriptions were writing. The demonstration that by these and other similar techniques one could provide evidence that all sorts of ancient symbol systems were writing will be taken as being of no interest: it does not fit the narrative, and in any case most readers will not know enough about symbol systems in general to question the assumptions underlying that narrative.

As we noted in the introduction to Chap. 3, systematic surveys of non-linguistic symbol systems are practically non-existent, and the most one typically finds is a smattering of examples. This means that the typical reader of articles such as those of Rao et al. (2009a) or Lee et al. (2010a), much less the popular science translations of such works, cannot be expected to have the background knowledge needed to evaluate the claims being made. Thus they may readily accept the notion that a non-linguistic symbol system must ipso facto be structureless, and they may not be overly concerned if the term writing is loosely applied. In this book I have tried to give the reader a better sense of what non-linguistic systems can look like. I have also argued that, no matter what one thinks the term writing should denote, one does need to make a clear distinction between non-linguistic symbol systems and linguistic systems, i.e. true writing. With this background, plus the field guide to common confusions discussed in this chapter, the reader should be better equipped to evaluate future claims that may appear from time to time in the science press.

Chapter 9

The Future of Graphical Symbols

9.1 The Dream of a Universal Written Language

Whether one believes that human language evolved originally as a means of communication, or alternatively that it primarily evolved as a mechanism for organizing thought and only secondarily served as a means of communication, one thing is certain: language encodes information. Furthermore, as a means of communicating information between people, language is quite efficient. There is just one problem: language is not universal. One has to learn a particular language in order to communicate with people who use that language, and since there are thousands of languages in the world (and hundreds of languages each of which is used by millions of people), even the most talented polyglot could not hope to learn all of them. Language is both a connector and a barrier. The familiar myth of the Tower of Babel presents this diversity as a punishment from God for humans’ arrogance in trying to build a tower to heaven, and in general the plethora of languages in the world has more often been seen as a curse than a blessing. Not surprisingly, there have throughout history been proposals of different kinds to overcome this barrier. One approach has been to solve the problem of a multitude of languages by inventing another language, in the hope that this new language, being artificial, would be readily accepted as a universal means of communication. In Larry Niven’s Ringworld series, humans of the twenty-ninth century mostly speak Interworld, having largely abandoned their ancestral languages. But of course one does not have to go to science fiction to find instances of this idea. The most famous artificial language, Esperanto, has about a million speakers and, as a testament to the zeal with which Esperantists have taken up their cause, even boasts a few hundred native speakers, the children of Esperanto-speaking parents who raised them with Esperanto as their first language.
As a practical matter, artificial “natural” languages like Esperanto have had only a minor impact:¹ for better or for worse, naturally evolved major languages, most notably English, are increasingly serving as the worldwide lingua franca. Such language dominance of course comes at a great cultural cost, the most obvious being the loss of “small” languages that serve communities of only a few hundred or thousand speakers, whose speakers feel that their languages have no place in the modern world. What about another approach? Rather than invent a new language, which, after all, is yet another human (though artificial) language, or rely on world domination by a natural language that is associated with a particular culture, could one invent a system of communication that transcended language? Crucially, such a system would have to have at least the flexibility and breadth of scope of language. In this book we have already seen dozens of non-linguistic symbol systems that communicate information, but none of these fills the bill, because none of them allows one to communicate the wide range of information that language allows. But the idea of a non-linguistic communication system that gets directly at thought is an appealing one. We saw an instance of this with Blissymbolics in Chap. 4, but there have been many others. When Europeans first learned about Chinese writing, many believed that it was an instance of such a system that could directly encode thought. For a while, Leibniz held this view. So, much later, did Bliss. But there were also attempts to create such systems well before Bliss. Eco (1995) presents a history of many such attempts to find the “perfect language”. For example, John Wilkins, in his An Essay Towards a Real Character, and a Philosophical Language (1668), developed a system based on what he believed to be essential semantic primitives, out of which one could construct a symbol for any concept and ultimately for any complex sequence of concepts. An example of his translation of the Lord’s Prayer into his system is shown in Fig. 9.1, and his explanation for his symbol for father is shown in Fig. 9.2. Wilkins’ system, like many other such early attempts, was based on a rather exhaustive thesaurus-like classification of knowledge. (In this regard Bliss was notably less systematic.) For example, one of the basic categories was Element (page 57ff), the “hottest and lightest” of which was Fire. Under Fire he included in the first subdivision Flame and Spark. These are further subdivided into, in the first case, the fiery celestial bodies Comet and Falling Star; and in the second, the weather-related fiery phenomena Lightning and Thunder. As a supplement to his graphical system, Wilkins also proposed a method for encoding his semantics phonetically. He laid out the principles for this translation, including the following (page 414):

1. The words of it should be brief, not exceeding two or three Syllables …
2. They should be plain and facil to be taught and learnt;
3. They should be sufficiently distinguishable from one another, to prevent mistake and equivocalness; and withal significant and copious, answerable to the conceipts of our mind;
4. They should be Euphonical, of a pleasant and graceful sound;
5. They should be Methodical; those of an agreeable or opposite sense, having somewhat correspondent in the sounds of them. …

¹ A reviewer points out a possible confusion that might arise from this statement regarding signed languages for the deaf, such as American Sign Language. Clearly these have not had “only a minor impact”. But, unlike Esperanto and its ilk, signed languages are not artificial constructs, but rather evolved naturally in deaf communities.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0_9

Fig. 9.1 John Wilkins’ rendition of the Lord’s Prayer into his “philosophical language”. Source: Wilkins (1668), page 395. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer (Wilkins, 1668, page 395)


Fig. 9.2 Wilkins’ explanation of his symbol for “father” in the Lord’s Prayer (Wilkins, 1668, page 396). Source: Wilkins (1668), page 396. Work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer

Returning to the Element category, Wilkins arbitrarily chose for this category the prefix De. Adding a b to yield Deb gets us to Fire, and adding a vowel that Wilkins notates with the Greek letter α yields Debα, Flame. This is, however, only one sound different from Deba, Comet, and therein lies a weakness, identified by Eco (1995), pages 249–250:

A characteristic language is thus not founded—as happens with natural languages—on the principle of double articulation [cf. Sect. 2.5], by virtue of which meaningless sounds, or phonemes, are combined to produce meaningful syntagms. This means that in a language of ‘real’ characters any alteration of a character (or of the corresponding sound) entails a change of sense.
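Eco’s observation that any alteration of a ‘real’ character changes its sense can be quantified crudely by asking what fraction of single-symbol substitutions turn one valid word into another valid word. Both word lists below are invented for illustration: a tiny English sample, deliberately stuffed with near neighbors of boat, and a hypothetical dense code in which every string is meaningful, standing in for an idealized system of ‘real characters’ (it is not Wilkins’ actual inventory). The measure and the function name are assumptions of this sketch, not anything proposed by Eco or Wilkins.

```python
import itertools
import string

def neighbor_rate(lexicon, alphabet):
    """Fraction of single-symbol substitutions that turn one valid word into another
    valid word. A high rate means a one-symbol error usually goes undetected."""
    words = set(lexicon)
    hits = total = 0
    for w in words:
        for i, ch in enumerate(w):
            for sub in alphabet:
                if sub == ch:
                    continue
                total += 1
                if w[:i] + sub + w[i + 1:] in words:
                    hits += 1
    return hits / total

# Sparse code: a handful of English words, deliberately including near neighbors.
english_sample = ["boat", "beat", "coat", "goat", "moat", "boot", "bout", "dog", "cat", "table"]
# Dense toy code: every 3-symbol string over a 3-symbol alphabet is a meaningful word.
dense_code = ["".join(p) for p in itertools.product("abc", repeat=3)]

print(round(neighbor_rate(english_sample, string.ascii_lowercase), 3))
print(neighbor_rate(dense_code, "abc"))
```

Even this neighbor-heavy English sample gives only about 2% of substitutions landing on another word, so most single errors produce a detectable non-word; in the dense code every substitution yields another valid word. The exact English figure depends heavily on the lexicon chosen.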

Thus the fact that the basic elements of the system directly encode meaning, touted as a strength of the system, turns out in fact to be a disadvantage: replacing a single element yields a totally different meaning. Again, α in Debα ‘flame’ denotes a type of Fire; if one replaces it with a, we get Deba ‘comet’. This is not merely an issue for the phonetic rendition of Wilkins’ system: it is a problem too for the original graphical version, of which we saw an example in Fig. 9.1. On the face of it this may not seem any different from the situation in natural language whereby substitution of one sound or letter can yield a different word: boat versus beat, for instance. But there is a difference: the latter cases are relatively
sparse. Substituting the t at the end of boat with a p yields a non-word, boap, so a reader, given sufficient context, will likely be able to reconstruct the intended word. The system thus has a lot of redundancy, and errors can often be corrected. In Wilkins’ system there is essentially no redundancy: as Eco notes, any change of a single sound (or symbol) yields a different meaning. So while messages encoded in the system end up being very compact, the system is not robust to noise, a distinct disadvantage in a system intended for communication. Another problem with such systems of ‘real characters’ is that they overestimate the value of a semantically based decomposition of symbols. Suppose that one could come up with a perfect method for semantic decomposition. In theory this could make the system easy to learn (Bliss’s hope), but in practice the system would very likely undergo semantic drift, so that what was once semantically transparent would lose its transparency over time. How do we know this? Because this is exactly what happens in natural language. Consider the English compounds coffee shop and coffee table. In the first case the composition is reasonably transparent: the expression denotes a shop that sells coffee, so coffee contributes meaning in a more-or-less compositional way. In the second case the contribution of coffee is much less clear. Historically, of course, one can understand the expression as deriving from the fact that coffee tables were typically used in living rooms, where people would retire after dinner to drink coffee; the coffee table served as a surface for coffee cups and related paraphernalia. But in its current use, denoting a particular kind of low table commonly found in living rooms, it has more or less lost its connection to coffee.
Such examples are common in natural language: expressions that may once have had some semantic compositionality have drifted and lost it over time, so that terms like coffee table have become to some extent arbitrary. The same would happen to a ‘real character’ system such as Wilkins’, eventually draining some of the original motivation from the system. Besides Wilkins and the many other such attempts documented by Eco (1995), communication via non-linguistic symbols has also been a popular theme in science fiction. In Neal Stephenson’s The Diamond Age, the book’s heroine, Nell, learns to interact with a substitute for written language, mediaglyphics, which communicates information via iconic, and often animated, glyphs. (In the novel, Nell later goes on to become literate in real writing.) For example, the mediaglyphic for the territory where Nell lived, named “Enchantment”, was an animated glyph of “a princess sprinkling golden specks from a stick onto some gray houses”. But we do not need to go to science fiction to find present-day fascination with the idea that one can communicate complex thoughts using iconic non-linguistic symbols: one need look no further than one’s phone and the ever-increasing set of emoji that it supports. Emoji do indeed allow one to communicate a lot of information, and they do this by means of thousands of icons, many of which are iconic for the information they are intended to communicate; see, for example, Dürscheid and Meletis (2019) for a “grapholinguistic” analysis of the various functions of emoji in recent social media texts.

Fig. 9.3 My experience at a pizza parlor, recounted in emoji

According to emojipedia.org, there are 3,521 emoji in the Unicode Standard as of September 2020.2 This is more than the set of 2,136 jōyō kanji 常用漢字 (i.e., those considered to be in regular use) in the Japanese writing system.3 And certainly one can construct fairly complex messages. For example, consider Fig. 9.3, which recounts a (fictional) story about a visit to a pizza parlor. Perhaps look at it first to see if you can figure out the gist of the message, before consulting the footnote.4 This is a weak attempt that took all of a few minutes to compose. With practice one could become quite good at composing such messages. And indeed valiant attempts such as Emoji Dick5 show that one can compose quite long messages, though how interpretable such messages are without knowing the intended message beforehand is another matter. But how flexible is the system really? In composing the message in Fig. 9.3, I had to take a few liberties and engage in a bit of creativity. There is (currently) no emoji for pizzeria, so I had to create a compound out of pizza and house. The compound of a night-time cityscape emoji and an analog clock at 7 o’clock conveys the notion of seven at night well enough, but it does not really give you last night. A person walking is a decent proxy for go, but of course could also be used for walk: the subway car following it suggests going by subway, but I suppose this could also be interpreted as walked to the subway. This is perhaps fine, but it does not convey exactly the same information as the English phrase went by subway does. Nowhere in my message is the concept of eating or drinking expressed explicitly (perhaps I could have used the “fork and knife” emoji to express that); rather, the message relies on common sense to interpret it as meaning that I ate pizza and drank beer. The penultimate “repeat single track” emoji may not have been a particularly successful representation of the meaning again. Of course, with the practice of composing such messages among a large community of users of different linguistic backgrounds, one could imagine that the community would arrive at conventions for representing these and all sorts of other ideas. Anyone who wished to use the system, and to maximize their chance of being understood, would then have to learn those conventions. But would the resulting system be as flexible as natural language and its graphical encoding, writing? Surely not: even a few thousand emoji will not give one the flexibility of the tens of thousands of words, and their nuances, that one finds in natural language. The system also suffers from the problem that the ordering of the ostensibly semasiographic elements (i.e., elements representing meaning) is still governed by an order that reflects the properties of a particular language, in this case English. Could one in principle design a purely semasiographic system that avoids these issues, and thus avoids the problems of every purely semasiographic system ever proposed as a solution to the perceived shortcomings of conventional language and writing systems? We explore this question a bit more in the next section.

2 https://emojipedia.org/faq.
3 https://en.wikipedia.org/wiki/J%C5%8Dy%C5%8D_kanji.
4 Last night at 7 o’clock I went by subway to have pizza and beer. The pizza was good, but the beer wasn’t. I won’t go to that pizza parlor again.
5 https://www.kickstarter.com/projects/fred/emoji-dick.

9.2 A Fully Expressive Semasiographic System?

One of the issues that makes the encoding of linguistic information without reference to phonology difficult is the fact that, unlike phonology, semantics has no natural temporal order.6 That claim may seem a bit opaque, so some explanation is needed. Phonology obviously has a temporal order: whether you accept the idea that speech is decomposable into segmental phonemes (e.g., three segments /k/, /æ/, /t/ for cat) or insist instead that the smallest sensible unit is something larger, like the syllable, there is some set of basic units in the phonology of any language, and while units that are near each other overlap somewhat in time, they are basically produced in a sequence that develops over time. Once one has determined what the basic phonological units of one’s language are, one can represent them with a relatively small set of written forms, arranged in a more-or-less linear fashion. Phonology has “atoms”. The problem with semantics is that it is not obviously like that: it is far from clear what the “atoms” are, and they are not arranged over time. Consider two camelid animals, namely a dromedary and a llama. Representing the difference between the two with emoji might be straightforward enough, since one would essentially be using pictures, but in a schematic Bliss-like system one would have to find some clear feature that distinguishes the two. If one had a Bliss-like stick figure for a camelid, then a dromedary might be one that has a hump-like appendage on its back, so the distinctive feature might just be whether the figure has a hump or not. Such devices have certainly been used in pictograph-derived written symbols. In the Chinese character 鳥 niǎo ‘bird’, the square at the top that represents the bird’s head contains a horizontal stroke corresponding to the eye. With the crow, a black bird with a black eye, the eye is not so apparent, hence the character for ‘crow’, 烏 wū, lacks this stroke. But these kinds of devices are one-offs, and will only get one so far. Furthermore, all of this presumes one has a feature that is easily depictable. For abstract concepts it becomes much harder to see how to depict them in a non-arbitrary way. Even if one could find the “atoms” of meaning of a word, there is no sense in which the components of that representation are ordered with respect to time. In a vector-space word-embedding representation, for instance, the meaning of cat comes down to a set of real-number values in a vector, and it is obviously silly to think of any of those values as temporally preceding any other: the vector-space representation is geometric, with no temporal dimension. When one combines morphemes into morphologically complex words and sentences, of course, the temporal dimension comes into play, since the syntax of the language dictates that some morphemes must come after others. But again this is not because of the meaning, since different languages choose different orders to convey what is essentially the same meaning: for all intents and purposes English cats like fish, Japanese neko-wa sakana-ga suki desu (cat-topic fish-subject like be) and Welsh hoffith cathod bysgod (like-3sg-pres cats fish) mean the same thing, yet the ordering of the elements is different in each language. The problem for written representations is that writing necessarily imposes a reading direction when one arranges the symbols on a page. Indeed, a problem for Bliss was that once he ordered his semasiographic representations of words into sentences, he necessarily bought into the ordering imposed by some languages but not others, which weakened the sense in which the system could be said to be language independent. To have a truly language-independent semasiographic system, if such were possible, one would have to somehow eliminate reading order.

6 Some of the same points as are made in this section were made in a presentation by Morin (2021).
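The point that word order comes from the grammar of a particular language, not from the meaning, can be illustrated with a toy model. In the hypothetical sketch below, a meaning is a mapping from roles to concepts whose key order is never consulted, and each language contributes only a lexicon and an order of roles. The word forms are those of the cats like fish example above, with the Japanese particles -wa and -ga and the Welsh form bysgod simply listed in the lexicon. The role labels, and treating suki desu as a single predicate unit, are simplifying assumptions; this is not a serious grammar of any of the three languages.

```python
# A language-neutral 'meaning': role -> concept. The code below never relies
# on the order of these keys; each grammar supplies the order.
meaning = {"experiencer": "CAT", "stimulus": "FISH", "predicate": "LIKE"}

# Hypothetical toy grammars: each supplies word forms and a linear order of roles.
GRAMMARS = {
    "English": {
        "order": ("experiencer", "predicate", "stimulus"),
        "lexicon": {"CAT": "cats", "FISH": "fish", "LIKE": "like"},
    },
    "Japanese": {
        "order": ("experiencer", "stimulus", "predicate"),
        "lexicon": {"CAT": "neko-wa", "FISH": "sakana-ga", "LIKE": "suki desu"},
    },
    "Welsh": {
        "order": ("predicate", "experiencer", "stimulus"),
        "lexicon": {"CAT": "cathod", "FISH": "bysgod", "LIKE": "hoffith"},
    },
}


def linearize(meaning, language):
    """Impose one language's word order on an order-free meaning."""
    grammar = GRAMMARS[language]
    return " ".join(grammar["lexicon"][meaning[role]] for role in grammar["order"])


for language in GRAMMARS:
    print(f"{language}: {linearize(meaning, language)}")
# English: cats like fish
# Japanese: neko-wa sakana-ga suki desu
# Welsh: hoffith cathod bysgod
```

The same unordered meaning yields three different sequences. Any arrangement of symbols on a page has to commit to one such sequence (or some other), which is the difficulty Bliss ran into.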
But let us put the ordering issue to one side for a moment, and consider just the issue of graphically representing word meanings. If one assumes a vector-space representation of meaning, then a word’s meaning is just a real-valued vector of some fixed dimension, with related words lying close to one another (though distinct) in the vector space and unrelated words lying further apart. One could of course represent these vectors directly and, in order to deemphasize the linear dimension, arrange them as a ring, where the magnitude of each vector component is represented by a line of greater or lesser length radiating from the center. An example is given in Fig. 9.4. One can certainly see some patterns in the forms. For example, the two common domestic pets, dogs and cats, both have a strong line pointing roughly southeast. Forms of public transport such as buses and planes share prominent lines: one at twelve o’clock and two others at roughly eleven o’clock. One could even go beyond the word and represent phrases or even sentences with sentence embeddings (Conneau et al., 2017). The result would be very reminiscent of the Heptapod B system from the 2016 movie Arrival (Coon, 2020), which was specifically designed to represent meaning in a non-linear, non-temporal and decidedly semasiographic way. Such a system would, needless to say, leave much to be desired as a graphical system for human use. The goal of a (universal) semasiographic language that is simultaneously as expressive as natural language and not tied to the particulars of any natural language is as elusive as ever.

Fig. 9.4 Some word ‘symbols’ based on British National Corpus embeddings (Fares et al., 2017). Panels: bus, train, aeroplane, cat, dog, horse.
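The ring encoding described above can be sketched in a few lines. This is a hypothetical reconstruction, not the procedure actually used to draw Fig. 9.4. It assumes that component i of a d-dimensional vector is placed at angle 2πi/d, measured clockwise from twelve o’clock, and that each line’s length is proportional to the component’s absolute value, scaled so that the largest component reaches the full radius. The function names and the SVG output are illustrative choices; real embeddings, such as the British National Corpus vectors of Fares et al. (2017), would have to be loaded separately and typically have hundreds of dimensions. A small cosine-similarity helper is included because "closeness" in the vector space is usually measured that way.

```python
import math


def ring_glyph(vector, radius=100.0):
    """Map each vector component to the endpoint of a radial line.

    Assumptions (illustrative only): component i sits at angle 2*pi*i/d,
    measured clockwise from twelve o'clock, and its line length is
    proportional to |value|, with the largest magnitude drawn at `radius`.
    Coordinates use the SVG convention: x to the right, y downward.
    """
    d = len(vector)
    peak = max(abs(v) for v in vector) or 1.0  # avoid dividing by zero
    endpoints = []
    for i, value in enumerate(vector):
        theta = 2 * math.pi * i / d
        length = radius * abs(value) / peak
        endpoints.append((length * math.sin(theta), -length * math.cos(theta)))
    return endpoints


def to_svg(endpoints, radius=100.0, margin=10.0):
    """Render the radial lines as a minimal standalone SVG string."""
    size = 2 * (radius + margin)
    c = size / 2  # centre of the drawing
    lines = "".join(
        f'<line x1="{c}" y1="{c}" x2="{c + x:.1f}" y2="{c + y:.1f}" stroke="black"/>'
        for x, y in endpoints
    )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size}" height="{size}">{lines}</svg>')


def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 if orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical 4-dimensional vector, for illustration only.
glyph = ring_glyph([1.0, 0.0, -0.5, 0.0])
print(glyph[0])  # (0.0, -100.0): full-length line at twelve o'clock
svg = to_svg(glyph)  # write this string to a .svg file to view the glyph
```

With real embeddings, nearby words (high cosine similarity) would tend to share long lines at the same clock positions, which is the kind of pattern visible for cat and dog in Fig. 9.4. As the text notes, such glyphs are hardly usable by humans; the sketch only shows how little structure the encoding itself imposes.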

9.3 The Social Status of Writing

For much of history, literacy was the province of a limited number of people. At first it was technicians (scribes) who, as in many modern professions, had learned a technical skill in order to practice their trade, a skill that few other people would acquire. Later on, as the utility of writing spread, so did literacy, but in most cultures literacy was still not something that most people were able to attain. Even with relatively easy-to-learn writing systems, such as those of Ancient Greece and Rome (Harris, 1991), writing still required schooling, and schooling was not generally accessible to the majority of people. Not surprisingly, then, and as we noted at the end of Chap. 4, writing was traditionally associated with prestige in many cultures. Calligraphy as an art form is
traditionally highly valued in East Asian cultures that adopted Chinese writing, and a similar situation obtained in cultures that use the Arabic script. There was no comparable notion of calligraphy for mundane non-linguistic symbols such as potters’ marks or numerical notation. Clearly writing had a prestige status that made it worth investing effort in making it beautiful, just as the artisan who made a high-prestige non-linguistic symbol, such as a coat of arms or a mon, would invest effort in making it beautiful. In a similar vein, the use of pseudoscripts and nonsense writing (Ben-Tor, 2009; Mayor et al., 2014; Houston, 2018) on works by artisans points to the added value that writing can confer on objects.7 Presumably these works were intended by their creators to be taken for genuine objects incorporating writing, in turn reflecting the prestige that writing had in the culture. As Houston (2018) argues (page 43), “pseudo-writing arose when the script was high-prestige and production of writing and skilled responses to it were highly restricted.” “Prestige” would seem to have positive connotations, but there are of course potential downsides to a technology that is restricted to a few: it creates a divide between those who possess the skill, or benefit more or less directly from it, and those who do not. Lévi-Strauss (1955) famously argued that writing supported a hierarchical society and that ultimately “the primary function of written communication is to facilitate slavery” (p. 299). Lévi-Strauss reaches this seemingly surprising conclusion by observing that writing developed in the context of the building of ever more complex and hierarchical societies, and was invariably tied to an administrative structure in which the labor of some people was exploited for the benefit of others.
This seems most apparent in ancient societies, such as early literate Mesopotamia, where writing was a skill learned by a prestigious class of technologists employed to administer the wealth of those in power. But Lévi-Strauss extends the hypothesis to cover the democratization of writing in recent times: universal literacy serves an equally nefarious purpose, insofar as everybody now in principle has access to the same information as anyone else, so that it is no longer possible for an individual to claim ignorance of the law as a defense, since they can and should read the laws. The beneficial aspects of writing, whether the (truthful) recording of historical or scientific knowledge or the creation of literature, are considered secondary to writing’s supposed main purpose. Lévi-Strauss arrived at his realization after an incident, described in a chapter entitled the Writing Lesson (la Leçon d’Écriture), in which a chief of one clan of the Amazonian Nambikwara people, having observed Lévi-Strauss writing with a pencil on paper, starts imitating him by drawing wavy lines on paper. Somewhat later, he uses one of his ‘written’ compositions in a farcical performance in which he proceeds to ‘read’ to his subordinates, over a period of about two hours, from what purports to be a list of gifts to be distributed to members of the tribe. In Lévi-Strauss’s interpretation the chief, who of course does not understand how writing encodes language, nonetheless apparently understands that it does encode language, and furthermore that the literate person wields power over the illiterate. The chief has, per Lévi-Strauss, learned the basic function of writing: a tool whose main purpose is to help people in power maintain that power.

7 Note, though, that Mayor et al. (2014) advance the thesis that at least some of the apparently nonsense inscriptions on Greek vases may have represented names and the like in various Caucasian languages. This thesis has, however, been critiqued by Houston (2018), who suggests it is much more likely that what was being represented was the Greeks’ own perception of the speech of “barbarians”, which to them consisted of incomprehensible sequences of sounds.

Lévi-Strauss’s position has been criticized, most notably by Derrida (1967) (see also Johnson, 1997), who points out that Lévi-Strauss’s own description of the Nambikwara belies his conclusion: even without writing, Nambikwara society was hierarchical and power was often attained or maintained via violent methods. Lévi-Strauss’s Rousseauian ideals of noble savages fly in the face of reality. But taking Lévi-Strauss’s argument at face value (and note that Derrida himself did not dispute the connection between writing and power structure), one might understand the prestige of writing: writing implies wealth and power, and what is prestigious typically involves wealth and power. But is Lévi-Strauss’s focus on writing as a vector for enslavement really valid? In Sect. 3.6.12, in the context of a discussion of khipu, we asked how complex a civilization needed to be to require writing in the strict sense in which we use the term here. Or, inverting the point, what is the minimal amount of information that a symbol system must be able to encode in order to fulfill the needs of the civilization? It is clear in any case that what Lévi-Strauss really objects to is not writing per se, but civilization, since with civilization comes hierarchy, and with hierarchy the potential for enslavement. Thus any symbol system that supports civilization could be thought of as supporting enslavement.
It does not need to be writing: the preliterate accounting system of Mesopotamia and the khipu accounting records served the same purpose (on the latter see Urton, 2010, page 160; on the former see again Sect. 3.2, where we suggested that the first codified symbol systems of the Neolithic were associated with the institutionalization of power structures). Recall also the Sumerian tradition of Nisaba, the Goddess of Grain, as the inventor of writing (Sect. 6.1), which suggests that the origin of writing was to be found in the accounting system used to record amounts of grain and other agricultural commodities. Relevant to the present discussion, Mayshar et al. (2022) argue that the origin of state power structure is strongly tied to the development of grain agriculture. Grain is much more easily stored for long periods than other staple agricultural products, such as tubers or tree fruits. If something can be stored, it can also be hoarded, and hoarding of resources in turn creates the potential to use those resources as a means of manipulating power. By comparing a large number of cultures across several millennia, Mayshar and colleagues show that the development of societies with complex power structures is in fact highly correlated with the presence of grain-based agriculture. In Mesopotamia, grain-based agriculture led to the development of a complex state structure, and it also led to the development of an accounting system to record the agricultural products that were the basis of that structure. Eventually that system led to writing, but before that happened, the symbols were already being used in the service of imposing power. To wield power, one does not need writing in the sense of the symbol system that Lévi-Strauss used to write his field notes. There are, in any case, even more familiar
instances of non-linguistic symbols being used in a way that supports hierarchy. The medieval priest who waved a cross in front of his flock was implementing a form of domination and using a symbol, with its mystical associations, to reinforce it. Symbols are merely accelerants that serve to enhance a hierarchical structure that is already there. The inventors of the first writing did not invent hierarchy or slavery: they merely provided a tool that can aid those ends when used in ways that today might be considered unethical. Similarly, as Lévi-Strauss himself documents, and as Derrida reminds us, the Nambikwara did not have a peaceful egalitarian society prior to Lévi-Strauss’s arrival, so his introduction of ‘writing’ did not introduce dominance. Rather, had the chief’s quest been successful,8 at most it would have enhanced tendencies already there. The concept of a technology accelerating anti-democratic trends should be deeply familiar to anyone living in the 2020s. We have seen exactly this phenomenon in the Internet and social media platforms and their role in accelerating anti-democratic and neo-fascist causes. These involve, among other things, the spread of disinformation, the “normalization” of ignorance and lack of education, and the fostering of existing prejudices. The developers of the Internet and of platforms such as Twitter or Facebook did not invent racism, xenophobia, or ignorance of science or of basic common sense. Nor, indeed, were these technologies invented for the purpose of supporting those malevolent forces. But the technology certainly serves as a vector for speeding their spread. Indeed, given the speed with which disinformation can spread using modern communication technology, and the concomitant ease with which it can effect phase transitions in belief systems, the Internet is far more dangerous than writing ever was.
In any case, one can dispute Lévi-Strauss’s claim about the primary purpose of writing: writing evolved as a more efficient way to record information in graphical form. The fact that it may have almost immediately been used to enhance enslavement is no more indicative of its original purpose than the spread of QAnon conspiracy theories is indicative of the original purpose of the Internet. In the final analysis, Lévi-Strauss simply fell for the age-old fallacy that one can remove bad news by removing its messenger: writing does not enslave people, and were it possible to remove it, and other aspects of civilization with it, we would not revert to some idyllic pre-literate state. At most, the worse angels of our nature would infect society less rapidly. Any technology, even one that (unlike, say, guns) is not inherently dangerous, can bring dangers. But one should avoid facile beliefs about this or that technology being the reason for evils that, in the final analysis, derive not from technology but from human nature.

8 It wasn’t: as Lévi-Strauss goes on to discuss, after the episode with the fake writing, most of the chief’s group abandoned him as their leader. Lévi-Strauss supposes that this is because they had seen through the chief’s aim of abusing the new ‘technology’, though, as Johnson (1997) points out, Lévi-Strauss could hardly have known the reasons, which may have been many.

9.4 Final Thoughts

The old adage that a picture is worth a thousand words may have some truth to it, but it misses the crucial point that those thousand words can convey a lot of information that will not be clear from the picture. Edward Hopper’s evocative painting of a late-night diner, Nighthawks (1942) (Fig. 9.5), no doubt conjures up in the minds of many viewers whole stories about the four characters depicted in the scene. One may wonder, for example, whether the blue-suited man with his back to the viewer is coming home from a late stint at the office, is catching an early breakfast before heading to the station for a train, is a traveling businessman visiting town, or fits any of a dozen other possible stories. The painting tells us none of these things, but any of the possible scenarios could be described in language. Language is still our most versatile form of communication, and unless Elon Musk’s prediction that his Neuralink implants will make language and speech obsolete (Embury-Dennis, 2020) comes true, it seems destined to remain so. So any graphical symbol system that would communicate information as richly as language does had better have the same versatility that language has. Thus far, the best such systems are those that are tied to language in a rather specific set of ways, all of which involve phonology. Graphical symbol systems that are not so tied are never as versatile as true writing systems. This fact is not changed by “inclusivist” views of writing that would extend the term writing to other, apparently non-linguistic, graphical symbol systems. One can argue that, say, khipu were the Incas’ “way of writing”, but unless one can demonstrate that the knots and cords were used to encode the Quechua language, there is simply no way that the system had the versatility that true writing does. Graphical symbol systems that are not tied to language, yet have all the communicative power of language, have been a long-standing dream. Yet despite many attempts, from Wilkins to Bliss, not to mention the ruminations of science fiction writers, nobody has yet demonstrated that such a system is possible. Which brings us back, ultimately, to the point with which we started our discussion. Symbols represent information, and the kinds of symbol systems that we have been discussing in this book represent that information in a conventional and generally agreed-upon way. What information they represent and how they represent it varies greatly, and depending on the answers to those questions, a given system can have fewer or more symbols, and exhibit more or less structure. A graphical symbol system need not be tied to language (i.e., be true writing) in order to exhibit structure: the domain of information that the symbol system represents may be complex enough to require a concomitant complexity of structure in the symbol system that represents it. That is not the same as saying that the domain has the same richness as natural language, however. It just means that structure, ipso facto, is not indicative of something being language. Graphical symbol systems have proven themselves useful for at least 10,000 years. Many such systems have been developed, for a variety of purposes, and many more will likely be developed as new domains of knowledge in science and elsewhere require symbols to represent domain-relevant meanings. Even if Musk’s implant supersedes language, it seems doubtful we would be able to eliminate graphical symbols, since in any case the symbols will long outlast any brain in which the device is implanted.

Fig. 9.5 Edward Hopper’s (1942) Nighthawks. Source: Wikipedia. https://en.wikipedia.org/wiki/Nighthawks_(Hopper)#/media/File:Nighthawks_by_Edward_Hopper_1942.jpg. Image is in the public domain

Figure Credits

Sources for figures are acknowledged in the figure captions. All figures not explicitly acknowledged are my own work. Links for licenses (CC BY-SA, and so forth) are listed below. Note that US road sign images (e.g. Figs. 1.3 and 1.4) are in the public domain, since these come from the “Manual on Uniform Traffic Control Devices”,1 which states that “[a]ny traffic control device design or application provision contained in this Manual shall be considered to be in the public domain.” License links are as follows:

• CC BY-SA 3.0. https://creativecommons.org/licenses/by-sa/3.0/
• CC BY-NC-ND 4.0. https://creativecommons.org/licenses/by-nc-nd/4.0/
• CC BY-SA 4.0. https://creativecommons.org/licenses/by-sa/4.0/
• GNU Free Documentation License, Version 1.2. https://www.gnu.org/licenses/old-licenses/fdl-1.2.en.html

1 https://en.wikipedia.org/wiki/Manual_on_Uniform_Traffic_Control_Devices.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0

Bibliography

Acosta, José de. 1608. Historia natural y moral de las Indias: en que se tratan las cosas notables del cielo, y elementos, metales, plantas y animales dellas, y los ritos, y ceremonias, leyes, y govierno y guerras de los Indios. volume 6. Madrid: Alonso Martin.
Amiet, Pierre. 1966. Il y a 5000 ans les élamites inventaient l’écriture. Archaeologia, 12.16–23.
Andrews, William. 1904. At the Sign of the Barber’s Pole: Studies in Hirsute History. Cottingham, Yorkshire: J.R. Tutin.
Anzellotti, Stefano. 2017. Anterior temporal lobe and the representation of knowledge about people. Proceedings of the National Academy of Sciences, 114(16).4042–4044.
Aronoff, Mark. 1985. Orthography and linguistic theory: The syntactic basis of Masoretic Hebrew punctuation. Language, 61(1).28–72.
Artzi, Bat-Ami. 2010. The secret of the knot: Khipu no. 936 from the Maiman collection. Estudios Latinoamericanos, (30).187–214.
Austin, John L. 1955. How to Do Things with Words. Oxford: Oxford University Press.
Bacon, Bennett; Azadeh Khatiri; James Palmer; Tony Freeth; Paul Pettitt and Robert Kentridge. January 5 2023. An upper palaeolithic proto-writing system and phenological calendar. Cambridge Archaeological Journal, pages 1–19.
Bahdanau, Dmitry; Kyunghyun Cho and Yoshua Bengio. 2015. Machine translation by jointly learning to align and translate. In Proc. of 3rd International Conference on Learning Representations (ICLR).
Baines, John. 1985. Fecundity figures: Egyptian personification and the iconology of a genre. Warminster; Chicago: Aris & Phillips, Bolchazy-Carducci.
Baines, John. 1989. Communication and display: the integration of early Egyptian art and writing. Antiquity, 63(240).471–482.
Baines, John. 2017. Visual and Written Culture in Ancient Egypt. Oxford: Oxford University Press.
Baines, John. April 8 2021. Ancient Egyptian writing, images, and practices in between. URL https://www.youtube.com/watch?v=_pTFrAg-tPM&ab_channel=ILARA. Presentation at ILARA (Institut des langues rares).
Baker, Heather. 2014. House size and household structure: Quantitative data in the study of Babylonian urban living conditions. In Baker, Heather and Michael Jursa, editors, Documentary Sources in Ancient Near Eastern and Greco-Roman Economic History Methodology and Practice, chapter 2, pages 7–23. Oxford: Oxbow Books. Barach, Eliza; Laurie Beth Feldman and Heather Sheridan. 2021. Are emojis processed like words?: Eye movements reveal the time course of semantic processing for emojified text. Psychonomic Bulletin and Review, 24(8).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0

Bibliography

Baratov, S. R. 2019. Signs on ceramic items in ancient Khorezm. In Voyakin, D. A., editor, Tamgas of Pre-Islamic Central Asia/Тамги доисламской центральной Азии, pages 43–57. Samarkand: International Institute for Central Asian Studies. Barbeau, Marius. 1950. Totem Poles. volume 1–2 of Anthropology Series 30, National Museum of Canada Bulletin 119. Ottawa: National Museum of Canada. Barber, Liam; Renate Reniers and Rachel Upthegrove. November 2021. A review of functional and structural neuroimaging studies to investigate the inner speech model of auditory verbal hallucinations in schizophrenia. Translational Psychiatry, pages 1–12. Barbier, Charles. 1815. Essai sur divers procédés d’expéditive française. Paris: De l’Imprimerie de P. Gueffier. Basello, Gian Pietro. 2006. The tablet from Konar Sandal B (Jiroft) and its pertinence to Elamite studies. URL https://www.elamit.net/elam/jiroft.pdf. Basso, Keith and Ned Anderson. 8 June 1973. A Western Apache writing system: The symbols of Silas John. Science, 180(4090). Baxter, William. 1992. A Handbook of Old Chinese Phonology. Number 64 in Trends in Linguistics, Studies and Monographs, Berlin: Mouton de Gruyter. Baxter, William and Laurent Sagart. 2014. Old Chinese: A New Reconstruction. Oxford: Oxford University Press. URL https://en.wiktionary.org/wiki/Appendix:Baxter-Sagart_Old_Chinese_reconstruction. Bedrick, Steven; Russell Beckley; Brian Roark and Richard Sproat. June 2012. Robust kaomoji detection in Twitter. In Workshop on Language and Social Media, Montreal, Canada. Bégouen, Henri. 1924. La magie aux temps préhistoriques. Mémoires de l’Academie des Sciences, II.417–32. Bel-Enguix, Gemma. 2019. The impact of social reputation in language evolution. In Massip-Bonet, Àngels; Gemma Bel-Enguix and Albert Bastardas-Boada, editors, Complexity Applications in Language and Communication Sciences, pages 107–116: Springer. Ben-Tor, Daphna. 2009. Pseudo hieroglyphs on middle bronze age Canaanite scarabs. 
In Andrássy, Petra; Julia Budka and Frank Kammerzell, editors, Non-Textual Marking Systems, Writing and Pseudo Script from Prehistory to Modern Times, number 8 in Lingua Aegyptia — Studia monographica, pages 83–100. Göttingen: Seminar für Ägyptologie und Koptologie. Benz, Marion. 2017. Making the invisible visible: steps towards a ritualized corporate identity. In Bredholt Christensen, Lisbeth and Jesper Tae Jensen, editors, Religion and Material Culture: Studying Religion and Religious Elements on the Basis of Objects, Architecture, and Space, pages 121–167. Turnhout: Brepols. Benz, Marion and Joachim Bauer. 2013. Symbols of power—symbols of crisis? A psycho-social approach to early Neolithic symbol systems. Neo-Lithics, 2/13.11–24. Berendsohn, Roy. November 2020. The hobo hieroglyphs: Their secret symbols, explained. Popular Mechanics. URL https://www.popularmechanics.com/technology/a25174860/hobo-code/. Bliss, Charles. 1965. Semantography. Sydney: Semantography (Blissymbolics) Publications. Bloomfield, Leonard. 1933. Language. Chicago: University of Chicago. Bohn, Willard. 1993. The Aesthetics of Visual Poetry: 1914–1918. Chicago: University of Chicago Press. Boltz, William. 2000. Monosyllabicity and the origin of the Chinese script. Technical Report 143, Max-Planck-Institut für Wissenschaftsgeschichte, Berlin. Boone, Elizabeth and Walter Mignolo, editors. 1994. Writing without Words: Alternative Literacies in Mesoamerica and the Andes. Durham, NC: Duke University Press. Boone, Elizabeth and Gary Urton, editors. 2011. Their Way of Writing: Scripts, Signs and Pictographies in Pre-Columbian America. Cambridge, MA: Harvard University Press. Bouissac, Paul, editor. 1998. Encyclopedia of Semiotics. New York: Oxford University Press. British Columbia Provincial Museum. 1931. British Columbia totem poles. Printed by Charles F. Banfield printer to the King’s most excellent majesty.

Brokaw, Galen. 2005. Toward deciphering the Khipu: Review of Signs of the Inka Khipu: Binary Coding in the Andean Knotted-String Records, by Gary Urton. Journal of Interdisciplinary History, XXXV(4).571–589. Brunila, Mikael and Jack LaViolette. 2022. What company do words keep? Revisiting the distributional semantics of J. R. Firth and Zellig Harris. URL https://arxiv.org/abs/2205.07750. Buckley, Eugene. 2008. Monosyllabicity and the origins of syllabaries. In Linguistic Society of America. URL http://www.ling.upenn.edu/~gene/papers/monosyllabicity.pdf. Burke, Bernard. 1884. The General Armory of England, Scotland, Ireland, and Wales; Comprising a Registry of Armorial Bearings from the Earliest to the Present Time. London: Harrison & Sons. Burke, James. 1978. Connections: Alternative History of Technology. New York: Macmillan. Calverley, Amice and Myrtle Broome. 1935. The Temple of King Sethos I at Abydos II. Chicago: Oriental Institute. Chamberlain, Basil Hall. October 1886. On the quasi-characters called “ya-jirushi”. Transactions of the Asiatic Society of Japan, 15(1).50–57. URL https://books.google.co.jp/books?id=uTw-ygEACAAJ. Chandler, Daniel. 2002. Semiotics: The Basics. London and New York: Routledge, 2nd edition. Changizi, Mark and Shinsuke Shimojo. 2005. Character complexity and redundancy in writing systems over human history. Proceedings of the Royal Society B, 272.267–275. Changizi, Mark; Qiong Zhang; Hao Ye and Shinsuke Shimojo. 2006. The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. The American Naturalist, 167(5). E-article. Chikano, Shigeru (千鹿野 茂). 1993. 日本家紋総鑑 (Nihon Kamon Sōkan). Tokyo: Kadokawa Shoten. Childe, V. Gordon. 1936. Man Makes Himself. London: Watts & Company. Chrisomalis, Stephen. 2009. The origins and co-evolution of literacy and numeracy. In Olson, David and Nancy Torrance, editors, Cambridge Handbook of Literacy. Cambridge: Cambridge University Press. 
Cimarosti, Angelo. November 27 2020. Deciphering Linear Elamite, the world’s oldest phonetic writing system. URL https://www.archaeoreporter.com/en/2020/11/27/deciphering-linear-elamite-the-worlds-oldest-phonetic-writing-system-watch-a-sneak-preview-of-how-to-translate-these-inscriptions/. City of Duncan. 1990. Duncan: City of Totems. Duncan, BC. Clottes, Jean and David Lewis-Williams. 1998. The Shamans of Prehistory: Trance and Magic in the Painted Caves. New York: Harry N. Abrams. Translated by Sophie Hawkes. Conneau, Alexis; Douwe Kiela; Holger Schwenk; Loïc Barrault and Antoine Bordes. September 2017. Supervised learning of universal sentence representations from natural language inference data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 670–680, Copenhagen, Denmark. Association for Computational Linguistics. Coon, Jessica. June 17–19 2020. The linguistics of Arrival: What an alien writing system can teach us about human language. Presented at Grapholinguistics in the 21st Century. URL https://youtu.be/lC9X5mDZBjs. Coulmas, Florian. 1989. Writing Systems of the World. Oxford: Blackwell. Coulmas, Florian. 2003. Writing Systems: An Introduction to their Linguistic Analysis. Cambridge: Cambridge University Press. Cunningham, Alexander. 1875. Archaeological Survey of India, Report for the Year 1872–73. Calcutta: Archaeological Survey of India. Dalley, Stephanie. 2005. The Invention of Cuneiform: Writing in Sumer by Jean-Jacques Glassner; translated by Zainab Bahrani and Marc Van De Mieroop. Technology and Culture, 46(2).408–409. Damasio, Hanna; Thomas J. Grabowski; Daniel Tranel; Richard D. Hichwa and Antonio R. Damasio. 1996. A neural basis for lexical retrieval. Nature, 380.499–505.

Damasio, Hanna; Thomas J. Grabowski; Ralph Adolphs and Antonio R. Damasio. 2004. Neural systems behind word and concept retrieval. Cognition, 92(1–2).179–229. Damerow, Peter and Robert Englund. 1987. Die Zahlzeichensysteme der Archaischen Texte aus Uruk. In Green, M.W. and H.J. Nissen, editors, Zeichenliste der Archaischen Texte aus Uruk, volume 2 of Archaischen Texte aus Uruk. Berlin: Gebrüder Mann Verlag. Damerow, Peter; Robert Englund and Hans Nissen. February 1988. Die Entstehung der Schrift. Spektrum der Wissenschaft, pages 74–85. Damerow, Peter; Robert Englund and Hans Nissen. March 1998. Die ersten Zahldarstellungen und die Entwicklung des Zahlbegriffs. Spektrum der Wissenschaft, pages 46–55. Daniels, Peter. 1992. The syllabic origin of writing and the segmental origin of the alphabet. In Downing, Pamela; Susan Lima and Michael Noonan, editors, The Linguistics of Literacy, number 21 in Typological Studies in Language, pages 83–110. Amsterdam and Philadelphia: John Benjamins. Daniels, Peter. 2018. An Exploration of Writing. Sheffield: Equinox. Daniels, Peter and William Bright, editors. 1996. The World’s Writing Systems. New York: Oxford. DeFrancis, John. 1984. The Chinese Language: Fact and Fantasy. Honolulu, HI: University of Hawaii Press. DeFrancis, John. 1989. Visible Speech: The Diverse Oneness of Writing Systems. Honolulu, HI: University of Hawaii Press. Dehaene, Stanislas. 2009. Reading in the Brain: The Science and Evolution of a Human Invention. New York: Viking. Déjerine, Joseph-Jules. 1892. Contribution à l’étude anatomo-pathologique et clinique des différentes variétés de cécité verbale. Mémoires de la Société de Biologie, 4.61–90. Depauw, Mark. 2009. The semiotics of quarry marks applied to late period and Graeco-Roman Egypt. In Andrássy, Petra; Julia Budka and Frank Kammerzell, editors, Non-Textual Marking Systems, Writing and Pseudo Script from Prehistory to Modern Times, number 8 in Lingua Aegyptia — Studia monographica, pages 205–213. 
Göttingen: Seminar für Ägyptologie und Koptologie. Derrida, Jacques. 1967. De la Grammatologie. Paris: Editions de Minuit. Desset, François. 2014. A new writing system discovered in 3rd millennium BCE Iran: the Konar Sandal ‘geometric’ tablets. Iranica Antiqua, 69.83–109. Diringer, David. 1958. The Alphabet: A Key to the History of Mankind. New York: Hutchinson’s Scientific and Technical Publications. Drew, F.W.M. 1969. Totem Poles of Prince Rupert. Prince Rupert, BC: F.W.M. Drew. Driver, Geoffrey. 1948. Semitic Writing: From Pictograph to Alphabet. Oxford: Oxford University Press. Drucker, Johanna. 1995. Alphabetic Labyrinth: The Letters in History and Imagination. London: Thames & Hudson. Dürscheid, Christa and Dimitris Meletis. 2019. Emojis: a grapholinguistic approach. In Haralambous, Yannis, editor, Graphemics in the 21st Century, pages 167–183. Brest: Fluxus. Dzięgiel-Fivet, Gabriela; Joanna Plewko; Marcin Szczerbiński; Artur Marchewka; Marcin Szwed and Katarzyna Jednoróg. 2021. Neural network for Braille reading and the speech-reading convergence in the blind: Similarities and differences to visual reading. NeuroImage, 231.117851. doi:https://doi.org/10.1016/j.neuroimage.2021.117851. URL https://www.sciencedirect.com/science/article/pii/S1053811921001282. Eco, Umberto. 1976. A Theory of Semiotics. Cambridge, MA: Harvard University Press. Eco, Umberto. 1995. The Search for the Perfect Language. Oxford: Blackwell. Translated by James Fentress. Embury-Dennis, Tom. May 9 2020. Elon Musk predicts human language will be obsolete in as little as five years: ’We could still do it for sentimental reasons’. The Independent. URL https://www.independent.co.uk/tech/elon-musk-joe-rogan-podcast-language-neuralink-grimes-baby-a9506451.html.

Englund, Robert. 1995. Late Uruk period cattle and dairy products: Evidence from proto-cuneiform sources. Bulletin of Sumerian Agriculture, 8(2).33–48. Englund, Robert. 1998. Texts from the Late Uruk Period. Mesopotamien: Späturuk-Zeit und Frühdynastische Zeit (Orbis Biblicus et Orientalis), Freiburg: Peeters Publishers. Josef Bauer, Robert Englund, and Manfred Krebernik, editors. Englund, Robert. 2005. The Invention of Cuneiform: Writing in Sumer by Jean-Jacques Glassner; translated by Zainab Bahrani and Marc Van De Mieroop. Journal of the American Oriental Society, 125(1).113–116. Englund, Robert. 2006. An examination of the ‘textual’ witnesses to Late Uruk world systems. In Gong, Yushu and Yiyi Chen, editors, A Collection of Papers on Ancient Civilizations of Western Asia, Asia Minor and North Africa, pages 1–38. Beijing: Oriental Studies, Special Issue. Englund, Robert. 2011. Accounting in Proto-Cuneiform. In Radner, Karen and Eleanor Robson, editors, Oxford Handbook of Cuneiform Culture, chapter 2, pages 32–50. Oxford: Oxford University Press. Faber, Alice. 1992. Phonemic segmentation as epiphenomenon: Evidence from the history of alphabetic writing. In Downing, Pamela; Susan Lima and Michael Noonan, editors, The Linguistics of Literacy, number 21 in Typological Studies in Language, pages 83–110. Amsterdam and Philadelphia: John Benjamins. Fares, Murhaf; Andrey Kutuzov; Stephan Oepen and Erik Velldal. 22–24 May 2017. Word vectors, reuse, and replicability: Towards a community repository of large-text resources. In Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, pages 271–276, Gothenburg. Farmer, Steve; John Henderson and Michael Witzel. 2002. Neurobiology, layered texts, and correlative cosmologies: a cross-cultural framework for premodern history. Bulletin of the Museum of Far Eastern Antiquities, 72.48–90. Farmer, Steve; Richard Sproat and Michael Witzel. 2004. 
The collapse of the Indus-script thesis: The myth of a literate Harappan civilization. Electronic Journal of Vedic Studies, 11(2). Farmer, Steve; Richard Sproat and Michael Witzel. 2009. A refutation of the claimed refutation of the nonlinguistic nature of Indus symbols: Invented data sets in the statistical paper of Rao et al. (Science, 2009). URL http://www.safarmer.com/Refutation3.pdf. Farnell, Brenda. 1996. Movement notation systems. In Daniels, Peter and William Bright, editors, The World’s Writing Systems, pages 855–879. New York: Oxford. Feldman, Richard. 2003. Home before the Raven Caws: the Mystery of the Totem Pole. Cincinnati, OH: Emmis Books. Fenollosa, Ernest. 1920. The Chinese written character as a medium for poetry. In Pound, Ezra, editor, Instigations, pages 357–388. New York: Boni and Liveright. Fernández, Gimena. 2015. Una nueva relación entre escritura, historia y memoria en los Andes revelada por un cronista andino. Revista Andina, 53.113–136. Ferrara, Silvia. 2022. The Greatest Invention: A History of the World in Nine Mysterious Scripts. New York: Macmillan. Firth, John. 1957. Papers in Linguistics 1934–1951. Oxford: Oxford University Press. Fox-Davies, Arthur Charles. 1909. A Complete Guide to Heraldry. New York: Dodge Publishing. Friar, Stephen and John Ferguson. 1993. Basic Heraldry. London: Herbert Press. Friberg, Jöran. 1978–1979. The early roots of Babylonian mathematics. Technical report, University of Göteborg, Department of Mathematics, Göteborg. Fuchs, Friedrich. 2009. Über die Steinmetzzeichen – am Regensburger Dom und darüber hinaus. In Andrássy, Petra; Julia Budka and Frank Kammerzell, editors, Non-Textual Marking Systems, Writing and Pseudo Script from Prehistory to Modern Times, number 8 in Lingua Aegyptia — Studia monographica, pages 233–254. Göttingen: Seminar für Ägyptologie und Koptologie. Gair, James and Bruce Caine. 1996. Dhivehi writing. 
In Daniels, Peter and William Bright, editors, The World’s Writing Systems, pages 564–568. Oxford: Oxford University Press.

Garcés, Fernando. 2017. Escrituras Andinas de Ayer y Hoy. Cochabamba, Bolivia: Instituto de Investigaciones Antropológicas y Museo Arqueológico de la Universidad Mayor de San Simón. Garfield, Viola. 1940. The Seattle Totem Pole. Seattle: University of Washington Press. Gelb, Ignace. 1952. A Study of Writing. Chicago: University of Chicago Press. Gelb, Ignace. 1963. A Study of Writing. Chicago: University of Chicago Press, 2nd edition. Glassner, Jean-Jacques. 2000. Écrire à Sumer: L’invention du cunéiforme. Paris: Seuil. Glassner, Jean-Jacques. 2003. The Invention of Cuneiform: Writing in Sumer. Baltimore: Johns Hopkins University Press. Translated by Zainab Bahrani and Marc van de Mieroop. Gnanadesikan, Amalia. 2009. The Writing Revolution: Cuneiform to the Internet: Wiley. Gnanadesikan, Amalia and Richard Sproat. 2018. Writing systems. In Oxford Bibliographies: Oxford University Press. URL https://www.oxfordbibliographies.com/view/document/obo-9780199772810/obo-9780199772810-0221.xml. Goldwasser, Orly. March-April 2010. How the alphabet was born from hieroglyphs. Biblical Archaeology Review, 36(02).40–53. Goody, Jack. 1977. The Domestication of the Savage Mind. Themes in the Social Sciences, Cambridge: Cambridge University Press. Goody, Jack and Ian Watt. 1968. The consequences of literacy. In Goody, J., editor, Literacy in Traditional Societies, pages 27–68. New York: Cambridge University Press. Grainger, Jonathan and Carol Whitney. 2004. Does the huamn mnid raed wrods as a wlohe? Trends in Cognitive Sciences, 8(2).58–59. Graves, Thomas. 1984. The Pennsylvania German Hex Sign: A Study in Folk Process. PhD thesis, University of Pennsylvania, Philadelphia, PA. Groß, Hans. 1906. Criminal Investigation, a Practical Handbook for Magistrates, Police Officers and Lawyers. Madras: A. Krishnamachari. Translated by John Adam and J. Collyer Adam. Gunn, Sisvan William. 1965. The Totem Poles in Stanley Park, Vancouver, B.C. Vancouver, BC: Macdonald, 2nd edition. Gunn, Sisvan William. 
1966. Kwakiutl House and Totem Poles at Alert Bay. Vancouver, BC: Whiterocks Publications. Gunn, Sisvan William. 1967. Haida Totems in Wood and Argillite. Vancouver, BC: Whiterocks Publications. Haarmann, Harald. 2008. The Danube script and other ancient writing systems. Journal of Archaeomythology, 4(1).12–46. Haarmann, Harald and Joan Marler. 2008. An introduction to the study of the Danube script. Journal of Archaeomythology, 4(1).1–11. Handel, Zev. 2019. Sinography: The Borrowing and Adaptation of the Chinese Script. Number 1 in Language, Writing and Literary Culture in the Sinographic Cosmopolis, Leiden: Brill. Hannas, William. 1997. Asia’s Orthographic Dilemma. Honolulu: University of Hawaii Press. Hannas, William. 2003. The Writing on the Wall: How Asian Orthography Curbs Creativity. Philadelphia: University of Pennsylvania Press. Harris, Roy. 1995. Signs of Writing. London: Routledge. Harris, William. 1991. Ancient Literacy. Cambridge, MA: Harvard University Press. Harris, Zellig. 1951. Methods in Structural Linguistics. Chicago, IL: University of Chicago Press. Helmont, Franciscus Mercurius van. 1667. Alphabeti vere naturalis hebraici brevissima delineatio: Sulzbach. Translated with an introduction and annotations by Allison P. Coudert & Taylor Corse, Brill, Leiden, 2007. Henri, Pierre. 1952. La Vie et l’Œuvre de Louis Braille. Paris: Presses Universitaires de France. Heyerdahl, Thor. 1971. The Ra Expeditions. New York: Signet. Translated by Patricia Crampton. Holmyard, Eric. 1957. Alchemy. Harmondsworth: Penguin. Homeyer, C. G. 1870. Die Haus- und Hofmarken. Berlin: Decker. Honda, Soichiro (本田 総一郎). 2004. 日本の家紋大全 (Nihon no Kamon Taizen). Tokyo: Goto Shoin.

Houston, Stephen. 2018. Writing that isn’t: Pseudo-scripts in comparative view. L’Homme, 227/228.21–48. Hove, Michael; Johannes Stelzer; Till Nierhaus; Sabrina Thiel; Christopher Gundlach; Daniel Margulies; Koene Van Dijk; Robert Turner; Peter Keller and Björn Merker. 2015. Brain network reconfiguration and perceptual decoupling during an absorptive state of consciousness. Cerebral Cortex, pages 1–9. Huang, Sheng-Cheng; Randolph Bias and David Schnyer. 2015. How are icons processed by the brain? Neuroimaging measures of four types of visual stimuli used in information systems. Journal of the Association for Information Science and Technology, 66(4).702–720. Hunt, Patrick. October 29 2012. Medieval guild signs and emblem traditions: Zunftzeichen. Electrum Magazine. URL http://www.electrummagazine.com/2012/10/medieval-guild-signs-and-emblem-traditions-zunftzeichen/. Hunter, G. R. 1929. The Script of Harappa and Mohenjo-Daro and Its Connection with Other Scripts. PhD thesis, Oxford University, Oxford. Hyland, Sabine. 2014. Ply, markedness and redundancy: New evidence for how Andean Khipus encoded information. American Anthropologist, 116(3).643–648. Hyland, Sabine. 2017. Writing with twisted cords: the inscriptive capacity of Andean Khipus. Current Anthropology, 58(3).412–419. Hyland, Sabine. January 15 2021. Iconic signs in a non-iconic writing system: Khipus with potatoes, feathers, figurines and other objects. Presented at INSCRIBE Workshop on “Invention of Writing: Production of Images and Language Notation”. URL https://www.inscribercproject.com/workshop/session4.php. Hyland, Sabine; Gene Ware and Madison Clark. 2014. Knot direction in a Khipu/alphabetic text from the Central Andes. Latin American Antiquity, 25(2).189–197. Japan Info. January 23 2020. Studio Ghibli: The (not so) hidden meaning of Spirited Away. URL https://jpninfo.com/29501. International Phonetic Association. 1999. 
Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet. Cambridge: Cambridge University Press. Istrin, Viktor A. 1965a. Razvitie pis’ma / Развитие Письма. Moscow, USSR: Soviet Academy of Sciences / Академия Наук СССР, Nauka / Наука. In Russian. Istrin, Viktor A. 1965b. Vozniknovenie i razvitie pis’ma / Возникновение и Развитие Письма. Moscow, USSR: Soviet Academy of Sciences / Академия Наук СССР, Nauka / Наука. Jackson, Anthony. 1984. The Symbol Stones of Scotland: a Social Anthropological Resolution of the Problem of the Picts. Elgin: Orkney Press. Jackson, Anthony. 1990. The Pictish Trail. Elgin: Orkney Press. Jauhiainen, Tommi; Marco Lui; Marcos Zampieri; Timothy Baldwin and Krister Lindén. 2019. Automatic language identification in texts: A survey. Journal of Artificial Intelligence Research, 65.675–782. Johnson, Christopher. 1997. Lévi-Strauss: The writing lesson revisited. The Modern Language Review, 92(3).599–612. Kaiser, David. 2005. Physics and Feynman’s diagrams. American Scientist, 93.156–165. Kammerzell, Frank. 2009. Defining non-textual marking systems, writing, and other systems of graphic information processing. In Andrássy, Petra; Julia Budka and Frank Kammerzell, editors, Non-Textual Marking Systems, Writing and Pseudo Script from Prehistory to Modern Times, number 8 in Lingua Aegyptia — Studia monographica, pages 277–308. Göttingen: Seminar für Ägyptologie und Koptologie. Keightley, David. 1978. Sources of Shang History: The Oracle-Bone Inscriptions of Bronze Age China. Berkeley and Los Angeles: University of California Press. Kelly, Piers. 2020. Australian message sticks: Old questions, new directions. Journal of Material Culture, 25(2).133–152. doi:10.1177/1359183519858375. URL https://doi.org/10.1177/1359183519858375.

Kelly, Piers; James Winters; Helena Miton and Olivier Morin. 2021. The predictable evolution of letter shapes: An emergent script of West Africa recapitulates historical change in writing systems. Current Anthropology, 62(6).1–38. Kempelen, Wolfgang von. 1792. Mechanismus der menschlichen Sprache. Vienna: Degen. New edition (2017) edited by Fabian Brackhane, Richard Sproat and Jürgen Trouvain, http://www.coli.uni-saarland.de/~trouvain/Kempelen-Web_2017_07_31.pdf. Kim, Ko Woon; Sang Won Lee; Jeewook Choi; Tae Min Kim and Bumseok Jeong. 2015. Neural correlates of text-based emoticons: a preliminary fMRI study. Brain and Behavior, 6 (8). Kirby, Simon. 1999. Function, Selection and Innateness: the Emergence of Language Universals. Oxford: Oxford University Press. Kirby, Simon; Tom Griffiths and Kenny Smith. 2014. Iterated learning and the evolution of language. Current Opinion in Neurobiology, 28.108–114. Koskenniemi, Kimmo. 1981. Syntactic methods in the study of the Indus script. Studia Orientalia, 50.125–136. Koskenniemi, Seppo; Asko Parpola and Simo Parpola. 1970. A method to classify characters of unknown ancient scripts. Linguistics, 61.65–91. Lambert, Maurice. 1966. Pourquoi l’écriture est née en Mésopotamie. Archaeologia, 12.24–31. Lawler, Andrew. 2007. Ancient writing or modern fakery? Science, 317(5838).588–589. Lazaridou, Angeliki; Karl Moritz Hermann; Karl Tuyls and Stephen Clark. 2018. Emergence of linguistic communication from referential games with symbolic and pixel input. In International Conference on Learning Representations. Le Novère, Nicolas et al. 2009. The systems biology graphical notation. Nature Biotechnology, 27.735–741. URL http://www.nature.com/nbt/journal/v27/n8/full/nbt.1558.html. LeCun, Yann; Corinna Cortes and Christopher Burges. n.d. The MNIST database of handwritten digits. URL http://yann.lecun.com/exdb/mnist/. Lee, Rob; Philip Jonathan and Pauline Ziman. 2010a. 
Pictish symbols revealed as a written language through application of Shannon entropy. Proceedings of the Royal Society A: Mathematical, Physical & Engineering Sciences, pages 1–16. Lee, Rob; Philip Jonathan and Pauline Ziman. 2010b. A response to Richard Sproat on random systems, writing, and entropy. Computational Linguistics, 36(4).791–794. Lévi-Strauss, Claude. 1955. Tristes Tropiques. Paris: Librairie Plon. Levinson, Stephen C. 1983. Pragmatics. Cambridge Textbooks in Linguistics, Cambridge: Cambridge University Press. Lewis-Williams, David. 1997. Harnessing the brain: Vision and shamanism in Upper Paleolithic Western Europe. In Conkey, Meg; O. Soffer; D. Stratmann and N.G. Jablonski, editors, Beyond Art: Pleistocene Image and Symbol, number 23, pages 321–342: Memoirs of the California Academy of Sciences. Li, Lincan (李霖灿). 2001. 纳西象形文字字典 (Naxizu xiangxing biaoyin wenzi zidian): Yunnan Minzu Chubanshe. Lipowska, Dorota and Adam Lipowski. 2022. Emergence and evolution of language in multiagent systems. Lingua, 272.103331. Lucas, Christopher. 1979. The scribal tablet-house in ancient Mesopotamia. History of Education Quarterly, 19(3).305–332. MacGinnis, John; M. Willis Monroe; Dirk Wicke and Timothy Matney. 2012. Artefacts of cognition: the use of clay tokens in a Neo-Assyrian provincial administration. Cambridge Archaeological Journal, 24(2).289–306. Mack, Alastair. 1997. Field Guide to the Pictish Symbol Stones. Balgavies, Angus: The Pinkfoot Press. Updated 2006. Magerman, David and Mitchell Marcus. 1990. Parsing a natural language using mutual-information statistics. In Proceedings of the Eighth Annual Meeting of the American Association for Artificial Intelligence. Mahadevan, Iravatham. 1977. The Indus Script: Texts, Concordance and Tables. Calcutta and Delhi: Memoirs of the Archaeological Survey of India.

Mahr, August. 1945. Origin and significance of Pennsylvania Dutch barn symbols. Ohio History: The Scholarly Journal of the Ohio Historical Society, 54(1).1–32. Malafouris, Lambros. 2013. How Things Shape the Mind: A Theory of Material Engagement. Cambridge, MA: MIT Press. Malin, Edward. 1986. Totem Poles of the Pacific Northwest Coast. Portland, OR: Timber Press. Mallery, Garrick. 1883. Pictographs of the North American Indians: A Preliminary Paper. Number 4 in Annual Report of the Bureau of Ethnology, Washington, DC: Smithsonian Institution. Marshall, John. 1931. Mohenjo-Daro and the Indus Civilization. London: Arthur Probsthain. Martin, Alex. 2007. The representation of object concepts in the brain. Annual Review of Psychology, 58.25–45. Mayor, Adrienne; John Colarusso and David Saunders. 2014. Making sense of nonsense inscriptions associated with Amazons and Scythians on Athenian vases. Hesperia: The Journal of the American School of Classical Studies at Athens, 83(3).447–493. Mayshar, Joram; Omer Moav and Luigi Pascali. 2022. The origin of the state: Land productivity or appropriability? Journal of Political Economy, 130(4).1091–1144. URL https://doi.org/10.1086/718372. McCawley, James. 1996. Music notation. In Daniels, Peter and William Bright, editors, The World’s Writing Systems, pages 847–854. New York: Oxford. Mikolov, Tomas; Ilya Sutskever; Kai Chen; Greg Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Neural Information Processing. Mitchell, Tom; Svetlana Shinkareva; Andrew Carlson; Kai-Min Chang; Vicente Malave; Robert Mason and Marcel Adam Just. May 2008. Predicting human brain activity associated with the meanings of nouns. Science, 320.1191–1195. Mnih, Volodymyr; Nicolas Heess; Alex Graves and Koray Kavukcuoglu. 2014. Recurrent models of visual attention. In Proc. of Neural Information Processing Systems (NIPS), pages 2204–2212. Moorehouse, Alfred. 1953. The Triumph of the Alphabet. 
New York: Henry Schuman. Morimoto, Keiichi (森本 景一). 2006. 女紋 (Onna Mon). Kyoto: Senshoku Hosei Morimoto. Morimoto, Yuya (森本 勇矢). 2013. 日本の家紋大辞典 (Nihon no Kamon Daijiten). Tokyo: Nihon Jitsugyo Publishing. Morin, Olivier. January 12 2021. Solving the puzzle of ideography. Presented at INSCRIBE Workshop on “Invention of Writing: Production of Images and Language Notation”. URL https://www.inscribercproject.com/workshop/session1.php. Morin, Olivier; Piers Kelly and James Winters. 2020. Writing, graphic codes, and asynchronous communication. Topics in Cognitive Science, 10(4).1–17. URL https://doi.org/10.1111/tops.12386. Mullaney, Thomas. 2017. The Chinese Typewriter: A History. Cambridge, MA: MIT Press. Muscarella, Oscar White. 2005. Review: Jiroft and “Jiroft-Aratta” a review article of Yousef Madjidzadeh, Jiroft: The Earliest Oriental Civilization. Bulletin of the Asia Institute, 15.173–198. Muscarella, Oscar White. 2008. Jiroft III. General survey of excavations. In Encyclopedia Iranica, volume XIV/6, pages 653–656: Encyclopedia Iranica Foundation. Namkung, Ho; Sun-Hong Kim and Akira Sawa. 2017. The insula: an underestimated brain area in clinical neuroscience, psychiatry, and neurology. Trends in Neuroscience, 40(4).200–207. Naqvi, Nasir and Antoine Bechara. 2009. The hidden island of addiction: the insula. Trends in Neuroscience, 32(1).56–67. Niyogi, Partha. 2006. The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press. Nunberg, Geoffrey. 1995. The Linguistics of Punctuation. CSLI Publications, Chicago: University of Chicago Press. O’Grady, Cathleen and Kenny Smith. 2018. Models of language evolution.

Oi, Mariko. 2012. Adult adoptions: Keeping Japan’s family firms alive. BBC News Magazine. URL https://www.bbc.com/news/magazine-19505088. Okudaira, Shizue (奥平 志づ江). 1983. 日本の家紋 (Nihon no Kamon). Kasei Kenkyū, 15. 1–4. Olivier, D. C. 1968. Stochastic Grammars and Language Acquisition Devices. PhD thesis, Harvard University, Cambridge, MA. Oppenheim, A. Leo. 1959. On an operational device in Mesopotamian bureaucracy. Journal of Near Eastern Studies, 18.121–128. Overmann, Karenleigh. 2016. Beyond writing: The development of literacy in the Ancient Near East. Cambridge Archaeological Journal, 26(2).285–303. Palka, Joel. 2010. The development of Maya writing. In Woods, Christopher; Emily Teeter and Geoff Emberling, editors, Visible Language: Inventions of Writing in the Ancient Middle East and Beyond, number 32 in Oriental Institute Museum Publications, pages 225–230. Chicago: Oriental Institute. Parpola, Asko. 1994. Deciphering the Indus Script. New York: Cambridge University Press. Parpola, Asko. 2008. Is the Indus script indeed not a writing system? In Airāvati: Felicitation Volume in Honour of Iravatham Mahadevan, pages 111–131. Chennai: Varalaaru. Parpola, Asko; Seppo Koskenniemi; Simo Parpola and Pentti Aalto. 1969. Decipherment of the Proto-Dravidian inscriptions of the Indus Civilization: First announcement. Technical report, Scandinavian Institute of Asian Studies, Copenhagen. Patterson, Karalyn; Peter Nestor and Timothy Rogers. December 2007. Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience, 8.976–988. Peirce, Charles. 1868. On a new list of categories. In Proceedings of the American Academy of Arts and Sciences, volume 7, pages 287–298. Peirce, Charles. 1934. Collected Papers: Volume V. Pragmatism and Pragmaticism. Cambridge, MA: Harvard University Press. Peters, Joris and Klaus Schmidt. 2004. 
Animals in the symbolic world of pre-pottery neolithic Göbekli Tepe, south-eastern Turkey: a preliminary assessment. Anthropozoologica, 39(1).179– 218. Pope, Maurice. 1975. The Story of Decipherment: From Egyptian Hieroglyphs to Linear B. Hong Kong: Thames and Hudson. Pope, Maurice. 1999. The Story of Decipherment: From Egyptian Hieroglyphs to Maya Script. Hong Kong: Thames and Hudson, revised edition. Possehl, Gregory. 1996. The Indus Age: The Writing System. Philadelphia: University of Philadelphia Press. Powell, Barry. 2009. Writing: Theory and History of the Technology of Civilization. Chichester: Wiley-Blackwell. Praßl, Daniela. 2017. Kriminalwissenschaftliche Beschäftigung mit “Gaunersprache” und “Gaunerzinken” um 1900 unter besonderer Berücksichtigung der Grazer Schule der Kriminologie. Master’s thesis, Karl-Franzens-Universität Graz, Graz. Pülvermuller, Friedemann. 2013a. How neurons make meaning: brain mechanisms for embodied and abstract-symbolic semantics. Trends in Cognitive Sciences, 17(9).458–470. Pülvermuller, Friedemann. 2013b. Semantic embodiment, disembodiment or misembodiment? In search of meaning in modules and neuron circuits. Brain & Language, 127.86–103. Raman, TV. 1994. Audio System for Technical Readings. PhD thesis, Cornell University, Ithaca, NY. Rao, Rajesh. 2010. Probabilistic analysis of an ancient undeciphered script. IEEE Computer, 43 (3).76–80. Rao, Rajesh; Nisha Yadav; Mayank Vahia; Hrishikesh Joglekar; R. Adhikari and Iravatham Mahadevan. 2009a. Entropic evidence for linguistic structure in the Indus script. Science, 324(5931).1165.

Rao, Rajesh; Nisha Yadav; Mayank Vahia; Hrishikesh Joglekar; R. Adhikari and Iravatham Mahadevan. August 5 2009b. A Markov model of the Indus script. Proceedings of the National Academy of Sciences, 106(33).13685–13690. Rao, Rajesh; Nisha Yadav; Mayank Vahia; Hrishikesh Joglekar; R. Adhikari and Iravatham Mahadevan. 2010. Entropy, the Indus script, and language: A reply to R. Sproat. Computational Linguistics, 36(4).795–805. Rao, Rajesh; Rob Lee; Nisha Yadav; Mayank Vahia; Philip Jonathan and Pauline Ziman. 2015. On statistical measures and ancient writing systems. Language, 91(4).198–205. Rawles, Myrtle. 1966. ‘Boontling’: Esoteric language of Boonville, California. Western Folklore, 25(2).93–103. California Folklore Society. Reali, Florencia; Nick Chater and Morten H Christiansen. 2014. The paradox of linguistic complexity and community size. In Evolution of language: Proceedings of the 10th international conference (evolang10), pages 270–277. World Scientific. Reefe, Thomas. 1977. Lukasa: A Luba memory device. African Arts, 10(4).48–50. Reich, Lior; Marcin Szwed; Laurent Cohen and Amir Amedi. 2011. A ventral visual stream reading center independent of visual experience. Current Biology, 21.363–368. Reinach, Salomon. 1903. L’art et la magie: à propos des peintures et des gravures de l’âge du renne. L’Anthropologie, XIV.257–266. Rhys, John. 1892. The inscriptions and language of the northern Picts. Proceedings of the Society of Antiquaries of Scotland, 26.263–351. Robinson, Andrew. 2007. The Story of Writing: Alphabets, Hieroglyphs & Pictographs. London: Thames & Hudson, 2nd edition. Robson, Eleanor. 2001. The tablet house: a scribal school in Old Babylonian Nippur. Revue d’Assyriologie et d’Archéologie orientale, 93(1).39–66. Robson, Eleanor. 2005. The Invention of Cuneiform: Writing in Sumer by Jean-Jacques Glassner; translated by Zainab Bahrani and Marc Van De Mieroop. American Journal of Archaeology, 110(1).171–172. Rogers, Henry. 2005. Writing Systems: A Linguistic Approach. Malden, MA: Blackwell. Rosen, Howard J; Maria Luisa Gorno-Tempini; WP Goldman; RJ Perry; N Schuff; Michael Weiner; R Feiwell; J H Kramer and Bruce L Miller. 2002. Patterns of brain atrophy in frontotemporal dementia and semantic dementia. Neurology, 58(2).198–208. Roth, Walter. 1897. Ethnological Studies among the North-West-Central Queensland Aborigines. Brisbane: Edmund Gregory. Rougé, Emmanuel de. 1859. Mémoire sur l’origine égyptienne de l’alphabet phénicien. Comptes rendus des séances de l’Académie des Inscriptions et Belles-Lettres, 3.115–124. Royal Commission on the Ancient and Historical Monuments of Scotland. 1994. Pictish symbol stones: A handlist. Sakurai, Yasuhisa; Imari Mimura and Toru Mannen. 2008. Agraphia for kanji resulting from a left posterior middle temporal gyrus lesion. Behavioural Neurology, 19(3).93–106. Salomon, Frank. 2001. How an Andean ‘writing without words’ works. Current Anthropology, 42(1).1–27. Sampson, Geoffrey. 1985. Writing Systems. Stanford, CA: Stanford University Press. Sampson, Geoffrey. 2012. Writing Systems. Stanford, CA: Stanford University Press, 2nd edition. Saturno, William; David Stuart and Boris Beltrán. January 5 2006. Early Maya writing at San Bartolo, Guatemala. Science Express, pages 1–3. Saussure, Ferdinand de. 1916. Cours de linguistique générale. Paris: Payot. Schmandt-Besserat, Denise. 1992. Before Writing. Austin, TX: University of Texas Press. Schmandt-Besserat, Denise. 1996. How Writing Came About. Austin, TX: University of Texas Press. Sebeok, Thomas, editor. 1977. A Perfusion of Signs. Bloomington, IN: Indiana University Press. Sebeok, Thomas. 2001. Signs: An Introduction to Semiotics. Toronto, ON: University of Toronto Press, 2nd edition.

Sehasseh, El Mehdi; Philippe Fernandez; Steven Kuhn; Mary Stiner; Susan Mentzer; Debra Colarossi; Amy Clark; François Lanoe; Matthew Pailes; Dirk Hoffmann; Alexa Benson; Edward Rhodes; Moncef Benmansour; Abdelmoughit Laissaoui; Ismail Ziani; Paloma Vidal-Matutano; Jacob Morales; Youssef Djellal; Benoit Longet; Jean-Jacques Hublin; Mohammed Mouhiddine; Fatima-Zohra Rafi; Kayla Beth Worthey; Ismael Sanchez-Morales; Noufel Ghayati and Abdeljalil Bouzouggar. 22 September 2021. Early middle stone age personal ornaments from Bizmoune Cave, Essaouira, Morocco. Science Advances, (7:eabi8620). Seidl, Ursula. 1989. Die babylonischen Kudurru-Reliefs. Symbole mesopotamischer Gottheiten: Universitätsverlag Freiburg. Shannon, Claude. 1948. A mathematical theory of communication. Bell System Technical Journal, 27.379–423. Shannon, Claude. 1951. Prediction and entropy of printed English. The Bell System Technical Journal, 30(1).50–64. Shaughnessy, Edward. 2010. The beginnings of writing in China. In Woods, Christopher; Emily Teeter and Geoff Emberling, editors, Visible Language: Inventions of Writing in the Ancient Middle East and Beyond, number 32 in Oriental Institute Museum Publications, pages 214–224. Chicago: Oriental Institute. Silvester, Victor. 1935. Modern Ballroom Dancing. London: Herbert Jenkins Ltd, 8th edition. Skånberg, Tuve. 2003. Glömda gudstecken från fornkyrklig dopliturgi till allmogens bomärken. Number 45 in Bibliotheca Historico-Ecclesiastica Lundensis, Lund: Lund Universitets Kyrkohistoriska Arkiv. Slater, Stephen. 2002. The Complete Book of Heraldry. London: Lorenz Books. Smagulov, E. A. and S. A. Yatsenko. 2019. Series of signs in the oases of Southern Kazakhstan. In Voyakin, D. A., editor, Tamgas of Pre-Islamic Central Asia/Тамги доисламской центральной Азии, pages 159–197. Samarkand: International Institute for Central Asian Studies. Smith, R.J. and R.K. Beardsley. 2004. Japanese Culture: Its Development and Characteristics. Routledge Library Editions: Anthropology and Ethnography: Routledge. Sproat, Richard. 2000. A Computational Theory of Writing Systems. Cambridge: Cambridge University Press. Sproat, Richard. August 2009. Symbols, meaning and statistics. Invited talk at EMNLP, Singapore. Sproat, Richard. 2010a. Ancient symbols, computational linguistics, and the reviewing practices of the general science journals. Computational Linguistics, 36(3). Sproat, Richard. 2010b. Language, Technology, and Society. Oxford: Oxford University Press. Sproat, Richard. 2010c. Reply to Rao et al. and Lee et al. Computational Linguistics, 36(4).807–816. Sproat, Richard. 2014. A statistical comparison of written language and non-linguistic symbol systems. Language, 90(2).457–481. Sproat, Richard. 2015. On misunderstandings and misrepresentations: A reply to Rao et al. Language, 91(4).206–208. Sproat, Richard. 2017. A computational model of the discovery of writing. Written Language and Literacy, 20(2).194–226. Sproat, Richard. 2021. Writing systems. In Dresher, B. Elan and Harry van der Hulst, editors, The Oxford History of Phonology, chapter 2, pages 19–37. Oxford: Oxford University Press. Sproat, Richard and Steve Farmer. 2005. Morphology and the Harappan gods. In Arppe, Antti; Lauri Carlson; Krister Lindén; Jussi Piitulainen; Mickael Suominen; Martti Vainio; Hanna Westerlund and Anssi Yli-Jyrä, editors, Inquiries into Words, Constraints and Contexts: Festschrift for Kimmo Koskenniemi on his 60th Birthday, pages 126–135. Stanford, CA: CSLI Publications. Sproat, Richard and Alexander Gutkin. 2021. The taxonomy of writing systems: How to measure how logographic a system is. Computational Linguistics, 47(3).1–52.

Sproat, Richard and Chilin Shih. 1990. A statistical method for finding word boundaries in Chinese text. Computer Processing of Chinese and Oriental Languages, 4.336–351. Stauder, Andréas. 2010. The earliest Egyptian writing. In Woods, Christopher; Emily Teeter and Geoff Emberling, editors, Visible Language: Inventions of Writing in the Ancient Middle East and Beyond, number 32 in Oriental Institute Museum Publications, pages 137–147. Chicago: Oriental Institute. Stauder, Andréas. June 2015. Figurative, yet internally derived: On differential iconicity in Egyptian. Presented at the Second Signs of Writing conference, Beijing, China. Steels, Luc. 2012. Experiments in Cultural Language Evolution. Amsterdam: John Benjamins. Steinthal, Heymann. 1852. Die Entwicklung der Schrift, nebst einem offenen Sendschreiben an Herrn Professor Pott. Berlin: Ferd. Dümmler’s Verlagsbuchhandlung. Stewart, Hilary. 1990. Totem Poles. Seattle, WA: University of Washington Press. Stewart, Hilary. 1993. Looking at Totem Poles. Seattle, WA: University of Washington Press. Stone, Elizabeth. 1987. Nippur neighborhoods. Number 44 in Studies in Ancient Oriental Civilization, Chicago: Oriental Institute of the University of Chicago. Streicher, Hubert. 1928. Die graphischen Gaunerzinken. Number 5 in Kriminologische Abhandlungen, Wien: Springer. Sutherland, Elizabeth. 1997. The Pictish Guide. Edinburgh: Birlinn Ltd. Swift, Art. May 22 2017. In U.S., belief in creationist view of humans at new low. URL https://news.gallup.com/poll/210956/belief-creationist-view-humans-new-low.aspx. Swiggers, Pierre. 1996. Transmission of the Phoenician script to the West. In Daniels, Peter and William Bright, editors, The World’s Writing Systems, pages 261–270. Oxford: Oxford University Press. Tabaldyev, K. Sh. 2019. Medieval tamga-signs of Kyrgyzstan. In Voyakin, D. A., editor, Tamgas of Pre-Islamic Central Asia/Тамги доисламской центральной Азии, pages 364–386. Samarkand: International Institute for Central Asian Studies. Takasawa, Hitoshi (高澤 等). 2011. 家紋の辞典 (Kamon no Jiten). Tokyo: Tōkyōdō Shuppan. Thompson-Schill, Sharon. 2003. Neuroimaging studies of semantic memory: inferring “how” from “where”. Neuropsychologia, 41(3).280–92. Tonomura, Hitomi. 1990. Women and inheritance in Japan’s early warrior society. Comparative Studies in Society and History, 32(3).592–623. Tun, Molly. 2015. El quipu: escritura andina en las redes informáticas incaicas y coloniales. PhD thesis, University of Minnesota, Minneapolis, MN. Turner, Victor. 1967. The Forest of Symbols: Aspects of Ndembu Ritual. New York: Cornell University Press. Tytell, John. 1988. Ezra Pound: The Solitary Volcano. Norwell, MA: Anchor Press. Urton, Gary. 1998. From knots to narratives: Reconstructing the art of historical record keeping in the Andes from Spanish transcriptions of Inka Khipus. Ethnohistory, 45(3).409–438. Urton, Gary. 2001. A calendrical and demographic tomb text from Northern Peru. Latin American Antiquity, 12(2).127–147. Urton, Gary. 2010. Numeral graphic pluralism in the colonial Andes. Ethnohistory, 57(1).135–164. Urton, Gary. 2017. Inka History in Knots: Reading Khipus as Primary Sources. Austin, TX: University of Texas Press. Valério, Miguel and Silvia Ferrara. May 2022. Numeracy at the dawn of writing: Mesopotamia and beyond. Historia Mathematica, 59.35–53. Veldhuis, Niek. 2006. How did they learn cuneiform? Tribute/Word List C as an elementary exercise. In Michalowski, Piotr and Niek Veldhuis, editors, Approaches to Sumerian Literature. Studies in Honour of Stip (H. L. J. Vanstiphout), volume 35 of Cuneiform Monographs, pages 181–200. Leiden, Boston: Brill. Vidale, Massimo. 2007. The collapse melts down: a reply to Farmer, Sproat and Witzel. East and West, 57.333–366. Vogel, Alecia; Steven Petersen and Bradley Schlaggar. March 2014. The VWFA: it’s not just for words anymore. Frontiers in Human Neuroscience, 8.1–10.

von Petzinger, Genevieve. 2017. The First Signs: Unlocking the Mysteries of the World’s Oldest Symbols. New York: Atria. Voyakin, D. A., editor. 2019. Tamgas of Pre-Islamic Central Asia/Тамги доисламской центральной Азии. Samarkand: International Institute for Central Asian Studies. White, April. October 2018. How the history of merit badges is also a cultural history of the United States. Smithsonian Magazine. URL https://www.smithsonianmag.com/history/history-merit-badges-cultural-history-united-states-180970306/. Whitley, David. 2011. Rock art, religion, and ritual. In Insoll, Timothy, editor, Oxford Handbook of The Archaeology of Ritual and Religion, chapter 20, pages 307–326. Oxford: Oxford University Press. Whittaker, Gordon. 2009. The principles of Nahuatl writing. Göttinger Beiträge zur Sprachwissenschaft, 16.47–81. Whittaker, Gordon. 2021. Deciphering Aztec Hieroglyphs: A Guide to Nahuatl Writing. London: Thames & Hudson. Wilkins, John. 1668. An Essay Towards a Real Character, and a Philosophical Language. London: Royal Society. Winn, Shan M.M. 1973. The Signs of Vinča Culture: An Internal Analysis; Their Role, Chronology and Independence from Mesopotamia. PhD thesis, UCLA, Los Angeles, CA. Winn, Shan M.M. 1981. Pre-Writing in Southeastern Europe: The Sign System of the Vinča Culture, ca. 4000 B.C.: Western Publishers. Winn, Shan M.M. 2008. The Danube (Old European) script: ritual use of signs in the Balkan-Danube region c. 5200–3500 bc. Journal of Archaeomythology, 4(1).126–141. Witzel, Michael and Steve Farmer. September 30 2000. Horseplay in Harappa. Frontline, pages 4–14. WNYC. February 24 2014. The man who tried to eliminate all words, but never met a smartphone. Podcast. URL https://www.wnycstudios.org/podcasts/notetoself/articles/charles-bliss-emoji. Woods, Christopher; Emily Teeter and Geoff Emberling, editors. 2010. Visible Language: Inventions of Writing in the Ancient Middle East and Beyond. Number 32 in Oriental Institute Museum Publications, Chicago: Oriental Institute. Woodward, Jamie. 2014. The Ice Age: A Very Short Introduction. Oxford: Oxford University Press. Wray, Mike and Charlie Wray. February 3 2020. Hobo signs — code of the road? URL https://www.historicgraffiti.org/post/hobo-signs-code-of-the-road. The Historic Graffiti Society. Wu, Katherine; Jennifer Solman; Ruth Linehan and Richard Sproat. January 2012. Corpora of non-linguistic symbol systems. In Linguistic Society of America, Portland, OR. URL http://elanguage.net/journals/lsameeting/article/view/2845/pdf. Wylie, Korey and Jason Tregellas. 2010. The role of the insula in schizophrenia. Schizophrenia Research, 123(2–3).93–104. Yang, Haidi; Mingxia Wang; Fengchun Wu; Qingwei Li; Yiqing Zheng and Pengmin Qin. 2020. Diminished self-monitoring in hallucinations—Aberrant anterior insula connectivity differentiates auditory hallucinations in schizophrenia from subjective tinnitus. Asian Journal of Psychiatry, 52(1–7). Yoder, Don and Thomas Graves. 2000. Hex Signs: Pennsylvania Dutch Barn Symbols and their Meaning. Mechanicsburg, PA: Stackpole. Zimansky, Paul. 1993. Review of Before Writing, Volume 1: From Counting to Cuneiform, by D. Schmandt-Besserat. Journal of Field Archaeology, 20.513–517. Zipf, George Kingsley. 1949. Human Behavior and the Principle of Least Effort: Addison-Wesley.

Author Index

A Aalto, Pentti, 189 Acosta, José de, 65, 161 Adhikari, R., 70, 185, 187, 188, 189, 192, 193, 194 Adolphs, Ralph, 116 Amedi, Amir, 123 Amiet, Pierre, 54, 134 Anderson, Ned, 28, 76, 160 Andrews, William, 61 Anzellotti, Stefano, 118 Aronoff, Mark, 104 Artzi, Bat-Ami, 65 Austin, John L., 20

B Bahdanau, Dmitry, 149, 164, 165 Baines, John, 15, 16, 17 Baker, Heather, 141 Baldwin, Timothy, 192 Barach, Eliza, 125 Baratov, S.R., 57, 58 Barbeau, Marius, 28, 70 Barber, Liam, 119 Barbier, Charles, 7 Barrault, Loïc, 203 Basello, Gian Pietro, 181 Basso, Keith, 28, 76, 160 Baxter, William, 162, 166 Beardsley, R.K., 47 Bechara, Antoine, 119 Bégouen, Henri, 26 Bel-Enguix, Gemma, 146 Beltrán, Boris, 132 Bengio, Yoshua, 149, 164, 165 Ben-Tor, Daphna, 112, 204 Benz, Marion, 26 Berendsohn, Roy, 28, 63 Bias, Randolph, 6, 125, 126, 127, 128 Bliss, Charles, 71 Bloomfield, Leonard, 92 Bohn, Willard, 106 Boltz, William, 147, 158, 159 Boone, Elizabeth, 65, 102 Bordes, Antoine, 203 Bouissac, Paul, 13 Bouzouggar, Abdeljalil, 22 Bright, William, 21, 91 Brokaw, Galen, 65 Broome, Myrtle, 16 Brunila, Mikael, 116 Buckley, Eugene, 147, 158, 159 Burges, Christopher, 184 Burke, Bernard, 35 Burke, James, 3

C Caine, Bruce, 121 Calverley, Amice, 16 Chamberlain, Basil Hall, 35, 50 Chandler, Daniel, 13, 17, 18, 19 Changizi, Mark, 119, 128 Chater, Nick, 146 Chikano, Shigeru (千鹿野 茂), 35, 38, 40, 88 Childe, V. Gordon, 69 Cho, Kyunghyun, 149, 164, 165 Chrisomalis, Stephen, 140 Christiansen, Morten H., 146 Cimarosti, Angelo, 181 Clark, Madison, 65, 66, 68

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. Sproat, Symbols, https://doi.org/10.1007/978-3-031-26809-0

Clark, Stephen, 146 Clottes, Jean, 22, 23, 27 Colarusso, John, 112, 204 Coon, Jessica, 203 Corrado, Greg, 116, 148 Cortes, Corinna, 184 Coulmas, Florian, 91, 98 Cunningham, Alexander, 185

D Dalley, Stephanie, 138 Damasio, Antonio R., 116, 119 Damasio, Hanna, 116 Damerow, Peter, 54, 163 Daniels, Peter, 21, 91, 98, 120, 132, 147, 158, 159 Dean, Jeff, 116, 148 DeFrancis, John, 29, 91, 98, 102, 106, 108, 180 Dehaene, Stanislas, 98, 120, 121, 124, 128 Déjerine, Joseph-Jules, 121, 122 Depauw, Mark, 19, 61, 84 Derrida, Jacques, 205 Desset, François, 181 Diringer, David, 91, 98 Drew, F.W.M., 70 Driver, Geoffrey, 91, 97 Drucker, Johanna, 91, 96, 97 Dürscheid, Christa, 199

E Eco, Umberto, 7, 13, 14, 19, 136, 196, 198, 199 Emberling, Geoff, 54, 91, 138 Embury-Dennis, Tom, 207 Englund, Robert, 54, 55, 136, 138, 140, 155, 156, 163

F Faber, Alice, 98 Farmer, Steve, 69, 73, 184, 185, 186, 191, 193 Farnell, Brenda, 28, 76, 80 Feiwell, R., 118 Feldman, Laurie Beth, 125 Feldman, Richard, 70 Fenollosa, Ernest, 107 Ferguson, John, 35 Fernández, Gimena, 65, 102 Ferrara, Silvia, 54, 91 Firth, John, 116, 148, 166 Fox-Davies, Arthur Charles, 35, 43, 44, 45, 46, 47, 48, 50 Freeth, Tony, 24 Friar, Stephen, 35 Friberg, Jöran, 54 Fuchs, Friedrich, 35

G Gair, James, 121 Garcés, Fernando, 102 Garfield, Viola, 70 Gelb, Ignace, 91, 98, 138, 160, 184 Ghayati, Noufel, 22 Glassner, Jean-Jacques, 91, 138, 148 Gnanadesikan, Amalia, 3, 91, 97, 98 Goldman, W.P., 118 Goldwasser, Orly, 97 Goody, Jack, 12 Grabowski, Thomas J., 116 Grainger, Jonathan, 123 Graves, Thomas, 29, 73 Griffiths, Tom, 146 Groß, Hans, 63, 64 Gundlach, Christopher, 119 Gunn, Sisvan William, 70 Gutkin, Alexander, 164

H Haarmann, Harald, 53 Handel, Zev, 18, 91, 158 Hannas, William, 98 Harris, Roy, 21, 28, 80, 91, 93, 110, 111 Harris, William, 203 Harris, Zellig, 190 Helmont, Franciscus Mercurius van, 30 Henderson, John, vii Henri, Pierre, 7 Hermann, Karl Moritz, 146 Heyerdahl, Thor, 132 Hichwa, Richard D., 116, 119 Holmyard, Eric, 60 Homeyer, C.G., 62, 63 Honda, Soichiro (本田 総一郎), 35, 37, 39 Houston, Stephen, 112, 204 Huang, Sheng-Cheng, 6, 125, 126, 127, 128 Hunt, Patrick, 28, 61 Hunter, G.R., 185, 189, 190 Hyland, Sabine, 21, 65, 66, 67, 68, 69, 102, 160

I Info, Japan, 12 Istrin, Viktor A., 91

J Jackson, Anthony, 58 Jednoróg, Katarzyna, Jeong, Bumseok, 125 Joachim, Bauer, 26 Joglekar, Hrishikesh, 70, 185, 187, 188, 189, 192, 193, 194 Johnson, Christopher, 205, 206 Jonathan, Philip, 57, 180, 186, 187, 194 Just, Marcel Adam, 116

K Kaiser, David, 28 Kammerzell, Frank, 18, 179 Kavukcuoglu, Koray, 164 Keightley, David, 140 Keller, Peter, 119 Kelly, Piers, 21, 28, 29, 75 Kempelen, Wolfgang von, 30 Kentridge, Robert, 24 Kim, Sun-Hong, 119 Kim, Tae Min, 125 Kirby, Simon, 146 Koskenniemi, Kimmo, 189, 190 Koskenniemi, Seppo, 189, 190 Kramer, J.H., 118

L Lambert, Maurice, 54, 134 LaViolette, Jack, 116 Lawler, Andrew, 181 Lazaridou, Angeliki, 146 LeCun, Yann, 184 Lee, Rob, 57, 180, 186, 187, 194 Le Novère, Nicolas, 28 Levinson, Stephen C., 20 Lévi-Strauss, Claude, 204 Lewis-Williams, David, 22, 23, 27 Li, Lincan (李霖灿), 29, 71 Lindén, Krister, 192 Lipowska, Dorota, 146 Lipowski, Adam, 146 Li, Qingwei, 119 Lucas, Christopher, 142

M Mack, Alastair, 58 Magerman, David, 190 Mahadevan, Iravatham, 70, 184, 185, 187, 188, 189, 192, 193, 194 Mahr, August, 73 Malafouris, Lambros, 129 Malin, Edward, 70 Mallery, Garrick, 29, 74, 138 Mannen, Toru, 116 Marcus, Mitchell, 190 Margulies, Daniel, 119 Marler, Joan, 53 Marshall, John, 184 Martin, Alex, 116 Matney, Timothy, 136 Mayor, Adrienne, 112, 204 Mayshar, Joram, 26, 205 McCawley, James, 28 Meletis, Dimitris, 199 Merker, Björn, 119 Mignolo, Walter, 102 Miller, Bruce L., 118 Mimura, Imari, 116 Moav, Omer, 26, 205 Moorehouse, Alfred, 91, 98 Morimoto, Keiichi (森本 景一), 39, 40, 41, 42 Morimoto, Yuya (森本 勇矢), 35, 39 Morin, Olivier, 21, 28, 185, 201 Mullaney, Thomas, 99 Muscarella, Oscar White, 181, 182

N Namkung, Ho, 119 Naqvi, Nasir, 119 Nestor, Peter, 117, 118 Nissen, Hans, 54, 163 Niyogi, Partha, 146 Nunberg, Geoffrey, 104

O O’Grady, Cathleen, 146 Oi, Mariko, 47 Okudaira, Shizue (奥平 志づ江), 37 Olivier, D.C., 190 Oppenheim, A. Leo, 25, 54, 134, 136, 138 Overmann, Karenleigh, 129, 160

P Palka, Joel, 132 Parpola, Asko, 184, 185, 189, 190, 192 Parpola, Simo, 189, 190 Pascali, Luigi, 26, 205 Patterson, Karalyn, 117, 118 Peirce, Charles, 13, 15 Perry, R.J., 118 Peters, Joris, 25 Petersen, Steven, 124 Pettitt, Paul, 24 Pope, Maurice, 91 Possehl, Gregory, 184, 185 Powell, Barry, 12, 91, 93, 98, 102, 180 Praßl, Daniela, 28, 63 Pulvermüller, Friedemann, 116, 117

Q Qin, Pengmin, 119

R Raman, T.V., 109 Rao, Rajesh, 185, 187 Rawles, Myrtle, 12 Reali, Florencia, 146 Reefe, Thomas, 77 Reinach, Salomon, 27 Reniers, Renate, 119 Rhys, John, 57 Robinson, Andrew, 91 Robson, Eleanor, 138, 141, 142 Rogers, Henry, 91, 98, 99 Rogers, Timothy, 117, 118 Roth, Walter, 75 Rougé, Emmanuel de, 97

S Sagart, Laurent, 162 Sakurai, Yasuhisa, 116 Salomon, Frank, 69, 78, 79 Sampson, Geoffrey, 29, 91, 98, 99 Sanchez-Morales, Ismael, 22 Saturno, William, 132 Saunders, David, 112, 204 Saussure, Ferdinand de, 18, 92 Sawa, Akira, 119 Schlaggar, Bradley, 124 Schmandt-Besserat, Denise, 25, 54, 134, 137, 138 Schmidt, Klaus, 25 Schnyer, David, 6, 125, 126, 127, 128 Schuff, N., 118 Sebeok, Thomas, 13, 14, 15 Seidl, Ursula, 55 Shannon, Claude, 14, 185, 190 Shaughnessy, Edward, 133 Sheridan, Heather, 125 Shih, Chilin, 190 Shimojo, Shinsuke, 119, 128 Silvester, Victor, 81 Skånberg, Tuve, 62 Slater, Stephen, 35 Smagulov, E.A., 35, 57 Smith, Kenny, 146 Smith, R.J., 47 Sproat, Richard, 21, 27, 28, 29, 52, 53, 55, 57, 69, 70, 73, 74, 81, 82, 85, 91, 98, 99, 132, 135, 136, 138, 146, 147, 148, 149, 158, 162, 164, 179, 180, 184, 185, 186, 187, 189, 190, 191, 193 Stauder, Andréas, 133, 134 Steels, Luc, 146 Steinthal, Heymann, 147, 158, 159 Stewart, Hilary, 70 Stone, Elizabeth, 141 Streicher, Hubert, 28, 63 Stuart, David, 132 Sutherland, Elizabeth, 58 Swift, Art, 21 Swiggers, Pierre, 97

T Tabaldyev, K.Sh., 57, 59 Takasawa, Hitoshi (高澤 等), 35, 39, 87, 88 Teeter, Emily, 54, 91, 138 Thiel, Sabrina, 119 Thompson-Schill, Sharon, 116 Tonomura, Hitomi, 47 Tregellas, Jason, 119 Tun, Molly, 65 Turner, Robert, 119 Turner, Victor, 30 Tuyls, Karl, 146 Tytell, John, 108

U Upthegrove, Rachel, 119 Urton, Gary, 21, 26, 65, 66, 67, 102, 103, 160, 161, 205

V Vahia, Mayank, 187 Valério, Miguel, 54 Van Dijk, Koene, 119 Veldhuis, Niek, 140 Velldal, Erik, 165, 203 Vidale, Massimo, 73, 185, 193 Vogel, Alecia, 124 von Petzinger, Genevieve, 23 Voyakin, D.A., 35, 56

W Ware, Gene, 65, 66, 68 Watt, Ian, 12 Weiner, Michael, 118 White, April, 82 Whitley, David, 23 Whitney, Carol, 123 Whittaker, Gordon, 139 Wilkins, John, 196, 197, 198 Winn, Shan M.M., 25, 52, 53 Winters, James, 21, 28 Witzel, Michael, 69, 73, 184, 185, 186, 193 Woods, Christopher, 54, 91, 138 Woodward, Jamie, 24 Worthey, Kayla Beth, 22 Wray, Charlie, 63 Wray, Mike, 63 Wylie, Korey, 119

Y Yadav, Nisha, 187 Yatsenko, S.A., 35, 57 Yoder, Don, 73

Z Zheng, Yiqing, 119 Ziman, Pauline, 57, 180, 186, 187, 194 Zimansky, Paul, 136, 137 Zipf, George Kingsley, 82

Index

A Abugidas, see Alphasyllabaries Accounting importance in the history of writing, 161 Accounting symbols Mesopotamian, 53–54 Agrammatic aphasia, 115 Agraphia, 115, 116 Alchemical symbols, 60–61 Alexia, 116 Alphabet importance of, 98 origin of, 97 technological advantages of, 98–99 Alphasyllabaries, 96, 97 Angular gyrus, 126 Animal models, 113–114 Anomia, 119 Apache ‘writing’, see Silas John’s system Arrival (movie) semasiography in, 203 Articulation, 13, 18–20, 27, 33–34, 198 double, 18–19 single, 19 unarticulated codes, 19 Artificial “natural languages”, 195–196 Asian emoticons, 85, 109 Australian message sticks, 75–76 Aztec writing, see Nahuatl writing

B Bahdanau attention, 164 Barber poles, 61 Bigram neurons, 123 Biosemiotics, 13 Blazon, see Heraldry, blazon Blissymbolics, 99–101, 196, 199, 201, 202, 208 limitations of, 100–101 main current application of, 100–101 Brahmi, 96 Braille processing in the brain, 123 Brain anatomy and function, 114–116 letterbox, 125 meaning “hub”, 117 non-linguistic symbols in, 125–127 processing of digits in, 121 reading in, 119–125 representation of meaning in, 116–119 visual processing in, 115 British National Corpus embeddings, 162, 165–166 Broca’s aphasia, see Agrammatic aphasia Broca’s area, see Motor speech area Bullae, see Clay envelopes

C Cang Jie, 131 Car logos, 84 Chữ Nôm semantic-phonetic compounds in, 158 Change ringing notation, 87 Chemical notation, 87 Chess notation, 87 China development of writing in, 132–133 Chinese, 147 phonetically adapted logograms in, 158 Chinese characters, 96 misconception of as semasiographic, 196 Clay envelopes, 134–136 Computational simulations usefulness of, 145–146

D Dakota winter counts, 74, 160 Dance notation, 80 Danube ‘script’, see Vinča symbols Decipherment the role of “AI” in, 193–194 Decorative systems, 29 Demotic, 125 Devanagari, 96, 97 Diagram as instance of icon, 15

E Edge-detection in vision, 120 Eduba, see Scribal schools Egypt development of writing in, 132–133 Egyptian hieroglyphs, 124 iconicity in, 15–16 Elamite, 181 Electroencephalography, 114 Emblematic systems, 28 Emoji, 109, 199–201 Emoticons, 109 English writing system of, 33 Entropy, 190 Esperanto, 195 Evolution of writing accounting, and, 128–129, 140 hypothesis about, 128–129, 139–140 institutional context of, 140 necessary conditions for, 158–159 phonological complexity, and, 146–147, 152–153, 156–158 semantic-phonetic compounds, 153–158, 168–177 what type of symbol systems could have evolved into, 159–161 Eye dialect, 105–106

F Feynman diagrams, 87 Fluent aphasia, 115 Formal systems, 28 Fraternity symbols, 87 Functional magnetic resonance imaging (fMRI), 114 Fusiform gyrus, 129

G Gaunerzinken, 63–65 Glottocentrism, 101 Göbekli Tepe, 25 Grain agriculture leading to hierarchical society, 205 Graphic syncretism, 110 Graphocentrism, 103 Guild symbols, 28, 61

H Hallucinations, 119 Hangul, 33, 97, 98 iconicity in, 16–17 and typewriting, 99 Harappan Civilization, see Indus Valley Civilization Heraldic systems, 28, 35–52 Heraldry, 1–3, 33, 35, 42–50, 58–60, 191 achievement of arms, 43 blazon, 38, 43 canting arms, 15, 46, 87 canton, 48 charges, 44–45, 46 differences from kamon, 36, 51–52 escutcheon of pretence, 46–47 field, 43, 44 first rule of, 36, 43–44 impalement, 46 and marriage, 46–47 marshalling, 46 ordinaries, 44 quartering, 46–48 relation of design to name of family, 45–46 similarities with kamon, 35–36 syntax, 46–49 Hex signs, see Pennsylvania barn stars Hieratic, 125 Hiragana, 96–98 Hobo signs, 63–65 Hood ornaments, see Car logos House marks, 62

I Iconicity, 13, 15–17 Icons, 14 as distinct from symbols and indices, 14 neural processing of, 125–127 versus writing, 126–127 Ikea assembly instructions, 87 Indices as distinct from symbols and icons, 14 Indus Valley Civilization, 132, 185 Indus Valley symbols, 69, 184–194 basic properties of, 185 Helsinki work on, 190–191 regional variation in, 191–193 statistical analyses of, 185–189 structure in, 189–191 West Asian instances, 192 Insula, 119 Intentionality (in communication), 11 Internet and anti-democratic trends, 206

J Jiroft corpus, 182

K Kabbalists, 3, 30 Kamon, 35, 37–42, 44, 60, 87–88, 191 basic properties of, 37–38 differences from heraldry, 36, 51–52 inheritance of, 40–42 limited combinatorics of, 50 no representation of marriage in, 47 origin of, 37 reading of, 38 relation of design to name of family, 38–40, 87–88 similarities with heraldry, 35–36 symbols of businesses, 36 women’s, see Onnamon Kaomoji, see Asian emoticons Katakana, 96 Khipu, 21, 26, 65–69, 102–103, 205, 207 accounting system, 65–67 other possible uses, 67–69 as writing, 160–161 Knitting patterns, 87 Kudurrus, 55, 182

L Language Musk’s prediction of obsolescence, 207–208 Letterbox, 119–129 evolution of, 124–125 Linear Elamite, 181–182 Literacy, 203 Literate civilizations nationalism and, 179 Lukasa memory boards, 77–78

M Masoretic punctuation, 104–105 Material Engagement Theory, 129 Mathematical notation, 5–6, 31–32, 87, 94 Mayan script, 184 head glyphs in, 125 Meaning vector-space representations of, 116–117 Meso-America development of writing in, 132 Mesopotamia, 204, 205 accounting, 205 compound symbols in, 155 accounting symbols, 25, 163 complex tokens, 134 development of writing in, 132–139 evolution versus invention of writing, 137–139, 148 lexical lists, 138, 140, 141 simple tokens, 134 token theory, 133–137 criticisms of, 136–137 Mesopotamian accounting compound symbols in, 155 wood list, 155 Mesopotamian deity symbols, see Kudurrus Military rank symbols, 87 Model confidence, 150–155 Motor speech area, 115 Mukuyōshi, 47 Multivalence, 27, 29–31 Musical notation, 87 non-musical uses of, 111–112 processing in the brain, 123–124

N Nahuatl writing, 139 Nambikwara, 204–206 Narrative systems, 29 as precursors to writing, 159–160 National flags, 87 Naxi pictography, 71 Ndembu Mystery of the Three Rivers, 30 Nisaba, 131, 205 Number ‘4’ fear of, 27

O Occipital lobe, 115, 128 One-time pad, 12 Onnamon, 40–42

P Paradigmatic dimension, 17 Pennsylvania barn stars, 71–73 Performative systems, 28–29 as precursors to writing, 160 Phaistos disk, 179 Phonetic embeddings, 149–150, 166–167 Pictish symbols, 57–58, 180–181, 186 Positron emission tomography, 114 Potter’s marks, 35 Primary motor cortex, 115–116 Primary somato-sensory cortex, 115–116 Programming flowcharts, 87 Programming languages, 87 Pseudoscripts, 112, 203–204

Q Quipu, see Khipu

R Rebus principle, 95, 133, 138, 142, 146 Recurrent neural network, 149 Regular expressions, 31 Religious iconography, 28, 87, 206 Rule of tincture, see Heraldry, first rule of

S Schizophrenia, 119 Scouting merit badges, 82 Scribal schools, 141–143 Semantic dementia, 117–119 Semasiography, 195–203 based on vector-space representations, 202–203 contrast with phonography, 201–202 in science fiction, 199 Wilkins’ “real character”, 196–199 lack of redundancy in, 197–199 Semiotics, 7, 11–20, 51 and the brain, 113 differences between the present work, and, 13–15 examples of what the field covers, 12 and lying, 19–20 signs, 12 Semitic scripts, 96–97 Sensory speech area, 115 Sequence-to-sequence model, 148, 149, 164 Silas John’s system, 76, 160 Simple informative systems, 28 Speech acts performative, 19–20 Stone masons’ marks, 35 Sumerian, 147 Surprisal, 190 Symbols complex, 1–3 connotation, 3 denotation, 3 development of conventionalization, 25–26 as distinct from indices and icons, 14 and enslavement, 205–206 magical uses of, 23, 26–27 magic of, 3, 119 “mystique” of, 30–31 Neolithic, 25–26, 52–54, 205 non-linguistic, 5–6 Paleolithic, 22–25 repetition of, 31 shamanic rituals, 22–25 simple, 1 use of the term in this book, 14–15 Symbol systems functions of, 27–29 size, 27, 29 structure in, 189–191, 193 Syntagmatic dimension, 17 Syntax, 4–6, 13–14, 17–18, 27, 31–33 dimensionality of, 32–33 not evidence of linguistic structure, 37, 51–52 Systems Biology Graphical Notation, 87

T Tamgas, 24, 35, 56–57 Taxonomy dimensions of, 27–34 summary, 34 importance of, 21–22 Temporal gyrus medial, 115 posterior medial, 116 superior, 115, 161 Temporal lobe, 115 anterior, 118 Totem poles, 69–70, 192 Tower of Babel, 195 Traffic signs, 4–5, 31, 82–84, 188, 192 Trauma role in understanding brain function, 113 Tupicochan staff code, 78–80, 188

U Universal written language, see Semasiography Uruk IV, 140

V Ventral occipito-temporal region, 121, 122 low-level processing of signs in, 122–123 Ventral premotor cortex, 116 Vinča symbols, 25, 52–53 Visual poetry, 106–108 Visual word form area, see Letterbox

W Weather icons, 31, 81 Wernicke’s aphasia, see Fluent aphasia Wernicke’s area, see Sensory speech area Word embeddings, 148, 149 Writing, 7–8, 91–112 basic shapes in, 119–121 common-language notion of, 93–94 and enslavement, 204–206 entropic measures, 185–189 evolution of, 110, 131–143 computational simulation of, 145–177 exclusivist view of, 92–93, 110 important properties of, 182–184 inclusivist versus exclusivist view of, 102–103, 180–193, 207 inclusivist view of, 93 information typically not represented in, 105 limitations of, 101–106 as “necessary” for civilization, 179 origin of terms for, 92 phonographic principle, 91 prestige of, 112, 203–204 pristine development of, 131–132 simulation of evolution of, 162–168 data generation for, 162–163 model architecture, 164–168 training conditions, 167–168 statistical properties of, 179–180, 184–193 tabular information in, 109 as the “technology of civilization”, 91 two-dimensional aspect of, 108–110 what information is represented in, 94–98 what writing “looks like”, 181–184

Z Zodiac symbols, 87